Posts Tagged ‘fix’

Debug and fix Virtuozzo / KVM broken Hypervisor error: PrlSDKError('SDK error: 0x80000249: Unable to connect to Virtuozzo. You may experience a connection problem or the server may be down.') on CentOS Linux howto

Thursday, January 28th, 2021

Reading Time: 6 minutes

fix-sdkerror-virtuozzo-kvm-how-to-debug-problems-with-hypervisor-host-linux

I've recently yum upgraded a CentOS Linux server running the Virtuozzo kernel and Virtuozzo virtualization Virtual Machines to the latest available CentOS Linux release 7.9.2009 (Core), just to find out that after the upgrade there were issues with the virtuozzo (VZ) way to list installed VZ-enabled VMs, which reported an Unable to connect to Virtuozzo error like below:
 

[root@CENTOS etc]# prlctl list -a
Unable to connect to Virtuozzo. You may experience a connection problem or the server may be down. Contact your Virtuozzo administrator for assistance.


Even the native QEMU KVM VMs installed on the Hypervisor system could no longer be listed or brought up, with virsh producing another puzzling error about being unable to connect to the hypervisor socket:

[root@CENTOS etc]# virsh list --all
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory


In the dmesg kernel log messages the error looked like so:

[root@CENTOS etc]# dmesg|grep -i sdk


[    5.314601] PrlSDKError('SDK error: 0x80000249: Unable to connect to Virtuozzo. You may experience a connection problem or the server may be down. Contact your Virtuozzo administrator for assistance.',)

To fix it I had to experiment a bit, based as usual on some suggestions from Google results, and what turned out to be the cause is a now-obsolete disk format probing setting that breaks libvirtd.

Disable allow_disk_format_probing in /etc/libvirt/qemu.conf

The fix to PrlSDKError('SDK error: 0x80000249: Unable to connect to Virtuozzo') comes down to commenting out a parameter inside

/etc/libvirt/qemu.conf

which for historical reasons seems to be turned on by default and looks like this:

allow_disk_format_probing = 1


The resolution is to either change the value to 0 or completely comment out the line:

[root@CENTOS etc]# grep allow_disk_format_probing /etc/libvirt/qemu.conf
# If allow_disk_format_probing is enabled, libvirt will probe disk
#allow_disk_format_probing = 1
#allow_disk_format_probing = 1
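
If you prefer to apply the change non-interactively, here is a minimal sed sketch (assuming the option was present uncommented; the -i.bak flag keeps a backup copy of qemu.conf):

[root@CENTOS etc]# sed -i.bak 's/^allow_disk_format_probing[[:space:]]*=.*/#allow_disk_format_probing = 1/' /etc/libvirt/qemu.conf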


Debug problems with Virtuozzo services and validate virtualization setup

What really helped to debug the issue was to check the extended status info of the libvirtd.service, vzevent, vz.service, libvirtguestd.service and prl-disp systemd services:

[root@CENTOS etc]# systemctl -l status libvirtd.service vzevent vz.service libvirtguestd.service prl-disp

Here I had to analyze the reported errors and google a bit about each of them.


Once this was changed I had, of course, to restart libvirtd.service and the rest of the virtuozzo / kvm services:

[root@CENTOS etc]# systemctl restart libvirtd.service vzevent vz.service libvirtguest.service prl-disp


Another useful tool, part of a standard VZ install, that I've used to make sure each of the Host OS Hypervisor components is running smoothly is virt-host-validate (the tool is part of the libvirt-client rpm package):

[root@CENTOS etc]# virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpu' controller mount-point                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'cpuset' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'devices' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for cgroup 'blkio' controller mount-point                   : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : WARN (IOMMU appears to be disabled in kernel. Add intel_iommu=on to kernel cmdline arguments)
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller mount-point                  : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpu' controller mount-point                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'cpuset' controller mount-point                  : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'devices' controller mount-point                 : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking for cgroup 'blkio' controller mount-point                   : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS
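
Note the single WARN above concerns the IOMMU. If you plan to use PCI device passthrough you can act on the hint virt-host-validate itself gives and add intel_iommu=on to the kernel command line; a hedged sketch for a grub2-based CentOS (paths may differ on your setup, and the change takes effect on next reboot):

[root@CENTOS etc]# sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 intel_iommu=on"/' /etc/default/grub
[root@CENTOS etc]# grub2-mkconfig -o /boot/grub2/grub.cfg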


One thing to note here is that virt-host-validate helped me realize that the fuse (Filesystem in Userspace) kernel module support on the HV was missing, so I've enabled it temporarily for this boot with modprobe and permanently via a configuration file like so:

# to load it one time
[root@CENTOS etc]#  modprobe fuse
# to load fuse permanently on next boot

[root@CENTOS etc]#  echo fuse >> /etc/modules-load.d/fuse.conf
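
To double-check the module is now present in the kernel:

[root@CENTOS etc]#  lsmod | grep -w fuse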

Disable selinux on CentOS HV

Another thing was that selinux was enabled on the HV. Selinux is a really annoying thing and, to be honest, I never used it on any server; though its idea is quite good, the consequences it creates for daily sysadmin work are terrible, so I usually disable it. It could be that a Hypervisor Host OS might work just fine with selinux enabled, but just in case I decided to remove it. This is how:

[root@CENTOS etc]#  sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31

To temporarily change the SELinux mode from enforcing to permissive, run the following command:

[root@CENTOS etc]#  setenforce 0

Then edit the /etc/selinux/config file and set the SELINUX mode to disabled:

[root@CENTOS etc]# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#       targeted - Targeted processes are protected,
#       mls - Multi Level Security protection.
SELINUXTYPE=targeted
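
The same config edit can also be done non-interactively; a small sketch:

[root@CENTOS etc]# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
[root@CENTOS etc]# grep '^SELINUX=' /etc/selinux/config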

Finally I gracefully rebooted the machine just in case, using the good recommended way to reboot servers with the shutdown command instead of /sbin/reboot:

[root@CENTOS etc]# shutdown -r now

The advantage of shutdown is that it tries to shut down each service by sending stop requests, but this usually takes some time, and even a shutdown request could take longer to process, as each service, such as a WebServer application, is waited upon to close all its network connections etc.
However, if you want a quick reboot and you don't care about any established network connections to third party IPs, you can go for the brutal old-fashioned /sbin/reboot 🙂

How to fix rkhunter checking /dev for suspicious files, solve rkhunter checking if SSH root access is allowed warning

Friday, November 20th, 2020

Reading Time: 4 minutes

rkhunter-logo

If you have rkhunter running on a server and you suddenly get some weird Warnings for suspicious files under /dev, like shown in the screenshot, you might be puzzled how come this happened, as it was not reported before the regular package patching update was conducted …

[root@haproxy-server ~]# rkhunter --check

rkhunter-warn-screenshot

To investigate further I've checked the rkhunter produced log /var/log/rkhunter.log for a verbose message and found more specifics there on which exact files rkhunter finds suspicious.
To further investigate what exactly these suspicious files are for, whether they're used by something legitimate on the system, or whether in reality it is a hacker who hacked our supposedly PCI compliant system,
I've used the good old fuser command, which is capable of showing which system process is actively using a file. To get a fuser report for each file from /var/log/rkhunter.log, I ran the below shell loop (note the fuser output is from a German locale: BEN. = USER, ZUGR. = ACCESS, BEFEHL = COMMAND):

[root@haproxy-server ~]#  for i in $(tail -n 50 /var/log/rkhunter/rkhunter.log|grep -i /dev/shm|awk '{ print $2 }'|sed -e 's#:##g'); do fuser -v $i; done
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1851-27-f1sTlC/qb-request-cpg-header:
                     root       1783 ….m corosync
                     hacluster   1851 ….m attrd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-26-Znk1UM/qb-event-quorum-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-26-Znk1UM/qb-event-quorum-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-26-Znk1UM/qb-response-quorum-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-26-Znk1UM/qb-response-quorum-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-26-Znk1UM/qb-request-quorum-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-26-Znk1UM/qb-request-quorum-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-25-oCdaKX/qb-event-cpg-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-25-oCdaKX/qb-event-cpg-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-25-oCdaKX/qb-response-cpg-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-25-oCdaKX/qb-response-cpg-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-25-oCdaKX/qb-request-cpg-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-25-oCdaKX/qb-request-cpg-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-24-GKyj3l/qb-event-cfg-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-24-GKyj3l/qb-event-cfg-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-24-GKyj3l/qb-response-cfg-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-24-GKyj3l/qb-response-cfg-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-24-GKyj3l/qb-request-cfg-data:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd
                     BEN.        PID ZUGR.  BEFEHL
/dev/shm/qb-1783-1844-24-GKyj3l/qb-request-cfg-header:
                     root       1783 ….m corosync
                     root       1844 ….m pacemakerd


As you see from the output, all the /dev/shm/qb/ files in question are currently opened by corosync / pacemaker and necessary for the proper work of the haproxy cluster processes running on the machines.
 

How to solve the /dev/ suspicious files rkhunter warning?

To solve it we need to tell rkhunter not to check against these files; this is done via /etc/rkhunter.conf. First I thought this is done with EXISTWHITELIST=, but it turned out there is a special rkhunter option just for whitelisting /dev type files – ALLOWDEVFILE.

Hence, to resolve the warning for the upcoming planned early PCI audit and save us troubles, we had to add on the running OS, which is CentOS Linux release 7.8.2003 (Core), the following in /etc/rkhunter.conf:

ALLOWDEVFILE=/dev/shm/qb-*/qb-*
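
Before re-running the scan it is a good idea to validate the changed configuration, as rkhunter supports a config check mode:

[root@haproxy-server ~]# rkhunter --config-check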

Re-run

# rkhunter --check

and Voila, the warning should be no more.

rkhunter-check-output

Another thing is that on another machine the warnings produced by rkhunter were a bit different: rkhunter mistakenly detected that root login is enabled, whereas in reality PermitRootLogin was set to no in /etc/ssh/sshd_config.

rkhunter-warning

As the problem was experienced on some machines and not on others, I've done the standard boring config comparison we sysadmins do to tell why stuff differs.
The result was that on the first machine, where everything worked as expected and PermitRootLogin no was recognized, the correct configuration was:

— SNAP —
#ALLOW_SSH_ROOT_USER=no
ALLOW_SSH_ROOT_USER=unset
— END —

On the second server, where the problem was experienced, the value was:

— SNAP —
#ALLOW_SSH_ROOT_USER=unset
ALLOW_SSH_ROOT_USER=no
— END —
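
A quick way to cross-check what sshd actually enforces against what rkhunter expects is sshd's extended test mode, which dumps the effective configuration:

[root@haproxy-server ~]# sshd -T | grep -i permitrootlogin
[root@haproxy-server ~]# grep -i '^ALLOW_SSH_ROOT_USER' /etc/rkhunter.conf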

Note that the warning produced regarding rsyslog remote logging being allowed is perfectly fine, as we had enabled remote logging to a central log server on the machines. This is done with config options under /etc/rsyslog.conf:

# Configure Remote rsyslog logging server
*.* @remote-logging-server.com:514

Fix FTP active connection issues “Cannot create a data connection: No route to host” on ProFTPD Linux dedicated server

Tuesday, October 1st, 2019

Reading Time: 5 minutes

proftpd-linux-logo

Earlier I've blogged about an encountered problem that prevented Active mode FTP connections on CentOS.
As I'm working for a client building a brand new dedicated server purchased from the Contabo Dedi Host provider, on a freshly installed Debian 10 GNU / Linux I've had to configure a new FTP server. For some time now I prefer to use ProFTPD instead of VSFTPD, because in my opinion it is more lightweight and hence a better choice for small UNIX server setups. During this, once again I've encountered the same ACTIVE FTP not working from the FTP server to the FTP client host machine. But before shortly explaining the fix, I find it worthy to explain briefly what an ACTIVE / PASSIVE FTP connection is.

 

1. What is an ACTIVE / PASSIVE FTP connection?
 

In active mode, the client specifies which client-side port the data channel should be opened on and the server initiates the connection. In other words, the default FTP client communication, for historical reasons, is ACTIVE MODE: the client, once connected to the server, tells the server to open an extra port or ports locally, via which the overall FTP data transfer will occur. In the early days of networking, when the FTP protocol was developed, security was not such a big concern; networks usually did not have firewalls at all, and the FTP DATA transfer host machine was running just a single FTP server and nothing more. In these early days, when FTP was not even used over the Internet and FTP DATA transfers happened on local networks, this was not a problem at all.

In passive mode, the server decides which server-side port the client should connect to. Then the client initiates the connection to the specified port.

But with the ever increasing complexity of the Internet / networks and the ever tightening firewalls, due to viruses and worms that are trying to own and exploit networks creating unnecessary bulk loads, this has changed …

active-passive-ftp-explained-diagram
 

2. Install and configure the ProFTPD server public ServerName

I've installed the server with the common cmd:

 

apt --yes install proftpd

 

And the only configuration changed in the default configuration file /etc/proftpd/proftpd.conf was
ServerName          "Debian"

I do this in new FTP setups for the logical reason of preventing the many FTP Vulnerability Scan script kiddie crawlers from learning the exact OS version of the server, so this was changed to:

 

ServerName "MyServerHostname"

 

Though security through obscurity is generally a bad practice, in this particular case doing so is a good practice.
 

3. Create iptables firewall rules to allow ACTIVE FTP mode


But anyways, the next step was to configure the firewall to allow communication on TCP ports 21 and 20 from the incoming source port range 1024:65535 (to enable ACTIVE FTP) on the firewall level, with iptables INPUT and OUTPUT chain rules like this:

 

iptables -A INPUT -p tcp --sport 1024:65535 -d 0/0 --dport 21 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -s 0/0 --sport 1024:65535 -d 0/0 --dport 20 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s 0/0 --sport 21 -d 0/0 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s 0/0 --sport 20 -d 0/0 --dport 1024:65535 -m state --state ESTABLISHED,RELATED -j ACCEPT
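
Note that rules added this way live only until the next reboot; how to persist them depends on the distro – on Debian 10 one common way (assuming the iptables-persistent package is acceptable for your setup) is:

apt --yes install iptables-persistent
iptables-save > /etc/iptables/rules.v4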


Once the firewall allows FTP Active / Passive connections and the FTP server is listening, to test that all is properly configured check the iptables rules and the FTP listener:
 

/sbin/iptables -L INPUT |grep ftp
ACCEPT     tcp  --  anywhere             anywhere             tcp spts:1024:65535 dpt:ftp state NEW,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp spts:1024:65535 dpt:ftp-data state NEW,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ftp
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ftp-data

netstat -l | grep "ftp"
tcp6       0      0 [::]:ftp                [::]:*                  LISTEN    

 

4. Loading the nf_nat_ftp module and net.netfilter.nf_conntrack_helper (for backward compatibility)


The next step of course was to add the necessary modules nf_nat_ftp and nf_conntrack_sane, which make FTP properly forward ports with the respective firewall states on any of the above source ports, which are usually allowed by firewalls. Note that the given port range 1024:65535 might be too liberal for paranoid sysadmins; if ports are not filtered and you are a security freak, you can use a smaller range such as 60000:65535.

 

Here is the place to say that sysadmins who haven't recently had the task of configuring a new (unencrypted) File Transfer Server – as today Secure FTP is almost always used for file transfers for the sake of security – might be puzzled to find out that the old Linux kernel module ip_conntrack_ftp, which was the standard module used to make FTP Active connections work, is nowadays substituted by nf_nat_ftp and nf_conntrack_sane.

To make the 2 modules permanently loaded on next boot on Debian Linux they have to be added to /etc/modules

Here is how sample /etc/modules that loads the modules on next system boot looks like

cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
softdog
nf_nat_ftp
nf_conntrack_sane
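
To load both modules immediately without waiting for a reboot, and to confirm they are in:

modprobe -a nf_nat_ftp nf_conntrack_sane
lsmod | egrep 'nf_nat_ftp|nf_conntrack_sane'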


Next to say is that in newer Linux kernels 3.x / 4.x / 5.x the nf_nat_ftp and nf_conntrack_sane behaviour changed, so simply loading the modules would not work, and if you test it with some FTP client (I used gFTP / ncftp from my Linux desktop) you are about to get FTP 'No route to host' errors like:

 

Cannot create a data connection: No route to host

 

cannot-create-a-data-connection-no-route-to-host-linux-error-howto-fix


Sometimes, instead of the No route to host error, the error the FTP client returns is:

 

227 entering passive mode FTP connect connection timed out error


To make the nf_nat_ftp module work on newer Linux kernels, you hence have to enable the backwards compatibility kernel variable:

 

 

/proc/sys/net/netfilter/nf_conntrack_helper

 

echo 1 > /proc/sys/net/netfilter/nf_conntrack_helper

 

To make it permanent, add the echo line to /etc/rc.local (if you have the legacy single-file boot place /etc/rc.local enabled, as I do on servers – for how to enable rc.local on newer Linuxes check here), or alternatively set it via sysctl:

sysctl -w net.netfilter.nf_conntrack_helper=1

And to make the change permanent (e.g. loaded on next boot):

echo 'net.netfilter.nf_conntrack_helper=1' >> /etc/sysctl.conf

 

5. Enable PassivePorts in ProFTPD or PassivePortRange in PureFTPD


Last but not least, open /etc/proftpd/proftpd.conf, find the PassivePorts config value (commented out by default) and beside it add the following line:

 

PassivePorts 60000 65534
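
Naturally, the chosen passive range must also be permitted by the firewall, otherwise passive transfers will break the same way; e.g.:

iptables -A INPUT -p tcp --dport 60000:65534 -m state --state NEW,ESTABLISHED -j ACCEPT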

 

Just for information, if instead of ProFTPD you experience the error on PureFTPD, the configuration value to set in /etc/pure-ftpd.conf is:
 

PassivePortRange 30000 35000


That's all folks. Give ncftp / lftp / filezilla or whatever FTP client you prefer a try: the client should now talk as expected to the remote server in ACTIVE FTP mode (and the auto passive mode will not be triggered anymore), and you will no longer get strange errors and failures to connect in FTP clients such as gftp.

Cheers 🙂

Why du and df report different usage on a filesystem / How to fix the inconsistency between used space on the FS and the disk showing full

Wednesday, July 24th, 2019

Reading Time: 6 minutes

linux-why-du-and-df-shows-different-result-inconsincy-explained-filesystem-full-oddity

If you're a sysadmin in a large server environment, such as a couple of hundred Virtual Machines running Linux OS on either physical hosts or OpenXen / VMware hosted guest Virtual Machines, you might sometimes end up in an odd case where some mounted partition mount point reports its file usage differently when checked with the df cmd than when checked with the du command, like for example:
 

root@sqlserver:~# df -hT /var/lib/mysql
Filesystem   Type  Size Used Avail Use% Mounted On
/dev/sdb5      ext4    19G  3,4G    14G  20% /var/lib/mysql

Here the '-T' argument is used to also show us the filesystem type.

root@sqlserver:~# du -hsc /var/lib/mysql
0K    /var/lib/mysql/
0K    total

 

1. Simple debug on what might be the root cause for df / du inconsistency reporting

 

Of course the basic thing to do when in that weird situation, after being totally shocked that this is possible, is to investigate a bit which are the biggest first-level sub-directories that eat up the space on the mounted location, with du:

 

# du -hkx --max-depth=1 /var/lib/mysql/|uniq|sort -n
4       /var/lib/mysql/test
8       /var/lib/mysql/ezmlm
8       /var/lib/mysql/micropcfreak
8       /var/lib/mysql/performance_schema
12      /var/lib/mysql/mysqltmp
24      /var/lib/mysql/speedtest
64      /var/lib/mysql/yourls
144     /var/lib/mysql/narf
320     /var/lib/mysql/webchat_plus
424     /var/lib/mysql/goodfaithair
528     /var/lib/mysql/moonman
648     /var/lib/mysql/daniel
852     /var/lib/mysql/lessn
1292    /var/lib/mysql/gallery

The given output is in kilobytes, so it is a little bit hard to read; if you're used to megabytes instead, do:

 

 # du -hmx --max-depth=1 /var/lib/mysql/|uniq|sort -n|less

 

I've also investigated the complete /var directory contents sorted by size with:

 

 # du -akx ./ | sort -n
5152564    ./cache/rsnapshot/hourly.2/localhost
5255788    ./cache/rsnapshot/hourly.2
5287912    ./cache/rsnapshot
7192152    ./cache


Even after finding out the bottleneck dirs and trying to clear up a bit, the inconsistency shown between the two commands persisted. If you're stunned like me, you may try to move some files to a different filesystem, to free up space or assigned inodes, with the hope that the inconsistent output will be fixed – reasoning that it might be caused by some kernel / FS caching and this will eventually make the mounted FS refresh …

But unfortunately, if you try it you'll figure out that clearing up a couple of megas or gigas makes no difference in the cmd output.

In my exact case /var/lib/mysql is a separately mounted ext4 filesystem; however, the same issue was also present on a Network Filesystem (NFS), and thus my first thought that this is caused by a network failure problem or an NFS bug turned out to be wrong.

After a further short investigation of the inodes on the filesystem, it was clear that enough inodes were available:
 

# df -i /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5      1221600  2562 1219038   1% /var/lib/mysql

 

So the assumed filled-up inode count issue has also been rejected.
P.S. If you're not well familiar with inodes, read the manual, i.e. man 7 inode.
 

– Remounting the mounted filesystem

To make sure the filesystem inconsistency shown between du and df is not due to some hanging network mount or bug, the first logical thing I did was to remount the filesystem reporting different sizes; in my case this was done with:
 

# mount -o remount,rw -t ext4 /var/lib/mysql

For machines with NFS remote mounted storage locations, I used:

# mount -o remount,rw -t nfs /var/www


The FS remount did not solve it, so I continued to ponder the oddity, and of course I thought of a workaround: in case these issues are caused by a kernel bug or OS library issue, a reboot might be the solution. However, unfortunately, restarting the VMs was not a wanted, easy-to-do solution, so I continued investigating what is wrong …

The next check, of course, was to see what kind of network connections are opened to the affected hosts with:
 

# netstat -tupanl


This did not find anything that might point me to the reported different megabytes issue, so the next step was to check the situation with currently opened files by running processes on the weird df / du reporting systems with lsof, and boom, there I observed an oddity: multiple deleted files still kept open.

 

# lsof -nP | grep '(deleted)'

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /var/lib/mysql/tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /var/lib/mysql/tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       777884290     0  1497 /var/lib/mysql/tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       123667875     0 11387 /var/lib/mysql/tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       123852406     0 11388 /var/lib/mysql/tmp/ibQpoZ94 (deleted)

 

Notice that there were plenty of '(deleted)' STATE files shown as still open in memory, an overall of 438:

 

# lsof -nP | grep '(deleted)' |wc -l
438


As I've learned a bit more online about the problem, it is also possible to list only the deleted unlinked files without any greps (listing all deleted files still held in memory with lsof arguments only):

 

# lsof +L1|less


The SIZE field (the seventh column) shows that some of these files are really large and are kept open on the filesystem and in memory, totally messing up the free-space accounting. In my case these are temp files created by the MYSQLD daemon, but depending on the service the server provides this might be apache's www-data, some custom perl / bash script executed via a cron job, stalled rsync jobs etc.
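
To get a rough figure of how much space these unlinked files still pin down, you can sum up that SIZE column – a hedged sketch assuming the default lsof column layout shown above:

# lsof -nP +L1 | awk 'NR>1 {sum+=$7} END {printf "%.2f GB\n", sum/1024/1024/1024}'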
 

2. Check the full list of open files of the mysql / root user, as part of the server filesystem inconsistency debugging:

 

– Grep opened files on server by user

# lsof |grep mysql
mysqld    1312                       mysql  cwd       DIR               8,21       4096          2 /var/lib/mysql
mysqld    1312                       mysql  rtd       DIR                8,1       4096          2 /
mysqld    1312                       mysql  txt       REG                8,1   20336792   23805048 /usr/sbin/mysqld
mysqld    1312                       mysql  mem       REG               8,21      24576         20 /var/lib/mysql/tc.log
mysqld    1312                       mysql  DEL       REG               0,16                 29467 /[aio]
mysqld    1312                       mysql  mem       REG                8,1      55792   14886933 /lib/x86_64-linux-gnu/libnss_files-2.28.so

 

# lsof | grep root
COMMAND    PID   TID TASKCMD          USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd      1                        root  cwd       DIR                8,1       4096          2 /
systemd      1                        root  rtd       DIR                8,1       4096          2 /
systemd      1                        root  txt       REG                8,1    1489208   14928891 /lib/systemd/systemd
systemd      1                        root  mem       REG                8,1    1579448   14886924 /lib/x86_64-linux-gnu/libm-2.28.so

Another command that helped to track the discrepancy between the df and du reported file usage on the FS is:
 

# du -hxa  / | egrep '^[[:digit:]]{1,1}G[[:space:]]*'
 

 

3. Fixing large files kept in memory filesystem problem


What is the real reason for ending up with these file handlers kept open by backgrounded programs running on the Linux OS?
It could be multiple things, but most likely it is due to excessive server / client interactions, or to overloading the RAM or HDD drive by endlessly writing plenty of logs on the FS and keeping the space occupied, or programming library bugs used by a hanged service leaving the file handles open on storage.

What is the solution to the problem of deleted files kept open on the filesystem?

The best solution is to first fix the custom script or hanged service and then, if possible, simply restart the server to make the kernel / services reload; if this is not possible, just restart the processes creating the problem.

Once the process is identified – like in my case this was MySQL – on systemd enabled newer OS distros just do:

 

 

# systemctl restart mysqld.service


or on older init.d system V ones:

# /etc/init.d/service restart


For custom hanged scripts listed in ps axuwef you can grep the PID and do a kill -HUP (if the script is written in a good way to recognize -HUP and restart the sub-running process properly – BE EXTRA CAREFUL IF YOU'RE RESTARTING BROKEN SCRIPTS as this might cause disruptions to your running service …).

# pgrep -l script.sh
7977 script.sh


# kill -HUP PID
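
When neither a service restart nor a HUP is possible (e.g. a production DB you cannot touch right now), a commonly used last-resort trick is to truncate the deleted-but-still-open file straight through /proc. A hedged example, re-using the mysqld PID 2588 and file descriptor 6 from the lsof output above – use with care, as the application may still expect that data:

# : > /proc/2588/fd/6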

 

Now finally this should either mitigate or, in the best case, completely solve the reported disagreement between df and du, after which the calculated / reported disk space should be back to normal and both should show approximately the same (note that the size changes a bit as the mysql service is writing data, constantly extending the size between the two checks):

 

# df -hk /var/lib/mysql; du -hskc /var/lib/mysql
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdb5       19097172 3472744  14631296  20% /var/lib/mysql
3427772    /var/lib/mysql
3427772    total

 

What did we learn?

What I've explained in this article is why and how it comes that 'zombie' files reside on a filesystem, appearing to eat disk space on a mounted local or network partition, giving strange inconsistent reports, leading to system service disruptions and making it impossible to have correct information shown on the used disk space of the mounted drive.

I went through some standard logic for debugging service / filesystem / inode issues, up to the explanation that led me to the finding about deleted files being kept open on the filesystem, producing the strange filesystem sizing, shown incorrectly as filled even after it was extended with tune2fs and was supposed to have an extra 50GBs.

Finally it was explained shortly how to HUP / restart a hanging script / service to fix it.

Some good readings that helped to fix the issue:

What to do when du and df report different usage is here
df in linux not showing correct free space after file removal is here
Why do “df” and “du” commands show different disk usage?
 

Automatic network restart and reboot Linux server script if ping to the gateway times out, as a way to reduce connectivity downtimes

Monday, December 10th, 2018

Reading Time: 4 minutes

automatic-server-network-restart-and-reboot-script-if-connection-to-server-gateway-inavailable-tux-penguing-ascii-art-bin-bash

Inability of the server to come back online automatically after an electricity / network outage

These days my home server is experiencing a lot of issues due to electricity power outages; construction dig operations to fix / change the water pipe tubes near my home are in action, and perhaps the power cables got ruptured by the digger machine.
The effect of all this was that my server's networking accessibility was affected, and as I didn't have network I couldn't access it remotely anymore. At a certain point the electricity was restored (and the UPS charge could keep the server up), however the server accessibility did not restore until I asked a relative to restart it, or, in more complicated cases, a tech-acquainted guy has to help – Alexander (Alex), a close friend from school years (check his old site here – alex.pc-freak.net), helps a lot – to restart the machine physically, run quick restoration commands on a root TTY terminal, or generally check whether the default router is reachable.

These kinds of Pc-Freak.net downtime issues became too frequent over the last month (the machine was down about 5 times for 2 to 5 hours each), and this was too much. Weirdly enough, the server was not accessible from the internet even after the electricity network was restored, and the only solution was a physical server restart (from the Power Button).

To decrease the number of cases in which known relatives or friends have to physically go to the server and restart it each time after a network or electricity outage, I wrote a small script that checks accessibility towards the default defined network gateway for my server with a few ICMP packages sent with the good old PING command, and triggers a network restart followed by a system reboot (in case the network restart fails), in a row.

1. Create the reboot-if-nwork-is-down.sh script under /usr/sbin or another dir

Here is the script itself:

 

#!/bin/sh
# Script checks with 5 ICMP pings 10 times to DEF GW and if the gateway
# is unreachable triggers networking restart (/etc/init.d/networking restart)
# Then does another 5 x 10 PINGs and if the ping command still returns errors,
# reboots the machine
# This script is useful if you run a home router with Linux and you have
# electricity outages and the machine doesn't come up unless rebooted

GATEWAY_HOST='192.168.0.1';

run_ping () {
for i in $(seq 1 10); do
    ping -c 5 $GATEWAY_HOST
done
}

reboot_f () {
# $? holds the exit status of the last ping run by run_ping
if [ $? -eq 0 ]; then
    echo "$(date "+%Y-%m-%d %H:%M:%S") Ping to $GATEWAY_HOST OK" >> /var/log/reboot.log
else
    /etc/init.d/networking restart
    echo "$(date "+%Y-%m-%d %H:%M:%S") Restarted Network Interfaces" >> /var/log/reboot.log
    # initialize the reboot counter if it does not exist yet
    [ -f /tmp/rebooted.txt ] || echo 0 > /tmp/rebooted.txt
    for i in $(seq 1 10); do ping -c 5 $GATEWAY_HOST; done
    if [ $? -ne 0 ] && [ "$(cat /tmp/rebooted.txt)" -lt 5 ]; then
        echo "$(date "+%Y-%m-%d %H:%M:%S") Ping to $GATEWAY_HOST FAILED !!! REBOOTING." >> /var/log/reboot.log
        # increment the counter (up to 5 reboots) before rebooting
        n=$(cat /tmp/rebooted.txt)
        echo $(( n + 1 )) > /tmp/rebooted.txt
        /sbin/reboot
    fi
    # if rebooted 5 times already, sleep 30 mins and reset the counter
    if [ "$(cat /tmp/rebooted.txt)" -ge 5 ]; then
        sleep 1800
        echo 0 > /tmp/rebooted.txt
    fi
fi
}
run_ping;
reboot_f;

You can download a copy of reboot-if-nwork-is-down.sh script here.

As you see, in the script successful runs as well as failures are logged on the server in /var/log/reboot.log with a respective timestamp.
Also, a counter up to 5 is kept in /tmp/rebooted.txt, incremented on each script run that ends in a reboot; if the 5-reboot count is reached, a sleep of 30 minutes is executed and the counter is reset.
The counter check against 5 guarantees the server will not keep restarting endlessly if access to the Gateway stays down for a long time, preventing the system from being restarted like crazy all the time.
 

2. Create a cron job to run reboot-if-nwork-is-down.sh every 15 minutes or so 

I've set the script to re-run as a scheduled (root user) cron job every 15 minutes with the following job.

To add the script to the existing cron rules without rewriting my old cron jobs and without having to use the interactive crontab -u root -e (e.g. doing the cron job addition in non-interactive mode with a single bash one-liner), I had to run the following command:

 

{ crontab -l; echo "*/15 * * * * /usr/sbin/reboot-if-nwork-is-down.sh >/dev/null 2>&1"; } | crontab -
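
To verify the job landed in root's crontab:

crontab -l | grep reboot-if-nwork-is-down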


I know restarting a server to restore accessibility is a stupid practice, but for home use or small client servers on unguaranteed networks with cheap Uninterruptible Power Supply (UPS) devices it is useful.

Summary

Time will show how efficient such a "self-healing" script practice is.
Even so, I'm pretty sure that even in corporate businesses and large Public / Private / Hybrid Clouds, where access to remotely mounted NFS / XFS / ZFS filesystems fails at times, a modification of the script could save you a lot of nerves and troubles and unhappy customers / managers screaming at you on the phone 🙂


I'll be interested to hear from others who have better ideas on how to restore (resurrect) access to an inaccessible Linux server after an outage.
 

MySQL crashes after upgrade from MySQL to MariaDB and how to fix it

Tuesday, August 21st, 2018

Reading Time: 3 minutes

how-to-fix-crashing-mysql-after-upgrade-to-mariadb-database-mariadb-logo.png

If you have recently upgraded your Debian / Ubuntu / CentOS Linux server to the latest RPM / DEB packages, you might have noticed that as part of the upgrade MySQL Community Server (which was bought by Oracle Corporation a few years ago) is automatically upgraded to MariaDB (a MySQL fork made by the original developers of MySQL and guaranteed to stay open source; just to name some of its notable users: Wikipedia, WordPress.com and Google).

You might have noticed that MariaDB's restart script, which is still under /etc/init.d/mysql, won't start, and a quick check in /var/log/mysql.err | /var/log/mysql.log
shows errors of /usr/sbin/mysqld crashing like these:

140502 14:13:05 [Note] Plugin 'FEDERATED' is disabled.
InnoDB: Log scan progressed past the checkpoint lsn 108 1057948207
140502 14:13:06  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files…
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer…
InnoDB: Doing recovery: scanned up to log sequence number 108 1058059648
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 15 row operations to undo
InnoDB: Trx id counter is 0 562485504
140502 14:13:06  InnoDB: Starting an apply batch of log records to the database…
InnoDB: Progress in percents: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Starting in background the rollback of uncommitted transactions
140502 14:13:06  InnoDB: Rolling back trx with id 0 562485192, 15 rows to undo
140502 14:13:06  InnoDB: Started; log sequence number 108 1058059648
140502 14:13:06  InnoDB: Assertion failure in thread 1873206128 in file ../../../storage/innobase/fsp/fsp0fsp.c line 1593
InnoDB: Failing assertion: frag_n_used > 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
140502 14:13:06 – mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=0
max_threads=151
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 345919 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = (nil) thread_stack 0x30000
140502 14:13:06 [Note] Event Scheduler: Loaded 0 events
140502 14:13:06 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.41-3ubuntu12.10'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)
/usr/sbin/mysqld(my_print_stacktrace+0x2d) [0xb7579cbd]
/usr/sbin/mysqld(handle_segfault+0x494) [0xb7245854]
[0xb6fc0400]
/lib/tls/i686/cmov/libc.so.6(abort+0x182) [0xb6cc5a82]
/usr/sbin/mysqld(+0x4867e9) [0xb74647e9]
/usr/sbin/mysqld(btr_page_free_low+0x122) [0xb74f1622]
/usr/sbin/mysqld(btr_compress+0x684) [0xb74f4ca4]
/usr/sbin/mysqld(btr_cur_compress_if_useful+0xe7) [0xb74284e7]
/usr/sbin/mysqld(btr_cur_pessimistic_delete+0x332) [0xb7429e72]
/usr/sbin/mysqld(btr_node_ptr_delete+0x82) [0xb74f4012]
/usr/sbin/mysqld(btr_discard_page+0x175) [0xb74f41e5]
/usr/sbin/mysqld(btr_cur_pessimistic_delete+0x3e8) [0xb7429f28]
/usr/sbin/mysqld(+0x526197) [0xb7504197]
/usr/sbin/mysqld(row_undo_ins+0x1b1) [0xb7504771]
/usr/sbin/mysqld(row_undo_step+0x25f) [0xb74c210f]
/usr/sbin/mysqld(que_run_threads+0x58a) [0xb74a31da]

/usr/sbin/mysqld(trx_rollback_or_clean_all_without_sess+0x3e3) [0xb74ded43]
/lib/tls/i686/cmov/libpthread.so.0(+0x596e) [0xb6f9f96e]
/lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb6d65a4e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.


I hoped to solve the /usr/sbin/mysqld segfault error with a server reboot, as I thought the problem was caused by the fact the libc library was updated, but even a reboot did not solve it.

I've investigated online for a solution and found the following MySQL corruption and recovery article.

The solution outlined there is very simple and comes down to adding the line:
 

innodb_force_recovery = 1


to /etc/mysql/my.cnf
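
The option belongs in the [mysqld] section; a minimal sketch of how the relevant part of my.cnf would look:

[mysqld]
innodb_force_recovery = 1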

Add the line while the MySQL server is not running, then start the MariaDB server so it comes up in forced recovery mode.

1. Make a backup (Dump) of all MySQL tables

mysql:~# mysqldump -A > dump.sql

2. Drop all databases which need recovery.
You can do that from mysql cli or phpmyadmin

3. Stop mysqld.

mysql:~# /etc/init.d/mysql stop

4.  Remove /var/lib/mysql/ib*

mysql:~# rm -rf /var/lib/mysql/ib*

5. Comment out innodb_force_recovery in /etc/mysql/my.cnf

6. Restart mysqld and look at the mysql error log.
If everything starts fine, the previously dropped databases will of course be missing, so the next thing is to:

7. Restore databases from the dump

mysql:~# mysql < dump.sql

 

 

 

Fixing disappeared Intel Net Link 5100 WiFi on Toshiba Satellite L300 1YA

Friday, October 29th, 2010

Reading Time: 2 minutes

I've recently had to fix a Toshiba Satellite L300 1YA notebook, running the shitty Windows Vista operating system.
For some reason the Net Link 5100 Wi-Fi network adapter of the laptop suddenly, mysteriously disappeared.
The LED indicating the device is enabled was blinking, the driver showed as properly installed in Windows -> System and everything.
Despite all that, the wireless network scanning on the notebook was not working.

In Networking in System there were two lines showing up with an exclamation mark "!".

To fix the mess it took me 3.5 hours. I tried many things; the first logical one was reinstalling the driver with the latest available from Intel's website. Anyway, this didn't help, so I had to think about other solutions.
What made things even worse was that the installed Vista was in Chinese!, yes, you read this correctly, Chinese. You cannot imagine what a hell it is to deal with a Windows whose language pack is Chinese …

The Vista also had the Chinese antivirus program Rising installed, which provided the system with some weird firewall, and this made dealing with the problem even more complicated.

After many tries I finally completely removed the wireless driver from the system and reinstalled it with the latest version from Intel's website.
Thankfully, after a couple of reboots and going into safe mode, the Intel Net Link 5100 started working again by itself?!

Well, you know how things go with Windows – you never know what will happen next.

Maybe a lot of notebooks suffer the same weird issue with the wireless Wi-Fi adapter suddenly stopping working.

So now you know the solution: remove the driver, install it again, restart, and it should be working again.
Hope this quick and dirty article will save somebody hours of figuring it out …

Must-have software on freshly installed Windows – Essential software after a fresh Windows install

Friday, March 18th, 2016

Reading Time: 4 minutes

Install-update-multiple-programs-applications-at-once-using-ninite

If you're in the IT industry, even if you don't like installing Windows frequently or you're completely a Linux / BSD user, you will certainly have a lot of friends who will want help from you to re-install or fix their Windows 7 / 8 / 10 OS. At least this is the case with me every year; I'm kind of obliged to install fresh Windowses on newly bought friends' or relatives' notebooks / desktop PCs.

Of course, depending on whom the new Windows OS is installed for, the preferences for necessary software vary; however, more or less there is a sort of standard list of Windows software used daily by the average computer user, which I tend to install on new Windows installs, and thus I have more or less systematized the process.

As a Free Software enthusiast I usually try to stick to free software where possible for each of the usual categories, and luckily nowadays there is a lot of non-proprietary, or at least free as in beer, software available out there.

For Windows sysadmins, or college and other public institution networks with multiple Windows computers that are not inside a domain, and also for people in computer repair shops where dozens of Windows pre-installs or sets of software automatic updates are necessary daily, make sure to take a look at Ninite.

ninite-automate-windows-program-deploy-and-update-on-new-windows-os-openoffice-screenshot

As official website introduces Ninite:

Ninite – Install and Update All Your Programs at Once

Of course, as Ninite is used by organizations such as NASA, Harvard Medical School etc., it is likely the tool might report your installed list of Windows software and various other Win PC statistical data to the Ninite developers and most likely the NSA, but this probably doesn't matter much anymore by the moment you choose to have a Windows OS installed on your PC.

ninite-choises-to-build-an-install-package-with-useful-essential-windows-software-screenshot
 

For Windows system administrators managing small and middle-sized network PCs that are not inside a Domain Controller, Ninite could definitely save hours and in some cases even days of boring install and maintenance work. HP Enterprise or HP Inc. employees or ex-employees would definitely love Ninite, because what Ninite does is pretty much like the well known HP internal tool PC COE.

Ninite can also prepare an installer containing multiple applications based on the choices made on Ninite's website, so that's also a great thing, especially if you need to deploy different types of user PCs (Scientific / Gamers / Working etc.).

Perhaps there are other useful things to install on a fresh Windows installation; if you're using something I'm missing, let me know in the comments.

Fix MySQL ibdata file size – ibdata1 file growing too large, preventing ibdata1 from eating all your server disk space

Thursday, April 2nd, 2015

Reading Time: 4 minutes

fix-solve-mysql-ibdata-file-size-ibdata1-file-growing-too-large-and-preventing-ibdata1-from-eating-all-your-disk-space-innodb-vs-myisam

If you're a webhosting company hosting dozens of various websites that use MySQL with the InnoDB engine as a backend, you've probably already experienced the annoying problem of MySQL's ibdata1 growing too large / eating all the server's disk space and triggering disk space low alerts. The ibdata1 file, taking up hundreds of gigabytes, is likely to be encountered on virtually all Linux distributions which run a default MySQL server <= MySQL 5.6 (with the default distro-shipped my.cnf). The incremental ibdata1 growth usually appears due to an application software bug in how it queries the database. In theory there are no limitations for ibdata1 except the maximum file size limitation set for the filesystem (and there is no limitation option set in my.cnf), meaning it is quite possible that under certain conditions ibdata1 growth over time can happily fill up your server LVM (Storage) drive partitions.

Unfortunately there is no way to shrink the ibdata1 file, and the only known workaround (I found) is to set the innodb_file_per_table option in my.cnf to force the MySQL server to create separate *.ibd files under datadir (a my.cnf variable) for each freshly created InnoDB table.
 

1. Checking size of ibdata1 file

On Debian / Ubuntu and other deb-based Linux servers datadir is /var/lib/mysql, so the file to check is /var/lib/mysql/ibdata1:

server:~# du -hsc /var/lib/mysql/ibdata1
45G     /var/lib/mysql/ibdata1
45G     total


2. Checking info about databases and the InnoDB storage engine

server:~# mysql -u root -p
password:

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| bible              |
| blog               |
| blog-sezoni        |
| blogmonastery      |
| daniel             |
| ezmlm              |
| flash-games        |


The next step is to get some understanding of how many existing InnoDB tables are present within the database server:

 

mysql> SELECT COUNT(1) EngineCount,engine FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','performance_schema','mysql') GROUP BY engine;
+-------------+--------+
| EngineCount | engine |
+-------------+--------+
|         131 | InnoDB |
|           5 | MEMORY |
|         584 | MyISAM |
+-------------+--------+
3 rows in set (0.02 sec)

To get some more statistics related to the InnoDB variables set on the SQL server:
 

mysqladmin -u root -p'Your-Server-Password' var | grep innodb


Here is also how to find which tables use the InnoDB engine:

mysql> SELECT table_schema, table_name
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE engine = 'innodb';

+--------------+--------------------------+
| table_schema | table_name               |
+--------------+--------------------------+
| blog         | wp_blc_filters           |
| blog         | wp_blc_instances         |
| blog         | wp_blc_links             |
| blog         | wp_blc_synch             |
| blog         | wp_likes                 |
| blog         | wp_wpx_logs              |
| blog-sezoni  | wp_likes                 |
| icanga_web   | cronk                    |
| icanga_web   | cronk_category           |
| icanga_web   | cronk_category_cronk     |
| icanga_web   | cronk_principal_category |
| icanga_web   | cronk_principal_cronk    |


3. Check and Stop any Web / Mail / DNS service using MySQL

server:~# ps -efl |grep -E 'apache|nginx|dovecot|bind|radius|postfix'

The above cmd should return empty output (e.g. the Apache / Nginx / Postfix / Radius / Dovecot / DNS etc. services should be properly stopped on the server).

4. Create a backup dump of all MySQL tables with mysqldump

The next step is to create a full backup dump of all current MySQL databases (with mysqldump):

server:~# mysqldump --opt --allow-keywords --add-drop-table --all-databases --events -u root -p > dump.sql
server:~# du -hsc /root/dump.sql
940M    dump.sql
940M    total

 

If you have free space on an external backup server or remotely mounted attached (NFS or SAN) storage, it is a good idea to make a full binary copy of the MySQL data as well (just in case something goes wrong with the above SQL dump); copy the respective directory depending on the Linux distro and the install location of the SQL binary files set (in my.cnf).
To check where the MySQL binary database data is stored (check in my.cnf):

server:~# grep -i datadir /etc/mysql/my.cnf
datadir         = /var/lib/mysql

If the server is CentOS / RHEL / Fedora RPM based, substitute /etc/mysql/my.cnf with /etc/my.cnf in the above grep cmd line.

if you're on Debian / Ubuntu:

server:~# /etc/init.d/mysql stop
server:~# cp -rpfv /var/lib/mysql /root/mysql-data-backup

Once the above copy completes, DROP all databases except mysql and information_schema (which store the MySQL existing users / passwords and Access Grants and Host Permissions).

5. Drop All databases except mysql and information_schema

server:~# mysql -u root -p
password:

 

mysql> SHOW DATABASES;

DROP DATABASE blog;
DROP DATABASE sessions;
DROP DATABASE wordpress;
DROP DATABASE micropcfreak;
DROP DATABASE statusnet;

          etc. etc.

ACHTUNG !!! DON'T execute DROP DATABASE mysql; or DROP DATABASE information_schema; !!! – as this might damage your user permissions to databases.
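
If there are many databases, dropping them one by one is tedious; here is a hedged one-liner sketch that drops everything except the system schemas – double-check the exclusion list before running anything like this:

server:~# for db in $(mysql -u root -p'Your-Server-Password' -N -e 'SHOW DATABASES' | egrep -v '^(mysql|information_schema|performance_schema)$'); do mysql -u root -p'Your-Server-Password' -e "DROP DATABASE \`$db\`"; done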

6. Stop the MySQL server and add innodb_file_per_table and a few more settings to prevent ibdata1 from growing infinitely in the future

server:~# /etc/init.d/mysql stop

server:~# vim /etc/mysql/my.cnf
[mysqld]
innodb_file_per_table
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G
innodb_buffer_pool_size=4G

Delete the files taking up too much space – ibdata1, ib_logfile0 and ib_logfile1:

server:~# cd /var/lib/mysql/
server:~#  rm -f ibdata1 ib_logfile0 ib_logfile1
server:~# /etc/init.d/mysql start
server:~# ps ax |grep -i mysql

 

You should now see a running MySQL instance (processes) in the above ps output; on startup the server recreates a fresh, small ibdata1 together with new ib_logfile0 / ib_logfile1 files.
 

7. Re-Import previously dumped SQL databases with mysql cli client

server:~# cd /root/
server:~# mysql -u root -p < dump.sql

Hopefully the import will go fine, and if no errors are experienced the new data should be in.

Alternatively, if your database is too big and you want to import it in less time to mitigate the SQL downtime, import the database instead with:

server:~# mysql -u root -p
password:
mysql>  SET FOREIGN_KEY_CHECKS=0;
mysql> SOURCE /root/dump.sql;
mysql> SET FOREIGN_KEY_CHECKS=1;

 

If something goes wrong with the import for some reason, you can always copy over sql binary files from /root/mysql-data-backup/ to /var/lib/mysql/
 

8. Connect to mysql and check whether databases are listable and re-check ibdata file size

Once imported, log in with the mysql cli and check whether the databases are there with:

server:~# mysql -u root -p
SHOW DATABASES;

Next, let's see what the current sizes of ibdata1, ib_logfile0 and ib_logfile1 are:
 

server:~# du -hsc /var/lib/mysql/{ibdata1,ib_logfile0,ib_logfile1}
19M     /var/lib/mysql/ibdata1
1,1G    /var/lib/mysql/ib_logfile0
1,1G    /var/lib/mysql/ib_logfile1
2,1G    total

Now ibdata1 will grow, but only contain table metadata. Each InnoDB table will exist outside of ibdata1.
To better understand what I mean, lets say you have InnoDB table named blogdb.mytable.
If you go into /var/lib/mysql/blogdb, you will see two files
representing the table:

  •     mytable.frm (Storage Engine Header)
  •     mytable.ibd (Home of Table Data and Table Indexes for blogdb.mytable)

Now the structure will be like that for each of the MySQL stored databases, instead of everything going into ibdata1.
MySQL 5.6+ admins can relax, as innodb_file_per_table is enabled by default in the newer SQL releases.


Now, to make sure your websites are working, take a few of the hosted website URLs that use any of the imported databases and just browse them.
In my case ibdata1 was 45GB; after clearing it up I managed to save 43 GB of disk space!!!

Enjoy the disk saving! 🙂

‘host-name’ is blocked because of many connection errors; unblock with ‘mysqladmin flush-hosts’

Sunday, May 20th, 2012

Reading Time: 3 minutes

mysql-logo-host-name-blocked-because-of-many-connection-errors
My home-run machine's MySQL server was suddenly down as I tried to check my blog and other sites today; the error I saw while trying to open this blog, as well as other hosted sites using MySQL, was:

Error establishing a database connection

The topology where this error occurred is simple; I have two hosts:

1. Apache version 2.0.64, compiled with external PHP script interpretation support via libphp – the host runs FreeBSD

2. A Debian GNU / Linux squeeze running MySQL server version 5.1.61

The Apache host is assigned a local IP address 192.168.0.1 and the SQL server is running on a host with IP 192.168.0.2

To diagnose the error I've logged in to 192.168.0.2, and weirdly the mysql-server appeared to run just fine:
 

debian:~# ps ax |grep -i mysql
31781 pts/0 S 0:00 /bin/sh /usr/bin/mysqld_safe
31940 pts/0 Sl 12:08 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
31941 pts/0 S 0:00 logger -t mysqld -p daemon.error
32292 pts/0 S+ 0:00 grep -i mysql

Moreover, I could connect to the localhost SQL server with mysql -u root -p and it seemed to run fine. The error Error establishing a database connection meant that either something was messed up with the database or MySQL port 3306 on 192.168.0.2 was not properly accessible.

My first guess was that something was wrong due to some firewall rules, so I tried to connect from 192.168.0.1 to 192.168.0.2 with telnet:
 

freebsd# telnet 192.168.0.2 3306
Trying 192.168.0.2…
Connected to jericho.
Escape character is '^]'.
Host 'webserver' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
Connection closed by foreign host.

Right after the telnet was initiated, as shown in the above output, the connection was immediately closed with the error:

Host 'webserver' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
Connection closed by foreign host.

In the error, 'webserver' is my Apache machine's set hostname. The error clearly states that the problems the 'webserver' apache host has connecting to the SQL database are due to 'many connection errors', and a fix is suggested with mysqladmin flush-hosts.

To temporarily solve the error and restore normal connectivity between the Apache and the SQL server, I had to issue on the SQL host:

mysqladmin -u root -p flush-hosts
Enter password:

Though this temporary fix restored accessibility to the databases, and hence the website errors were resolved, it doesn't guarantee that I won't end up in the same situation in the future; therefore I looked for a permanent fix to the issue once and for all.

The permanent fix consists in changing the default value set for max_connect_errors in /etc/mysql/my.cnf, which by default is not too high. Therefore, to raise up the variable value, I added in my.cnf, in the conf section [mysqld]:

debian:~# vim /etc/mysql/my.cnf
...
max_connect_errors=4294967295

and afterwards restarted MYSQL:

debian:~# /etc/init.d/mysql restart
Stopping MySQL database server: mysqld.
Starting MySQL database server: mysqld.
Checking for corrupt, not cleanly closed and upgrade needing tables..
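
To confirm the new value was actually picked up after the restart:

debian:~# mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connect_errors';"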

To make sure the assigned max_connect_errors=4294967295 is never reached due to Apache-to-SQL connection errors, I've also added a cronjob:

debian:~# crontab -u root -e
00 03 * * * mysqladmin flush-hosts

In the cron I have omitted the mysqladmin -u root -p (user/pass) input options because for convenience I have already stored the mysql root password in /root/.my.cnf

Here is how /root/.my.cnf looks like:

debian:~# cat /root/.my.cnf
[client]
user=root
password=a_secret_sql_password

Now hopefully this will permanently solve SQL's 'failure to accept connections' due to too many connection errors for the future.