Posts Tagged ‘log’

Resolving “nf_conntrack: table full, dropping packet.” flood message in dmesg Linux kernel log

Wednesday, March 28th, 2012

On many busy servers, you might encounter in /var/log/syslog or dmesg kernel log messages like

nf_conntrack: table full, dropping packet

to appear repeatingly:

[1737157.057528] nf_conntrack: table full, dropping packet.
[1737157.160357] nf_conntrack: table full, dropping packet.
[1737157.260534] nf_conntrack: table full, dropping packet.
[1737157.361837] nf_conntrack: table full, dropping packet.
[1737157.462305] nf_conntrack: table full, dropping packet.
[1737157.564270] nf_conntrack: table full, dropping packet.
[1737157.666836] nf_conntrack: table full, dropping packet.
[1737157.767348] nf_conntrack: table full, dropping packet.
[1737157.868338] nf_conntrack: table full, dropping packet.
[1737157.969828] nf_conntrack: table full, dropping packet.
[1737157.969928] nf_conntrack: table full, dropping packet
[1737157.989828] nf_conntrack: table full, dropping packet
[1737162.214084] __ratelimit: 83 callbacks suppressed

There are two type of servers, I've encountered this message on:

1. Xen OpenVZ / VPS (Virtual Private Servers)
2. ISPs – Internet Providers with heavy traffic NAT network routers
 

I. What is the meaning of nf_conntrack: table full dropping packet error message

In short, this message is received because the nf_conntrack kernel maximum number assigned value gets reached.
The common reason for that is a heavy traffic passing by the server or very often a DoS or DDoS (Distributed Denial of Service) attack. Sometimes encountering the err is a result of a bad server planning (incorrect data about expected traffic load by a company/companeis) or simply a sys admin error…

- Checking the current maximum nf_conntrack value assigned on host:

linux:~# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

- Alternative way to check the current kernel values for nf_conntrack is through:

linux:~# /sbin/sysctl -a|grep -i nf_conntrack_max
error: permission denied on key 'net.ipv4.route.flush'
net.netfilter.nf_conntrack_max = 65536
error: permission denied on key 'net.ipv6.route.flush'
net.nf_conntrack_max = 65536

- Check the current sysctl nf_conntrack active connections

To check present connection tracking opened on a system:

:

linux:~# /sbin/sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 12742

The shown connections are assigned dynamicly on each new succesful TCP / IP NAT-ted connection. Btw, on a systems that work normally without the dmesg log being flooded with the message, the output of lsmod is:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
ip_tables 9899 1 iptable_filter
x_tables 14175 1 ip_tables

On servers which are encountering nf_conntrack: table full, dropping packet error, you can see, when issuing lsmod, extra modules related to nf_conntrack are shown as loaded:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4 10346 3 iptable_nat,nf_nat
nf_conntrack 60975 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1073 1 nf_conntrack_ipv4
ip_tables 9899 2 iptable_nat,iptable_filter
x_tables 14175 3 ipt_MASQUERADE,iptable_nat,ip_tables

 

II. Remove completely nf_conntrack support if it is not really necessery

It is a good practice to limit or try to omit completely use of any iptables NAT rules to prevent yourself from ending with flooding your kernel log with the messages and respectively stop your system from dropping connections.

Another option is to completely remove any modules related to nf_conntrack, iptables_nat and nf_nat.
To remove nf_conntrack support from the Linux kernel, if for instance the system is not used for Network Address Translation use:

/sbin/rmmod iptable_nat
/sbin/rmmod ipt_MASQUERADE
/sbin/rmmod rmmod nf_nat
/sbin/rmmod rmmod nf_conntrack_ipv4
/sbin/rmmod nf_conntrack
/sbin/rmmod nf_defrag_ipv4

Once the modules are removed, be sure to not use iptables -t nat .. rules. Even attempt to list, if there are any NAT related rules with iptables -t nat -L -n will force the kernel to load the nf_conntrack modules again.

Btw nf_conntrack: table full, dropping packet. message is observable across all GNU / Linux distributions, so this is not some kind of local distribution bug or Linux kernel (distro) customization.
 

III. Fixing the nf_conntrack … dropping packets error

- One temporary, fix if you need to keep your iptables NAT rules is:

linux:~# sysctl -w net.netfilter.nf_conntrack_max=131072

I say temporary, because raising the nf_conntrack_max doesn't guarantee, things will get smoothly from now on.
However on many not so heavily traffic loaded servers just raising the net.netfilter.nf_conntrack_max=131072 to a high enough value will be enough to resolve the hassle.

- Increasing the size of nf_conntrack hash-table

The Hash table hashsize value, which stores lists of conntrack-entries should be increased propertionally, whenever net.netfilter.nf_conntrack_max is raised.

linux:~# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
The rule to calculate the right value to set is:
hashsize = nf_conntrack_max / 4

- To permanently store the made changes ;a) put into /etc/sysctl.conf:

linux:~# echo 'net.netfilter.nf_conntrack_count = 131072' >> /etc/sysctl.conf
linux:~# /sbin/sysct -p

b) put in /etc/rc.local (before the exit 0 line):

echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

Note: Be careful with this variable, according to my experience raising it to too high value (especially on XEN patched kernels) could freeze the system.
Also raising the value to a too high number can freeze a regular Linux server running on old hardware.

- For the diagnosis of nf_conntrack stuff there is ;

/proc/sys/net/netfilter kernel memory stored directory. There you can find some values dynamically stored which gives info concerning nf_conntrack operations in "real time":

linux:~# cd /proc/sys/net/netfilter linux:/proc/sys/net/netfilter# ls -al nf_log/
total 0
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ./
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ../
-rw-r--r-- 1 root root 0 Mar 23 23:02 0
-rw-r--r-- 1 root root 0 Mar 23 23:02 1
-rw-r--r-- 1 root root 0 Mar 23 23:02 10
-rw-r--r-- 1 root root 0 Mar 23 23:02 11
-rw-r--r-- 1 root root 0 Mar 23 23:02 12
-rw-r--r-- 1 root root 0 Mar 23 23:02 2
-rw-r--r-- 1 root root 0 Mar 23 23:02 3
-rw-r--r-- 1 root root 0 Mar 23 23:02 4
-rw-r--r-- 1 root root 0 Mar 23 23:02 5
-rw-r--r-- 1 root root 0 Mar 23 23:02 6
-rw-r--r-- 1 root root 0 Mar 23 23:02 7
-rw-r--r-- 1 root root 0 Mar 23 23:02 8
-rw-r--r-- 1 root root 0 Mar 23 23:02 9

 

IV. Decreasing other nf_conntrack NAT time-out values to prevent server against DoS attacks

Generally, the default value for nf_conntrack_* time-outs are (unnecessery) large.
Therefore, for large flows of traffic even if you increase nf_conntrack_max, still shorty you can get a nf_conntrack overflow table resulting in dropping server connections. To make this not happen, check and decrease the other nf_conntrack timeout connection tracking values:

linux:~# sysctl -a | grep conntrack | grep timeout
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.ipv4.netfilter.ip_conntrack_generic_timeout = 600
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent2 = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_max_retrans = 300
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream = 180
net.ipv4.netfilter.ip_conntrack_icmp_timeout = 30

All the timeouts are in seconds. net.netfilter.nf_conntrack_generic_timeout as you see is quite high – 600 secs = (10 minutes).
This kind of value means any NAT-ted connection not responding can stay hanging for 10 minutes!

The value net.netfilter.nf_conntrack_tcp_timeout_established = 432000 is quite high too (5 days!)
If this values, are not lowered the server will be an easy target for anyone who would like to flood it with excessive connections, once this happens the server will quick reach even the raised up value for net.nf_conntrack_max and the initial connection dropping will re-occur again …

With all said, to prevent the server from malicious users, situated behind the NAT plaguing you with Denial of Service attacks:

Lower net.ipv4.netfilter.ip_conntrack_generic_timeout to 60 – 120 seconds and net.ipv4.netfilter.ip_conntrack_tcp_timeout_established to stmh. like 54000

linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_generic_timeout = 120
linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000

This timeout should work fine on the router without creating interruptions for regular NAT users. After changing the values and monitoring for at least few days make the changes permanent by adding them to /etc/sysctl.conf

linux:~# echo 'net.ipv4.netfilter.ip_conntrack_generic_timeout = 120' >> /etc/sysctl.conf
linux:~# echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000' >> /etc/sysctl.conf

 

Share this on

How to disable tidy HTML corrector and validator to output error and warning messages

Sunday, March 18th, 2012

I've noticed in /var/log/apache2/error.log on one of the Debian servers I manage a lot of warnings and errors produced by tidy HTML syntax checker and reformatter program.

There were actually quite plenty frequently appearing messages in the the log like:

...
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!
...

I did a quick investigation on where from this messages are logged in error.log, and discovered few .php scripts in one of the websites containing the tidy string.
I used Linux find + grep cmds find in all php files the "tidy "string, like so:

server:~# find . -iname '*.php'-exec grep -rli 'tidy' '{}' \;
find . -iname '*.php' -exec grep -rli 'tidy' '{}' \; ./new_design/modules/index.mod.php
./modules/index.mod.php
./modules/index_1.mod.php
./modules/index1.mod.php

Opening the files, with vim to check about how tidy is invoked, revealed tidy calls like:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

As you see the PHP programmers who wrote this website, made a bigtidy mess. Instead of using php5's tidy module, they hard coded tidy external command to be invoked via php's exec(); external tidy command invocation.
This is extremely bad practice, since it spawns the command via a pseudo limited apache shell.
I've notified about the issue, but I don't know when, the external tidy calls will be rewritten.

Until the external tidy invocations are rewritten to use the php tidy module, I decided to at least remove the tidy warnings and errors output.

To remove the warning and error messages I've changed:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

exec('/usr/bin/tidy --show-warnings no --show-errors no -q -e -ashtml -utf8 '.$tmp_name,$rett);

The extra switches meaning is like so:

q – instructs tidy to produce quiet output
-e – show only errors and warnings
–show warnings no && –show errors no, completely disable warnings and error output

Onwards tidy no longer logs junk messages in error.log Not logging all this useless warnings and errors has positive effect on overall server performance especially, when the scripts, running /usr/bin/tidy are called as frequently as 1000 times per sec. or more

Share this on

How to solve “[crit] [client xxx.xxx.xxx.xxx] (13)Permission denied: /var/lib/ejabberd/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable, referer: http://jabber.mydomain.com/” – Error Cause and Solution

Thursday, January 5th, 2012

While configuring JWchat domain, I've come across around an error:

pcfg_openfile: unable to check htaccess file, ensure it is readable

The exact error I got in /var/log/apache2/error.log looked like so:

[crit] [client xxx.xxx.xxx.xxx] (13)Permission denied: /var/lib/ejabberd/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable, referer: http://jabber.mydomain.com/

The error message suggested /var/lib/ejabberd/.htaccess – is missing or not readable, however after checking i've seen .htaccess existed as well as was readable:

debian:~# ls -al /var/lib/ejabberd/.htaccess
-rw-r--r-- 1 www-data www-data 114 2012-01-05 07:44 /var/lib/ejabberd/.htaccess

At first glimpse it seems like the message is misleading and not true, however when I switched to www-data user (the user with which Apache runs on Debian), I've figured out the error meaning of unreadability is exactly correct:

www-data@debian:$ ls -al /var/lib/ejabberd/.htaccess
ls: cannot access /var/lib/ejabberd/.htaccess: Permission denied

This permission denied was quite strange, especially when considering the .htaccess is readable for all users:

debian:~# ls -al /var/lib/ejabberd/.htaccess
-rw-r--r-- 1 www-data www-data 114 2012-01-05 07:44 /var/lib/ejabberd/.htaccess

After a thorough look on what might go wrong, thanksfully I've figured it out. The all issues were caused by wrong permissions of /var/lib/ejabberd/.htaccess .You can see below the executable flag for all users (including apache's www-data) is missing :

debian:/var/lib# ls -ld /var/lib/ejabberd/drw-r--r-- 3 ejabberd ejabberd 4096 2012-01-05 07:45 /var/lib/ejabberd/

Solving the error, hence is as easy as adding +x flag to /var/lib/ejabberd :

debian:/var/lib# chmod +x /var/lib/ejabberd

Another way to fix the error is to chmod to 755 to the directory which holds .htaccess:

From now onwards pcfg_openfile: unable to check htaccess file, ensure it is readable err is no more ;)

Share this on

How to test if USB Camera is working with Cheese on GNU / Linux

Friday, December 23rd, 2011

I just bought an USB Camera (my notebook doesn't include an embedded camera). The camera is some infamous brand chineese name Eilondo
and on the camera all that is written is SUPER USB2.0 1.3 mega pixel

I bought exactly this camera because I was said by the shop reseller that the camera works without any driver installations on Windows XP and Windows Vista

On my Debian Squeeze GNU / Linux it was detected in dmesg without any troubles, here is how the camera got detected in my kernel log :

debian:~# dmesg |tail -n 10
[25385.734932] usb 2-1: USB disconnect, address 4
[25388.905049] usb 2-1: new high speed USB device using ehci_hcd and address 5
[25389.050753] usb 2-1: New USB device found, idVendor=1e4e, idProduct=0102
[25389.050757] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[25389.050760] usb 2-1: Product: USB2.0 Camera
[25389.050762] usb 2-1: Manufacturer: Etron Technology, Inc.
[25389.050936] usb 2-1: configuration #1 chosen from 1 choice
[25389.056056] uvcvideo: Found UVC 1.00 device USB2.0 Camera (1e4e:0102)
[25389.058242] uvcvideo: UVC non compliance - GET_DEF(PROBE) not supported. Enabling workaround.
[25389.059113] input: USB2.0 Camera as /devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:1.0/input/input26

I was troubled by the message uvcvideo: UVC non compliance – GET_DEF(PROBE) not supported. Enabling workaround. , and hence looked for an application to test if the camera can recored properly.

While checking in packages available in Software Center , I found a plenty of programs under the search keyword Camera
I however decided to test it using just one application Cheese A tool to take pictures and videos from your webcam which I've seen to be quite popular among Liunx users.
Cheese is part of GNOME Desktop, so that was another reason I decided to give it a try. I was pleasently surprised about how good these tiny but functional proggie is.

To run Cheese in GNOME I nagivated to the menus:

Applications -> Sound & Video -> Cheese Webcam Booth

Just in case if Cheese is not installed, installing it with apt:

debian:~# apt-get install cheese

Cheese has capabilities to take pictures, a consequential photos take up, as well as create Video movies.

Cheese take camera testing tool Debian GNU Linux

The program has support to apply 12 Effects / (Masks) to add some fun to the pictures or video snapshots.

Test Video Camera on Debian Linux Cheese Effects

Probably the best thing about Cheese is its simplistic interface, which for me personally is a main criterion to evaluate a program quality ;) .

Share this on

How to solve “IPv6 addrconf: prefix with wrong length 48″

Friday, December 9th, 2011

While reading some log files on one of the co-located servers at UK2.net , I've noticed dmesg log was filling in with tons of junk messages like:

[4288245.609762] IPv6 addrconf: prefix with wrong length 48
[4288445.984153] IPv6 addrconf: prefix with wrong length 48
[4288646.296110] IPv6 addrconf: prefix with wrong length 48
[4288846.609119] IPv6 addrconf: prefix with wrong length 48
[4289046.922604] IPv6 addrconf: prefix with wrong length 48
[4289247.267273] IPv6 addrconf: prefix with wrong length 48
[4289447.545800] IPv6 addrconf: prefix with wrong length 48
[4289647.857789] IPv6 addrconf: prefix with wrong length 48
[4289848.169308] IPv6 addrconf: prefix with wrong length 48
[4290048.595104] IPv6 addrconf: prefix with wrong length 48
[4290248.808497] IPv6 addrconf: prefix with wrong length 48
[4290449.103503] IPv6 addrconf: prefix with wrong length 48
[4290649.418747] IPv6 addrconf: prefix with wrong length 48
[4290849.742731] IPv6 addrconf: prefix with wrong length 48

After checking the message to make sure it would not suddeny lead to server hang ups I figured out the message is not dangerous but just an annoying warning that some other (routing) host on the same network as mine is advertising something using IPv6, that doesn't fit with my IPv6 server config.
Actually the server doesn't use the IPv6 configuration at all, and the assigned configuration is just some kind of auto set IPv6 IP address.
The server, where this message appeared is powered by 64 bit Debian GNU / Linux Squeeze

To resolve the annoying message, 5 of the kernel sysctl settings needs to be modified with cmds:

debian:~# sysctl net.ipv6.conf.all.accept_ra=0
debian:~# sysctl net.ipv6.conf.all.autoconf=0
debian:~# sysctl net.ipv6.conf.lo.autoconf=0
debian:~# sysctl net.ipv6.conf.eth0.autoconf=0
debian:~# sysctl net.ipv6.conf.eth1.autoconf=0

Furthermore to prevent the IPv6 addrconf: prefix with wrong length 48 to re-appear after future server reboots / boots the two sysctl values of course needs to be included in /etc/sysctl.conf e.g.:

debian:~# echo 'net.ipv6.conf.all.accept_ra = 0' >> /etc/sysctl.conf
debian:~# echo 'net.ipv6.conf.all.autoconf = 0' >> /etc/sysctl.conf
echo 'net.ipv6.conf.lo.autoconf = 0' >> /etc/sysctl.conf
echo 'net.ipv6.conf.eth0.autoconf = 0' >> /etc/sysctl.conf
echo 'net.ipv6.conf.eth1.autoconf = 0' >> /etc/sysctl.conf

My server has 2 etherhet interfaces, eth0 and eth1 that's the reason I had to set up autoconf kernel the two vars net.ipv6.conf.eth0.autoconf and net.ipv6.conf.eth1.autoconf , for more interfaces more kernel vars (eth2, eth3) etc. needs to be set to "0"

I've seen posts online of people complaining about a similar errors to IPv6 addrconf: prefix with wrong length 48, like:

IPv6 addrconf: prefix with wrong length 96
IPv6 addrconf: prefix with wrong length 128

The solution to this messages is also done by setting the above described sysctl kernel vars. Setting the vars will suppress the messages which by the way with time could take up A LOT of disk space and fills /var/log/dmesg with this useless message, hence applying the "fix" is a must ;)

Another thing, I've noticed while I was researching about the error and the respective fix is that people on other deb based distributions like Ubuntu as well as on Fedora GNU / Linux had also experienced the issue.

Share this on

How to fix “sslserver: fatal: unable to load certificate” Qmail error on GNU / Linux

Friday, October 14th, 2011

After setupping a brand new Qmail installation following the QmailRocks Thibs Qmail Debian install guide , I've come across unexpected re-occuring error message in /var/log/qmail/qmail-smtpdssl/ , here is the message:

@400000004e9807b10d8bdb7c command-line: exec sslserver -e -vR -l my-mailserver-domain.com -c 30 -u 89 -g 89 \
-x /etc/tcp.smtp.cdb 0 465 rblsmtpd -r zen.spamhaus.org -r dnsbl.njabl.org -r dnsbl.sorbs.net -r bl.spamcop.net qmail-smtpd \
my-mailserver-domain.com /home/vpopmail/bin/vchkpw /bin/true 2>&1
@400000004e9807b10dae2ca4 sslserver: fatal: unable to load certificate

I was completely puzzled initially by the error as the sertificate file /var/qmail/control/servercert.pem was an existing and properly self generated one. Besides that qmail daemontools init script /service/qmail-smtpd/run was loading the file just fine, where the same file failed to get loaded when sslserver command with the cert argument was invoked via /service/qmail-smtpdssl/run

It took me quite a while to thoroughfully investigate on what's wrong with the new qmail install. Thanksfully after almost an hour of puzzling I found it out and I was feeling as a complete moron to find that the all issues was caused by incorrect permissions of the /var/qmail/control/servercert.pem file.
Here are the incorrect permissions the file possessed:

linux:~# ls -al /var/qmail/control/servercert.pem
-rw------- 1 qmaild qmail 2311 2011-10-12 13:21 /var/qmail/control/servercert.pem

To fix up the error I had to allow all users to have reading permissions over servercert.pem , e.g.:

linux:~# chmod a+r /var/qmail/control/servercert.pem

After adding all users readable bit on servercert.pem the file permissions are like so:

linux:~# ls -al /var/qmail/control/servercert.pem
-rw-r--r-- 1 qmaild qmail 2311 2011-10-12 13:21 /var/qmail/control/servercert.pem

Consequently I did a qmail restart to make sure the new readable servercert.pem will get loaded from the respective init script:

linux:~# qmailctl restart
* Stopping qmail-smtpdssl.
* Stopping qmail-smtpd.
* Sending qmail-send SIGTERM and restarting.
* Restarting qmail-smtpd.
* Restarting qmail-smtpdssl.

Now the annoying sslserver: fatal: unable to load certificate message is no more and all works fine, Hooray! ;)

Share this on

How to set up Path to .exe GNUWin32 binary files in Windows XP / Vista / 2003 / 2008 (Setting PATH to executables on Windows)

Tuesday, August 23rd, 2011

I’ve been working on a servers running Windows 2003 and Windows 2008 these days.
As I wanted to be more flexible on what I can do from the command line I decided to install GNUwin (provides port of GNU tools), most of which are common part of any Linux distribution).
Having most of the command line flexibility on a Windows server is a great thing, so I would strongly recommend GNUWin to any Windows server adminsitrator out there.

Actually it’s a wonderful thing that most of the popular Linux tools can easily be installed and used on Windows for more check GnuWin32 on sourceforge

One of the reasons I installed Gnuwin was my intention to use the good old Linux tail command to keep an eye interactive on the IIS server access log files, which by the way for IIS webserver are stored by default in C:\Windows\System32\LogFiles\W3SVC1\*.log

I’ve managed to install the GNUWin following the install instructions, not with too much difficulties. The install takes a bit of time, cause many packs containing different parts of the GNUWin has to be fetched.

To install I downloaded the GNUWin installer available from GNUWin32′s website and instructed to extracted the files into C:\Program Files\Gnuwin
Then I followed the install instructions suggestions, e.g.:

C:\> cd c:\Program Files\GnuWin
C:\Program Files\GnuWin> download.bat
...
C:\Program Files\GnuWin> install c:\gnuwin32
...

After the installation was succesfully completed on the two Windows machines, both of which by the way are running 64 bit Windows, it was necessery to add the newly installed GNU .exe files to my regular cmd.exe PATH variable in order to be able to access the sed, tail and the rest of the gnuwin32 command line tools.

In order to add C:\GnuWin32\bin directory to the windows defined Command line Path , I had to do the following:

a. Select (Properties) for My Computer

Start (button) -> My Computer (choose properties)

b. Select the My Computer Advanced (tab)

Then, from the My Computer pane press on Advanced tab

c. Next press on Environment Variables

Windows environment variables screenshot

You see in above’s screenshot the Environment Variables config dialog, to add the new path location in System Variables sectiom, between the list I had to add the c:\GNUwin32\bin path locatiion. To find I pressed on Edit button scrolled down to find the Variable and hence added at the end of the long list defined paths.
After adding in GNUwin, the Windows path looks like this:

C:\Program Files (x86)\EWANAPI;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files (x86)\Intel\NGSMS\MPFiles;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;C:\WINDOWS\system32\WindowsPowerShell\v1.0;C:\gnuwin32\bin

Further on, I launched the tail command to intercatively take an eye on who is accessing the IIS webserver.
Sadly this worked not, trying to use tail with the IIS ex10116.log log;

C:\Windows\System32\Logfiles\W3SVC1> tail -f ex10116.log

Spit an error tail: ex10116.log: Bad file descriptor

Since I couldn’t use tail -f I looked for alternative and a quick search led me to Tail 4 Win32 . Lest the name suggests it is supposed to work on 32 bit arch Windows the version on tailforwin32′s website is working perfectly fine on 64 bit Windows as well.
What it does is to simulate a normal tail -f command inside a very simplistic window interface. You see it in action with opened IIS log on below’s screenshot:

GUI Tail for Windows screenshot

Finally my goal is achieved and I can take an eye interactively on IIS logs. End of the article, hope it wasn’t too boring ;)

Share this on

How to auto restart CentOS Linux server with software watchdog (softdog) to reduce server downtime

Wednesday, August 10th, 2011

How to auto restart centos with software watchdog daemon to mitigate server downtimes, watchdog linux artistic logo

I’m in charge of dozen of Linux servers these days and therefore am required to restart many of the servers with a support ticket (because many of the Data Centers where the servers are co-located does not have a web interface or IPKVM connected to the server for that purpose). Therefore the server restart requests in case of crash sometimes gets processed in few hours or in best case in at least half an hour.

I’m aware of the existence of Hardware Watchdog devices, which are capable to detect if a server is hanged and auto-restart it, however the servers I administrate does not have Hardware support for Watchdog timer.

Thanksfully there is a free software project called Watchdog which is easily configured and mitigates the terrible downtimes caused every now and then by a server crash and respective delays by tech support in Data Centers.

I’ve recently blogged on the topic of Debian Linux auto-restart in case of kernel panic , however now i had to conifgure watchdog on some dozen of CentOS Linux servers.

It appeared installation & configuration of Watchdog on CentOS is a piece of cake and comes to simply following few easy steps, which I’ll explain quickly in this post:

1. Install with yum watchdog to CentOS

[root@centos:/etc/init.d ]# yum install watchdog
...

2. Add to configuration a log file to log watchdog activities and location of the watchdog device

The quickest way to add this two is to use echo to append it in /etc/watchdog.conf:

[root@centos:/etc/init.d ]# echo 'file = /var/log/messages' >> /etc/watchdog.conf
echo 'watchdog-device = /dev/watchdog' >> /etc/watchdog.conf

3. Load the softdog kernel module to initialize the software watchdog via /dev/watchdog

[root@centos:/etc/init.d ]# /sbin/modprobe softdog

Initialization of softdog should be indicated by a line in dmesg kernel log like the one above:

[root@centos:/etc/init.d ]# dmesg |grep -i watchdog
Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)

4. Include the softdog kernel module to load on CentOS boot up

This is necessery, because otherwise after reboot the softdog would not be auto initialized and without it being initialized, the watchdog daemon service could not function as it does automatically auto reboots the server if the /dev/watchdog disappears.

It’s better that the softdog module is not loaded via /etc/rc.local but the default CentOS methodology to load module from /etc/rc.module is used:

[root@centos:/etc/init.d ]# echo modprobe softdog >> /etc/rc.modules
[root@centos:/etc/init.d ]# chmod +x /etc/rc.modules

5. Start the watchdog daemon service

The succesful intialization of softdog in step 4, should have provided the system with /dev/watchdog, before proceeding with starting up the watchdog daemon it’s wise to first check if /dev/watchdog is existent on the system. Here is how:

[root@centos:/etc/init.d ]# ls -al /dev/watchdogcrw------- 1 root root 10, 130 Aug 10 14:03 /dev/watchdog

Being sure, that /dev/watchdog is there, I’ll start the watchdog service.

[root@centos:/etc/init.d ]# service watchdog restart
...

Very important note to make here is that you should never ever configure watchdog service to run on boot time with chkconfig. In other words the status from chkconfig for watchdog boot on all levels should be off like so:

[root@centos:/etc/init.d ]# chkconfig --list |grep -i watchdog
watchdog 0:off 1:off 2:off 3:off 4:off 5:off 6:off

Enabling the watchdog from the chkconfig will cause watchdog to automatically restart the system as it will probably start the watchdog daemon before the softdog module is initialized. As watchdog will be unable to read the /dev/watchdog it will though the system has hanged even though the system might be in a boot process. Therefore it will end up in an endless loops of reboots which can only be fixed in a linux single user mode!!! Once again BEWARE, never ever activate watchdog via chkconfig!

Next step to be absolutely sure that watchdog device is running it can be checked with normal ps command:

[root@centos:/etc/init.d ]# ps aux|grep -i watchdog
root@hosting1-fr [~]# ps axu|grep -i watch|grep -v greproot 18692 0.0 0.0 1816 1812 ? SNLs 14:03 0:00 /usr/sbin/watchdog
root 25225 0.0 0.0 0 0 ? ZN 17:25 0:00 [watchdog] <defunct>

You have probably noticed the defunct state of watchdog, consider that as absolutely normal, above output indicates that now watchdog is properly running on the host and waiting to auto reboot in case of sudden /dev/watchdog disappearance.

As a last step before, after being sure its initialized properly, it’s necessery to add watchdog to run on boot time via /etc/rc.local post init script, like so:

[root@centos:/etc/init.d ]# echo 'echo /sbin/service watchdog start' >> /etc/rc.local

Now enjoy, watchdog is up and running and will automatically restart the CentOS host ;)

Share this on

How to make pptp VPN connection to use IPMI port (IPKVM / Web KVM) on Debian Linux

Wednesday, July 27th, 2011

If you have used KVM, before you certainly have faced the requirement asked by many Dedicated Server Provider, for establishment of a PPTP (mppe / mppoe) or the so called Microsoft VPN tunnel to be able to later access via the tunnel through a Private IP address the web based Java Applet giving control to the Physical screen, monitor and mouse on the server.

This is pretty handy as sometimes the server is not booting and one needs a further direct access to the server physical Monitor.
Establishing the Microsoft VPN connection on Windows is a pretty trivial task and is easily achieved by navigating to:

Properties > Networking (tab) > Select IPv4 > Properties > Advanced > Uncheck "Use default gateway on remote network".

However achiving the same task on Linux seemed to be not such a trivial, task and it seems I cannot find anywhere information or precise procedure how to establish the necessery VPN (ptpt) ms encrypted tunnel.

Thanksfully I was able to find a way to do the same tunnel on my Debian Linux, after a bunch of experimentation with the ppp linux command.

To be able to establish the IPMI VPN tunnel, first I had to install a couple of software packages, e.g.:

root@linux:~# apt-get install ppp pppconfig pppoeconf pptp-linux

Further on it was necessery to load up two kernel modules to enable the pptp mppe support:

root@linux:~# modprobe ppp_mppe
root@linux:~# modprobe ppp-deflate

I've also enabled the modules to be loading up during my next Linux boot with /etc/modules to not be bother to load up the same modules after reboot manually:

root@linux:~# echo ppp_mppe >> /etc/modules
root@linux:~# echo ppp-deflate >> /etc/modules

Another thing I had to do is to enable the require-mppe-128 option in /etc/ppp/options.pptp.
Here is how:

root@linux:~# sed -e 's$#require-mppe-128$require-mppe-128$g' /etc/ppp/options.pptp >> /tmp/options.pptp
root@linux:~# mv /tmp/options.pptp /etc/ppp/options.pptp
root@linux:~# echo 'nodefaultroute' >> /etc/ppp/options.pptp

In order to enable debug log for the ppp tunnel I also edited /etc/syslog.conf and included the following configuration inside:

root@linux:~# vim /etc/syslog.conf
*.=debug;\
news.none;mail.none -/var/log/debug
*.=debug;*.=info;\
*.=debug;*.=info;\
root@linux:~# killall -HUP rsyslogd

The most important part of course is the command line with ppp command to connect to the remote IP via the VPN tunnel ;) , here is how I achieved that:

root@linux:~# pppd debug require-mppe pty "pptp ipmiuk2.net --nolaunchpppd" file /etc/ppp/options.pptp user My_Dedi_Isp_Given_Username password The_Isp_Given_Password


This command, brings up the ppp interface and makes the tunnel between my IP and the remote VPN target host.

Info about the tunnel could be observed with command:

ifconfig -a ppp
ppp0 Link encap:Point-to-Point Protocol
inet addr:10.20.254.32 P-t-P:10.20.0.1 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1496 Metric:1
RX packets:7 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3
RX bytes:70 (70.0 B) TX bytes:672 (672.0 B)

One more thing before I could finally access the IPMI's web interface via the private IP was to add routing to the private IP address via the tunnel other side IP address:

# 10.20.0.1 P-t-P IP address
ip route add 10.20.1.124/32 dev ppp0

Now logically one would thing the Web interface to login and use the Java Applet to connect to the server would be accessible but no IT wasn't !

It took me a while to figure out what is the problem and if not the guys in irc.freenode.net ##networking helped me I would never really find out why http://10.20.1.124/ and https://10.20.1.124/ were inaccessible.

Strangely enough both ports 80 and 443 were opened on 10.20.1.124 and it seems like working, however though I can ping both 10.20.1.124 and 10.20.0.1 there was no possible way to access 10.20.1.124 with TCP traffic.

Routing to the Microsoft Tunnel was fine as I've double checked all was fine except whether I tried accessing the IPMI web interface the browser was trying to open the URL and keeps opening like forever.

Thanksfully after a long time of futile try outs, a tip was suggested by a good guy in freenode nick named ne2k

To make the TCP connection in the Microsoft Tunnel work and consequently be able to access the webserver on the remote IPMI host, one needs to change the default MTU set for the ppp0 tunnel interface.
Here is how:


ip link set ppp0 mtu 1438

And tadam! It's done now IPKVM is accessible via http://10.20.1.124 or https://10.20.1.124 web interface. Horay ! :)

Share this on

How to reboot remotely Linux server if reboot, shutdown and init commands are not working (/sbin/reboot: Input/output error) – Reboot Linux in emergency using MagicSysRQ kernel sysctl variable

Saturday, July 23rd, 2011

SysRQ an alternative way to restart unrestartable Linux server

I’ve been in a situation today, where one Linux server’s hard drive SCSI driver or the physical drive is starting to break off where in dmesg kernel log, I can see a lot of errors like:

[178071.998440] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[178071.998440] end_request: I/O error, dev sda, sector 89615868

I tried a number of things to remount the hdd which was throwing out errors in read only mode, but almost all commands I typed on the server were either shown as missng or returning an error:
Input/output error

Just ot give you an idea what I mean, here is a paste from the shell:

linux-server:/# vim /etc/fstab
-bash: vim: command not found
linux-server:/# vi /etc/fstab
-bash: vi: command not found
linux-server:/# mcedit /etc/fstab
-bash: /usr/bin/mcedit: Input/output error
linux-server:/# fdisk -l
-bash: /sbin/fdisk: Input/output error

After I’ve tried all kind of things to try to diagnose the server and all seemed failing, I thought next a reboot might help as on server boot the filesystems will get checked with fsck and fsck might be able to fix (at least temporary) the mess.

I went on and tried to restart the system, and guess what? I got:

/sbin/reboot init Input/output error

I hoped that at least /sbin/shutdown or /sbin/init commands might work out and since I couldn’t use the reboot command I tried this two as well just to get once again:

linux-server:/# shutdown -r now
bash: /sbin/shutdown: Input/output error
linux-server:/# init 6
bash: /sbin/init: Input/output error

You see now the situation was not pinky, it seemed there was no way to reboot the system …
Moreover the server is located in remote Data Center and I the tech support there is conducting assigned task with the speed of a turtle.
The server had no remote reboot, web front end or anything and thefore I needed desperately a way to be able to restart the machine.

A bit of research on the issue has led me to other people who experienced the /sbin/reboot init Input/output error error mostly caused by servers with failing hard drives as well as due to HDD control driver bugs in the Linux kernel.

As I was looking for another alternative way to reboot my Linux machine in hope this would help. I came across a blog post Rebooting the Magic Way http://www.linuxjournal.com/content/rebooting-magic-way

As it was suggested in Cory’s blog a nice alternative way to restart a Linux machine without using reboot, shutdown or init cmds is through a reboot with the Magic SysRQ key combination

The only condition for the Magic SysRQ key to work is to have enabled the SysRQ – CONFIG_MAGIC_SYSRQ in Kernel compile time.
As of today luckily SysRQ Magic key is compiled and enabled by default in almost all modern day Linux distributions in this numbers Debian, Fedora and their derivative distributions.

To use the sysrq kernel capabilities as a mean to restart the server, it’s necessery first to activate the sysrq through sysctl, like so:

linux-server:~# sysctl -w kernel.sysrq=1
kernel.sysrq = 1

I found enabling the kernel.sysrq = 1 permanently in the kernel is also quite a good idea, to achieve that I used:

echo 'kernel.sysrq = 1' >> /etc/sysctl.conf

Next it’s wise to use the sync command to sync any opened files on the server as well stopping as much of the server active running services (MySQL, Apache etc.).

linux-server:~# sync

Now to reboot the Linux server, I used the /proc Linux virtual filesystem by issuing:

linux-server:~# echo b > /proc/sysrq-trigger

Using the echo b > /proc/sysrq-trigger simulates a keyboard key press which does invoke the Magic SysRQ kernel capabilities and hence instructs the kernel to immediately reboot the system.
However one should be careful with using the sysrq-trigger because it’s not a complete substitute for /sbin/reboot or /sbin/shutdown -r commands.
One major difference between the standard way to reboot via /sbin/reboot is that reboot kills all the running processes on the Linux machine and attempts to unmount all filesystems, before it proceeds to sending the kernel reboot instruction.

Using echo b > /proc/sysrq-trigger, however neither tries to umount mounted filesystems nor tries to kill all processes and sync the filesystem, so on a heavy loaded (SQL data critical) server, its use might create enormous problems and lead to severe data loss!

SO BEWARE be sure you know what you’re doing before you proceed using /proc/sysrq-trigger as a way to reboot ;) .

Share this on