Posts Tagged ‘type’

How to convert file content encoded in windows-cp1251 charset to UTF-8 (with iconv) to be delivered properly encoded to browsing end clients

Wednesday, May 16th, 2012

windows-cp1251 bulgarian to UTF-8 / Encoding Communication Decoding Communication Funny Picture

I have a bunch of old html files all encoded in the historically obsolete Windows-cp1251. Windows-CP1251 used to be common used 7 years ago and therefore still big portions of the web content in Bulgarian / Russian Cyrillic is still transferred to the end users in this encoding.

This was just before the "UTF-8 revolution", where massively people started using UTF-8,
Well it was clear the specific national country text encoding standards will quickly be moved by to UTF-8 – Universal Encoding format which abbreviation stands for (Unicode Transformation Format).

Though UTF-8 was clear to be "the future", many web developers mostly because of their incompetency or using an old sources of learning how to writen in HTML continued to use windows-cp1251 in HTMLs. I'm even convinced, there are still developers out there who are writting websites for Bulgarian / Russian / Macedonian customers using obsolete encodings …

The smarter developers of those accustomed to windows-cp1251, KOI-8R etc. etc., were using the meta tag to specify the type of charset of the web page content with:

<meta http-equiv="content-type" content="text/html;charset=windows-cp1251">

or

<meta http-equiv="content-type" content="text/html;charset=koi-8r">

Anyhow, still many devs even didn't placed the windows-cp1251 in the head of the HTML …

The result for the system administrator is always a mess – a lot of webpages that are showing like unreadable signs and tons of unhappy customers.
As always the system administrator is considered responsible, for the programmer mistakes :) . So instead of programmers fix their bad cooking, the admin has to fix it all!

One quick work around me as admin has applied to failing to display pages in Cyrillic using the Windows-cp1251 character encoding was to force windows-cp1251 as a default encoding for the whole virtualhost or Apache directory with Apache directives like:

<VirtualHost *:80>
ServerAdmin some_user@some_host.com
DocumentRoot /var/www/html
AddDefaultCharset windows-cp1251
ServerName the_host_name.com
ServerAlias www.the_host_name.com
....
....
<Directory>
AddDefaultCharset windows-cp1251
>/Directory>
</VirtualHost>

Though this mostly would, work there are some occasions, where only a particular html files from all the content served by Apache is encoded in windows-cp1251, if most of the content is already written in UTF-8, this could be a big issues as you cannot just change the UTF-8 globally to windows-cp1251, just because few pages are written in archaic encoding….
Since most of the content is displayed to the client by Apache (as prior explained) just fine, only particular htmls lets's ay single.html, single2.html etc. etc. are displayed with some question marks or some non-human readable "hieroglyphs".

Below is a screenshot from two pages returned to my browser in wrongly set htmls charset:

Improper Windows CP1251 encoding with Apache set to serve UTF-8 encoding questiomarks

Improper Windows CP1251 delivered page in UTF-8 browser view

Apache returns cp1251 in some non-UTF8 wrong encoding (webserver improperly served cyrillic encoding)

Improperly served encoding CP1251 delivered by Apache in non-utf-8 encoding

When this kind of issues occur, the only solution is to simply login to the server and use iconv command to convert all files returning unreadable content from whatever the non UTF-8 encoding is lets say in my case Bulgarian typeset of cp1251 to UTF-8

Here is how the iconv command to convert between windows-cp1251 to utf-8 the two sample files named single1.html and single2.html

server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single1.html > single1.html.utf8
server:/web# mv single1.html single1.html.bak;
server:/web# mv single1.html.utf8 single1.html
server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single2.html > single2.html.utf8
server:/web# mv single2.html single2.html.bak;
server:/web# mv single2.html.utf8 single2.html

I always, make copies of the original cp1251 encoded files (as you see mv single1.html single1.html.bak), because if something goes wrong with convertion I can easily revert back.

If there are 10 files with consequential numbers naming they can be converted using a short for loop, like so:

server:/web# for i $(seq 1 10); do
/usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single$i.html > single$i.html.utf8;mv single$i.html single$i.html.bak
mv single$i.html.utf8 single$i.html
done

Just as earlier mentioned if single1.html, single2.html … has in the html <head>:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

You should open, each of the files in question and wipe out the line either by hand or use sed to wipe it in one loop if it has to be done for lets say 10 files named (single{1..10})

server:/web# for i in $(seq 1 10); do
sed '/<meta http-equiv="Content-Type" content="text\/html; charset=windows-1251>/d' single$i.txt > single$i.txt.new;
mv single$i.txt single$i.txt.bak;
mv single$i.txt.new single$i.txt

Well now,

Share this on

How to disable ACPI (power saving) support in FreeBSD / Disable acpi on BSD kernel boot time

Tuesday, May 15th, 2012

FreeBSD disable ACPI how ACPI Basic works basic diagram

On FreeBSD the default kernel is compiled to support ACPI. Most of the modern PCs has already embedded support for ACPI power saving instructions.
Therefore a default installed FreeBSD is trying to take advantage of this at cases and is trying to save energy.
This is not too useful on servers, because saving energy could have at times a bad impact on server performance if the server is heavy loaded at times and not so loaded at other times of the day.

Besides that on servers saving energy shouldn't be the main motivator but server stability and productivity is. Therefore in my personal view on FreeBSD used on servers it is better to disable complete the ACPI in order to disable CPU fan control to change rotation speeds all the time from low to high rotation cycles and vice versa at times of low / high server load.

Another benefit of removing the ACPI support on a server is this would probably increase the CPU fan life span and possibly prevent the CPU to be severely heated at times.

Moreover, some piece of hardware might have troubles in properly supporting ACPI specifications and thus ACPI could be a reason for unexpected machine hang ups.

With all said I would recommend to anyone willing to use BSD for a server to disable the ACPI (Advanced Configuration and Power Interface), just like I did.

Here is how;

1. Quick review on how ACPI is handled on FreeBSD

acpi support is being handled on FreeBSD by a number of loadable kernel modules, here is a complete list of all the kernel modules dealins with acpi:

freebsd# cd /boot
freebsd# find . -iname '*acpi*.ko'
./kernel/acpi.ko
./kernel/acpi_aiboost.ko
./kernel/acpi_asus.ko
./kernel/acpi_fujitsu.ko
./kernel/acpi_ibm.ko
./kernel/acpi_panasonic.ko
./kernel/acpi_sony.ko
./kernel/acpi_toshiba.ko
./kernel/acpi_video.ko
./kernel/acpi_dock.ko

By default on FreeBSD, if hardware has some support for ACPI the acpi gets activated by acpi.ko kernel module. The specific type of vendors specific ACPI like IBM, ASUS, Fujitsu are controlled by the respective kernel module from the list …

Hence, to control if ACPI is loaded or not on a FreeBSD system with no need to reboot one can use kldload, kldunload module management BSD cmds.

a) Check if acpi is loaded on a BSD

freebsd# kldstatkldstat | grep -i acpi
9 1 0xc9260000 57000 acpi.ko

b) unload kernel enabled ACPI support

freebsd# kldunload acpi

c) Load acpi support (not the case with me but someone might need it, if for instance BSD is running on laptop)

freebsd# kldload acpi

2. Disabling ACPI to load on bootup on BSD

a) In /boot/loader.conf add the following variables:

hint.acpi.0.disabled="1"
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1


b) in /boot/device.hints add:

hint.acpi.0.disabled="1"

c) in /boot/defaults/loader.conf make sure:

##############################################################
### ACPI settings ##########################################
##############################################################
acpi_dsdt_load="NO" # DSDT Overriding
acpi_dsdt_type="acpi_dsdt" # Don't change this
acpi_dsdt_name="/boot/acpi_dsdt.aml"
# Override DSDT in BIOS by this file
acpi_video_load="NO" # Load the ACPI video extension driver

d) disable ACPI thermal monitoring

It is generally a good idea to disable the ACPI thermal monitoring, as many machines hardware does not support it.

To do so in /boot/loader.conf add

debug.acpi.disabled="thermal"

If you want to learn more on on how ACPI is being handled on BDSs check out:

freebsd# man acpi

Other alternative method to permanently wipe out ACPI support is by not compiling ACPI support in the kernel.
If that's the case in /usr/obj/usr/src/sys/GENERIC make sure device acpi is commented, e.g.:

##device acpi

 

Share this on

How to disable PC Spaker on Debian and Ubuntu Linux

Sunday, May 13th, 2012

 

How to disable pc-speaker on Linux / PC-Speaker Old Desktop Computer picture

A PC Speaker is helpful as it could be used as a tool for diagnosing system hardware failures (different systems produce different beep sequences depending on the machine BIOS type).
Using the instructions for the respective BIOS vendor and version one could determine the type of problem experienced by a machine based on the sequence and frequency of sounds produced by the SPEAKER.
Lets say a hardware component on a server is down with no need for a monitor or screen to be attached you can say precisely if it is the hard drive, memory or fan malfunctioning…

Generally speaking historically embedded PC Speaker was inseparatable part of the Personal Computers, preceding the soundblasters, now this is changing but for compitability sake many comp equipment vendors still produce machines with pc-speaker in.
Some newer machines (mostly laptops) are factory produced with no PC-SPEAKER component anymore.
For those who don't know what is PC SPEAKER, it is a hardware device capable of emitting very simple short beep sounds at certain system occasions.

Talking about PC-Speaker, it reminds me of the old computer days, where we used pc-speakers to play music in DOS quite frequently.
It was wide practice across my friends and myself to use the pc-speaker to play Axel Folly and other mod files because we couldn't afford to pay 150$ for a sound cards. Playing a song over pc-speaker is quite a nice thing and it will be a nice thing if someone writes a program to be able to play songs on Linux via the pc-speaker for the sake of experiment.

As of time of writting, I don't know of any application capable of playing music files via the pc-speaker if one knows of something like this please, drop me a comment..

As long as it is used for hardware failure diagnosis the speaker is useful, however there are too many occasions where its just creating useless annoying sounds.
For instance whether one uses a GUI terminal or console typing commands and hits multiple times backspace to delete a mistyped command. The result is just irritating beeps, which could be quite disturbing for other people in the room (for example if you use Linux as Desktop in heterogeneous OS office).
When this "unplanned" glitchering beeps are experienced 100+ times a day you really want to break the computer, as well as your collegues are starting to get mad (if not using their headphones) :)

Hence you need sometimes to turn off the pc-speaker to save some nerves.

Here is how this is done on major Linux distros.

On Debian and most other distros, the PC SPEAKER is controlled by a kernel module, so to disable communication with the speaker you have to remove the kernel module.

On Debian and Fedora disabling pcspeaker is done with:

# modprobe -r pcspkr

Then to permanently disable load of the pcspkr module on system boot:

debian:~# echo 'blacklist pcspkr' >> /etc/modprobe.d/blacklist.conf

On Ubuntu to disable load on boot /etc/modprobe.d/blacklist, file should be used:

ubuntu:~# echo 'blacklist pcspkr' >> /etc/modprobe.d/blacklist
Well that's all folks …

Share this on

How to check the IP address of Skype (user / Contacts) on GNU / Linux with netstat and whois

Thursday, May 3rd, 2012

netstat check skype contact IP info with netstat Linux xterm Debian Linux

Before I explain how netstat and whois commands can be used to check information about a remote skype user - e.g. (skype msg is send or receved) in Skype. I will say in a a few words ( abstract level ), how skype P2P protocol is designed.
Many hard core hackers, certainly know how skype operates, so if this is the case just skip the boring few lines of explanation on how skype proto works.

In short skype transfers its message data as most people know in Peer-to-Peer "mode" (P2P)  - p2p is unique with this that it doesn't require a a server to transfer data from one peer to another. Most classical use of p2p networks in the free software realm are the bittorrents.

Skype way of connecting to peer client to other peer client is done via a so called "transport points". To make a P-to-P connection skype wents through a number of middle point destinations. This transport points (peers) are actually other users logged in Skype and the data between point A and point B is transferred via this other logged users in encrypted form. If a skype messages has to be transferred  from Peer A (point A) to Peer B (Point B) or (the other way around), the data flows in a way similar to:

 A -> D -> F -> B

or

B -> F -> D -> A

(where D and F are simply other people running skype on their PCs).
The communication from a person A to person B chat in Skype hence, always passes by at least few other IP addresses which are owned by some skype users who happen to be located in the middle geographically between the real geographic location of A (the skype peer sender) and B (The skype peer receiver)..

The exact way skypes communicate is way more complex, this basics however should be enough to grasp the basic skype proto concept for most ppl …

In order to find the IP address to a certain skype contact – one needs to check all ESTABLISHED connections of type skype protocol with netsat within the kernel network stack (connection) queue.

netstat displays few IPs, when skype proto established connections are grepped:

noah:~# netstat -tupan|grep -i skype | grep -i established| grep -v '0.0.0.0'
tcp 0 0 192.168.2.134:59677 212.72.192.8:58401 ESTABLISHED 3606/skype
tcp 0 0 192.168.2.134:49096 213.199.179.161:40029 ESTABLISHED 3606/skype
tcp 0 0 192.168.2.134:57896 87.120.255.10:57063 ESTABLISHED 3606/skype

Now, as few IPs are displayed, one needs to find out which exactly from the list of the ESTABLISHED IPs is the the Skype Contact from whom are received or to whom are sent the messages in question.

The blue colored IP address:port is the local IP address of my host running the Skype client. The red one is the IP address of the remote skype host (Skype Name) to which messages are transferred (in the the exact time the netstat command was ran.

The easiest way to find exactly which, from all the listed IP is the IP address of the remote person is to send multiple messages in a low time interval (let's say 10 secs / 10 messages to the remote Skype contact).

It is a hard task to write 10 msgs for 10 seconds and run 10 times a netstat in separate terminal (simultaneously). Therefore it is a good practice instead of trying your reflex, to run a tiny loop to delay 1 sec its execution and run the prior netstat cmd.

To do so open a new terminal window and type:

noah:~# for i in $(seq 1 10); do \
sleep 1; echo '-------'; \
netstat -tupan|grep -i skype | grep -i established| grep -v '0.0.0.0'; \
done

-------
tcp 0 0 192.168.2.134:55119 87.126.71.94:26309 ESTABLISHED 3606/skype
-------
tcp 0 0 192.168.2.134:49096 213.199.179.161:40029 ESTABLISHED 3606/skype
tcp 0 0 192.168.2.134:55119 87.126.71.94:26309 ESTABLISHED 3606/skype
-------
tcp 0 0 192.168.2.134:49096 213.199.179.161:40029 ESTABLISHED 3606/skype
tcp 0 0 192.168.2.134:55119 87.126.71.94:26309 ESTABLISHED 3606/skype
...

You see on the first netstat (sequence) exec, there is only 1 IP address to which a skype connection is established, once I sent some new messages to my remote skype friend, another IP immediatelly appeared. This other IP is actually the IP of the person to whom, I'm sending the "probe" skype messages.
Hence, its most likely the skype chat at hand is with a person who has an IP address of the newly appeared 213.199.179.161

Later to get exact information on who owns 213.199.179.161 and administrative contact info as well as address of the ISP or person owning the IP, do a RIPE  whois

noah:~# whois 213.199.179.161
% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

% Note: this output has been filtered.
% To receive output for a database update, use the "-B" flag.
% Information related to '87.126.0.0 - 87.126.127.255'
inetnum: 87.126.0.0 - 87.126.127.255
netname: BTC-BROADBAND-NET-2
descr: BTC Broadband Service
country: BG
admin-c: LG700-RIPE
tech-c: LG700-RIPE
tech-c: SS4127-RIPE
status: ASSIGNED PA
mnt-by: BT95-ADM
mnt-domains: BT95-ADM
mnt-lower: BT95-ADM
source: RIPE # Filteredperson: Lyubomir Georgiev
.....

Note that this method of finding out the remote Skype Name IP to whom a skype chat is running is not always precise.

If for instance you tend to chat to many people simultaneously in skype, finding the exact IPs of each of the multiple Skype contacts will be a very hard not to say impossible task.
Often also by using netstat to capture a Skype Name you're in chat with, there might be plenty of "false positive" IPs..
For instance, Skype might show a remote Skype contact IP correct but still this might not be the IP from which the remote skype user is chatting, as the remote skype side might not have a unique assigned internet IP address but might use his NET connection over a NAT or DMZ.

The remote skype user might be hard or impossible to track also if skype client is run over skype tor proxy for the sake of anonymity
Though it can't be taken as granted that the IP address obtained would be 100% correct with the netstat + whois method, in most cases it is enough to give (at least approximate) info on a Country and City origin of the person you're skyping with.
 

Share this on

Tiny PHP script to dump your browser set HTTP headers (useful in debugging)

Friday, March 30th, 2012

While browsing I stumbled upon a nice blog article

Dumping HTTP headers

The arcitle, points at few ways to DUMP the HTTP headers obtained from user browser.
As I'm not proficient with Ruby, Java and AOL Server what catched my attention is a tiny php for loop, which loops through all the HTTP_* browser set variables and prints them out. Here is the PHP script code:

<?php<br />
foreach($_SERVER as $h=>$v)<br />
if(ereg('HTTP_(.+)',$h,$hp))<br />
echo "<li>$h = $v</li>\n";<br />
header('Content-type: text/html');<br />
?>

The script is pretty easy to use, just place it in a directory on a WebServer capable of executing php and save it under a name like:
show_HTTP_headers.php

If you don't want to bother copy pasting above code, you can also download the dump_HTTP_headers.php script here , rename the dump_HTTP_headers.php.txt to dump_HTTP_headers.php and you're ready to go.

Follow to the respective url to exec the script. I've installed the script on my webserver, so if you are curious of the output the script will be returning check your own browser HTTP set values by clicking here.
PHP will produce output like the one in the screenshot you see below, the shot is taken from my Opera browser:

Screenshot show HTTP headers.php script Opera Debian Linux

Another sample of the text output the script produce whilst invoked in my Epiphany GNOME browser is:

HTTP_HOST = www.pc-freak.net
HTTP_USER_AGENT = Mozilla/5.0 (X11; U; Linux x86_64; en-us) AppleWebKit/531.2+ (KHTML, like Gecko) Version/5.0 Safari/531.2+ Debian/squeeze (2.30.6-1) Epiphany/2.30.6
HTTP_ACCEPT = application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
HTTP_ACCEPT_ENCODING = gzip
HTTP_ACCEPT_LANGUAGE = en-us, en;q=0.90
HTTP_COOKIE = __qca=P0-2141911651-1294433424320;
__utma_a2a=8614995036.1305562814.1274005888.1319809825.1320152237.2021;wooMeta=MzMxJjMyOCY1NTcmODU1MDMmMTMwODQyNDA1MDUyNCYxMzI4MjcwNjk0ODc0JiYxMDAmJjImJiYm; 3ec0a0ded7adebfeauth=22770a75911b9fb92360ec8b9cf586c9;
__unam=56cea60-12ed86f16c4-3ee02a99-3019;
__utma=238407297.1677217909.1260789806.1333014220.1333023753.1606;
__utmb=238407297.1.10.1333023754; __utmc=238407297;
__utmz=238407297.1332444980.1586.413.utmcsr=pc-freak.net|utmccn=(referral)|utmcmd=referral|utmcct=/blog/

You see the script returns, plenty of useful information for debugging purposes:
HTTP_HOST – Virtual Host Webserver name
HTTP_USER_AGENT – The browser exact type useragent returnedHTTP_ACCEPT – the type of MIME applications accepted by the WebServerHTTP_ACCEPT_LANGUAGE – The language types the browser has support for
HTTP_ACCEPT_ENCODING – This PHP variable is usually set to gzip or deflate by the browser if the browser has support for webserver returned content gzipping.
If HTTP_ACCEPT_ENCODING is there, then this means remote webserver is configured to return its HTML and static files in gzipped form.
HTTP_COOKIE – Information about browser cookies, this info can be used for XSS attacks etc. :)
HTTP_COOKIE also contains the referrar which in the above case is:
__utmz=238407297.1332444980.1586.413.utmcsr=pc-freak.net|utmccn=(referral)
The Cookie information HTTP var also contains information of the exact link referrar:
|utmcmd=referral|utmcct=/blog/

For the sake of comparison show_HTTP_headers.php script output from elinks text browser is like so:

* HTTP_HOST = www.pc-freak.net
* HTTP_USER_AGENT = Links (2.3pre1; Linux 2.6.32-5-amd64 x86_64; 143x42)
* HTTP_ACCEPT = */*
* HTTP_ACCEPT_ENCODING = gzip,deflate * HTTP_ACCEPT_CHARSET = us-ascii, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, windows-1250, windows-1251, windows-1252, windows-1256,
windows-1257, cp437, cp737, cp850, cp852, cp866, x-cp866-u, x-mac, x-mac-ce, x-kam-cs, koi8-r, koi8-u, koi8-ru, TCVN-5712, VISCII,utf-8 * HTTP_ACCEPT_LANGUAGE = en,*;q=0.1
* HTTP_CONNECTION = keep-alive
One good reason, why it is good to give this script a run is cause it can help you reveal problems with HTTP headers impoperly set cookies, language encoding problems, security holes etc. Also the script is a good example, for starters in learning PHP programming.

 

Share this on

Resolving “nf_conntrack: table full, dropping packet.” flood message in dmesg Linux kernel log

Wednesday, March 28th, 2012

On many busy servers, you might encounter in /var/log/syslog or dmesg kernel log messages like

nf_conntrack: table full, dropping packet

to appear repeatingly:

[1737157.057528] nf_conntrack: table full, dropping packet.
[1737157.160357] nf_conntrack: table full, dropping packet.
[1737157.260534] nf_conntrack: table full, dropping packet.
[1737157.361837] nf_conntrack: table full, dropping packet.
[1737157.462305] nf_conntrack: table full, dropping packet.
[1737157.564270] nf_conntrack: table full, dropping packet.
[1737157.666836] nf_conntrack: table full, dropping packet.
[1737157.767348] nf_conntrack: table full, dropping packet.
[1737157.868338] nf_conntrack: table full, dropping packet.
[1737157.969828] nf_conntrack: table full, dropping packet.
[1737157.969928] nf_conntrack: table full, dropping packet
[1737157.989828] nf_conntrack: table full, dropping packet
[1737162.214084] __ratelimit: 83 callbacks suppressed

There are two type of servers, I've encountered this message on:

1. Xen OpenVZ / VPS (Virtual Private Servers)
2. ISPs – Internet Providers with heavy traffic NAT network routers
 

I. What is the meaning of nf_conntrack: table full dropping packet error message

In short, this message is received because the nf_conntrack kernel maximum number assigned value gets reached.
The common reason for that is a heavy traffic passing by the server or very often a DoS or DDoS (Distributed Denial of Service) attack. Sometimes encountering the err is a result of a bad server planning (incorrect data about expected traffic load by a company/companeis) or simply a sys admin error…

- Checking the current maximum nf_conntrack value assigned on host:

linux:~# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

- Alternative way to check the current kernel values for nf_conntrack is through:

linux:~# /sbin/sysctl -a|grep -i nf_conntrack_max
error: permission denied on key 'net.ipv4.route.flush'
net.netfilter.nf_conntrack_max = 65536
error: permission denied on key 'net.ipv6.route.flush'
net.nf_conntrack_max = 65536

- Check the current sysctl nf_conntrack active connections

To check present connection tracking opened on a system:

:

linux:~# /sbin/sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 12742

The shown connections are assigned dynamicly on each new succesful TCP / IP NAT-ted connection. Btw, on a systems that work normally without the dmesg log being flooded with the message, the output of lsmod is:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
ip_tables 9899 1 iptable_filter
x_tables 14175 1 ip_tables

On servers which are encountering nf_conntrack: table full, dropping packet error, you can see, when issuing lsmod, extra modules related to nf_conntrack are shown as loaded:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4 10346 3 iptable_nat,nf_nat
nf_conntrack 60975 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1073 1 nf_conntrack_ipv4
ip_tables 9899 2 iptable_nat,iptable_filter
x_tables 14175 3 ipt_MASQUERADE,iptable_nat,ip_tables

 

II. Remove completely nf_conntrack support if it is not really necessery

It is a good practice to limit or try to omit completely use of any iptables NAT rules to prevent yourself from ending with flooding your kernel log with the messages and respectively stop your system from dropping connections.

Another option is to completely remove any modules related to nf_conntrack, iptables_nat and nf_nat.
To remove nf_conntrack support from the Linux kernel, if for instance the system is not used for Network Address Translation use:

/sbin/rmmod iptable_nat
/sbin/rmmod ipt_MASQUERADE
/sbin/rmmod rmmod nf_nat
/sbin/rmmod rmmod nf_conntrack_ipv4
/sbin/rmmod nf_conntrack
/sbin/rmmod nf_defrag_ipv4

Once the modules are removed, be sure to not use iptables -t nat .. rules. Even attempt to list, if there are any NAT related rules with iptables -t nat -L -n will force the kernel to load the nf_conntrack modules again.

Btw nf_conntrack: table full, dropping packet. message is observable across all GNU / Linux distributions, so this is not some kind of local distribution bug or Linux kernel (distro) customization.
 

III. Fixing the nf_conntrack … dropping packets error

- One temporary, fix if you need to keep your iptables NAT rules is:

linux:~# sysctl -w net.netfilter.nf_conntrack_max=131072

I say temporary, because raising the nf_conntrack_max doesn't guarantee, things will get smoothly from now on.
However on many not so heavily traffic loaded servers just raising the net.netfilter.nf_conntrack_max=131072 to a high enough value will be enough to resolve the hassle.

- Increasing the size of nf_conntrack hash-table

The Hash table hashsize value, which stores lists of conntrack-entries should be increased propertionally, whenever net.netfilter.nf_conntrack_max is raised.

linux:~# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
The rule to calculate the right value to set is:
hashsize = nf_conntrack_max / 4

- To permanently store the made changes ;a) put into /etc/sysctl.conf:

linux:~# echo 'net.netfilter.nf_conntrack_count = 131072' >> /etc/sysctl.conf
linux:~# /sbin/sysct -p

b) put in /etc/rc.local (before the exit 0 line):

echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

Note: Be careful with this variable, according to my experience raising it to too high value (especially on XEN patched kernels) could freeze the system.
Also raising the value to a too high number can freeze a regular Linux server running on old hardware.

- For the diagnosis of nf_conntrack stuff there is ;

/proc/sys/net/netfilter kernel memory stored directory. There you can find some values dynamically stored which gives info concerning nf_conntrack operations in "real time":

linux:~# cd /proc/sys/net/netfilter linux:/proc/sys/net/netfilter# ls -al nf_log/
total 0
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ./
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ../
-rw-r--r-- 1 root root 0 Mar 23 23:02 0
-rw-r--r-- 1 root root 0 Mar 23 23:02 1
-rw-r--r-- 1 root root 0 Mar 23 23:02 10
-rw-r--r-- 1 root root 0 Mar 23 23:02 11
-rw-r--r-- 1 root root 0 Mar 23 23:02 12
-rw-r--r-- 1 root root 0 Mar 23 23:02 2
-rw-r--r-- 1 root root 0 Mar 23 23:02 3
-rw-r--r-- 1 root root 0 Mar 23 23:02 4
-rw-r--r-- 1 root root 0 Mar 23 23:02 5
-rw-r--r-- 1 root root 0 Mar 23 23:02 6
-rw-r--r-- 1 root root 0 Mar 23 23:02 7
-rw-r--r-- 1 root root 0 Mar 23 23:02 8
-rw-r--r-- 1 root root 0 Mar 23 23:02 9

 

IV. Decreasing other nf_conntrack NAT time-out values to prevent server against DoS attacks

Generally, the default value for nf_conntrack_* time-outs are (unnecessery) large.
Therefore, for large flows of traffic even if you increase nf_conntrack_max, still shorty you can get a nf_conntrack overflow table resulting in dropping server connections. To make this not happen, check and decrease the other nf_conntrack timeout connection tracking values:

linux:~# sysctl -a | grep conntrack | grep timeout
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.ipv4.netfilter.ip_conntrack_generic_timeout = 600
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent2 = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_max_retrans = 300
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream = 180
net.ipv4.netfilter.ip_conntrack_icmp_timeout = 30

All the timeouts are in seconds. net.netfilter.nf_conntrack_generic_timeout as you see is quite high – 600 secs = (10 minutes).
This kind of value means any NAT-ted connection not responding can stay hanging for 10 minutes!

The value net.netfilter.nf_conntrack_tcp_timeout_established = 432000 is quite high too (5 days!)
If this values, are not lowered the server will be an easy target for anyone who would like to flood it with excessive connections, once this happens the server will quick reach even the raised up value for net.nf_conntrack_max and the initial connection dropping will re-occur again …

With all said, to prevent the server from malicious users, situated behind the NAT plaguing you with Denial of Service attacks:

Lower net.ipv4.netfilter.ip_conntrack_generic_timeout to 60 – 120 seconds and net.ipv4.netfilter.ip_conntrack_tcp_timeout_established to stmh. like 54000

linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_generic_timeout = 120
linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000

This timeout should work fine on the router without creating interruptions for regular NAT users. After changing the values and monitoring for at least few days make the changes permanent by adding them to /etc/sysctl.conf

linux:~# echo 'net.ipv4.netfilter.ip_conntrack_generic_timeout = 120' >> /etc/sysctl.conf
linux:~# echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000' >> /etc/sysctl.conf

 

Share this on

How to digital watermark to a picture – Protect pic with copyright image or text with composite, convert and Phatch on GNU / Linux

Friday, March 23rd, 2012

Watermarking is a technique to identify a physical or non-physical object with its owner (creator). First watermarks in history originates from very ancient times.

There are two basic types of Watermark types:

I. Physical Watermarks (classical)

II. Digital Watermarks

Historically Classical Watermarks, were mostly important. As we tend to use more and more visible and we switch to use of more invisible stuff, nowdays the importance and use of digital watermarks is steadily raising.

You have most likely already seen pictures from websites which contain a copyright holder message stamp or website logo on it.
As you could imagine the picture watermark is placed in order to prevent pictures from being re-used in another internet space ,without the picture copyright holder explicit permission…

Watermarks have entered most if not all areas of our life, but we often don't recognize they're there / rarely think about them.
Few of the many "physical watermarks" we use daily are:
 

  • paper money watermark (to protect against anti money forgery)
  • bank debit / credit cards stamps near the card chip
  • postcards paper stamps

There are too many different kind of "physical watermarks" and since this is not the accent of this article, I will continue straight to explaiin a bit on Digital (picture) watermarks and how to watermark images with ImageMagick image editting command line suit.

Just like with physical watermarks, there are different kinds of digital watermarks. There are:
 

  • Picture (Images) digital watermarks
  • - Steganography

  • Video watermarks
  • Audio stream digital watermarks
  • Visual digital watermarks
  • - Visible area of text or picture over another text picture or video

  • Invisible digital watermarks
  • - digital information (files) metadata with watermark content etc.

The topic of watermarking is quite wide, so I will stop here and focus on the main idea of this article – to show how to place digital watermark on graphic image or collection of pictures.

The most straightway non-interactive way to do picture watermarking is with ImageMagick's composite command line tool. This little handy tool is capable of creating watermarks in single and multiple pictures.

If you prefer to have a simple text as a watermark, then you should use imagick's convert cmd.

1. Putting a watermark of picture in the right bottom corner

$ composite -gravity southeast -dissolve 100 \
watermark_picture.png image-to-watermark.png \
output-watermarked-image.png

Snoopy Writting pc freak watermark picture text watermark on the right bottom corner with composite

2. Placing watermark to picture in the bottom right corner

$ composite -gravity northeast -dissolve 80 \
watermark_picture.png image-to-watermark.png \
output-watermarked-image.png

Snoopy Writting pc freak picture text watermark on the right bottom corner with composite

3. Watermarking picture in the bottom left corner

$ composite -gravity southwest -dissolve 90 \
watermark_picture.png image-to-watermark.png \
output-watermarked-image.png

Snoopy writting watermarking picture on the bottom left corner imagemagick (composite)

4. Watermarking picture in the top left corner

$ composite -gravity northwest -dissolve 100 \
watermark_picture.png image-to-watermark.jpg \
output-watermarked-image.jpg

As you see from above example, composite even accept mixing up input / output between PNG and JPEG pictures :)

Output Watermarked Image picture on top left corner with pc-freak logo image Imagick composite

5. Put a watermark in the image center

$ composite -gravity center -dissolve 100 \
watermark_picture.png image-to-watermark.png \
output-watermarked-image.png

position watermark on the picture middle (center) composite output picture

6. Sealing image with custom text / Text Watermarking a picture

a) Writting text watermark to an image centered in "footer"

$ convert image-to-watermark.png -pointsize 20 \
-draw "gravity south fill black text 0,12 \
'hip0s Watermark'" output-watermarked-image.jpg

This will place a watermark in position 0,12, meaning the text will be added in the bottom center of the watermarked image.

Watermarking a picture sealing with custom text image imagick composite pic

-pointsize 20 defines the text font size. hip0s Watermark is the actual text that will be stamped.

b) Writting image watermark with font type customization (Arial Tahoma etc.):

To list all available fonts ready to be used by convert, type:

$ convert -list font
$ convert -list font |grep -i arial
Font: Arial-Black-Regular
family: Arial Black
glyphs: /usr/share/fonts/truetype/msttcorefonts/Arial_Black.ttf
Font: Arial-Bold
family: Arial
glyphs: /usr/share/fonts/truetype/msttcorefonts/Arial_Bold.ttf
Font: Arial-Bold-Italic
family: Arial
glyphs: /usr/share/fonts/truetype/msttcorefonts/arialbi.ttf
Font: Arial-Italic
family: Arial
glyphs: /usr/share/fonts/truetype/msttcorefonts/ariali.ttf
Font: Arial-Regular
family: Arial
glyphs: /usr/share/fonts/truetype/msttcorefonts/Arial.ttf

$ convert -list type
Bilevel
ColorSeparation
ColorSeparationMatte
Grayscale
GrayscaleMatte
Optimize
Palette
PaletteBilevelMatte
PaletteMatte
TrueColorMatte
TrueColor

On my system, I have 392 of fonts installed, to check the number of installed fonts ready for use by convert I used:

$ convert -list font|grep -i 'font:'|wc -l
392

To only check exact fonts names usable in convert:

$ convert -list font|grep -i 'font:'

To use the red marked Arial-Regular for font of the text picture timestamp issue;

$ convert watermark_picture.jpg -font Arial-Regular \
-pointsize 20 -draw "gravity south fill black text 0,12 'hip0s Watermark'" \
output-watermarked-image.jpg

Watermark and with Arial-Regular font image magick convert screenshot dog type writter

c) Using external font with convert to place image text watermark

Lets say you would like to use an external font (arhangai.ttf) not listed in convert -font list usable fonts:

$ convert image-to-watermark.png -pointsize 20 \
-font /usr/share/fonts/truetype/arhangai/arhangai.ttf \
-draw "gravity south fill black text 0,12 \
'hip0s Watermark'" output-watermarked-image_7.png

Talking about fonts, if you would like to add some external, nice free-fonts (ttf) files to your current logged in user, exec:

hipo@noah:~$ cd ~/fonts
hipo@noah:/fonts$ for i in \
$(lynx -dump http://www.webpagepublicity.com/free-fonts.html|grep -i .ttf|grep -i http|awk '{ print $2 }'); \
do wget -r -l2 -nd -Nc $i;
done

This will add 85 new nice looking fonts. Putting fonts in .fonts/ directory, are red while fonts are looked up by applications installed on respective the server or desktop GNU / Linux systems. Any font put there is ready to be used across all ImageMagick command line tools, as well as will be added across the list of possible fonts to use in GIMP and the rest of gui editors installed on the system.

According to the (watermark) texts font size passed to convert on some pictures the text written will exceed the picture dimensions and only partially some of the text intended as watermark will be visible.
If you encounter the exceed picture text problem, take few minutes and play with fonts sizes until you have a good font size to fit the approximate dimensions of the (expected minimum / maximum – horizontal and vertical) stamped picture dimensions.

For the sake of clarity, here is a list with arguments used in above, composite and convert examples.
 

  • composite — The ImageMagick command that combines two images.
  • -dissolve 80 – The number after the option determines the brightness of the watermark.  100 is full strength.
  • -gravity southeast — Determines the placement of watermark.
    Possible options are; north, west, south, east, northwest, northeast, southeast, southwest, center
  • watermark_picture.png — The watermark image is the first argument.
  • image-to-watermark.jpg — The second argument is the original image to be watermarked.
  • output-watermarked-image.jpg — The third argument is the new composite image to be created.
    N. B. !  If you don't specify a new file, be careful, the original file will be overwritten.

As ImageMagick is cross platform graphic editting suit – it runs on both *nix (Linux,BSD) and Windows. I have tested it on Linux, only but on FreeBSD and other BSDs it should work without any problem.
The composite and convert above examples should be easily rewritten to run on achieve watarmarking on MS Windows too.

7. Watermarking multiple pictures in a directory

To watermark multiple pictures within a directory use, a short bash loop in combination with either convert or composite could be used:

$ cd your-directory/
$ for i in *; do
convert $i -pointsize 20 -draw "gravity south fill black text 0,12 'hip0s Watermark'" output-watermarked-image.jpg
done

convert and composite also support wildcards like '*.JPG, *.PNG', but I'm not sure if this syntax can be used for mass picture marking?

8. Adding watermark and doing other various advanced Image Edit, Convert and Compose stuff with Phatch GUI program

Another program that is capable to put watermarks on pictures and besides that doing a number of routine graphic manipulation operations achievable with expert Image manipulation programs like GIMP / Inkscape is PHATCH = PHOTO & BATCH

Phatch is swiss army knife for doing web design or or graphics design on Linux.

Phatch is really great and easy to use program. Tt makes putting basic designer effects on pictures with no requirement for any design skills.
With Phatch you can become a designer for a day literally ;)

If you haven't used it yet, make sure you try it!
Below, are two screenshots of Phatch running on my Debian G* / Linux

Phatch Linux Debian Squeeze Screenshot

Phatch Linux Debian Squeeze Screenshot Watermark effect

Phatch is installable via apt on Debian and Ubuntu Linux. It has also a phatch-cli tools, which are a possible substitute to ImageMagick's composite / convert tools.

On deb based distros install Phatch with:

noah:~# apt-get --yes install phatch phatch-cli

In Phatch it is also possible, to create a combination of filters to be later applied to an image file or a group of image files all in a directory. The program capabilities are really outstanding, it is pure joy to work with it.

Using Phatch GUI interface is hard to comprehend in the beginning, I needed few minutes until I can get the idea how to use it. Anyhow once you know the basics, its very easy to use onwards.

Phatch currently can perform the following actions:
 

  • Auto Contrast – Maximize image contrast
  • Border – Crop or add border to all sides
  • Brightness – Adjust brightness from black to white
  • Canvas – Crop the image or enlarge canvas without resizing the image
  • Colorize – Colorize grayscale image
  • Common – Copies the most common pixel value
  • Contrast – Adjust from grey to black & white
  • Convert Mode – Convert the color mode of an image (grayscale, RGB, RGBA or CMYK)
  • Effect – Blur, Sharpen, Emboss, Smooth, ..
  • Equalize – Equalize the image histogram
  • Fit – Downsize and crop image with fixed ratio
  • Grayscale – Fade all colours to gray
  • Invert – Invert the colors of the image (negative)
  • Maximum – Copies the maximum pixel value
  • Median – Copies the median pixel value
  • Minimum – Copies the minimum pixel value
  • Offset – Offset by distance and wrap around
  • Posterize – Reduce the number of bits of colour channel
  • Rank – Copies the rank'th pixel value
  • Rotate – Rotate with random angle
  • Round – Round or crossed corners with variable radius and corners
  • Saturation – Adjust saturation from grayscale to high
  • Save – Save an image with variable compression in different types
  • Scale – Scale an image with different resample filters.
  • Shadow – Drop a blurred shadow under a photo with variable position, blur and color
  • Solarize – Invert all pixel values above threshold
  • Text – Write text at a given position
  • Transpose – Flip or rotate an image by 90 degrees
  • Watermark – Apply a watermark image with variable placement (offset, scaling, tiling) and opacity

Most of the function / effects Phatch in the up list works fine as I tested them to get to know the program.
The only effect that didn't worked for me is Blender effect.
Trying to apply the Blending effect I got error:

Can not apply action Blender:
'dict' object has no attribute 'rfind'

dict object has not attribiture rfind error screenshot my linux

Its really a pity blender filter don't work. I've seen on Phatch's website some pictures showing the blender effect in action and it looks really awesome.

In attempt to work around the err, I tried downloading Phatch's latest release and running it with python interpreter but it didn't work out …
I tried also to install some packages to the system that somehow seemed to be related to blenderversatile 3D modeller/renderer program but this worked neither.
I suspect Phatch blender effect is not working on Ubuntu too as I've red complains in some Ubuntu forums.
If someone succeeding making blend effect work please let me know how?

Interesting feature of Phatch is the program support for applying its predefined filters using a cli interface.
The syntax for phatch cli, should be something like:

phatch -console action_list.phatch

Where action_list.phatch is a Phatch predefined filter. Anyways I didn't manage to figure out how to use the program CLI. I'll be glad to hear if someone succeeded in using the program console, if so please share with me how?

9. Adding Watermark to pictures with GIMP

To add a watermark text or picture in GIMP, there are plenty of ways but is more time consuming by both Phatch or convert, composite..
There is a script in gimp plugin registry site – watermark.scm which adds watermarking capability to GIMP

On my system this script was installed with the deb package gimp-data-extras. To apply the plugin on a pic, I used GIMP menus:

Filters -> Eg -> Copyright Placer

GIMP Copyright Holder plugin Watermark Screenshot

If someone knows about better or quicker ways to do watermarking, please share :)

Share this on

How to delete million of files on busy Linux servers (Work out Argument list too long)

Tuesday, March 20th, 2012

How to Delete million or many thousands of files in the same directory on GNU / Linux and FreeBSD

If you try to delete more than 131072 of files on Linux with rm -f *, where the files are all stored in the same directory, you will get an error:

/bin/rm: Argument list too long.

I've earlier blogged on deleting multiple files on Linux and FreeBSD and this is not my first time facing this error.
Anyways, as time passed, I've found few other new ways to delete large multitudes of files from a server.

In this article, I will explain shortly few approaches to delete few million of obsolete files to clean some space on your server.
Here are 3 methods to use to clean your tons of junk files.

1. Using Linux find command to wipe out millions of files

a.) Finding and deleting files using find's -exec switch:

# find . -type f -exec rm -fv {} \;

This method works fine but it has 1 downside, file deletion is too slow as for each found file external rm command is invoked.

For half a million of files or more, using this method will take "long". However from a server hard disk stressing point of view it is not so bad as, the files deletion is not putting too much strain on the server hard disk.
b.) Finding and deleting big number of files with find's -delete argument:

Luckily, there is a better way to delete the files, by using find's command embedded -delete argument:

# find . -type f -print -delete

c.) Deleting and printing out deleted files with find's -print arg

If you would like to output on your terminal, what files find is deleting in "real time" add -print:

# find . -type f -print -delete

To prevent your server hard disk from being stressed and hence save your self from server normal operation "outages", it is good to combine find command with ionice, e.g.:

# ionice -c 3 find . -type f -print -delete

Just note, that ionice cannot guarantee find's opeartions will not affect severely hard disk i/o requests. On  heavily busy servers with high amounts of disk i/o writes still applying the ionice will not prevent the server from being hanged! Be sure to always keep an eye on the server, while deleting the files nomatter with or without ionice. if throughout find execution, the server gets lagged in serving its ordinary client requests or whatever, stop the execution of the cmd immediately by killing it from another ssh session or tty (if physically on the server).

2. Using a simple bash loop with rm command to delete "tons" of files

An alternative way is to use a bash loop, to print each of the files in the directory and issue /bin/rm on each of the loop elements (files) like so:

for i in *; do
rm -f $i;
done

If you'd like to print what you will be deleting add an echo to the loop:

# for i in $(echo *); do \
echo "Deleting : $i"; rm -f $i; \

The bash loop, worked like a charm in my case so I really warmly recommend this method, whenever you need to delete more than 500 000+ files in a directory.

3. Deleting multiple files with perl

Deleting multiple files with perl is not a bad idea at all.
Here is a perl one liner, to delete all files contained within a directory:

# perl -e 'for(<*>){((stat)[9]<(unlink))}'

If you prefer to use more human readable perl script to delete a multitide of files use delete_multple_files_in_dir_perl.pl

Using perl interpreter to delete thousand of files is quick, really, really quick.
I did not benchmark it on the server, how quick exactly is it, but I guess the delete rate should be similar to find command. Its possible even in some cases the perl loop is  quicker …

4. Using PHP script to delete a multiple files

Using a short php script to delete files file by file in a loop similar to above bash script is another option.
To do deletion  with PHP, use this little PHP script:

<?php
$dir = "/path/to/dir/with/files";
$dh = opendir( $dir);
$i = 0;
while (($file = readdir($dh)) !== false) {
$file = "$dir/$file";
if (is_file( $file)) {
unlink( $file);
if (!(++$i % 1000)) {
echo "$i files removed\n";
}
}
}
?>

As you see the script reads the $dir defined directory and loops through it, opening file by file and doing a delete for each of its loop elements.
You should already know PHP is slow, so this method is only useful if you have to delete many thousands of files on a shared hosting server with no (ssh) shell access.

This php script is taken from Steve Kamerman's blog . I would like also to express my big gratitude to Steve for writting such a wonderful post. His post actually become  inspiration for this article to become reality.

You can also download the php delete million of files script sample here

To use it rename delete_millioon_of_files_in_a_dir.php.txt to delete_millioon_of_files_in_a_dir.php and run it through a browser .

Note that you might need to run it multiple times, cause many shared hosting servers are configured to exit a php script which keeps running for too long.
Alternatively the script can be run through shell with PHP cli:

php -l delete_millioon_of_files_in_a_dir.php.txt.

5. So What is the "best" way to delete million of files on Linux?

In order to find out which method is quicker in terms of execution time I did a home brew benchmarking on my thinkpad notebook.

a) Creating 509072 of sample files.

Again, I used bash loop to create many thousands of files in order to benchmark.
I didn't wanted to put this load on a productive server and hence I used my own notebook to conduct the benchmarks. As my notebook is not a server the benchmarks might be partially incorrect, however I believe still .they're pretty good indicator on which deletion method would be better.

hipo@noah:~$ mkdir /tmp/test
hipo@noah:~$ cd /tmp/test;
hiponoah:/tmp/test$ for i in $(seq 1 509072); do echo aaaa >> $i.txt; done

I had to wait few minutes until I have at hand 509072  of files created. Each of the files as you can read is containing the sample "aaaa" string.

b) Calculating the number of files in the directory

Once the command was completed to make sure all the 509072 were existing, I used a find + wc cmd to calculate the directory contained number of files:

hipo@noah:/tmp/test$ find . -maxdepth 1 -type f |wc -l
509072

real 0m1.886s
user 0m0.440s
sys 0m1.332s

Its intesrsting, using an ls command to calculate the files is less efficient than using find:

hipo@noah:/tmp/test$ time ls -1 |wc -l
509072

real 0m3.355s
user 0m2.696s
sys 0m0.528s

c) benchmarking the different file deleting methods with time

- Testing delete speed of find

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f -delete
real 15m40.853s
user 0m0.908s
sys 0m22.357s

You see, using find to delete the files is not either too slow nor light quick.

- How fast is perl loop in multitude file deletion ?

hipo@noah:/tmp/test$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'real 6m24.669suser 0m2.980ssys 0m22.673s

Deleting my sample 509072 took 6 mins and 24 secs. This is about 3 times faster than find! GO-GO perl :)
As you can see from the results, perl is a great and time saving, way to delete 500 000 files.

- The approximate speed deletion rate of of for + rm bash loop

hipo@noah:/tmp/test$ time for i in *; do rm -f $i; done

real 206m15.081s
user 2m38.954s
sys 195m38.182s

You see the execution took 195m en 38 secs = 3 HOURS and 43 MINUTES!!!! This is extremely slow ! But works like a charm as the running of deletion didn't impacted my normal laptop browsing. While the script was running I was mostly browsing through few not so heavy (non flash) websites and doing some other stuff in gnome-terminal) :)

As you can imagine running a bash loop is a bit CPU intensive, but puts less stress on the hard disk read/write operations. Therefore its clear using it is always a good practice when deletion of many files on a dedi servers is required.

b) my production server file deleting experience

On a production server I only tested two of all the listed methods to delete my files. The production server, where I tested is running Debian GNU / Linux Squeeze 6.0.3. There I had a task to delete few million of files.
The tested methods tried on the server were:

- The find . type -f -delete method.

- for i in *; do rm -f $i; done

The results from using find -delete method was quite sad, as the server almost hanged under the heavy hard disk load the command produced.

With the for script all went smoothly. The files were deleted for a long long time (like few hours), but while it was running, the server continued with no interruptions..

While the bash loop was running, the server load avarage kept at steady 4
Taking my experience in mind, If you're running a production, server and you're still wondering which delete method to use to wipe some multitude of files, I would recommend you go  the bash for loop + /bin/rm way. Yes, it is extremely slow, expect it run for some half an hour or so but puts not too much extra load on the server..

Using the PHP script will probably be slow and inefficient, if compared to both find and the a bash loop.. I didn't give it a try yet, but suppose it will be either equal in time or at least few times slower than bash.

If you have tried the php script and you have some observations, please drop some comment to tell me how it performs.

To sum it up;

Even though there are "hacks" to clean up some messy parsing directory full of few million of junk files, having such a directory should never exist on the first place.

Frankly, keeping millions of files within the same directory is very stupid idea.
Doing so will have a severe negative impact on a directory listing performance of your filesystem in the long term.

If you know better (more efficient) ways to delete a multitude of files in a dir please share in comments.

Share this on