Posts Tagged ‘multiple files’

10 must know and extremely useful Linux commands that every sys admin should know

Tuesday, July 30th, 2013

There is plenty of precious command line stuff every Linux admin should be aware of. In this article I decided to collect some commands I use often and find interesting to know. The commands below are nothing special and many experienced sysadmins will already know them. However, I'm pretty sure novice admins and starting Linux enthusiasts will find them useful. There is much more to be said on the topic, so anyone is most welcome to share the commands they use.
 
1. Delete all files in a directory except files with certain extensions

It is a handy trick to delete all files in a directory except those with certain file extensions. To do so:

root@linux:~# rm !(*.c|*.py|*.txt|*.mp3)
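Note that the !(...) pattern relies on bash's extglob option; if the command above complains about a syntax error, enable extglob first. A minimal example:

root@linux:~# shopt -s extglob
root@linux:~# rm !(*.c|*.py|*.txt|*.mp3)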

2. Write command output to multiple files (tee)

The normal way to write to a file is by using a redirect: ">" (to overwrite the file) or ">>" (to append to it). However, when you need to write output to multiple files at once there is a command called tee, i.e.:

root@linux:~# ps axuwwf | tee file1 file2 file3
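By default tee truncates (overwrites) the target files; if you would rather append to them, tee also has an -a flag, e.g.:

root@linux:~# ps axuwwf | tee -a file1 file2 file3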

3. Search for text in a plain text file, printing a number of lines after the match

If you need to print a certain number of lines after each match of "search_text", use:

root@linux:~# grep -A 5 -i "search_text" text_file.txt
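grep can just as easily show context before the match (-B) or around it (-C), which is often handy when reading logs; for example, 5 lines before or 5 lines around each match:

root@linux:~# grep -B 5 -i "search_text" text_file.txt
root@linux:~# grep -C 5 -i "search_text" text_file.txt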

4. Show all files where text string is matched with GREP (Search for text recursively)

Searching for a text match is extremely helpful in system administration. I use grep's recursive capability almost on a daily basis:

root@websrv:/etc/dovecot# grep -rli text *
conf.d/10-auth.conf
conf.d/10-mail.conf
dovecot.conf

-l (instructs grep to only print the names of files matching the string), -r (stands for recursive search), and -i (instructs grep to match case-insensitively, i.e. to find the text no matter whether it is written with capital or small letters).

5. Finding files and running a command on each file matched

On Linux, with the find command it is possible to search for files and run a command on each file matched.
Let's say we want to look in the current directory for all the .swp (temporary) files so often left behind by VIM and wipe them out:

root@linux:~# find . -iname '*.swp*' -exec rm -f {} \;
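When there are lots of matches it is usually faster to let find batch the file names instead of spawning one rm process per file; the + terminator of -exec does exactly that. The same clean-up, slightly quicker:

root@linux:~# find . -iname '*.swp*' -exec rm -f {} +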

6. Convert DOS line endings (EOL) to UNIX with sed

If it happens that you don't have the dos2unix command installed and you need to translate DOS line endings (\r\n – carriage return + line feed) to UNIX ones (\n – line feed only), do it with sed:

root@linux:~# sed 's/.$//' filename
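The command above prints the converted text to stdout (and blindly strips the last character of every line), so redirect the output to a new file; with GNU sed you can also strip only an actual trailing carriage return and edit the file in place. A small sketch, assuming GNU sed for the -i variant:

root@linux:~# sed 's/.$//' filename > filename.unix
root@linux:~# sed -i 's/\r$//' filename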

7. Remove duplicate lines from a file with awk:

cat test.txt
test
test
test duplicate
The brown fox jump over ...
Richard Stallman rox

root@linux:~# awk '!($0 in array) { array[$0]; print }' test.txt
test
test duplicate
The brown fox jump over ...
Richard Stallman rox

To remove duplicate lines from all files in a directory, the same can easily be scripted with a bash for loop:

root@linux:~# for i in *; do
awk '!($0 in array) { array[$0]; print }' "$i";
done
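Note that the loop above only prints the de-duplicated output and does not change the files themselves. A minimal sketch that writes the result back in place (via a temporary .dedup file, a suffix chosen here just for illustration) might look like this:

root@linux:~# for i in *; do
[ -f "$i" ] || continue;  # skip directories and other non-regular files
awk '!($0 in array) { array[$0]; print }' "$i" > "$i.dedup" && mv "$i.dedup" "$i";
done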

8. Print only selected columns from text file

To print only the text in the 1st and 6th columns of a plain text file with awk:

root@linux:~# awk '{print $1,$6;}' filename.txt

To print all existing users on the system together with their login shell (the 1st and 7th fields of /etc/passwd), tell awk to use ':' as the field separator:

root@linux:~# awk -F':' '{print $1,$7;}' /etc/passwd

9. Open a file with the VIM text editor starting from a given line

I use only vim for console text processing, and I often have to edit and fix a file which fails to compile at a certain line number. Thus I use vim to open the file with the cursor already placed on the needed line. To open a file and set the cursor to line 35:

root@linux:~# vim +35 /home/hipo/current.c
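vim can also jump straight to the first occurrence of a search pattern instead of a line number, which is handy when you know the text but not the line; a quick example:

root@linux:~# vim +/search_text /home/hipo/current.c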

10. Run the last command with the "!!" bash shortcut

Let's say the last command you ran was uname -a:

root@websrv:/home/student# uname -a
Linux websrv 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux

To re-run it simply type "!!":

root@websrv:/home/student# !!
uname -a
Linux websrv 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux

root@websrv:/home/student#
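Two related bash history shortcuts I find just as handy are sudo !! (re-run the previous command with sudo, for when you forgot you need root) and !$ (reuse the last argument of the previous command). A small illustration, assuming a regular (non-root) user prompt:

student@websrv:~$ sudo !!
sudo uname -a
Linux websrv 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux

student@websrv:~$ ls -l /var/log/apache2/access.log
...
student@websrv:~$ tail -f !$
tail -f /var/log/apache2/access.log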

 

Tracking multiple log files in real time in Linux console / terminal (MultiTail)

Monday, July 29th, 2013

[Image: MultiTail on Debian GNU/Linux viewing Apache access and error logs in a shared screen]
Whether you administer Apache, Nginx, Lighttpd or any other kind of daemon which interactively logs user requests or errors, you probably know well the tail command (tail -f /var/log/apache2/access.log), something a webserver Linux admin can't live without. Sometimes, however, you have a number of Virtualhosts (domains), each configured to log site activity to a separate log file. One solution to the problem is to use GNU Screen (screen – terminal emulator) to launch multiple screen sessions and run a separate tail -f /var/log/apache2/domain1/access.log, tail -f /var/log/apache2/domain2/access.log etc. in each. This however is a bit of a hack, and unless you configure screen to show multiple windows in one virtual terminal (tty, or vty in gnome), you can't really see the output simultaneously in one split window.

Here is where multitail comes in handy. MultiTail is a tool to visualize, in real time, the output of multiple logs (tails) in one shared terminal window. MultiTail is written using the ncurses library, used by a bunch of other useful tools like Midnight Commander, so its output is colorful and very nice looking.

Here is the MultiTail package description on Debian Linux:

linux:~# apt-cache show multitail|grep -i description -A 1
Description-en: view multiple logfiles windowed on console
 multitail lets you view one or multiple files like the original tail

Description-md5: 5e2f688efb214b063bdc418a705860a1
Tag: interface::text-mode, role::program, scope::utility, uitoolkit::ncurses,

MultiTail is available in most Linux distributions. To install it on Debian / Ubuntu / Mint and other deb-based Linux:

debian:~# apt-get install --yes multitail
...

On recent Fedora / RHEL / CentOS and other RPM-based Linux distributions, install it with:

[root@centos ~]# yum -y install multitail
...

On FreeBSD multitail is available to install from ports:

freebsd# cd /usr/ports/sysutils/multitail
freebsd# make install clean
...

Once installed, to display records from multiple files, let's say an Apache domain's access.log and error.log:

debian:~# multitail -f /var/log/apache2/access.log /var/log/apache2/error.log

It has a very extensive built-in help, invoked by simply pressing h while it is running.

[Screenshot: multitail viewing 2 log files in a shared screen in gnome-terminal on Debian]

Even better, multitail ships with integrated color schemes for the log files of most popular Linux services.

[Screenshot: multitail on Debian GNU/Linux showing different color schemes for different log formats]
The list of supported log color schemes as of the time of writing this article is:

acctail, acpitail, apache, apache_error, argus, asterisk, audit, bind, boinc, boinctail, checkpoint, clamav, cscriptexample, dhcpd, errrpt, exim, httping, ii, inn, kerberos, lambamoo, liniptfw, log4j, mailscanner, motion, mpstat, mysql, nagtail, netscapeldap, netstat, nttpcache, ntpd, oracle, p0f, portsentry, postfix, pptpd, procmail, qmt-clamd, qmt-send, qmt-smtpd, qmt-sophie, qmt-spamassassin, rsstail, samba, sendmail, smartd, snort, spamassassin, squid, ssh, strace, syslog, tcpdump, vmstat, vnetbr, websphere, wtmptail

To tell it which log color scheme to use from the command line, use the -Cs option:

debian:~# multitail -Csapache /var/log/apache2/access.log /var/log/apache2/error.log

[Screenshot: multitail with the Apache color scheme highlighting on Debian Linux]

A useful feature is the ability to run a command and display its output in a separate window while still following log output, i.e.:

[root@centos:~]# multitail /var/log/httpd.log -l "netstat -nat"
...

Multitail can also merge the output of several files into one window, while a second window displays some other log or command output. To merge the output of Apache's access.log and error.log:

debian:~# multitail /var/log/apache2/access.log -I /var/log/apache2/error.log

When merging two log files into one window it is useful to display each file's output in a different color for the sake of readability.

For example:
 

debian:~# multitail -ci green /var/log/apache/access.log -ci red -I /var/log/apache/error.log

[Screenshot: multitail merged Apache access and error log on Debian Linux]

To display the output of 3 log files in separate split windows in the console (here arranged in 2 columns with -s 2), use:

linux:~# multitail -s 2 /var/log/syslog /var/log/apache2/access.log /var/log/apache2/error.log

For some more useful examples, check out MultiTail's official page examples.
There are plenty of other useful things to do with multitail; for more, RTFM 🙂

How to delete millions of files on busy Linux servers (Work around "Argument list too long")

Tuesday, March 20th, 2012

How to delete a million or many thousands of files in the same directory on GNU / Linux and FreeBSD

If you try to delete more than about 131072 files on Linux with rm -f *, where the files are all stored in the same directory, you will get an error:

/bin/rm: Argument list too long.

I've earlier blogged on deleting multiple files on Linux and FreeBSD and this is not my first time facing this error.
Anyways, as time passed, I've found a few other ways to delete large multitudes of files from a server.

In this article, I will shortly explain a few approaches for deleting a few million obsolete files to free some space on your server.
Here are several methods you can use to clean up your tons of junk files.

1. Using Linux find command to wipe out millions of files

a.) Finding and deleting files using find's -exec switch:

# find . -type f -exec rm -fv {} \;

This method works fine but it has one downside: file deletion is slow, since for each found file an external rm command is invoked.

For half a million files or more, this method will take a long time. However, from the point of view of stressing the server hard disk it is not so bad, as the file deletion does not put too much strain on the disk.
b.) Finding and deleting a big number of files with find's -delete argument:

Luckily, there is a better way to delete the files, by using the find command's built-in -delete argument:

# find . -type f -delete
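Of course -delete can be combined with find's usual filters, so only what you really want to remove gets deleted; for example (a hypothetical pattern and age), only *.tmp files older than 30 days:

# find . -type f -name '*.tmp' -mtime +30 -delete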

c.) Deleting and printing out deleted files with find's -print arg

If you would like to see in your terminal which files find is deleting in "real time", add -print:

# find . -type f -print -delete

To prevent your server hard disk from being stressed, and hence save yourself from "outages" in normal server operation, it is good to combine the find command with ionice, e.g.:

# ionice -c 3 find . -type f -print -delete

Just note that ionice cannot guarantee that find's operations will not severely affect hard disk I/O requests. On heavily busy servers with high amounts of disk I/O writes, applying ionice may still not prevent the server from hanging! Be sure to always keep an eye on the server while deleting the files, no matter whether with or without ionice. If during the find execution the server starts lagging in serving its ordinary client requests, stop the execution of the command immediately by killing it from another ssh session or tty (if physically on the server).
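On a loaded box I would also lower the CPU priority of the deletion along with its I/O priority; combining nice with ionice is a simple way to do that (same command, just politer):

# nice -n 19 ionice -c 3 find . -type f -print -delete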

2. Using a simple bash loop with rm command to delete "tons" of files

An alternative way is to use a bash loop, which iterates over each of the files in the directory and issues /bin/rm on each loop element (file), like so:

for i in *; do
rm -f "$i";
done

If you'd like to print what you are deleting, add an echo to the loop:

# for i in *; do \
echo "Deleting : $i"; rm -f "$i"; \
done
The bash loop worked like a charm in my case, so I really warmly recommend this method whenever you need to delete 500,000+ files in a directory.

3. Deleting multiple files with perl

Deleting multiple files with perl is not a bad idea at all.
Here is a perl one-liner to delete all files contained within a directory:

# perl -e 'for(<*>){((stat)[9]<(unlink))}'

If you prefer to use a more human-readable perl script to delete a multitude of files, use delete_multple_files_in_dir_perl.pl

Using the perl interpreter to delete thousands of files is quick, really, really quick.
I did not benchmark on the server exactly how quick it is, but I guess the deletion rate should be similar to the find command. It's possible that in some cases the perl loop is even quicker …

4. Using a PHP script to delete multiple files

Using a short PHP script to delete files one by one in a loop, similar to the above bash script, is another option.
To do the deletion with PHP, use this little script:

<?php
// Directory whose files will be deleted
$dir = "/path/to/dir/with/files";
$dh = opendir($dir);
$i = 0;
// Walk the directory entry by entry and unlink every regular file
while (($file = readdir($dh)) !== false) {
    $file = "$dir/$file";
    if (is_file($file)) {
        unlink($file);
        // Report progress every 1000 deleted files
        if (!(++$i % 1000)) {
            echo "$i files removed\n";
        }
    }
}
closedir($dh);
?>

As you see, the script reads the directory defined in $dir and loops through it, checking entry by entry and deleting each regular file it finds.
You should already know PHP is slow, so this method is only useful if you have to delete many thousands of files on a shared hosting server with no (ssh) shell access.

This PHP script is taken from Steve Kamerman's blog. I would also like to express my gratitude to Steve for writing such a wonderful post; his post actually became the inspiration for this article.

You can also download the sample PHP script for deleting a million files here

To use it, rename delete_millioon_of_files_in_a_dir.php.txt to delete_millioon_of_files_in_a_dir.php and run it through a browser.

Note that you might need to run it multiple times, because many shared hosting servers are configured to kill a PHP script which keeps running for too long.
Alternatively the script can be run from the shell with the PHP CLI:

php delete_millioon_of_files_in_a_dir.php

5. So what is the "best" way to delete millions of files on Linux?

In order to find out which method is quicker in terms of execution time, I did some home-brew benchmarking on my ThinkPad notebook.

a) Creating 509072 sample files

Again, I used a bash loop to create many thousands of files in order to benchmark.
I didn't want to put this load on a production server, hence I used my own notebook to conduct the benchmarks. As my notebook is not a server, the benchmarks might be partially inaccurate; however, I believe they're still a pretty good indicator of which deletion method is better.

hipo@noah:~$ mkdir /tmp/test
hipo@noah:~$ cd /tmp/test;
hipo@noah:/tmp/test$ for i in $(seq 1 509072); do echo aaaa >> $i.txt; done

I had to wait a few minutes until all 509072 files were created. Each of the files, as you can see, contains the sample "aaaa" string.

b) Calculating the number of files in the directory

Once the command completed, to make sure all 509072 files were indeed there, I used find + wc to count the number of files contained in the directory:

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f |wc -l
509072

real 0m1.886s
user 0m0.440s
sys 0m1.332s

It's interesting that using the ls command to count the files is less efficient than using find:

hipo@noah:/tmp/test$ time ls -1 |wc -l
509072

real 0m3.355s
user 0m2.696s
sys 0m0.528s

c) Benchmarking the different file deletion methods with time

– Testing delete speed of find

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f -delete
real 15m40.853s
user 0m0.908s
sys 0m22.357s

You see, using find to delete the files is neither too slow nor lightning quick.

– How fast is the perl loop at mass file deletion?

hipo@noah:/tmp/test$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'
real 6m24.669s
user 0m2.980s
sys 0m22.673s

Deleting my 509072 sample files took 6 mins and 24 secs. This is roughly two and a half times faster than find! GO-GO perl 🙂
As you can see from the results, perl is a great and time-saving way to delete 500,000 files.

– The approximate deletion speed of the for + rm bash loop

hipo@noah:/tmp/test$ time for i in *; do rm -f $i; done

real 206m15.081s
user 2m38.954s
sys 195m38.182s

You see the execution took 206 minutes and 15 seconds, i.e. nearly 3 HOURS and 30 MINUTES!!!! This is extremely slow! But it works like a charm, as the deletion run didn't impact my normal laptop browsing. While the script was running I was mostly browsing through a few not so heavy (non-flash) websites and doing some other stuff in gnome-terminal 🙂

As you can imagine, running a bash loop is a bit CPU intensive, but it puts less stress on the hard disk's read/write operations. Therefore it's clearly good practice to use it whenever deletion of many files on a dedicated server is required.

d) My production server file deleting experience

On a production server I tested only two of the listed methods to delete my files. The production server where I tested them is running Debian GNU / Linux Squeeze 6.0.3. There I had the task of deleting a few million files.
The methods tried on the server were:

– The find . -type f -delete method.

– for i in *; do rm -f $i; done

The results from using the find -delete method were quite sad, as the server almost hung under the heavy hard disk load the command produced.

With the for loop everything went smoothly. The file deletion took a long, long time (a few hours), but while it was running the server continued to operate with no interruptions.

While the bash loop was running, the server load average kept steady at around 4.
Taking my experience into account: if you're running a production server and you're still wondering which deletion method to use to wipe a multitude of files, I would recommend you go the bash for loop + /bin/rm way. Yes, it is extremely slow (expect it to run for a few hours), but it puts not too much extra load on the server.

Using the PHP script will probably be slow and inefficient compared to both find and the bash loop. I haven't given it a try yet, but I suppose it will either take about the same time or be a few times slower than bash.

If you have tried the php script and you have some observations, please drop some comment to tell me how it performs.

To sum it up:

Even though there are "hacks" to clean up a messy directory full of a few million junk files, such a directory should never exist in the first place.

Frankly, keeping millions of files within the same directory is a very stupid idea.
Doing so will have a severe negative impact on the directory listing performance of your filesystem in the long term.

If you know of better (more efficient) ways to delete a multitude of files in a directory, please share them in the comments.

Using perl and sed to substitute strings in multiple files on Linux and BSD

Friday, August 26th, 2011

[Image: Using perl and sed to replace strings in files on Linux, FreeBSD, OpenBSD, NetBSD and other Unix]

On many occasions when administering Linux, BSD, SunOS or any other *nix, there is a need to substitute strings inside a file or a group of files, replacing a certain string with another one.

The task is not too complex, and many of the senior sysadmins out there have certainly already faced this requirement and probably have a good idea of how to substitute strings in files with perl and sed. However, I'm quite sure there are dozens of system administrators out there who do not know how, and who still haven't faced a situation where they are required to do substitutions from a command shell or via a scripting language.

This article targets exactly those system administrators who are not yet 100% sysop gurus 😉

1. Substitute text strings inside files on Linux and BSD with perl

The Perl programming language was originally created for heavy text manipulation, and most Linux / Unix based hosts today have a working copy of perl installed; therefore using perl as a means of substituting one string in a file with another is maybe the best way to complete the task.
Another good thing about perl is that text processing with it is said to be, in most cases, a bit faster than with sed.
However, this still depends on the string being substituted. I haven't done benchmark tests to say with 100% certainty that perl is always quicker, but my common sense suggests it will be.

Now, enough talk; a very simple way to substitute a recurring text string inside a file with another chosen one is like so:

debian:~# perl -pi -e 's/foo/bar/g' file1 file2

This will substitute the string foo with bar everywhere it’s matched in file1 and file2

However, the above command is a bit "dangerous", as no backup copy of the original files in which the string is substituted is preserved.
Therefore the above command should only be used when one is 100% sure about the string changes to be made.

Hence, a better idea when conducting the text substitution is to also keep a backup of the original file under, let's say, a .bak extension. To achieve that I use perl as follows:

freebsd# perl -i.bak -p -e 's/syzdarma/magdanoz/g;' file1 file2

This command creates copies of the original files file1 and file2 under the names file1.bak and file2.bak; in file1 and file2 every occurrence of the string syzdarma gets substituted with magdanoz, thanks to the /g option, which means substitute globally.

2. Substitute string in all files inside directory using perl on Linux and BSD

Every now and then there is a need to do manipulations on large numbers of files. I can't right now remember a good scenario where I had to change all occurrences of a matching string to another one in all files located inside a directory, but anyhow, I've done this on a number of occasions.

A good way to do a mass string substitution in files on Linux and BSD hosts equipped with a bash shell is via the command:

debian:/root/textfiles:# for i in *.txt; do perl -i.bak -p -e 's/old_string/new_string/g;' "$i"; done

Here the text files are assumed to have the usual .txt file extension.

The above bash loop goes through each of the .txt files located in /root/textfiles and substitutes old_string with new_string everywhere (globally).

Another alternative to the above example, for replacing a recurring text string in all files across multiple directories, is to use a combination of the shell commands grep, perl, sort, uniq and xargs.
Let's say that one wants to match a custom string in files everywhere inside the root directory and all its descendant directories and substitute it with another one; this can be done with the command:

debian:~# grep -R --files-with-matches 'old_string' / | sort | uniq | xargs perl -pi~ -e 's/old_string/new_string/g'

This command will look for the string old_string in all files under the / (root) directory and, wherever it occurs, will substitute it with new_string (the idea for this command was borrowed from http://linuxadmin.org, so thanks).

Using a combination of 5 commands, however, is not very wise in terms of efficiency.

Therefore, to save some system resources, it's better in terms of efficiency to take advantage of the find command in combination with xargs; here is how:

debian:~# find / | xargs grep 'old_string' -sl |uniq | xargs perl -pi~ -e 's/old_string/new_string/g'

Once again the find command example will do exactly the same as the substitute method with grep -R …

Since enough has been said about substituting text strings inside files using perl, I will now explain how text strings can be substituted using sed.

The main reason why using sed could be a better choice in some cases is that not all Unices are equipped by default with a perl interpreter. In general, the number of servers with sed installed is surely higher than the number with a perl interpreter.

3. Substitute text strings inside files on Linux and BSD with sed stream editor

On many occasions, where a website is hosted, one needs to quickly change a string inside all the files located in a directory, for example to fix static URLs directly encoded in the HTML.
To achieve this task, here is some code using two little bash loops in conjunction with the sed, echo and mv commands:

debian:/var/www/website# for i in $(ls -1); do cat $i |sed -e "s#index.htm#http://www.webdomain.com/#g">$i.new; done
debian:/var/www/website# for i in $(ls *.new); do mv $i $(echo $i |sed -e "s#.new##g"); done

The above expression sed -e "s#index.htm#http://www.webdomain.com/#g" instructs sed to substitute every appearance of the text string index.htm with the new text string http://www.webdomain.com/.

The first for loop creates files with the substituted string as file1.new, file2.new, file3.new etc.
The second for loop uses mv to overwrite the original input files file1, file2, file3, etc. with the newly created file1.new, file2.new, file3.new.

There is a way shorter way to accomplish the same text substitution task with a simpler one-liner, using only sed's in-place editing (-i) and the shell's glob expansion; here is how:

debian:/var/www/website# sed -i 's/old_string/new_string/g' *

The above command will change old_string to new_string inside all files in the directory /var/www/website.
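Just like with perl, GNU sed can keep a backup of each original file while editing in place if you pass a suffix to -i; a small safety net against typos in the substitution (assuming GNU sed):

debian:/var/www/website# sed -i.bak 's/old_string/new_string/g' *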

Where the change has to be made to less than a thousand or so files, this method might be the most efficient; however, where a text substitution has to be done on, let's say, 5000+ files, the above simplistic version will not work. An "Argument list too long" error will prevent sed -i 's/old_string/new_string/g' * from completing its task.
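To get around the argument list limit, the file names can be fed to sed by find and xargs instead of by shell globbing; a minimal sketch (the -print0 / -0 pair keeps file names with spaces intact):

debian:/var/www/website# find . -maxdepth 1 -type f -print0 | xargs -0 sed -i 's/old_string/new_string/g'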

The above two-liner of for loops should also work without problems on FreeBSD and the rest of the BSD derivatives, though I have not tested it yet, hence any feedback from FreeBSD guys is most welcome.

Consider that in order for the for loop commands to work on FreeBSD or NetBSD, they have to be run under a bash shell.
That’s all folks thanks the Lord for letting me write this nice article, I hope it gives some insights on how multiple files text replace on Unix works .
Cheers 😉