Merge (convert) multiple PDF files into one single PDF – Generate one pdf from many on Linux / Windows and Mac

Wednesday, August 6th, 2014

I was looking for an English Orthodox translation of the Old Testament (Septuagint version) and found one split into many PDF files. I wanted to create a single PDF from all the separate Old Testament book files and put it online, as it might be convenient for native English speakers to download the Orthodox version of the Old Testament and read it offline on their computers.

Before I explain how I did it, I will take a short detour to explain a few things about the Septuagint, as this is interesting stuff you might not know.

Septuagint (also referred to as LXX or the Alexandrian Canon) is a translation of the Hebrew Bible and some related texts into Koine Greek, made by the legendary 70 Jewish scholars as early as the 2nd century BC. For those interested in Christianity, it is a curious fact that the number of Old Testament books differs among Protestant, Roman Catholic and Orthodox Christians, whereas the number of New Testament books is the same for Catholics, Protestants and Orthodox.

So how many books are in the Roman Catholic, Protestant and Orthodox Old Testament?

The Old Testament in the Orthodox Holy Bible has 50 books (Slavonic versions of the Bible include 2 more, the Esdras books), whereas the Protestant Holy Bible includes only 39 books in the Old Testament and Roman Catholics have 46 Old Testament books in their Bibles. The reason Protestants chose to have fewer books (only 39) is that some of the books in the Roman Catholic and Orthodox Bibles are referred to as Apocryphal, or Deuterocanonical, books. This doesn't mean that the extra 8 books in Orthodox Bibles are not God-inspired; it means they don't have the same historic authenticity as the canonical books accepted by the early Church.

The Orthodox Church accepted the Septuagint LXX as divinely inspired to be used in Church.

Now back to how I managed to merge (convert) multiple PDF files into a single PDF on my Debian Linux home router.

My first attempt was with ImageMagick's convert (in the same manner as I used to generate PDF files from pictures earlier), e.g.:
 

convert intro.pdf genesis.pdf exodus.pdf leviticus.pdf numbers.pdf deuteronomy.pdf … SINGLE-FILE.PDF

I waited quite long for the conversion to complete, but it seemed to be looping, so after 7 minutes I stopped it and decided to try something else. After a quick search I found pdftk.

pdftk has plenty of functions and is great for anyone who needs to Merge / Split / Update / Encrypt / Repair corrupted PDFs on Linux:

 apt-cache show pdftk |grep -i desc -A 17
Description: tool for manipulating PDF documents
 If PDF is electronic paper, then pdftk is an electronic stapler-remover,
 hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a
 simple tool for doing everyday things with PDF documents. Keep one in the
 top drawer of your desktop and use it to:
  – Merge PDF documents
  – Split PDF pages into a new document
  – Decrypt input as necessary (password required)
  – Encrypt output as desired
  – Fill PDF Forms with FDF Data and/or Flatten Forms
  – Apply a Background Watermark
  – Report PDF on metrics, including metadata and bookmarks
  – Update PDF Metadata
  – Attach Files to PDF Pages or the PDF Document
  – Unpack PDF Attachments
  – Burst a PDF document into single pages
  – Uncompress and re-compress page streams
  – Repair corrupted PDF (where possible)

To install pdftk on Debian Linux Lenny / Wheezy:

apt-get install --yes pdftk

Once it is installed, to convert a number of separate PDF files into a single (merged) PDF file:
 

pdftk file1.pdf file2.pdf file3.pdf cat output single-merged-pdf-file.pdf

 

 

pdftk intro.pdf genesis.pdf exodus.pdf leviticus.pdf numbers.pdf deuteronomy.pdf joshua.pdf judges.pdf ruth.pdf kingdoms_1.pdf kingdoms_2.pdf kingdoms_3.pdf kingdoms_4.pdf paraleipomenon_1.pdf paraleipomenon_2.pdf esdras_1.pdf esdras_2.pdf nehemiah.pdf tobit.pdf judith.pdf esther.pdf maccabees_1.pdf maccabees_2.pdf maccabees_3.pdf psalms.pdf job.pdf proverbs_of_solomon.pdf ecclesiastes.pdf song_of_songs.pdf wisdom_of_solomon.pdf wisdom_of_sirach.pdf hosea.pdf amos.pdf micah.pdf joel.pdf obadiah.pdf jonah.pdf nahum.pdf habbakuk.pdf zephaniah.pdf malachi.pdf isaiah.pdf jeremiah.pdf baruch.pdf lamentations_of_jeremiah.pdf an_epistle_of_jeremiah.pdf ezekiel.pdf daniel.pdf maccabees_4.pdf slavonic_appendix.pdf cat output Orthodox-English-translation-of-Old-Testament-Septuagint.pdf
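Typing fifty file names on one line is easy to get wrong. As a minimal sketch, assuming the book order is kept in a plain text file (the name books.txt, the three sample entries, the output name merged.pdf and the scratch directory are all made up for illustration), the same pdftk command line can be assembled from that list. The script only prints the command unless pdftk is installed and every input file exists:

```shell
# Sketch: build the pdftk argument list from an ordered list file.
# books.txt, merged.pdf and the scratch directory are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One PDF name per line, in the order the books should appear.
printf '%s\n' intro.pdf genesis.pdf exodus.pdf > books.txt

# Collect the names as positional parameters so names with spaces stay intact.
set --
missing=0
while IFS= read -r name; do
    if [ -n "$name" ]; then
        set -- "$@" "$name"
        [ -f "$name" ] || missing=1
    fi
done < books.txt

# Append the fixed tail of the command line: "cat output <result>".
set -- "$@" cat output merged.pdf
cmdline="pdftk $*"
echo "$cmdline"

# Run it only when pdftk is installed and every input file exists.
if command -v pdftk >/dev/null 2>&1 && [ "$missing" -eq 0 ]; then
    pdftk "$@"
fi
```

Keeping the order in a file makes it simple to re-run the merge after adding or reordering a book.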

And hooray, it worked! The resulting PDF of the Old Testament (Orthodox) English translation from the Septuagint is shared here.

pdftk is also packaged for Fedora / CentOS / RHEL etc. (RPM distros), so to install it there:

yum -y install pdftk

Or, if it is missing from the repositories, grab the respective RPM and run:

rpm -ivh pdftk-*yourarch.rpm

pdftk also has Windows and Mac OS versions, in case you need to script the merging of multiple PDFs into single ones there. For more, check out the PDFtk Server homepage here.

Using perl and sed to substitute strings in multiple files on Linux and BSD

Friday, August 26th, 2011

On many occasions when administering Linux, BSD, SunOS or any other *nix, there is a need to substitute strings inside a file, or a group of files containing a certain string, with another one.

The task is not too complex, and many of the senior sysadmins out there have certainly already faced this requirement and probably have a good idea of how to do file substitution with perl and sed. However, I'm quite sure there are dozens of system administrators out there who do not know how, and who still haven't faced a situation where there is a requirement to substitute strings from a command shell or via a scripting language.

This article targets exactly these system administrators who are not 100% sys op Gurus 😉

1. Substitute text strings inside files on Linux and BSD with perl

Perl was originally created to do a lot of text manipulation, and most Linux / Unix based hosts today have a working copy of perl installed, so using perl to substitute one string in a file with another is maybe the best way to complete the task.
Another good thing about perl is that text processing with it is said to be, in most cases, a bit faster than with sed.
However, this still depends on the string to be substituted. I haven't run benchmarks, so I can't say with 100% certainty that perl is always quicker, though my common sense suggests it will be.

Enough talk. Here is a very simple way to substitute a recurring text string inside a file with another one:

debian:~# perl -pi -e 's/foo/bar/g' file1 file2

This will substitute the string foo with bar everywhere it’s matched in file1 and file2

However, the above command is a bit "dangerous", as it does not keep a backup copy of the original files in which the string is substituted.
Therefore the above command should only be used when one is 100% sure about the changes to be made.

Hence a better idea when conducting the text substitution is to also keep a backup of the original file under, let's say, a .bak extension. To achieve that I use perl as follows:

freebsd# perl -i.bak -p -e 's/syzdarma/magdanoz/g;' file1 file2

This command creates copies of the original files file1 and file2 under the names file1.bak and file2.bak. In file1 and file2 themselves, every occurrence of the string syzdarma gets substituted with magdanoz; the /g flag means substitute globally, i.e. every match on a line and not only the first one.

2. Substitute string in all files inside directory using perl on Linux and BSD

Every now and then there is a need to manipulate a large number of files. I can't right now remember a good scenario where I had to change all occurrences of a string to another one in all files located inside a directory, but I've done this on a number of occasions.

A good way to do a mass file string substitution on Linux and BSD hosts equipped with a bash shell is via the commands:

debian:/root/textfiles:# for i in *.txt; do perl -i.bak -p -e 's/old_string/new_string/g;' "$i"; done

Where the text files had the default txt file extension .txt

The above bash loop goes through each of the .txt files located in /root/textfiles and substitutes old_string with new_string everywhere (globally).

An alternative to the above example, for replacing a recurring text string in all files across multiple directories, is a combination of the shell commands grep, perl, sort, uniq and xargs.
Let's say one wants to search the root directory and all its descendant directories for files containing a custom string and substitute it with another one. This can be done with the command:

debian:~# grep -R --files-with-matches 'old_string' / | sort | uniq | xargs perl -pi~ -e 's/old_string/new_string/g'

This command will look up the string old_string in all files under / (the root directory) and substitute every occurrence with new_string. (The idea for this command was borrowed from http://linuxadmin.org, so thx.)

Using a combination of 5 commands, however, is not very wise in terms of efficiency.

Therefore, to save some system resources, it's better to take advantage of the find command in combination with xargs. Here is how:

debian:~# find / | xargs grep 'old_string' -sl |uniq | xargs perl -pi~ -e 's/old_string/new_string/g'

Once again the find command example will do exactly the same as the substitute method with grep -R …

As enough has been said about substituting text strings inside files using perl, I will now explain how text strings can be substituted using sed.

The main reason why sed could be a better choice in some cases is that not all Unices are equipped with a perl interpreter by default. In general, the number of servers with sed installed is surely higher than the number with a perl interpreter.

3. Substitute text strings inside files on Linux and BSD with sed stream editor

On many occasions when a website is hosted, one needs to quickly change a string inside all files located in a directory, for example to resolve issues with static URLs hard-coded in HTML.
To achieve this, here is some code using two little bash loops in conjunction with the sed and mv commands:

debian:/var/www/website# for i in *; do sed -e "s#index.htm#http://www.webdomain.com/#g" "$i" > "$i.new"; done
debian:/var/www/website# for i in *.new; do mv "$i" "${i%.new}"; done

The above command sed -e "s#index.htm#http://www.webdomain.com/#g" instructs sed to substitute all appearances of the text string index.htm with the new text string http://www.webdomain.com/

The first bash for loop writes the files with the substituted string to file1.new, file2.new, file3.new etc.
The second for loop uses mv to overwrite the original input files file1, file2, file3 etc. with the newly created file1.new, file2.new, file3.new.

There is a much shorter way to accomplish the same text substitution task, using a simpler one-liner with only sed and bash's filename expansion (globbing). Here is how:

debian:/var/www/website# sed -i 's/old_string/new_string/g' *

Above command will change old_string to new_string inside all files in directory /var/www/website

When a change has to be made to a modest number of files, this method might be more efficient. However, when a text substitution has to be done on, let's say, 5000+ files, the above simplistic version may not work: once the expanded list of file names exceeds the system's limit on argument length, an Argument list too long error will prevent sed -i 's/old_string/new_string/g' * from completing its task.
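A common way around that limit is to let find hand the file names to sed in batches, so the shell never expands one huge *. The sketch below is an illustration under assumptions: the scratch directory and the three sample pages are made up, and it uses sed -i.bak (suffix attached), a form that both GNU and BSD sed should accept. Note that find also descends into subdirectories, unlike the plain * glob.

```shell
# Sketch: avoid "Argument list too long" by letting find batch the file names.
# The scratch directory and the sample pages are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2 3; do
    printf 'old_string %s\n' "$n" > "page$n.html"
done

# Upper bound, in bytes, on the arguments plus environment passed to one command.
getconf ARG_MAX

# "{} +" makes find start sed as few times as possible while keeping each
# command line under the limit; every edited file gets a .bak backup.
find . -type f -name '*.html' -exec sed -i.bak 's/old_string/new_string/g' {} +

cat page2.html
```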

The above two-line for loop should also work without problems on FreeBSD and the rest of the BSD derivatives, though I have not tested it yet, so any feedback from FreeBSD guys is most welcome.

Note that for the for loop commands to work on FreeBSD or NetBSD, they have to be run under a bash shell.
That's all folks. Thanks to the Lord for letting me write this nice article; I hope it gives some insight into how text replacement across multiple files works on Unix.
Cheers 😉