Posts Tagged ‘console’

Save ( Extract ) only images from PDF files on GNU / Linux in console and in GNOME nautilus

Wednesday, November 14th, 2012

extract only pictures / ( images ) from PDF / PDF save only images

Some time ago, I've blogged how it is possible to dump a PDF individual pages into JPG / PNG etc. pics.

Today interestingly, I've learned it is possible to not only dump single or whole PDF document pages into pictures but also to selectively dump only the pictures contained within  PDF file into JPEGs.

Dumping only PDF (contained) images into external JPEG files is doable on GNU / Linux with pdfimages.

1. Extracting pictures from PDF in text console / terminal

pdfimages is part of poppler-utils deb package, if for some reasons you don't have pdfimages on ur system install poppler-utils with;

apt-get install --yes poppler-utils

To extract images of a certain PDF from terminal / console command line it is as simple as:

pdfimages -j pdf-file-name.pdf prefix-of-output-file

pdfimages
will extract all pictures, but bear in mind with some PDF versions it might incorrectly dump some text pages thinking it is pictures too. Also with some PDFs which contain scanned very old paper documents (as pictures) trying to force pdfimages to dump it will just provide you with all pages of the PDF in JPGs. Option -j instructs dumping images from PDF in JPEG picture format, whether the second argument will save pictures in files like: prefix-of-output-file-000.jpg, prefix-of-output-file-001.jpg, prefix..-file-002.jpg etc.

2. Adding GNOME nautilus capability to extract images from PDF files

Enabling extracting images in nautilus is possible with one non-default nautilus plugin  nautilus-scripts-manager

nautilis-scripts-manager is very nice, but I'm sure many Linux users did not know it yet. It makes possible to add any custom shell script that does an opeartion to be visible in nautilus via one extra menu Scripts. As not normally needed on most Linux distributions, it is not installed by default so you have to install it:

noah:~# apt-get install --yes nautilus-scripts-manager

Below is a Screenshot from my nautilus Scripts menu (my locale is in Bulgarian), so Scripts word is in Cyrillic "Скриптове" 🙂

nautilus scripts menu screenshot - allowing users to add custom shell scripts to run in GNOME desktop extract pictures from PDF Linux

After nautilus-scripts-manager is installed. To use it in your user home directory you will have to create ~/.gnome/nautilus-scripts, i.e.

$ mkdir ~/.gnome2/nautilus-scripts

Any script placed inside can be then invoked via the newly appeared nautilus "Scripts" menu. Thus to use extract_images_from_pdfs.sh from GUI place it there.

Download the following extract_images_from_pdfs.sh shell script

$ cd ~/.gnome2/nautilus-scripts
$ wget -q https://www.pc-freak.net/files/extract_images_from_pdfs.sh
$ chmod +x extract_images_from_pdfs.sh

If you prefer to copy paste script content:

$ cat ~/.gnome2/nautilus-scripts/extract_images_from_pdf.sh
#!/bin/bash
# Extracts image files from PDF files
# For more information see www.boekhoff.info
## Added check for $1 existence and $1_images dir
existence check by hip0
# https://www.pc-freak.net/blog/
if [ $1 ]; then
if [ ! -d ./$1_images ]; then
mkdir -p ./"$1_images"
fi
pdfimages -j "$1" ./"$1_images"/PDFimage
gdialog --title "Report" --msgbox "Images were successfully extracted!"
exit 0
fi

 

Well that's all. Once you select a PDF and you click with last mouse button on it selecting Scripts -> extract_images_from_pdf.sh a new directory containing the filename prefix of the selected PDF with _images will appear. For exmpl. if pictures are extracted from PDF named filename.PDF in same directory where the file is present you will get new filename_images folder with all pictures dumped from the PDF.

I've learned about pdfimages existence from Sven Boekhoff's blog which btw has plenty of interesting other stuff

 

Well that's it hope this helps, someone. Comments are welcome 🙂

Convert .doc to .pdf on Linux and BSD using console / Convertion of PDF to DOC inside scripts

Tuesday, November 13th, 2012

how to Convert .doc to .PDF using console / terminal on Linux and FreeBSD

On Linux, there are plenty of ways nowadays to convert Microsoft Word or OpenOffice .DOC documents to Adobe's PDF (Postscript). However most of the ways require a graphical environment. As I'm interested in how convertion is done mainly from console to suit shell scripts and php which has to routinely convert a bunch of .DOC files to .PDF. I've checked today how PDF to DOC is possible on Debian, Ubuntu, Arch Linux  and FreeBSD..

There are few tools one can use from console, that doesn't requiere you to have running Xorg on the convertion host. The quality of the produced converted document, may vary and with some Microsoft Office doc files, there might be some garbage. But generally for simplistic and well written "macros" free documents the quality of PDF is satisfactory with few of the tools.

Here I will list the few tools, one can use for convertion:

  • abiword – you probably know abiword GUI program which is a good substitute for people who doesn't want the huge openoffice on the host. interestingly abiword supports converts with no need for GUI
     
  • wvPDF (you have to have install wv package and usually this converter works well only with very old .DOC (MS Office 97) – I was not impressed with those convert results
     
  • oowriter / swriter (whether LibreOffice installed) or writer (on LibreOffice), on some Ubuntus and derivatives the equivalent cmd is lowriter
     
  • unoconv – this tool produces really good DOC to  PDF converts, it is a python script using openoffice / libreoffice as backend convertion engine so produced PDFs will be identical like the ones produced with oowriter, the pros of the tool is its syntax is very user friendly and along with PDF to DOC it supports easy syntax converting to  bunch of other file formats. Actually unoconv supports same convertions which supported by OpenOffice.org, the advantage is however you can use it within console and even schedule convertion to be processed by a remote host.

 

1. Convertion of DOC to PDF with abiword

abiword --to=pdf doc_file_to_convert.doc

2. Convert DOC to PDF with wvPDF

apt-get install --yes wv texlive-base texlive-latex-base ghostscript

wvPDF doc-file-to-convert-to-pdf.doc converted-to-pdf.pdf

wvPDF doc-file-to-convert-to-pdf.doc convert-to-pdf.pdf

Current directory: /home/hipo/Desktop
"doc-file-to-convert-to-pdf.eps" exists - skipping...
Some problem running latex.
Check for Errors in steinway.log
Continuing... 

 

The produced .pdf was not useful most of the text inside was completely missing as well as some weird probably PostScript convertion characters were in the .PDF. Seeing its output I would as of time of writing wvPDF Debian's verion 1.2.4 is crap.


3. Convert DOC to PDF with oowriter / swriter / lowrite

a) convert with oowriter and swriter

I saw posts online claiming DOC to PDF convertion is possible directly with oowriter or swriters with commands:

oowriter -convert-to pdf:writer_pdf_Export input-doc-file-to-convert.doc

or

swriter -convert-to pdf:writer_pdf_Export steinway.doc - as named on some Linux-es
 

As long as I tested it on my Debian Squeeze, neither of the two works
.I saw some suggestions that PDF can be generated by installing and using cups-pdf debian package:

apt-get install cups-pdf oowriter -pt pdf your_word_file.doc b) convert DOC to PDF with lowriter I've seen in Ubuntu documentation and in Ubuntu forums, users saying they had some good results using lowriter, which is a sort of front-end program to ImageMagick's convert. I never tested that but I doubt of any satisfactory results, as I tried converting to PDF earlier using convert and often converts failed. Anyways you try it with:

lowriter --convert-to pdf *.doc

 

4. Converting PDF to DOC  with unoconv

As of time of writing it seems unoconv is best Linux console tool for converting .doc to .pdf

It produces good readable text, as well as pictures and elements looks exactly as in OpenOffice.

To install it I run:

# apt-get install --yes unoconv
....

To use it:

$ unoconv -fpdf any-file-to-convert.doc

If you don't get errors or it doesn't crash a .doc file with same name any-file-to-convert.doc is created.

What unoconv, does is precisely the same as if using OpenOffice GUI's  to convert to PDF:

 

  • Open -> Open Office (3.2 in my case)
  • Open Document to export
  • File->Export as PDF
  • Click: Export
  • Choose file namefor output PDF


An interesting feature of unoconv is its possibility to run and convert as a port listening server. I never used this but noticed it mentioned in manual EXAMPLE section:

 

EXAMPLES
       You can use unoconv in standalone mode, this means that in absence of an OpenOffice listener, it will starts its own:

       unoconv -f pdf some-document.odt
       One can use unoconv as a listener (by default localhost:2002) to let other unoconv instances connect to it:

       unoconv --listener &
       unoconv -f pdf some-document.odt
       unoconv -f doc other-document.odt
       unoconv -f jpg some-image.png
       unoconv -f xsl some-spreadsheet.csv
       kill -15 %-
       This also works on a remote host:

       unoconv --listener --server 1.2.3.4 --port 4567
       and then connect another system to convert documents:

       unoconv --server 1.2.3.4 --port 4567

unoconv does not recognize wildcards like ' * ' , so in order to convert multiple DOC to PDF files one has to use the usual shell loop:

for i in *.doc; do unoconv -fpdf $i; done

From all my tests, I think unoconv is preferred tool for Linux and BSD users (good time to mention unoconv is available on FreeBSD too. BSD users can install it via port  /usr/ports/textproc/unoconv)

Make your Debian Linux and FreeBSD terminal / console display daily verse from KJV Bible

Saturday, September 26th, 2009

Since I am a Christian and I want to daily be in touch with the Holy Scriptures and I am most of the time spending on my Linux system. I have came to the conclusion that it’s beneficial to
have a daily bible displaying everytime after login in console or terminal in X.

Therefore I thought it might be helpful to somebody out there who would wish to have short sentece of bible on each Linux / FreeBSD machine login.

Here is how to set bible quote to appear everytime after login in Debian Linux:
First install the verse program through:


# apt-get install verse

Next if you want to make the verse display global for the system put :


if [ -f /usr/bin/verse ]; thenecho/usr/bin/verse fi

in /etc/bash.bashrc
On the other hand if you’d like to make it local for your account or a setnumber of accounts on your system append


if [ -f /usr/bin/verse ]; thenecho/usr/bin/verse fi

to your user ~/.bashrc as well as to the home directories of the users you’d like to display a bible verse (if for several users).

If you decide to do that be aware that your login via sftp won’t work anymore – forget about sftp transfers ….

Every time you attempt to login you’ll experience the error message:

“Received message too long”. However that ain’t a real problem for me since I use my system as a desktop and don’t sftp or ssh remotely to my desktop.
In order to prevent this issue where sftp interactivity gets broken it is better to add verse app to execute via /etc/profile i. e. in /etc/profile on top of file add:


if [ -f /usr/bin/verse ]; then echo /usr/bin/verse fi

On FreeBSD the same is achieved a bit differently. Here is how to install it in FreeBSD:

First install fortune program and then install the fortune bible module; In FreeBSD bible quotes are only available via the good old fortune program:

cd /usr/ports/misc/fortune-mod-bible;
make install clean

Next open:the /etc/profile file and insert in the end of it:

echo /usr/games/fortune /usr/local/share/games/fortune/bible

On your next login your FreeBSD should be showing a bible
sentence (quotation) after each and every login.
What is different with Debian’s verse program is that verse keeps displaying one exact quote of the bible during every login for the whole day,
where in FreeBSD the fortune-mod-bible does show a different (random) bible sentence on each and every user login.

Linux Check Laptop battery status from Console / Terminal

Tuesday, September 15th, 2009

I needed to have a quick way to check my battery status via terminal and I googled around looking
for a solution. I found the following website explaining in a pretty good way “How to check your battery status in Console”.
.Just like the blog explains the proper way to do that in Linux is via the acpi command. In case if you don’t have, yet the commandplease install it.
1. The fastest way to check the laptop battery status is via: $ acpiThe output would be something like this:Battery 1: discharging, 91%, 02:00:09 remaining2. Check laptop battery temperature:$ acpi -t3. Check laptop battery AC Power Status$ acpi -a4. Check everything related to acpi$ acpi -V
Another possible way to check for your notebook battery status from console in Linux is via:
acpitool
1. Command to show general information for battery status:$ acpitool2. For detailed battery information use:$ acpitool -B add “-v” for verbose output3. Show information about AC power.$ acpitool -a
Some of the other possible ways to check your battery charge status via console are either via: yabs ( Yet Another Battery Status ) script or via:
cat /proc/acpi/battery/BAT0/state orcat /proc/acpi/battery/BAT0/info
END—–

Convert doc files to plain text (txt) in terminal / console (tty) on GNU / Linux

Sunday, September 20th, 2009

I was looking for a way to convert Microsoft .doc files to plain text (txt) in Linux directly through terminal.
After some lookup in Google Groups I found ANTIWORD! .
Luckily Debian comes even with a package containing the nice nifty program.
Here is the description of antiword – Converts MS Word files to text, PS and PDF
Fun, desciprtion Eh? 🙂 Ain’t it?
There are some other ways to Convert doc files to plain text, for instance you could use the command catdoc , for example to convert simple .doc to .txt file usecatdoc -a whitepaper.doc.
Another way to convert .doc files to .txt mostly used by developers is via the wvware (nothing to do with vmware!:)) utility.
wvware could directly convert it to html. For example:
wvWare file.doc >file.htmlor
wvText file.doc file.txt
.A lot of things I’ll skip here are well explained in the article Viewing Word files at the command line .
END—–