Posts Tagged ‘content’

How to disable tidy HTML corrector and validator to output error and warning messages

Sunday, March 18th, 2012

Reading Time: 2minutes

I've noticed in /var/log/apache2/error.log on one of the Debian servers I manage a lot of warnings and errors produced by tidy HTML syntax checker and reformatter program.

There were actually quite plenty frequently appearing messages in the the log like:

...
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!
...

I did a quick investigation on where from this messages are logged in error.log, and discovered few .php scripts in one of the websites containing the tidy string.
I used Linux find + grep cmds find in all php files the "tidy "string, like so:

server:~# find . -iname '*.php'-exec grep -rli 'tidy' '{}' ;
find . -iname '*.php' -exec grep -rli 'tidy' '{}' ; ./new_design/modules/index.mod.php
./modules/index.mod.php
./modules/index_1.mod.php
./modules/index1.mod.php

Opening the files, with vim to check about how tidy is invoked, revealed tidy calls like:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

As you see the PHP programmers who wrote this website, made a bigtidy mess. Instead of using php5's tidy module, they hard coded tidy external command to be invoked via php's exec(); external tidy command invocation.
This is extremely bad practice, since it spawns the command via a pseudo limited apache shell.
I've notified about the issue, but I don't know when, the external tidy calls will be rewritten.

Until the external tidy invocations are rewritten to use the php tidy module, I decided to at least remove the tidy warnings and errors output.

To remove the warning and error messages I've changed:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

exec('/usr/bin/tidy --show-warnings no --show-errors no -q -e -ashtml -utf8 '.$tmp_name,$rett);

The extra switches meaning is like so:

q – instructs tidy to produce quiet output
-e – show only errors and warnings
–show warnings no && –show errors no, completely disable warnings and error output

Onwards tidy no longer logs junk messages in error.log Not logging all this useless warnings and errors has positive effect on overall server performance especially, when the scripts, running /usr/bin/tidy are called as frequently as 1000 times per sec. or more

Block Web server over loading Bad Crawler Bots and Search Engine Spiders with .htaccess rules

Monday, September 18th, 2017

Reading Time: 6minutes

howto-block-webserver-overloading-bad-crawler-bots-spiders-with-htaccess-modrewrite-rules-file

In last post, I've talked about the problem of Search Index Crawler Robots aggressively crawling websitesand how to stop them (the article is here) explaning how to raise delays between Bot URL requests to website and how to completely probhit some bots from crawling with robots.txt.

As explained in article the consequence of too many badly written or agressive behaviour Spider is the "server stoning" and therefore degraded Web Server performance as a cause or even a short time Denial of Service Attack, depending on how well was the initial Server Scaling done.

The bots we want to filter are not to be confused with the legitimate bots, that drives real traffic to your website, just for information

 The 10 Most Popular WebCrawlers Bots as of time of writting are:
 

1. GoogleBot (The Google Crawler bots, funnily bots become less active on Saturday and Sundays :))

2. BingBot (Bing.com Crawler bots)

3. SlurpBot (also famous as Yahoo! Slurp)

4. DuckDuckBot (The dutch search engine duckduckgo.com crawler bots)

5. Baiduspider (The Chineese most famous search engine used as a substitute of Google in China)

6. YandexBot (Russian Yandex Search engine crawler bots used in Russia as a substitute for Google )

7. Sogou Spider (leading Chineese Search Engine launched in 2004)

8. Exabot (A French Search Engine, launched in 2000, crawler for ExaLead Search Engine)

9. FaceBot (Facebook External hit, this crawler is crawling a certain webpage only once the user shares or paste link with video, music, blog whatever  in chat to another user)

10. Alexa Crawler (la_archiver is a web crawler for Amazon's Alexa Internet Rankings, Alexa is a great site to evaluate the approximate page popularity on the internet, Alexa SiteInfo page has historically been the Swift Army knife for anyone wanting to quickly evaluate a webpage approx. ranking while compared to other pages)

Above legitimate bots are known to follow most if not all of W3C – World Wide Web Consorium (W3.Org) standards and therefore, they respect the content commands for allowance or restrictions on a single site as given from robots.txt but unfortunately many of the so called Bad-Bots or Mirroring scripts that are burning your Web Server CPU and Memory mentioned in previous article are either not following /robots.txt prescriptions completely or partially.

Hence with the robots.txt unrespective bots, the case the only way to get rid of most of the webspiders that are just loading your bandwidth and server hardware is to filter / block them is by using Apache's mod_rewrite through

 

.htaccess


file

Create if not existing in the DocumentRoot of your website .htaccess file with whatever text editor, or create it your windows / mac os desktop and transfer via FTP / SecureFTP to server.

I prefer to do it directly on server with vim (text editor)

 

 

vim /var/www/sites/your-domain.com/.htaccess

 

RewriteEngine On

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

SetEnvIfNoCase User-Agent "^Black Hole” bad_bot
SetEnvIfNoCase User-Agent "^Titan bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^NetMechanic" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase User-Agent "^Crescent" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bad_bot
SetEnvIfNoCase User-Agent "^CheeseBot" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^TeleportPro" bad_bot
SetEnvIfNoCase User-Agent "^MIIxpc" bad_bot
SetEnvIfNoCase User-Agent "^Telesoft" bad_bot
SetEnvIfNoCase User-Agent "^Website Quester" bad_bot
SetEnvIfNoCase User-Agent "^WebZip" bad_bot
SetEnvIfNoCase User-Agent "^moget/2.1" bad_bot
SetEnvIfNoCase User-Agent "^WebZip/4.0" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^Mister PiX" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^TheNomad" bad_bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bad_bot
SetEnvIfNoCase User-Agent "^RMA" bad_bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^httplib" bad_bot
SetEnvIfNoCase User-Agent "^turingos" bad_bot
SetEnvIfNoCase User-Agent "^spanner" bad_bot
SetEnvIfNoCase User-Agent "^InfoNaviRobot" bad_bot
SetEnvIfNoCase User-Agent "^Harvest/1.5" bad_bot
SetEnvIfNoCase User-Agent "Bullseye/1.0" bad_bot
SetEnvIfNoCase User-Agent "^Mozilla/4.0 (compatible; BullsEye; Windows 95)" bad_bot
SetEnvIfNoCase User-Agent "^Crescent Internet ToolPak HTTP OLE Control v.1.0" bad_bot
SetEnvIfNoCase User-Agent "^CherryPickerSE/1.0" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker /1.0" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit/3.50" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control – 5.01.4511" bad_bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bad_bot
SetEnvIfNoCase User-Agent "^Foobot" bad_bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial/1.34" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^Wget/1.6" bad_bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control – 6.00.8169" bad_bot
SetEnvIfNoCase User-Agent "^URLy Warning" bad_bot
SetEnvIfNoCase User-Agent "^Wget/1.5.3" bad_bot
SetEnvIfNoCase User-Agent "^LinkWalker" bad_bot
SetEnvIfNoCase User-Agent "^cosmos" bad_bot
SetEnvIfNoCase User-Agent "^moget" bad_bot
SetEnvIfNoCase User-Agent "^hloader" bad_bot
SetEnvIfNoCase User-Agent "^humanlinks" bad_bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bad_bot
SetEnvIfNoCase User-Agent "^Offline Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Mata Hari" bad_bot
SetEnvIfNoCase User-Agent "^LexiBot" bad_bot
SetEnvIfNoCase User-Agent "^Web Image Collector" bad_bot
SetEnvIfNoCase User-Agent "^The Intraformant" bad_bot
SetEnvIfNoCase User-Agent "^True_Robot/1.0" bad_bot
SetEnvIfNoCase User-Agent "^True_Robot" bad_bot
SetEnvIfNoCase User-Agent "^BlowFish/1.0" bad_bot
SetEnvIfNoCase User-Agent "^JennyBot" bad_bot
SetEnvIfNoCase User-Agent "^MIIxpc/4.2" bad_bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bad_bot
SetEnvIfNoCase User-Agent "^ProPowerBot/2.14" bad_bot
SetEnvIfNoCase User-Agent "^BackDoorBot/1.0" bad_bot
SetEnvIfNoCase User-Agent "^toCrawl/UrlDispatcher" bad_bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bad_bot
SetEnvIfNoCase User-Agent "^TightTwatBot" bad_bot
SetEnvIfNoCase User-Agent "^suzuran" bad_bot
SetEnvIfNoCase User-Agent "^VCI WebViewer VCI WebViewer Win32" bad_bot
SetEnvIfNoCase User-Agent "^VCI" bad_bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4" bad_bot
SetEnvIfNoCase User-Agent "^QueryN Metasearch" bad_bot
SetEnvIfNoCase User-Agent "^Openfind data gathere" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase User-Agent "^Xenu’s Link Sleuth 1.1c" bad_bot
SetEnvIfNoCase User-Agent "^Xenu’s" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot
SetEnvIfNoCase User-Agent "^RepoMonkey Bait & Tackle/v1.01" bad_bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bad_bot
SetEnvIfNoCase User-Agent "^Zeus 32297 Webster Pro V2.9 Win32" bad_bot
SetEnvIfNoCase User-Agent "^Webster Pro" bad_bot
SetEnvIfNoCase User-Agent "^EroCrawler" bad_bot
SetEnvIfNoCase User-Agent "^LinkScan/8.1a Unix" bad_bot
SetEnvIfNoCase User-Agent "^Keyword Density/0.9" bad_bot
SetEnvIfNoCase User-Agent "^Kenjin Spider" bad_bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot

 

<Limit GET POST>
order allow,deny
allow from all
Deny from env=bad_bot
</Limit>

 


Above rules are Bad bots prohibition rules have RewriteEngine On directive included however for many websites this directive is enabled directly into VirtualHost section for domain/s, if that is your case you might also remove RewriteEngine on from .htaccess and still the prohibition rules of bad bots should continue to work
Above rules are also perfectly suitable wordpress based websites / blogs in case you need to filter out obstructive spiders even though the rules would work on any website domain with mod_rewrite enabled.

Once you have implemented above rules, you will not need to restart Apache, as .htaccess will be read dynamically by each client request to Webserver

2. Testing .htaccess Bad Bots Filtering Works as Expected


In order to test the new Bad Bot filtering configuration is working properly, you have a manual and more complicated way with lynx (text browser), assuming you have shell access to a Linux / BSD / *Nix computer, or you have your own *NIX server / desktop computer running
 

Here is how:
 

 

lynx -useragent="Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)" -head -dump http://www.your-website-filtering-bad-bots.com/

 

 

Note that lynx will provide a warning such as:

Warning: User-Agent string does not contain "Lynx" or "L_y_n_x"!

Just ignore it and press enter to continue.

Two other use cases with lynx, that I historically used heavily is to pretent with Lynx, you're GoogleBot in order to see how does Google actually see your website?
 

  • Pretend with Lynx You're GoogleBot

 

lynx -useragent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -head -dump http://www.your-domain.com/

 

 

  • How to Pretend with Lynx Browser You are GoogleBot-Mobile

 

lynx -useragent="Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" -head -dump http://www.your-domain.com/

 


Or for the lazy ones that doesn't have Linux / *Nix at disposal you can use WannaBrowser website

Wannabrowseris a web based browser emulator which gives you the ability to change the User-Agent on each website req1uest, so just set your UserAgent to any bot browser that we just filtered for example set User-Agent to CheeseBot

The .htaccess rule earier added once detecting your browser client is coming in with the prohibit browser agent will immediately filter out and you'll be unable to access the website with a message like:
 

HTTP/1.1 403 Forbidden

 

Just as I've talked a lot about Index Bots, I think it is worthy to also mention three great websites that can give you a lot of Up to Date information on exact Spiders returned user-agent, common known Bot traits as well as a a current updated list with the Bad Bots etc.

Bot and Browser Resources information user-agents, bad-bots and odd Crawlers and Bots specifics

1. botreports.com
2. user-agents.org
3. useragentapi.com

 

An updated list with robots user-agents (crawler-user-agents) is also available in github here regularly updated by Caia Almeido

There are also a third party plugin (modules) available for Website Platforms like WordPress / Joomla / Typo3 etc.

Besides the listed on these websites as well as the known Bad and Good Bots, there are perhaps a hundred of others that might end up crawling your webdsite that might or might not need  to be filtered, therefore before proceeding with any filtering steps, it is generally a good idea to monitor your  HTTPDaccess.log / error.log, as if you happen to somehow mistakenly filter the wrong bot this might be a reason for WebsiteIndexing Problems.

Hope this article give you some valueable information. Enjoy ! 🙂

 

rc.local missing in Debian 8 Jessie and Debian 9 Stretch and newer Ubuntu 16, Fedora, CentOS Linux – Why is /etc/rc.local not working and how to make it work again

Monday, September 11th, 2017

Reading Time: 3minutes

rc.local-not-working-solve-fix-linux-startup-with-rc.local-explained-how-to-make-rc.local-working-again-on-newer-linux-distributions

If you have installed a newer version of Debian GNU / Linux such as Debian Jessie or Debian  9 Stretch or Ubuntu 16 Xenial Xerus either on a server or on a personal Desktop laptop and you want tto execute a number of extra commands next to finalization of system boot just like we GNU / Linux users used to do already for the rest 25+ years you will be surprised that /etc/rc.local is no longer available (file is completely missing!!!).

This kind of behaviour (to avoid use of /etc/rc.local and make the file not present by default right after Linux OS install) was evident across many RedHack (Redhat) distributions such as Fedora and CentOS Linux for the last number of releases and the tendency was to also happen in Debian based distros too as it often does, however there was a possibility on this RPM based distros as well as rest of Linux distros to have the /etc/rc.local manually created to work around the missing file.

But NOoooo, the smart new generation GNU / Linux architects with large brains decided to completely wipe out the execution on Linux boot of /etc/rc.local from finalization stage, SMART isn't it??

For instance If you used to eat certain food for the last 25+ years and they suddenly prohibit you to eat it because they say this is not necessery anymore how would you feel?? Crazy isn't it??

Yes I understand the idea to wipe out /etc/rc.local did have a reason as the developers are striving to constanly improve the boot speed process (and the introduction of systemd(system and service manager) in Debian 8 Jessie over the past years did changed significantly on how Linux boots (earlier used SysV boot and LSB – linux standard based init scripts), but come on guys /etc/rc.local
doesn't stone the boot process with minutes, including it will add just 2, 3 seconds extra to boot runtime, so why on earth did you decided to remove it??

What I really loved about Linux through the years was the high level of consistency and inter-operatibility, most things worked just the same way across distributions and there was some logic upgrade, but lately this kind of behaviour is changing so in many of the new things in both GUI and text mode (console) way to interact with a GNU / Linux PC all becomes messy sadly …

So the smart guys who develop Gnu / Linux distros said its time to depreciate /etc/rc.local to prevent the user to be able to execute his set of finalization commands at the end of each booted multiuser runlevel.

The good news is you can bring back (resurrect) /etc/rc.local really easy:

To so, just execute the following either in Physical /dev/tty Console or in Gnome-Terminal (for GNOME users) or for KDE GUI environment users in KDE's terminal emulator konsole:

 


cat <<EOF >/etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

exit 0
EOF
chmod +x /etc/rc.local
systemctl start rc-local
systemctl status rc-local


I think above is self-explanatory /etc/rc.local file is being created and then to enable it we run systemctl start rc-local and then to check the just run rc-local service status systemctl status

You will get an output similar to below:
 

 

root@jericho:/home/hipo# systemctl start rc-local
root@jericho:/home/hipo# systemctl status rc-local
● rc-local.service – /etc/rc.local Compatibility
   Loaded: loaded (/lib/systemd/system/rc-local.service; static; vendor preset:
  Drop-In: /lib/systemd/system/rc-local.service.d
           └─debian.conf
   Active: active (exited) since Mon 2017-09-11 13:15:35 EEST; 6s ago
  Process: 5008 ExecStart=/etc/rc.local start (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/rc-local.service
sep 11 13:15:35 jericho systemd[1]: Starting /etc/rc.local Compatibility…
setp 11 13:15:35 jericho systemd[1]: Started /etc/rc.local Compatibility.

To test /etc/rc.local is working as expected you can add to print any string on boot, right before exit 0 command in /etc/rc.local

you can add for example:
 

echo "YES, /etc/rc.local IS NOW AGAIN WORKING JUST LIKE IN EARLIER LINUX DISTRIBUTIONS!!! HOORAY !!!!";


On CentOS 7 and Fedora 18 codename (Spherical Cow) or other RPM based Linux distro if /etc/rc.local is missing you can follow very similar procedures to have it enabled, make sure

/etc/rc.d/rc.local

is existing

and /etc/rc.local is properly symlined to /etc/rc.d/rc.local

Also don't forget to check whether /etc/rc.d/rc.local is set to be executable file with ls -al /etc/rc.d/rc.local

If it is not executable, make it be by running cmd:
 

chmod a+x /etc/rc.d/rc.local


If file /etc/rc.d/rc.local happens to be missing just create it with following content:

 

#!/bin/sh

# Your boot time rc.commands goes somewhere below and above before exit 0

exit 0


That's all folks rc.local not working is solved,
enjoy /etc/rc.local working again 🙂

 

How to install nginx webserver from source on Debian Linux / Install Latest Nginx on Debian

Wednesday, March 23rd, 2011

Reading Time: 6minutes

Nginx install server logo
If you're running a large website consisting of a mixture of php scripts, images and html. You probably have noticed that using just one Apache server to serve all the content is not that efficient

Each Apache child (I assume you're using Apache mpm prefork consumes approximately (20MB), this means that each client connection would consume 20 mb of your server memory.
This as you can imagine is truly a suicide in terms of memory. Each request for a picture, css or simple html file would ask Apache to fork another process and will consume (20mb of extra memory form your server mem capacity)!.

Taking in consideration all this notes and the need for some efficiency here, the administrator should normally think about dividing the processing of the so called static content from the dynamic content served on the server.

Apache is really a nice webserver software but with all the loaded modules to serve dynamic content, for instance php, cgi, python etc., it's becoming not the best solution for handling a (css, javascript, html, flv, avi, mov etc. files).

Even a plain Apache server installation without (libphp, mod_rewrite mod deflate etc.) is still not dealing efficiently enough with the aforementioned static files content

Here comes the question if Apache is not that quick and efficient in serving static files, what then? The answer is caching webserver! By caching the regular static content files, your website visitors will benefit by experiencing shorter webserver responce files in downloading static contents and therefore will generally hasten your website and improve the end user's experience.

There are plenty of caching servers out there, some are a proprietary software and some are free software.

However the three most popular servers out there for static file content serving are:

  • Squid,
  • Varnish
  • Nginx

In this article as you should have already found out by the article title I'll discuss Nginx

You might ask why exactly Nginx and not some of the other twos, well simply cause Squid is too complicated to configure and on the other hand does provide lower performance than Nginx. On the other hand Varnish is also a good solution for static file webserver, but I believe it is not tested enough. However I should mention that my experience with testing varnish on my own home router is quite good by so far.

If you're further interested into varhisn cache I would suggest you checkout www.varhisn-cache.org .

Now as I have said a few words about squid and varhisn let's proceed to the essence of the article and say few words about nginx

Here is a quote describing nginx in a short and good manner directly extracted from nginx.com

nginx [engine x] is a HTTP and reverse proxy server, as well as a mail proxy server written by Igor Sysoev. It has been running for more than five years on many heavily loaded Russian sites including Rambler (RamblerMedia.com). According to Netcraft nginx served or proxied 4.70% busiest sites in April 2010. Here are some of success stories: FastMail.FM, WordPress.com.

By default nginx is available ready to be installed in Debian via apt-get, however sadly enough the version available for install is pretty much outdated as of time of writting the nginx debian version in lenny's deb package repositories is 0.6.32-3+lenny3

This version was release about 2 years ago and is currently completely outdated, therefore I found it is not a good idea to use this old and probably slower release of nginx and I jumped further to install my nginx from source:
Nginx source installation actually is very simple on Linux platforms.

1. As a first step in order to be able to succeed with the install from source make sure your system you have installed the packages:

debian:~# apt-get install libpcre3 libpcre3-dev libpcrecpp0 libssl-dev zlib1g-dev build-essential

2. Secondly download latest nginx source code tarball

Check out on http://nginx.com/download the latest stable release of nginx and further issue the commands below:

debian:~# cd /usr/local/src
debian:/usr/local/src# wget http://nginx.org/download/nginx-0.9.6.tar.gz

3.Unarchive nginx source code

debian:/usr/local/src#tar -zxvvf nginx-0.9.6.tar.gz
...

The nginx server requirements for me wasn't any special so I proceeded and used the nginx ./configure script which is found in nginx-0.9.6

4. Compline nginx server

debian:/usr/local/src# cd nginx-0.9.6
debian:/usr/local/src/nginx-0.9.6# ./configure && make && make install
+ Linux 2.6.26-2-amd64 x86_64
checking for C compiler ... found
+ using GNU C compiler
+ gcc version: 4.3.2 (Debian 4.3.2-1.1)
checking for gcc -pipe switch ... found
...
...

The last lines printed by the nginx configure script are actually the major interesting ones for administration purposes the default complation options in my case were:

Configuration summary
+ using system PCRE library
+ OpenSSL library is not used
+ md5: using system crypto library
+ sha1 library is not used
+ using system zlib library

nginx path prefix: "/usr/local/nginx"
nginx binary file: "/usr/local/nginx/sbin/nginx"
nginx configuration prefix: "/usr/local/nginx/conf"
nginx configuration file: "/usr/local/nginx/conf/nginx.conf"
nginx pid file: "/usr/local/nginx/logs/nginx.pid"
nginx error log file: "/usr/local/nginx/logs/error.log"
nginx http access log file: "/usr/local/nginx/logs/access.log"
nginx http client request body temporary files: "client_body_temp"
nginx http proxy temporary files: "proxy_temp"
nginx http fastcgi temporary files: "fastcgi_temp"
nginx http uwsgi temporary files: "uwsgi_temp"
nginx http scgi temporary files: "scgi_temp"

If you want to setup nginx server to support ssl (https) and for instance install nginx to a different server path you can use some ./configure configuration options, for instance:

./configure –sbin-path=/usr/local/sbin –with-http_ssl_module

Now before you can start the nginx server, you should also set up the nginx init script;

5. Download and set a ready to use script with cmd:

debian:~# cd /etc/init.d
debian:/etc/init.d# wget https://pc-freak.net/files/nginx-init-script
debian:/etc/init.d# mv nginx-init-script nginx
debian:/etc/init.d# chmod +x nginx

6. Configure Nginx

Nginx is a really easy and simple server, just like the Russians, Simple but good!
By the way it's interesting to mention nginx has been coded by a Russian, so it's robust and hard as a rock as all the other Russian creations 🙂
Nginx configuration files in a default install as the one in my case are to be found in /usr/local/nginx/conf

In the nginx/conf directory you're about to find the following list of files which concern nginx server configurations:

deiban:/usr/local/nginx:~# ls -1
fastcgi.conf
fastcgi.conf.default
fastcgi_params
fastcgi_params.default
koi-utf
koi-win
mime.types
mime.types.default
nginx.conf
nginx.conf.default
scgi_params
scgi_params.default
uwsgi_params
uwsgi_params.default
win-utf

The .default files are just a copy of the ones without the .default extension and contain the default respective file directives.

In my case I'm not using fastcgi to serve perl or php scripts via nginx so I don't need to configure the fastcgi.conf and fastcgi_params files, the scgi_params and uwsgi_params conf files are actually files which contain nginx configuration directives concerning the use of nginx to process SSI (Server Side Include) scripts and therefore I skip configuring the SSI conf files.
koi-utf and koi-win are two files which usually you don't need to configure and aims the nginx server to support the UTF-8 character encoding and the mime.types conf is a file which has a number of mime types the nginx server will know how to handle.

Therefore after all being said the only file which needs to configured is nginx.conf

7. Edit /usr/local/nginx/conf/nginx.conf

debian:/usr/local/nginx:# vim /usr/local/nginx/conf/nginx.conf

Therein you will find the following default configuration:

#gzip on;

server {
listen 80;
server_name localhost;

#charset koi8-r;

#access_log logs/host.access.log main;

location / {
root html;
index index.html index.htm;
}
#error_page 404 /404.html;

# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}

In the default configuration above you need to modify only the above block of code as follows:

server {
listen 80;
server_name yoursitedomain.com;

#charset koi8-r;

#access_log logs/access.log main;

location / {
root /var/www/yoursitedomain.com/html;
index index.html index.htm;
}

Change the yoursitedomain.com and /var/www/yoursitedomain.com/html with your directory and website destinations.

8. Start nginx server with nginx init script

debian:/usr/local/nginx:# /etc/init.d/nginx start
Starting nginx:

This should bring up the nginx server, if something is miss configured you will notice also some error messages, as you can see in my case in above init script output, thanksfully there are no error messages.
Note that you can also start nginx directly via invoking /usr/local/nginx/sbin/nginx binary

To check if the nginx server has properly started from the command line type:

debian:/usr/local/nginx:~# ps ax|grep -i nginx|grep -v grep
9424 ? Ss 0:00 nginx: master process /usr/local/nginx/sbin/nginx
9425 ? S 0:00 nginx: worker process

Another way to check if the web browser is ready to serve your website file conten,t you can directly access your website by pointing your browser to with http://yoursitedomain.com/, you should get your either your custom index.html file or the default nginx greeting Welcome to nginx

9. Add nginx server to start up during system boot up

debian:/usr/local/nginx:# /usr/sbin/update-rc.d -f nginx defaults

That's all now you have up and running nginx and your static file serving will require you much less system resources, than with Apache.
Hope this article was helpful to somebody, feedback on it is very welcome!

How to remove the meta generator Content (Joomla! – Copyright) in Joomla 1.5

Thursday, December 30th, 2010

Reading Time: < 1minute

Joomla-remove-meta-generator-content-to-hide-joomla-site-install
Do you wonder How to change <meta name="Generator" content="Joomla! – Copyright (C) 2005 – 2007 Open Source Matters. All rights reserved." /> in Joomla 1.5

If yes, Here is how I've just found to remove the:

 

in my Joomla installation.

I need to remove that as a part of making my website not to leak out that it runs on top of Joomla.

So here is how:

1. Go to your Joomla website main root directory
2. Edit /libraries/joomla/document/html/renderer/head.php
Look for line: 83 in the /libraries/joomla/document/html/renderer/head.php
There you will notice the code:

$strHtml .= $tab.'<meta name="generator" content="'.$document->getGenerator().'" />'.$lnEnd;

In order to remove the <meta name="generator" content="Joomla …." /> change the above code to something like:

$strHtml .= $tab.'<meta name="generator" content="My Custom Web site Generator name" />'.$lnEnd;

That's all now next time you refresh your website the content="Joomla! – Copyright (C) 2005 – 2009 Open Source Matters. All rights reserved." will be no more.
Cheers! 🙂

How to enable output compression (gzipfile content compression) in nginx webserver

Friday, April 8th, 2011

Reading Time: 2minutes
I have recently installed and configured a Debian Linux server with nginx
. Since then I’ve been testing around different ways to optimize the nginx performance.

In my nginx quest, one of the most crucial settings which dramatically improved the end client performance was enabling the so called output compression which in Apache based servers is also known as content gzip compression .
In Apache webservers the content gzip compression is provided by a server module called mod_deflate .

The output compression nginx settings saves a lot of bandwidth and though it adds up a bit more load to the server, the plain text files like html, xml, js and css’s download time reduces drasticly as they’re streamed to the browser in gzip compressed format.
This little improvement in download speed also does impact the overall end user browser experience and therefore improves the browsing speed experience with websites.

If you have already had experience nginx you already know it is a bit fastidious and you have to be very careful with it’s configuration, however thanksfully enabling the gzip compression was actually rather easier than I thought.

Here is what I added in my nginx config to enable output compression:

## Compression
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 9;
gzip_http_version 1.1;
gzip_min_length 0;
gzip_vary on;

Important note here is that need to add this code in the nginx configuration block starting with:

http {
....
## Compression
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 9;
gzip_http_version 1.1;
gzip_min_length 0;
gzip_vary on;

In order to load the gzip output compression as a next step you need to restart the nginx server, either by it’s init script if you use one or by killing the old nginx server instances and starting up the nginx server binary again:
I personally use an init script, so restarting nginx for me is done via the cmd:

debian:~# /etc/init.d/nginx restart
Restarting nginx: nginx.

Now to test if the output gzip compression is enabled for nginx, you can simply use telnet

hipo@linux:~$ telnet your-nginx-webserver-domain.com 80
Escape character is '^]'.

After the Escape character is set ‘^]’ appears on your screen type in the blank space:

HEAD / HTTP/1.0

and press enter twice.
The output which should follow should look like:


HTTP/1.1 200 OK
Server: nginx
Date: Fri, 08 Apr 2011 12:04:43 GMT
Content-Type: text/html
Content-Length: 13
Last-Modified: Tue, 22 Mar 2011 15:04:26 GMT
Connection: close
Vary: Accept-Encoding
Expires: Fri, 15 Apr 2011 12:04:43 GMT
Cache-Control: max-age=604800
Accept-Ranges: bytes

The whole transaction with telnet command issued and the nginx webserver output should look like so:

hipo@linux:~$ telnet your-nginx-webserver-domain.com 80
Trying xxx.xxx.xxx.xxx...
Connected to your-nginx-webserver-domain.com
.Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Server: nginx
Date: Fri, 08 Apr 2011 12:04:43 GMT
Content-Type: text/html
Content-Length: 13
Last-Modified: Tue, 22 Mar 2011 15:04:26 GMT
Connection: close
Vary: Accept-Encoding
Expires: Fri, 15 Apr 2011 12:04:43 GMT
Cache-Control: max-age=604800
Accept-Ranges: bytes

The important message in the returned output which confirms your nginx output compression is properly configured is:

Vary: Accept-Encoding

If this message is returned by your nginx server, this means your nginx now will distribute it’s content to it’s clients in compressed format and apart from the browsing boost a lot of server and client bandwitdth will be saved.

check your food content additives on your Mobile Phone with e-additives (Etata)

Friday, June 4th, 2010

Reading Time: 2minutes
E-Additives J2ME application check your food contamination

In the present age it’s really modern for companies to cut costs and increase a foodproduct durabity and endurance using addition of specially crafted chemical compontents.
Most of which are starting with E and followed by a number for example E328 .
Though this is generally profitable for companies and is prolonging the food durability it’smaking the food less nourishing and more harmful or even sometimes toxic for us humans.
A good friend of mine Necroleak or as earlier known Pro-XeX has created a nice J2MEapplication for mobiles that has a database of most groups of E food chemical additives and is able to tell youif a certain E type like E329 for instance is belonging to which chemical additive group.
This is quite handy especially when you go for grocery to the city market and you have to buy a can of milk or some type ofcanned food.
In the european Union as well as in America, New Zealand, Australia and Israel the E number of the additives are encountered on every non-biological food label.
Hence it’s really helpful when you launch the E-additives application whileyou’re selecting your food and check the food additives E E labelling and therefore know what type of chemical you might swallow while eating the purchased food.
This type of behaviour is really smart and could have a positive impact on your physical health in a long term and help you select a food which is less chemical contaminated.More about the ETATA / E-Additives can be read on it’s official page
Some of the benefits of E-Additives as an application that it is really multi-platform oriented and is supposed to run on most mobile phones which include the J2ME Java Virtual machine
I decided to try the e-additives mobile software on my Nokia 9300i and I have to share the program installed and runs on the mobile quite nice, though the J2ME included with Nokia 9300i is currently quite outdated.

Here are some pictures of e-additives running my Nokia 9300i mobile:

E-additives logo screen Nokia 9300i
E-additives logo screen Nokia 9300i
The only downside of th e-additives on my nokia is that some pictures shown on the e-additives website are not appearing on my phone.
However since I can search in E-additives – E database the application is performing it’s original intention through enabling me to check how actually contaminated with chemical additives (preservative food additions) are my daily meals.

Adding Comments & Google search plugins to nanoblogger

Thursday, July 30th, 2009

Reading Time: 2minutes
Ever wondered how you can add comments option to nanoblogger just like any normal blogging software like wordpress do support? Cause I did, Today I wandered around in google a bit until I can make it.In order to make all this workable I choose the nbcom nanoblogger plugin.Here’s what led me to success.First prepare yourself to loose a couple of hours! It’s more complicated or at least it was more complicated than I expected to add a simple plugin as this.Second, You’ll need the following nbcom 1.1 archive .
Right after that you’ll need to untar it and rename hdrs.tmpl and consts.tmpl to *.php, edit them and remove all the unneeded “/” back slashes. Copy the content of nbcom-1.1 dir to the document root of your nanoblogger. Make sure to edit blog.conf and have all the necessary like BLOG_URL, BLOG_ADMIN=”session”, DB_PASS=”password_of_sql_user_session”, yes I forgot to mention that you’ll need to read the INSTALL file contained in the nbcom archive and follow the instructions for creation of user and database that nbcom would use in the future. There are described like 7+ files more you need to edit and if you’re lucky like I was you’ll have nanoblogger+comments support up and running! Hooray! Wish you Good luck.
Secondly probably you have wondered how to make nanoblogger has a normal search field just like any normal php based blogger software out there. It took me like an hour before I came to the enlightenment. First I had downloaded the google.sh script mirrored for you to make your life easier here .
Next copy the file to your nanoblogger plugins directory.
And last you have to edit the templates/main_index.htm file and add the content that is as commented in the google.sh file. If you’re lucky again you’ll have nanoblogger with this two valuable plugins working. EnjoyEND—–

Linux convert and read .mht (Microsoft html) file format. MHT format explained

Thursday, June 5th, 2014

Reading Time: 2minutes

linux-open-and-convert-mht-file-format-to-html-howto
If you're using Linux as a Desktop system sooner or later you will receive an email with instructions or an html page stored in .mht file format.
So what is mht?MHT is an webpage archive format (short for MIME HTML document). MHTML saves the Web page content and incorporates external resources, such as images, applets, Flash animations and so on, into HTML documents. Usually those .mht files were produced with Microsoft Internet Explorer – saving pages through:

File -> Save As (Save WebPage) dialog saves pages in .MHT.

To open those .mht files on Linux, where Firefox is availableadd theUNMHT FF Extension to browser. Besides allowing you to view MHT on Linux, whether some customer is requiring a copy of an HTML page in MHT, UNMHT allows you to also save complete web pages, including text and graphics, into a MHT file.
There is also support for Google Chrome browser for MHT opening and saving via a plugin called IETAB. But unfortunately IETAB is not supported in Linux.
Anyways IETAB is worthy to mention here as if your'e a Windows users and you want to browse pages compatible only with Internet Explorer, IETAB will emulates exactly IE by using IE rendering engine in Chrome  and supports Active X Controls. IETAB is a great extension for QA (web testers) using Windows for desktop who prefer to not use IE for security reasons. IETab supports IE6, IE7, IE8 and IE9.

Another way to convert .MHT content file into HTML is to use Linux KDE's mhttohtml tool.

linux-kde-converter-mhttohtml

Another approach to open .MHT files in Linux is to use Opera browser for Linux which has support for .MHT

Note that because MHT files could be storing potentially malicious content (like embedded Malware) it is always wise when opening MHT on Windows to assure you have scanned the file with Antivirus program. Often mails containing .MHT from unknown recipients are containing viruses or malware. Also links embedded into MHT file could easily expose you to spoof attacks. MHT files are encoded in combination of plain text MIMEs and BASE64 encoding scheme, MHT's mimetype is:

MIME type: message/rfc822