Posts Tagged ‘request’

Apache: disable certain requests from logging to the access.log logfile through SetEnvIf and the dontlog httpd variable

Monday, October 11th, 2021

apache-disable-certain-strings-from-logging-to-access-log-logo

Logging to Apache's access.log is mostly useful, as it is a great way to keep track of who visited your website and to generate periodic statistics with tools such as Webalizer or AWStats. Such log analysis lets you keep track of your visitors, follow the number of new visitors and the most visited web pages (the pages which attract most of your web visitors), and once the log analysis tool generates its statistics it can help you understand better which web spiders visit your website the most (as spiders have predefined IP addresses), giving you insight into your site's indexation on Google, Yahoo, Bing etc. Sometimes, however, either due to bugs in web spider algorithms or inconsistencies in your website structure, some web pages get double visit records inside the logs; this could happen for example if your website includes iframes.

Having a web page accessed once but logged as accessed twice is hence erroneous and unwanted, and though that usually has to be fixed by the website programmers, if such a fix is not easily doable at the moment and the website is running on a critical production system, the double logging of requests can be omitted thanks to a small Apache log hack with the SetEnvIf Apache config directive. Even if there is no double logging happening inside the Apache log, it could be that some cron job, automated monitoring script or tool such as monit is making periodic requests to Apache and garbling your log statistics results.

In this short article I'll hence explain how to prevent certain request strings from getting logged inside /var/log/httpd/access.log.

1. Check SetEnvIf is Loaded on the Webserver
 

On CentOS / RHEL Linux:

# /sbin/apachectl -M |grep -i setenvif
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to suppress this message
 setenvif_module (shared)


On Debian / Ubuntu Linux:

# /usr/sbin/apache2ctl -M |grep -i setenvif
AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/000-default.conf:1
 setenvif_module (shared)


2. Using SetEnvIf to omit certain strings from getting logged inside apache access.log


SetEnvIf could be used either in a certain domain's VirtualHost configuration (if the website is configured so), or it can be set as a global Apache rule from /etc/httpd/conf/httpd.conf.

To use SetEnvIf you have to place it inside a <Directory …></Directory> configuration block if it has to be enabled only for a certain Apache configured directory; otherwise you have to place it in the global apache config section.

To be able to use SetEnvIf only in certain directories and subdirectories via .htaccess, you will have to define inside <Directory>:

AllowOverride FileInfo
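
For instance, a minimal hedged sketch combining the two pieces (the directory path and URI pattern below are hypothetical examples):

<Directory /var/www/mysite>
AllowOverride FileInfo
SetEnvIf Request_URI "^/iframe-content/" dontlog
</Directory>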


The general syntax to omit a certain repeating string from Apache logging with SetEnvIf is as follows:
 

SetEnvIf Request_URI "^/WebSiteStructureDirectory/ACCESS_LOG_STRING_TO_REMOVE$" dontlog


General syntax for SetEnvIf is as follows:

SetEnvIf attribute regex env-variable

SetEnvIf attribute regex [!]env-variable[=value] [[!]env-variable[=value]] …

Below are the possible attributes to pass, as described in the mod_setenvif official documentation:
 

  • Host
  • User-Agent
  • Referer
  • Accept-Language
  • Remote_Host: the hostname (if available) of the client making the request.
  • Remote_Addr: the IP address of the client making the request.
  • Server_Addr: the IP address of the server on which the request was received (only with versions later than 2.0.43).
  • Request_Method: the name of the method being used (GET, POST, etc.).
  • Request_Protocol: the name and version of the protocol with which the request was made (e.g., "HTTP/0.9", "HTTP/1.1", etc.).
  • Request_URI: the resource requested on the HTTP request line – generally the portion of the URL following the scheme and host portion without the query string.

Next locate inside the configuration the line:

CustomLog /var/log/apache2/access.log combined


To enable filtering out of the matched strings, you'll have to append env=!dontlog to the end of the line:

 

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

You might be using something like cronolog for log rotation to prevent your webserver logs from becoming too big in size and hard to manage; you can append env=!dontlog to it in the same way.

If you haven't used cronolog, it is perhaps best to show you the package description:

server:~# apt-cache show cronolog|grep -i description -A10 -B5
Version: 1.6.2+rpk-2
Installed-Size: 63
Maintainer: Debian QA Group <packages@qa.debian.org>
Architecture: amd64
Depends: perl:any, libc6 (>= 2.4)
Description-en: Logfile rotator for web servers
 A simple program that reads log messages from its input and writes
 them to a set of output files, the names of which are constructed
 using template and the current date and time.  The template uses the
 same format specifiers as the Unix date command (which are the same
 as the standard C strftime library function).
 .
 It intended to be used in conjunction with a Web server, such as
 Apache, to split the access log into daily or monthly logs:
 .
   TransferLog "|/usr/bin/cronolog /var/log/apache/%Y/access.%Y.%m.%d.log"
 .
 A cronosplit script is also included, to convert existing
 traditionally-rotated logs into this rotation format.

Description-md5: 4d5734e5e38bc768dcbffccd2547922f
Homepage: http://www.cronolog.org/
Tag: admin::logging, devel::lang:perl, devel::library, implemented-in::c,
 implemented-in::perl, interface::commandline, role::devel-lib,
 role::program, scope::utility, suite::apache, use::organizing,
 works-with::logfile
Section: web
Priority: optional
Filename: pool/main/c/cronolog/cronolog_1.6.2+rpk-2_amd64.deb
Size: 27912
MD5sum: 215a86766cc8d4434cd52432fd4f8fe7

If you're using cronolog to daily rotate the access.log and you need to filter the strings out of the logs, you might use something like this in httpd.conf:

 

CustomLog "|/usr/bin/cronolog –symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined env=!dontlog


 

3. Disable Apache logging to access.log for a certain USERAGENT browser
 

You can do much more with SetEnvIf: for example you might want requests from a certain UserAgent (browser) to end up in /dev/null (nowhere), e.g. prevent any website requests originating from Internet Explorer (MSIE) from being logged:

SetEnvIf User-Agent "(MSIE)" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


4. Disable Apache logging from requests coming from certain FQDN (Fully Qualified Domain Name) localhost 127.0.0.1 or concrete IP / IPv6 address

SetEnvIf Remote_Host "dns.server.com$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


Of course for this to work your website should have functioning DNS servers, and Apache should be configured to be able to reverse-resolve remote IPs back to their respective DNS-defined hostnames.

SetEnvIf also recognizes Perl-compatible (PCRE) regular expressions, useful if you want to filter out of the Apache access log requests incoming from multiple subdomains starting with a certain hostname prefix:

 

SetEnvIf Remote_Host "^example" dontlog

– To not log anything coming from the localhost.localdomain address ( 127.0.0.1 ) as well as from some concrete IP address:

SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog

SetEnvIf Remote_Addr "192\.168\.1\.180" dontlog

– To disable logging of IPv6 loopback requests that may be hitting the log even though you don't happen to use IPv6 at all:

SetEnvIf Remote_Addr "::1" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


– Note that here it is obligatory to escape the dots '.' in the IP addresses.


5. Disable robots.txt Web Crawlers requests from being logged in access.log

SetEnvIf Request_URI "^/robots\.txt$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

Using SetEnvIfNoCase to match incoming useragent / Host / file requests case-insensitively

SetEnvIfNoCase is to be used if you want to treat incoming originator strings as case insensitive; this is useful to avoid extra SetEnvIf regular expression rules covering lower / upper case variants:

SetEnvIFNoCase User-Agent "Slurp/cat" dontlog
SetEnvIFNoCase User-Agent "Ask Jeeves/Teoma" dontlog
SetEnvIFNoCase User-Agent "Googlebot" dontlog
SetEnvIFNoCase User-Agent "bingbot" dontlog
SetEnvIFNoCase Remote_Host "fastsearch.net$" dontlog

Omit some standard web files .css, .js, .ico, .gif, .png and referrals from your own domain from access.log logging

Sometimes your own site scripts refer to static files on your own domain, which just generates junk in the access.log; to keep that out:

SetEnvIfNoCase Request_URI "\.(gif|jpg|png|css|js|ico|eot)$" dontlog

 

SetEnvIfNoCase Referer "www\.myowndomain\.com" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

6. Disable Apache requests in access.log and error.log completely


Sometimes, in rare cases, the produced Apache access log and error log get really big and you already have the requests logged in another F5 Load Balancer or Haproxy in front of the Apache webserver, or alternatively the logging is not interesting at all, as the served web application (written in Perl / Python / Ruby) handles the logging itself.
I've earlier described how this is done in a good amount of details in previous article Disable Apache access.log and error.log logging on Debian Linux and FreeBSD

To disable logging you will have to comment out CustomLog, or set it together with ErrorLog to /dev/null in apache2.conf / httpd.conf (depending on the distro):
 

CustomLog /dev/null combined
ErrorLog /dev/null


7. Restart Apache WebServer to load settings
 

An important thing to mention is that in case you have a webserver with multiple complex configurations and there are specific log patterns to omit from the logs, it might be a very good idea to:

a. Create /etc/httpd/conf/dontlog.conf / /etc/apache2/dontlog.conf
and add inside it all your custom dontlog configurations
b. Include dontlog.conf from /etc/httpd/conf/httpd.conf / /etc/apache2/apache2.conf
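
A hedged sketch of that approach (Debian paths; the rules are just examples from earlier in this article):

# /etc/apache2/dontlog.conf - keep monitoring and static file requests out of access.log
SetEnvIf Request_URI "^/robots\.txt$" dontlog
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog

and inside /etc/apache2/apache2.conf (or /etc/httpd/conf/httpd.conf):

Include /etc/apache2/dontlog.conf
CustomLog /var/log/apache2/access.log combined env=!dontlog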

Finally, to make the changes take effect, you will of course need to restart the Apache webserver, depending on the distro and whether it runs with systemd or System V:

For systemd RPM based distro:

systemctl restart httpd

or for Deb based Debian etc.

systemctl restart apache2

On old System V scripts systems:

On RedHat / CentOS etc. restart Apache with:
 

/etc/init.d/httpd restart


On Deb based SystemV:
 

/etc/init.d/apache2 restart


What we learned?
 

We have learned how SetEnvIf can be used to prevent certain request strings from getting logged into access.log through dontlog, how to completely stop a certain browser based on its useragent from being logged to the access.log, as well as how to omit from logging requests incoming from certain IP addresses / IPv6 addresses or FQDNs, and how to stop robots.txt requests from being logged to the httpd log.


Finally, we have learned how to completely disable Apache logging if logging is handled by another external application.
 

Briefly unavailable for scheduled maintenance. Fix WordPress after interrupted upgrade

Thursday, March 2nd, 2017

briefly-unavaiable-for-a-scheduled-maintenance-wordpress-website-fix-howto_1

I recently tried to update my WordPress blog sites and, being unattentive, I selected all the plugins possible for upgrade by checking the "Select All" check box on the Update dialog, almost automatically pressing the Update button in the hurry. All of a sudden I realized I could screw up my websites brutally, as some of the plugins to be upgraded might lack 100% compatibility with their prior versions.

I've made a mess out of my blog many times during upgrades because of choosing to upgrade the wrong, not 100% compatible plugins, and I know well how painful and hard to track down a misbehaving incompatible plugin can be, or how it can cause severe sluggishness to the blog, which automatically reflects on how well the website is ranked and indexed in Google / Yahoo / Bing etc.

Thus, as an almost unconscious reaction to prevent future troubles, I tried to cancel the update request in Firefox and reload the Update page, hoping I might be quick enough to stop the Apache / WP / MySQL backend update queries before processing, but I was too slow and bang! I ended up with the following unpleasant message in my browser:

briefly-unavaiable-for-a-scheduled-maintenance-wordpress-website-fix-howto
Briefly unavailable for scheduled maintenance.
 

As you could guess, that message caused me quite a lot of worries at hand, especially since I've already broken my sites many times by doing quick unmindful reactions, and the fact that there are Google Adsense ads appearing which do give me some Return on Investment cents every now and then …

It took me a few minutes of research online to find out what really happened and how to restore the websites' normal operations.
 


So what causes the Briefly unavailable for scheduled maintenance. message to appear?

When WordPress does some of its integrated maintenance jobs, a plugin enable / disable or any task that has to modify crucial configurations inside the database, WordPress disables access to itself for all end clients in order to protect its sensitive data, since showing some unexpected information to an end client browser could later be used by crackers / hackers or possibly open a security hole for an attacker.

The message is a WordPress-generated notice and it is pretty normal for the end user to see it during a WP site installation update. Depending on how many plugins are installed and loaded on the site, and how long it takes the backend Linux / Windows server to fetch the archived .zips of plugins, substitute them with the new ones, extract them to wp-content/plugins and update the respective required SQL database / tables, it could be showing to end users anywhere from a few seconds to a few minutes.

However, under some circumstances, on a browser request timeout to the remote wordpress site due to a network connectivity issue, a bad Apache request-timeout configuration (or a slow remote Apache response time due to server hardware / memory overload), or a stupid browser "Stop" / cancel request like in my case, you end up with the Briefly unavailable for scheduled maintenance message and you can no longer access your https://siteurl.com/wp-admin Admin Panel.

The message is triggered by a WP-created file .maintenance inside /var/www/blog-site/, e.g. the WordPress PHP scripts check for the existence of /var/www/blog-site/.maintenance, and if it is found, the WP scripts generate the Briefly unavailable … message.
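
For reference, the .maintenance file WP drops is just a one-line PHP snippet holding the Unix timestamp of when the upgrade started (the value below is only an example):

<?php $upgrading = 1488412345; ?>

As far as I know, WordPress compares that timestamp with the current time and drops maintenance mode by itself after roughly 10 minutes, but deleting the file clears the message immediately.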


How to resolve the "Briefly unavailable for scheduled maintenance. Check back in a minute" WordPress error ?

As you might guess, removing the maintenance "coming soon"-like message in most of the cases comes down to just deleting the .maintenance file; to do so:

1. Login to the remote server via FTP or SFTP
2. Locate your WP website root folder, which should contain something like /var/www/blog-site/.maintenance, and issue something like:
 

$ rm -rf /var/www/blog-site/.maintenance


Assuming that no plugin update .zip extraction or SQL update query ended up half installed / executed, that should solve the error.
To check whether all is back to normal just refresh your browser pointing to the "broken" site. If it appears well you can thank God for that 🙂
If not, check the apache error logs and php error logs to see which of the php scripts is failing, then try to manually fetch and unzip the WP .zip package to the wp-content/plugins folder and give it another try, and if God blesses so it will work as before 🙂

How to prevent your WP based business from such nasty errors in the future, using a Staging (test) version of your blog?

Just run a duplicate of your website under a separate folder on your hosting, enable the same plugins as on the primary website and copy over the MySQL / PostgreSQL database from your Live site to the Staging one. Then, before doing any crucial WordPress version updates or plugin updates, always try the upgrade first on your Test Staging site. If it executes fine there, in most cases the result should be the same on the Production host, and that could save you the effort and nerves of debugging hard-to-track failures or faulty plugins without affecting what your end users see.

If you're not hosting the WordPress install under your own hosting like me, you can always use some of the publicly available hostings like BlueHost or WPEngine.

 

How to renew self signed QMAIL toaster and QMAIL rocks expired SSL pem certificate

Friday, September 2nd, 2011

qmail_toaster_logo-fix-qmail-rocks-expired-ssl-pem-certificate

On one of the QMAIL servers, which I installed a very long time ago, I was notified by clients that the certificate of the mail server had expired, and therefore I had to quickly renew the certificate.

In this qmail installation, the SSL certificates were located in /var/qmail/control under the names servercert.key and servercert.pem.

Renewing the certificates with new self signed ones is pretty straightforward; to renew them I had to issue the following commands:

1. Generate a passphrase-encrypted 1024-bit servercert key

debian:~# cd /var/qmail/control
debian:/var/qmail/control# openssl genrsa -des3 -out servercert.key.enc 1024
Generating RSA private key, 1024 bit long modulus
...........++++++
.........++++++
e is 65537 (0x10001)
Enter pass phrase for servercert.key.enc:
Verifying - Enter pass phrase for servercert.key.enc:

At the Enter pass phrase for servercert.key.enc prompt I typed my key passphrase twice; any password will do, though using a stronger one here is better.

2. Generate the servercert.key file

debian:/var/qmail/control# openssl rsa -in servercert.key.enc -out servercert.key
Enter pass phrase for servercert.key.enc:
writing RSA key

3. Generate the certificate request

debian:/var/qmail/control# openssl req -new -key servercert.key -out servercert.csr
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:UK
State or Province Name (full name) [Some-State]:London
Locality Name (eg, city) []:London
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company
Organizational Unit Name (eg, section) []:My Org
Common Name (eg, YOUR name) []:
Email Address []:admin@adminmail.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

In the above prompts it's necessary to fill in the company name and location, as each of the prompts clearly states.

4. Sign the just generated certificate request

debian:/var/qmail/control# openssl x509 -req -days 9999 -in servercert.csr -signkey servercert.key -out servercert.crt

Notice the option -days 9999: it instructs the newly generated self signed certificate to be valid for 9999 days, which is quite a long time. The reason the previously generated self signed certificate expired was that it was built for only 365 days.
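
Note that qmail reads servercert.pem; as far as I remember from qmailrocks / qmail-toaster setups, the .pem is simply the signed certificate and the private key concatenated together, so it can be rebuilt like so:

debian:/var/qmail/control# cat servercert.crt servercert.key > servercert.pem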

5. Fix the newly generated servercert.pem permissions

debian:~# cd /var/qmail/control
debian:/var/qmail/control# chmod 640 servercert.pem
debian:/var/qmail/control# chown vpopmail:vchkpw servercert.pem
debian:/var/qmail/control# cp -f servercert.pem clientcert.pem
debian:/var/qmail/control# chown root:qmail clientcert.pem
debian:/var/qmail/control# chmod 640 clientcert.pem

Finally to load the new certificate, restart of qmail is required:

6. Restart qmail server

debian:/var/qmail/control# qmailctl restart
Restarting qmail:
* Stopping qmail-smtpd.
* Sending qmail-send SIGTERM and restarting.
* Restarting qmail-smtpd.

Test the newly installed certificate

To test the newly installed SSL certificate use the following commands:

debian:~# openssl s_client -crlf -connect localhost:465 -quiet
depth=0 /C=UK/ST=London/L=London/O=My Org/OU=My Company/emailAddress=admin@adminmail.com
verify error:num=18:self signed certificate
verify return:1
...
debian:~# openssl s_client -starttls smtp -crlf -connect localhost:25 -quiet
depth=0 /C=UK/ST=London/L=London/O=My Org/OU=My Company/emailAddress=admin@adminmail.com
verify error:num=18:self signed certificate
verify return:1
250 AUTH LOGIN PLAIN CRAM-MD5
...

If an error is returned like 32943:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:607:, this means that the SSL variable in the qmail-smtpdssl/run script is set to 0.

To solve this error, change SSL=0 to SSL=1 in /var/qmail/supervise/qmail-smtpdssl/run and do qmailctl restart

The verify error:num=18:self signed certificate / verify return:1 lines displayed are perfectly fine and are more of a warning than an error, as they just report that the certificate is self signed.

How to install nginx webserver from source on Debian Linux / Install Latest Nginx on Debian

Wednesday, March 23rd, 2011

Nginx install server logo
If you're running a large website consisting of a mixture of php scripts, images and html, you have probably noticed that using just one Apache server to serve all the content is not that efficient.

Each Apache child (I assume you're using the Apache mpm prefork module) consumes approximately 20MB, which means that each client connection consumes 20MB of your server memory.
This, as you can imagine, is true suicide in terms of memory. Each request for a picture, css or simple html file makes Apache fork another process and consume another 20MB of your server's memory capacity!

Taking all these notes into consideration, along with the need for some efficiency here, the administrator should normally think about separating the processing of the so-called static content from the dynamic content served on the server.

Apache is really nice webserver software, but with all the modules loaded to serve dynamic content, for instance php, cgi, python etc., it's no longer the best solution for handling static files (css, javascript, html, flv, avi, mov etc.).

Even a plain Apache server installation without (libphp, mod_rewrite, mod_deflate etc.) still does not deal efficiently enough with the aforementioned static file content.

Here comes the question: if Apache is not that quick and efficient in serving static files, what then? The answer is a caching webserver! By caching the regular static content files, your website visitors will benefit from shorter webserver response times when downloading static content, which will generally hasten your website and improve the end user's experience.

There are plenty of caching servers out there, some are a proprietary software and some are free software.

However the three most popular servers out there for static file content serving are:

  • Squid,
  • Varnish
  • Nginx

In this article, as you should have already figured out from the title, I'll discuss Nginx.

You might ask why exactly Nginx and not one of the other two. Well, simply because Squid is too complicated to configure and on the other hand provides lower performance than Nginx. Varnish, on the other hand, is also a good solution for a static file webserver, but I believe it is not tested enough. However, I should mention that my experience so far with testing Varnish on my own home router is quite good.

If you're further interested in the Varnish cache I would suggest you check out www.varnish-cache.org .

Now, as I have said a few words about Squid and Varnish, let's proceed to the essence of the article and say a few words about nginx.

Here is a quote describing nginx in a short and good manner directly extracted from nginx.com

nginx [engine x] is a HTTP and reverse proxy server, as well as a mail proxy server written by Igor Sysoev. It has been running for more than five years on many heavily loaded Russian sites including Rambler (RamblerMedia.com). According to Netcraft nginx served or proxied 4.70% busiest sites in April 2010. Here are some of success stories: FastMail.FM, WordPress.com.

By default nginx is available ready to be installed on Debian via apt-get; however, sadly enough, the version available for install is pretty outdated. As of the time of writing, the nginx Debian version in lenny's deb package repositories is 0.6.32-3+lenny3.

This version was released about 2 years ago and is by now completely outdated; therefore I found it is not a good idea to use this old and probably slower release of nginx, and I jumped further to install my nginx from source.
Nginx source installation is actually very simple on Linux platforms.

1. As a first step, in order to be able to succeed with the install from source, make sure your system has the following packages installed:

debian:~# apt-get install libpcre3 libpcre3-dev libpcrecpp0 libssl-dev zlib1g-dev build-essential

2. Secondly download latest nginx source code tarball

Check out on http://nginx.com/download the latest stable release of nginx and further issue the commands below:

debian:~# cd /usr/local/src
debian:/usr/local/src# wget http://nginx.org/download/nginx-0.9.6.tar.gz

3. Unarchive the nginx source code

debian:/usr/local/src# tar -zxvvf nginx-0.9.6.tar.gz
...

My nginx server requirements weren't anything special, so I proceeded and used the nginx ./configure script which is found in nginx-0.9.6.

4. Compile the nginx server

debian:/usr/local/src# cd nginx-0.9.6
debian:/usr/local/src/nginx-0.9.6# ./configure && make && make install
+ Linux 2.6.26-2-amd64 x86_64
checking for C compiler ... found
+ using GNU C compiler
+ gcc version: 4.3.2 (Debian 4.3.2-1.1)
checking for gcc -pipe switch ... found
...
...

The last lines printed by the nginx configure script are actually the most interesting ones for administration purposes; the default compilation options in my case were:

Configuration summary
+ using system PCRE library
+ OpenSSL library is not used
+ md5: using system crypto library
+ sha1 library is not used
+ using system zlib library

nginx path prefix: "/usr/local/nginx"
nginx binary file: "/usr/local/nginx/sbin/nginx"
nginx configuration prefix: "/usr/local/nginx/conf"
nginx configuration file: "/usr/local/nginx/conf/nginx.conf"
nginx pid file: "/usr/local/nginx/logs/nginx.pid"
nginx error log file: "/usr/local/nginx/logs/error.log"
nginx http access log file: "/usr/local/nginx/logs/access.log"
nginx http client request body temporary files: "client_body_temp"
nginx http proxy temporary files: "proxy_temp"
nginx http fastcgi temporary files: "fastcgi_temp"
nginx http uwsgi temporary files: "uwsgi_temp"
nginx http scgi temporary files: "scgi_temp"

If you want to setup nginx server to support ssl (https) and for instance install nginx to a different server path you can use some ./configure configuration options, for instance:

./configure --sbin-path=/usr/local/sbin --with-http_ssl_module

Now before you can start the nginx server, you should also set up the nginx init script;

5. Download and set up a ready-to-use init script with cmd:

debian:~# cd /etc/init.d
debian:/etc/init.d# wget https://www.pc-freak.net/files/nginx-init-script
debian:/etc/init.d# mv nginx-init-script nginx
debian:/etc/init.d# chmod +x nginx

6. Configure Nginx

Nginx is a really easy and simple server, just like the Russians: simple but good!
By the way, it's interesting to mention nginx has been coded by a Russian, so it's robust and hard as a rock like all the other Russian creations 🙂
Nginx configuration files in a default install like the one in my case are to be found in /usr/local/nginx/conf.

In the nginx/conf directory you're about to find the following list of files which concern nginx server configurations:

debian:/usr/local/nginx/conf# ls -1
fastcgi.conf
fastcgi.conf.default
fastcgi_params
fastcgi_params.default
koi-utf
koi-win
mime.types
mime.types.default
nginx.conf
nginx.conf.default
scgi_params
scgi_params.default
uwsgi_params
uwsgi_params.default
win-utf

The .default files are just a copy of the ones without the .default extension and contain the default respective file directives.

In my case I'm not using fastcgi to serve perl or php scripts via nginx, so I don't need to configure the fastcgi.conf and fastcgi_params files; the scgi_params and uwsgi_params conf files contain nginx configuration directives concerning the use of nginx with SCGI and uWSGI backend applications, so I skip configuring those as well.
koi-utf and koi-win are two charset map files which you usually don't need to touch; they help nginx convert between Cyrillic character encodings (KOI8-R to UTF-8 and windows-1251), and mime.types is a conf file listing the mime types the nginx server will know how to handle.

Therefore, after all being said, the only file which needs to be configured is nginx.conf.

7. Edit /usr/local/nginx/conf/nginx.conf

debian:/usr/local/nginx:# vim /usr/local/nginx/conf/nginx.conf

Therein you will find the following default configuration:

#gzip on;

server {
listen 80;
server_name localhost;

#charset koi8-r;

#access_log logs/host.access.log main;

location / {
root html;
index index.html index.htm;
}
#error_page 404 /404.html;

# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}

In the default configuration above you need to modify only the following block of code, as shown below:

server {
listen 80;
server_name yoursitedomain.com;

#charset koi8-r;

#access_log logs/access.log main;

location / {
root /var/www/yoursitedomain.com/html;
index index.html index.htm;
}
}

Change yoursitedomain.com and /var/www/yoursitedomain.com/html to your website domain and directory destinations.
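
Before starting the server it is a good idea to let nginx syntax-check the edited config via its -t flag; if all is fine it should report the configuration file test is successful:

debian:/usr/local/nginx:# /usr/local/nginx/sbin/nginx -t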

8. Start nginx server with nginx init script

debian:/usr/local/nginx:# /etc/init.d/nginx start
Starting nginx:

This should bring up the nginx server; if something is misconfigured you will also notice some error messages. As you can see in the above init script output, thankfully there are no error messages in my case.
Note that you can also start nginx directly via invoking /usr/local/nginx/sbin/nginx binary

To check if the nginx server has properly started from the command line type:

debian:/usr/local/nginx:~# ps ax|grep -i nginx|grep -v grep
9424 ? Ss 0:00 nginx: master process /usr/local/nginx/sbin/nginx
9425 ? S 0:00 nginx: worker process

Another way to check if the webserver is ready to serve your website file content is to directly access your website by pointing your browser to http://yoursitedomain.com/; you should get either your custom index.html file or the default nginx greeting Welcome to nginx.

9. Add nginx server to start up during system boot up

debian:/usr/local/nginx:# /usr/sbin/update-rc.d -f nginx defaults

That's all, now you have nginx up and running, and your static file serving will require much less system resources than with Apache.
Hope this article was helpful to somebody, feedback on it is very welcome!

How to make a mirror of website on GNU / Linux with wget / Few tips on wget site mirroring

Wednesday, February 22nd, 2012

how-to-make-mirror-of-website-on-linux-wget

Everyone who has used Linux is probably familiar with wget, or has used this handy console download tool at least a thousand times. Not so many desktop GNU / Linux users, like Ubuntu and Fedora Linux users, have tried using wget for something more than single file downloads.
Actually wget is not as popular as it used to be in earlier Linux days. I've noticed the tendency for newer Linux users to prefer using curl (I don't know why).

With all that said, I'm sure there are plenty of Linux users curious how a website mirror can be made through wget.
This article will briefly suggest a few ways to do website mirroring on Linux / BSD, as wget is available on both free operating systems.

1. Most Simple exact mirror copy of website

The most basic use of wget's mirror capabilities is through wget's -m (--mirror) argument:

# wget -m http://website-to-mirror.com/sub-directory/

Creating a mirror like this is not very good practice, as the links of the mirrored pages will still point to external URLs. In other words, link URLs will not point to your local copy, and therefore if you're not connected to the internet and try to browse random links of the webpage, you will end up with many links which don't open because you have no internet connection.

2. Mirroring with rewriting links to point locally and a delay between page downloads

Making a mirror with wget can put a heavy load on the remote server, as it fetches the files as quickly as the bandwidth allows. On busy servers, rapid downloads with wget can significantly increase the server response time; on some high-loaded servers it can even cause the server to hang completely.
Hence, mirroring pages with wget without explicitly setting a delay between each page download could be considered by the remote server as a kind of DoS (denial of service) attack. Some site administrators have even set firewall rules, or web server modules configured like Apache mod_security, which filter requests from IPs doing too frequent HTTP GET / POST requests to the web server.
To make wget wait 10 seconds between mirrored page downloads, use:

# wget -mk -w 10 -np --random-wait http://website-to-mirror.com/sub-directory/

The -mk stands for -m / --mirror and -k, the shortcut argument for --convert-links (make links point locally); --random-wait tells wget to vary the wait between requests between 0.5 and 1.5 times the value given with -w (so between 5 and 15 seconds here).

3. Mirror / retrieve website sub directory ignoring robots.txt "mirror restrictions"

Some websites have a robots.txt which restricts content downloads by clients like wget and curl, or even completely prohibits crawlers from downloading their website pages.

/robots.txt restrictions are not a problem, as wget has an option to disable robots.txt checking when downloading.
Getting around the robots.txt restrictions with wget is possible through the -e robots=off option.
For instance, if you want to make a local mirror copy of a whole sub-directory with all links, with a delay of 10 seconds between each consecutive page request, without reading the robots.txt allow/forbid rules at all:

# wget -mk -w 10 -np -e robots=off --random-wait http://website-to-mirror.com/sub-directory/

4. Mirror a website which prohibits download managers like flashget, getright, go!zilla etc.

Sometimes when you try to use wget to make a mirror copy of an entire site domain subdirectory or the root site domain, you get an error similar to:

Sorry, but the download manager you are using to view this site is not supported.
We do not support use of such download managers as flashget, go!zilla, or getright

This message is produced by the site's dynamic generation language, PHP / ASP / JSP etc., as the website code is written to check the browser UserAgent sent.
wget's default UserAgent sent to the remote webserver is:
Wget/1.11.4

As this is not a common desktop browser useragent, many webmasters configure their websites to only accept the well-known established desktop browser useragents sent by client browsers.
Here are few typical user agents which identify a desktop browser:
 

  • Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0
  • Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0
  • Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4
  • Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.2a1pre) Gecko/20110324 Firefox/4.2a1pre

etc. etc.

If you're trying to mirror a website which has applied some kind of useragent restriction based on some "valid" useragent, wget has the -U option enabling you to fake the useragent.

If you get the Sorry, but the download manager you are using to view this site is not supported message, fake / change wget's UserAgent with a command like:

# wget -mk -w 10 -np -e robots=off \
--random-wait \
--referer="http://www.google.com" \
--user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
--header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
--header="Accept-Language: en-us,en;q=0.5" \
--header="Accept-Encoding: gzip,deflate" \
--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
--header="Keep-Alive: 300" \
http://website-to-mirror.com/sub-directory/

For the sake of some wget anonymity – to make wget permanently hide its user agent and pretend to be a Mozilla Firefox running on MS Windows XP – use a .wgetrc like the one below in your home directory.
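
A hedged sketch of such a ~/.wgetrc (the useragent string is just the sample one used above):

# ~/.wgetrc - make wget always identify itself as Firefox on Windows XP
user_agent = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6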

5. Make a complete mirror of a website under a domain name

To retrieve a complete working copy of a site with wget, a good way is like so:

# wget -rkpNl5 -w 10 --random-wait www.website-to-mirror.com

Where the arguments meaning is:
-r – Retrieve recursively
-k – Convert the links in documents to make them suitable for local viewing
-p – Download everything (inline images, sounds and referenced stylesheets etc.)
-N – Turn on time-stamping
-l5 – Specify recursion maximum depth level of 5

6. Make a static site mirror of dynamic pages, converting CGI, ASP, PHP etc. to HTML for offline browsing

Often website pages end in .php / .asp / .cgi … extensions. An example of what I mean is, for instance, the URL http://php.net/manual/en/tutorial.php. You see the url page is tutorial.php; once mirrored with wget, the local copy will also end up in .php and therefore will not be suitable for local browsing, as the .php extension gives the local browser no idea how to interpret the page.
Therefore, to copy a website with non-html extensions and make it offline-browsable as HTML, there is the --html-extension option, e.g.:

# wget -mk -w 10 -np -e robots=off \
--random-wait \
--html-extension \
--convert-links http://www.website-to-mirror.com

A good practice in mirror making is to set a download rate limit. Setting such a rate is good for both sides (the local host downloading and the remote server). A download limit is also useful when mirroring websites consisting of many enormous files (documentary movies, some music etc.).
To set a download limit, add the --limit-rate= option; passing --limit-rate=200k to wget would limit the download speed to 200KB/s.
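
For example, combining the rate limit with the mirroring flags used earlier (URL is the sample one from above):

# wget -mk -w 10 --random-wait --limit-rate=200k http://website-to-mirror.com/sub-directory/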

Another useful thing to assure wget has made an accurate mirror is wget logging; to use it, pass -o ./my_mirror.log to wget.
 

Speeding up Apache through apache2-mpm-worker and php5-cgi on Debian / How to improve Apache performance and decrease server memory consumption

Friday, March 18th, 2011

speeding up apache through apache2-mpm-worker and php5-cgi on Debian Linux / how to improve apache performance and decrease server response time
By default, most Apache-running Linux servers on the Internet are configured to use the mpm prefork apache module.
Historically, the prefork apache module is the predecessor of the worker module and is therefore believed to be way more tested and reliable, if you need a critically reliable webserver configuration.

However, from my experience so far with the Apache MPM Worker, I can boldly say that many of the rumors concerning the unreliability of apache2-mpm-worker are just myths.

The old way Apache handles connections, e.g. mod prefork, is the well-known way a high amount of daemons on Linux and BSD still rely on.
When prefork is used by Apache, every new TCP/IP connection arriving at your Linux server on the Apache configured port, let's say port 80, is served by Apache in such a way that the Apache parent (mother) process forks a new copy of itself in order to serve the new request.
Thus, using prefork, Apache needs to fork a new process (if it doesn't already have an idle forked one waiting for connections) to serve the HTTP request of the new client; after the client's request is completed, the newly forked Apache process usually dies (though this again depends on how the Apache server is configured via the Apache configuration – apache2.conf / httpd.conf etc.).

Now you can imagine how slow and memory-consuming it is that the parent Apache process spawns new processes, kills old ones etc. all the time, in order to fulfill the client requests.

Just to compare: the Apache mpm worker does not use the old forking way, but relies on a few Apache processes whose threads handle all the requests, without processes being constantly destroyed and recreated like with the prefork module.
This saves operations and system resources; threaded programming has already been proven to be a more efficient way to handle tasks and is heavily adopted in GUI programming, for instance in Microsoft Windows, Mac OS X, Linux Gnome, KDE etc.

There is plenty of information and statistical data online comparing Apache running with the prefork and worker modules respectively.
As the goal of this article is not to go into depth with this kind of information, I won't say more about it, but will let you explore online a bit more in case you're interested.

The purpose of this article is to explain in short how to substitute apache2-mpm-prefork and how your server performance could benefit from the use of apache2-mpm-worker.
On Debian, the default Apache process serving module in Apache 1.3x, 2.0x and 2.2x is prefork; thus the installation of apache2-mpm-worker is not "a standard way" to install Apache.

Deciding to switch from the default Debian apache-mpm-prefork to apache-mpm-worker is quite a serious and responsible decision, and in some cases might cause troubles; if you have decided to follow my article, be sure to consider all the possible negative consequences of switching to the apache worker!

Now, after having said a bunch of info which might not be necessary for the experienced system admin, I'll continue on with the steps to install apache2-mpm-worker.

1. Install the apache2-mpm-worker

debian:~# apt-get install apache2-mpm-worker php5-cgi
Reading state information... Done
The following packages will be REMOVED:
  apache2-mpm-prefork libapache2-mod-php5
The following NEW packages will be installed:
  apache2-mpm-worker
0 upgraded, 1 newly installed, 2 to remove and 46 not upgraded.
Need to get 0B/259kB of archives.
After this operation, 6193kB disk space will be freed.

As you can notice in the confirmation text above, apt will remove the apache2-mpm-prefork and libapache2-mod-php5 packages before proceeding to install apache2-mpm-worker.

You might ask yourself: if I remove my installed libphp, how would I be able to use my Apache with my PHP based websites? And why does the apt package manager require libapache2-mod-php5 to be removed?
The explanation is simple: libapache2-mod-php5 (mod_php) is not thread safe; in other words, PHP and its extensions are not guaranteed to behave correctly inside the threaded worker module, which would probably lead to PHP and Apache crashes.
Therefore, in order to install the apache mod worker, it's necessary that no libapache2-mod-php5 is present on the system.
In order to have PHP installed on the server again, you will have to use the php5-cgi deb package; this is the reason why in the above apt-get command I'm also requesting apt to install the php5-cgi package next to apache2-mpm-worker.

2. Enable the cgi and cgid apache modules

debian:~# a2enmod cgi
debian:~# a2enmod cgid

3. Activate the mod_actions apache modules

debian:~# cd /etc/apache2/mods-enabled
debian:~# ln -sf ../mods-available/actions.load
debian:~# ln -sf ../mods-available/actions.conf

4. Add configuration options in order to enable mod worker to use the newly installed php5-cgi

Edit /etc/apache2/mods-available/actions.conf with vim, mcedit or nano (i.e. your editor of choice) and add inside:

<IfModule mod_actions.c>
Action application/x-httpd-php /cgi-bin/php5
</IfModule>
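
Depending on the rest of your config you may additionally need the .php extension mapped to that MIME type, and a /cgi-bin/ alias pointing to the directory holding the php5 CGI wrapper (on Debian the php5-cgi package should place one under /usr/lib/cgi-bin/); a hedged sketch:

AddType application/x-httpd-php .php
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/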

After completing all the above instructions, you might also need to edit your /etc/apache2/apache2.conf to tune how your Apache mpm worker will serve client requests.
Configuring the <IfModule mpm_worker_module> block in apache2.conf is necessary to optimize your newly installed mpm_worker module for performance.

5. Configure the mod_worker_module in apache2.conf

One example configuration for the mod worker is:

<IfModule mpm_worker_module>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
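
To put the sample numbers in perspective: with ThreadsPerChild 25 and MaxClients 150, Apache will run at most 150 / 25 = 6 worker processes, each serving 25 concurrent connections in threads; if you ever raise MaxClients above ServerLimit × ThreadsPerChild, you will also need to raise ServerLimit.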

Consider the fact that this configuration is just a sample; it is by no means tuned for serving high-load Apache servers, and you will need to further play with the values to get good results on your server.

6. Check that all is fine with your Apache configurations and no syntax errors are encountered

debian:~# /usr/sbin/apache2ctl -t
Syntax OK

If you get something different from Syntax OK, track the error down and fix it before you restart the Apache server.

7. Now restart the Apache server

debian:~# /etc/init.d/apache2 restart

All should run fine, and hopefully your PHP scripts should be interpreted just fine through php5-cgi instead of libapache2-mod-php5.
Using /usr/bin/php5-cgi will increase your server CPU load by some percentage, but on the other hand will drastically decrease the webserver memory consumption.
That's quite logical, because libapache2-mod-php5 is loaded once at apache server startup, whereas a new instance of /usr/bin/php5-cgi is invoked for each Apache request served via the mod worker.

There is one serious security flaw that comes with php5-cgi: a DoS against a server processing scripts through php5-cgi is much easier to achieve.
An example denial-of-service attack which could affect a website running with mod worker and php5-cgi could be simulated by a simple user with a web browser holding down the F5 or Ctrl + R browser page refresh buttons.
In that case, whenever php5-cgi is used, the CPU load would rise drastically; one possible solution to this denial of service issue is installing and using libapache2-mod-evasive, like so:

8. Install libapache2-mod-evasive

debian:~# apt-get install libapache2-mod-evasive

The Apache mod evasive module is a nice apache module to minimize HTTP DoS and brute force attacks.
Now, with mod worker through php5-cgi, your apache should start serving requests more efficiently than before.
For performance reasons some might even want to try out fastcgi with the worker to boost Apache performance further, but as I have never tried that, I can't say how reliable a mod worker with fastcgi setup would be.

N.B.! If you have some specific php configurations within /etc/php5/apache2/php.ini, you will have to set them also in /etc/php5/cgi/php.ini before you proceed with the above instructions to install Apache, otherwise your PHP scripts might not work as expected.

Mod worker is also capable of working with the standard mod php5 Apache module, but if you decide to go this route you will have to recompile your PHP lib manually from source, as in Debian this option is not possible with the default php library.
This installation worked fine on Debian Lenny, but I suppose the same installation should work fine on Debian Squeeze as well as Debian testing/unstable.
Feedback on the afore-described mod worker installation is very welcome!

Preserve Session IDs of Tomcat cluster behind Apache reverse proxy / Sticky sessions with mod_proxy and Tomcat

Wednesday, February 26th, 2014

apache_and_tomcat_merged_logo_prevent_sticky_sessions
Having a combination of an Apache webserver Reverse Proxy invisibly redirecting traffic to a number of Tomcat servers positioned in a DMZ is a classic task in the big-company Corporate world.
Hence, if you work for a company like IBM or HP, sooner or later you will need to configure an Apache webserver cluster with a few Jakarta Tomcat application servers running behind it. The scenario where a Java based application has to be accessed via Tomcat and requires login (authentication) relying on establishing and keeping a session ID is probably one of the most common ones, and if you do it for the first time you will probably end up with session ID issues. Session ID issues are hard to capture at first, as at first glimpse the application will seem to be working, but users will have to re-login all the time, even though the programmers might have coded the session to expire in 30 minutes or so.

… I mean, not having configured session ID preservation towards the Tomcats will cause random authentication session expiries, and users of the Tomcat app will be unable to normally access the application behind with authenticated credentials. The solution to this is known under the term "sticky sessions".
To configure sticky sessions you need to already have configured Apache(s) with the following minimum configuration:

  • enabled mod_proxy, proxy_balancer_module, proxy_http_module and or mod_proxy_ajp (in Apache config)

  LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_http_module modules/mod_proxy_http.so

  • And configured and tested Tomcats running an Application reachable via AJP protocol

The below example assumes there is a Reverse Proxy Load Balancer Apache which has to forward all traffic to 2 Tomcats. The config can easily be extended for as many as necessary by adding more BalancerMembers.

In the Apache webserver (apache2.conf / httpd.conf) you need to have JSESSIONID configured. This JSESSIONID is appended to each client request from the Reverse Proxy to the Tomcat servers, carrying the route of the Tomcat node on which the session was opened at first authentication, so that all follow-up requests land on the same node.

<Proxy balancer://mycluster>
BalancerMember ajp://10.16.166.53:11010/ route=delivery1
BalancerMember ajp://10.16.166.66:11010/ route=delivery2
</Proxy>

ProxyRequests Off
ProxyPass / balancer://mycluster/ stickysession=JSESSIONID
ProxyPassReverse / balancer://mycluster/

The two variables route=delivery1 and route=delivery2 are route identifiers which also have to be present in the Tomcat server configurations.
In the Tomcat App server First Node (server.xml):

<Engine name="Catalina" defaultHost="localhost" jvmRoute="delivery1">

In Tomcat App server Second Node (server.xml)

<Engine name="Catalina" defaultHost="localhost" jvmRoute="delivery2">

Once sticky sessions are configured, it is useful to be able to verify they work fine; this is possible through logging each established JSESSIONID. To do so, add in httpd.conf:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"\"%{JSESSIONID}C\"" combined

After the modifications, restart Apache and Tomcat to load the new configs. In the Apache access.log there should then be proof that sessions are preserved via JSESSIONID, with logs like:
 

127.0.0.1 - - [18/Sep/2013:10:02:02 +0800] "POST /examples/servlets/servlet/RequestParamExample HTTP/1.1" 200 662 "http://localhost/examples/servlets/servlet/RequestParamExample" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130807 Firefox/17.0""B80557A1D9B48EC1D73CF8C7482B7D46.server2"

127.0.0.1 - - [18/Sep/2013:10:02:06 +0800] "GET /examples/servlets/servlet/RequestInfoExample HTTP/1.1" 200 693 "http://localhost/examples/servlets/" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130807 Firefox/17.0""B80557A1D9B48EC1D73CF8C7482B7D46.server2"

That should solve problems with mysterious session expiries 🙂

How to resolve (fix) WordPress wp-cron.php errors like “POST /wp-cron.php?doing_wp_cron HTTP/1.0″ 404” / What is wp-cron.php and what it does

Monday, March 12th, 2012

fix wordpress wp-cron.php 404 HTTP error, what is wp-cron.php schedule logo

One of the WordPress websites hosted on our dedicated server produces all the time wp-cron.php 404 error messages like:

xxx.xxx.xxx.xxx - - [15/Apr/2010:06:32:12 -0600] "POST /wp-cron.php?doing_wp_cron HTTP/1.0

I did not know until recently what wp-cron.php does, so I checked in Google and read a bit. Many of the explanations I read were a bit unclear and don't give a good picture of what exactly wp-cron.php does. I wrote this post in the hope it will shed some more light on wp-cron.php and how this major 404 issue is solved.
So,

what is wp-cron.php doing?

 

  • wp-cron.php is acting like a cron scheduler for WordPress.
  • wp-cron.php is a wp file that controls routine actions for particular WordPress install.
  • Updates the data in the SQL database on every request, every day or every hour etc. (depending on how it's set up).
  • wp-cron.php executes automatically by default after EVERY PAGE LOAD!
  • Checks all pending comments for spam with Akismet (if akismet or anti-spam plugin alike is installed)
  • Sends all scheduled emails (e.g. sends a commenter an email when someone replies to his comment, sends newsletter subscribers emails etc.)
  • Posts scheduled articles online at the configured day and time

Suppose you're writing a new post and you want to take advantage of the WordPress functionality to schedule a post to appear online at a specific time:

What is wordpress wp-cron.php, Scheduling wordpress post screenshot

The Publish Immediately field's scheduled execution is carried out at the set time thanks to the periodic wp-cron.php invocation.

Another example of wp-cron.php operation is handling the flushing of old WP HTML caches generated by some wordpress caching plugin like W3 Total Cache.
wp-cron.php takes care of dozens of other things silently in the background. That's why many wordpress plugins depend heavily on wp-cron.php's proper periodic execution; if something is wrong with wp-cron.php, the wordpress based blog or website ends up partially working or not working at all.
 

Our company wp-cron.php errors case

In our case the:
212.235.185.131 - - [15/Apr/2010:06:32:12 -0600] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 404
is occurring in the Apache access.log (after each unique visitor request to wordpress!), because wp-cron.php is invoked on each new site visitor request.
This puts a "vain load" on the Apache server, which constantly attempts to invoke the script … always returning a not found 404 err.

As a consequence, the WP website experiences "weird" problems all the time. An illustration of a problem caused by the improper wp-cron.php execution is when we are adding new plugins to WP.

Let's say a new wordpress extension is downloaded, installed and enabled in order to add new useful functionality to the site.

Most of the time this new plugin would be malfunctioning if, for example, it is supposed to add some kind of new html form or change something on some or all of the wordpress generated HTML pages.
These troubles are a result of wp-cron.php's inability to update settings in the wp SQL database after each new user request to our site.
So the newly added plugin website functionality does not show up at all, until the WP cache directory is manually deleted with rm -rf /var/www/blog/wp-content/cache/

I don't know how this whole wp-cron.php mess occurred; however, my guess is whoever installed this wordpress messed something up in the install procedure.

Anyways, as I researched thoroughly, I read many people complaining of having experienced the same wp-cron.php 404 errs. As I read, most of the people's troubles were caused by their shared hosting prohibiting the wp-cron.php execution.
It appears many shared hosting providers choose to disable the default wordpress wp-cron.php execution. The reason is probably that the script puts heavy load on shared hosting servers and causes trouble with server overloads.

Anyhow, since our company server is a dedicated server, I can tell for sure that in our case wordpress had no restrictions on how and when wp-cron.php is invoked.
I've also seen some posts online claiming the wp-cron.php issues are caused by improper localhost records in /etc/hosts, but after a thorough examination I did not find any hosts problems:

hipo@debian:~$ grep -i 127.0.0.1 /etc/hosts
127.0.0.1 localhost.localdomain localhost

You can see from the above paste that our server's /etc/hosts has perfectly correct 127.0.0.1 records.

Changing the default way wp-cron.php is executed

As I've learned, it is generally a good idea for WordPress based websites which receive tens of thousands of visitors to alter the default way wp-cron.php is handled. Doing so will achieve some efficiency and improve server hardware utilization.
Invoking the script after each visitor request can put a heavy "useless" burden on the server CPU. On most wordpress based websites, the script does not need to make frequent changes in the DB, as new comments on posts do not happen that often, and big changes in the wordpress are not common in most installs out there.

Therefore, a good frequency to exec wp-cron.php for wordpress blogs getting only a couple of user comments per hour is a half-an-hour cron routine.

To disable automatic invocation of wp-cron.php after each visitor request, open /var/www/blog/wp-config.php and near line 30 or 40 put:

define('DISABLE_WP_CRON', true);

An important note to make here is that the position in wp-config.php where define('DISABLE_WP_CRON', true); is placed matters. If, for instance, you put it at the end of the file or near the end of the file, this setting will not take effect.
With that said, be sure to put the define somewhere among the file's initial defines, or it will not work.
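
A hedged sketch of the placement inside wp-config.php (the surrounding defines are shortened; the DB name is just a placeholder):

define('DB_NAME', 'blogdb');
// … the rest of the initial defines …
define('DISABLE_WP_CRON', true);

/* That's all, stop editing! Happy blogging. */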

Next, as the Apache non-root privileged user – let's say www-data, httpd or www, depending on the Linux distribution or BSD Unix type – add a php CLI line to invoke wp-cron.php every half an hour:

linux:~# crontab -u www-data -e

0,30 * * * * cd /var/www/blog; /usr/bin/php /var/www/blog/wp-cron.php 2>&1 >/dev/null

To assure the php CLI (Command Line Interface) interpreter is capable of properly interpreting wp-cron.php, check wp-cron.php for syntax errors with cmd:

linux:~# php -l /var/www/blog/wp-cron.php
No syntax errors detected in /var/www/blog/wp-cron.php

That's all, 404 wp-cron.php error messages will not appear anymore in access.log! 🙂

Just for those who can't find the root cause of the /wp-cron.php?doing_wp_cron HTTP/1.0" 404 and fix the issue in some other way (I'll be glad to know how?), there is also an external way to invoke wp-cron.php: a short cron job making a request directly to the webserver via wget or the lynx text browser.

– Here is how to call wp-cron.php every half an hour with lynx. Put inside any non-privileged user's crontab something like:

01,30 * * * * /usr/bin/lynx -dump "http://www.your-domain-url.com/wp-cron.php?doing_wp_cron" 2>&1 >/dev/null

– Call wp-cron.php every 30 mins with wget:

01,30 * * * * /usr/bin/wget -q "http://www.your-domain-url.com/wp-cron.php?doing_wp_cron"

Invoking wp-cron.php less frequently saves the server from processing wp-cron.php thousands of useless times.

The effect of altering the way wp-cron.php works should be seen immediately, as the server load should drop a bit.
Consider that you might need to play with the script exec frequency until you get the best-fit cron timing. In my company's case there are only up to 3 new articles posted a week, hence a too-high frequency of wp-cron.php invocations is useless.

For a blog where new posts occur once a day, a script schedule frequency of 6 to 12 hours should be OK.