Posts Tagged ‘text’

How to change default Text editor in Linux

Saturday, August 31st, 2019

Reading Time: 2 minutes

This is a very trivial question but, as I thought someone that is starting with Linux basics Operating might be interested I will shortly explain in this small article how to change default text editor on Linux.

Changing default text editor is especially helpful if you have to administer a newly purchased dedicated servers, that comes with default Operating System preinstalled.

By default many Linux distributions versions such as Debian / Ubuntu comes with nano comes with a default text editor nano (ANOther enhanced free Pico editor clone) many people as me are irritated and prefer to use instead vim (Vi Improved), mcedit (the Midnight Commander), joe or emacs as a default.
 

1. Changing default console text editor on Debian based Linux


On Debian / Ubuntu / Mint and other deb based distributions the easiest way to change text editor is with update-alternatives cmd.
 

update-alternatives –config editor


changing-default-text-editor-in-linux

Using Debian update-alternatives is useful as it makes the change OS global wide and the default mc viewer program mcview will also understand the change in the default text editor, which makes it the preferrable way to do it on deb based OS family.

An alternative way to set the default programs for the OS system wide is to create the respective symbolic link in /etc/alternatives actually what update-alternatives wrapper script does is exactly this it creates the required symlink.

2. Changing default text editor on any Linux


To change the text editor for only a single system existing user in /etc/passwd you need to edit $HOME/.bashrc, e.g. ~/.bashrc on Debian based Linux or on Fedora / RHEL / CentOS by adding to ~/.bash_profile
 

vim ~/.bashrc


And add

alias editor=vim

or

export EDITOR='/path/to/text-editor/program'
export VISUAL='/path/to/text-editor/program'

To change to mcedit for example when opening in any program that triggers to run default text editor

export EDITOR='/usr/bin/mcedit'
export VISUAL='/usr/bin/mcedit'


To make the change system wide on any Linux distribution you have to add the export EDITOR / export VISUAL at the end of /etc/bash.bashrc

To load the newly included .bashrc* instructions use source command
 

source ~/.bashrc

 

3. Changing the default text editor for mcview if all else fails

 

Once mc is running, use following menu keys order (also visible from Midnight Commander) menus:
 

    F9 Activates the top menu.
    o Selects the Option menu.
    c Opens the configuration dialog.
    i Toggles the use internal edit option.
    s Saves your preferences.

mcview-how-to-change-default-text-editor-screenshot

That's all folks Enjoy !

 

Email Linux alternative text console clients to Thunderbird, fetchmail, Mutt, fetchmail + Alpine how to

Saturday, November 4th, 2017

Reading Time: 6 minutes

linux-email-alternatives-for-text-console-email-fetching-gathering-alternative-to-thunderbird-and-evolution-howto

As a GNU / Linux user you might end up searching for the best email client to satisfy your needs, for those who used so far Outlook Express on M$ Windows first switch to GNU / Linux the most likely one to choose is either Mozilla Thunderbird or GNOME's Evolution default Mail Clientbut what more text / console programs are there that will allow you to easily check email via POP3 and IMAP on Linux?

 

1. Install Fetchmail and use to collect and copy your emails from remote server to your local machine
 

 SSL enabled POP3, APOP, IMAP mail gatherer/forwarder
 fetchmail is a free, full-featured, robust, and well-documented remote mail
 retrieval and forwarding utility intended to be used over on-demand TCP/IP
 links (such as SLIP or PPP connections).  It retrieves mail from remote mail
 servers and forwards it to your local (client) machine's delivery system, so
 it can then be read by normal mail user agents such as mutt, elm, pine,
 (x)emacs/gnus, or mailx.  The fetchmailconf package includes an interactive
 GUI configurator suitable for end-users.

To install it, issue:
 

apt-get install –yes fetchmail procmail


To configure fetchmail to gather your mail from your POP3 / IMAP mailbox, create below
.fetchmailrc configuratoin and modify according to your account

 

# vim .fetchmailrc

 

#### .fetchmailrc
 set daemon 600
 set logfile fetchmail.log

 poll the_mail_server_hostname proto POP3

  user "Remote_Username" pass "PASSWORD=" is "local_username" preconnect "date >> fetchmail.log"
 #ssl
  fetchall
  #no keep
  no rewrite
  mda "/usr/bin/procmail -f %F -d %T";


Here is also few words on each of the .fetchmailrc config options

set daemon 600 The fetchmail binary with run in the background in daemon mod and fetch mail from the server every 600 seconds or 10 minutes.

set logfile fetchmail.log This will set the directory and file name of the fetchmail user log file. Eveytime fetchmail recieves an email, checks the pop3 server or errors out you will find an entry here.

poll the_isp_mail_server proto POP3 This line tells fetchmail what mail server to contact, in theis case "the_isp_mail_server" and to use the "POP3" protocol.

user "remote_user_name" pass "PASSWORD" is "local_username" preconnect "date >> fetchmail.log The user directive tells fetchmail what the name of the user on the remote mail server is for example "remote_user_name". The pass directive is simply the password you will use for the remote user on the mail server. The "is" directive is optional. It tells fetchmail to deliver mail to a diferent user name if the user on the remote mail server and the local machine are different. For example, I may be using the name "joe.doe" on the mail server, but my local user name is "jdoe". I would use a line like user "joe.doe" pass "PASSWORD" is "jdoe". The preconnect command simply adds the current time and date to the fetchmail log file every time fetchmail checks for new mail.

ssl The "ssl" directive tells fetchmail to use encryption when connecting to the pop3 mail server. Fetchmail will use port 995 instead of port 110 for un-encypted mail communication. In order to use ssl the remote mail server must be able to use ssl. Comment out this directive if you do _not_ use pop3s.

fetchall Fetchall just means to fetch all of the mail on the mail server no matter what the "read" flag is. It is possibly to read mail through many different processes. If you use another mail client from another location, for example you could have read you mail and kept it ont he server, but marked it with the "read" flag. At this point if you did _not_ use the "fetchall" flag then only mail marked as new would be downloaded.

no keep Once the mail is downloded from the mail server fetchmail is to tell the server to remove it from the server. You may choose to comment this option out if you want to leave all mail on the server.

no rewrite Do not rewrite headers. When fetchmail recieves the mail you do not want any of the headers mangled in any way. This directive tells fetchmail to leave the headers alone.

mda "/usr/bin/procmail -f %F -d %T"; The mda is your "mail delivery agent. Procmail is the program that fetchmail will hand the mail off to once it is downloaded. The argument "-f %F" logs who the mail if from and "-d %T" turns on explicit delivery mode, delivery will be to the local user recipient.

For configuring multiple mailboxes email to be gathered to local machine through fetchmail add to above configuration, some more config similar to this:

 poll mail.example.com protocol pop3:
       username "admin" password "your-plain-text-password" is "username" here;
       username "what-ever-user-name" password "Just-another-pass#" is "foreman" here;

  poll mail.example.org protocol pop3 with option sslproto '':
       user "whatever-user1" password "its-my-pass" mda "/usr/bin/procmail -d %T":   user "whatever-user1" password "its-my-pass" mda "/usr/bin/procmail -d %T

 


Because as you can see fetchmail keeps password in plaintext it is a best security practice to set some good file permissions on .fetchmailrc just to make sure some other local user on the same Linux / Unix machine will not be able to read your plaintext password, to do so issue below command.
 

chmod 600 ~/.fetchmailrc

 

For the purpose of logging as we have it into the config you will also need to create new blank file fetchmail.log
 

touch fetchmail.log


Once fetchmail all your emails you can use mail command to view your messages or further configure alpine or mutt to read the downloaded messages.

 

2. Use Alpine text based email client to check your downloaded email with fetchmail
 

Alpine is Text-based email client, friendly for novices but powerful
 Alpine is an upgrade of the well-known PINE email client.  Its name derives
 from the use of the Apache License and its ties to PINE.

In other words what Alpine is it is a rewritten and improved version of the good old PINE Unix email client (for those who remember it).

To give alpine a try on Debian / Ubuntu install it with:

 

apt-get install –yes alpine pilot

 

Mutt-text-console-linux-email-client

 


3. Use MuTT advanced and much more colorful text email client to view your emailbox

mutt-text-email-client-logo-dog

 Mutt is a sophisticated text-based Mail User Agent. Some highlights:
 .
  * MIME support (including RFC1522 encoding/decoding of 8-bit message
    headers and UTF-8 support).
  * PGP/MIME support (RFC 2015).
  * Advanced IMAP client supporting SSL encryption and SASL authentication.
  * POP3 support.
  * ESMTP support.
  * Message threading (both strict and non-strict).
  * Keybindings are configurable, default keybindings are much like ELM;
    Mush and PINE-like ones are provided as examples.
  * Handles MMDF, MH and Maildir in addition to regular mbox format.
  * Messages may be (indefinitely) postponed.
  * Colour support.
  * Highly configurable through easy but powerful rc file.

 

To install MuTT:

 

linux:~# apt-get install –yes mutt

Configuring mutt if you don't have priorly set-up with fetchmail to collect your remote e-mails, you might want to try out .mutt's email fetch features to do so you will need a .muttrc configuration like that:
 

# Automatically log in to this mailbox at startup
set spoolfile="imaps://User_Name:Your-Secret-Password@mail.example.com/"
# Define the = shortcut, and the entry point for the folder browser (c?)
set folder="imaps://mail.example.com/"
set record="=Sent"
set postponed="=Drafts"

You might also omit placing the password inside .muttrc configuration as storing the password in plaintext might be a big security hole if someone is able to read it at certain point, but the downside of that is you'll be asked by mutt to fill in your email password on every login which at a point becomes pretty annoying.
 

If you face problems with inability of mutt to connect to remote mail server due to TLS problems, you can also try to play with below configurations:
 

# activate TLS if available on the server
 set ssl_starttls=yes
 # always use SSL when connecting to a server
 set ssl_force_tls=yes
 # Don't wait to enter mailbox manually
 unset imap_passive        
 # Automatically poll subscribed mailboxes for new mail (new in 1.5.11)
 set imap_check_subscribed
 # Reduce polling frequency to a sane level
 set mail_check=60
 # And poll the current mailbox more often (not needed with IDLE in post 1.5.11)
 set timeout=10
 # keep a cache of headers for faster loading (1.5.9+?)
 set header_cache=~/.hcache
 # Display download progress every 5K
 set net_inc=5
 

 

Once you have the emails downloaded with fetchmail for your mailbox mutt should be showing your email stuff like in below screenshot
 

linux:~$ mutt

 

 

Mutt-text-console-linux-email-client

Of course a very handy thing to have is w3m-img text browser that displays images as it might be able to open your pictures attached to email if you're on a physical console tty.

I'll be curious to hear, if you know of better and easier solutions to check mail in console, so if you know such please drop a comment explaining how you check your mail text.

 

How to check who is flooding your Apache, NGinx Webserver – Real time Monitor statistics about IPs doing most URL requests and Stopping DoS attacks with Fail2Ban

Wednesday, August 20th, 2014

Reading Time: 6 minutes

check-who-is-flooding-your-apache-nginx-webserver-real-time-monitoring-ips-doing-most-url-requests-to-webserver-and-protecting-your-webserver-with-fail2ban

If you're Linux ystem administrator in Webhosting company providing WordPress / Joomla / Drupal web-sites hosting and your UNIX servers suffer from periodic denial of service attacks, because some of the site customers business is a target of competitor company who is trying to ruin your client business sites through DoS or DDOS attacks, then the best thing you can do is to identify who and how is the Linux server being hammered. If you find out DoS is not on a network level but Apache gets crashing because of memory leaks and connections to Apache are so much that the CPU is being stoned, the best thing to do is to check which IP addresses are causing the excessive GET / POST / HEAD requests in logged.
 

There is the Apachetop tool that can give you the most accessed webserver URLs in a refreshed screen like UNIX top command, however Apachetop does not show which IP does most URL hits on Apache / Nginx webserver. 

 

1. Get basic information on which IPs accesses Apache / Nginx the most using shell cmds


Before examining the Webserver logs it is useful to get a general picture on who is flooding you on a TCP / IP network level, with netstat like so:
 

# here is howto check clients count connected to your server
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n


If you get an extensive number of connected various IPs / hosts (like 10000 or something huge as a number), depending on the type of hardware the server is running and the previous scaling planned for the system you can determine whether the count as huge as this can be handled normally by server, if like in most cases the server is planned to serve a couple of hundreds or thousands of clients and you get over 10000 connections hanging, then your server is under attack or if its Internet server suddenly your website become famous like someone posted an article on some major website and you suddenly received a tons of hits.


There is a way using standard shell tools, to get some basic information on which IP accesses the webserver the most with:

tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr

Or if you want to keep it refreshing periodically every few seconds run it through watch command:

watch "tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr"

monioring-access-hits-to-webserver-by-ip-show-most-visiting-apache-nginx-ip-with-shell-tools-tail-cut-uniq-sort-tools-refreshed-with-watch-cmd


Another useful combination of shell commands is to Monitor POST / GET / HEAD requests number in access.log :
 

 awk '{print $6}' access.log | sort | uniq -c | sort -n

     1 "alihack<%eval
      1 "CONNECT
      1 "fhxeaxb0xeex97x0fxe2-x19Fx87xd1xa0x9axf5x^xd0x125x0fx88x19"x84xc1xb3^v2xe9xpx98`X'dxcd.7ix8fx8fxd6_xcdx834x0c"
      1 "x16x03x01"
      1 "xe2
      2 "mgmanager&file=imgmanager&version=1576&cid=20
      6 "4–"
      7 "PUT
     22 "–"
     22 "OPTIONS
     38 "PROPFIND
   1476 "HEAD
   1539 "-"
  65113 "POST
 537122 "GET


However using shell commands combination is plenty of typing and hard to remember, plus above tools does not show you, approximately how frequenty IP hits the webserver

 

2. Real-time monitoring IP addresses with highest URL reqests with logtop

 


Real-time monitoring on IP addresses with highest URL requests is possible with no need of "console ninja skills"  through – logtop.

 

2.1 Install logtop on Debian / Ubuntu and deb derivatives Linux

 


a) Installing Logtop the debian way

LogTop is easily installable on Debian and Ubuntu in newer releases of Debian – Debian 7.0 and Ubuntu 13/14 Linux it is part of default package repositories and can be straightly apt-get-ed with:

apt-get install –yes logtop

b) Installing Logtop from source code (install on older deb based Linuxes)

On older Debian – Debian 6 and Ubuntu 7-12 servers to install logtop compile from source code – read the README installation instructions or if lazy copy / paste below:

cd /usr/local/src
wget https://github.com/JulienPalard/logtop/tarball/master
mv master JulienPalard-logtop.tar.gz
tar -zxf JulienPalard-logtop.tar.gz

cd JulienPalard-logtop-*/
aptitude install libncurses5-dev uthash-dev

aptitude install python-dev swig

make python-module

python setup.py install

make

make install

 

mkdir -p /usr/bin/
cp logtop /usr/bin/


2.2 Install Logtop on CentOS 6.5 / 7.0 / Fedora / RHEL and rest of RPM based Linux-es

b) Install logtop on CentOS 6.5 and CentOS 7 Linux

– For CentOS 6.5 you need to rpm install epel-release-6-8.noarch.rpm
 

wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
links http://dl.fedoraproject.org/pub/epel/6/SRPMS/uthash-1.9.9-6.el6.src.rpm
rpmbuild –rebuild
uthash-1.9.9-6.el6.src.rpm
cd /root/rpmbuild/RPMS/noarch
rpm -ivh uthash-devel-1.9.9-6.el6.noarch.rpm


– For CentOS 7 you need to rpm install epel-release-7-0.2.noarch.rpm

 

links http://download.fedoraproject.org/pub/epel/beta/7/x86_64/repoview/epel-release.html
 

Click on and download epel-release-7-0.2.noarch.rpm

rpm -ivh epel-release-7-0.2.noarch
rpm –import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
yum -y install git ncurses-devel uthash-devel
git clone https://github.com/JulienPalard/logtop.git
cd logtop
make
make install

 

2.3 Some Logtop use examples and short explanation

 

logtop shows 4 columns as follows – Line number, Count, Frequency, and Actual line

 

The quickest way to visualize which IP is stoning your Apache / Nginx webserver on Debian?

 

tail -f access.log | awk {'print $1; fflush();'} | logtop

 

 

logtop-check-which-ip-is-making-most-requests-to-your-apache-nginx-webserver-linux-screenshot

On CentOS / RHEL

tail -f /var/log/httpd/access_log | awk {'print $1; fflush();'} | logtop

 

Using LogTop even Squid Proxy caching server access.log can be monitored.
To get squid Top users by IP listed:

 

tail -f /var/log/squid/access.log | awk {'print $1; fflush();'} | logtop


logtop-visualizing-top-users-using-squid-proxy-cache
 

Or you might visualize in real-time squid cache top requested URLs
 

tail -f /var/log/squid/access.log | awk {'print $7; fflush();'} | logtop


visualizing-top-requested-urls-in-squid-proxy-cache-howto-screenshot

 

3. Automatically Filter IP addresses causing Apache / Nginx Webservices Denial of Service with fail2ban
 

Once you identify the problem if the sites hosted on server are target of Distributed DoS, probably your best thing to do is to use fail2ban to automatically filter (ban) IP addresses doing excessive queries to system services. Assuming that you have already installed fail2ban as explained in above link (On Debian / Ubuntu Linux) with:
 

apt-get install –yes fail2ban


To make fail2ban start filtering DoS attack IP addresses, you will have to set the following configurations:
 

vim /etc/fail2ban/jail.conf


Paste in file:
 

[http-get-dos]
 
enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/apache2/WEB_SERVER-access.log
# maxretry is how many GETs we can have in the findtime period before getting narky
maxretry = 300
# findtime is the time period in seconds in which we're counting "retries" (300 seconds = 5 mins)
findtime = 300
# bantime is how long we should drop incoming GET requests for a given IP for, in this case it's 5 minutes
bantime = 300
action = iptables[name=HTTP, port=http, protocol=tcp]


Before you paste make sure you put the proper logpath = location of webserver (default one is /var/log/apache2/access.log), if you're using multiple logs for each and every of hosted websites, you will probably want to write a script to automatically loop through all logs directory get log file names and automatically add auto-modified version of above [http-get-dos] configuration. Also configure maxtretry per IP, findtime and bantime, in above example values are a bit low and for heavy loaded websites which has to serve thousands of simultaneous connections originating from office networks using Network address translation (NAT), this might be low and tuned to prevent situations, where even the customer of yours can't access there websites 🙂

To finalize fail2ban configuration, you have to create fail2ban filter file:

vim /etc/fail2ban/filters.d/http-get-dos.conf


Paste:
 

# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]
 
# Option: failregex
# Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.
# You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.
 
failregex = ^<HOST> -.*"(GET|POST).*
 
# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =


To make fail2ban load new created configs restart it:
 

/etc/init.d/fail2ban restart


If you want to test whether it is working you can use Apache webserver Benchmark tools such as ab or siege.
The quickest way to test, whether excessive IP requests get filtered – and make your IP banned temporary:
 

ab -n 1000 -c 20 http://your-web-site-dot-com/

This will make 1000 page loads in 20 concurrent connections and will add your IP to temporary be banned for (300 seconds) = 5 minutes. The ban will be logged in /var/log/fail2ban.log, there you will get smth like:

2014-08-20 10:40:11,943 fail2ban.actions: WARNING [http-get-dos] Ban 192.168.100.5
2013-08-20 10:44:12,341 fail2ban.actions: WARNING [http-get-dos] Unban 192.168.100.5


How to disable tidy HTML corrector and validator to output error and warning messages

Sunday, March 18th, 2012

Reading Time: 2 minutes

I've noticed in /var/log/apache2/error.log on one of the Debian servers I manage a lot of warnings and errors produced by tidy HTML syntax checker and reformatter program.

There were actually quite plenty frequently appearing messages in the the log like:

...
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!
...

I did a quick investigation on where from this messages are logged in error.log, and discovered few .php scripts in one of the websites containing the tidy string.
I used Linux find + grep cmds find in all php files the "tidy "string, like so:

server:~# find . -iname '*.php'-exec grep -rli 'tidy' '{}' ;
find . -iname '*.php' -exec grep -rli 'tidy' '{}' ; ./new_design/modules/index.mod.php
./modules/index.mod.php
./modules/index_1.mod.php
./modules/index1.mod.php

Opening the files, with vim to check about how tidy is invoked, revealed tidy calls like:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

As you see the PHP programmers who wrote this website, made a bigtidy mess. Instead of using php5's tidy module, they hard coded tidy external command to be invoked via php's exec(); external tidy command invocation.
This is extremely bad practice, since it spawns the command via a pseudo limited apache shell.
I've notified about the issue, but I don't know when, the external tidy calls will be rewritten.

Until the external tidy invocations are rewritten to use the php tidy module, I decided to at least remove the tidy warnings and errors output.

To remove the warning and error messages I've changed:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

exec('/usr/bin/tidy --show-warnings no --show-errors no -q -e -ashtml -utf8 '.$tmp_name,$rett);

The extra switches meaning is like so:

q – instructs tidy to produce quiet output
-e – show only errors and warnings
–show warnings no && –show errors no, completely disable warnings and error output

Onwards tidy no longer logs junk messages in error.log Not logging all this useless warnings and errors has positive effect on overall server performance especially, when the scripts, running /usr/bin/tidy are called as frequently as 1000 times per sec. or more


Block Web server over loading Bad Crawler Bots and Search Engine Spiders with .htaccess rules

Monday, September 18th, 2017

Reading Time: 6 minutes

howto-block-webserver-overloading-bad-crawler-bots-spiders-with-htaccess-modrewrite-rules-file

In last post, I've talked about the problem of Search Index Crawler Robots aggressively crawling websites and how to stop them (the article is here) explaning how to raise delays between Bot URL requests to website and how to completely probhit some bots from crawling with robots.txt.

As explained in article the consequence of too many badly written or agressive behaviour Spider is the "server stoning" and therefore degraded Web Server performance as a cause or even a short time Denial of Service Attack, depending on how well was the initial Server Scaling done.

The bots we want to filter are not to be confused with the legitimate bots, that drives real traffic to your website, just for information

 The 10 Most Popular WebCrawlers Bots as of time of writting are:
 

1. GoogleBot (The Google Crawler bots, funnily bots become less active on Saturday and Sundays :))

2. BingBot (Bing.com Crawler bots)

3. SlurpBot (also famous as Yahoo! Slurp)

4. DuckDuckBot (The dutch search engine duckduckgo.com crawler bots)

5. Baiduspider (The Chineese most famous search engine used as a substitute of Google in China)

6. YandexBot (Russian Yandex Search engine crawler bots used in Russia as a substitute for Google )

7. Sogou Spider (leading Chineese Search Engine launched in 2004)

8. Exabot (A French Search Engine, launched in 2000, crawler for ExaLead Search Engine)

9. FaceBot (Facebook External hit, this crawler is crawling a certain webpage only once the user shares or paste link with video, music, blog whatever  in chat to another user)

10. Alexa Crawler (la_archiver is a web crawler for Amazon's Alexa Internet Rankings, Alexa is a great site to evaluate the approximate page popularity on the internet, Alexa SiteInfo page has historically been the Swift Army knife for anyone wanting to quickly evaluate a webpage approx. ranking while compared to other pages)

Above legitimate bots are known to follow most if not all of W3C – World Wide Web Consorium (W3.Org) standards and therefore, they respect the content commands for allowance or restrictions on a single site as given from robots.txt but unfortunately many of the so called Bad-Bots or Mirroring scripts that are burning your Web Server CPU and Memory mentioned in previous article are either not following /robots.txt prescriptions completely or partially.

Hence with the robots.txt unrespective bots, the case the only way to get rid of most of the webspiders that are just loading your bandwidth and server hardware is to filter / block them is by using Apache's mod_rewrite through

 

.htaccess


file

Create if not existing in the DocumentRoot of your website .htaccess file with whatever text editor, or create it your windows / mac os desktop and transfer via FTP / SecureFTP to server.

I prefer to do it directly on server with vim (text editor)

 

 

vim /var/www/sites/your-domain.com/.htaccess

 

RewriteEngine On

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

SetEnvIfNoCase User-Agent "^Black Hole” bad_bot
SetEnvIfNoCase User-Agent "^Titan bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^NetMechanic" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase User-Agent "^Crescent" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bad_bot
SetEnvIfNoCase User-Agent "^CheeseBot" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^TeleportPro" bad_bot
SetEnvIfNoCase User-Agent "^MIIxpc" bad_bot
SetEnvIfNoCase User-Agent "^Telesoft" bad_bot
SetEnvIfNoCase User-Agent "^Website Quester" bad_bot
SetEnvIfNoCase User-Agent "^WebZip" bad_bot
SetEnvIfNoCase User-Agent "^moget/2.1" bad_bot
SetEnvIfNoCase User-Agent "^WebZip/4.0" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^Mister PiX" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^TheNomad" bad_bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bad_bot
SetEnvIfNoCase User-Agent "^RMA" bad_bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^httplib" bad_bot
SetEnvIfNoCase User-Agent "^turingos" bad_bot
SetEnvIfNoCase User-Agent "^spanner" bad_bot
SetEnvIfNoCase User-Agent "^InfoNaviRobot" bad_bot
SetEnvIfNoCase User-Agent "^Harvest/1.5" bad_bot
SetEnvIfNoCase User-Agent "Bullseye/1.0" bad_bot
SetEnvIfNoCase User-Agent "^Mozilla/4.0 (compatible; BullsEye; Windows 95)" bad_bot
SetEnvIfNoCase User-Agent "^Crescent Internet ToolPak HTTP OLE Control v.1.0" bad_bot
SetEnvIfNoCase User-Agent "^CherryPickerSE/1.0" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker /1.0" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit/3.50" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control – 5.01.4511" bad_bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bad_bot
SetEnvIfNoCase User-Agent "^Foobot" bad_bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial/1.34" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^Wget/1.6" bad_bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control – 6.00.8169" bad_bot
SetEnvIfNoCase User-Agent "^URLy Warning" bad_bot
SetEnvIfNoCase User-Agent "^Wget/1.5.3" bad_bot
SetEnvIfNoCase User-Agent "^LinkWalker" bad_bot
SetEnvIfNoCase User-Agent "^cosmos" bad_bot
SetEnvIfNoCase User-Agent "^moget" bad_bot
SetEnvIfNoCase User-Agent "^hloader" bad_bot
SetEnvIfNoCase User-Agent "^humanlinks" bad_bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bad_bot
SetEnvIfNoCase User-Agent "^Offline Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Mata Hari" bad_bot
SetEnvIfNoCase User-Agent "^LexiBot" bad_bot
SetEnvIfNoCase User-Agent "^Web Image Collector" bad_bot
SetEnvIfNoCase User-Agent "^The Intraformant" bad_bot
SetEnvIfNoCase User-Agent "^True_Robot/1.0" bad_bot
SetEnvIfNoCase User-Agent "^True_Robot" bad_bot
SetEnvIfNoCase User-Agent "^BlowFish/1.0" bad_bot
SetEnvIfNoCase User-Agent "^JennyBot" bad_bot
SetEnvIfNoCase User-Agent "^MIIxpc/4.2" bad_bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bad_bot
SetEnvIfNoCase User-Agent "^ProPowerBot/2.14" bad_bot
SetEnvIfNoCase User-Agent "^BackDoorBot/1.0" bad_bot
SetEnvIfNoCase User-Agent "^toCrawl/UrlDispatcher" bad_bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bad_bot
SetEnvIfNoCase User-Agent "^TightTwatBot" bad_bot
SetEnvIfNoCase User-Agent "^suzuran" bad_bot
SetEnvIfNoCase User-Agent "^VCI WebViewer VCI WebViewer Win32" bad_bot
SetEnvIfNoCase User-Agent "^VCI" bad_bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4" bad_bot
SetEnvIfNoCase User-Agent "^QueryN Metasearch" bad_bot
SetEnvIfNoCase User-Agent "^Openfind data gathere" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase User-Agent "^Xenu’s Link Sleuth 1.1c" bad_bot
SetEnvIfNoCase User-Agent "^Xenu’s" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot
SetEnvIfNoCase User-Agent "^RepoMonkey Bait & Tackle/v1.01" bad_bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bad_bot
SetEnvIfNoCase User-Agent "^Zeus 32297 Webster Pro V2.9 Win32" bad_bot
SetEnvIfNoCase User-Agent "^Webster Pro" bad_bot
SetEnvIfNoCase User-Agent "^EroCrawler" bad_bot
SetEnvIfNoCase User-Agent "^LinkScan/8.1a Unix" bad_bot
SetEnvIfNoCase User-Agent "^Keyword Density/0.9" bad_bot
SetEnvIfNoCase User-Agent "^Kenjin Spider" bad_bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot

 

<Limit GET POST>
order allow,deny
allow from all
Deny from env=bad_bot
</Limit>

 


Above rules are Bad bots prohibition rules have RewriteEngine On directive included however for many websites this directive is enabled directly into VirtualHost section for domain/s, if that is your case you might also remove RewriteEngine on from .htaccess and still the prohibition rules of bad bots should continue to work
Above rules are also perfectly suitable wordpress based websites / blogs in case you need to filter out obstructive spiders even though the rules would work on any website domain with mod_rewrite enabled.

Once you have implemented above rules, you will not need to restart Apache, as .htaccess will be read dynamically by each client request to Webserver

2. Testing .htaccess Bad Bots Filtering Works as Expected


In order to test the new Bad Bot filtering configuration is working properly, you have a manual and more complicated way with lynx (text browser), assuming you have shell access to a Linux / BSD / *Nix computer, or you have your own *NIX server / desktop computer running
 

Here is how:
 

 

lynx -useragent="Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)" -head -dump http://www.your-website-filtering-bad-bots.com/

 

 

Note that lynx will provide a warning such as:

Warning: User-Agent string does not contain "Lynx" or "L_y_n_x"!

Just ignore it and press enter to continue.

Two other use cases with lynx, that I historically used heavily is to pretent with Lynx, you're GoogleBot in order to see how does Google actually see your website?
 

  • Pretend with Lynx You're GoogleBot

 

lynx -useragent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -head -dump http://www.your-domain.com/

 

 

  • How to Pretend with Lynx Browser You are GoogleBot-Mobile

 

lynx -useragent="Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" -head -dump http://www.your-domain.com/

 


Or for the lazy ones that doesn't have Linux / *Nix at disposal you can use WannaBrowser website

Wannabrowseris a web based browser emulator which gives you the ability to change the User-Agent on each website req1uest, so just set your UserAgent to any bot browser that we just filtered for example set User-Agent to CheeseBot

The .htaccess rule earier added once detecting your browser client is coming in with the prohibit browser agent will immediately filter out and you'll be unable to access the website with a message like:
 

HTTP/1.1 403 Forbidden

 

Just as I've talked a lot about Index Bots, I think it is worthy to also mention three great websites that can give you a lot of Up to Date information on exact Spiders returned user-agent, common known Bot traits as well as a a current updated list with the Bad Bots etc.

Bot and Browser Resources information user-agents, bad-bots and odd Crawlers and Bots specifics

1. botreports.com
2. user-agents.org
3. useragentapi.com

 

An updated list with robots user-agents (crawler-user-agents) is also available in github here regularly updated by Caia Almeido

There are also a third party plugin (modules) available for Website Platforms like WordPress / Joomla / Typo3 etc.

Besides the listed on these websites as well as the known Bad and Good Bots, there are perhaps a hundred of others that might end up crawling your webdsite that might or might not need  to be filtered, therefore before proceeding with any filtering steps, it is generally a good idea to monitor your  HTTPD access.log / error.log, as if you happen to somehow mistakenly filter the wrong bot this might be a reason for Website Indexing Problems.

Hope this article give you some valueable information. Enjoy ! 🙂

 


The Church feast of St. Tsar (King) Boris Mihail (Michael) I – Baptizer of Bulgaria – 2nd of May a Triumph for Bulgarian Autocephalous Church and Church Slavonic

Monday, May 2nd, 2011

Reading Time: 5 minutes

saint Tsar Boris the baptizer of all Bulgaria

Today on Second of May every year in the Bulgarian Orthodox Church calendar we commemorate the memory of Saint King Boris (Michael) who is a baptizer of Bulgarian nation and among the greatest saint of Bulgaria, perhaps second in saintship after St. John of Rila.

St. King Boris was the ruler of the First bulgarian Empire (852-889) the actual title as we know it from the chronicles is not king but archont which literally means ruler, but as the main chronicles who came to us are Byzantine we have to take in consideration that Byzantine Chronicles aimed to discredit the Bulgarian rulers because of feeling inferior every non-Byzantines who in Byzantine terms were considered barbarians.

He had a notable correspondence with Pope Nicolas II head of the Western Church before the Great Schism.
Saint Boris I is famous with with 115 Questions (which are fully preserved). He  beseech the pope to sent an apostoles of faith who will teach the new nation to faith in the same time he lead a correspondence with the Byzantine emperor and the Eastern Church.

The pope sent 2 missionaries cardinal Formiosa of Portuens and Bishop Paul of Papulon. King Boris requested for Cardinal Formiosa to become a future archibishop of Bulgaria, but the pope Nicolas II being afraid of loosing ground of a choice of future Bulgarian archibishop rejected king Boris request it is interesting fact that the same cardinal Formiosa become the next Pope of the Roman Western Church in period (891 – 896).

Taking in consideration the papal's desire to dominate over him and the more freedoms given by the Byzantine Church along with the fact that Saint Cyril and Methodius's pupils came to him with a ready Church Slavonic Glagolic version of the Holy Bible and a new Church language (and considering the fact that the seven pupils of St. Cyril and Methodius were being chased away from Great Moravia (Saint Clement of Ohrid (Kliment Ohridski) and Naum Ohrdiski with Sava, Angelarius and Gorazd) and seeing the fallacy of the filioque addition to the Creed of Faith finally King Boris received baptism by the hands of Patriarch Photius.

He received holy baptismal in year 864, receiving the Christian name Michael (Mihail) receiving his Christian name from his godfather, Emperor Michael III.

According to Church tradition the reason to Baptize in Christian faith was his amazement of an icon of the Judgement Day he saw in one of his visits to Constantinople.

The Church tradition also says saint Boris I's sister have lived for a long time in Constantinople, where she received baptism and once returning to the Bulgaria she bring the light of faith here too.

Besides that some of his family has already earlier converted to Christianity and some earlier Bulgarian Khans such as Trivelius (Tervel) who is considered to saved Europe from Muslim Invasion earlier were already Christians.

King_Boris I bapttish of Byzantine Emperor Michael III

During his holy reign he has established mass Christianization of Bulgaria, where the traditional ancient pagan traditions and belief in fake gods like Tangra were abolished completely.

St. Tsar Boris has secured the Bulgarian Church an autocephalousy, he also received the saints Cyril and Methodius, when they were banished from Great Moravia.

Our saint king has secured a refuge for st. Cyril and Methodius and provided them with assistance to develop the Slavonic alphabet and literature.

After he abdicated in 889, his eldest son and successor tried to restore the old pagan religion but was deposed by Boris I. During the Council of Preslav which followed that event, the Byzantine clergy was replaced with Bulgarian and the Greek language was replaced with Old Bulgarian as an official language of the Church and the state.

Bulgaria_under_rule_of_Boris_I-st-the-baptizer-king-of-Bulgaria

In 889 Boris abdicated the throne and became a monk. His son and successor Vladimir attempted a pagan reaction (to return the Bulgarians to the old belief in Tanra), which brought Boris out of his monastic life in 893 out of zeal for Christ and a fatherhood for the future of Bulgarian nation.

Vladimir was defeated and blinded by a miracle of God as Boris had less soldiers than his pagan son but in the name of the Lord Jesus Christ he showed victorious.

Once defeating his unbelieving son, Boris gathered the Council of Preslav in which the Old Greek language in Church was replaced with Church Slavonic and most imporantly on the council King Boris on the Church assembly placed his third son, Simeon I of Bulgaria on the throne, threatening him with the same fate if he too apostatized.

Simeon I was reluctant to become a secular ruler as according to some of the written sources he was preparing to become a great spiritual leader a monk and perhaps a next archibishop or even patriarch in the newly established autocephalous Bulgarian Church but following the monk rule of humility he decided and accepted the throne but throughout his life he kept his great zeal for monasticism and enlightenment. His ruling become a golden age for the Church Slavonic he financed seriously and worked hardly to prepare educated monks and to translate a major works of the Holy Fathers such as Shestodnev (The Six Days of Creation), Simeonov Sbornik (Simeon's Collection) and many of the key works of saint Athanasius the Great, Saint Basyl the Great, Saint John of Damascus etc.

According to some historians it was King Simeon who later give birth to the second by size Monastic Community in Byzantine Emperire the Stone Monasteries of (Meteora).

Boris returned to his monastery, emerging once again in c. 895 to help Simeon fight the Magyars, who had invaded Bulgaria in alliance with the Byzantines. After the passing of this crisis, Boris resumed monastic life and led a holy life until he pass out in year 907.

Here is the daily troparion assigned by the Bulgarian Orthodox Church sang at the churches today (the text is translation from Bulgarian):

St. Boris-Michael, prince of Bulgaria, Troparion

Full of the fear of God, and enlightened by holy baptism, thou becamest a habitation of the Holy Spirit,
O right-believing King Boris; and having established the Orthodox Faith in the land of Bulgaria,
and set aside the scepter of kingship,
thou madest thine abode in the wil derness,
didst flourish in ascetic struggles, and found grace before the Lord.
And now, standing before the throne of the Most High, pray thou, that He grant unto us who entreat thee salva tion for our souls.


Windows 7 fix menu messed up cyrillic – How to fix cyrillic text in Windows

Friday, July 22nd, 2016

Reading Time: 2 minutes

faststone-viewer-messed-up-menu-cyrillic-windows-7-screenshot

How to fix Cyrillic text on Windows 7

I've reinstalled my HP provided company work notebook with Windows 7 Enterprise x86 and had troubles with seeing Cyrillic written text, letters and fonts.
The result after installing some programs and selecting as a default language Bulgarian during installation setup prompt let me to see in some programs and in some of my old written text file names and Cyrillic WIN CP1251 content to be showing a cryptic letters like in above screenshot.

If you're being curious what is causing the broken encoding cyrillic text, it is the fact that in past a lot of cyrillic default encoding was written in KOI-8R and WIN-CP1251 encoding which is not unicode e.g. not compatible with the newer standard encoding for cyrillic UTF-8. Of course the authors of some old programs and documents are not really responsbie for the messed up cyrillic as noone expected that every Cyrillic text will be in UTF-8 in newer times.

Thanksfully there is a way to fix the unreadable / broken encoding cyrillic text by:

Going too menus:

Start menu -> Control Panel -> Change display language -> Clock, Language and Religion

Once there click the Administratibe tab

and choose

Change system locale.

windows-7_administrative_tab_change-system-locale

Here if you're not logged in with administrator user you will be prompted for administrative privileges.

select-system-locale-choose-bulgarian-and-hit-ok-windows-7

Being there choose your language (country) to be:

Bulgarian (Bulgaria) – if you're like me a Bulgarian or Russian (if you're Russian / Belarusian / Ukrainian) or someone from the countries of ex-USSR.
Click OK

And reboot (restart) your computer in order to make the new settings active.
 

This should be it from now on all cyrillic letters in all programs / documents and file names on your PC should visualize fine just as it was intended more or less by the cyrillic assumed creator Saint Climent Ohridski who was a  who reformed Cyrillic from Glagolic alphabet.


Word 2011 Check spelling for Mac OS X 2011 – Word check text in Mac OS X Office

Monday, February 29th, 2016

Reading Time: 2 minutes

office-for-mac-2011-logo
 

If you happen to be running Mac OS X powered notebook and have recently installed Microsoft Office 2011 for Mac OS because you used to migrate from a Windows PC, you will probably suprrised that your Native Language Dictionary check you used heavily on Windows might be not performing on Mac.
This was exactly the case with my wife Svetlana and as she is not a computer expert and I'm the IT support at home I had to solve it somehow.

Luckily Office 2011 for Mac OS X which I have installed earlier comes with plenty of foreign-languages such as Russian, Bulgarian, Czech, French, German, UK English, US English etc.
Proofing tools is very handy especially for people like my wife who is natively Belarusian and is in process of learning Bulgarian, thus often in need to check Bulgarian words spelling.

By Default the Check spelling on Office package was set to English, there is a quick way to change this to a certain text without changing the check-spelling default from English, the key shortcut to use is:

I. Press Mac command (key) + A – To select All text in opened document (in our case text was in Bulgarian)

command-a-mac-os-x

Click Word window menu and:
 

Choose Tools→Language


and select Bulgarian (or whatever language you need check spelling for.

If you need to change the Language default for all time, again you can do it from Tools

 

 

 Tools→Language

 

language-menu-on-mac-os-x-microsoft-word-office-2011-package-screenshot

II. The Language dialog will appear and you'll see a list of languages to choose. 

select-language-menu-screenshot-ms-word-2011-on-mac-os-x

III. Next a Pop-up Dialog will ask you whether you're sure you want to change the default language to the language of choice in my case this was Russian.

language-default-dialog-mac-osx-word-2011-office-on-mac-change-default-spelling-language

That's it check spelling default will be aplpied now to Word normal template, so next time you open a document your default spelling choice will set

How to search text strings only in hidden files dot (.) files within a directory on Linux and FreeBSD

Saturday, April 28th, 2012

Reading Time: 2 minutes

how-to-search-hidden-files-linux-freebsd-logo_grep
If there is necessity to look for a string in all hidden files with all sub-level subdirectories (be aware this will be time consuming and CPU stressing) use:
 

hipo@noah:~$ grep -rli 'PATH' .*

./.gftp/gftprc
./.gftp/cache/cache.OOqZVP
….

Sometimes its necessery to only grep for variables within the first-level directories (lets say you would like to grep a 'PATH' variable set, string within the $HOME directory, the command is:

hipo@noah:~$ grep PATH .[!.]*

.profile:PATH=/bin:/usr/bin/:${PATH}
.profile:export PATH
.profile:# set PATH so it includes user's private bin if it exists
.profile: PATH="$HOME/bin:$PATH"
.profile.language-env-bak:# set PATH so it includes user's private bin if it exists
.profile.language-env-bak: PATH="$HOME/bin:$PATH"
.viminfo:?/PATH.xcyrillic: XNLSPATH=/usr/X11R6/lib/X11/nls
.xcyrillic: export XNLSPATH

The regular expression .[!.]*, means exclude any file or directory name starting with '..', e.g. match only .* files

Note that to use the grep PATH .[!.]* on FreeBSD you will have to use this regular expression in bash shell, the default BSD csh or tsch shells will not recognize the regular expression, e.g.:

grep PATH '.[!.]*'
grep: .[!.]*: No such file or directory

Hence on BSD, if you need to look up for a string within the home directory, hidden files: .profile .bashrc .bash_profile .cshrc run it under bash shell:

freebsd# /usr/local/bin/bash
[root@freebsd:/home/hipo]# grep PATH .[!.]*

.bash_profile:# set PATH so it includes user's private bin if it exists
.bash_profile:# PATH=~/bin:"${PATH}"
.bash_profile:# do the same with …

Another easier to remember, alternative grep cmd is:

hipo@noah:~$ grep PATH .*
.profile:PATH=/bin:/usr/bin/:${PATH}
.profile:export PATH
.profile:# set PATH so it includes user's private bin if it exists
.profile: PATH="$HOME/bin:$PATH"
….

Note that grep 'string' .* is a bit different in meaning, as it will not prevent grep to match filenames with names ..filename1, ..filename2 etc.
Though grep 'string' .* will work note that it will sometimes output some unwanted matches if filenames with double dot in the beginning of file name are there …
That's all folks 🙂