Posts Tagged ‘servers’

Creating data backups on Debian and Ubuntu servers with Bacula professional backup tool

Wednesday, April 17th, 2013

Bacula professional GNU Linux Freebsd Netbsd backup software logo with bat

1. Install Bacula Backup System

root@pcfreak:~# apt-cache show bacula |grep -i description -A 5

Description: network backup, recovery and verification – meta-package
 Bacula is a set of programs to manage backup, recovery and verification
 of computer data across a network of computers of different kinds.
 .
 It is efficient and relatively easy to use, while offering many advanced
 storage management features that make it easy to find and recover lost or
 damaged files. Due to its modular design, Bacula is scalable from small
 single computer systems to networks of hundreds of machines.
 .

root@pcfreak:~# apt-get install bacula

 

Reading package lists… Done
Building dependency tree      
Reading state information… Done
The following extra packages will be installed:
  bacula-client bacula-common bacula-common-sqlite3 bacula-console bacula-director-common bacula-director-sqlite3 bacula-fd bacula-sd
  bacula-sd-sqlite3 bacula-server bacula-traymonitor libsqlite0 mt-st mtx sqlite sqlite3
Suggested packages:
  bacula-doc dds2tar scsitools sg3-utils kde gnome-desktop-environment sqlite-doc sqlite3-doc
The following NEW packages will be installed:
  bacula bacula-client bacula-common bacula-common-sqlite3 bacula-console bacula-director-common bacula-director-sqlite3 bacula-fd bacula-sd
  bacula-sd-sqlite3 bacula-server bacula-traymonitor libsqlite0 mt-st mtx sqlite sqlite3
0 upgraded, 17 newly installed, 0 to remove and 0 not upgraded.
2 not fully installed or removed.
Need to get 2,859 kB of archives.
After this operation, 6,992 kB of additional disk space will be used.
Do you want to continue [Y/n]? Y
Get:1 http://security.debian.org/ squeeze/updates/main bacula-common amd64 5.0.2-2.2+squeeze1 [637 kB]
Get:2 http://security.debian.org/ squeeze/updates/main bacula-common-sqlite3 amd64 5.0.2-2.2+squeeze1 [102 kB]
Get:3 http://security.debian.org/ squeeze/updates/main bacula-console amd64 5.0.2-2.2+squeeze1 [67.6 kB]
Get:4 http://security.debian.org/ squeeze/updates/main bacula-director-common amd64 5.0.2-2.2+squeeze1 [56.6 kB]
Get:5 http://security.debian.org/ squeeze/updates/main bacula-director-sqlite3 amd64 5.0.2-2.2+squeeze1 [308 kB]
Get:6 http://security.debian.org/ squeeze/updates/main bacula-sd amd64 5.0.2-2.2+squeeze1 [459 kB]
Get:7 http://security.debian.org/ squeeze/updates/main bacula-sd-sqlite3 amd64 5.0.2-2.2+squeeze1 [435 kB]
Get:8 http://security.debian.org/ squeeze/updates/main bacula-server all 5.0.2-2.2+squeeze1 [48.5 kB]
Get:9 http://security.debian.org/ squeeze/updates/main bacula-fd amd64 5.0.2-2.2+squeeze1 [124 kB]
Get:10 http://security.debian.org/ squeeze/updates/main bacula-client all 5.0.2-2.2+squeeze1 [48.5 kB]
Get:11 http://security.debian.org/ squeeze/updates/main bacula all 5.0.2-2.2+squeeze1 [1,030 B]
Get:12 http://security.debian.org/ squeeze/updates/main bacula-traymonitor amd64 5.0.2-2.2+squeeze1 [70.0 kB]
Get:13 http://ftp.uk.debian.org/debian/ squeeze/main sqlite3 amd64 3.7.3-1 [100 kB]
Get:14 http://ftp.uk.debian.org/debian/ squeeze/main libsqlite0 amd64 2.8.17-6 [188 kB]
Get:15 http://ftp.uk.debian.org/debian/ squeeze/main sqlite amd64 2.8.17-6 [22.0 kB]
Get:16 http://ftp.uk.debian.org/debian/ squeeze/main mtx amd64 1.3.12-3 [154 kB]
Get:17 http://ftp.uk.debian.org/debian/ squeeze/main mt-st amd64 1.1-4 [35.6 kB]                                                            
Fetched 2,859 kB in 6s (471 kB/s)                                                                                                           
Selecting previously deselected package bacula-common.
(Reading database … 86693 files and directories currently installed.)
Unpacking bacula-common (from …/bacula-common_5.0.2-2.2+squeeze1_amd64.deb) …
Adding user 'bacula'… Ok.
Selecting previously deselected package bacula-common-sqlite3.
Unpacking bacula-common-sqlite3 (from …/bacula-common-sqlite3_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package bacula-console.
Unpacking bacula-console (from …/bacula-console_5.0.2-2.2+squeeze1_amd64.deb) …
Processing triggers for man-db …
Setting up bacula-common (5.0.2-2.2+squeeze1) …
Selecting previously deselected package bacula-director-common.
(Reading database … 86860 files and directories currently installed.)
Unpacking bacula-director-common (from …/bacula-director-common_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package sqlite3.
Unpacking sqlite3 (from …/sqlite3_3.7.3-1_amd64.deb) …
Selecting previously deselected package libsqlite0.
Unpacking libsqlite0 (from …/libsqlite0_2.8.17-6_amd64.deb) …
Selecting previously deselected package sqlite.
Unpacking sqlite (from …/sqlite_2.8.17-6_amd64.deb) …
Selecting previously deselected package bacula-director-sqlite3.
Unpacking bacula-director-sqlite3 (from …/bacula-director-sqlite3_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package mtx.
Unpacking mtx (from …/mtx_1.3.12-3_amd64.deb) …
Selecting previously deselected package bacula-sd.
Unpacking bacula-sd (from …/bacula-sd_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package bacula-sd-sqlite3.
Unpacking bacula-sd-sqlite3 (from …/bacula-sd-sqlite3_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package bacula-server.
Unpacking bacula-server (from …/bacula-server_5.0.2-2.2+squeeze1_all.deb) …
Selecting previously deselected package bacula-fd.
Unpacking bacula-fd (from …/bacula-fd_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package bacula-client.
Unpacking bacula-client (from …/bacula-client_5.0.2-2.2+squeeze1_all.deb) …
Selecting previously deselected package bacula.
Unpacking bacula (from …/bacula_5.0.2-2.2+squeeze1_all.deb) …
Selecting previously deselected package bacula-traymonitor.
Unpacking bacula-traymonitor (from …/bacula-traymonitor_5.0.2-2.2+squeeze1_amd64.deb) …
Selecting previously deselected package mt-st.
Unpacking mt-st (from …/archives/mt-st_1.1-4_amd64.deb) …
Processing triggers for man-db …
Setting up acct (6.5.4-2.1) …
Setting up bacula-director-common (5.0.2-2.2+squeeze1) …
Setting up bacula-director-sqlite3 (5.0.2-2.2+squeeze1) …
config: Running dbc_go bacula-director-sqlite3 configure
Stopping Bacula Director…:.
 *** Checking type of existing DB at /var/lib/bacula/bacula.db: None
 *** Will create new database at this location.
dbconfig-common: writing config to /etc/dbconfig-common/bacula-director-sqlite3.conf

Creating config file /etc/dbconfig-common/bacula-director-sqlite3.conf with new version
creating database bacula.db: success.
verifying database bacula.db exists: success.
populating database via sql…  done.
Processing configuration…Ok.
Starting Bacula Director…:.
Setting up bacula-sd (5.0.2-2.2+squeeze1) …
Starting Bacula Storage daemon…:.
Setting up acct (6.5.4-2.1) …
insserv: warning: script 'K02courier-imap' missing LSB tags and overrides
insserv: script iptables: service skeleton already provided!
insserv: warning: script 'courier-imap' missing LSB tags and overrides
Turning on process accounting, file set to '/var/log/account/pacct'.
Done..
Setting up bacula-sd-sqlite3 (5.0.2-2.2+squeeze1) …
Setting up bacula-server (5.0.2-2.2+squeeze1) …
Setting up bacula-fd (5.0.2-2.2+squeeze1) …
Starting Bacula File daemon…:.
Setting up bacula-client (5.0.2-2.2+squeeze1) …
Setting up bacula (5.0.2-2.2+squeeze1) …
Setting up proftpd-basic (1.3.3a-6squeeze6) …
Starting ftp server: proftpd.
Setting up mt-st (1.1-4) …
update-alternatives: using /bin/mt-st to provide /bin/mt (mt) in auto mode.
 

 

Once installed you will have 3 processes running in background used by Bacula backup system (bacula-dir, bacula-sd and bacula-fd)
root@pcfreak:~# ps ax |grep -i bacula|grep -v grep
6044 ? Ssl 0:00 /usr/sbin/bacula-dir -c /etc/bacula/bacula-dir.conf -u bacula -g bacula
6089 ? Ssl 0:00 /usr/sbin/bacula-sd -c /etc/bacula/bacula-sd.conf -u bacula -g tape
6167 ? Ssl 0:00 /usr/sbin/bacula-fd -c /etc/bacula/bacula-fd.conf

Here is what each of them does:

a) Bacula-dir or Bacula-Director is main Bacula Backup system component. Bacula-dir controls the whole backup system and the various other 2 daemons Bacula-FD and  Bacula-SD.

b) Bacula-fd – (Bacula File Daemon) acts as the interface between  Bacula network backup system and the filesystems to be backed up:  it  is  responsible for   reading/writing/verifying the files to be  backup'd/verified/restored. Network transfer can optionally be compressed.

c) Bacula-sd – (Bacula Storage Daemon) acts as interface between Bacula network backup system and Tape Drive or filesystem where backups will be stored

Each of 3 processes bacula-dir, bacula-fd and bacula-sd has their own init script in /etc/rc.d/, e.g.:

# /etc/init.d/bacula-directory
# /etc/init.d/bacula-fd
# /etc/init.d/bacula-sd

2. Configuring Bacula Backup System

Configuring Bacula is done via configuration files located in /etc/bacula

root@pcfreak:~# cd /etc/bacula
root@pcfreak:/etc/bacula# ls -1
bacula-dir.conf
bacula-fd.conf
bacula-fd.conf.dist
bacula-sd.conf
bacula-sd.conf.dist
bconsole.conf
common_default_passwords
scripts/
tray-monitor.conf

3. Defining what needs to be backed up

Here is a short description of most important configuration blocks in Bacula's main config bacula-dir.conf
 

1.Director resource defines the Director’s parameters. Name, Password, WorkingDirectory, and PidDirectory must be set. QueryFile specifies where the Director can find the SQL queries.

2.Job defines a backup or restore to perform. You will need at least one job per client. To simplify configuration of similar clients, create a common JobDefs resource and refer to it from within a Job. For example, if you have one set of defaults for desktops and another set for servers, you can create a Desktop and Server (these names are arbitrary and set with the Name attribute) JobDefs and refer to those two collections of settings from a Job.

3. Schedule resource is referred to within a Job to allow it to occur automatically.

4. FileSet resource defines which files are to be backed up. You can both Include and Exclude files.

5.Each Client resource details the clients that this Director can back up.

6.Storage resource specifies the storage daemon available to the Director.

7.Pool identifies a set of storage volumes (tapes/files) that Bacula can write data to. Each Pool can be configured to use different sets of tapes for different jobs.

8.Catalog resource defines Bacula catalog (database) to be used.

9. Messages resource captures where to send messages and which messages to send.
 

a) Defining directories to be backed up

Defining what needs to be backed up is done through bacula-dir.conf ( /etc/bacula/bacula-dir.conf ). In the file there is a FileSet section, where dirs to backed up have to be included, below config defines to backup /usr/sbin, /etc/, /root, /usr and /var directories
 

# List of files to be backed up
FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = MD5
    }
#   
#  Put your list of files here, preceded by 'File =', one per line
#    or include an external list with:
#
#    File = <file-name
#
#  Note: / backs up everything on the root partition.
#    if you have other partitions such as /usr or /home
#    you will probably want to add them too.
#
#  By default this is defined to point to the Bacula binary
#    directory to give a reasonable FileSet to backup to
#    disk storage during initial testing.
#
    File = /usr/sbin
    File = /root
    File = /etc
    File = /usr
    File = /var

  }

b) Defining where to store back ups

All configuration of where Bacula will store created backups is done through /etc/bacula/bacula-sd.conf

There are few configurations that needs to be tuned according to custom user purposes, below I paste them from config:
 

Storage {                             # definition of myself
  Name = pcfreak-sd
  SDPort = 9103                  # Director's port     
  WorkingDirectory = "/var/lib/bacula"
  Pid Directory = "/var/run/bacula"
  Maximum Concurrent Jobs = 20
  SDAddress = 127.0.0.1
}

Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /nonexistant/path/to/file/archive/dir
  LabelMedia = yes;                   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}

Messages {
  Name = Standard
  director = pcfreak-dir = all

}

 

Storage sets working directory where temporary backups are created on backup creation time – default is /var/lib/bacula

Device – defines exact directory where backups will be stored after created – usually this is a directory with  mounted hard disk specially for backups. Bacula default is /nonexistant/path/to/file/archive/dir

Messages – configures where and what kind of messages are send on bacula operations

c) Configuring Bacula to create backups via network

Configuring where Bacula will act just on server localhost, or will bind and be visible to store backups via network IP is done from Bacula-FD (Bacula File Daemon).

By default it listens to localhost127.0.0.1. Bacula-FD configurations are done from /etc/bacula/bacula-fd.conf. Most important section configuring where bacula listens is named FileDaemon.
 

#
# "Global" File daemon configuration specifications
#
FileDaemon {                          # this is me
  Name = pcfreak-fd
  FDport = 9102                  # where we listen for the director
  WorkingDirectory = /var/lib/bacula
  Pid Directory = /var/run/bacula
  Maximum Concurrent Jobs = 20
  FDAddress = 127.0.0.1
}
 

 

By commenting FDAddress, Bacula will automatically listen to external IP configured on lan interface eth0

4. Managing Bacula Command Line Interfa – bconsole

Managing bacula interactively is done through bconsole (Bacula's Management Console) command.

root@pcfreak:~# bconsole

Connecting to Director localhost:9101
1000 OK: pcfreak-dir Version: 5.0.2 (28 April 2010)
Enter a period to cancel a command.
*
*help
  Command       Description
  =======       ===========
  add           Add media to a pool
  autodisplay   Autodisplay console messages
  automount     Automount after label
  cancel        Cancel a job
  create        Create DB Pool from resource
  delete        Delete volume, pool or job
  disable       Disable a job
  enable        Enable a job
  estimate      Performs FileSet estimate, listing gives full listing
  exit          Terminate Bconsole session
  gui           Non-interactive gui mode
  help          Print help on specific command
  label         Label a tape
  list          List objects from catalog
  llist         Full or long list like list command
  messages      Display pending messages
  memory        Print current memory usage
  mount         Mount storage
  prune         Prune expired records from catalog
  purge         Purge records from catalog
  python        Python control commands
  quit          Terminate Bconsole session
  query         Query catalog
  restore       Restore files
  relabel       Relabel a tape
  release       Release storage
  reload        Reload conf file
  run           Run a job
  status        Report status
  setdebug      Sets debug level
  setip         Sets new client address — if authorized
  show          Show resource records
  sqlquery      Use SQL to query catalog
  time          Print current time
  trace         Turn on/off trace to file
  unmount       Unmount storage
  umount        Umount – for old-time Unix guys, see unmount
  update        Update volume, pool or stats
  use           Use catalog xxx
  var           Does variable expansion
  version       Print Director version
  wait          Wait until no jobs are running

When at a prompt, entering a period cancels the command.

You have messages.
*
 

On run bconsole launches another service bacula-console.

root@pcfreak:~# ps ax |grep -i bacula-console|grep -v grep 13959 pts/5 Sl+ 0:00 /usr/sbin/bacula-console -c /etc/bacula/bconsole.conf

There are 4 tcp/ip ports via which communication between Bacula processes is done;

a) Communication from bconsole to Bacula is throigh Port Number 9101
b) Communication from bacula-dir to bacula-sd is done using Port Number 9103
c) bacula-dir to bacula-fd talks via Port Number 9102
d) Messages between Bacula-fd to bacula-sd is via port num 9103

Both of 4 ports are only listening on (127.0.0.1) / localhost and thus there is no security risk from external malicious users to enter Bacula remotely.

a) some essential commands while in bconsole shell

*show pools
Pool: name=Default PoolType=Backup
      use_cat=1 use_once=0 cat_files=1
      max_vols=0 auto_prune=1 VolRetention=1 year
      VolUse=0 secs recycle=1 LabelFormat=*None*
      CleaningPrefix=*None* LabelType=0
      RecyleOldest=0 PurgeOldest=0 ActionOnPurge=0
      MaxVolJobs=0 MaxVolFiles=0 MaxVolBytes=0
      MigTime=0 secs MigHiBytes=0 MigLoBytes=0
      JobRetention=0 secs FileRetention=0 secs
Pool: name=File PoolType=Backup
      use_cat=1 use_once=0 cat_files=1
      max_vols=100 auto_prune=1 VolRetention=1 year
      VolUse=0 secs recycle=1 LabelFormat=*None*
      CleaningPrefix=*None* LabelType=0
      RecyleOldest=0 PurgeOldest=0 ActionOnPurge=0
      MaxVolJobs=0 MaxVolFiles=0 MaxVolBytes=53687091200
      MigTime=0 secs MigHiBytes=0 MigLoBytes=0
      JobRetention=0 secs FileRetention=0 secs
Pool: name=Scratch PoolType=Backup
      use_cat=1 use_once=0 cat_files=1
      max_vols=0 auto_prune=1 VolRetention=1 year
      VolUse=0 secs recycle=1 LabelFormat=*None*
      CleaningPrefix=*None* LabelType=0
      RecyleOldest=0 PurgeOldest=0 ActionOnPurge=0
      MaxVolJobs=0 MaxVolFiles=0 MaxVolBytes=0
      MigTime=0 secs MigHiBytes=0 MigLoBytes=0
      JobRetention=0 secs FileRetention=0 secs
You have messages.

*status
Status available for:
     1: Director
     2: Storage
     3: Client
     4: All
Select daemon type for status (1-4):

*label
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
Automatically selected Storage: File
Enter new Volume name:

*messages

b) Restoring Backups with bconsole

Restoring from backups is done with restore command

*restore
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"

First you select one or more JobIds that contain files
to be restored. You will be presented several methods
of specifying the JobIds. Then you will be allowed to
select which files from those JobIds are to be restored.

To select the JobIds, you have the following choices:
     1: List last 20 Jobs run
     2: List Jobs where a given File is saved
     3: Enter list of comma separated JobIds to select
     4: Enter SQL list command
     5: Select the most recent backup for a client
     6: Select backup for a client before a specified time
     7: Enter a list of files to restore
     8: Enter a list of files to restore before a specified time
     9: Find the JobIds of the most recent backup for a client
    10: Find the JobIds for a backup for a client before a specified time
    11: Enter a list of directories to restore for found JobIds
    12: Select full restore to a specified Job date
    13: Cancel
Select item:  (1-13):

 

Bacula can create backups on Tapes as well as tapes are still heavily used for backing data in some Banks, airports and other organizations where data is crucial.

Bacula is not among the easiest systems to create backups but for Backup administrators who work with Linux and FreeBSD it is great. Its scalability allows to make a very robust and complex backupping scheme which are hardly achievalable with other less professional backup tools like rsnapshot or rsync.
 

Share this on

Fixing insserv: warning: script ‘…’ missing LSB tags and overrides on Debian and Ubuntu Linux

Friday, April 12th, 2013

apt-get f install logo fixing warning script missing LSB tags Debian Ubuntu Linux
 

Some of packages I just tried to install on one of the Debian servers I admin failed during package (set up) configuration stage. Here is little paste with the errors due to it dpkg-reconfigure on each of newly set-up packages failed:

Setting up acct (6.5.4-2.1) ...
insserv: warning: script 'K02courier-imap' missing LSB tags and overrides
insserv: warning: script 'courier-imapd' missing LSB tags and overrides
insserv: warning: script 'iptables' missing LSB tags and overrides
insserv: warning: script 'courier-imap' missing LSB tags and overrides 

Because of this whole package install failed and the usual

# apt-get -f install

supposed to fix mess with packages end up with same errors:

insserv: warning: script 'K02courier-imap' missing LSB tags and overrides
insserv: warning: script 'iptables' missing LSB tags and overrides
insserv: warning: script 'courier-imap' missing LSB tags and overrides
insserv: There is a loop between service watchdog and iptables if stopped
insserv:  loop involving service iptables at depth 2
insserv:  loop involving service watchdog at depth 1
insserv: Stopping iptables depends on watchdog and therefore on system facility `$all' which can not be true!
insserv: exiting now without changing boot order!
update-rc.d: error: insserv rejected the script header
dpkg: error processing acct (--configure):
 subprocess installed post-installation script returned error exit status 1

T
he scripts in question iptables / courier-imap / K02courier-imap were custom created  scripts by me earlier and I have completely forgot about it. In Debian 5 and earlier I used the same scripts to make system load custom services not installed through a standard Debian package. After a bit of research, I've noticed in newer Debian / Ubuntu release, new Commented tags are included in all Debian belonging packages init scripts. Thus the reason for failing package configuration, were my custom scripts were missing those tags. To get around the situation I had to open manually each of the scripts missing init script LSB tags i.e. ( iptables / courier-imap / K02courier-imap ) and add after
#! /bin/sh

shebang;

### BEGIN INIT INFO
# Provides:          skeleton
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Example initscript
# Description:       This file should be used to construct scripts to be
#                    placed in /etc/init.d.
### END INIT INFO

Once those "boilerplate", skele comments are included to solve the mess I had to run again:

# apt-get -f install

This solves it. Enjoy :)


Share this on

How to change hostname permanently on Debian and Ubuntu Linux

Thursday, March 14th, 2013

Change hostname on Debian and Ubuntu Linux terminal hostname screenshot

I had to configure a newly purchased dedicated server from UK2. New servers cames shipped with some random assigned node hostname  like server42803. This is pretty annoying, and has to be changed especially if your company has a naming server policy in some format like; company-s1#, company-s2#, company-sN#.

Changing hostname via hosts definition file /etc/hosts to assign the IP address of the host to the hostname is not enough for changing the hostname shown in shell via SSH user login.

To display full hostname on Debian and Ubuntu, had to type:

server42803:~# hostname
server42803.uk2net.com

To change permanently server host to lets say company-s5;

server42803:~# cat /etc/hostname | \
sed -e 's#server42803.uk2net.com#company-s5#' > /etc/hostname

To change for current logged in SSH session:

server42803:~# hostname company-s5
company-s5:~#

Finally because already old hostname is red by sshd, you have to also restart sshd for new hostname to be visible on user ssh:

company-s5:~# /etc/init.d/ssh restart
...

As well as run script:

company-s5:~# /etc/init.d/hostname.sh

Mission change host accomplished, Enjoy :)

Share this on

Rsync slow data (bandwidth limit) transferring on productive Linux / *BSD servers to 2nd

Thursday, March 7th, 2013

If amount of Unique users on website has increased dramatically and Apache + PHP server starts to get user load higher than 50% in times of most users site activity then it is time to move to think of migrating data on more powerful Server hardware.

Moving few thousands of Gigabytes of PHP, JS, PNG, JPG images and plain text files data from a productive host to another puts an extra burden on hard disk Input / Output (I/O) operations, thus risking to put extraordinary server load and make websites on server inaccessible. The normal way I copy data on less busy servers is create  .tar.gz archive of data from one server and transfer with sftp or scp. In this situation, doing so however puts too much load on server and thus is risking to stone the server and make it inaccessible to users. A solution to problem is to use rsync instead, synchronizing data between the servers by instructing it to transfer data from one hard disk to another via network using a maximum read/write bandwidth.

rsync command argument specifying a maximum bandwidth is --bwlimit=KBPS

To transfer data between two servers specifyinga maximum transfer bandwidth of 10MB per second you have to pass 2MBytes as it is in megabytes (2*1024Kb) = 2048.

Hence to make the transfer while logged to current productive server via SSH to host server with IP XXX.XXX.XXX.XXX I used:
w:~# cd /home/sites
w:/home/sites# /usr/bin/rsync --bwlimit=2048 -avz -e ssh . root@XXX.XXX.XXX.XXX:/home/sites/

The arguments to above rsync command are clear enough (-e ssh) – tells to use ssh as data transfer protocol, (root@) – specifies to connect to second server with root user and (:/home/sites/) – tells rsync to transfer to remote server to same directory (/home/sites/) like from which copying.

Bear in mind that, in order this method to work, rsync has to be installed both on the server from which data is transferred and to second one to where data is transferred.
Since rsync is available in Linux as well as has port in FreeBSD / NetBSD / OpenBSD ports tree, same way to transfer "web data" while upgrading BSD OS host to another is possible.

Share this on

How to set repository to install binary packages on amd64 FreeBSD 9.1

Friday, January 11th, 2013

Though, it is always good idea to build from source for better performance of Apache + MySQL + PHP, its not worthy the time on installing minor things like; trafshow, tcpdump or deco (MC – midnight commander like native freebsd BSD program).

If you're on a 64 bit version of FreeBSD ( amd64) 9.1 and you try to install a binary package with;

freebsd# pkg_add -vr vim

Ending up with an error;

Error: Unable to get ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9.1-release/Latest/vim.tbz: File unavailable (e.g., file not found, no access)
pkg_add: unable to fetch 'ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9.1-release/Latest/vim.tbz' by URL
pkg_add: 1 package addition(s) failed

The error is caused by lack of special packages-9.1-release directory existing on FreeBSD.org servers. I've realized this after doing a quick manual check opening ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64. The existing URL containing working fbsd 9.1 binaries is:

ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/Latest/
h

You will have to set a repository for FreeBSD 9.1 amd64 packages manually with cmd:
freebsd# echo $SHELL
/bin/csh
freebsd# setenv PACKAGESITE ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/Latest/

If you're on bash shell use export instead:

freebsd# export PACKAGESITE="ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/Latest/"

To make ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/Latest/ as a permanent binary repository:

echo 'setenv PACKAGESITE ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/Latest/' >> /root/.cshrc

or

echo 'export PACKAGESITE="ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-current/Latest/"' >> /root/.bashrc

Now, pkg_add as much as you like ;)

Share this on

How to completely disable Replication in MySQL server 5.1.61 on Debian GNU / Linux

Monday, July 16th, 2012

Some time ago on one of the Database MySQL servers, I've configured replication as it was required to test somethings. Eventually it turned out replication will be not used (for some reason) it was too slow and not fitting our company needs hence we needed to disable it.

It seemed logical to me that, simply removing any replication related directives from my.cnf and a restart of the SQL server will be enough to turn replication off on the Debian Linux host. Therefore I proceeded removed all replication configs from /etc/my/my.cnf and issued MySQL restart i. e.:

sql-server:~# /etc/init.d/mysql restart
....

This however didn't turned off replication,as I thought and in phpmyadminweb frontend interface, replication was still appearing to be active in the replication tab.

Something was still making the SQL server still act as an Replication Slave Host, so after a bit of pondering and trying to remember, the exact steps I took to make the replication work on the host I remembered that actually I issued:

mysql> START SLAVE;

Onwards I run:

mysql> SHOW SLAVE STATUS;
....

and found in the database the server was still running in Slave Replication mode

Hence to turn off the db host run as a Slave, I had to issue in mysql cli:

mysql> STOP SLAVE;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> RESET SLAVE;
Query OK, 0 rows affected, 1 warning (0.01 sec)

Then after a reload of SQL server in memory, the host finally stopped working as a Slave Replication host, e.g.

sql-server:~# /etc/init.d/mysql restart
....

After the restart, to re-assure myself the SQL server is no more set to run as MySQL replication Slave host:

mysql> SHOW SLAVE STATUS;
Empty set (0.00 sec)

Cheers ;)

Share this on

Resolving “nf_conntrack: table full, dropping packet.” flood message in dmesg Linux kernel log

Wednesday, March 28th, 2012

On many busy servers, you might encounter in /var/log/syslog or dmesg kernel log messages like

nf_conntrack: table full, dropping packet

to appear repeatingly:

[1737157.057528] nf_conntrack: table full, dropping packet.
[1737157.160357] nf_conntrack: table full, dropping packet.
[1737157.260534] nf_conntrack: table full, dropping packet.
[1737157.361837] nf_conntrack: table full, dropping packet.
[1737157.462305] nf_conntrack: table full, dropping packet.
[1737157.564270] nf_conntrack: table full, dropping packet.
[1737157.666836] nf_conntrack: table full, dropping packet.
[1737157.767348] nf_conntrack: table full, dropping packet.
[1737157.868338] nf_conntrack: table full, dropping packet.
[1737157.969828] nf_conntrack: table full, dropping packet.
[1737157.969928] nf_conntrack: table full, dropping packet
[1737157.989828] nf_conntrack: table full, dropping packet
[1737162.214084] __ratelimit: 83 callbacks suppressed

There are two type of servers, I've encountered this message on:

1. Xen OpenVZ / VPS (Virtual Private Servers)
2. ISPs – Internet Providers with heavy traffic NAT network routers
 

I. What is the meaning of nf_conntrack: table full dropping packet error message

In short, this message is received because the nf_conntrack kernel maximum number assigned value gets reached.
The common reason for that is a heavy traffic passing by the server or very often a DoS or DDoS (Distributed Denial of Service) attack. Sometimes encountering the err is a result of a bad server planning (incorrect data about expected traffic load by a company/companeis) or simply a sys admin error…

- Checking the current maximum nf_conntrack value assigned on host:

linux:~# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

- Alternative way to check the current kernel values for nf_conntrack is through:

linux:~# /sbin/sysctl -a|grep -i nf_conntrack_max
error: permission denied on key 'net.ipv4.route.flush'
net.netfilter.nf_conntrack_max = 65536
error: permission denied on key 'net.ipv6.route.flush'
net.nf_conntrack_max = 65536

- Check the current sysctl nf_conntrack active connections

To check present connection tracking opened on a system:

:

linux:~# /sbin/sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 12742

The shown connections are assigned dynamicly on each new succesful TCP / IP NAT-ted connection. Btw, on a systems that work normally without the dmesg log being flooded with the message, the output of lsmod is:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
ip_tables 9899 1 iptable_filter
x_tables 14175 1 ip_tables

On servers which are encountering nf_conntrack: table full, dropping packet error, you can see, when issuing lsmod, extra modules related to nf_conntrack are shown as loaded:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4 10346 3 iptable_nat,nf_nat
nf_conntrack 60975 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1073 1 nf_conntrack_ipv4
ip_tables 9899 2 iptable_nat,iptable_filter
x_tables 14175 3 ipt_MASQUERADE,iptable_nat,ip_tables

 

II. Remove completely nf_conntrack support if it is not really necessery

It is a good practice to limit or try to omit completely use of any iptables NAT rules to prevent yourself from ending with flooding your kernel log with the messages and respectively stop your system from dropping connections.

Another option is to completely remove any modules related to nf_conntrack, iptables_nat and nf_nat.
To remove nf_conntrack support from the Linux kernel, if for instance the system is not used for Network Address Translation use:

/sbin/rmmod iptable_nat
/sbin/rmmod ipt_MASQUERADE
/sbin/rmmod rmmod nf_nat
/sbin/rmmod rmmod nf_conntrack_ipv4
/sbin/rmmod nf_conntrack
/sbin/rmmod nf_defrag_ipv4

Once the modules are removed, be sure to not use iptables -t nat .. rules. Even attempt to list, if there are any NAT related rules with iptables -t nat -L -n will force the kernel to load the nf_conntrack modules again.

Btw nf_conntrack: table full, dropping packet. message is observable across all GNU / Linux distributions, so this is not some kind of local distribution bug or Linux kernel (distro) customization.
 

III. Fixing the nf_conntrack … dropping packets error

- One temporary, fix if you need to keep your iptables NAT rules is:

linux:~# sysctl -w net.netfilter.nf_conntrack_max=131072

I say temporary, because raising the nf_conntrack_max doesn't guarantee, things will get smoothly from now on.
However on many not so heavily traffic loaded servers just raising the net.netfilter.nf_conntrack_max=131072 to a high enough value will be enough to resolve the hassle.

- Increasing the size of nf_conntrack hash-table

The Hash table hashsize value, which stores lists of conntrack-entries should be increased propertionally, whenever net.netfilter.nf_conntrack_max is raised.

linux:~# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
The rule to calculate the right value to set is:
hashsize = nf_conntrack_max / 4

- To permanently store the made changes ;a) put into /etc/sysctl.conf:

linux:~# echo 'net.netfilter.nf_conntrack_count = 131072' >> /etc/sysctl.conf
linux:~# /sbin/sysct -p

b) put in /etc/rc.local (before the exit 0 line):

echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

Note: Be careful with this variable, according to my experience raising it to too high value (especially on XEN patched kernels) could freeze the system.
Also raising the value to a too high number can freeze a regular Linux server running on old hardware.

- For the diagnosis of nf_conntrack stuff there is ;

/proc/sys/net/netfilter kernel memory stored directory. There you can find some values dynamically stored which gives info concerning nf_conntrack operations in "real time":

linux:~# cd /proc/sys/net/netfilter linux:/proc/sys/net/netfilter# ls -al nf_log/
total 0
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ./
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ../
-rw-r--r-- 1 root root 0 Mar 23 23:02 0
-rw-r--r-- 1 root root 0 Mar 23 23:02 1
-rw-r--r-- 1 root root 0 Mar 23 23:02 10
-rw-r--r-- 1 root root 0 Mar 23 23:02 11
-rw-r--r-- 1 root root 0 Mar 23 23:02 12
-rw-r--r-- 1 root root 0 Mar 23 23:02 2
-rw-r--r-- 1 root root 0 Mar 23 23:02 3
-rw-r--r-- 1 root root 0 Mar 23 23:02 4
-rw-r--r-- 1 root root 0 Mar 23 23:02 5
-rw-r--r-- 1 root root 0 Mar 23 23:02 6
-rw-r--r-- 1 root root 0 Mar 23 23:02 7
-rw-r--r-- 1 root root 0 Mar 23 23:02 8
-rw-r--r-- 1 root root 0 Mar 23 23:02 9

 

IV. Decreasing other nf_conntrack NAT time-out values to prevent server against DoS attacks

Generally, the default value for nf_conntrack_* time-outs are (unnecessery) large.
Therefore, for large flows of traffic even if you increase nf_conntrack_max, still shorty you can get a nf_conntrack overflow table resulting in dropping server connections. To make this not happen, check and decrease the other nf_conntrack timeout connection tracking values:

linux:~# sysctl -a | grep conntrack | grep timeout
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.ipv4.netfilter.ip_conntrack_generic_timeout = 600
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent2 = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_max_retrans = 300
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream = 180
net.ipv4.netfilter.ip_conntrack_icmp_timeout = 30

All the timeouts are in seconds. net.netfilter.nf_conntrack_generic_timeout as you see is quite high – 600 secs = (10 minutes).
This kind of value means any NAT-ted connection not responding can stay hanging for 10 minutes!

The value net.netfilter.nf_conntrack_tcp_timeout_established = 432000 is quite high too (5 days!)
If this values, are not lowered the server will be an easy target for anyone who would like to flood it with excessive connections, once this happens the server will quick reach even the raised up value for net.nf_conntrack_max and the initial connection dropping will re-occur again …

With all said, to prevent the server from malicious users, situated behind the NAT plaguing you with Denial of Service attacks:

Lower net.ipv4.netfilter.ip_conntrack_generic_timeout to 60 – 120 seconds and net.ipv4.netfilter.ip_conntrack_tcp_timeout_established to stmh. like 54000

linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_generic_timeout = 120
linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000

This timeout should work fine on the router without creating interruptions for regular NAT users. After changing the values and monitoring for at least few days make the changes permanent by adding them to /etc/sysctl.conf

linux:~# echo 'net.ipv4.netfilter.ip_conntrack_generic_timeout = 120' >> /etc/sysctl.conf
linux:~# echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000' >> /etc/sysctl.conf

 

Share this on

How to disable tidy HTML corrector and validator to output error and warning messages

Sunday, March 18th, 2012

I've noticed in /var/log/apache2/error.log on one of the Debian servers I manage a lot of warnings and errors produced by tidy HTML syntax checker and reformatter program.

There were actually quite plenty frequently appearing messages in the the log like:

...
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!
...

I did a quick investigation on where from this messages are logged in error.log, and discovered few .php scripts in one of the websites containing the tidy string.
I used Linux find + grep cmds find in all php files the "tidy "string, like so:

server:~# find . -iname '*.php'-exec grep -rli 'tidy' '{}' \;
find . -iname '*.php' -exec grep -rli 'tidy' '{}' \; ./new_design/modules/index.mod.php
./modules/index.mod.php
./modules/index_1.mod.php
./modules/index1.mod.php

Opening the files, with vim to check about how tidy is invoked, revealed tidy calls like:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

As you see the PHP programmers who wrote this website, made a bigtidy mess. Instead of using php5's tidy module, they hard coded tidy external command to be invoked via php's exec(); external tidy command invocation.
This is extremely bad practice, since it spawns the command via a pseudo limited apache shell.
I've notified about the issue, but I don't know when, the external tidy calls will be rewritten.

Until the external tidy invocations are rewritten to use the php tidy module, I decided to at least remove the tidy warnings and errors output.

To remove the warning and error messages I've changed:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

exec('/usr/bin/tidy --show-warnings no --show-errors no -q -e -ashtml -utf8 '.$tmp_name,$rett);

The extra switches meaning is like so:

q – instructs tidy to produce quiet output
-e – show only errors and warnings
–show warnings no && –show errors no, completely disable warnings and error output

Onwards tidy no longer logs junk messages in error.log Not logging all this useless warnings and errors has positive effect on overall server performance especially, when the scripts, running /usr/bin/tidy are called as frequently as 1000 times per sec. or more

Share this on

How to upgrade single package with their dependencies on Debian and Ubuntu Linux

Friday, March 16th, 2012

Debian GNU / Linux apt-get upgrade a package selection of a whole bunch of packages ready to upgrade apt artistic logo

Are you a Debian System Administrator and you recently run apt-get upgrade && apt-get upgrade finding out there are plenty of new packagesfor upgrade? Do you need only a pre-selected number of packages to upgrade with apt?
I run apt-get update && apt-get upgrade on one of our company Debian servers, just to see there are a number of packages to be upgraded among which there was some I didn't wanted to upgrade. Here is a little paste output from apt-get upgrade:

debian:~# apt-get update && apt-get upgrade
Hit http://security.debian.org squeeze/updates Release.gpg
...
Hit http://security.debian.org squeeze/updates/main amd64 Packages
Fetched 128 kB in 0s (441 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
at imagemagick libdbd-pg-perl libfreetype6 libmagickcore3 libmagickcore3-extra libmagickwand3 libmysqlclient16 mysql-client
mysql-client-5.1 mysql-common mysql-server mysql-server-5.1 mysql-server-core-5.1
Do you want to continue [Y/n]
14 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

From first sight it seems logical to issue apt-get upgrade packagename to upgrade only single package with its package dependencies, instead of the whole group the above packs. However doing:
apt-get upgrade imagemagick will still try to upgrade all the packages instead of just imagemagick and its dependency package deb libmagickcore3

debian:~# apt-get upgrade imagemagick
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
at imagemagick libdbd-pg-perl libfreetype6 libmagickcore3 libmagickcore3-extra libmagickwand3 libmysqlclient16 mysql-client
mysql-client-5.1 mysql-common mysql-server mysql-server-5.1 mysql-server-core-5.1
14 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Do you want to continue [Y/n]

Doing all package,upgrade is not a good idea in my case, since upgrading mysql-server will require a MySQL server restart (something which we cannot afford to do right now) on this production server.
MySQL server restart during upgrade is never a good idea especially on productive busy (heavy loaded) SQL servers.
A restart of the MySQL server serving thousands of requests per second could lead often to crashed tables and hence temporary server downtime etc.

Still it is a good idea to upgrade the rest of packages with their newer versions. For exmpl. to upgrade; imagemagick, at , libfreetype6 and so on.

In order to upgrade only this 3 ones and their respective package dependencies, issue:

debian:~# apt-get --yes install imagemagick at libfreetype6

Repeat the apt-get install command with passing all the single package name you want to be upgraded and voila you're done :) .
Be sure the apt-get install packagename upgrade doesn't require also upgrade of myssql-server, mysql-client, mysql-common or mysql-server-core-5.1 or any of the package name you want to preserve from upgrading.

Share this on

How to make a mirror of website on GNU / Linux with wget / Few tips on wget site mirroring

Wednesday, February 22nd, 2012

Everyone who used Linux is probably familiar with wget or has used this handy download console tools at least thousand of times. Not so many Desktop GNU / Linux users like Ubuntu and Fedora Linux users had tried using wget to do something more than single files download.
Actually wget is not so popular as it used to be in earlier linux days. I've noticed the tendency for newer Linux users to prefer using curl (I don't know why).

With all said I'm sure there is plenty of Linux users curious on how a website mirror can be made through wget.
This article will briefly suggest few ways to do website mirroring on linux / bsd as wget is both available on those two free operating systems.

1. Most Simple exact mirror copy of website

The most basic use of wget's mirror capabilities is by using wget's -mirror argument:

# wget -m http://website-to-mirror.com/sub-directory/

Creating a mirror like this is not a very good practice, as the links of the mirrored pages will still link to external URLs. In other words link URL will not pointing to your local copy and therefore if you're not connected to the internet and try to browse random links of the webpage you will end up with many links which are not opening because you don't have internet connection.

2. Mirroring with rewritting links to point to localhost and in between download page delay

Making mirror with wget can put an heavy load on the remote server as it fetches the files as quick as the bandwidth allows it. On heavy servers rapid downloads with wget can significantly reduce the download server responce time. Even on a some high-loaded servers it can cause the server to hang completely.
Hence mirroring pages with wget without explicity setting delay in between each page download, could be considered by remote server as a kind of DoS – (denial of service) attack. Even some site administrators have already set firewall rules or web server modules configured like Apache mod_security which filter requests to IPs which are doing too frequent HTTP GET /POST requests to the web server.
To make wget delay with a 10 seconds download between mirrored pages use:

# wget -mk -w 10 -np --random-wait http://website-to-mirror.com/sub-directory/

The -mk stands for -m/-mirror and -k / shortcut argument for –convert-links (make links point locally), –random-wait tells wget to make random waits between o and 10 seconds between each page download request.
3. Mirror / retrieve website sub directory ignoring robots.txt "mirror restrictions"Some websites has a robots.txt which restricts content download with clients like wget, curl or even prohibits, crawlers to download their website pages completely.

/robots.txt restrictions are not a problem as wget has an option to disable robots.txt checking when downloading.
Getting around the robots.txt restrictions with wget is possible through -e robots=off option.
For instance if you want to make a local mirror copy of the whole sub-directory with all links and do it with a delay of 10 seconds between each consequential page request without reading at all the robots.txt allow/forbid rules:

# wget -mk -w 10 -np -e robots=off --random-wait http://website-to-mirror.com/sub-directory/

4. Mirror website which is prohibiting Download managers like flashget, getright, go!zilla etc.

Sometimes when try to use wget to make a mirror copy of an entire site domain subdirectory or the root site domain, you get an error similar to:

Sorry, but the download manager you are using to view this site is not supported.
We do not support use of such download managers as flashget, go!zilla, or getright

This message is produced by the site dynamic generation language PHP / ASP / JSP etc. used, as the website code is written to check on the browser UserAgent sent.
wget's default sent UserAgent to the remote webserver is:
Wget/1.11.4

As this is not a common desktop browser useragent many webmasters configure their websites to only accept well known established desktop browser useragents sent by client browsers.
Here are few typical user agents which identify a desktop browser:
 

  • Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0
  • Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0
  • Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4
  • Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.2a1pre) Gecko/20110324 Firefox/4.2a1pre

etc. etc.

If you're trying to mirror a website which has implied some kind of useragent restriction based on some "valid" useragent, wget has the -U option enabling you to fake the useragent.

If you get the Sorry but the download manager you are using to view this site is not supported , fake / change wget's UserAgent with cmd:

# wget -mk -w 10 -np -e robots=off \
--random-wait
--referer="http://www.google.com" \--user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \--header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \--header="Accept-Language: en-us,en;q=0.5" \--header="Accept-Encoding: gzip,deflate" \--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \--header="Keep-Alive: 300"

For the sake of some wget anonimity – to make wget permanently hide its user agent and pretend like a Mozilla Firefox running on MS Windows XP use .wgetrc like this in home directory.

5. Make a complete mirror of a website under a domain name

To retrieve complete working copy of a site with wget a good way is like so:

# wget -rkpNl5 -w 10 --random-wait www.website-to-mirror.com

Where the arguments meaning is:
-r – Retrieve recursively
-k – Convert the links in documents to make them suitable for local viewing
-p – Download everything (inline images, sounds and referenced stylesheets etc.)
-N – Turn on time-stamping
-l5 – Specify recursion maximum depth level of 5

6. Make a dynamic pages static site mirror, by converting CGI, ASP, PHP etc. to HTML for offline browsing

It is often websites pages are ending in a .php / .asp / .cgi … extensions. An example of what I mean is for instance the URL http://php.net/manual/en/tutorial.php. You see the url page is tutorial.php once mirrored with wget the local copy will also end up in .php and therefore will not be suitable for local browsing as .php extension is not understood how to interpret by the local browser.
Therefore to copy website with a non-html extension and make it offline browsable in HTML there is the –html-extension option e.g.:

# wget -mk -w 10 -np -e robots=off \
--random-wait \
--convert-links http://www.website-to-mirror.com

A good practice in mirror making is to set a download limit rate. Setting such rate is both good for UP and DOWN side (the local host where downloading and remote server). download-limit is also useful when mirroring websites consisting of many enormous files (documental movies, some music etc.).
To set a download limit to add –limit-rate= option. Passing by to wget –limit-rate=200K would limit download speed to 200KB.

Other useful thing to assure wget has made an accurate mirror is wget logging. To use it pass -o ./my_mirror.log to wget.
 

Share this on