Saturday, 27th April 2024

Comment posted How to make a mirror of website on GNU / Linux with wget / Few tips on wget site mirroring by .

  1. for|Thank you for this posting as well as the many others which I have read through your website. Have you ever thought about being a guest author. My small site could definitely use someone with your background to write every now and then. You truly know says:
    Firefox 3.5.3, Windows 7
    Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3

    The true dilemma with malware and spyware is always that the majority of folks are extremely care-free when it comes to eliminating spyware from their pc. Perhaps a ton of individuals are not really very techie, but using some of the guidance you have supplied it should be simple to remove viruses.

  2. admin says:
    Firefox 52.0, GNU/Linux x64
    Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0

    A very useful option when mirroring websites that have anti-mirroring rules in robots.txt (i.e. sites that try to prevent you from downloading their content, to protect themselves against data theft) is:

    robots=off

    Just add it to the mirroring wget line, like so:


    wget -e robots=off -mk -w 10 -np --random-wait http://www.website-that-we-will-mirror.com
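
    For readers who find the short flags hard to read, here is a minimal sketch of the same command spelled with wget's long options, one flag per line. It only prints the command (a dry run); the variable names opts and url are just illustrative, and the URL is the placeholder from the line above.

    ```shell
    # Build the mirroring command from long options, one flag per line.
    url="http://www.website-that-we-will-mirror.com"   # placeholder URL from the comment

    opts=""
    opts="$opts --execute robots=off"   # -e: run a .wgetrc-style command; here, ignore robots.txt
    opts="$opts --mirror"               # -m: recursive download with timestamping
    opts="$opts --convert-links"        # -k: rewrite links so the copy is browsable offline
    opts="$opts --wait=10"              # -w 10: pause 10 seconds between requests
    opts="$opts --no-parent"            # -np: never ascend above the starting directory
    opts="$opts --random-wait"          # vary each pause around the --wait value

    # Dry run: show the command instead of starting a download.
    echo "wget$opts $url"

    # To actually mirror, run (word splitting of $opts is intended):
    #   wget $opts "$url"
    ```

    Printing the command first makes it easy to check the flags before a long mirroring run is started.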

    Another very helpful option, if the site has built-in rules against content stealing that check the incoming user agent and filter out agents such as Teleport or Wget, is --user-agent. Below is how to mirror a website that has some basic protection of this kind. Please don't use this for evil deeds, and keep the mirrored data for your personal use:

    wget -e robots=off -mk -w 10 -np --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" --random-wait http://www.website-that-we-will-mirror.com
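
    Since -e simply executes a .wgetrc-style command, the same settings can also be kept in a small config file instead of being retyped each time. The sketch below assumes a wget build that supports --config; the file location and the variable name cfg are arbitrary choices for illustration, not something from the original post.

    ```shell
    # Write the options from the command above into a standalone wget config file.
    cfg="${TMPDIR:-/tmp}/mirror.wgetrc"   # hypothetical location, pick any path you like

    printf '%s\n' \
      'robots = off' \
      'mirror = on' \
      'convert_links = on' \
      'wait = 10' \
      'random_wait = on' \
      'no_parent = on' \
      'user_agent = Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0' \
      > "$cfg"

    # Show what was written.
    cat "$cfg"

    # To mirror with these settings:
    #   wget --config="$cfg" http://www.website-that-we-will-mirror.com
    ```

    Keeping the file separate from ~/.wgetrc means ordinary wget downloads are not affected by the mirroring settings.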
