Archive for May 4th, 2010

Mirroring web site content ignoring the robots.txt prohibition rules with wget on Linux

Tuesday, May 4th, 2010

I wanted to mirror the content of a website whose robots.txt file included Disallow rules for specific directories, for instance:

User-agent: *
Disallow: /privatedir/

Since automated downloads of /privatedir/ were restricted, I needed to get around the restriction using a command line downloader like wget. After a quick look online I found the wget FAQ, which includes a good description of how to ignore the rules in robots.txt.
Furthermore, I consulted wget’s manual because I wanted to mirror only a part of the website (only the data of a certain directory). Finally I ended up with the following wget command, which got me around the robots.txt Disallow restrictions:

freebsd# wget -e robots=off --wait 3 --mirror --level 1 --convert-links http://www.domaincom/privatedir/index.html

Issuing the above command mirrored the whole privatedir without any restraints. Here is what the convert-links option does:

‘--convert-links’: After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

Also, as you can see from the above command line, I’ve used “--wait 3” because I wanted to be sure that mod_rewrite regular expression rules on the server wouldn’t cut off my access to the /privatedir/ directory because of rapid file fetching.
Ignoring robots.txt itself is done via the
-e robots=off wget parameter.
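
For repeated use, the command above could be wrapped in a small shell function. This is only a rough sketch: mirror_dir and DRY_RUN are names I made up for illustration, and the --no-parent option is my own addition (it was not in the original command) to keep wget from climbing above the starting directory. With DRY_RUN set to 1 the function only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: mirror one directory while ignoring robots.txt.
# mirror_dir and DRY_RUN are hypothetical names, not part of wget.
mirror_dir() {
    url="$1"
    # -e robots=off   : ignore the robots.txt Disallow rules
    # --wait 3        : pause 3 seconds between requests
    # --mirror        : recursive download with timestamping
    # --level 1       : limit recursion depth to one level
    # --convert-links : rewrite links for local viewing
    # --no-parent     : assumption, not in the original command;
    #                   never ascend above the starting directory
    set -- wget -e robots=off --wait 3 --mirror --level 1 \
        --convert-links --no-parent "$url"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

After defining the function, setting DRY_RUN=1 and calling mirror_dir with the directory URL prints the full wget command line, which is handy for checking the options before hitting the server for real.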

How to fix unbootable Windows with “Windows could not start because the following file is missing” \WINDOWS\SYSTEM32\CONFIG\SYSTEM

Tuesday, May 4th, 2010

The desktop computer that my sister is using runs Windows XP Professional Service Pack 2 (SP2).
The Windows installation is almost 2 years old, yet I was really surprised at how the damned Microsoft software broke.
Here is how: one day I got really mad at my sister; she completely drove me out of my mind.
Affected by her continuous unethical behaviour, I decided to pay her back, so I logged in with the Windows administrator account and changed her password.
The Spybot Search and Destroy (S&D) spyware active protection (TeaTimer) warned me that some registry settings would be changed while I was changing my sister’s Windows password, and I accepted the change. Afterwards I restarted the system, and guess what? Windows couldn’t boot anymore!
Let me ask you a question: is that unusual for the shitty Windows operating system? NO, IT’S ABSOLUTELY NORMAL :)!
That pissed me off a bit, so I left the machine with an unbootable Windows for a few weeks, until today.
The error message which occurred at Windows boot time was:

Windows could not start because the following file is missing
or corrupt:
\WINDOWS\SYSTEM32\CONFIG\SYSTEM

You can attempt to repair this file by starting Windows Setup
using the original CD-ROM.
Select ‘r’ at the first screen to start repair.

To fix the issue I had to call a friend (Alex) and ask him for a Windows XP SP2 install CD.
We booted the Windows Recovery Console from it to access the file system. After the Recovery Console loaded, we tried to switch to the C: drive, but the hard drive took ages, grinding away with the HDD LED indicator blinking all the time.
At first I suspected something could be wrong with the hard drive at the physical level. However, I instructed Alex to issue the CHKDSK command to see if that would do any good.

That’s it: good old CHKDSK fixed the file system issues and we rebooted. And hooray, such a joy!
Unbelievable, the system worked again! Hooray! 🙂