Posts Tagged ‘boot process’

Recover/Restore unbootable GRUB boot loader on Debian Testing GNU/Linux using Linux LiveCD (Debian Install CD1)

Tuesday, July 6th, 2010

I’ve recently broke my grub untentianally while whiping out one of my disk partitions who was prepared to run a hackintosh.
Thus yesterday while switching on my notebook I was unpleasently surprised with an error Grub Error 17 and the boot process was hanging before it would even get to grub’s OS select menu.

That was nasty and gave me a big headache, since I wasn’t even sure if my partitions are still present.
What made things even worse that I haven’t created any backups preliminary to prepare for an emergency!
Thus restoring my system was absolutely compulsory at any cost.
In recovering the my grub boot manager I have used as a basis of my recovery an article called How to install Grub from a live Ubuntu cd
Though the article is quite comprehensive, it’s written a bit foolish, probably because it targets Ubuntu novice users 🙂
Another interesting article that gave me a hand during solving my issues was HOWTO: install grub with a chroot
Anyways, My first unsuccessful attempt was following a mix of the aforementioned articles and desperately trying to chroot to my mounted unbootable partition in order to be able to rewrite the grub loader in my MBR.
The error that slap me in my face during chroot was:

chroot: cannot execute /bin/sh : exec format error

Ghh Terrible … After reasoning on the shitty error I came to the conclusion that probably the livecd I’m trying to chroot to my debian linux partition is probably using a different incompatible version of glibc , if that kind of logic is true I concluded that it’s most likely that the glibc on my Linux system is newer (I came to that assumption because I was booting from livecds (Elive, Live CentOS as well Sabayon Linux, which were burnt about two years ago).

To proof my guesses I decided to try using Debian testing Squeeze/Sid install cd . That is the time to mention that I’m running Debian testing/unstable under the code name (Squeeze / Sid).
I downloaded the Debian testing amd64 last built image from here burnt it to a cd on another pc.
And booted it to my notebook, I wasn’t completely sure if the Install CD would have all the necessary recovery tools that I would need to rebuilt my grub but eventually it happened that the debian install cd1 has everything necessary for emergency situations like this one.

After I booted from the newly burned Debian install cd I followed the following recovery route to be able to recovery my system back to normal.It took me a while until I come with the steps described here, but I won’t get into details for brevity

1. Make new dir where you intend to mount your Linux partition and mount /proc, /dev, /dev/pts filesystems and the partition itself

noah:~# mkdir /mnt/root
noah:~# mount -t ext3 /dev/sda8 /mnt/root
noah:~# mount -o bind /dev /mnt/root/dev
noah:~# mount devpts /dev/pts -t devpts

Change /dev/sda8 in the above example commands with your partition name and number.
2. chroot to the mounted partition in order to be able to use your filesystem, exactly like you normally use it when you’re using your Linux partition

noah:~# chroot /mnt/root /bin/bash

Hopefully now you should be in locked in your filesystem and use your Linux non-bootable system as usual.

Being able to access your /boot/grub directory I suggest you first check that everything inside:

/boot/grub/menu.lst is well defined and there are no problems with the paths to the Linux partitions.

Next issue the following commands which will hopefully recover your broken grub boot loader.

noah:~# grub
noah:~# find /boot/grub/stage1

The second command find /boot/grub/stage1 should provide you with your partitions range e.g. it should return something like:

root (hd0,7)

Nevertheless in my case instead of the expected root (hd0,7) , I was returned

/boot/grub/stage1 not found

Useless to say this is uncool 🙂

As a normal reaction I tried experimenting in order to fix the mess. Logically enough I tried to reinstall grub using the

noah:~# grub-install --root-directory=/boot /dev/sda
noah:~# update-grub

To check if that would fix my grub issues I restarted my notebook. Well now grub menu appeared with some error generated by splashy
Trying to boot any of the setup Linux kernels was failing with some kind of error where the root file system was trying to be loaded from /root directory instead of the normal / because of that neither /proc /dev and /sys filesystems was unable to be mounted and the boot process was interrupting in some kind of rescue mode similar to busybox, though it was a was less flexible than a normal busybox shell.

To solve that shitty issue I once again booted with the Debian Testing (Sid / Squeeze ) Install CD1 and used the commands displayed above to mount my linux partition.

Next I reinstalled the following packages:

noah:~# apt-get update
noah:~# apt-get install --reinstall linux-image-amd64 uswsusp hibernate grub grub-common initramfs-tools

Here the grub reinstall actually required me to install the new grub generation 2 (version 2)
It was also necessary to remove the splashy

noah:~# apt-get remove splashy
As well as to grep through all my /etc/ and look for a /dev/sda6 and substitute it with my changed partition name /dev/sda8

One major thing where I substituted /dev/sda6 to my actual linux partition now with a name /dev/sda8 was in:

initramfs-tools/conf.d/resumeThe kernel reinstall and consequently (update) does offered me to substitute my normal /dev/sda* content in my /etc/fstab to some UUIDS like UUID=ba6058da-37f8-4065-854b-e3d0a874fb4e

Including this UUIDs and restarting now rendered my system completely unbootable … So I booted once again from the debian install cd .. arrgh 🙂 and removed the UUID new included lines in /etc/fstab and left the good old declarations.
After rebooting the system now my system booted once again! Hooray! All my data and everything is completely intact now Thanks God! 🙂

The end of the work week :)

Friday, February 1st, 2008

One more week passed without serious server problems. Yesterday after upgrade to debian 4.0rc2 with

apt-get dist-upgrade and reboot the pc-freak box became unbootable.

I wasn’t able to fix it until today because the machine’s box seemed not to read cds well.The problem was consisted of this that after the boot process of the linux kernel has started the machine the boot up was interrupted with a message saying
/sbin/init is missing

and I was dropped to a busybox without being able to read nothing from my filesystem.Thankfully nomen came to Dobrich for the weekend and today he bring me his cdrom-drive I booted with the debian.

Using Debian’s linux rescue I mounted the partition to check what’s wrong. I suspected something is terribly wrong with the lilo’s conf.

Looking closely to it I saw it’s the lilo conf file it was setupped to load a initrd for the older kernel. changing the line to thenew initrd in /etc/lilo.conf and rereading the lilo; /sbin/lilo -C; /sbin/lilo;

fixed the mess and pc-freak booted succesfully! 🙂

Yesterday I had to do something kinky. It was requested from a client to have access to a mysql service of one of the company servers,the problem was that the client didn’t have static IP so I didn’t have a good way to put into the current firewall.

Everytime the adsl they use got restarted a new absolutely random IP from all the BTC IP ranges was assigned.

The solution was to make a port redirect to a non-standard mysql port (XXXXX) which pointed to the standard 3306 service. I had to tell the firewall not to check the coming IPs on the non-standard port (XXXXX) against the 3306 service fwall rules.

Thanks to the help of a guy inirc.freenode.net #iptables jengelh I figured out the solution.

To complete the requested task it was needed to mark all packagescoming into port (XXXXX) using the iptables mangle option and to add a rule to ACCEPT all marked packages.

The rules looked like this

/sbin/iptables -t mangle -A PREROUTING -p tcp –dport XXXXX -j MARK –set-mark 123456/sbin/iptables -t nat -A PREROUTING -d EXTERNAL_IP -i eth0 -p tcp –dport XXXXX -j DNAT –to-destination EXTERNAL_IP:3306

/sbin/iptables -t filter -A INPUT -p tcp –dport 3306 -m mark –mark 123456 -j ACCEPT .

Something I wondered a bit was should /proc/sys/net/ipv4/ip_forward in order for the above redirect to be working, in case you’re wondering too well it doesn’t 🙂 The working week was a sort of quiteful no serious problems with servers and work no serious problems at school (although I see me and my collegues become more and more unserious) at studying. My grand parentsdecided to make me a gift and give me money to buy a laptop and I’m pretty happy for this 🙂 All that is left is to choose a good machine with hardware supported both by FreeBSD and Linux.

END—–

How to enable AUTO fsck (ext3, ext4, reiserfs, LVM filesystems) checking on Linux boot through /etc/fstab

Tuesday, July 12th, 2011

How to auto FSCK manual fsck screenshot

Are you an administrator of servers and it happens a server is DOWN.
You request the Data Center to reboot, however suddenly the server fails to boot properly and you have to request for IPKVM or some web java interface to directly access the server physical terminal …

This is a very normal admin scenario and many people who have worked in the field of remote system administrators (like me), should have experienced that bad times multiple times.

Sadly enough only a insignifant number of administrators try to do their best to reduce this down times to resolve client stuff downtime but prefer spending time playing the ztype! game or watching some porn website 😉

Anyways there are plenty of things like Server Auto Reboot on Crash with software Watchdog etc., that we as sysadmins can do to reduce server downtimes and most of the manual human interactions on server boot time.

In that manner of thougts a very common thing when setting up a new Linux server that many server admins forget or don’t know is to enable all the server partition filesystems to be auto fscked during server boot time.

By not enabling the auto filesystem check options in Linux the server filesystems did not automatically scan and fix hard drive partitions for fs innode inconsistencies.
Even though the filesystems are tuned to automatically get checked on every 38 system reboots, still if some kind of filesystem errors are found that require a manual confirmation the boot process is interrupted and the admin ends up with a server which is not reachable remotely via ssh !

For the remote system administrator, this times are a terrible times of waitings, prayers and hopes that the server hardware is fine 😉 as well as being on hold to get a KVM to get into the server manually and enter the necessery input to fsck prompt.

Many of this bad times can be completely avoided with a very simple fix through /etc/fstab by enabling all server partitions containing any filesystem to be automatically checked and fixed in case if inconsistencies or errors are found by fsck.ext3, fsck.ext4, fsck.reiserfs etc. commands.

A very typical default /etc/fstab file you will find on many servers should look something like:

/dev/sda8 / ext3 errors=remount-ro 0 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/sda1 /home ext3 defaults 0 0

Notice the line:
/dev/sda1 /home ext3 defaults 0 0

The first column in the example contains the device name, the second one its mount point, third its filesystem type, fourth the mount options, fifth (a number) dump options, and sixth (another number) filesystem check options. Let’s take a closer look at this stuff.

The ones which are interesting to enable auto fsck checking and error resolving is provided usually by the last sixth variable (filesystem check option) which in the above example equals 0 .

When the filesystem check option equals 0 this means the auto fsck and repair for the respective filesystem is disabled.
Some time in the past the dump backup option (5th option in the example) was also used but as far as I can understand today it’s not that important in modern GNU/Linux distributions.

Now having the above sample crontab in order to enable the fsck file checking on Linux boot for /dev/sda1 , we will need to modify the above line’s filesystem check option be 2, e.g. the line would afterwards look like:

/dev/sda1 /home ext3 defaults 0 2

Setting the 2 as an option for filesystem check is necessery for every filesystem which is not mounted as a root filesystem /

In above example /etc/fstab you already see that auto filesystem fsck is enabled for root partition:

/dev/sda8 / ext3 errors=remount-ro 0 1
(notice the 1 in the end of the line)

Finally a modified version of the default sample /etc/fstab which will check the extra /dev/sda1 /home partition would look like so:

/dev/sda8 / ext3 errors=remount-ro 0 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/sda1 /home ext3 defaults 0 2

Making sure all Linux server partitions has the auto filesystem check option enabled is something absoultely necessery!
Enabling the auto fsck on servers always makes me sleep calmer 😉
Hope it helps your too. 🙂