A kernel panic is an action taken by an operating system upon detecting an internal fatal error from which it cannot safely recover. The term is largely specific to Unix and Unix-like systems; for Microsoft Windows operating systems the equivalent term is “stop error” (or, colloquially, the “Blue Screen of Death”, BSOD).
The kernel routines that handle panics, known as panic() in AT&T-derived and BSD Unix source code, are generally designed to output an error message to the console, dump an image of kernel memory to disk for post-mortem debugging and then either wait for the system to be manually rebooted, or initiate an automatic reboot.
The default is to wait, so if this happens on one of your servers and you don’t notice it, all its services could stay down for some time, whereas with an automatic reboot the problem could be solved quickly.
We can configure a directive that will automatically reboot the system when a kernel panic is detected.
This directive is added to the GRUB entry that boots the system with your preferred parameters. It tells the kernel that, in case of a panic, instead of leaving the machine stuck and alerting you in some way (such as by flashing the keyboard LEDs), it must restart the system after a given time.
The directive goes on the same line where we specify the root of the system and is written:
panic=XX
where XX is the number of seconds to wait before restarting the system.
As an example, the startup parameters could be:
title=Gentoo Linux (2.6.31-gentoo-r6) RAID LVM2
root (hd0,4)
kernel /boot/kernel-genkernel-x86_64-2.6.31-gentoo-r6 root=/dev/ram0 ramdisk=8192 real_root=/dev/md0 dolvm udev panic=20 vga=0x318 video=vesafb:mtrr,ywrap
initrd /boot/initramfs-genkernel-x86_64-2.6.31-gentoo-r6
The part we are interested in is the kernel entry, which contains the parameter panic=20.
In this case, we told the kernel to reboot 20 seconds after a kernel panic.
Of course, an automatic reboot only helps if all the services and programs the system needs are configured to start at boot, so that it can resume its duties without manual intervention.
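After rebooting with the new entry, it can be useful to confirm that the kernel actually received the parameter. The following is a minimal sketch assuming a POSIX shell; panic_timeout_from_cmdline is a hypothetical helper name, not a standard tool. It extracts the panic= value from a command line string such as the contents of /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the panic= parameter found in a
# kernel command line string. If the parameter appears more than once, the
# last occurrence is printed. Prints nothing and returns non-zero if absent.
panic_timeout_from_cmdline() {
    panic_value=""
    for word in $1; do            # unquoted on purpose: split on whitespace
        case "$word" in
            panic=*) panic_value="${word#panic=}" ;;
        esac
    done
    [ -n "$panic_value" ] && echo "$panic_value"
}

# Typical use on a running Linux system:
#   panic_timeout_from_cmdline "$(cat /proc/cmdline)"
```

The timeout currently in effect can also be read from /proc/sys/kernel/panic; a value of 0 means the kernel waits indefinitely instead of rebooting.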
Alternative: use sysctl
As an alternative to the boot option, you can add the kernel.panic parameter to the /etc/sysctl.conf file as follows.
kernel.panic = 20
Once you have added this option to your sysctl file, run the following command to re-read the file and enable the setting immediately (on subsequent reboots it will be read automatically):
sysctl -p /etc/sysctl.conf
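If the file is edited by a script rather than by hand, it helps to avoid adding the same option twice. The sketch below is one possible approach, assuming a POSIX shell with awk and mktemp available; set_sysctl_option is a hypothetical helper name. It takes the file path as an argument, so it can be tried on a copy before touching /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper: set_sysctl_option FILE KEY VALUE
# Removes any existing assignment of KEY from FILE, then appends
# "KEY = VALUE". Comments and all other lines are left untouched.
set_sysctl_option() {
    conf_file=$1
    conf_key=$2
    conf_value=$3
    conf_tmp=$(mktemp) || return 1
    if [ -f "$conf_file" ]; then
        # Keep every line whose name (text before "=", blanks stripped)
        # is not exactly KEY.
        awk -v k="$conf_key" -F= \
            '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
            "$conf_file" > "$conf_tmp"
    fi
    echo "$conf_key = $conf_value" >> "$conf_tmp"
    cat "$conf_tmp" > "$conf_file" && rm -f "$conf_tmp"
}

# Example (as root), followed by the reload command from the text:
#   set_sysctl_option /etc/sysctl.conf kernel.panic 20
#   sysctl -p /etc/sysctl.conf
```

To change the value only for the running system, without editing any file, sysctl -w kernel.panic=20 can be used instead; that setting is lost at the next reboot.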
On local systems, it is also convenient to be able to reboot the system with a key-press in the case of a panic. Instead of having the system reboot automatically on a local system, consider using the magic SysRq keys to reboot your system if X locks up or keyboard entry is being ignored.
To enable SysRq, add the following option to the file /etc/sysctl.conf:
kernel.sysrq = 1
And as above run the command
sysctl -p /etc/sysctl.conf
to enable it in the current session.
Common use of SysRq
A common way to safely reboot a Linux computer that has otherwise locked up is to hold Alt+SysRq and press the keys R, E, I, S, U, B in sequence. The sequence can be remembered with the mnemonics “Raising Elephants Is So Utterly Boring” or “Reboot Even If System Utterly Broken”, or simply as the word “BUSIER” spelled backwards. The letters stand for:
unRaw (take control of keyboard back from X),
tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
kIll (send SIGKILL to all processes, forcing them to terminate immediately),
Sync (flush data to disk),
Unmount (remount all filesystems read-only),
reBoot.
This can prevent a fsck being required on reboot and gives some programs a chance to save emergency backups of unsaved work.
In practice, each command may need a few seconds to complete, so it is wise to pause between key presses, especially when no feedback is available from the screen due to a freeze or display corruption. For example, sending SIGKILL to processes that have not yet finished terminating can cause data loss.
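As a memory aid, the sequence can be kept at hand as a small script. The sketch below only prints the keys and their meaning; it sends no key presses and does not touch the system. print_reisub is a hypothetical helper name, and the Alt+SysRq prefix shown is the usual PC keyboard combination.

```shell
#!/bin/sh
# Hypothetical helper: print the REISUB sequence, one key per line.
# This only prints text; it does not trigger any SysRq function.
print_reisub() {
    while IFS='|' read -r key action; do
        printf 'Alt+SysRq+%s  %s\n' "$key" "$action"
    done <<'EOF'
R|unRaw: take control of the keyboard back from X
E|tErminate: send SIGTERM to all processes
I|kIll: send SIGKILL to all processes
S|Sync: flush data to disk
U|Unmount: remount all filesystems read-only
B|reBoot: restart the machine
EOF
}

print_reisub
```

Printing the list and keeping it near a local machine is a reminder to leave a few seconds between each key, in the order shown.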
Article provided by Asapy Technologies