Today I present an interesting article on bad blocks, first published on http://virtualprivateserver.castlegem.co.uk/
Hardware fails; that is a fact. Hard drives are rather reliable nowadays, but every now and then a drive fails or at least has hiccups. Using smartctl/smartd to monitor disks is good practice. Below we discuss how some lesser issues can be handled without having to reboot the system. It is still up to the sysadmin's own discretion to judge the circumstances correctly and decide whether the disk errors encountered are a one-time incident or a sign of an entirely failing disk.
Let's have a look at typical smartctl -a DEVICE output:
# smartctl -a /dev/sda
...
ID# ATTRIBUTE_NAME          .... RAW_VALUE
197 Current_Pending_Sector  .... 2
...
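For monitoring scripts it can be handy to pull out just that raw value. The following is a minimal sketch, not part of the original article: the function name pending_sectors is made up for illustration, and it assumes the usual ten-column attribute table printed by smartctl -A, where RAW_VALUE is the tenth field.

```shell
# Print the raw value of SMART attribute 197 (Current_Pending_Sector)
# from "smartctl -A" output read on stdin.
# Assumption: the standard ten-column attribute table, RAW_VALUE last.
pending_sectors() {
    awk '$1 == 197 { print $10 }'
}

# Typical use on a live system (needs root):
#   smartctl -A /dev/sda | pending_sectors
```

Anything greater than 0 is a reason to continue with the self-test described next.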
OK, so we have an oops here. Time to find out what is going on:
# smartctl --test=short /dev/sda
This will take a very short time, a couple of minutes at most, e.g.:
Please wait 2 minutes for test to complete.
Test will complete after Sat Feb 2 16:25:10 2013
Now, with a current pending sector count > 0 we will most likely have an ouch after the test completes:
Num  .. Status                   Remaining .. LBA_of_first_error
...
# 2  .. Completed: read failure  90%       .. 1825221261
...
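The failing LBA can also be picked out of the self-test log automatically. This is only a sketch under assumptions: first_error_lba is a hypothetical helper name, and it relies on the log layout shown above, where each entry starts with "#" and LBA_of_first_error is the last column.

```shell
# Print the LBA of the first self-test entry that reports a read failure.
# Reads "smartctl -l selftest" (or "smartctl -a") output on stdin.
# Assumption: log entries start with "#" and the LBA is the last field.
first_error_lba() {
    awk '/^#/ && /read failure/ { print $NF; exit }'
}

# Typical use on a live system (needs root):
#   smartctl -l selftest /dev/sda | first_error_lba
```

If no entry reports a read failure, the function prints nothing.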
LBA counts sectors in units of 512 bytes and starts at 0, so we now need to find out where 1825221261 is actually located:
# fdisk -lu /dev/sda
will display some information about the device in question:
   Device Boot     Start        End     Blocks  Id System
...
/dev/sda3       31641600 1953523711  960941056  83 Linux
...
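Finding the right partition is a plain range check of the LBA against the Start and End columns. A small sketch of that check, with lba_in_partition as a made-up helper name and the values copied by hand from the fdisk output:

```shell
# Succeed (exit status 0) if LBA lies within [START, END], both inclusive.
# Usage: lba_in_partition LBA START END
lba_in_partition() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Values for /dev/sda3 taken from the fdisk listing above.
if lba_in_partition 1825221261 31641600 1953523711; then
    echo "LBA is on /dev/sda3"
fi
```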
So 1825221261 is on /dev/sda3, since it lies between that partition's start and end sectors. Next we need to determine the file system block that contains our LBA, so we first have to get the block size:
# tune2fs -l /dev/sda3 | grep Block
Block count:              240235264
Block size:               4096
Blocks per group:         32768
OK, 4096 bytes. So, the actual block number will be:
(LBA - PARTITION_START_SECTOR) * (512 / BLOCKSIZE)
In our case, this is:
(1825221261 - 31641600) * (512 / 4096) = 224197457.625
We only need the integer part; the fraction (0.625 = 5/8) just tells us that the bad sector is the 6th of the eight sectors that make up this file system block.
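The same arithmetic can be done with shell integer math, which avoids the fraction altogether. This is a sketch only; lba_to_fs_block is a hypothetical name, and it assumes 512-byte sectors as stated above.

```shell
# Convert a disk LBA into a file system block number plus the sector
# offset inside that block.
# Usage: lba_to_fs_block LBA PARTITION_START_SECTOR BLOCKSIZE
# Assumption: sectors are 512 bytes, as in the article.
lba_to_fs_block() {
    sectors_per_block=$(( $3 / 512 ))
    offset=$(( $1 - $2 ))
    echo "$(( offset / sectors_per_block )) $(( offset % sectors_per_block ))"
}

lba_to_fs_block 1825221261 31641600 4096
# prints: 224197457 5
```

The second number is the zero-based sector offset within the block, so 5 means the 6th sector, which matches the .625 fraction.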
It is good practice to find out which inode/file has been affected by using debugfs (operations can take a while with this tool):
# debugfs
debugfs: open /dev/sda3
debugfs: icheck BLOCK (224197457 in our case)
Block       Inode number
224197457   56025154
debugfs: ncheck 56025154
Inode       Pathname
56025154    /some/path/to/file
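If this file isn't anything crucial, we can start correcting things. Note that the following command overwrites the block with zeros, so whatever part of the file was stored there is lost: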
# dd if=/dev/zero of=/dev/sda3 bs=4096 count=1 seek=BLOCK
(224197457 here)
# sync
smartctl -a will now show an updated current pending sector count, and you can re-run a short smartctl test.
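Since a mistyped seek value or device name makes dd overwrite the wrong block, one cautious habit is to print the command first and only run it after reading it back. A minimal sketch, with zero_block_cmd as a made-up helper name; it only builds the command line and touches no device:

```shell
# Print (do not run) the dd command that zeroes one file system block.
# Usage: zero_block_cmd PARTITION BLOCKSIZE BLOCK
zero_block_cmd() {
    printf 'dd if=/dev/zero of=%s bs=%s count=1 seek=%s\n' "$1" "$2" "$3"
}

zero_block_cmd /dev/sda3 4096 224197457
# prints: dd if=/dev/zero of=/dev/sda3 bs=4096 count=1 seek=224197457
```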
Please wait 2 minutes for test to complete.
Test will complete after Sat Feb 2 16:25:10 2013
Where is the output file?
I can't find it anywhere, not in pwd or home.
Run smartctl -a /dev/sda again – the test results will be there, nearer the end.
I get this:
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error  00%        6353             -
# 2  Short offline       Completed without error  00%        6333             -
# 3  Short offline       Completed without error  00%        6331             -
# 4  Short offline       Completed without error  00%        3014             -
So, no errors on my HDD?
How can I see a particular entry (1-5) on its own, not just the last one?
And where are these log files in the filesystem?
I searched everywhere and can't find them.
smartctl –test=short /dev/sda
should be smartctl --test=short /dev/sda
Oh no... it's the web site that changes the data, not an error in the article. It should be a double minus before the "test" parameter.