26.4.18

fsck and repair on reboot

Problem: Input/output errors observed on disk access. dmesg or cat /var/log/messages shows "Unrecovered read error".
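
To confirm the symptom before going further, the log can be searched for the error string. This is a minimal sketch: the function name count_read_errors is hypothetical, and the default path is simply the log file named above.

```shell
# Count the lines in a log file that mention the error.
# Hypothetical helper; defaults to the log file named in the text.
count_read_errors() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function safe under "set -e".
    grep -c "Unrecovered read error" "$logfile" || true
}

# Example (run as root if the log is not world-readable):
#   count_read_errors
# The kernel ring buffer can be checked the same way:
#   dmesg | grep -c "Unrecovered read error"
```
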
Solution: Your hard disk is failing; at a minimum it has some bad sectors. You can try to repair the filesystem by forcing fsck to run on the next reboot. You must be root to do this.

su;
touch /forcefsck; #the presence of this file tells the system to run fsck on boot
echo "-y" > /fsckoptions; #option passed to fsck: automatically repair any errors found
reboot;

The system will remove these files after completing the fsck.
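
The steps above can be wrapped in two small shell functions, shown here as a sketch rather than a definitive script. The function names and the optional prefix argument are hypothetical; the prefix exists only so the logic can be tried against a scratch directory. With no argument the files are created in /, exactly as in the commands above.

```shell
# Create the two flag files that request a forced fsck on the next boot.
# $1 is an optional directory prefix (hypothetical, for dry runs only).
schedule_fsck() {
    prefix="${1:-}"
    touch "$prefix/forcefsck" || return 1
    printf '%s\n' "-y" > "$prefix/fsckoptions" || return 1
}

# Succeeds while either flag file still exists. Because the system removes
# the files after a completed fsck, a "pending" result after the reboot
# suggests the fsck did not finish.
fsck_pending() {
    prefix="${1:-}"
    [ -e "$prefix/forcefsck" ] || [ -e "$prefix/fsckoptions" ]
}

# Example (as root):
#   schedule_fsck && reboot
# After the reboot:
#   fsck_pending && echo "flag files still present"
```
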

The Rocks version looks like this:

su;
ssh compute-0-1 'touch /forcefsck';
ssh compute-0-1 'echo -y > /fsckoptions';
ssh compute-0-1 'reboot';
exit;
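
The three ssh calls can also be combined into one, so that the node reboots only if both files were created. This is a sketch under assumptions: the function name schedule_remote_fsck and the SSH_CMD variable are hypothetical, and SSH_CMD exists only so the command can be printed (SSH_CMD=echo) instead of executed.

```shell
# Schedule a forced fsck on a compute node and reboot it.
# SSH_CMD is a hypothetical override for dry runs; it defaults to ssh.
schedule_remote_fsck() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: schedule_remote_fsck <node>" >&2
        return 2
    fi
    # "&&" stops the chain at the first failure, so reboot only runs
    # after both files have been written on the node.
    ${SSH_CMD:-ssh} "$node" 'touch /forcefsck && echo -y > /fsckoptions && reboot'
}

# Example (as root on the frontend):
#   schedule_remote_fsck compute-0-1
# Dry run that only prints what would be sent:
#   SSH_CMD=echo schedule_remote_fsck compute-0-1
```
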

If there are problems and the fsck won't complete, you may need to boot from a live disk and remove forcefsck and fsckoptions manually. This is because the Rocks admin password may not work on a compute node.

Another approach is to search for bad blocks and write them to the 'bad block inode', so they will not be used in the future. This can be done interactively, e.g.:
umount /dev/sda5;
e2fsck -ck /dev/sda5; #-c: read-only badblocks test (faster); -k: keep the existing bad block list
or
e2fsck -cck /dev/sda5; #-cc: non-destructive read-write badblocks test (slower)
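
Since e2fsck should not be run on a mounted partition, a small guard can check the mount table first. This is a sketch: the function name is_mounted is hypothetical, and it only matches the device name exactly as it appears in the first column of /proc/mounts, so a partition mounted under another name (for example a device-mapper path) would not be detected.

```shell
# Succeed if the given device appears in the first column of the mount table.
# $2 is an optional alternative mount table (hypothetical, for testing).
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Example (as root):
#   if is_mounted /dev/sda5; then
#       echo "/dev/sda5 is still mounted, run umount first" >&2
#   else
#       e2fsck -ck /dev/sda5
#   fi
```
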