Showing posts from 2011

Preventing/Identifying hardware failures in Linux environment.

Hardware failures can be catastrophic. If the failed component is a single point of failure, the impact can be severe: data loss, delayed arrival of data, service unavailability, and so on. To eliminate single points of failure we can set up high availability, load balancing, etc., but that comes at a cost, and many cannot afford it, whether for technical or budget reasons. So we can instead come up with some preventive measures that at least alert us in advance that a device is going to fail within a few days. This is possible for some devices, such as disk drives, but not for every hardware device. For cases such as a system rebooting by itself, or hanging or becoming unresponsive because of a hardware failure, we need to identify which device actually failed. Often no trace of such incidents can be found in the system log and we have to…

How to identify whether a device is IPMI enabled or not in Linux

I posted earlier here about how to use ipmitool to monitor and troubleshoot hardware issues, as another source of information besides the system log file. Sometimes a hardware issue, such as an unexpected reboot or a hung system, leaves no clue in the system log file, but ipmitool can provide some clues in its System Event Log (SEL) to help identify a potential hardware issue. We have identified many production issues in our environment where no trace could be found in the system log file but ipmitool pointed to bad hardware.

But before using ipmitool, we first need to check whether the system board supports IPMI. Old system boards might not support IPMI technology.

Using dmidecode tool to identify IPMI Support

We can use dmidecode to check whether a particular device is IPMI-enabled. The following example shows this:

-bash-3.00$ sudo dmidecode | grep -A 6 -i ipmi
IPMI Device Information
    Interface Type: KCS (Keyboard Control Style)
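
The check above can be wrapped in a small shell function, as a rough sketch. The function name has_ipmi is made up for illustration and is not part of dmidecode or ipmitool. It only looks for the "IPMI Device Information" header seen in the output above, and it reads the dmidecode output from standard input so that it can be tried on saved output without root access. The commented usage assumes dmidecode's --type option with 38, the SMBIOS record type for an IPMI device.

```shell
#!/bin/bash
# has_ipmi: hypothetical helper (not part of dmidecode or ipmitool).
# Reads dmidecode output on stdin and succeeds (exit status 0) only if
# it contains an "IPMI Device Information" record.
has_ipmi() {
    grep -qi 'IPMI Device Information'
}

# Typical use on a real machine (needs root, so it is left commented out):
#   if sudo dmidecode --type 38 | has_ipmi; then
#       echo "IPMI supported"
#   else
#       echo "no IPMI device reported"
#   fi
```

A successful check only says that the firmware advertises an IPMI device. The IPMI kernel modules may still need to be loaded before ipmitool can talk to it.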

free command: Releasing the disk cache

Linux uses otherwise free memory to cache disk data, which helps reduce I/O. To many, this looks like memory is running low, but everything is actually fine. This cached memory can be released if needed by tuning a kernel parameter. But probably nobody would want to do this except for benchmarking purposes. The disk cache actually makes applications load faster and run more smoothly.

Tuning /proc/sys/vm/drop_caches

The following is the state of memory utilization on my desktop.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1987       1805        181          0        230       1298
-/+ buffers/cache:        276       1710
Swap:         1906          0       1906

We can see from the first row that free reports only 181 MB as free, while around 1298 MB is being used for the disk cache.
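
To make the arithmetic behind the second row explicit, here is a small sketch using the numbers from the output above. The function name real_usage is hypothetical. It assumes the older free layout shown here, where the Mem: row carries separate buffers and cached columns (newer versions of free print different columns). Memory actually used by applications is used minus buffers and cached, and memory effectively available is free plus buffers and cached.

```shell
#!/bin/bash
# real_usage: hypothetical helper that reads old-style `free -m` output
# on stdin and prints "<used by applications> <effectively free>" in MB.
# Assumes the Mem: row is: total used free shared buffers cached.
real_usage() {
    awk '/^Mem:/ { print $3 - $6 - $7, $4 + $6 + $7 }'
}

# The Mem: row from the output above.
echo 'Mem: 1987 1805 181 0 230 1298' | real_usage
# prints: 277 1709
```

The results are within one megabyte of the 276 and 1710 that free prints in its "-/+ buffers/cache" row. The small difference is presumably because free does the calculation on the unrounded values before converting to megabytes.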

Now let's tune /proc/sys/vm/drop_caches to release the memory used for the cache. drop_caches accepts three values. The default value is 0, which …