When I needed to find the performance bottlenecks in some Linux software, I found helpful answers at http://stackoverflow.com in the form of two techniques: 1. naming threads and watching them with “top -Hp <pid>”, and 2. running “pstack <pid>”. The first showed me which threads were consuming the most CPU. The second let me sample the application’s stack traces over time to find the hot spots.
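Here is a rough sketch of how the pstack sampling could be automated. It is not what I actually ran, just one way to do it in Python. It assumes pstack is installed and prints gdb-style frames (lines starting with “#0”); the function names, the sample count and the interval are all made up for illustration.

```python
import re
import subprocess
import time
from collections import Counter

# Matches the innermost frame of each thread, e.g.
#   "#0  0x00007f01 in read () from /lib64/libc.so.6"
# The address part is optional because some frames are printed without it.
FRAME0 = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(", re.MULTILINE)


def top_frames(pstack_output):
    """Return the function name at the top of each thread's stack."""
    return FRAME0.findall(pstack_output)


def tally(samples):
    """Count how often each function was on top across all samples."""
    counts = Counter()
    for sample in samples:
        counts.update(top_frames(sample))
    return counts


def sample_process(pid, n=20, interval=0.5):
    """Run pstack n times against pid (hypothetical defaults), pausing between runs."""
    samples = []
    for _ in range(n):
        result = subprocess.run(["pstack", str(pid)], capture_output=True, text=True)
        samples.append(result.stdout)
        time.sleep(interval)
    return samples
```

Calling tally(sample_process(pid)).most_common(10) would list the functions seen most often at the top of a stack. Functions that show up in many samples are the likely hot spots, although threads that are merely blocked (in read or a condition wait, say) will also rank high, so the list needs a sanity check.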
Six months ago, I replaced the failing hard drive in my Linux laptop, and already, the SMART tools are telling me that I should back up and replace the hard drive — a high number of sectors have gone bad.
Hmmm. What’s this? SMART also reported that the hard drive had reached “overheating” temperature ranges. Why would that be? I did some Google searching, and came up with the following advice:
- Don’t close the laptop lid while it is powered up! This is how I had normally run my Linux laptop — it’s a server, and I leave the lid closed. Oops! I’ve changed the power settings so that when the laptop lid is closed, it sleeps.
- Edit /etc/grub.conf and add acpi_osi=Linux to the kernel line, or try acpi=off to see if APM (Advanced Power Management) will take over. I’ve just started trying the former. UPDATE 8 Feb 2011: Using this prevented my laptop from waking up from sleep, so I stopped using it.
- Vacuum the dust off the fan screen (to prevent airflow blockage)
- Monitor the temperature with ‘smartctl’
Based on a tip from my father (a longtime Linux expert), I ran “smartctl -H /dev/sda”, and it reported “SMART overall-health self-assessment test result: PASSED”. I assume this means the hard drive is still okay, but I had better not forget to make regular backups and keep monitoring the drive’s status.
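To keep an eye on the drive without remembering to run smartctl by hand, something like the following sketch could be run from cron. It assumes “smartctl -A” prints the usual ten-column attribute table and that the drive reports attributes named Temperature_Celsius and Reallocated_Sector_Ct; attribute names vary between drive vendors, and the 55 °C limit is a number I picked for illustration, not a manufacturer figure.

```python
import subprocess


def parse_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # the tenth column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def read_attributes(device="/dev/sda"):
    """Run smartctl -A on the device (usually needs root) and parse the table."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_attributes(result.stdout)


def check(attrs, max_temp_c=55):
    """Return warning strings; max_temp_c is an assumed threshold."""
    problems = []
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp > max_temp_c:
        problems.append(f"temperature {temp} C exceeds {max_temp_c} C")
    realloc = attrs.get("Reallocated_Sector_Ct")
    if realloc:
        problems.append(f"{realloc} reallocated sectors")
    return problems
```

Running check(read_attributes("/dev/sda")) returns an empty list when neither condition is met, so a cron job only needs to send mail when the list is non-empty.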