Monit: Monitor Hard Drive SMART Health and Temperature

Let's face it: NAS and server drives fail. That is why it is important to monitor hard drive SMART health and temperature, so you can take precautions such as backing up your data or increasing cooling. Monit is an automatic monitoring, maintenance, and repair utility for Unix systems. If a drive's SMART health check fails or its temperature gets too high, Monit will send you an email alert so you can examine and resolve the issue. In this Monit tutorial, I will show you how to monitor hard drive temperature and SMART health with Monit on your home server. By hard drive SMART status monitoring I mean the drive's overall SMART health self-assessment, as reported by smartctl. I am assuming that you have already installed and configured Monit following my previous guide.

Determine the Drive Partitions to Monitor

Before you can monitor hard drive SMART health and temperature with Monit, you need to know which drives you want to monitor. If you remember my recently built HTPC-NAS combo, I have two drives on my server: sda, which is an SSD, and sdb, which is a 4 TB HDD. sda has 3 partitions (sda1, sda2, and sda3), and sdb has only one. If you do not know which ones to monitor, use the lsblk command. Notice the sda and sdb disks in the output below.

List All Disk Drives With Lsblk Command

In this post I show hard drive SMART status monitoring only for sda, which is my SSD. But the process is the same for other drives.
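
If you would rather get the list of whole disks in a script-friendly form, something along these lines may help. It is only a sketch: the function names filter_disks and list_disks are made up for this example, and it assumes your lsblk supports the -d, -n, and -o options (the util-linux version does).

```shell
#!/bin/sh
# Keep only whole disks from "NAME TYPE" pairs, one pair per line.
# Loop devices, optical drives, and partitions are dropped.
filter_disks() {
    awk '$2 == "disk" { print "/dev/" $1 }'
}

# List whole disks on this machine using lsblk:
#   -d  skip partitions, -n  no header row, -o  choose the columns
list_disks() {
    lsblk -dn -o NAME,TYPE | filter_disks
}
```

Calling list_disks prints one device path per line. On the two-drive server described above, that should be /dev/sda and /dev/sdb.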


Install smartmontools

Next, you need a package called smartmontools installed on your system. smartmontools contains smartctl and smartd, which monitor systems using the Self-Monitoring, Analysis, and Reporting Technology (SMART) that is built into most modern hard disks. smartctl tests provide a warning of hard drive degradation and failure. It can be used to monitor hard drive SMART health and temperature. Use the following command to install smartmontools.

sudo apt-get install smartmontools

Once smartmontools is installed, verify that it works using the following command (replace sda with your disk of interest):

sudo smartctl -i /dev/sda

The output should look like what is shown in the picture below. Ensure that the hard drive is SMART capable and SMART is enabled (the last two lines).

Smartctl Hard Drive Information
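
The same check can be done without reading the output by eye. The sketch below assumes the "SMART support is: Enabled" wording that smartctl -i prints for ATA/SATA drives; other drive types (NVMe, for example) may word this differently, and the function name smart_enabled is just an illustration.

```shell
#!/bin/sh
# Read `smartctl -i` output on stdin.
# Succeeds (exit 0) only if a line reports SMART support as Enabled.
smart_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}
```

For example: sudo smartctl -i /dev/sda | smart_enabled && echo "SMART is enabled"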

In case SMART is not enabled, use the following command to enable it, and check HDD SMART Status in the output.

sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda

The output should confirm that SMART was enabled on the hard drive and automatic offline testing was set to every few hours, as shown below.

Enable Smart On Hard Disk

Now we are ready to set up hard disk SMART status monitoring with Monit.

Create Monit Hard Drive Monitoring Scripts

Before we can monitor hard drive health and temperature with Monit, we need to create scripts that check the SMART results and report them to Monit. Create a folder called "scripts" inside /etc/monit using the command below (the -p option makes the command harmless if the folder already exists):

sudo mkdir -p /etc/monit/scripts

Hard Drive Temperature Monitoring Script

Next, create a new hard drive temperature monitoring script:

sudo nano /etc/monit/scripts/sdatemp.sh

Copy the following contents to it, save, and exit (press Ctrl X, press Y, and press ENTER).

#!/bin/sh
HDDTP=`/usr/sbin/smartctl -a /dev/sda | grep Temp | awk -F " " '{printf "%d",$10}'`
#echo $HDDTP # for debug only
exit $HDDTP

Next, give execute privileges to the script using the following command:

sudo chmod 755 /etc/monit/scripts/sdatemp.sh

Finally, verify that the script works and outputs the hard disk temperature. To do this, temporarily remove the # in front of line 3, save, exit, and run the command below. The hard disk temperature should appear on the next line, as shown below:

$ sudo bash /etc/monit/scripts/sdatemp.sh
25

If you see the temperature printed, you are good to go. Reopen /etc/monit/scripts/sdatemp.sh, add the # back in front of line 3 so that it looks like the code block above, save, and exit.
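
One thing to watch for: grep Temp matches every attribute with "Temp" in its name, and some drives report more than one (for example both Airflow_Temperature_Cel and Temperature_Celsius). In that case the script above glues the numbers together. A possible variant, shown here only as a sketch, picks a single attribute by name. The function name parse_temp is made up for this example, and it assumes your drive calls the attribute Temperature_Celsius; check the output of smartctl -A on your drive, since names vary by vendor.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value (column 10)
# of the first attribute named Temperature_Celsius.
# "$10 + 0" forces a plain number; awk stops after the first match.
parse_temp() {
    awk '$2 == "Temperature_Celsius" { print $10 + 0; exit }'
}
```

In the script, this could replace the grep and awk part: sudo smartctl -A /dev/sda | parse_temp. Also keep in mind that exit codes only go from 0 to 255, which is fine for drive temperatures in degrees C.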

Hard Drive SMART Health Monitoring Script

Next, create a new hard drive SMART health monitoring script:

sudo nano /etc/monit/scripts/sdahealth.sh

Copy the following contents to it, save, and exit (press Ctrl X, press Y, and press ENTER).

#!/bin/sh
STATUS=`/usr/sbin/smartctl -H /dev/sda | grep overall-health | awk 'match($0,"result:"){print substr($0,RSTART+8,6)}'`
#echo $STATUS
if [ "$STATUS" = "PASSED" ] 
then
    # 0 implies PASSED
    TP=0
else 
    # 1 implies FAILED
    TP=1
fi
#echo $TP # for debug only
exit $TP

Next, give execute privileges to the script using the following command:

sudo chmod 755 /etc/monit/scripts/sdahealth.sh

Finally, verify that the script works and outputs the hard disk SMART health. To do this, temporarily remove the # in front of line 3 and the penultimate line, save, exit, and run the command below. Hard Drive SMART Overall Health should appear in the next line as shown below:

$ sudo bash /etc/monit/scripts/sdahealth.sh
PASSED
0

If you see "PASSED" and "0" printed, you are good to go. Reopen /etc/monit/scripts/sdahealth.sh, add the # back in front of line 3 and the penultimate line so that it looks like the code block above, save, and exit.
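
To see what the awk part of the health script is doing, here is the same logic split into two small functions that you can feed sample lines without touching a real drive. This is a sketch only: health_status and health_code are illustrative names, and the FAILED! line mentioned below is a sample input rather than output captured from my drives.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin and print the 6 characters
# that follow "result: " on the overall-health line.
health_status() {
    awk 'match($0, /result: /) { print substr($0, RSTART + RLENGTH, 6) }'
}

# Map the status word to the exit code convention used by sdahealth.sh:
# 0 means PASSED, 1 means anything else (including empty output).
health_code() {
    if [ "$1" = "PASSED" ]; then
        echo 0
    else
        echo 1
    fi
}
```

For example, echo 'SMART overall-health self-assessment test result: FAILED!' | health_status prints FAILED, which health_code turns into 1. An empty status (for example if smartctl could not read the drive) also maps to 1, so the check errs on the side of alerting.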

Monitor Hard Drive SMART Health with Monit

Next, you need a working Monit instance with a proper /etc/monit/monitrc file. Monit configurations for various services are loaded from the /etc/monit/conf.d folder. To monitor hard drive SMART health with Monit, create a Monit configuration file using the following command.

sudo nano /etc/monit/conf.d/sdastatus

Copy the following contents to it, save, and exit (press Ctrl X, press Y, and press ENTER).

#Temperature
check program SSD-Temp with path "/etc/monit/scripts/sdatemp.sh"
    every 5 cycles
    if status > 35 then alert
    group health

#SMART Overall Health
check program SSD-Health with path "/etc/monit/scripts/sdahealth.sh"
    every 120 cycles
    if status != 0 then alert
    group health

This configuration lets Monit monitor hard disk SMART health and temperature. You may change the names SSD-Temp and SSD-Health to something else. You may also customize the frequency of the checks (X cycles). If the hard drive SMART status fails or if the hard drive temperature gets above 35, Monit will send you an email alert. An example of a Monit alert is shown below.

Monit System Monitoring Email Alert

I looked at the average temperature for my SSD and added 10 degrees C to it to set the threshold of 35 degrees C. This is well below its maximum temperature rating of 70 degrees C. My HTPC-NAS combo setup does not generate much heat, so don't be alarmed if your temperatures are much higher than the numbers listed here.
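
If you want to work out a threshold for your own drive the same way, the arithmetic is simple enough to script. The sketch below is just one way to do it: suggest_threshold and the MARGIN variable are names invented for this example, and it uses integer math, so averages are rounded down.

```shell
#!/bin/sh
# Print the integer average of the given temperature readings plus a margin.
# MARGIN defaults to 10, following the rule of thumb described above.
suggest_threshold() {
    margin="${MARGIN:-10}"
    sum=0
    count=0
    for reading in "$@"; do
        sum=$((sum + reading))
        count=$((count + 1))
    done
    # Refuse to divide by zero when no readings were given.
    [ "$count" -gt 0 ] || return 1
    echo $((sum / count + margin))
}
```

For example, suggest_threshold 24 25 26 prints 35, which matches the value used in the configuration above.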

Test and Reload Monit

Whenever you make changes, you have to test the Monit configuration:

sudo monit -t

You should see the following message: Control File Syntax OK. Then, check to see if Monit is already running using the following command:

sudo /etc/init.d/monit status

If Monit is running, reload Monit configurations using the following command:

sudo /etc/init.d/monit reload

If Monit is not running, start it using the sudo monit command instead. The whole sequence of commands for testing and reloading Monit is shown in the picture below.

Monit Test And Reload
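
If you edit the configuration often, the test and reload steps can be chained so that a broken file never gets loaded. This is a minimal sketch and not part of the original setup: reload_if_valid and the MONIT variable are names chosen for this example, and it uses Monit's own reload action (monit reload) instead of the init script. It assumes Monit is already running and that you run it as root.

```shell
#!/bin/sh
# Path or name of the monit binary; override MONIT if yours lives elsewhere.
MONIT="${MONIT:-monit}"

# Reload Monit only if the control file passes the syntax check.
reload_if_valid() {
    if "$MONIT" -t; then
        "$MONIT" reload
    else
        echo "Monit configuration has errors; not reloading" >&2
        return 1
    fi
}
```

Call reload_if_valid after saving your changes. If the syntax check fails, the running Monit keeps its old configuration.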

Now, fire up your web browser and visit one of the following URLs depending on how your Monit is configured (be sure to use the correct port number):

  • http://localhost:2812
  • http://IPADDRESS:2812 (local network IP)
  • http://domain.com:2812 (if you have domain name pointing to your server)

You should see the hard drive health status and temperature listed in your Monit WebUI as shown in the picture below (see SSD-Temp and SSD-Health). The SMART health monitoring script listed above exits with status 0 if the health status is "PASSED" and 1 if a SMART hard drive failure is detected.

Monitor Hard Drive Smart Health With Monit

That is it for hard drive SMART health and temperature monitoring with Monit. As you can see, Monit allows for automatic system health monitoring, which can be a big help for administrators. The Monit Wiki page has several examples. More home server specific Monit examples will follow, so keep checking back.


Anand

Anand is a self-learned computer enthusiast, hopeless tinkerer (if it ain't broke, fix it), a part-time blogger, and a Scientist during the day. He has been blogging since 2010 on Linux, Ubuntu, Home/Media/File Servers, Smart Home Automation, and related HOW-TOs.
