Let's face it: NAS and server drives fail, so it is important to monitor hard drive SMART health and temperature so that you can take precautionary actions such as backing up your data or increasing cooling. Monit is an automatic monitoring, maintenance, and repair utility for Unix systems. If the hard drive's SMART health status fails or the drive temperature is high, Monit will send you an email alert. You can then examine and resolve any issues. In this Monit tutorial, I will show you how to monitor hard drive temperature and SMART health with Monit on your home server. By hard drive SMART status monitoring I mean the overall SMART health as reported by short and long tests. I am assuming that you have already installed and configured Monit following my previous guide.
Determine the Drive Partitions to Monitor
Before you can monitor hard drive SMART health and temperature with Monit, you need to know which drives you want to monitor. If you remember my recently built HTPC-NAS combo, I have two drives on my server (sda, which is an SSD, and sdb, which is a 4 TB HDD). sda has 3 partitions (sda1, sda2, and sda3). sdb has only one partition. If you do not know which ones to monitor, use the lsblk command. Notice the sda and sdb disks in the output below.
In this post I show hard drive SMART status monitoring only for sda, which is my SSD. But the process is the same for other drives.
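If the full lsblk tree is too noisy, the whole disks can be filtered out of it. The following is a minimal sketch, assuming the lsblk from util-linux with its -d (no partitions), -n (no header), and -o (column selection) options; the only_disks function name is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose TYPE column is "disk" and
# print the matching device paths.
# Expects input in the form printed by: lsblk -d -n -o NAME,TYPE,SIZE
only_disks() {
    awk '$2 == "disk" { print "/dev/" $1 }'
}

# Usage (commented out so the snippet has no side effects when sourced):
# lsblk -d -n -o NAME,TYPE,SIZE | only_disks
```

On a system like the one described above, this should list /dev/sda and /dev/sdb and skip optical drives or loop devices.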
Monitor your home server with Monit:
- Home server system load monitoring (CPU, RAM, Swap)
- Server hard drive storage monitoring (HDD space)
- Motherboard temperature monitoring
- Processor or CPU temperature monitoring
- Monitor Hard drive SMART health and temperature
- Monitor file server status (Samba and NFS)
- Monitor web server status (Apache, NGINX, and MySQL)
- Monitor CouchPotato process status
- Monitor SickBeard process status
- Monitor SickRage process status
- Monitor SABnzbd process status
- Monitor Webmin process status
- Monitor qBittorrent process status
- Monitor Transmission process status
- Monitor ShellInABox process status
Install smartmontools
Next, you need a package called smartmontools installed on your system. smartmontools contains smartctl and smartd, which monitor systems using the Self-Monitoring, Analysis, and Reporting Technology (SMART) that is built into most modern hard disks. smartctl tests provide a warning of hard drive degradation and failure. It can be used to monitor hard drive SMART health and temperature. Use the following command to install smartmontools.
sudo apt-get install smartmontools
Once smartmontools is installed, verify that it works using the following command (replace sda with your disk of interest):
sudo smartctl -i /dev/sda
The output should look like what is shown in the picture below. Ensure that the hard drive is SMART capable and SMART is enabled (the last two lines).
In case SMART is not enabled, use the following command to enable it, and check HDD SMART Status in the output.
sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
The output should confirm that SMART was enabled on the hard drive and automatic offline testing was set to every few hours, as shown below.
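The same check can be done without reading the output by eye. This is a minimal sketch, assuming the "SMART support is:" wording that smartctl -i prints for ATA drives (other drive types, such as NVMe, may word it differently); the smart_ready function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the text on stdin reports SMART as
# both available and enabled, using the wording smartctl prints for ATA drives.
smart_ready() {
    info=$(cat)
    printf '%s\n' "$info" | grep -q 'SMART support is:[[:space:]]*Available' &&
    printf '%s\n' "$info" | grep -q 'SMART support is:[[:space:]]*Enabled'
}

# Usage (commented out so the snippet has no side effects when sourced):
# sudo smartctl -i /dev/sda | smart_ready && echo "SMART is ready on /dev/sda"
```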
Now, we are good to set up hard disk SMART status monitoring with Monit.
Create Monit Hard Drive Monitoring Scripts
Before we can monitor hard drive health and temperature with Monit, we need to create scripts that check the SMART test results and report them to Monit. If it does not exist already, create a folder called "scripts" inside /etc/monit using the command below:
sudo mkdir -p /etc/monit/scripts
Hard Drive Temperature Monitoring Script
Next, create a new hard drive temperature monitoring script:
sudo nano /etc/monit/scripts/sdatemp.sh
Copy the following contents to it, save, and exit (press Ctrl X, press Y, and press ENTER).
#!/bin/sh
HDDTP=`/usr/sbin/smartctl -a /dev/sda | grep Temp | awk -F " " '{printf "%d",$10}'`
#echo $HDDTP # for debug only
exit $HDDTP
Next, give execute privileges to the script using the following command:
sudo chmod 755 /etc/monit/scripts/sdatemp.sh
Finally, verify that the script works and outputs the hard disk temperature. To do this, temporarily remove the # in front of line 3, save, exit, and run the command below. The hard disk temperature should appear on the next line as shown below:
$ sudo bash /etc/monit/scripts/sdatemp.sh
25
If you see the temperature printed then you are good to go. Reopen /etc/monit/scripts/sdatemp.sh, add the # in front of line 3 so that it looks like the code block above, save, and exit.
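One thing to watch for: the script above matches every line containing "Temp", and some drives report more than one temperature attribute, in which case the digits would run together. The following is a minimal sketch of a stricter parse, assuming the standard smartctl -A attribute table where the first column is the attribute ID and the tenth is the raw value, and assuming the drive reports its temperature under ID 194 or 190 (not all drives do). The smart_temp function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the raw temperature from SMART attribute table
# text on stdin. Prefers attribute 194 and falls back to attribute 190.
smart_temp() {
    awk '
        $1 == 194 { t194 = $10 + 0; have194 = 1 }   # raw value, forced to a number
        $1 == 190 { t190 = $10 + 0; have190 = 1 }
        END {
            if (have194)      print t194
            else if (have190) print t190
            else              exit 1   # no temperature attribute found
        }
    '
}

# Usage (commented out so the snippet has no side effects when sourced):
# sudo smartctl -A /dev/sda | smart_temp
```

The printed number could then be used as the exit status in the same way the original script uses $HDDTP.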
Hard Drive SMART Health Monitoring Script
Next, create a new hard drive SMART health monitoring script:
sudo nano /etc/monit/scripts/sdahealth.sh
Copy the following contents to it, save, and exit (press Ctrl X, press Y, and press ENTER).
#!/bin/sh
STATUS=`/usr/sbin/smartctl -H /dev/sda | grep overall-health | awk 'match($0,"result:"){print substr($0,RSTART+8,6)}'`
#echo $STATUS
if [ "$STATUS" = "PASSED" ]
then
    # 0 implies PASSED
    TP=0
else
    # 1 implies FAILED
    TP=1
fi
#echo $TP # for debug only
exit $TP
Next, give execute privileges to the script using the following command:
sudo chmod 755 /etc/monit/scripts/sdahealth.sh
Finally, verify that the script works and outputs the hard disk SMART health. To do this, temporarily remove the # in front of line 3 and the penultimate line, save, exit, and run the command below. Hard Drive SMART Overall Health should appear in the next line as shown below:
$ sudo bash /etc/monit/scripts/sdahealth.sh
PASSED
0
If you see "PASSED" and "0" printed then you are good to go. Reopen /etc/monit/scripts/sdahealth.sh, add the # in front of line 3 and the penultimate line so that it looks like the code block above, save, and exit.
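The PASSED check can also be written without counting character offsets. This is a minimal sketch, assuming the "overall-health self-assessment test result:" line that smartctl -H prints for ATA drives (some other drive types word the result differently, so treat a non-match as "unknown" rather than proof of failure). The smart_health_ok function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the smartctl -H text on stdin reports an
# overall-health result of PASSED, and non-zero otherwise (including when
# no result line is present at all).
smart_health_ok() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Usage (commented out so the snippet has no side effects when sourced):
# if sudo smartctl -H /dev/sda | smart_health_ok; then exit 0; else exit 1; fi
```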
Monitor Hard Drive SMART Health with Monit
Next, it is required that you have a working Monit instance with a proper /etc/monit/monitrc file. Monit configurations for various services are loaded from the /etc/monit/conf.d folder. To monitor hard drive SMART health with Monit, create a Monit configuration file using the following command.
sudo nano /etc/monit/conf.d/sdastatus
Copy the following contents to it, save, and exit (press Ctrl X, press Y, and press ENTER).
#Temperature
check program SSD-Temp with path "/etc/monit/scripts/sdatemp.sh"
    every 5 cycles
    if status > 35 then alert
    group health

#SMART Overall Health
check program SSD-Health with path "/etc/monit/scripts/sdahealth.sh"
    every 120 cycles
    if status != 0 then alert
    group health
This code will help monitor hard disk SMART health with Monit. You may change the names SSD-Temp and SSD-Health to something else. You may also customize the frequency of the checks (X cycles). If the hard drive SMART status fails or if the hard drive temperature gets above 35, Monit will send you an email alert. An example of a Monit alert is shown below.
I looked at the average temperature for my SSD and added 10 degrees C to it to set the target of 35 degrees C. This is well below its maximum temperature rating of 70 degrees C. My HTPC-NAS combo setup does not generate much heat. Don't be alarmed if your temperature is much higher than the numbers listed here.
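The "average plus a margin" rule is easy to reproduce for your own drive. The following is a minimal sketch using integer arithmetic; the alert_threshold function name is hypothetical, and the sample readings in the usage line are made up for illustration, not measurements from this server.

```shell
#!/bin/sh
# Hypothetical helper: average the integer readings given as arguments
# (rounded down) and add a safety margin.
# Usage: alert_threshold MARGIN READING [READING...]
alert_threshold() {
    margin=$1
    shift
    sum=0
    count=0
    for reading in "$@"; do
        sum=$((sum + reading))
        count=$((count + 1))
    done
    [ "$count" -gt 0 ] || return 1   # need at least one reading
    echo $((sum / count + margin))
}

# Usage (commented out so the snippet has no side effects when sourced):
# alert_threshold 10 24 25 26
```

With a margin of 10 and readings that average 25, the result is 35, which matches the threshold used in the configuration above.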
Test and Reload Monit
Whenever you make changes, test the Monit configuration before reloading it:
sudo monit -t
You should see the following message: Control File Syntax OK. Then, check to see if Monit is already running using the following command:
sudo /etc/init.d/monit status
If Monit is running, reload Monit configurations using the following command:
sudo /etc/init.d/monit reload
If Monit is not running, then start it using the sudo monit command instead. The whole sequence of commands for testing and reloading Monit is shown in the picture below.
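The test-then-reload sequence can be wrapped so that a configuration with syntax errors is never reloaded. This is a minimal sketch, assuming Monit's own "monit -t" and "monit reload" commands are used in place of the init script; the check_and_reload function and the MONIT variable are hypothetical names, and the function needs to be run with root privileges on a real system.

```shell
#!/bin/sh
# Hypothetical wrapper: reload Monit only if the control file passes the
# syntax check. MONIT can be overridden to point at a different binary.
MONIT="${MONIT:-monit}"

check_and_reload() {
    if "$MONIT" -t; then
        "$MONIT" reload
    else
        echo "Monit configuration has errors; not reloading" >&2
        return 1
    fi
}

# Usage (commented out so the snippet has no side effects when sourced):
# check_and_reload
```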
Now, fire up your web browser and visit one of the following URLs depending on how your Monit is configured (be sure to use the correct port number):
- http://localhost:2812
- http://IPADDRESS:2812 (local network IP)
- http://domain.com:2812 (if you have domain name pointing to your server)
You should see the hard drive health status and temperature listed in your Monit WebUI as shown in the picture below (see SSD-Temp and SSD-Health). The SMART health monitoring script listed above will exit with status "0" if the health status is "PASSED" and "1" if a SMART hard drive failure is detected.
That is it for hard drive SMART health and temperature monitoring with Monit. As you can see, Monit allows for automatic system health monitoring, which can be a big help for administrators. The Monit Wiki page has several examples. More home server specific Monit examples will follow, so keep checking back.