Posts Tagged ‘Monitoring’

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020


I'm part of a  SysAdmin Team that is partially doing some minor Zabbix imrovements on a custom corporate installed Zabbix in an ongoing project to substitute the previous HP OpenView monitoring for a bunch of Legacy Linux hosts.
As one of the necessery checks to have is regarding system Hardware, the task was to invent some simplistic way to monitor hardware with the Zabbix Monitoring tool.  Monitoring Bare Metal servers hardware of HP / Dell / Fujituse etc. servers  in Linux usually is done with a third party software provided by the Hardware vendor. But as this requires an additional services to run and sometimes is not desired. It was interesting to find out some alternative Linux native ways to do the System hardware monitoring.
Monitoring statistics from the system hardware components can be obtained directly from the server components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent  Platform Management remote board article).
With ipmi
 hardware health info could be received straight from the ILO / IDRAC / HPMI of the server. However as often the Admin-Lan of the server is in a seperate DMZ secured network and available via only a certain set of routed IPs, ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

The tools to use are perhaps many but I know of two which gives you most of the information you ever need to have a prelimitary hardware damage warning system before the crash, these are:

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Disk monitoring is handled by a special service the package provides called smartd that does query the Hard Drives periodically aiming to find a warning signs of hardware failures.
The downside of smartd use is that it implies a little bit of extra load on Hard Drive read / writes and if misconfigured could reduce the the Hard disk life time.

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke,

Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       –       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       –       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       –       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       –       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      –       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       –       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      –       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       –       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       –       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       –       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       –       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       –       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      –       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       –       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       –       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       –       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

2. hddtemp

Usually if smartd is used it is useful to also use hddtemp which relies on smartd data.
 The hddtemp program monitors and reports the temperature of PATA, SATA
 or SCSI hard drives by reading Self-Monitoring Analysis and Reporting
 Technology (S.M.A.R.T.)
information on drives that support this feature.

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

3. lm-sensors / i2c-tools 

 Lm-sensors is a hardware health monitoring package for Linux. It allows you
 to access information from temperature, voltage, and fan speed sensors.
was historically bundled in the same package as lm_sensors but has been seperated cause not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command

linux:~# sensors
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)


On CentOS Linux useful tool is also  lm_sensors-sensord.x86_64 – A Daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also the psensors-server (an HTTP server providing JSON Web service which can be used by GTK+ Application to remotely monitor sensors) useful for developers


If you have a Xserver installed on the Server accessed with Xclient or via VNC though quite rare,
You can use xsensors or Psensora GTK+ (Widget Toolkit for creating Graphical User Interface) application software.

With this 3 tools it is pretty easy to script one liners and use the Zabbix UserParameters functionality to send hardware report data to a Company's Zabbix Sserver, though Zabbix has already some templates to do so in my case, I couldn't import this templates cause I don't have Zabbix Super-Admin credentials, thus to work around that a sample work around is use script to monitor for higher and critical considered temperature.
Here is a tiny sample script I came up in 1 min time it can be used to used as 1 liner UserParameter and built upon something more complex.

SENSORS_HIGH=`sensors | awk '{ print $6 }'| grep '^+' | uniq`;
SENSORS_CRIT=`sensors | awk '{ print $9 }'| grep '^+' | uniq`; ;SENSORS_STAT=`sensors|grep -E 'Core\s' | awk '{ print $1" "$2" "$3 }' | grep "$SENSORS_HIGH|$SENSORS_CRIT"`;
if [ ! -z $SENSORS_STAT ]; then
echo 'Temperature HIGH';
echo 'Sensors OK';

Of course there is much more sophisticated stuff to use for monitoring out there

Below script can be easily adapted and use on other Monitoring Platforms such as Nagios / Munin / Cacti / Icinga and there are plenty of paid solutions, but for anyone that wants to develop something from scratch just like me I hope this
article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

Debian Linux: Installing and monitoring servers with Icanga (Nagios fork soft)

Monday, June 3rd, 2013


There is plenty of software for monitoring how server performs and whether servers are correctly up and running. There is probably no Debian Linux admin who didn't already worked or at least tried Nagios and Mointor to monitor and notify whether server is unreachable or how server services operate. Nagios and Munin are play well together to prevent possible upcoming problems with Web / Db / E-mail services or get notify whether they are completely inaccessible. One similar "next-generation" and less known software is Icanga.
The reason, why to use Icinga  instead of Nagios is  more features a list of what does Icinga supports more than Nagios is on its site here
I recently heard of it and decided to try it myself. To try Icanga I followed Icanga's install tutorial on Wiki.Icanga.Org here
In Debian Wheezy, Icinga is already part of official repositories so installing it like in Squeeze and Lenny does not require use of external Debian BackPorts repositories.

1. Install Icinga pre-requirement packages

debian:# apt-get --yes install php5 php5-cli php-pear php5-xmlrpc php5-xsl php5-gd php5-ldap php5-mysql

2. Install Icanga-web package

debian:~# apt-get --yes install icinga-web

Here you will be prompted a number of times to answer few dialog questions important for security, as well as fill in MySQL server root user / password as well as SQL password that will icinga_web mySQL user use.





Setting up icinga-idoutils (1.7.1-6) …
dbconfig-common: writing config to /etc/dbconfig-common/icinga-idoutils.conf
granting access to database icinga for icinga-idoutils@localhost: success.
verifying access for icinga-idoutils@localhost: success.
creating database icinga: success.
verifying database icinga exists: success.
populating database via sql…  done.
dbconfig-common: flushing administrative password
Setting up icinga-web (1.7.1+dfsg2-6) …
dbconfig-common: writing config to /etc/dbconfig-common/icinga-web.conf

Creating config file /etc/dbconfig-common/icinga-web.conf with new version
granting access to database icinga_web for icinga_web@localhost: success.
verifying access for icinga_web@localhost: success.
creating database icinga_web: success.
verifying database icinga_web exists: success.
populating database via sql…  done.
dbconfig-common: flushing administrative password

Creating config file /etc/icinga-web/conf.d/database-web.xml with new version
database config successful: /etc/icinga-web/conf.d/database-web.xml

Creating config file /etc/icinga-web/conf.d/database-ido.xml with new version
database config successful: /etc/icinga-web/conf.d/database-ido.xml
enabling config for webserver apache2…
Enabling module rewrite.
To activate the new configuration, you need to run:
  service apache2 restart
`/etc/apache2/conf.d/icinga-web.conf' -> `../../icinga-web/apache2.conf'
[ ok ] Reloading web server config: apache2 not running.
root password updates successfully!
Basedir: /usr Cachedir: /var/cache/icinga-web
Cache already purged!

3. Enable Apache mod_rewrite

debian:~# a2enmod rewrite
debian:~# /etc/init.d/apache2 restart

4. Icinga documentation files

Some key hints on Enabling some more nice Icinga features are mentioned in Icinga README files, check out, all docs files included with Icinga separate packs are into:

debian:~# ls -ld *icinga*/
drwxr-xr-x 3 root root 4096 Jun  3 10:48 icinga-common/
drwxr-xr-x 3 root root 4096 Jun  3 10:48 icinga-core/
drwxr-xr-x 3 root root 4096 Jun  3 10:48 icinga-idoutils/
drwxr-xr-x 2 root root 4096 Jun  3 10:48 icinga-web/

debian:~# less /usr/share/doc/icinga-web/README.Debian debian:~# less /usr/share/doc/icinga-idoutils/README.Debian

5. Configuring Icinga

Icinga configurations are separated in two directories:

debian:~# ls -ld *icinga*

drwxr-xr-x 4 root root 4096 Jun  3 10:50 icinga
drwxr-xr-x 3 root root 4096 Jun  3 11:07 icinga-web


etc/icinga/ – (contains configurations files for on exact icinga backend server behavior)

/etc/icinga-web – (contains all kind of Icinga Apache configurations)
Main configuration worthy to look in after install is /etc/icinga/icinga.cfg.

6. Accessing newly installed Icinga via web

To access just installed Icinga, open in browser URL – htp://localhost/icinga-web

icinga web login screen in browser debian gnu linux

logged in inside Icinga / Icinga web view and control frontend

7. Monitoring host services with Icinga (NRPE)

As fork of Nagios. Icinga has similar modular architecture and uses number of external plugins to Monitor external host services list of existing plugins is on Icinga's wiki here.
Just like Nagios Icinga supports NRPE protocol (Nagios Remote Plugin Executor). To setup NRPE, nrpe plugin from nagios is used (nagios-nrpe-server). 

To install NRPE on any of the nodes to be tracked;
debian: ~# apt-get install –yes nagios-nrpe-server

 Then to configure NRPE edit /etc/nagios/nrpe_local.cfg


Once NRPE is supported in Icinga, you can install on Windows or Linux hosts NRPE clients like in Nagios to report on server processes state and easily monitor if server disk space / load or service is in critical state.

How to disable ACPI (power saving) support in FreeBSD / Disable acpi on BSD kernel boot time

Tuesday, May 15th, 2012

FreeBSD disable ACPI how ACPI Basic works basic diagram

On FreeBSD the default kernel is compiled to support ACPI. Most of the modern PCs has already embedded support for ACPI power saving instructions.
Therefore a default installed FreeBSD is trying to take advantage of this at cases and is trying to save energy.
This is not too useful on servers, because saving energy could have at times a bad impact on server performance if the server is heavy loaded at times and not so loaded at other times of the day.

Besides that on servers saving energy shouldn't be the main motivator but server stability and productivity is. Therefore in my personal view on FreeBSD used on servers it is better to disable complete the ACPI in order to disable CPU fan control to change rotation speeds all the time from low to high rotation cycles and vice versa at times of low / high server load.

Another benefit of removing the ACPI support on a server is this would probably increase the CPU fan life span and possibly prevent the CPU to be severely heated at times.

Moreover, some piece of hardware might have troubles in properly supporting ACPI specifications and thus ACPI could be a reason for unexpected machine hang ups.

With all said I would recommend to anyone willing to use BSD for a server to disable the ACPI (Advanced Configuration and Power Interface), just like I did.

Here is how;

1. Quick review on how ACPI is handled on FreeBSD

acpi support is being handled on FreeBSD by a number of loadable kernel modules, here is a complete list of all the kernel modules dealins with acpi:

freebsd# cd /boot
freebsd# find . -iname '*acpi*.ko'

By default on FreeBSD, if hardware has some support for ACPI the acpi gets activated by acpi.ko kernel module. The specific type of vendors specific ACPI like IBM, ASUS, Fujitsu are controlled by the respective kernel module from the list …

Hence, to control if ACPI is loaded or not on a FreeBSD system with no need to reboot one can use kldload, kldunload module management BSD cmds.

a) Check if acpi is loaded on a BSD

freebsd# kldstatkldstat | grep -i acpi
9 1 0xc9260000 57000 acpi.ko

b) unload kernel enabled ACPI support

freebsd# kldunload acpi

c) Load acpi support (not the case with me but someone might need it, if for instance BSD is running on laptop)

freebsd# kldload acpi

2. Disabling ACPI to load on bootup on BSD

a) In /boot/loader.conf add the following variables:


b) in /boot/device.hints add:


c) in /boot/defaults/loader.conf make sure:

### ACPI settings ##########################################
acpi_dsdt_load="NO" # DSDT Overriding
acpi_dsdt_type="acpi_dsdt" # Don't change this
# Override DSDT in BIOS by this file
acpi_video_load="NO" # Load the ACPI video extension driver

d) disable ACPI thermal monitoring

It is generally a good idea to disable the ACPI thermal monitoring, as many machines hardware does not support it.

To do so in /boot/loader.conf add


If you want to learn more on on how ACPI is being handled on BDSs check out:

freebsd# man acpi

Other alternative method to permanently wipe out ACPI support is by not compiling ACPI support in the kernel.
If that's the case in /usr/obj/usr/src/sys/GENERIC make sure device acpi is commented, e.g.:

##device acpi


How to find out all programs bandwidth use with (nethogs) top like utility on Linux

Friday, September 30th, 2011

Just run across across a super nice top like, program for system administrators, its called nethogs and is definitely entering my “l337” admin outfit next to tools like iftop, nettop, ettercap, darkstat htop, iotop etc.

nethogs is ultra easy to use, to get immediately in console statistics about running processes UPLOAD and DOWNLOAD bandwidth consumption just run it:

linux:~# nethogs

Nethogs screenshot on Linux Server with Nginx
Nethogs running on Debian GNU/Linux serving static web content with Nginx

If you need to check what program is using what amount of network bandwidth, you will definitely love this tool. Having information of bandwidth consumption is also viewable partially with iftop, however iftop is unable to track the bandwidth consumption to each process using the network thus it seems nethogs is unique at what it does.

Nethogs supports IPv4 and IPv6 as well as supports network traffic over ppp. The tool is available via package repositories for Debian GNU/Lenny 5 and Debian Squeeze 6.

To install Nethogs on CentOS and Fedora distributions, you will have to install it from source. On CentOS 5.7, latest nethogs which as of time of writting this article is 0.8.0 compiles and installs fine with make && make install commands.

In the manner of thoughts of network bandwidth monitoring, another very handy tool to add extra understanding on what kind of traffic is crossing over a Linux server is jnettop
jnettop shows which hosts/ports is taking up the most network traffic.
It is available for install via apt in Debian 5/6).

Here is a screenshot on jnettop in action:

Jnettop check network traffic in console

To install jnettop on latest Fedoras / CentOS / Slackware Linux it has to be download and compiled from source via jnettop’s official wiki page
I’ve tested jnettop install from source on CentOS release 5.7 and it seems to compile just fine using the usual compile commands:

[root@prizebg jnettop-0.13.0]# ./configure
[root@prizebg jnettop-0.13.0]# make
[root@prizebg jnettop-0.13.0]# make install

If you need to have an idea on the network traffic passing by your Linux server distringuished by tcp/udp/icmp network protocols and services like ssh / ftp / apache, then you will definitely want to take a look at nettop (if of course not familiar with it yet).
Nettop is not provided as a deb package in Debian and Ubuntu, where it is included as rpm for CentOS and presumably Fedora?
Here is a screenshot on nettop network utility in action:

Nettop server traffic division by protocol screenshot
FreeBSD users should be happy to find out that jnettop and nettop are part of the ports tree and the two can be installed straight, however nethogs would not work on FreeBSD, I searched for a utility capable of what Nethogs can, but couldn’t find such.
It seems the only way on FreeBSD to track bandwidth back and from originating process is using a combination of iftop and sockstat utilities. Probably there are other tools which people use to track network traffic to the processes running on a hos and do general network monitoringt, if anyone knows some good tools, please share with me.

Monitoring multi core / (multiple CPUs) servers with top, tload and on Linux

Thursday, March 17th, 2011

The default GNU / Linux top command does allow to see statistics on servers and systems with multiple CPUs.
This is quite beneficial especially on Linux systems which are not equipped with htop which does show statistics to the multiple-core system load.

To examine the multiple CPUs statistics with the default top command available on every Linux system and part of the procps/proc file system utilities

1. Start top:

linux:~# top

When the top system load statistics screen starts up refreshing,

2. press simply 1
You will notice all your system cpus to show up in the top head:

8 cpu top screen statistics on Linux

As I have started talking about top, a very useful way to use top to track processes which are causing a system high loads is:

linux:~# top -b -i

This command will run top in batch mode interactively and will show you statistics about the most crucial processes which does cause a server load, look over the output and you will get an idea about what is causing you server troubles.
Moreover if you’re a Linux console freak as me you will also probably want to take a look at tload

tload command is a part of the procps – /proc file system utilities and as you can read in the tload manual tload – graphic representation of system load average

Here is a picture to give you an idea on the console output of tload :

tload console/terminal system load statistics on Linux screenshot

Another tool that you might find very usefel is slabtop it’s again a part of the procps linux package.
slabtop – displays a listing of the top caches sorted by one of the listed sort criteria., in most of the cases the slabtop kernel cache monitoring tool won’t be necessary for the regular administrator, however on some servers it might help up to the administrator to resolve performance issues which are caused by the kernel as a bottleneck.
slabtop is also used as a tool by kernel developers to write and debug the Linux kernel.