Archive for the ‘Monitoring’ Category

Zabbix rkhunter monitoring check if rootkits trojans and viruses or suspicious OS activities are detected

Wednesday, December 8th, 2021

monitor-rkhunter-with-zabbix-zabbix-rkhunter-check-logo

If you're using rkhunter to monitor for malicious activities, a binary changes, rootkits, viruses, malware, suspicious stuff and other famous security breach possible or actual issues, perhaps you have configured your machines to report to some Email.
But what if you want to have a scheduled rkhunter running on the machine and you don't want to count too much on email alerting (especially because email alerting) makes possible for emails to be tracked by sysadmin pretty late?

We have been in those situation and in this case me and my dear colleague Georgi Stoyanov developed a small rkhunter Zabbix userparameter check to track and Alert if any traces of "Warning"''s are mateched in the traditional rkhunter log file /var/log/rkhunter/rkhunter.log

To set it up and use it is pretty use you will need to have a recent version of zabbix-agent installed on the machine and connected to a Zabbix server, in my case this is:

[root@centos ~]# rpm -qa |grep -i zabbix-agent
zabbix-agent-4.0.7-1.el7.x86_64

 placed inside /etc/zabbix/zabbix_agentd.d/userparameter_rkhunter_warning_check.conf

[root@centos /etc/zabbix/zabbix_agentd.d ]# cat userparameter_rkhunter_warning_check.conf
# userparameter script to check if any Warning is inside /var/log/rkhunter/rkhunter.log and if found to trigger Zabbix alert
UserParameter=rkhunter.warning, (TODAY=$(date |awk '{ print $1" "$2" "$3 }'); if [ $(cat /var/log/rkhunter/rkhunter.log | awk “/$TODAY/,EOF” | /bin/grep -i ‘\[ Warning \]’ | /usr/bin/wc -l) != ‘0’ ]; then echo 1; else echo 0; fi)
UserParameter=rkhunter.suspected,(/bin/grep -i 'Suspect files: ' /var/log/rkhunter/rkhunter.log|tail -n 1| awk '{ print $4 }')
UserParameter=rkhunter.rootkits,(/bin/grep -i 'Possible rootkits: ' /var/log/rkhunter/rkhunter.log|tail -n 1| awk '{ print $4 }')


2. Prepare Rkhunter Template, Triggers and Items


In Zabbix Server that you access from web control interface, you will have to prepare a new template called lets say Rkhunter with the necessery Triggers and Items


2.1 Create Rkhunter Items
 

On Zabbix Server side, uou will have to configure 3 Items for the 3 configured userparameter above script keys, like so:

rkhunter-items-zabbix-screenshot.png

  • rkhunter.suspected Item configuration


rkhunter-suspected-files

  • rkhunter.warning Zabbix Item config

rkhunter-warning-found-check-zabbix

  • rkhunter.rootkits Zabbix Item config

     

     

    rkhunter-zabbix-possible-rootkits-item1

    2.2 Create Triggers
     

You need to have an overall of 3 triggers like in below shot:

zabbix-rkhunter-all-triggers-screenshot
 

  • rkhunter.rootkits Trigger config

rkhunter-rootkits-trigger-zabbix1

  • rkhunter.suspected Trigger cfg

rkhunter-suspected-trigger-zabbix1

  • rkhunter warning Trigger cfg

rkhunter-warning-trigger1

3. Reload zabbix-agent and test the keys


It is necessery to reload zabbix agent for the new userparameter to start to be sent to remote zabbix server (through a proxy if you have one configured).

[root@centos ~]# systemctl restart zabbix-agent


To make the zabbix-agent send the keys to the server you can use zabbix_sender to have the test tool you will have to have installed (zabbix-sender) on the server.

To trigger a manualTest if you happen to have some problems with the key which shouldn''t be the case you can sent a value to the respectve key with below command:

[root@centos ~ ]# zabbix_sender -vv -c "/etc/zabbix/zabbix_agentd.conf" -k "khunter.warning" -o "1"


Check on Zabbix Server the sent value is received, for any oddities as usual check what is inside  /var/log/zabbix/zabbix_agentd.log for any errors or warnings.

Install and enable Sysstats IO / DIsk / CPU / Network monitoring console suite on Redhat 8.3, Few sar useful command examples

Tuesday, September 28th, 2021

linux-sysstat-monitoring-logo

 

Why to monitoring CPU, Memory, Hard Disk, Network usage etc. with sysstats tools?
 

Using system monitoring tools such as Zabbix, Nagios Monit is a good approach, however sometimes due to zabbix server interruptions you might not be able to track certain aspects of system performance on time. Thus it is always a good idea to 
Gain more insights on system peroformance from command line. Of course there is cmd tools such as iostat and top, free, vnstat that provides plenty of useful info on system performance issues or bottlenecks. However from my experience to have a better historical data that is systimized and all the time accessible from console it is a great thing to have sysstat package at place. Since many years mostly on every server I administer, I've been using sysstats to monitor what is going on servers over a short time frames and I'm quite happy with it. In current company we're using Redhats and CentOS-es and I had to install sysstats on Redhat 8.3. I've earlier done it multiple times on Debian / Ubuntu Linux and while I've faced on some .deb distributions complications of making sysstat collect statistics I've come with an article on Howto fix sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux
 

Sysstat contains the following tools related to collecting I/O and CPU statistics:
iostat
Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
mpstat
Displays more in-depth CPU statistics.
Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:
sadc
Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
sar
Producing reports from the files created by sadc, sar reports can be generated interactively or written to a file for more intensive analysis.

My experience with CentOS 7 and Fedora to install sysstat it was pretty straight forward, I just had to install it via yum install sysstat wait for some time and use sar (System Activity Reporter) tool to report collected system activity info stats over time.
Unfortunately it seems on RedHat 8.3 as well as on CentOS 8.XX instaling sysstats does not work out of the box.

To complete a successful installation of it on RHEL 8.3, I had to:

[root@server ~]# yum install -y sysstat


To make sysstat enabled on the system and make it run, I've enabled it in sysstat

[root@server ~]# systemctl enable sysstat


Running immediately sar command, I've faced the shitty error:


Cannot open /var/log/sysstat/sa18:
No such file or directory. Please check if data collecting is enabled”

 

Once installed I've waited for about 5 minutes hoping, that somehow automatically sysstat would manage it but it didn't.

To solve it, I've had to create additionally file /etc/cron.d/sysstat (weirdly RPM's post install instructions does not tell it to automatically create it)

[root@server ~]# vim /etc/cron.d/sysstat

# run system activity accounting tool every 10 minutes
0 * * * * root /usr/lib64/sa/sa1 60 59 &
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A &

 

  • /usr/local/lib/sa1 is a shell script that we can use for scheduling cron which will create daily binary log file.
  • /usr/local/lib/sa2 is a shell script will change binary log file to human-readable form.

 

[root@server ~]# chmod 600 /etc/cron.d/sysstat

[root@server ~]# systemctl restart sysstat


In a while if sysstat is working correctly you should get produced its data history logs inside /var/log/sa

[root@server ~]# ls -al /var/log/sa 


Note that the standard sysstat history files on Debian and other modern .deb based distros such as Debian 10 (in  y.2021) is stored under /var/log/sysstat

Here is few useful uses of sysstat cmds


1. Check with sysstat machine history SWAP and RAM Memory use


To lets say check last 10 minutes SWAP memory use:

[hipo@server yum.repos.d] $ sar -W  |last -n 10
 

Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

12:00:00 AM  pswpin/s pswpout/s
12:00:01 AM      0.00      0.00
12:01:01 AM      0.00      0.00
12:02:01 AM      0.00      0.00
12:03:01 AM      0.00      0.00
12:04:01 AM      0.00      0.00
12:05:01 AM      0.00      0.00
12:06:01 AM      0.00      0.00

[root@ccnrlb01 ~]# sar -r | tail -n 10
14:00:01        93008   1788832     95.06         0   1357700    725740      9.02    795168    683484        32
14:10:01        78756   1803084     95.81         0   1358780    725740      9.02    827660    652248        16
14:20:01        92844   1788996     95.07         0   1344332    725740      9.02    813912    651620        28
14:30:01        92408   1789432     95.09         0   1344612    725740      9.02    816392    649544        24
14:40:01        91740   1790100     95.12         0   1344876    725740      9.02    816948    649436        36
14:50:01        91688   1790152     95.13         0   1345144    725740      9.02    817136    649448        36
15:00:02        91544   1790296     95.14         0   1345448    725740      9.02    817472    649448        36
15:10:01        91108   1790732     95.16         0   1345724    725740      9.02    817732    649340        36
15:20:01        90844   1790996     95.17         0   1346000    725740      9.02    818016    649332        28
Average:        93473   1788367     95.03         0   1369583    725074      9.02    800965    671266        29

 

2. Check system load? Are my processes waiting too long to run on the CPU?

[root@server ~ ]# sar -q |head -n 10
Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

12:00:00 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
12:00:01 AM         0       272      0.00      0.02      0.00         0
12:01:01 AM         1       271      0.00      0.02      0.00         0
12:02:01 AM         0       268      0.00      0.01      0.00         0
12:03:01 AM         0       268      0.00      0.00      0.00         0
12:04:01 AM         1       271      0.00      0.00      0.00         0
12:05:01 AM         1       271      0.00      0.00      0.00         0
12:06:01 AM         1       265      0.00      0.00      0.00         0


3. Show various CPU statistics per CPU use
 

On a multiprocessor, multi core server sometimes for scripting it is useful to fetch processor per use historic data, 
this can be attained with:

 

[hipo@server ~ ] $ mpstat -P ALL
Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

06:08:38 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:08:38 PM  all    0.17    0.02    0.25    0.00    0.05    0.02    0.00    0.00    0.00   99.49
06:08:38 PM    0    0.22    0.02    0.28    0.00    0.06    0.03    0.00    0.00    0.00   99.39
06:08:38 PM    1    0.28    0.02    0.36    0.00    0.08    0.02    0.00    0.00    0.00   99.23
06:08:38 PM    2    0.27    0.02    0.31    0.00    0.06    0.01    0.00    0.00    0.00   99.33
06:08:38 PM    3    0.15    0.02    0.22    0.00    0.03    0.01    0.00    0.00    0.00   99.57
06:08:38 PM    4    0.13    0.02    0.20    0.01    0.03    0.01    0.00    0.00    0.00   99.60
06:08:38 PM    5    0.14    0.02    0.27    0.00    0.04    0.06    0.01    0.00    0.00   99.47
06:08:38 PM    6    0.10    0.02    0.17    0.00    0.04    0.02    0.00    0.00    0.00   99.65
06:08:38 PM    7    0.09    0.02    0.15    0.00    0.02    0.01    0.00    0.00    0.00   99.70


 

sar-sysstat-cpu-statistics-screenshot

Monitor processes and threads currently being managed by the Linux kernel.

[hipo@server ~ ] $ pidstat

pidstat-various-random-process-statistics

[hipo@server ~ ] $ pidstat -d 2


pidstat-show-processes-with-most-io-activities-linux-screenshot

This report tells us that there is few processes with heave I/O use Filesystem system journalling daemon jbd2, apache, mysqld and supervise, in 3rd column you see their respective PID IDs.

To show threads used inside a process (like if you press SHIFT + H) inside Linux top command:

[hipo@server ~ ] $ pidstat -t -p 10765 1 3

Linux 4.19.0-14-amd64 (server)     28.09.2021     _x86_64_    (10 CPU)

21:41:22      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
21:41:23      108     10765         –    1,98    0,99    0,00    0,00    2,97     1  mysqld
21:41:23      108         –     10765    0,00    0,00    0,00    0,00    0,00     1  |__mysqld
21:41:23      108         –     10768    0,00    0,00    0,00    0,00    0,00     0  |__mysqld
21:41:23      108         –     10771    0,00    0,00    0,00    0,00    0,00     5  |__mysqld
21:41:23      108         –     10784    0,00    0,00    0,00    0,00    0,00     7  |__mysqld
21:41:23      108         –     10785    0,00    0,00    0,00    0,00    0,00     6  |__mysqld
21:41:23      108         –     10786    0,00    0,00    0,00    0,00    0,00     2  |__mysqld

10765 – is the Process ID whose threads you would like to list

With pidstat, you can further monitor processes for memory leaks with:

[hipo@server ~ ] $ pidstat -r 2

 

4. Report paging statistics for some old period

 

[root@server ~ ]# sar -B -f /var/log/sa/sa27 |head -n 10
Linux 4.18.0-240.el8.x86_64 (server)       09/27/2021      _x86_64_        (8 CPU)

15:42:26     LINUX RESTART      (8 CPU)

15:55:30     LINUX RESTART      (8 CPU)

04:00:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
04:01:01 PM      0.00     14.47    629.17      0.00    502.53      0.00      0.00      0.00      0.00
04:02:01 PM      0.00     13.07    553.75      0.00    419.98      0.00      0.00      0.00      0.00
04:03:01 PM      0.00     11.67    548.13      0.00    411.80      0.00      0.00      0.00      0.00

 

5.  Monitor Received RX and Transmitted TX network traffic perl Network interface real time
 

To print out Received and Send traffic per network interface 4 times in a raw

sar-sysstats-network-traffic-statistics-screenshot
 

[hipo@server ~ ] $ sar -n DEV 1 4


To continusly monitor all network interfaces I/O traffic

[hipo@server ~ ] $ sar -n DEV 1


To only monitor a certain network interface lets say loopback interface (127.0.0.1) received / transmitted bytes

[hipo@server yum.repos.d] $  sar -n DEV 1 2|grep -i lo
06:29:53 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
06:29:54 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00


6. Monitor block devices use
 

To check block devices use 3 times in a raw
 

[hipo@server yum.repos.d] $ sar -d 1 3


sar-sysstats-blockdevice-statistics-screenshot
 

7. Output server monitoring data in CSV database structured format


For preparing a nice graphs with Excel from CSV strucuted file format, you can dump the collected data as so:

 [root@server yum.repos.d]# sadf -d /var/log/sa/sa27 — -n DEV | grep -v lo|head -n 10
server-name-fqdn;-1;2021-09-27 13:42:26 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;-1;2021-09-27 13:55:30 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth1;19.42;16.12;1.94;1.68;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth0;7.18;9.65;0.55;0.78;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth2;5.65;5.13;0.42;0.39;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth1;18.90;15.55;1.89;1.60;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth0;7.15;9.63;0.55;0.74;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth2;5.67;5.15;0.42;0.39;0.00;0.00;0.00;0.00

To graph the output data you can use Excel / LibreOffice's Excel equivalent Calc or if you need to dump a CSV sar output and generate it on the fly from a script  use gnuplot 


What we've learned?


How to install and enable on cron sysstats on Redhat and CentOS 8 Linux ? 
How to continuously monitor CPU / Disk and Network, block devices, paging use and processes and threads used by the kernel per process ?  
As well as how to export previously collected data to CSV to import to database or for later use inrder to generate graphic presentation of data.
Cheers ! 🙂

 

Fix Zabbix selinux caused permission issues on CentOS 7 Linux / cannot set resource limit: [13] Permission denied error solution

Tuesday, July 6th, 2021

zabbix-selinux-logo-fix-zabbix-permission-issues-when-running-on-ceontos-linux-change-selinux-to-permissive-howto.

If you have to install Zabbix client that has to communicate towards Zabbix server via a Zabbix Proxy you might be unpleasently surprised that it cannot cannot be start if the selinux mode is set to Enforcing.
Error message like on below screenshot will be displayed when starting proxy client with systemctl.

zabbix-proxy-cannot-be-started-due-to-selinux-permissions

In the zabbix logs you will see error  messages such as:
 

"cannot set resource limit: [13] Permission denied, CentOS 7"

 

29085:20160730:062959.263 Starting Zabbix Agent [Test host]. Zabbix 3.0.4 (revision 61185).
29085:20160730:062959.263 **** Enabled features ****
29085:20160730:062959.263 IPv6 support: YES
29085:20160730:062959.263 TLS support: YES
29085:20160730:062959.263 **************************
29085:20160730:062959.263 using configuration file: /etc/zabbix/zabbix_agentd.conf
29085:20160730:062959.263 cannot set resource limit: [13] Permission denied
29085:20160730:062959.263 cannot disable core dump, exiting…

 

Next step to do is to check whether zabbix is listed in selinux's enabled modules to do so run:
 

[root@centos ~ ]# semodules -l

…..
vhostmd    1.1.0
virt    1.5.0
vlock    1.2.0
vmtools    1.0.0
vmware    2.7.0
vnstatd    1.1.0
vpn    1.16.0
w3c    1.1.0
watchdog    1.8.0
wdmd    1.1.0
webadm    1.2.0
webalizer    1.13.0
wine    1.11.0
wireshark    2.4.0
xen    1.13.0
xguest    1.2.0
xserver    3.9.4
zabbix    1.6.0
zarafa    1.2.0
zebra    1.13.0
zoneminder    1.0.0
zosremote    1.2.0

 

[root@centos ~ ]# sestatus
# sestatusSELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

To get exact zabbix IDs that needs to be added as permissive for Selinux you can use ps -eZ like so:

[root@centos ~ ]# ps -eZ |grep -i zabbix
system_u:system_r:zabbix_agent_t:s0 1149 ?     00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1150 ?     00:04:28 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1151 ?     00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1152 ?     00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1153 ?     00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1154 ?     02:21:46 zabbix_agentd

As you can see zabbix is enabled and hence selinux enforcing mode is preventing zabbix client / server to operate and communicate normally, hence to make it work we need to change zabbix agent and zabbix proxy to permissive mode.

Setting selinux for zabbix agent and zabbix proxy to permissive mode

If you don't have them installed you might neet the setroubleshoot setools, setools-console and policycoreutils-python rpms packs (if you have them installed skip this step).

[root@centos ~ ]# yum install setroubleshoot.x86_64 setools.x86_64 setools-console.x86_64 policycoreutils-python.x86_64

Then to add zabbix service to become permissive either run

[root@centos ~ ]# semanage permissive –add zabbix_t

[root@centos ~ ]# semanage permissive -a zabbix_agent_t


In some cases you might also need in case if just adding the permissive for zabbix_agent_t try also :

setsebool -P zabbix_can_network=1

Next try to start zabbox-proxy and zabbix-agent systemd services 

[root@centos ~ ]# systemctl start zabbix-proxy.service

[root@centos ~ ]# systemctl start zabbix-agent.service

Hopefully all should report fine with the service checking the status should show you something like:

[root@centos ~ ]# systemctl status zabbix-agent
● zabbix-agent.service – Zabbix Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-06-24 07:47:42 CEST; 1 weeks 5 days ago
 Main PID: 1149 (zabbix_agentd)
   CGroup: /system.slice/zabbix-agent.service
           ├─1149 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
           ├─1150 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─1151 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─1152 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
           ├─1153 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
           └─1154 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]

Check the Logs finally to make sure all is fine with zabbix being allowed by selinux.

[root@centos ~ ]# grep zabbix_proxy /var/log/audit/audit.log

[root@centos ~ ]# tail -n 100 /var/log/zabbix/zabbix_agentd.log


If no errors are in and you receive and you can visualize the usual zabbix collected CPU / Memory / Disk etc. values you're good, Enjoy ! 🙂

Add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers

Tuesday, December 8th, 2020

Zabbix-logo-how-to-make-ntpd-time-server-monitoring-article

 

How to add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers?

We needed to set on some servers at my work an elementary check with Zabbix monitoring to check whether servers time is correctly synchronized with ntpd time service as well report if the ntp daemon is correctly running on the machine. For that a userparameter script was developed called userparameter_ntp.conf the script is simplistic and few a lines of bash shell scripting 
stuff is based on gresping information required from ntpq and ntpstat common ntp client commands to get information about the status of time synchronization on the servers.
 

[root@linuxserver ]# ntpstat
synchronised to NTP server (10.80.200.30) at stratum 3
   time correct to within 47 ms
   polling server every 1024 s

 

[root@linuxserver ]# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+timeserver1 10.26.239.41     2 u  319 1024  377   15.864    1.270   0.262
+timeserver2 10.82.239.41     2 u  591 1024  377   16.287   -0.334   1.748
*timeserver3 10.82.239.43     2 u   47 1024  377   15.613   -0.553   0.251
 timeserver4 .INIT.          16 u    – 1024    0    0.000    0.000   0.000


Below is Zabbix UserParameter script that does report us 3 important values we monitor to make sure time server synchronization works as expected the zabbix keys we set are ntp.offset, ntp.sync, ntp.exact in attempt to describe what we're fetching from ntp client:

[root@linuxserver ]# cat /etc/zabbix/zabbix-agent.d/userparameter_ntp.conf

UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }')
#UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'FNR==4{print $9}')
UserParameter=ntp.sync,(/usr/bin/ntpstat | cut -f 1 -d " " | tr -d ' \t\n\r\f')
UserParameter=ntp.exact,(/usr/bin/ntpstat | /usr/bin/awk 'FNR==2{print $5,$6}')

In Zabbix the monitored ntpd parameters set-upped looks like this:

 

ntp_time_synchronization_check-zabbix-screenshot.

 

!Note that in above userparameter example, the commented userparameter script is a just another way to do an ntpd offset returned value which was developed before the more sophisticated with more regular expression checks from the /usr/sbin/ntpd via ntpq, perhaps if you want to extend it you can also use another script to report more verbose information to Zabbix if that is required like ouput from ntpq -c peers command:
 

UserParameter=ntp.verbose,(/usr/sbin/ntpq -c peers)

Of course to make the Zabbix fetch necessery data from monitored hosts, we need to set-up further new Zabbix Template with the respective Trigger and Items.

Below are few screenshots including the triggers used.

ntpd_server-time_synchronization_check-zabbix-screenshot-triggers

  • ntpd.trigger

{NTP:net.udp.service[ntp].last(0)}<1

  • NTP Synchronization trigger

{NTP:ntp.sync.iregexp(unsynchronised)}=1

 

 

As you can see from history we have setup our items to Store history of reported data to Zabbix from parameter script for 90 days and update our monitor check, every 30 seconds from the monitored hosts to which Tempate is applied.

Well that's all folks, time synchronization issues we'll be promptly triggering a new Alarm in Zabbix !

Check server Internet connectivity Speedtest from Linux terminal CLI

Friday, August 7th, 2020

check-server-console-speedtest

If you are a system administrator of a dedicated server and you have no access to Xserver Graphical GNOME / KDE etc. environment and you wonder how you can track the bandwidth connectivity speed of remote system to the internet and you happen to have a modern Linux distribution, here is few ways to do a speedtest.
 

1. Use speedtest-cli command line tool to test connectivity

 


speedtest-cli is a tiny tool written in python, to use it hence you need to have python installed on the server.
It is available both for Redhat Linux distros and Debians / Ubuntus etc. in the list of standard installable packages.

a) Install speedtest-cli on Fedora / CentOS / RHEL
 

On CentOS / RHEL / Scientific Linux lower than ver 8:

 

 

$ sudo yum install python

On CentOS 8 / RHEL 8 user type the following command to install Python 3 or 2:

 

 

$sudo yum install python3
$ sudo yum install python2

 

 

 


On Fedora Linux version 22+

 

 

$ sudo dnf install python
$ sudo dnf install pytho3

 


Once python is at place download speedtest.py or in case if link is not reachable download mirrored version of speedtest.py on www.pc-freak.net here
 

 

 

$ wget -O speedtest-cli https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
$ chmod +x speedtest-cli

 


Then it is time to run script speedtest-screenshot-linux-terminal-console-cli-cmd
To test enabled Bandwidth on the server

 

 

$ python speedtest-cli


b) Install speedtest-cli on Debian

On Latest Debian 10 Buster speedtest is available out of the box in regular .deb repositories, so fetch it with apt
 

 

# apt install –yes speedtest-cli

 


You can give now speedtest-cli a try with –bytes arguments to get speed values in bytes instead of bits or if you want to generate an image with test results in picture just like it will appear if you use speedtest.net inside a gui browser, use the –share option

speedtest-screenshot-linux-terminal-console-cli-cmd-options

 

 

 

2. Getting connectivity results of all defined speedtest test City Locations


Speedtest has a list of servers through which a Upload and Download speed is tested, to run speedtest-cli to test with each and every server and get a better picture on what kind of connectivity to expect from your server towards the closest region capital cities, fetch speedtest-servers.php list and use a small shell loop below is how:

 

 

 

 

 

root@pcfreak:~#  wget http://www.speedtest.net/speedtest-servers.php
–2020-08-07 16:31:34–  http://www.speedtest.net/speedtest-servers.php
Преобразувам www.speedtest.net (www.speedtest.net)… 151.101.2.219, 151.101.66.219, 151.101.130.219, …
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:80… успешно свързване.
HTTP изпратено искане, чакам отговор… 301 Moved Permanently
Адрес: https://www.speedtest.net/speedtest-servers.php [следва]
–2020-08-07 16:31:34–  https://www.speedtest.net/speedtest-servers.php
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 307 Temporary Redirect
Адрес: https://c.speedtest.net/speedtest-servers-static.php [следва]
–2020-08-07 16:31:35–  https://c.speedtest.net/speedtest-servers-static.php
Преобразувам c.speedtest.net (c.speedtest.net)… 151.101.242.219
Connecting to c.speedtest.net (c.speedtest.net)|151.101.242.219|:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 200 OK
Дължина: 211695 (207K) [text/xml]
Saving to: ‘speedtest-servers.php’
speedtest-servers.php                  100%[==========================================================================>] 206,73K  –.-KB/s    in 0,1s
2020-08-07 16:31:35 (1,75 MB/s) – ‘speedtest-servers.php’ saved [211695/211695]

Once file is there with below loop we extract all file defined servers id="" 's 
 

root@pcfreak:~# for i in $(cat speedtest-servers.php | egrep -Eo 'id="[0-9]{4}"' |sed -e 's#id="##' -e 's#"##g'); do speedtest-cli  –server $i; done
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by Telecoms Ltd. (Varna) [38.88 km]: 25.947 ms
Testing download speed……………………………………………………………………..
Download: 57.71 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 93.85 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by GMB Computers (Constanta) [94.03 km]: 80.247 ms
Testing download speed……………………………………………………………………..
Download: 35.86 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 80.15 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…

…..

 


etc.

For better readability you might want to add the ouput to a file or even put it to run periodically on a cron if you have some suspcion that your server Internet dedicated lines dies out to some general locations sometimes.
 

3. Testing UPlink speed with Download some big file from source location


In the past a classical way to test the bandwidth connectivity of your Internet Service Provider was to fetch some big file, Linux guys should remember it was almost a standard to roll a download of Linux kernel source .tar file with some test browser as elinks / lynx / w3c.
speedtest-screenshot-kernel-org-shot1 speedtest-screenshot-kernel-org-shot2
or if those are not at hand test connectivity on remote free shell servers whatever file downloader as wget or curl was used.
Analogical method is still possible, for example to use wget to get an idea about bandwidtch connectivity, let it roll below 500 mb from speedtest.wdc01.softlayer.com to /dev/null few times:

 

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

 

# wget -O /dev/null –progress=dot:mega http://cachefly.cachefly.net/10mb.test ; date
–2020-08-07 13:56:49–  http://cachefly.cachefly.net/10mb.test
Resolving cachefly.cachefly.net (cachefly.cachefly.net)… 205.234.175.175
Connecting to cachefly.cachefly.net (cachefly.cachefly.net)|205.234.175.175|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: ‘/dev/null’

     0K …….. …….. …….. …….. …….. …….. 30%  142M 0s
  3072K …….. …….. …….. …….. …….. …….. 60%  179M 0s
  6144K …….. …….. …….. …….. …….. …….. 90%  204M 0s
  9216K …….. ……..                                    100%  197M=0.06s

2020-08-07 13:56:50 (173 MB/s) – ‘/dev/null’ saved [10485760/10485760]

Fri 07 Aug 2020 01:56:50 PM UTC


To be sure you have a real picture on remote machine Internet speed it is always a good idea to run download of random big files on a certain locations that are well known to have a very stable Internet bandwidth to the Internet backbone routers.

4. Using Simple shell script to test Internet speed


Fetch and use speedtest.sh

 


wget https://raw.github.com/blackdotsh/curl-speedtest/master/speedtest.sh && chmod u+x speedtest.sh && bash speedtest.sh

 

 

5. Using iperf to test connectivity between two servers 

 

iperf is another good tool worthy to mention that can be used to test the speed between client and server.

To use iperf install it with apt and do on the server machine to which bandwidth will be tested:

 

# iperf -s 

 

On the client machine do:

 

# iperf -c 192.168.1.1 

 

where 192.168.1.1 is the IP of the server where iperf was spawned to listen.

6. Using Netflix fast to determine Internet connection speed on host


Fast

fast is a service provided by Netflix. Its web interface is located at Fast.com and it has a command-line interface available through npm (npm is a package manager for nodejs) so if you don't have it you will have to install it first with:

# apt install –yes npm

 

Note that if you run on Debian this will install you some 249 new nodejs packages which you might not want to have on the system, so this is useful only for machines that has already use of nodejs.

 

$ fast

 

     82 Mbps ↓


The command returns your Internet download speed. To get your upload speed, use the -u flag:

 

$ fast -u

 

   ⠧ 80 Mbps ↓ / 8.2 Mbps ↑

 

7. Use speedometer / iftop to measure incoming and outgoing traffic on interface


If you're measuring connectivity on a live production server system, then you might consider that the measurement output might not be exactly correct especially if you're measuring the Uplink / Downlink on a Heavy loaded webserver / Mail Server / Samba or DNS server.
If this is the case a very useful tools to consider to extract the already taken traffic used on your Incoming and Outgoing ( TX / RX ) Network interfaces
are speedometer and iftop, they're present and installable depending on the OS via yum / apt or the respective package manager.

 


To install on Debian server:

 

 

 

# apt install –yes iftop speedometer

 


The most basic use to check the live received traffic in a nice Ncurses like text graphic is with: 

 

 

 

 

# speedometer -r 


speedometer-check-received-transmitted-network-traffic-on-linux1

To generate real time ASCII art graph on RX / TX traffic do:

 

 

# speedometer -r eth0 -t eth0


speedometer-check-received-transmitted-network-traffic-on-linux

 

 

 

 

# iftop -P -i eth0

 

 


iftop-show-statistics-on-connections-screenshot-pcfreak

 

 

 

 

 

Report haproxy node switch script useful for Zabbix or other monitoring

Tuesday, June 9th, 2020

zabbix-monitoring-logo
For those who administer corosync clustered haproxy and needs to build monitoring in case if the main configured Haproxy node in the cluster is changed, I've developed a small script to be integrated with zabbix-agent installed to report to a central zabbix server via a zabbix proxy.
The script  is very simple it assumed DC1 variable is the default used haproxy node and DC2 and DC3 are 2 backup nodes. The script is made to use crm_mon which is not installed by default on each server by default so if you'll be using it you'll have to install it first, but anyways the script can easily be adapted to use pcs cmd instead.

Below is the bash shell script:

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }'); do ((f++)); DC[$f]="$i"; done; \
DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); \
if [ “$DC” == “${DC[1]}” ]; then echo “1 Default DC Switched to ${DC[1]}”; elif [ “$DC” == “${DC[2]}” ]; then \
echo "2 Default DC Switched to ${DC[2]}”; elif [ “$DC” == “${DC[3]}” ]; then echo “3 Default DC: ${DC[3]}"; fi


To configure it with zabbix monitoring it can be configured via UserParameterScript.

The way I configured  it in Zabbix is as so:


1. Create the userpameter_active_node.conf

Below script is 3 nodes Haproxy cluster

# cat > /etc/zabbix/zabbix_agentd.d/userparameter_active_node.conf

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }'); do ((f++)); DC[$f]="$i"; done; \
DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); \
if [ “$DC” == “${DC[1]}” ]; then echo “1 Default DC Switched to ${DC[1]}”; elif [ “$DC” == “${DC[2]}” ]; then \
echo "2 Default DC Switched to ${DC[2]}”; elif [ “$DC” == “${DC[3]}” ]; then echo “3 Default DC: ${DC[3]}"; fi

Once pasted to save the file press CTRL + D


The version of the script with 2 nodes slightly improved is like so:
 

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }' | sed -e 's#:##g'); do DC_ARRAY[$f]=”$i”; ((f++)); done; GET_CURR_DC=$(sudo /usr/sbin/crm_mon -n -1 | grep ‘Current DC’ | awk ‘{ print $1 ” ” $2 ” ” $3}’ | awk ‘{ print $3 }’); if [ “$GET_CURR_DC” == “${DC_ARRAY[0]}” ]; then echo “1 Default DC ${DC_ARRAY[0]}”; fi; if [ “$GET_CURR_DC” == “${DC_ARRAY[1]}” ]; then echo “2 Default Current DC Switched to ${DC_ARRAY[1]} Please check “; fi; if [ -z “$GET_CURR_DC” ] || [ -z “$DC_ARRAY[1]” ]; then printf "Error something might be wrong with HAProxy Cluster on  $HOSTNAME "; fi;


The haproxy_active_DC_zabbix.sh script with a bit of more comments as explanations is available here 
2. Configure access for /usr/sbin/crm_mon for zabbix user in sudoers

 

# vim /etc/sudoers

zabbix          ALL=NOPASSWD: /usr/sbin/crm_mon


3. Configure in Zabbix for active.dc key Trigger and Item

active-node-switch1

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020

monitoring-linux-hardware-with-software-temperature-disk-cpu-health-zabbix-userparameter-script

I'm part of a  SysAdmin Team that is partially doing some minor Zabbix imrovements on a custom corporate installed Zabbix in an ongoing project to substitute the previous HP OpenView monitoring for a bunch of Legacy Linux hosts.
As one of the necessery checks to have is regarding system Hardware, the task was to invent some simplistic way to monitor hardware with the Zabbix Monitoring tool.  Monitoring Bare Metal servers hardware of HP / Dell / Fujituse etc. servers  in Linux usually is done with a third party software provided by the Hardware vendor. But as this requires an additional services to run and sometimes is not desired. It was interesting to find out some alternative Linux native ways to do the System hardware monitoring.
Monitoring statistics from the system hardware components can be obtained directly from the server components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent  Platform Management remote board article).
With ipmi
 hardware health info could be received straight from the ILO / IDRAC / HPMI of the server. However as often the Admin-Lan of the server is in a seperate DMZ secured network and available via only a certain set of routed IPs, ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

The tools to use are perhaps many but I know of two which gives you most of the information you ever need to have a prelimitary hardware damage warning system before the crash, these are:
 

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Disk monitoring is handled by a special service the package provides called smartd that does query the Hard Drives periodically aiming to find a warning signs of hardware failures.
The downside of smartd use is that it implies a little bit of extra load on Hard Drive read / writes and if misconfigured could reduce the the Hard disk life time.

 

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       –       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       –       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       –       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       –       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      –       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       –       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      –       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       –       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       –       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       –       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       –       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       –       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      –       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       –       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       –       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       –       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

 

2. hddtemp

 

Usually if smartd is used it is useful to also use hddtemp which relies on smartd data.
 The hddtemp program monitors and reports the temperature of PATA, SATA
 or SCSI hard drives by reading Self-Monitoring Analysis and Reporting
 Technology (S.M.A.R.T.)
information on drives that support this feature.
 

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

 

 

3. lm-sensors / i2c-tools 

 Lm-sensors is a hardware health monitoring package for Linux. It allows you
 to access information from temperature, voltage, and fan speed sensors.
i2c-tools
was historically bundled in the same package as lm_sensors but has been seperated cause not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command

 

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

 

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)

 


On CentOS Linux useful tool is also  lm_sensors-sensord.x86_64 – A Daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also the psensors-server (an HTTP server providing JSON Web service which can be used by GTK+ Application to remotely monitor sensors) useful for developers
psesors-server

psensor-linux-graphical-tool-to-check-cpu-hard-disk-temperature-unix

If you have a Xserver installed on the Server accessed with Xclient or via VNC though quite rare,
You can use xsensors or Psensora GTK+ (Widget Toolkit for creating Graphical User Interface) application software.

With this 3 tools it is pretty easy to script one liners and use the Zabbix UserParameters functionality to send hardware report data to a Company's Zabbix Sserver, though Zabbix has already some templates to do so in my case, I couldn't import this templates cause I don't have Zabbix Super-Admin credentials, thus to work around that a sample work around is use script to monitor for higher and critical considered temperature.
Here is a tiny sample script I came up in 1 min time it can be used to used as 1 liner UserParameter and built upon something more complex.

SENSORS_HIGH=`sensors | awk '{ print $6 }'| grep '^+' | uniq`;
SENSORS_CRIT=`sensors | awk '{ print $9 }'| grep '^+' | uniq`; ;SENSORS_STAT=`sensors|grep -E 'Core\s' | awk '{ print $1" "$2" "$3 }' | grep "$SENSORS_HIGH|$SENSORS_CRIT"`;
if [ ! -z $SENSORS_STAT ]; then
echo 'Temperature HIGH';
else 
echo 'Sensors OK';
fi 

Of course there is much more sophisticated stuff to use for monitoring out there


Below script can be easily adapted and use on other Monitoring Platforms such as Nagios / Munin / Cacti / Icinga and there are plenty of paid solutions, but for anyone that wants to develop something from scratch just like me I hope this
article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

ipmitool: Reset and manage IPMI (Intelligent Platform Management Interface) / ILO (Integrated Lights Out) remote board on Linux servers

Friday, December 20th, 2019

ipmitool-how-to-get-information-about-hardware-and-reset-ipmi-bmc-linux-access-ipmi-ilo-interface-logo

As a system administration nomatter whether you manage a bunch of server in a own brew and run Data Center location with some Rack mounted Hardware like PowerEdge M600 / ProLiant DL360e G8 / ProLiant DL360 Gen9 (755258-B21) or you're managing a bunch of Dedicated Servers, you're or will be faced  at some point to use the embedded in many Rack mountable rack servers IPMI / ILO interface remote console board management. If IPMI / ILO terms are new for you I suggest you quickly read my earlier article What is IPMI / IPKVM / ILO /  DRAC Remote Management interfaces to server .

hp-proliant-bl460c-ILO-Interface-screenshot

HP Proliant BL460 C IPMI (ILO) Web management interface 

In short Remote Management Interface is a way that gives you access to the server just like if you had a Monitor and a Keyboard plugged in directly to server.
When a remote computer is down the sysadmin can access it through IPMI and utilize a text console to the boot screen.
The IPMI protocol specification is led by Intel and was first published on September 16, 1998. and currently is supported by more than 200 computer system vendors, such as Cisco, Dell, Hewlett Packard Enterprise, Intel, NEC Corporation, SuperMicro and Tyan and is a standard for remote board management for servers.

IPMI-Block-Diagram-how-ipmi-works-and-its-relation-to-BMC
As you can see from diagram Baseboard Management Controllers (BMCs) is like the heart of IPMI.

Having this ILO / IPMI access is usually via a Web Interface Java interface that gives you the console and usually many of the machines also have an IP address via which a normal SSH command prompt is available giving you ability to execute diagnostic commands to the ILO on the status of attached hardware components of the server / get information about the attached system sensors to get report about things such as:

  • The System Overall heat
  • CPU heat temperature
  • System fan rotation speed cycles
  • Extract information about the server chassis
  • Query info about various system peripherals
  • Configure BIOS or UEFI on a remote system with no monitor / keyboard attached

Having a IPMI (Intelligent Platform Management Interface) firmware embedded into the server Motherboard is essential for system administration because besides this goodies it allows you to remotely Install Operating System to a server without any pre-installed OS right after it is bought and mounted to the planned Data Center Rack nest, just like if you have a plugged Monitor / Keyboard and Mouse and being physically in the remote location.

IPMI is mega useful for system administration also in case of Linux / Windows system updates that requires reboot in which essential System Libraries or binaries are updated and a System reboot is required, because often after system Large bundle updates or Release updates the system fails to boot and you need a way to run a diagnostic stuff from a System rescue Operating System living on a plugged in via a USB stick or CD Drive.
As prior said IPMI remote board is usually accessed and used via some Remote HTTPS encrypted web interface or via Secure Shell crypted session but sometimes the Web server behind the IPMI Web Interface is hanging especially when multiple sysadmins try to access it or due to other stuff and at times due to strange stuff even console SSH access might not be there, thansfully those who run a GNU / Linux Operating system on the Hardware node can use ipmitool tool http://ipmitool.sourceforge.net/ written for Linux that is capable to do a number of useful things with the IPMI management board including a Cold Reset of it so it turns back to working state / adding users / grasping the System hardware and components information health status, changing the Listener address of the IPMI access Interface and even having ability to update the IPMI version firmware.

Prior to be able to access IPMI remotely it has to be enabled usually via a UTP cable connected to the Network from which you expect it to be accesible. The location of the IPMI port on different server vendors is different.

ibm-power9-server-ipmi

IBM Power 9 Server IPMI port

HP-ILO-Bladeserver-Management-port-MGMT-yellow-cabled

HP IPMI console called ILO (Integrated Lights-Out) Port cabled with yellow cable (usually labelled as
Management Port MGMT)

Supermicro-SSG-5029P-E1CTR12L-Rear-Annotated-dedicated-IPMI-lan-port

Supermicro server IPMI Dedicated Lan Port

 

 In this article I'll shortly explain how IPMITool is available and can be installed and used across GNU / Linux Debian / Ubuntu and other deb based Linuxes with apt or on Fedora / CentOS (RPM) based with yum etc.

 

1. Install IPMITool

 

– On Debian

 

# apt-get install –yes ipmitool 

 

– On CentOS

 

# yum install ipmitool OpenIPMI-tools

 

# ipmitool -V
ipmitool version 1.8.14

 

On CentOS ipmitool can run as a service and collect data and do some nice stuff to run it:

 

[root@linux ~]# chkconfig ipmi on 

 

[root@linux ~]# service ipmi start

 

Before start using it is worthy to give here short description from ipmitool man page
 

DESCRIPTION
       This program lets you manage Intelligent Platform Management Interface (IPMI) functions of either the local system, via a kernel device driver, or a remote system, using IPMI v1.5 and IPMI v2.0.
       These functions include printing FRU information, LAN configuration, sensor readings, and remote chassis power control.

IPMI management of a local system interface requires a compatible IPMI kernel driver to be installed and configured.  On Linux this driver is called OpenIPMI and it is included in standard  dis‐
       tributions.   On Solaris this driver is called BMC and is included in Solaris 10.  Management of a remote station requires the IPMI-over-LAN interface to be enabled and configured.  Depending on
       the particular requirements of each system it may be possible to enable the LAN interface using ipmitool over the system interface.

 

2. Get ADMIN IP configured for access

https://3.bp.blogspot.com/-jojgWqj7acg/Wo6bSP0Av1I/AAAAAAAAGdI/xaHewnmAujkprCiDXoBxV7uHonPFjtZDwCLcBGAs/s1600/22-02-2018%2B15-31-09

To get a list of what is the current listener IP with no access to above Web frontend via which IPMI can be accessed (if it is cabled to the Access / Admin LAN port).

 

# ipmitool lan print 1
Set in Progress         : Set Complete
Auth Type Support       : NONE MD2 MD5 PASSWORD
Auth Type Enable        : Callback : MD2 MD5 PASSWORD
                        : User     : MD2 MD5 PASSWORD
                        : Operator : MD2 MD5 PASSWORD
                        : Admin    : MD2 MD5 PASSWORD
                        : OEM      :
IP Address Source       : Static Address
IP Address              : 10.253.41.127
Subnet Mask             : 255.255.254.0
MAC Address             : 0c:c4:7a:4b:1f:70
SNMP Community String   : public
IP Header               : TTL=0x00 Flags=0x00 Precedence=0x00 TOS=0x00
BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
Default Gateway IP      : 10.253.41.254
Default Gateway MAC     : 00:00:0c:07:ac:7b
Backup Gateway IP       : 10.253.41.254
Backup Gateway MAC      : 00:00:00:00:00:00
802.1q VLAN ID          : 8
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 1,2,3,6,7,8,11,12
Cipher Suite Priv Max   : aaaaXXaaaXXaaXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

 

 

3. Configure custom access IP and gateway for IPMI

 

[root@linux ~]# ipmitool lan set 1 ipsrc static

 

[root@linux ~]# ipmitool lan set 1 ipaddr 192.168.1.211
Setting LAN IP Address to 192.168.1.211

 

[root@linux ~]# ipmitool lan set 1 netmask 255.255.255.0
Setting LAN Subnet Mask to 255.255.255.0

 

[root@linux ~]# ipmitool lan set 1 defgw ipaddr 192.168.1.254
Setting LAN Default Gateway IP to 192.168.1.254

 

[root@linux ~]# ipmitool lan set 1 defgw macaddr 00:0e:0c:aa:8e:13
Setting LAN Default Gateway MAC to 00:0e:0c:aa:8e:13

 

[root@linux ~]# ipmitool lan set 1 arp respond on
Enabling BMC-generated ARP responses

 

[root@linux ~]# ipmitool lan set 1 auth ADMIN MD5

[root@linux ~]# ipmitool lan set 1 access on

 

4. Getting a list of IPMI existing users

 

# ipmitool user list 1
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
2   admin1           false   false      true       ADMINISTRATOR
3   ovh_dontchange   true    false      true       ADMINISTRATOR
4   ro_dontchange    true    true       true       USER
6                    true    true       true       NO ACCESS
7                    true    true       true       NO ACCESS
8                    true    true       true       NO ACCESS
9                    true    true       true       NO ACCESS
10                   true    true       true       NO ACCESS


– To get summary of existing users

# ipmitool user summary
Maximum IDs         : 10
Enabled User Count  : 4
Fixed Name Count    : 2

5. Create new Admin username into IPMI board
 

[root@linux ~]# ipmitool user set name 2 Your-New-Username

 

[root@linux ~]# ipmitool user set password 2
Password for user 2: 
Password for user 2: 

 

[root@linux ~]# ipmitool channel setaccess 1 2 link=on ipmi=on callin=on privilege=4

 

[root@linux ~]# ipmitool user enable 2
[root@linux ~]# 

 

6. Configure non-privilege user into IPMI board

If a user should only be used for querying sensor data, a custom privilege level can be setup for that. This user then has no rights for activating or deactivating the server, for example. A user named monitor will be created for this in the following example:

[root@linux ~]# ipmitool user set name 3 monitor

 

[root@linux ~]# ipmitool user set password 3
Password for user 3: 
Password for user 3: 

 

[root@linux ~]# ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=2

 

[root@linux ~]# ipmitool user enable 3

The importance of the various privilege numbers will be displayed when  ipmitool channel  is called without any additional parameters.

 

 

[root@linux ~]# ipmitool channel
Channel Commands: authcap   <channel number> <max privilege>
                  getaccess <channel number> [user id]
                  setaccess <channel number> <user id> [callin=on|off] [ipmi=on|off] [link=on|off] [privilege=level]
                  info      [channel number]
                  getciphers <ipmi | sol> [channel]

 

Possible privilege levels are:
   1   Callback level
   2   User level
   3   Operator level
   4   Administrator level
   5   OEM Proprietary level
  15   No access
[root@linux ~]# 

The user just created (named 'monitor') has been assigned the USER privilege level. So that LAN access is allowed for this user, you must activate MD5 authentication for LAN access for this user group (USER privilege level).

[root@linux ~]# ipmitool channel getaccess 1 3
Maximum User IDs     : 15
Enabled User IDs     : 2

User ID              : 3
User Name            : monitor
Fixed Name           : No
Access Available     : call-in / callback
Link Authentication  : enabled
IPMI Messaging       : enabled
Privilege Level      : USER

[root@linux ~]# 

 

7. Check server firmware version on a server via IPMI

 

# ipmitool mc info
Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 3.31
IPMI Version              : 2.0
Manufacturer ID           : 10876
Manufacturer Name         : Supermicro
Product ID                : 1579 (0x062b)
Product Name              : Unknown (0x62B)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device


ipmitool mc info is actually an alias for the ipmitool bmc info cmd.

8. Reset IPMI management controller or BMC if hanged

 

As earlier said if for some reason Web GUI access or SSH to IPMI is lost, reset with:

root@linux:/root#  ipmitool mc reset
[ warm | cold ]

 

If you want to stop electricity for a second to IPMI and bring it on use the cold reset (this usually
should be done if warm reset does not work).

 

root@linux:/root# ipmitool mc reset cold

 

otherwise soft / warm is with:

 

ipmitool mc reset warm

 

Sometimes the BMC component of IPMI hangs and only fix to restore access to server Remote board is to reset also BMC

 

root@linux:/root# ipmitool bmc reset cold

 

9. Print hardware system event log

 

root@linux:/root# ipmitool sel info
SEL Information
Version          : 1.5 (v1.5, v2 compliant)
Entries          : 0
Free Space       : 10240 bytes
Percent Used     : 0%
Last Add Time    : Not Available
Last Del Time    : 07/02/2015 17:22:34
Overflow         : false
Supported Cmds   : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 512
Alloc Unit Size  : 20
# Free Units     : 512
Largest Free Blk : 512
Max Record Size  : 20

 

 ipmitool sel list
SEL has no entries

In this particular case the system shows no entres as it was run on a tiny Microtik 1U machine, however usually on most Dell PowerEdge / HP Proliant / Lenovo System X machines this will return plenty of messages.

ipmitool sel elist

ipmitool sel clear

To clear anything if such logged

ipmitool sel clear

 

10.  Print Field Replaceable Units ( FRUs ) on the server 

 

[root@linux ~]# ipmitool fru print
 

 

FRU Device Description : Builtin FRU Device (ID 0)
 Chassis Type          : Other
 Chassis Serial        : KD5V59B
 Chassis Extra         : c3903ebb6237363698cdbae3e991bbed
 Board Mfg Date        : Mon Sep 24 02:00:00 2012
 Board Mfg             : IBM
 Board Product         : System Board
 Board Serial          : XXXXXXXXXXX
 Board Part Number     : 00J6528
 Board Extra           : 00W2671
 Board Extra           : 1400
 Board Extra           : 0000
 Board Extra           : 5000
 Board Extra           : 10

 Product Manufacturer  : IBM
 Product Name          : System x3650 M4
 Product Part Number   : 1955B2G
 Product Serial        : KD7V59K
 Product Asset Tag     :

FRU Device Description : Power Supply 1 (ID 1)
 Board Mfg Date        : Mon Jan  1 01:00:00 1996
 Board Mfg             : ACBE
 Board Product         : IBM Designed Device
 Board Serial          : YK151127R1RN
 Board Part Number     : ZZZZZZZ
 Board Extra           : ZZZZZZ<FF><FF><FF><FF><FF>
 Board Extra           : 0200
 Board Extra           : 00
 Board Extra           : 0080
 Board Extra           : 1

FRU Device Description : Power Supply 2 (ID 2)
 Board Mfg Date        : Mon Jan  1 01:00:00 1996
 Board Mfg             : ACBE
 Board Product         : IBM Designed Device
 Board Serial          : YK131127M1LE
 Board Part Number     : ZZZZZ
 Board Extra           : ZZZZZ<FF><FF><FF><FF><FF>
 Board Extra           : 0200
 Board Extra           : 00
 Board Extra           : 0080
 Board Extra           : 1

FRU Device Description : DASD Backplane 1 (ID 3)
….

 

Worthy to mention here is some cheaper server vendors such as Trendmicro might show no data here (no idea whether this is a protocol incompitability or IPMItool issue).

 

11. Get output about system sensors Temperature / Fan / Power Supply

 

Most newer servers have sensors to track temperature / voltage / fanspeed peripherals temp overall system temp etc.
To get a full list of sensors statistics from IPMI 
 

# ipmitool sensor
CPU Temp         | 29.000     | degrees C  | ok    | 0.000     | 0.000     | 0.000     | 95.000    | 98.000    | 100.000
System Temp      | 40.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 80.000    | 85.000    | 90.000
Peripheral Temp  | 41.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 80.000    | 85.000    | 90.000
PCH Temp         | 56.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 90.000    | 95.000    | 100.000
FAN 1            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN 2            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN 3            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN 4            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN A            | na         |            | na    | na        | na        | na        | na        | na        | na
Vcore            | 0.824      | Volts      | ok    | 0.480     | 0.512     | 0.544     | 1.488     | 1.520     | 1.552
3.3VCC           | 3.296      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
12V              | 12.137     | Volts      | ok    | 10.494    | 10.600    | 10.706    | 13.091    | 13.197    | 13.303
VDIMM            | 1.496      | Volts      | ok    | 1.152     | 1.216     | 1.280     | 1.760     | 1.776     | 1.792
5VCC             | 4.992      | Volts      | ok    | 4.096     | 4.320     | 4.576     | 5.344     | 5.600     | 5.632
CPU VTT          | 1.008      | Volts      | ok    | 0.872     | 0.896     | 0.920     | 1.344     | 1.368     | 1.392
VBAT             | 3.200      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
VSB              | 3.328      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
AVCC             | 3.312      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
Chassis Intru    | 0x1        | discrete   | 0x0100| na        | na        | na        | na        | na        | na

 

To get only partial sensors data from the SDR (Sensor Data Repositry) entries and readings

 

[root@linux ~]# ipmitool sdr list 

Planar 3.3V      | 3.31 Volts        | ok
Planar 5V        | 5.06 Volts        | ok
Planar 12V       | 12.26 Volts       | ok
Planar VBAT      | 3.14 Volts        | ok
Avg Power        | 80 Watts          | ok
PCH Temp         | 45 degrees C      | ok
Ambient Temp     | 19 degrees C      | ok
PCI Riser 1 Temp | 25 degrees C      | ok
PCI Riser 2 Temp | no reading        | ns
Mezz Card Temp   | no reading        | ns
Fan 1A Tach      | 3071 RPM          | ok
Fan 1B Tach      | 2592 RPM          | ok
Fan 2A Tach      | 3145 RPM          | ok
Fan 2B Tach      | 2624 RPM          | ok
Fan 3A Tach      | 3108 RPM          | ok
Fan 3B Tach      | 2592 RPM          | ok
Fan 4A Tach      | no reading        | ns
Fan 4B Tach      | no reading        | ns
CPU1 VR Temp     | 27 degrees C      | ok
CPU2 VR Temp     | 27 degrees C      | ok
DIMM AB VR Temp  | 24 degrees C      | ok
DIMM CD VR Temp  | 23 degrees C      | ok
DIMM EF VR Temp  | 25 degrees C      | ok
DIMM GH VR Temp  | 24 degrees C      | ok
Host Power       | 0x00              | ok
IPMI Watchdog    | 0x00              | ok

 

[root@linux ~]# ipmitool sdr type Temperature
PCH Temp         | 31h | ok  | 45.1 | 45 degrees C
Ambient Temp     | 32h | ok  | 12.1 | 19 degrees C
PCI Riser 1 Temp | 3Ah | ok  | 16.1 | 25 degrees C
PCI Riser 2 Temp | 3Bh | ns  | 16.2 | No Reading
Mezz Card Temp   | 3Ch | ns  | 44.1 | No Reading
CPU1 VR Temp     | F7h | ok  | 20.1 | 27 degrees C
CPU2 VR Temp     | F8h | ok  | 20.2 | 27 degrees C
DIMM AB VR Temp  | F9h | ok  | 20.3 | 25 degrees C
DIMM CD VR Temp  | FAh | ok  | 20.4 | 23 degrees C
DIMM EF VR Temp  | FBh | ok  | 20.5 | 26 degrees C
DIMM GH VR Temp  | FCh | ok  | 20.6 | 24 degrees C
Ambient Status   | 8Eh | ok  | 12.1 |
CPU 1 OverTemp   | A0h | ok  |  3.1 | Transition to OK
CPU 2 OverTemp   | A1h | ok  |  3.2 | Transition to OK

 

[root@linux ~]# ipmitool sdr type Fan
Fan 1A Tach      | 40h | ok  | 29.1 | 3034 RPM
Fan 1B Tach      | 41h | ok  | 29.1 | 2592 RPM
Fan 2A Tach      | 42h | ok  | 29.2 | 3145 RPM
Fan 2B Tach      | 43h | ok  | 29.2 | 2624 RPM
Fan 3A Tach      | 44h | ok  | 29.3 | 3108 RPM
Fan 3B Tach      | 45h | ok  | 29.3 | 2592 RPM
Fan 4A Tach      | 46h | ns  | 29.4 | No Reading
Fan 4B Tach      | 47h | ns  | 29.4 | No Reading
PS 1 Fan Fault   | 73h | ok  | 10.1 | Transition to OK
PS 2 Fan Fault   | 74h | ok  | 10.2 | Transition to OK

 

[root@linux ~]# ipmitool sdr type ‘Power Supply’
Sensor Type "‘Power" not found.
Sensor Types:
        Temperature               (0x01)   Voltage                   (0x02)
        Current                   (0x03)   Fan                       (0x04)
        Physical Security         (0x05)   Platform Security         (0x06)
        Processor                 (0x07)   Power Supply              (0x08)
        Power Unit                (0x09)   Cooling Device            (0x0a)
        Other                     (0x0b)   Memory                    (0x0c)
        Drive Slot / Bay          (0x0d)   POST Memory Resize        (0x0e)
        System Firmwares          (0x0f)   Event Logging Disabled    (0x10)
        Watchdog1                 (0x11)   System Event              (0x12)
        Critical Interrupt        (0x13)   Button                    (0x14)
        Module / Board            (0x15)   Microcontroller           (0x16)
        Add-in Card               (0x17)   Chassis                   (0x18)
        Chip Set                  (0x19)   Other FRU                 (0x1a)
        Cable / Interconnect      (0x1b)   Terminator                (0x1c)
        System Boot Initiated     (0x1d)   Boot Error                (0x1e)
        OS Boot                   (0x1f)   OS Critical Stop          (0x20)
        Slot / Connector          (0x21)   System ACPI Power State   (0x22)
        Watchdog2                 (0x23)   Platform Alert            (0x24)
        Entity Presence           (0x25)   Monitor ASIC              (0x26)
        LAN                       (0x27)   Management Subsys Health  (0x28)
        Battery                   (0x29)   Session Audit             (0x2a)
        Version Change            (0x2b)   FRU State                 (0x2c)

 

12. Using System Chassis to initiate power on / off / reset / soft shutdown

 

!!!!!  Beware only run this if you know what you're realling doing don't just paste into a production system, If you do so it is your responsibility !!!!! 

–  do a soft-shutdown via acpi 

 

ipmitool [chassis] power soft

 

– issue a hard power off, wait 1s, power on 

 

ipmitool [chassis] power cycle

 

– run a hard power off

 

ipmitool [chassis] power off

 
– do a hard power on 

 

ipmitool [chassis] power on

 

–  issue a hard reset

 

ipmitool [chassis] power reset


– Get system power status
 

ipmitool chassis power status

 

13. Use IPMI (SoL) Serial over Lan to execute commands remotely


Besides using ipmitool locally on server that had its IPMI / ILO / DRAC console disabled it could be used also to query and make server do stuff remotely.

If not loaded you will have to load lanplus kernel module.
 

modprobe lanplus

 

 ipmitool -I lanplus -H 192.168.99.1 -U user -P pass chassis power status

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power status

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power reset

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power reset

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass password sol activate

– Deactivating Sol server capabilities
 

 ipmitool -I lanplus -H 192.168.99.1 -U user -P pass sol deactivate

 

14. Modify boot device order on next boot

 

!!!!! Do not run this except you want to really modify Boot device order, carelessly copy pasting could leave your server unbootable on next boot !!!!!

– Set first boot device to be as BIOS

ipmitool chassis bootdev bios

 

– Set first boot device to be CD Drive

ipmitool chassis bootdev cdrom 

 

– Set first boot device to be via Network Boot PXE protocol

ipmitool chassis bootdev pxe 

 

15. Using ipmitool shell

 

root@iqtestfb:~# ipmitool shell
ipmitool> 
help
Commands:
        raw           Send a RAW IPMI request and print response
        i2c           Send an I2C Master Write-Read command and print response
        spd           Print SPD info from remote I2C device
        lan           Configure LAN Channels
        chassis       Get chassis status and set power state
        power         Shortcut to chassis power commands
        event         Send pre-defined events to MC
        mc            Management Controller status and global enables
        sdr           Print Sensor Data Repository entries and readings
        sensor        Print detailed sensor information
        fru           Print built-in FRU and scan SDR for FRU locators
        gendev        Read/Write Device associated with Generic Device locators sdr
        sel           Print System Event Log (SEL)
        pef           Configure Platform Event Filtering (PEF)
        sol           Configure and connect IPMIv2.0 Serial-over-LAN
        tsol          Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
        isol          Configure IPMIv1.5 Serial-over-LAN
        user          Configure Management Controller users
        channel       Configure Management Controller channels
        session       Print session information
        dcmi          Data Center Management Interface
        sunoem        OEM Commands for Sun servers
        kontronoem    OEM Commands for Kontron devices
        picmg         Run a PICMG/ATCA extended cmd
        fwum          Update IPMC using Kontron OEM Firmware Update Manager
        firewall      Configure Firmware Firewall
        delloem       OEM Commands for Dell systems
        shell         Launch interactive IPMI shell
        exec          Run list of commands from file
        set           Set runtime variable for shell and exec
        hpm           Update HPM components using PICMG HPM.1 file
        ekanalyzer    run FRU-Ekeying analyzer using FRU files
        ime           Update Intel Manageability Engine Firmware
ipmitool>

 

16. Changing BMC / DRAC time setting

 

# ipmitool -H XXX.XXX.XXX.XXX -U root -P pass sel time set "01/21/2011 16:20:44"

 

17. Loading script of IPMI commands

# ipmitool exec /path-to-script/script-with-instructions.txt  

 

Closure

As you saw ipmitool can be used to do plenty of cool things both locally or remotely on a server that had IPMI server interface available. The tool is mega useful in case if ILO console gets hanged as it can be used to reset it.
I explained shortly what is Intelligent Platform Management Interface, how it can be accessed and used on Linux via ipmitool. I went through some of its basic use, how it can be used to print the configured ILO access IP how
this Admin IP and Network configuration can be changed, how to print the IPMI existing users and how to add new Admin and non-privileged users.
Then I've shown how a system hardware and firmware could be shown, how IPMI management BMC could be reset in case if it hanging and how hardware system even logs can be printed (useful in case of hardware failure errors etc.), how to print reports on current system fan / power supply  and temperature. Finally explained how server chassis could be used for soft and cold server reboots locally or via SoL (Serial Over Lan) and how boot order of system could be modified.

ipmitool is a great tool to further automate different sysadmin tasks with shell scrpts for stuff such as tracking servers for a failing hardware and auto-reboot of inacessible failed servers to guarantee Higher Level of availability.
Hope you enjoyed artcle .. It wll be interested to hear of any other known ipmitool scripts or use, if you know such please share it.

How to check who is flooding your Apache, NGinx Webserver – Real time Monitor statistics about IPs doing most URL requests and Stopping DoS attacks with Fail2Ban

Wednesday, August 20th, 2014

check-who-is-flooding-your-apache-nginx-webserver-real-time-monitoring-ips-doing-most-url-requests-to-webserver-and-protecting-your-webserver-with-fail2ban

If you're Linux ystem administrator in Webhosting company providing WordPress / Joomla / Drupal web-sites hosting and your UNIX servers suffer from periodic denial of service attacks, because some of the site customers business is a target of competitor company who is trying to ruin your client business sites through DoS or DDOS attacks, then the best thing you can do is to identify who and how is the Linux server being hammered. If you find out DoS is not on a network level but Apache gets crashing because of memory leaks and connections to Apache are so much that the CPU is being stoned, the best thing to do is to check which IP addresses are causing the excessive GET / POST / HEAD requests in logged.
 

There is the Apachetop tool that can give you the most accessed webserver URLs in a refreshed screen like UNIX top command, however Apachetop does not show which IP does most URL hits on Apache / Nginx webserver. 

 

1. Get basic information on which IPs accesses Apache / Nginx the most using shell cmds


Before examining the Webserver logs it is useful to get a general picture on who is flooding you on a TCP / IP network level, with netstat like so:
 

# here is howto check clients count connected to your server
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n


If you get an extensive number of connected various IPs / hosts (like 10000 or something huge as a number), depending on the type of hardware the server is running and the previous scaling planned for the system you can determine whether the count as huge as this can be handled normally by server, if like in most cases the server is planned to serve a couple of hundreds or thousands of clients and you get over 10000 connections hanging, then your server is under attack or if its Internet server suddenly your website become famous like someone posted an article on some major website and you suddenly received a tons of hits.


There is a way using standard shell tools, to get some basic information on which IP accesses the webserver the most with:

tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr

Or if you want to keep it refreshing periodically every few seconds run it through watch command:

watch "tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr"

monioring-access-hits-to-webserver-by-ip-show-most-visiting-apache-nginx-ip-with-shell-tools-tail-cut-uniq-sort-tools-refreshed-with-watch-cmd


Another useful combination of shell commands is to Monitor POST / GET / HEAD requests number in access.log :
 

 awk '{print $6}' access.log | sort | uniq -c | sort -n

     1 "alihack<%eval
      1 "CONNECT
      1 "fhxeaxb0xeex97x0fxe2-x19Fx87xd1xa0x9axf5x^xd0x125x0fx88x19"x84xc1xb3^v2xe9xpx98`X'dxcd.7ix8fx8fxd6_xcdx834x0c"
      1 "x16x03x01"
      1 "xe2
      2 "mgmanager&file=imgmanager&version=1576&cid=20
      6 "4–"
      7 "PUT
     22 "–"
     22 "OPTIONS
     38 "PROPFIND
   1476 "HEAD
   1539 "-"
  65113 "POST
 537122 "GET


However using shell commands combination is plenty of typing and hard to remember, plus above tools does not show you, approximately how frequenty IP hits the webserver

 

2. Real-time monitoring IP addresses with highest URL reqests with logtop

 


Real-time monitoring on IP addresses with highest URL requests is possible with no need of "console ninja skills"  through – logtop.

 

2.1 Install logtop on Debian / Ubuntu and deb derivatives Linux

 


a) Installing Logtop the debian way

LogTop is easily installable on Debian and Ubuntu in newer releases of Debian – Debian 7.0 and Ubuntu 13/14 Linux it is part of default package repositories and can be straightly apt-get-ed with:

apt-get install –yes logtop

b) Installing Logtop from source code (install on older deb based Linuxes)

On older Debian – Debian 6 and Ubuntu 7-12 servers to install logtop compile from source code – read the README installation instructions or if lazy copy / paste below:

cd /usr/local/src
wget https://github.com/JulienPalard/logtop/tarball/master
mv master JulienPalard-logtop.tar.gz
tar -zxf JulienPalard-logtop.tar.gz

cd JulienPalard-logtop-*/
aptitude install libncurses5-dev uthash-dev

aptitude install python-dev swig

make python-module

python setup.py install

make

make install

 

mkdir -p /usr/bin/
cp logtop /usr/bin/


2.2 Install Logtop on CentOS 6.5 / 7.0 / Fedora / RHEL and rest of RPM based Linux-es

b) Install logtop on CentOS 6.5 and CentOS 7 Linux

– For CentOS 6.5 you need to rpm install epel-release-6-8.noarch.rpm
 

wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
links http://dl.fedoraproject.org/pub/epel/6/SRPMS/uthash-1.9.9-6.el6.src.rpm
rpmbuild –rebuild
uthash-1.9.9-6.el6.src.rpm
cd /root/rpmbuild/RPMS/noarch
rpm -ivh uthash-devel-1.9.9-6.el6.noarch.rpm


– For CentOS 7 you need to rpm install epel-release-7-0.2.noarch.rpm

 

links http://download.fedoraproject.org/pub/epel/beta/7/x86_64/repoview/epel-release.html
 

Click on and download epel-release-7-0.2.noarch.rpm

rpm -ivh epel-release-7-0.2.noarch
rpm –import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
yum -y install git ncurses-devel uthash-devel
git clone https://github.com/JulienPalard/logtop.git
cd logtop
make
make install

 

2.3 Some Logtop use examples and short explanation

 

logtop shows 4 columns as follows – Line number, Count, Frequency, and Actual line

 

The quickest way to visualize which IP is stoning your Apache / Nginx webserver on Debian?

 

tail -f access.log | awk {'print $1; fflush();'} | logtop

 

 

logtop-check-which-ip-is-making-most-requests-to-your-apache-nginx-webserver-linux-screenshot

On CentOS / RHEL

tail -f /var/log/httpd/access_log | awk {'print $1; fflush();'} | logtop

 

Using LogTop even Squid Proxy caching server access.log can be monitored.
To get squid Top users by IP listed:

 

tail -f /var/log/squid/access.log | awk {'print $1; fflush();'} | logtop


logtop-visualizing-top-users-using-squid-proxy-cache
 

Or you might visualize in real-time squid cache top requested URLs
 

tail -f /var/log/squid/access.log | awk {'print $7; fflush();'} | logtop


visualizing-top-requested-urls-in-squid-proxy-cache-howto-screenshot

 

3. Automatically Filter IP addresses causing Apache / Nginx Webservices Denial of Service with fail2ban
 

Once you identify the problem if the sites hosted on server are target of Distributed DoS, probably your best thing to do is to use fail2ban to automatically filter (ban) IP addresses doing excessive queries to system services. Assuming that you have already installed fail2ban as explained in above link (On Debian / Ubuntu Linux) with:
 

apt-get install –yes fail2ban


To make fail2ban start filtering DoS attack IP addresses, you will have to set the following configurations:
 

vim /etc/fail2ban/jail.conf


Paste in file:
 

[http-get-dos]
 
enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/apache2/WEB_SERVER-access.log
# maxretry is how many GETs we can have in the findtime period before getting narky
maxretry = 300
# findtime is the time period in seconds in which we're counting "retries" (300 seconds = 5 mins)
findtime = 300
# bantime is how long we should drop incoming GET requests for a given IP for, in this case it's 5 minutes
bantime = 300
action = iptables[name=HTTP, port=http, protocol=tcp]


Before you paste make sure you put the proper logpath = location of webserver (default one is /var/log/apache2/access.log), if you're using multiple logs for each and every of hosted websites, you will probably want to write a script to automatically loop through all logs directory get log file names and automatically add auto-modified version of above [http-get-dos] configuration. Also configure maxtretry per IP, findtime and bantime, in above example values are a bit low and for heavy loaded websites which has to serve thousands of simultaneous connections originating from office networks using Network address translation (NAT), this might be low and tuned to prevent situations, where even the customer of yours can't access there websites 🙂

To finalize fail2ban configuration, you have to create fail2ban filter file:

vim /etc/fail2ban/filters.d/http-get-dos.conf


Paste:
 

# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]
 
# Option: failregex
# Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.
# You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.
 
failregex = ^<HOST> -.*"(GET|POST).*
 
# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =


To make fail2ban load new created configs restart it:
 

/etc/init.d/fail2ban restart


If you want to test whether it is working you can use Apache webserver Benchmark tools such as ab or siege.
The quickest way to test, whether excessive IP requests get filtered – and make your IP banned temporary:
 

ab -n 1000 -c 20 http://your-web-site-dot-com/

This will make 1000 page loads in 20 concurrent connections and will add your IP to temporary be banned for (300 seconds) = 5 minutes. The ban will be logged in /var/log/fail2ban.log, there you will get smth like:

2014-08-20 10:40:11,943 fail2ban.actions: WARNING [http-get-dos] Ban 192.168.100.5
2013-08-20 10:44:12,341 fail2ban.actions: WARNING [http-get-dos] Unban 192.168.100.5

Howto Fix “sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux

Monday, February 15th, 2016

sysstast-no-such-file-or-directory-fix-Debian-Ubuntu-Linux-howto
I really love sysstat and as a console maniac I tend to install it on every server however by default there is some <b>sysstat</b> tuning once installed to make it work, for those unfamiliar with <i>sysstat</i> I warmly recommend to check, it here is in short the package description:<br /><br />
 

server:~# apt-cache show sysstat|grep -i desc -A 15
Description: system performance tools for Linux
 The sysstat package contains the following system performance tools:
  – sar: collects and reports system activity information;
  – iostat: reports CPU utilization and disk I/O statistics;
  – mpstat: reports global and per-processor statistics;
  – pidstat: reports statistics for Linux tasks (processes);
  – sadf: displays data collected by sar in various formats;
  – nfsiostat: reports I/O statistics for network filesystems;
  – cifsiostat: reports I/O statistics for CIFS filesystems.
 .
 The statistics reported by sar deal with I/O transfer rates,
 paging activity, process-related activities, interrupts,
 network activity, memory and swap space utilization, CPU
 utilization, kernel activities and TTY statistics, among
 others. Both UP and SMP machines are fully supported.
Homepage: http://pagesperso-orange.fr/sebastien.godard/

 

If you happen to install sysstat on a Debian / Ubuntu server with:

server:~# apt-get install –yes sysstat


, and you try to get some statistics with sar command but you get some ugly error output from:

 

server:~# sar Cannot open /var/log/sysstat/sa20: No such file or directory


And you wonder how to resolve it and to be able to have the server log in text databases periodically the nice sar stats load avarages – %idle, %iowait, %system, %nice, %user, then to FIX that Cannot open /var/log/sysstat/sa20: No such file or directory

You need to:

server:~# vim /etc/default/sysstat


By Default value you will find out sysstat stats it is disabled, e.g.:

ENABLED="false"

Switch the value to "true"

ENABLED="true"


Then restart sysstat init script with:

server:~# /etc/init.d/sysstat restart

However for those who prefer to do things from menu Ncurses interfaces and are not familiar with Vi Improved, the easiest way is to run dpkg reconfigure of the sysstat:

server:~# dpkg –reconfigure


sysstat-reconfigure-on-gnu-linux

 

root@server:/# sar
Linux 2.6.32-5-amd64 (pcfreak) 15.02.2016 _x86_64_ (2 CPU)

0,00,01 CPU %user %nice %system %iowait %steal %idle
0,15,01 all 24,32 0,54 3,10 0,62 0,00 71,42
1,15,01 all 18,69 0,53 2,10 0,48 0,00 78,20
10,05,01 all 22,13 0,54 2,81 0,51 0,00 74,01
10,15,01 all 17,14 0,53 2,44 0,40 0,00 79,49
10,25,01 all 24,03 0,63 2,93 0,45 0,00 71,97
10,35,01 all 18,88 0,54 2,44 1,08 0,00 77,07
10,45,01 all 25,60 0,54 3,33 0,74 0,00 69,79
10,55,01 all 36,78 0,78 4,44 0,89 0,00 57,10
16,05,01 all 27,10 0,54 3,43 1,14 0,00 67,79


Well that's it now sysstat error resolved, text reporting stats data works again, Hooray! 🙂