Monitoring Archives - Page 2 of 4 - ☩ Walking in Light with Christ - Faith, Computing, Diary ☩ Walking in Light with Christ

Archive for the ‘Monitoring’ Category

Zabbix rkhunter monitoring check if rootkits trojans and viruses or suspicious OS activities are detected

Wednesday, December 8th, 2021

If you're using rkhunter to monitor for malicious activity, changed binaries, rootkits, viruses, malware and other possible or actual security breaches, you have perhaps configured your machines to report to some email address.
But what if you want a scheduled rkhunter run on the machine and you don't want to rely too much on email alerting (especially because with email the alerts may get noticed by the sysadmin pretty late)?

We have been in that situation, so my dear colleague Georgi Stoyanov and I developed a small rkhunter Zabbix userparameter check to track and alert if any "Warning" entries are matched in the traditional rkhunter log file /var/log/rkhunter/rkhunter.log

Setting it up and using it is pretty easy. You will need a recent version of zabbix-agent installed on the machine and connected to a Zabbix server; in my case this is:

[root@centos ~]# rpm -qa |grep -i zabbix-agent
zabbix-agent-4.0.7-1.el7.x86_64

The userparameter configuration is placed inside /etc/zabbix/zabbix_agentd.d/userparameter_rkhunter_warning_check.conf

[root@centos /etc/zabbix/zabbix_agentd.d ]# cat userparameter_rkhunter_warning_check.conf
# userparameter script to check if any Warning is inside /var/log/rkhunter/rkhunter.log and if found to trigger Zabbix alert
UserParameter=rkhunter.warning, (TODAY=$(date |awk '{ print $1" "$2" "$3 }'); if [ $(cat /var/log/rkhunter/rkhunter.log | awk "/$TODAY/,EOF" | /bin/grep -i '\[ Warning \]' | /usr/bin/wc -l) != '0' ]; then echo 1; else echo 0; fi)
UserParameter=rkhunter.suspected,(/bin/grep -i 'Suspect files: ' /var/log/rkhunter/rkhunter.log|tail -n 1| awk '{ print $4 }')
UserParameter=rkhunter.rootkits,(/bin/grep -i 'Possible rootkits: ' /var/log/rkhunter/rkhunter.log|tail -n 1| awk '{ print $4 }')
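
The warning check can also be kept as a small standalone shell function, which is easier to debug than a one-liner inside the agent config. This is only a sketch: the function name rkhunter_warning_flag is hypothetical, and unlike the userparameter above it scans the whole log file instead of only today's entries.

```shell
# Hypothetical helper (not part of rkhunter or Zabbix):
# prints 1 if the log contains at least one "[ Warning ]" line, 0 otherwise.
rkhunter_warning_flag() {
    log_file="${1:-/var/log/rkhunter/rkhunter.log}"
    # grep -c prints the number of matching lines; a missing or unreadable
    # file produces no output, which is treated as 0 below
    count=$(grep -ci '\[ Warning \]' "$log_file" 2>/dev/null || true)
    if [ "${count:-0}" -gt 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Example call against the default log location:
#   rkhunter_warning_flag
```

Saved as a script that calls the function, it could be referenced from a UserParameter line in place of the inline command.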

2. Prepare Rkhunter Template, Triggers and Items

In the Zabbix Server, which you access from the web control interface, you will have to prepare a new template called, let's say, Rkhunter with the necessary Triggers and Items

2.1 Create Rkhunter Items

On the Zabbix Server side, you will have to configure 3 Items for the 3 userparameter keys configured above, like so:

rkhunter.suspected Item configuration

rkhunter-suspected-files

rkhunter.warning Zabbix Item config

rkhunter-warning-found-check-zabbix

rkhunter.rootkits Zabbix Item config

2.2 Create Triggers

You need a total of 3 triggers, as in the shots below:

rkhunter.rootkits Trigger config

rkhunter-rootkits-trigger-zabbix1

rkhunter.suspected Trigger cfg

rkhunter warning Trigger cfg

3. Reload zabbix-agent and test the keys

It is necessary to restart the zabbix agent for the new userparameters to start being sent to the remote zabbix server (through a proxy if you have one configured).

[root@centos ~]# systemctl restart zabbix-agent
…

To push a test value for a key to the server you can use zabbix_sender; for that you need to have the zabbix-sender package installed on the server.

To run a manual test, in case you happen to have problems with a key (which shouldn't be the case), you can send a value to the respective key with the command below:

[root@centos ~ ]# zabbix_sender -vv -c "/etc/zabbix/zabbix_agentd.conf" -k "rkhunter.warning" -o "1"

Check on the Zabbix Server that the sent value is received. For any oddities, as usual, check /var/log/zabbix/zabbix_agentd.log for errors or warnings.
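
Another way to see what a userparameter key returns, without involving the server at all, is the agent's own test mode (zabbix_agentd -t). Below is a minimal sketch; the wrapper name check_agent_key is hypothetical and the default config path is the one used in this article.

```shell
# Hypothetical wrapper around "zabbix_agentd -t <key>".
# Prints the value the agent would return for the key, using the given config.
check_agent_key() {
    key="$1"
    conf="${2:-/etc/zabbix/zabbix_agentd.conf}"
    # bail out early when the agent binary is not installed
    if ! command -v zabbix_agentd >/dev/null 2>&1; then
        echo "zabbix_agentd not found in PATH" >&2
        return 127
    fi
    zabbix_agentd -c "$conf" -t "$key"
}

# Example:
#   check_agent_key rkhunter.warning
#   check_agent_key rkhunter.suspected
```

Keep in mind the test runs as the user who invokes it, so permissions on /var/log/rkhunter/rkhunter.log may behave differently than for the zabbix user.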

Tags: awk, case, check, inside, key, log, possible, script, test, trojans, use, var, Zabbix Server
Posted in Linux, Monitoring, Zabbix | No Comments »

Install and enable Sysstat IO / Disk / CPU / Network monitoring console suite on Redhat 8.3, a few useful sar command examples

Tuesday, September 28th, 2021

Why monitor CPU, Memory, Hard Disk, Network usage etc. with sysstat tools?

Using system monitoring tools such as Zabbix, Nagios or Monit is a good approach, however sometimes due to Zabbix server interruptions you might not be able to track certain aspects of system performance on time. Thus it is always a good idea to gain more insight into system performance from the command line. Of course there are command line tools such as iostat, top, free and vnstat that provide plenty of useful info on system performance issues or bottlenecks. However, from my experience, to have better historical data that is systematized and accessible from the console all the time, it is a great thing to have the sysstat package in place. For many years, on almost every server I administer, I've been using sysstat to monitor what is going on over short time frames and I'm quite happy with it. In my current company we're using Redhat and CentOS, and I had to install sysstat on Redhat 8.3. I've done it multiple times earlier on Debian / Ubuntu Linux, and since on some .deb distributions I faced complications making sysstat collect statistics, I came up with an article on how to fix sysstat "Cannot open /var/log/sysstat/sa no such file or directory" on Debian / Ubuntu Linux.

Sysstat contains the following tools related to collecting I/O and CPU statistics:
iostat
Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
mpstat
Displays more in-depth CPU statistics.
Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:
sadc
Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
sar
Producing reports from the files created by sadc, sar reports can be generated interactively or written to a file for more intensive analysis.

In my experience with CentOS 7 and Fedora, installing sysstat was pretty straightforward: I just had to install it via yum install sysstat, wait for some time and use the sar (System Activity Reporter) tool to report the collected system activity stats over time.
Unfortunately, it seems that on RedHat 8.3 as well as on CentOS 8.XX installing sysstat does not work out of the box.

To complete a successful installation of it on RHEL 8.3, I had to:

[root@server ~]# yum install -y sysstat

To enable sysstat on the system and make it run, I've enabled it in systemd

[root@server ~]# systemctl enable sysstat

Running the sar command right away, I faced the shitty error:

“Cannot open /var/log/sysstat/sa18:
No such file or directory. Please check if data collecting is enabled”

Once installed, I waited for about 5 minutes hoping that sysstat would somehow manage it automatically, but it didn't.

To solve it, I additionally had to create the file /etc/cron.d/sysstat (weirdly, the RPM's post-install scripts do not create it automatically)

[root@server ~]# vim /etc/cron.d/sysstat

# run system activity accounting tool at the start of every hour, taking 59 samples at 60 second intervals
0 * * * * root /usr/lib64/sa/sa1 60 59 &
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A &

/usr/lib64/sa/sa1 is a shell script that we can schedule from cron and which creates the daily binary log file.
/usr/lib64/sa/sa2 is a shell script that writes a daily report in human-readable form from the binary log file.

[root@server ~]# chmod 600 /etc/cron.d/sysstat

[root@server ~]# systemctl restart sysstat

After a while, if sysstat is working correctly, you should see its data history logs produced inside /var/log/sa

[root@server ~]# ls -al /var/log/sa

Note that the standard sysstat history files on Debian and other modern .deb based distros such as Debian 10 (as of 2021) are stored under /var/log/sysstat
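
Since the data directory differs between distributions, a script that works with the history files can probe for it first. A minimal sketch, assuming only the two locations mentioned above; the function name find_sa_dir is hypothetical.

```shell
# Hypothetical helper: print the first directory that already contains
# sysstat data files (sa18, sa27, ...). Returns 1 if none is found.
find_sa_dir() {
    # default candidates: RHEL-like and Debian-like layouts
    [ "$#" -gt 0 ] || set -- /var/log/sa /var/log/sysstat
    for dir in "$@"; do
        for f in "$dir"/sa[0-9]*; do
            # an unmatched glob stays literal, so test for a real file
            if [ -f "$f" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Example:
#   sa_dir=$(find_sa_dir) && ls -al "$sa_dir"
```

If the function returns nothing, data collection is most likely not running yet, which matches the "Cannot open ... Please check if data collecting is enabled" error shown earlier.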

Here are a few useful uses of the sysstat commands

1. Check machine history of SWAP and RAM memory use with sysstat

To check swap activity, let's say the first samples recorded today (use tail -n 10 instead to see the most recent ones):

[hipo@server yum.repos.d] $ sar -W |head -n 10

Linux 4.18.0-240.el8.x86_64 (server) 09/28/2021 _x86_64_ (8 CPU)

12:00:00 AM pswpin/s pswpout/s
12:00:01 AM 0.00 0.00
12:01:01 AM 0.00 0.00
12:02:01 AM 0.00 0.00
12:03:01 AM 0.00 0.00
12:04:01 AM 0.00 0.00
12:05:01 AM 0.00 0.00
12:06:01 AM 0.00 0.00

[root@ccnrlb01 ~]# sar -r | tail -n 10
14:00:01 93008 1788832 95.06 0 1357700 725740 9.02 795168 683484 32
14:10:01 78756 1803084 95.81 0 1358780 725740 9.02 827660 652248 16
14:20:01 92844 1788996 95.07 0 1344332 725740 9.02 813912 651620 28
14:30:01 92408 1789432 95.09 0 1344612 725740 9.02 816392 649544 24
14:40:01 91740 1790100 95.12 0 1344876 725740 9.02 816948 649436 36
14:50:01 91688 1790152 95.13 0 1345144 725740 9.02 817136 649448 36
15:00:02 91544 1790296 95.14 0 1345448 725740 9.02 817472 649448 36
15:10:01 91108 1790732 95.16 0 1345724 725740 9.02 817732 649340 36
15:20:01 90844 1790996 95.17 0 1346000 725740 9.02 818016 649332 28
Average: 93473 1788367 95.03 0 1369583 725074 9.02 800965 671266 29

2. Check system load? Are my processes waiting too long to run on the CPU?

[root@server ~ ]# sar -q |head -n 10
Linux 4.18.0-240.el8.x86_64 (server) 09/28/2021 _x86_64_ (8 CPU)

12:00:00 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
12:00:01 AM 0 272 0.00 0.02 0.00 0
12:01:01 AM 1 271 0.00 0.02 0.00 0
12:02:01 AM 0 268 0.00 0.01 0.00 0
12:03:01 AM 0 268 0.00 0.00 0.00 0
12:04:01 AM 1 271 0.00 0.00 0.00 0
12:05:01 AM 1 271 0.00 0.00 0.00 0
12:06:01 AM 1 265 0.00 0.00 0.00 0
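
For scripting it is often handy to reduce such a report to a single number. The sketch below reads sar -q output from standard input and prints the highest 1-minute load average found; the function name max_ldavg1 is hypothetical, and the column is located by its header name, so the AM/PM field does not matter.

```shell
# Hypothetical helper: print the maximum ldavg-1 value from "sar -q" output.
max_ldavg1() {
    awk '
        # header line: remember which column holds ldavg-1
        { for (i = 1; i <= NF; i++) if ($i == "ldavg-1") { col = i; next } }
        # data lines: numeric value in that column, skipping the Average row
        col && NF >= col && $1 != "Average:" && $col ~ /^[0-9.]+$/ {
            if (!seen || $col + 0 > max) { max = $col + 0; seen = 1 }
        }
        END { if (seen) printf "%.2f\n", max }
    '
}

# Example (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C sar -q | max_ldavg1
```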

3. Show various CPU statistics per CPU use

On a multiprocessor, multi-core server it is sometimes useful for scripting to fetch usage statistics per processor;
this can be attained with:

[hipo@server ~ ] $ mpstat -P ALL
Linux 4.18.0-240.el8.x86_64 (server) 09/28/2021 _x86_64_ (8 CPU)

06:08:38 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
06:08:38 PM all 0.17 0.02 0.25 0.00 0.05 0.02 0.00 0.00 0.00 99.49
06:08:38 PM 0 0.22 0.02 0.28 0.00 0.06 0.03 0.00 0.00 0.00 99.39
06:08:38 PM 1 0.28 0.02 0.36 0.00 0.08 0.02 0.00 0.00 0.00 99.23
06:08:38 PM 2 0.27 0.02 0.31 0.00 0.06 0.01 0.00 0.00 0.00 99.33
06:08:38 PM 3 0.15 0.02 0.22 0.00 0.03 0.01 0.00 0.00 0.00 99.57
06:08:38 PM 4 0.13 0.02 0.20 0.01 0.03 0.01 0.00 0.00 0.00 99.60
06:08:38 PM 5 0.14 0.02 0.27 0.00 0.04 0.06 0.01 0.00 0.00 99.47
06:08:38 PM 6 0.10 0.02 0.17 0.00 0.04 0.02 0.00 0.00 0.00 99.65
06:08:38 PM 7 0.09 0.02 0.15 0.00 0.02 0.01 0.00 0.00 0.00 99.70

sar-sysstat-cpu-statistics-screenshot

Monitor processes and threads currently being managed by the Linux kernel.

[hipo@server ~ ] $ pidstat

[hipo@server ~ ] $ pidstat -d 2

pidstat-show-processes-with-most-io-activities-linux-screenshot

This report tells us that there are a few processes with heavy I/O use: the filesystem journalling daemon jbd2, apache, mysqld and supervise; in the 3rd column you see their respective PIDs.

To show the threads used inside a process (like when you press SHIFT + H inside the Linux top command):

[hipo@server ~ ] $ pidstat -t -p 10765 1 3

Linux 4.19.0-14-amd64 (server) 28.09.2021 _x86_64_ (10 CPU)

21:41:22 UID TGID TID %usr %system %guest %wait %CPU CPU Command
21:41:23 108 10765 - 1,98 0,99 0,00 0,00 2,97 1 mysqld
21:41:23 108 - 10765 0,00 0,00 0,00 0,00 0,00 1 |__mysqld
21:41:23 108 - 10768 0,00 0,00 0,00 0,00 0,00 0 |__mysqld
21:41:23 108 - 10771 0,00 0,00 0,00 0,00 0,00 5 |__mysqld
21:41:23 108 - 10784 0,00 0,00 0,00 0,00 0,00 7 |__mysqld
21:41:23 108 - 10785 0,00 0,00 0,00 0,00 0,00 6 |__mysqld
21:41:23 108 - 10786 0,00 0,00 0,00 0,00 0,00 2 |__mysqld
…

10765 is the Process ID whose threads you would like to list

With pidstat, you can further monitor processes for memory leaks with:

[hipo@server ~ ] $ pidstat -r 2

4. Report paging statistics for some old period

[root@server ~ ]# sar -B -f /var/log/sa/sa27 |head -n 10
Linux 4.18.0-240.el8.x86_64 (server) 09/27/2021 _x86_64_ (8 CPU)

15:42:26 LINUX RESTART (8 CPU)

15:55:30 LINUX RESTART (8 CPU)

04:00:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
04:01:01 PM 0.00 14.47 629.17 0.00 502.53 0.00 0.00 0.00 0.00
04:02:01 PM 0.00 13.07 553.75 0.00 419.98 0.00 0.00 0.00 0.00
04:03:01 PM 0.00 11.67 548.13 0.00 411.80 0.00 0.00 0.00 0.00

5. Monitor Received RX and Transmitted TX network traffic per network interface in real time

To print out received and sent traffic per network interface 4 times in a row

sar-sysstats-network-traffic-statistics-screenshot

[hipo@server ~ ] $ sar -n DEV 1 4

To continuously monitor all network interfaces I/O traffic

[hipo@server ~ ] $ sar -n DEV 1

To monitor only a certain network interface, let's say the received / transmitted traffic of the loopback interface (127.0.0.1)

[hipo@server yum.repos.d] $ sar -n DEV 1 2|grep -i lo
06:29:53 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
06:29:54 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

6. Monitor block devices use

To check block devices use 3 times in a row

[hipo@server yum.repos.d] $ sar -d 1 3

sar-sysstats-blockdevice-statistics-screenshot

7. Output server monitoring data in CSV database structured format

To prepare nice graphs with Excel from a CSV structured file, you can dump the collected data like so:

[root@server yum.repos.d]# sadf -d /var/log/sa/sa27 -- -n DEV | grep -v lo|head -n 10
server-name-fqdn;-1;2021-09-27 13:42:26 UTC;LINUX-RESTART (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;-1;2021-09-27 13:55:30 UTC;LINUX-RESTART (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth1;19.42;16.12;1.94;1.68;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth0;7.18;9.65;0.55;0.78;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth2;5.65;5.13;0.42;0.39;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth1;18.90;15.55;1.89;1.60;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth0;7.15;9.63;0.55;0.74;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth2;5.67;5.15;0.42;0.39;0.00;0.00;0.00;0.00
…

To graph the output data you can use Excel or LibreOffice Calc (its Excel equivalent), or, if you need to dump CSV sar output and generate graphs on the fly from a script, use gnuplot
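
Before importing the CSV anywhere, it can also be summarized directly in the shell. The sketch below averages the rxkB/s column per interface from sadf -d output read on standard input; the function name avg_rx_per_iface is hypothetical, and the columns are located through the "# hostname;..." header line shown above.

```shell
# Hypothetical helper: average rxkB/s per interface from "sadf -d ... -- -n DEV".
avg_rx_per_iface() {
    awk -F';' '
        # header lines start with "#": find the IFACE and rxkB/s columns
        /^#/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "IFACE")  ifc = i
                if ($i == "rxkB/s") rx = i
            }
            next
        }
        # data lines; short LINUX-RESTART records are skipped by the NF check
        ifc && rx && NF >= rx { sum[$ifc] += $rx; n[$ifc]++ }
        END { for (k in sum) printf "%s %.2f\n", k, sum[k] / n[k] }
    ' | sort
}

# Example:
#   sadf -d /var/log/sa/sa27 -- -n DEV | avg_rx_per_iface
```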

What have we learned?

How to install sysstat and enable it via cron on Redhat and CentOS 8 Linux.
How to continuously monitor CPU, disk, network, block devices and paging use, as well as processes and the threads used per process.
As well as how to export previously collected data to CSV, to import into a database or for later use in order to generate a graphic presentation of the data.
Cheers ! 🙂

Tags: access, cmds, com, command, console, cron, data, Debian Ubuntu Linux, eth0, eth1, eth2, How to, Install, installation, iowait, Output, period, Redhat, root, servers, sysstat, systemctl, timestamp, use, usr
Posted in Linux, Monitoring, System Administration, Various | No Comments »

Fix Zabbix selinux caused permission issues on CentOS 7 Linux / cannot set resource limit: [13] Permission denied error solution

Tuesday, July 6th, 2021

If you have to install a Zabbix client that has to communicate with the Zabbix server via a Zabbix Proxy, you might be unpleasantly surprised that it cannot be started if the SELinux mode is set to Enforcing.
An error message like the one on the screenshot below will be displayed when starting the proxy client with systemctl.

zabbix-proxy-cannot-be-started-due-to-selinux-permissions

In the zabbix logs you will see error messages such as:

"cannot set resource limit: [13] Permission denied, CentOS 7"

29085:20160730:062959.263 Starting Zabbix Agent [Test host]. Zabbix 3.0.4 (revision 61185).
29085:20160730:062959.263 **** Enabled features ****
29085:20160730:062959.263 IPv6 support: YES
29085:20160730:062959.263 TLS support: YES
29085:20160730:062959.263 **************************
29085:20160730:062959.263 using configuration file: /etc/zabbix/zabbix_agentd.conf
29085:20160730:062959.263 cannot set resource limit: [13] Permission denied
29085:20160730:062959.263 cannot disable core dump, exiting…

The next step is to check whether zabbix is listed among SELinux's enabled modules; to do so run:

[root@centos ~ ]# semodule -l
…
…..
vhostmd   1.1.0
virt   1.5.0
vlock   1.2.0
vmtools   1.0.0
vmware   2.7.0
vnstatd   1.1.0
vpn   1.16.0
w3c   1.1.0
watchdog   1.8.0
wdmd   1.1.0
webadm   1.2.0
webalizer   1.13.0
wine   1.11.0
wireshark   2.4.0
xen   1.13.0
xguest   1.2.0
xserver   3.9.4
zabbix   1.6.0
zarafa   1.2.0
zebra   1.13.0
zoneminder   1.0.0
zosremote   1.2.0

[root@centos ~ ]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28

To get the exact zabbix SELinux domains that need to be added as permissive you can use ps -eZ like so:

[root@centos ~ ]# ps -eZ |grep -i zabbix
system_u:system_r:zabbix_agent_t:s0 1149 ? 00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1150 ? 00:04:28 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1151 ? 00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1152 ? 00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1153 ? 00:00:00 zabbix_agentd
system_u:system_r:zabbix_agent_t:s0 1154 ? 02:21:46 zabbix_agentd
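
The domain name is the third colon-separated part of the context in the first column. A small sketch to pull out the unique zabbix domains from ps -eZ output follows; the function name zabbix_selinux_domains is hypothetical.

```shell
# Hypothetical helper: list unique SELinux domain types of zabbix processes.
# Expects "ps -eZ" output on standard input.
zabbix_selinux_domains() {
    # column 1 is user:role:type:level; the type (domain) is the 3rd part
    awk '/zabbix/ { split($1, ctx, ":"); print ctx[3] }' | sort -u
}

# Example:
#   ps -eZ | zabbix_selinux_domains
```

On the host above this would print zabbix_agent_t, which is the name to pass to semanage permissive.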

As you can see, the zabbix module is enabled and hence SELinux enforcing mode is preventing the zabbix client / server from operating and communicating normally; to make it work we need to change zabbix agent and zabbix proxy to permissive mode.

Setting selinux for zabbix agent and zabbix proxy to permissive mode

If you don't have them installed, you might need the setroubleshoot, setools, setools-console and policycoreutils-python RPM packages (if you have them installed, skip this step).

[root@centos ~ ]# yum install setroubleshoot.x86_64 setools.x86_64 setools-console.x86_64 policycoreutils-python.x86_64

Then, to make the zabbix domains permissive, run:

[root@centos ~ ]# semanage permissive --add zabbix_t

[root@centos ~ ]# semanage permissive -a zabbix_agent_t

In some cases, if just adding the permissive rule for zabbix_agent_t is not enough, try also:

setsebool -P zabbix_can_network=1

Next try to start the zabbix-proxy and zabbix-agent systemd services

[root@centos ~ ]# systemctl start zabbix-proxy.service
…

[root@centos ~ ]# systemctl start zabbix-agent.service
…

Hopefully everything starts fine; checking the status of the service should show you something like:

[root@centos ~ ]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Agent
Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-06-24 07:47:42 CEST; 1 weeks 5 days ago
Main PID: 1149 (zabbix_agentd)
CGroup: /system.slice/zabbix-agent.service
├─1149 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
├─1150 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
├─1151 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
├─1152 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
├─1153 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
└─1154 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]

Finally, check the logs to make sure all is fine with zabbix being allowed by SELinux.

[root@centos ~ ]# grep zabbix_proxy /var/log/audit/audit.log
…

[root@centos ~ ]# tail -n 100 /var/log/zabbix/zabbix_agentd.log

If there are no errors and you can see the usual zabbix collected CPU / Memory / Disk etc. values, you're good. Enjoy ! 🙂

Tags: active, Fix Zabbix, issues, limit, Mode, sbin, selinux, services, solution, usr
Posted in Linux, Monitoring, System Administration | No Comments »

Add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers

Tuesday, December 8th, 2020

How to add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers?

At my work we needed to set on some servers an elementary check with Zabbix monitoring to check whether the server's time is correctly synchronized with the ntpd time service, as well as to report whether the ntp daemon is correctly running on the machine. For that a userparameter script called userparameter_ntp.conf was developed. The script is simplistic: a few lines of shell based on grepping the required information from ntpq and ntpstat, the common ntp client commands, to get information about the status of time synchronization on the servers.

[root@linuxserver ]# ntpstat
synchronised to NTP server (10.80.200.30) at stratum 3
time correct to within 47 ms
polling server every 1024 s

[root@linuxserver ]# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset jitter
==============================================================================
+timeserver1 10.26.239.41     2 u 319 1024 377   15.864    1.270   0.262
+timeserver2 10.82.239.41     2 u 591 1024 377   16.287   -0.334   1.748
*timeserver3 10.82.239.43     2 u   47 1024 377   15.613   -0.553   0.251
 timeserver4 .INIT.          16 u    - 1024    0    0.000    0.000   0.000

Below is the Zabbix UserParameter script that reports the 3 important values we monitor to make sure time server synchronization works as expected. The zabbix keys we set are ntp.offset, ntp.sync and ntp.exact, in an attempt to describe what we're fetching from the ntp client:

[root@linuxserver ]# cat /etc/zabbix/zabbix-agent.d/userparameter_ntp.conf

UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }')
#UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'FNR==4{print $9}')
UserParameter=ntp.sync,(/usr/bin/ntpstat | cut -f 1 -d " " | tr -d ' \t\n\r\f')
UserParameter=ntp.exact,(/usr/bin/ntpstat | /usr/bin/awk 'FNR==2{print $5,$6}')
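
To see what the ntp.offset key does without an agent at hand, the awk part can be tried on its own. The sketch below wraps it in a function reading ntpq -pn style output from standard input; the name ntp_selected_offset is hypothetical, and the pattern is anchored to the start of the first column, which is slightly stricter than the original.

```shell
# Hypothetical helper: print the offset (9th column) of the peer marked
# with "*" (the currently selected time source), or 1000 if there is none.
ntp_selected_offset() {
    awk 'BEGIN { offset = 1000 } $1 ~ /^\*/ { offset = $9 } END { print offset }'
}

# Example:
#   /usr/sbin/ntpq -pn | ntp_selected_offset
```

The fallback value of 1000 is what makes a trigger on a large offset fire when no peer is selected at all.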

In Zabbix the configured monitored ntpd parameters look like this:

ntp_time_synchronization_check-zabbix-screenshot

Note that in the above userparameter example, the commented-out userparameter is just another way to return the ntpd offset value; it was developed before the more sophisticated variant with the regular expression check on the ntpq output. If you want to extend it, you can also add another userparameter to report more verbose information to Zabbix if required, like the output of the ntpq -c peers command:

UserParameter=ntp.verbose,(/usr/sbin/ntpq -c peers)

Of course, to make Zabbix fetch the necessary data from the monitored hosts, we further need to set up a new Zabbix Template with the respective Triggers and Items.

Below are a few screenshots including the triggers used.

ntpd_server-time_synchronization_check-zabbix-screenshot-triggers

ntpd.trigger

{NTP:net.udp.service[ntp].last(0)}<1

NTP Synchronization trigger

{NTP:ntp.sync.iregexp(unsynchronised)}=1

As you can see, we have set up our items to store the history of the data reported to Zabbix by the userparameter script for 90 days and to update the check every 30 seconds from the monitored hosts to which the Template is applied.

Well that's all folks, time synchronization issues will now promptly trigger a new alarm in Zabbix !

Tags: Add Zabbix, checks, command, How to, net, ntpd, sbin, script, servers, time server
Posted in Linux, Monitoring, System Administration, Zabbix | No Comments »

Check server Internet connectivity Speedtest from Linux terminal CLI

Friday, August 7th, 2020

check-server-console-speedtest

If you are a system administrator of a dedicated server with no access to an Xserver graphical environment (GNOME / KDE etc.), you wonder how you can track the bandwidth of the remote system to the Internet, and you happen to have a modern Linux distribution, here are a few ways to do a speedtest.

1. Use speedtest-cli command line tool to test connectivity

speedtest-cli is a tiny tool written in Python, hence to use it you need to have Python installed on the server.
It is available both for Redhat Linux distros and for Debian / Ubuntu etc. in the list of standard installable packages.

a) Install speedtest-cli on Fedora / CentOS / RHEL

On CentOS / RHEL / Scientific Linux lower than ver 8:

$ sudo yum install python

On CentOS 8 / RHEL 8 type the following commands to install Python 3 or 2:

$ sudo yum install python3
$ sudo yum install python2

On Fedora Linux version 22+

$ sudo dnf install python
$ sudo dnf install python3

Once python is in place, download speedtest.py (in case the link is not reachable, a mirrored version of speedtest.py is available on www.pc-freak.net)

$ wget -O speedtest-cli https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
$ chmod +x speedtest-cli

speedtest-screenshot-linux-terminal-console-cli-cmd

Then it is time to run the script to test the available bandwidth on the server:

$ python speedtest-cli

b) Install speedtest-cli on Debian

On the latest Debian 10 Buster speedtest-cli is available out of the box in the regular .deb repositories, so fetch it with apt

# apt install --yes speedtest-cli
…

You can now give speedtest-cli a try with the --bytes argument to get speed values in bytes instead of bits, or, if you want to generate an image with the test results just like the one you get when you use speedtest.net inside a GUI browser, use the --share option

speedtest-screenshot-linux-terminal-console-cli-cmd-options

2. Getting connectivity results of all defined speedtest test City Locations

Speedtest has a list of servers against which upload and download speed is tested. To run speedtest-cli against each and every server, and get a better picture of what kind of connectivity to expect from your server towards the closest regional capital cities, fetch the speedtest-servers.php list and use a small shell loop. Below is how:

root@pcfreak:~# wget http://www.speedtest.net/speedtest-servers.php
--2020-08-07 16:31:34-- http://www.speedtest.net/speedtest-servers.php
Resolving www.speedtest.net (www.speedtest.net)… 151.101.2.219, 151.101.66.219, 151.101.130.219, …
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:80… connected.
HTTP request sent, awaiting response… 301 Moved Permanently
Location: https://www.speedtest.net/speedtest-servers.php [following]
--2020-08-07 16:31:34-- https://www.speedtest.net/speedtest-servers.php
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:443… connected.
HTTP request sent, awaiting response… 307 Temporary Redirect
Location: https://c.speedtest.net/speedtest-servers-static.php [following]
--2020-08-07 16:31:35-- https://c.speedtest.net/speedtest-servers-static.php
Resolving c.speedtest.net (c.speedtest.net)… 151.101.242.219
Connecting to c.speedtest.net (c.speedtest.net)|151.101.242.219|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 211695 (207K) [text/xml]
Saving to: ‘speedtest-servers.php’
speedtest-servers.php 100%[==========================================================================>] 206,73K --.-KB/s in 0,1s
2020-08-07 16:31:35 (1,75 MB/s) - ‘speedtest-servers.php’ saved [211695/211695]

Once the file is there, with the loop below we extract all the server id="" values defined in the file and test against each:

root@pcfreak:~# for i in $(cat speedtest-servers.php | grep -Eo 'id="[0-9]{4}"' |sed -e 's#id="##' -e 's#"##g'); do speedtest-cli --server $i; done
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by Telecoms Ltd. (Varna) [38.88 km]: 25.947 ms
Testing download speed……………………………………………………………………..
Download: 57.71 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 93.85 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by GMB Computers (Constanta) [94.03 km]: 80.247 ms
Testing download speed……………………………………………………………………..
Download: 35.86 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 80.15 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
…..
…

etc.

For better readability you might want to write the output to a file, or even run it periodically from cron if you have some suspicion that your server's dedicated Internet line sometimes loses connectivity to some locations.
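
The id extraction from the loop can be isolated into a function, which makes it easy to check what will be tested before starting a long run. This is a sketch with a hypothetical function name, extract_server_ids; unlike the loop above it accepts ids of any length rather than only 4-digit ones and drops duplicates.

```shell
# Hypothetical helper: print the unique numeric server ids found in a
# downloaded speedtest server list, one per line, sorted numerically.
extract_server_ids() {
    list_file="${1:-speedtest-servers.php}"
    # match only a standalone id="..." attribute (not e.g. cid="...")
    grep -Eo '(^|[[:space:]])id="[0-9]+"' "$list_file" |
        sed -e 's/.*id="//' -e 's/"$//' |
        sort -un
}

# Example, logging everything to a file:
#   for i in $(extract_server_ids speedtest-servers.php); do
#       speedtest-cli --server "$i"
#   done >> speedtest-results.log 2>&1
```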

3. Testing download speed by fetching some big file from a source location

In the past a classical way to test the bandwidth of your Internet Service Provider was to fetch some big file. Linux guys should remember it was almost a standard to start a download of the Linux kernel source .tar file with some text browser such as elinks / lynx / w3m,

or, if those were not at hand (e.g. when testing connectivity on remote free shell servers), with whatever file downloader was available, such as wget or curl.
A similar method is still possible; for example, to use wget to get an idea about bandwidth, let it fetch the 500 MB file below from speedtest.wdc01.softlayer.com to /dev/null a few times:

$ wget --output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget --output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget --output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

# wget -O /dev/null --progress=dot:mega http://cachefly.cachefly.net/10mb.test ; date
--2020-08-07 13:56:49-- http://cachefly.cachefly.net/10mb.test
Resolving cachefly.cachefly.net (cachefly.cachefly.net)… 205.234.175.175
Connecting to cachefly.cachefly.net (cachefly.cachefly.net)|205.234.175.175|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: ‘/dev/null’
0K …….. …….. …….. …….. …….. …….. 30% 142M 0s
3072K …….. …….. …….. …….. …….. …….. 60% 179M 0s
6144K …….. …….. …….. …….. …….. …….. 90% 204M 0s
9216K …….. …….. 100% 197M=0.06s
2020-08-07 13:56:50 (173 MB/s) - ‘/dev/null’ saved [10485760/10485760]

Fri 07 Aug 2020 01:56:50 PM UTC

To be sure you have a real picture of the remote machine's Internet speed, it is always a good idea to download random big files from several locations that are well known to have very stable bandwidth to the Internet backbone routers.
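
When such downloads are scripted, curl can report the average download speed in bytes per second through its -w '%{speed_download}' write-out variable, which then only needs converting to the Mbit/s figure that speedtest tools print. A minimal sketch of the conversion, with a hypothetical function name:

```shell
# Hypothetical helper: convert bytes per second to megabits per second
# (1 Mbit = 1,000,000 bits), printed with two decimals.
bytes_per_sec_to_mbit() {
    awk -v bps="$1" 'BEGIN { printf "%.2f\n", bps * 8 / 1000000 }'
}

# Example with one of the test files used above:
#   bps=$(curl -s -o /dev/null -w '%{speed_download}' http://cachefly.cachefly.net/10mb.test)
#   bytes_per_sec_to_mbit "$bps"
```

Note that wget reports MB/s (megabytes), so its numbers look about 8 times smaller than the Mbit/s values shown by speedtest-cli for the same line.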

4. Using Simple shell script to test Internet speed

Fetch and use speedtest.sh

wget https://raw.github.com/blackdotsh/curl-speedtest/master/speedtest.sh && chmod u+x speedtest.sh && bash speedtest.sh

5. Using iperf to test connectivity between two servers

iperf is another good tool worth mentioning that can be used to test the speed between a client and a server.

To use iperf, install it with apt and run on the server machine to which bandwidth will be tested:

# iperf -s

On the client machine do:

# iperf -c 192.168.1.1

where 192.168.1.1 is the IP of the server where iperf was spawned to listen.

6. Using Netflix fast to determine Internet connection speed on host

Fast

fast is a service provided by Netflix. Its web interface is located at Fast.com and it has a command-line interface available through npm (npm is a package manager for nodejs) so if you don't have it you will have to install it first with:

# apt install --yes npm

Note that if you run on Debian this will install some 249 new nodejs packages which you might not want to have on the system, so this is useful only for machines that already make use of nodejs. The fast command itself then comes from the fast-cli npm package.

$ fast

82 Mbps ↓

The command returns your Internet download speed. To get your upload speed, use the -u flag:

$ fast -u

⠧ 80 Mbps ↓ / 8.2 Mbps ↑

7. Use speedometer / iftop to measure incoming and outgoing traffic on interface

If you're measuring connectivity on a live production server system, consider that the measurement output might not be exactly correct, especially if you're measuring the Uplink / Downlink on a heavily loaded webserver / Mail Server / Samba or DNS server.
In that case, very useful tools to see the traffic already used on your Incoming and Outgoing ( RX / TX ) network interfaces are speedometer and iftop; they're installable, depending on the OS, via yum / apt or the respective package manager.

To install on Debian server:

# apt install --yes iftop speedometer

The most basic use, showing the live received traffic on an interface as a nice ncurses-like text graph, is:

# speedometer -r eth0

To generate a real-time ASCII-art graph of both RX and TX traffic do:

# speedometer -r eth0 -t eth0

iftop shows the bandwidth used per connection on the interface (-P also displays the ports):

# iftop -P -i eth0
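
For unattended monitoring (for example from a Zabbix UserParameter) similar RX / TX figures can be sampled without an interactive tool. Below is a minimal bash sketch, assuming a Linux host that exposes the kernel byte counters under /sys/class/net; the function names rate_kbit and sample_iface are made up for this illustration:

```shell
# rate_kbit PREV_BYTES CUR_BYTES SECONDS
# Average rate in kbit/s (integer) between two byte-counter samples.
rate_kbit() {
    local prev=$1 cur=$2 secs=$3
    echo $(( (cur - prev) * 8 / 1000 / secs ))
}

# sample_iface IFACE [SECONDS] [rx|tx]
# Reads the kernel byte counter twice and prints the average rate.
sample_iface() {
    local iface=$1 secs=${2:-1} dir=${3:-rx}
    local counter="/sys/class/net/$iface/statistics/${dir}_bytes"
    [ -r "$counter" ] || { echo "cannot read $counter" >&2; return 1; }
    local prev cur
    prev=$(cat "$counter")
    sleep "$secs"
    cur=$(cat "$counter")
    rate_kbit "$prev" "$cur" "$secs"
}

# Example: sample_iface eth0 5 tx
```

Keeping the arithmetic in its own function makes it easy to check the numbers by hand before trusting the sampled values.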

Tags: about, access, bandwidth, curl, download, elinks, howto, iftop, linux?, Lynx, python, script, server, speedometer, speedtest-cli, test connection, traffic, upload, w3m, wget
Posted in Educational, Linux, Monitoring, Networking, System Administration | No Comments »

Report haproxy node switch script useful for Zabbix or other monitoring

Tuesday, June 9th, 2020

For those who administer a corosync-clustered haproxy and need monitoring that reports when the main configured haproxy node in the cluster changes, I've developed a small script to be integrated with a zabbix-agent that reports to a central Zabbix server via a Zabbix proxy.
The script is very simple: it assumes that the first node, DC[1], is the default haproxy node and that DC[2] and DC[3] are two backup nodes. It relies on crm_mon, which is not installed by default on every server, so you'll have to install it first; the script can also easily be adapted to use the pcs command instead.

Below is the bash shell script:

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }'); do ((f++)); DC[$f]="$i"; done; \
DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); \
if [ "$DC" == "${DC[1]}" ]; then echo "1 Default DC Switched to ${DC[1]}"; elif [ "$DC" == "${DC[2]}" ]; then \
echo "2 Default DC Switched to ${DC[2]}"; elif [ "$DC" == "${DC[3]}" ]; then echo "3 Default DC: ${DC[3]}"; fi

To hook it into Zabbix monitoring, define it as a UserParameter of the zabbix-agent.

The way I configured it in Zabbix is as follows:

1. Create the userparameter_active_node.conf

The script below is for a 3-node haproxy cluster:

# cat > /etc/zabbix/zabbix_agentd.d/userparameter_active_node.conf

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }'); do ((f++)); DC[$f]="$i"; done; \
DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); \
if [ "$DC" == "${DC[1]}" ]; then echo "1 Default DC Switched to ${DC[1]}"; elif [ "$DC" == "${DC[2]}" ]; then \
echo "2 Default DC Switched to ${DC[2]}"; elif [ "$DC" == "${DC[3]}" ]; then echo "3 Default DC: ${DC[3]}"; fi

Once pasted, press CTRL + D to save the file.

A slightly improved version of the script for 2 nodes looks like this:

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }' | sed -e 's#:##g'); do DC_ARRAY[$f]="$i"; ((f++)); done; GET_CURR_DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); if [ "$GET_CURR_DC" == "${DC_ARRAY[0]}" ]; then echo "1 Default DC ${DC_ARRAY[0]}"; fi; if [ "$GET_CURR_DC" == "${DC_ARRAY[1]}" ]; then echo "2 Default Current DC Switched to ${DC_ARRAY[1]} Please check "; fi; if [ -z "$GET_CURR_DC" ] || [ -z "${DC_ARRAY[1]}" ]; then printf "Error something might be wrong with HAProxy Cluster on $HOSTNAME "; fi;
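
A one-liner of this size is hard to read, so here is the same idea split into two small bash functions. This is only a sketch: the function names are hypothetical, and it assumes crm_mon prints a line of the form "Current DC: <node> ...", as the grep in the script implies.

```shell
# current_dc: reads `crm_mon -n -1` output on stdin and prints the
# node name that follows "Current DC:" (prints nothing if not found).
current_dc() {
    awk '/Current DC:/ { sub(/.*Current DC:[ \t]*/, ""); print $1; exit }'
}

# dc_status CURRENT DEFAULT
# Prints 1 if the default node is the DC, 2 if another node took over,
# 0 if no DC could be determined.
dc_status() {
    local current=$1 default=$2
    if [ -z "$current" ]; then
        echo 0
    elif [ "$current" = "$default" ]; then
        echo 1
    else
        echo 2
    fi
}

# Example (the node name haproxy1 is a placeholder):
#   dc_status "$(sudo /usr/sbin/crm_mon -n -1 | current_dc)" haproxy1
```

Returning a plain number keeps the Zabbix side simple, since a trigger can fire on any value other than 1.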

The haproxy_active_DC_zabbix.sh script with a few more explanatory comments is available here.

2. Configure access for /usr/sbin/crm_mon for zabbix user in sudoers

# vim /etc/sudoers

zabbix ALL=NOPASSWD: /usr/sbin/crm_mon

3. Configure in Zabbix for active.dc key Trigger and Item

Tags: access, ALL, and, Anyways, are, available, awk, bash shell, bash shell script, Below, bit, case, cat, Central, check, Cluster, cmd, Comments, conf, configure
Posted in Linux, Monitoring, Zabbix | No Comments »

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020

I'm part of a SysAdmin team that is doing some minor Zabbix improvements on a custom corporate Zabbix installation, in an ongoing project to replace the previous HP OpenView monitoring for a bunch of legacy Linux hosts.
As one of the necessary checks concerns system hardware, the task was to invent a simple way to monitor hardware with Zabbix. Monitoring the hardware of bare-metal HP / Dell / Fujitsu etc. servers in Linux is usually done with third-party software provided by the hardware vendor. But this requires additional services to run, which is sometimes not desired, so it was interesting to find some alternative Linux-native ways to do the hardware monitoring.
Statistics from the system hardware components can be obtained directly from the server with ipmi / ipmitool (for more info on it check my previous article Reset and Manage Intelligent Platform Management remote board).
With ipmi, hardware health info can be received straight from the ILO / IDRAC / IPMI board of the server. However, as the Admin-LAN of the server is often in a separate, DMZ-secured network reachable only from a certain set of routed IPs, ipmitool can't always be used.

So what other options are there to implement Linux server hardware monitoring?

There are probably many tools, but I know of three that give you most of the information you need to build a preliminary hardware-damage warning system before the crash. These are:

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks.

Disk monitoring is handled by smartd, a daemon provided by the package, which queries the hard drives periodically looking for warning signs of hardware failure.
The downside of smartd is that it adds a little extra read / write load on the hard drive and, if misconfigured, could reduce the lifetime of the disk.

linux:~# /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37240G
Serial Number: 50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       -       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      -       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
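
For monitoring purposes a single attribute is usually more useful than the full report. Here is a minimal sketch (the helper name smart_raw_value is made up for this example), assuming the ten-column attribute table layout of the output above, where RAW_VALUE starts in column 10:

```shell
# smart_raw_value ATTRIBUTE_NAME
# Reads `smartctl -A` (or -a) output on stdin and prints the first
# word of RAW_VALUE for the first attribute with the given name.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Example:
#   /usr/sbin/smartctl -A /dev/sdb | smart_raw_value Temperature_Celsius
```

Note that some disks report the same attribute name under two IDs (194 and 231 in the output above); this sketch simply takes the first match.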

2. hddtemp

When smartd is in use it is usually handy to also have hddtemp, which reads the same S.M.A.R.T. data from the drives.
The hddtemp program monitors and reports the temperature of PATA, SATA or SCSI hard drives by reading Self-Monitoring Analysis and Reporting Technology (S.M.A.R.T.) information on drives that support this feature.

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

3. lm-sensors / i2c-tools

Lm-sensors is a hardware health monitoring package for Linux. It allows you to access information from temperature, voltage, and fan speed sensors.
i2c-tools was historically bundled in the same package as lm_sensors but has been separated because not all hardware monitoring chips are I²C devices, and not all I²C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command:

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1: +55.0 C (high = +120.0 C, crit = +110.0 C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +28.0 C (high = +78.0 C, crit = +88.0 C)
Core 0: +26.0 C (high = +78.0 C, crit = +88.0 C)
Core 1: +28.0 C (high = +78.0 C, crit = +88.0 C)
Core 2: +28.0 C (high = +78.0 C, crit = +88.0 C)
Core 3: +28.0 C (high = +78.0 C, crit = +88.0 C)
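
If a numeric value is wanted instead of the whole listing (for example as a Zabbix item), the hottest core can be picked out of this output. A small sketch, assuming the "Core N: +TEMP ..." line format shown above; the function name max_core_temp is hypothetical:

```shell
# max_core_temp: reads `sensors` output on stdin and prints the highest
# "Core N" temperature with one decimal (prints nothing if no core line).
max_core_temp() {
    awk -F'[:(]' '
        /^Core [0-9]+:/ {
            t = $2 + 0                      # numeric prefix of "+26.0 C"
            if (!seen || t > max) { max = t; seen = 1 }
        }
        END { if (seen) printf "%.1f\n", max }
    '
}

# Example: sensors | max_core_temp
```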

On CentOS Linux a useful tool is also lm_sensors-sensord.x86_64, a daemon that periodically logs sensor readings to syslog or a round-robin database and warns of sensor alarms.

On Debian Linux there is also psensor-server (an HTTP server providing a JSON web service which a GTK+ application can use to monitor sensors remotely), useful for developers.

If you have an X server installed on the machine and access it with an X client or via VNC, though that is quite rare, you can use xsensors or Psensor, a GTK+ (widget toolkit for creating graphical user interfaces) application.

With these 3 tools it is pretty easy to script one-liners and use the Zabbix UserParameter functionality to send hardware report data to a company's Zabbix server. Zabbix already has some templates for this, but in my case I couldn't import them because I don't have Zabbix Super-Admin credentials, so the workaround is a script that monitors for temperatures considered high or critical.
Here is a tiny sample script I came up with in a minute. It can be used as a one-liner UserParameter and built upon for something more complex.

# Report 'Temperature HIGH' when any CPU core reaches its "high" threshold
# (the "crit" threshold lies above "high", so it is covered as well)
SENSORS_STAT=$(sensors | awk -F'[:=(),]' '/^Core [0-9]+:/ && $3 ~ /high/ && $2+0 >= $4+0 { print $1 }');
if [ -n "$SENSORS_STAT" ]; then
echo 'Temperature HIGH';
else
echo 'Sensors OK';
fi
Of course there is much more sophisticated stuff for monitoring out there.

The above script can easily be adapted for use on other monitoring platforms such as Nagios / Munin / Cacti / Icinga, and there are plenty of paid solutions, but for anyone who wants to develop something from scratch, just like me, I hope this article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

Tags: Adapter, around, Auto Offline Data Collection Disabled, awk, CPU, data, developers, Disk, Extended, firmware version, hard drives, hardware, hardware health, health, information, ISA, linux?, Monitoring, Monitoring Linux, nagios, package, pci, script, sensors, Short, software, system hardware, temperature, zabbix, Zabbix Userparameter
Posted in Linux, Monitoring, System Administration | 1 Comment »

ipmitool: Reset and manage IPMI (Intelligent Platform Management Interface) / ILO (Integrated Lights Out) remote board on Linux servers

Friday, December 20th, 2019

As a system administrator, no matter whether you manage a bunch of servers in a home-brewed and self-run Data Center location with some rack-mounted hardware like PowerEdge M600 / ProLiant DL360e G8 / ProLiant DL360 Gen9 (755258-B21), or you're managing a bunch of Dedicated Servers, at some point you will have to use the IPMI / ILO remote console management board embedded in many rack-mountable servers. If the terms IPMI / ILO are new to you, I suggest you quickly read my earlier article What is IPMI / IPKVM / ILO / DRAC Remote Management interfaces to server.

HP Proliant BL460 C IPMI (ILO) Web management interface

In short, a Remote Management Interface gives you access to the server just as if you had a monitor and a keyboard plugged directly into it.
When a remote computer is down, the sysadmin can access it through IPMI and use a text console to watch the boot screen.
The IPMI specification is led by Intel and was first published on September 16, 1998. It is currently supported by more than 200 computer system vendors, such as Cisco, Dell, Hewlett Packard Enterprise, Intel, NEC Corporation, SuperMicro and Tyan, and is a standard for remote board management of servers.

[Diagram: IPMI block diagram, how IPMI works and its relation to the BMC]
As you can see from the diagram, the Baseboard Management Controller (BMC) is like the heart of IPMI.

ILO / IPMI access is usually provided via a Web interface with a Java console, and many machines also expose an IP address on which a normal SSH command prompt is available. From there you can run diagnostic commands against the ILO to check the status of the server's hardware components and read the attached system sensors, to get reports about things such as:

The System Overall heat
CPU heat temperature
System fan rotation speed cycles
Extract information about the server chassis
Query info about various system peripherals
Configure BIOS or UEFI on a remote system with no monitor / keyboard attached

Having IPMI (Intelligent Platform Management Interface) firmware embedded into the server motherboard is essential for system administration because, besides these goodies, it allows you to remotely install an Operating System on a server without any pre-installed OS right after it is bought and mounted in its planned Data Center rack, just as if you were physically at the remote location with a monitor, keyboard and mouse plugged in.

IPMI is also mega useful for Linux / Windows system updates in which essential system libraries or binaries are replaced and a reboot is required, because after large bundle updates or release upgrades the system often fails to boot and you need a way to run diagnostics from a rescue Operating System living on a plugged-in USB stick or CD drive.
As said, the IPMI remote board is usually accessed via an HTTPS-encrypted web interface or via an SSH session. But sometimes the web server behind the IPMI Web interface hangs, especially when multiple sysadmins try to access it, and at times even console SSH access might not be there. Thankfully, those who run a GNU / Linux Operating System on the hardware node can use the ipmitool tool http://ipmitool.sourceforge.net/, which can do a number of useful things with the IPMI management board: a Cold Reset that brings it back to a working state, adding users, collecting system hardware and component health information, changing the listener address of the IPMI access interface and even updating the IPMI firmware.

Before IPMI can be accessed remotely it has to be enabled, usually by connecting a UTP cable from its port to the network from which you expect it to be accessible. The location of the IPMI port differs between server vendors.

IBM Power 9 Server IPMI port

HP IPMI console called ILO (Integrated Lights-Out) Port cabled with yellow cable (usually labelled as Management Port MGMT)

Supermicro server IPMI Dedicated Lan Port

In this article I'll shortly explain how IPMITool can be installed and used on Debian / Ubuntu and other deb-based GNU / Linux distributions with apt, or on Fedora / CentOS (RPM-based) with yum.

1. Install IPMITool

– On Debian

# apt-get install --yes ipmitool

– On CentOS

# yum install ipmitool OpenIPMI-tools

# ipmitool -V
ipmitool version 1.8.14

On CentOS the ipmi service can be enabled at boot and started with:

[root@linux ~]# chkconfig ipmi on

[root@linux ~]# service ipmi start

Before starting to use it, it is worth quoting the short description from the ipmitool man page:

DESCRIPTION
This program lets you manage Intelligent Platform Management Interface (IPMI) functions of either the local system, via a kernel device driver, or a remote system, using IPMI v1.5 and IPMI v2.0.
These functions include printing FRU information, LAN configuration, sensor readings, and remote chassis power control.

IPMI management of a local system interface requires a compatible IPMI kernel driver to be installed and configured. On Linux this driver is called OpenIPMI and it is included in standard distributions. On Solaris this driver is called BMC and is included in Solaris 10. Management of a remote station requires the IPMI-over-LAN interface to be enabled and configured. Depending on the particular requirements of each system it may be possible to enable the LAN interface using ipmitool over the system interface.

2. Get ADMIN IP configured for access

To print the IP address on which IPMI currently listens, without access to the above Web frontend (provided it is cabled to the Access / Admin LAN port), run:

# ipmitool lan print 1
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD
Auth Type Enable : Callback : MD2 MD5 PASSWORD
: User : MD2 MD5 PASSWORD
: Operator : MD2 MD5 PASSWORD
: Admin : MD2 MD5 PASSWORD
: OEM :
IP Address Source : Static Address
IP Address : 10.253.41.127
Subnet Mask : 255.255.254.0
MAC Address : 0c:c4:7a:4b:1f:70
SNMP Community String : public
IP Header : TTL=0x00 Flags=0x00 Precedence=0x00 TOS=0x00
BMC ARP Control : ARP Responses Enabled, Gratuitous ARP Disabled
Default Gateway IP : 10.253.41.254
Default Gateway MAC : 00:00:0c:07:ac:7b
Backup Gateway IP : 10.253.41.254
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : 8
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 1,2,3,6,7,8,11,12
Cipher Suite Priv Max : aaaaXXaaaXXaaXX
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
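
When only one value from this listing is needed in a script, it can be cut out by its label. A minimal sketch (the helper name bmc_lan_field is made up here), assuming the "Label : value" layout shown above; it splits on the first colon only, so MAC addresses survive intact:

```shell
# bmc_lan_field LABEL
# Reads `ipmitool lan print` output on stdin and prints the value of the
# first line whose label (text before the first colon) equals LABEL.
bmc_lan_field() {
    awk -v key="$1" '{
        i = index($0, ":")
        if (i == 0) next
        k = substr($0, 1, i - 1)
        v = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim label
        gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim value
        if (k == key) { print v; exit }
    }'
}

# Example: ipmitool lan print 1 | bmc_lan_field "IP Address"
```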

3. Configure custom access IP and gateway for IPMI

[root@linux ~]# ipmitool lan set 1 ipsrc static

[root@linux ~]# ipmitool lan set 1 ipaddr 192.168.1.211
Setting LAN IP Address to 192.168.1.211

[root@linux ~]# ipmitool lan set 1 netmask 255.255.255.0
Setting LAN Subnet Mask to 255.255.255.0

[root@linux ~]# ipmitool lan set 1 defgw ipaddr 192.168.1.254
Setting LAN Default Gateway IP to 192.168.1.254

[root@linux ~]# ipmitool lan set 1 defgw macaddr 00:0e:0c:aa:8e:13
Setting LAN Default Gateway MAC to 00:0e:0c:aa:8e:13

[root@linux ~]# ipmitool lan set 1 arp respond on
Enabling BMC-generated ARP responses

[root@linux ~]# ipmitool lan set 1 auth ADMIN MD5

[root@linux ~]# ipmitool lan set 1 access on

4. Getting a list of IPMI existing users

# ipmitool user list 1
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
2   admin1           false   false      true       ADMINISTRATOR
3   ovh_dontchange   true    false      true       ADMINISTRATOR
4   ro_dontchange    true    true       true       USER
6                    true    true       true       NO ACCESS
7                    true    true       true       NO ACCESS
8                    true    true       true       NO ACCESS
9                    true    true       true       NO ACCESS
10                   true    true       true       NO ACCESS

– To get summary of existing users

# ipmitool user summary
Maximum IDs : 10
Enabled User Count : 4
Fixed Name Count : 2

5. Create new Admin username into IPMI board

[root@linux ~]# ipmitool user set name 2 Your-New-Username

[root@linux ~]# ipmitool user set password 2
Password for user 2:
Password for user 2:

[root@linux ~]# ipmitool channel setaccess 1 2 link=on ipmi=on callin=on privilege=4

[root@linux ~]# ipmitool user enable 2
[root@linux ~]#

6. Configure non-privileged user on the IPMI board

If a user should only be used for querying sensor data, a custom privilege level can be set up for that. This user then has no rights for activating or deactivating the server, for example. A user named monitor will be created for this in the following example:

[root@linux ~]# ipmitool user set name 3 monitor

[root@linux ~]# ipmitool user set password 3
Password for user 3:
Password for user 3:

[root@linux ~]# ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=2

[root@linux ~]# ipmitool user enable 3

The meaning of the various privilege numbers is displayed when ipmitool channel is called without any additional parameters.

[root@linux ~]# ipmitool channel
Channel Commands: authcap <channel number> <max privilege>
getaccess <channel number> [user id]
setaccess <channel number> <user id> [callin=on|off] [ipmi=on|off] [link=on|off] [privilege=level]
info [channel number]
getciphers <ipmi | sol> [channel]

Possible privilege levels are:
1 Callback level
2 User level
3 Operator level
4 Administrator level
5 OEM Proprietary level
15 No access
[root@linux ~]#

The user just created (named 'monitor') has been assigned the USER privilege level. To allow LAN access for this user, MD5 authentication must be activated for LAN access at this privilege level (USER), in the same way as it was set for ADMIN in step 3:

[root@linux ~]# ipmitool lan set 1 auth USER MD5

[root@linux ~]# ipmitool channel getaccess 1 3
Maximum User IDs : 15
Enabled User IDs : 2

User ID : 3
User Name : monitor
Fixed Name : No
Access Available : call-in / callback
Link Authentication : enabled
IPMI Messaging : enabled
Privilege Level : USER

[root@linux ~]#

7. Check server firmware version on a server via IPMI

# ipmitool mc info
Device ID : 32
Device Revision : 1
Firmware Revision : 3.31
IPMI Version : 2.0
Manufacturer ID : 10876
Manufacturer Name : Supermicro
Product ID : 1579 (0x062b)
Product Name : Unknown (0x62B)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device

ipmitool mc info is actually an alias for the ipmitool bmc info cmd.

8. Reset IPMI management controller or BMC if hanged

As said earlier, if for some reason Web GUI or SSH access to the IPMI is lost, reset it with:

root@linux:/root# ipmitool mc reset
[ warm | cold ]

If you want to cut the power to the IPMI for a second and bring it back up, use the cold reset (this should usually be done if the warm reset does not work).

root@linux:/root# ipmitool mc reset cold

Otherwise a soft / warm reset is done with:

ipmitool mc reset warm

Sometimes the BMC itself hangs and the only fix to restore access to the server's remote board is a BMC reset. As bmc is an alias of mc in ipmitool, this is the same cold reset as above:

root@linux:/root# ipmitool bmc reset cold

9. Print hardware system event log

root@linux:/root# ipmitool sel info
SEL Information
Version : 1.5 (v1.5, v2 compliant)
Entries : 0
Free Space : 10240 bytes
Percent Used : 0%
Last Add Time : Not Available
Last Del Time : 07/02/2015 17:22:34
Overflow : false
Supported Cmds : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 512
Alloc Unit Size : 20
# Free Units : 512
Largest Free Blk : 512
Max Record Size : 20

ipmitool sel list
SEL has no entries

In this particular case the system shows no entries as it was run on a tiny MikroTik 1U machine; on most Dell PowerEdge / HP Proliant / Lenovo System X machines this will usually return plenty of messages.

To print the entries in extended form, with sensor IDs resolved to sensor names:

ipmitool sel elist

To clear the log, if anything was logged:

ipmitool sel clear

10. Print Field Replaceable Units ( FRUs ) on the server

[root@linux ~]# ipmitool fru print

FRU Device Description : Builtin FRU Device (ID 0)
Chassis Type : Other
Chassis Serial : KD5V59B
Chassis Extra : c3903ebb6237363698cdbae3e991bbed
Board Mfg Date : Mon Sep 24 02:00:00 2012
Board Mfg : IBM
Board Product : System Board
Board Serial : XXXXXXXXXXX
Board Part Number : 00J6528
Board Extra : 00W2671
Board Extra : 1400
Board Extra : 0000
Board Extra : 5000
Board Extra : 10
…
Product Manufacturer : IBM
Product Name : System x3650 M4
Product Part Number : 1955B2G
Product Serial : KD7V59K
Product Asset Tag :

FRU Device Description : Power Supply 1 (ID 1)
Board Mfg Date : Mon Jan 1 01:00:00 1996
Board Mfg : ACBE
Board Product : IBM Designed Device
Board Serial : YK151127R1RN
Board Part Number : ZZZZZZZ
Board Extra : ZZZZZZ<FF><FF><FF><FF><FF>
Board Extra : 0200
Board Extra : 00
Board Extra : 0080
Board Extra : 1

FRU Device Description : Power Supply 2 (ID 2)
Board Mfg Date : Mon Jan 1 01:00:00 1996
Board Mfg : ACBE
Board Product : IBM Designed Device
Board Serial : YK131127M1LE
Board Part Number : ZZZZZ
Board Extra : ZZZZZ<FF><FF><FF><FF><FF>
Board Extra : 0200
Board Extra : 00
Board Extra : 0080
Board Extra : 1

FRU Device Description : DASD Backplane 1 (ID 3)
….

It is worth mentioning that some cheaper server vendors such as Trendmicro might show no data here (no idea whether this is a protocol incompatibility or an IPMItool issue).

11. Get output about system sensors Temperature / Fan / Power Supply

Most newer servers have sensors to track temperature, voltage and fan speed of the CPU, peripherals and the overall system.
To get a full list of sensor statistics from IPMI:

# ipmitool sensor
CPU Temp | 29.000 | degrees C | ok | 0.000 | 0.000 | 0.000 | 95.000 | 98.000 | 100.000
System Temp | 40.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 80.000 | 85.000 | 90.000
Peripheral Temp | 41.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 80.000 | 85.000 | 90.000
PCH Temp | 56.000 | degrees C | ok | -11.000 | -8.000 | -5.000 | 90.000 | 95.000 | 100.000
FAN 1 | na | | na | na | na | na | na | na | na
FAN 2 | na | | na | na | na | na | na | na | na
FAN 3 | na | | na | na | na | na | na | na | na
FAN 4 | na | | na | na | na | na | na | na | na
FAN A | na | | na | na | na | na | na | na | na
Vcore | 0.824 | Volts | ok | 0.480 | 0.512 | 0.544 | 1.488 | 1.520 | 1.552
3.3VCC | 3.296 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
12V | 12.137 | Volts | ok | 10.494 | 10.600 | 10.706 | 13.091 | 13.197 | 13.303
VDIMM | 1.496 | Volts | ok | 1.152 | 1.216 | 1.280 | 1.760 | 1.776 | 1.792
5VCC | 4.992 | Volts | ok | 4.096 | 4.320 | 4.576 | 5.344 | 5.600 | 5.632
CPU VTT | 1.008 | Volts | ok | 0.872 | 0.896 | 0.920 | 1.344 | 1.368 | 1.392
VBAT | 3.200 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
VSB | 3.328 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
AVCC | 3.312 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
Chassis Intru | 0x1 | discrete | 0x0100| na | na | na | na | na | na
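
For alerting, the interesting rows are those whose status column (the fourth one) is not ok. A minimal sketch with a hypothetical helper name, failing_sensors. It assumes that na (no reading) and discrete 0x... states should be ignored, which may not suit every board:

```shell
# failing_sensors: reads `ipmitool sensor` output on stdin and prints
# "NAME: STATUS" for every threshold sensor whose status is not "ok".
failing_sensors() {
    awk -F'|' 'NF >= 4 {
        name = $1; status = $4
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", status)
        # Assumption: "na" and discrete hex states are not failures.
        if (status != "ok" && status != "na" && status !~ /^0x/)
            print name ": " status
    }'
}

# Example: ipmitool sensor | failing_sensors
```

An empty result means nothing needs attention, so a wrapper can simply test whether the output is empty.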

To get a shorter view of the sensor entries and readings from the SDR (Sensor Data Repository):

[root@linux ~]# ipmitool sdr list

Planar 3.3V | 3.31 Volts | ok
Planar 5V | 5.06 Volts | ok
Planar 12V | 12.26 Volts | ok
Planar VBAT | 3.14 Volts | ok
Avg Power | 80 Watts | ok
PCH Temp | 45 degrees C | ok
Ambient Temp | 19 degrees C | ok
PCI Riser 1 Temp | 25 degrees C | ok
PCI Riser 2 Temp | no reading | ns
Mezz Card Temp | no reading | ns
Fan 1A Tach | 3071 RPM | ok
Fan 1B Tach | 2592 RPM | ok
Fan 2A Tach | 3145 RPM | ok
Fan 2B Tach | 2624 RPM | ok
Fan 3A Tach | 3108 RPM | ok
Fan 3B Tach | 2592 RPM | ok
Fan 4A Tach | no reading | ns
Fan 4B Tach | no reading | ns
CPU1 VR Temp | 27 degrees C | ok
CPU2 VR Temp | 27 degrees C | ok
DIMM AB VR Temp | 24 degrees C | ok
DIMM CD VR Temp | 23 degrees C | ok
DIMM EF VR Temp | 25 degrees C | ok
DIMM GH VR Temp | 24 degrees C | ok
Host Power | 0x00 | ok
IPMI Watchdog | 0x00 | ok
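
A single reading from this list is easy to turn into a numeric monitoring item. The sketch below uses a hypothetical helper, sdr_reading, and assumes the three-column "name | reading | status" layout shown above; sensors with no numeric reading produce no output:

```shell
# sdr_reading SENSOR_NAME
# Reads `ipmitool sdr list` output on stdin and prints the numeric part
# of the reading for the sensor with exactly that name.
sdr_reading() {
    awk -F'|' -v want="$1" '{
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name == want) {
            split($2, parts, " ")           # "45 degrees C" -> parts[1] = 45
            if (parts[1] ~ /^-?[0-9.]+$/) print parts[1]
            exit
        }
    }'
}

# Example: ipmitool sdr list | sdr_reading "Ambient Temp"
```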

[root@linux ~]# ipmitool sdr type Temperature
PCH Temp | 31h | ok | 45.1 | 45 degrees C
Ambient Temp | 32h | ok | 12.1 | 19 degrees C
PCI Riser 1 Temp | 3Ah | ok | 16.1 | 25 degrees C
PCI Riser 2 Temp | 3Bh | ns | 16.2 | No Reading
Mezz Card Temp | 3Ch | ns | 44.1 | No Reading
CPU1 VR Temp | F7h | ok | 20.1 | 27 degrees C
CPU2 VR Temp | F8h | ok | 20.2 | 27 degrees C
DIMM AB VR Temp | F9h | ok | 20.3 | 25 degrees C
DIMM CD VR Temp | FAh | ok | 20.4 | 23 degrees C
DIMM EF VR Temp | FBh | ok | 20.5 | 26 degrees C
DIMM GH VR Temp | FCh | ok | 20.6 | 24 degrees C
Ambient Status | 8Eh | ok | 12.1 |
CPU 1 OverTemp | A0h | ok | 3.1 | Transition to OK
CPU 2 OverTemp | A1h | ok | 3.2 | Transition to OK

[root@linux ~]# ipmitool sdr type Fan
Fan 1A Tach | 40h | ok | 29.1 | 3034 RPM
Fan 1B Tach | 41h | ok | 29.1 | 2592 RPM
Fan 2A Tach | 42h | ok | 29.2 | 3145 RPM
Fan 2B Tach | 43h | ok | 29.2 | 2624 RPM
Fan 3A Tach | 44h | ok | 29.3 | 3108 RPM
Fan 3B Tach | 45h | ok | 29.3 | 2592 RPM
Fan 4A Tach | 46h | ns | 29.4 | No Reading
Fan 4B Tach | 47h | ns | 29.4 | No Reading
PS 1 Fan Fault | 73h | ok | 10.1 | Transition to OK
PS 2 Fan Fault | 74h | ok | 10.2 | Transition to OK

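A sensor type that contains a space has to be quoted with plain ASCII quotes (ipmitool sdr type 'Power Supply'). With typographic quotes, as in the run below, ipmitool does not recognize the type and prints the list of valid sensor types instead:

[root@linux ~]# ipmitool sdr type ‘Power Supply’
Sensor Type "‘Power" not found.
Sensor Types: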
Temperature (0x01) Voltage (0x02)
Current (0x03) Fan (0x04)
Physical Security (0x05) Platform Security (0x06)
Processor (0x07) Power Supply (0x08)
Power Unit (0x09) Cooling Device (0x0a)
Other (0x0b) Memory (0x0c)
Drive Slot / Bay (0x0d) POST Memory Resize (0x0e)
System Firmwares (0x0f) Event Logging Disabled (0x10)
Watchdog1 (0x11) System Event (0x12)
Critical Interrupt (0x13) Button (0x14)
Module / Board (0x15) Microcontroller (0x16)
Add-in Card (0x17) Chassis (0x18)
Chip Set (0x19) Other FRU (0x1a)
Cable / Interconnect (0x1b) Terminator (0x1c)
System Boot Initiated (0x1d) Boot Error (0x1e)
OS Boot (0x1f) OS Critical Stop (0x20)
Slot / Connector (0x21) System ACPI Power State (0x22)
Watchdog2 (0x23) Platform Alert (0x24)
Entity Presence (0x25) Monitor ASIC (0x26)
LAN (0x27) Management Subsys Health (0x28)
Battery (0x29) Session Audit (0x2a)
Version Change (0x2b) FRU State (0x2c)
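
The pipe-separated sensor output is easy to post-process in a script. Below is a minimal sketch, assuming the three-column layout of plain ipmitool sdr shown earlier (name | reading | status); the function name sdr_not_ok is made up for illustration. It prints every sensor whose status is neither ok nor ns (no reading):

```shell
# Print sensors from `ipmitool sdr` output whose status is not "ok" or "ns".
# Reads the three-column "name | reading | status" format from stdin.
sdr_not_ok() {
    awk -F'|' '
        NF >= 3 {
            name = $1
            status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status != "ok" && status != "ns")
                print name ": " status
        }'
}

# Usage (on a machine with a BMC): ipmitool sdr | sdr_not_ok
```

The five-column output of ipmitool sdr type ... keeps the status in the third column as well, but the reading moves to the last column, so check the layout before reusing the sketch there.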

12. Using System Chassis to initiate power on / off / reset / soft shutdown

!!!!! Beware: run these only if you really know what you're doing. Don't just paste them into a production system; if you do, it is your responsibility !!!!!

– do a soft-shutdown via acpi

ipmitool [chassis] power soft

– issue a hard power off, wait 1s, power on

ipmitool [chassis] power cycle

– run a hard power off

ipmitool [chassis] power off

– do a hard power on

ipmitool [chassis] power on

– issue a hard reset

ipmitool [chassis] power reset

– Get system power status

ipmitool chassis power status
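
Since the power commands are dangerous, it may help to wrap them in a small guard. The sketch below is only an illustration: the function name ipmi_power, the "yes" confirmation argument and the IPMI_RUN variable are all assumptions of mine, not part of ipmitool. By default it only prints the command it would run:

```shell
# Hypothetical wrapper: "status" is always allowed, every state-changing
# action needs an explicit "yes" as second argument. By default the command
# is only printed; set IPMI_RUN=1 to really execute it.
ipmi_power() {
    action=$1
    confirm=${2:-no}
    case "$action" in
        status) ;;
        soft|cycle|off|on|reset)
            if [ "$confirm" != "yes" ]; then
                echo "refusing '$action' without explicit 'yes'" >&2
                return 1
            fi ;;
        *)
            echo "unknown action: $action" >&2
            return 2 ;;
    esac
    if [ "${IPMI_RUN:-0}" = "1" ]; then
        ipmitool chassis power "$action"
    else
        echo "ipmitool chassis power $action"
    fi
}

# Usage:
#   ipmi_power status
#   ipmi_power reset yes
```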

13. Use IPMI (SoL) Serial over Lan to execute commands remotely

Besides being used locally on a server whose IPMI / ILO / DRAC console is disabled, ipmitool can also query and control a server remotely over the network.

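No kernel module is needed for remote access: lanplus is ipmitool's own IPMI v2.0 (RMCP+) network interface, selected with -I lanplus. (The ipmi_devintf and ipmi_si kernel modules matter only for local access through /dev/ipmi0.)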

ipmitool -I lanplus -H 192.168.99.1 -U user -P pass chassis power status

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power status

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power reset

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass sol activate

– Deactivate an active SoL session

ipmitool -I lanplus -H 192.168.99.1 -U user -P pass sol deactivate
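
Passing the password with -P leaves it visible in the process list and shell history. ipmitool also accepts -E, which reads the password from the IPMI_PASSWORD environment variable. A minimal sketch for querying several BMCs this way; the function name ipmi_status_all and the IPMI_RUN switch are hypothetical, and the hosts and user are the placeholders from the examples above. By default the commands are only printed:

```shell
# Build (and optionally run) a power-status query for several BMCs.
# -E makes ipmitool read the password from $IPMI_PASSWORD instead of -P.
ipmi_status_all() {
    user=$1
    shift
    for host in "$@"; do
        if [ "${IPMI_RUN:-0}" = "1" ]; then
            ipmitool -I lanplus -H "$host" -U "$user" -E chassis power status
        else
            echo "ipmitool -I lanplus -H $host -U $user -E chassis power status"
        fi
    done
}

# Usage:
#   export IPMI_PASSWORD='pass'
#   export IPMI_RUN=1
#   ipmi_status_all user 192.168.99.1 192.168.98.1
```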

14. Modify boot device order on next boot

!!!!! Do not run this unless you really want to modify the boot device order; careless copy-pasting could leave your server unbootable on next boot !!!!!

– Enter BIOS setup on next boot

ipmitool chassis bootdev bios

– Set first boot device to be CD Drive

ipmitool chassis bootdev cdrom

– Set first boot device to be via Network Boot PXE protocol

ipmitool chassis bootdev pxe

15. Using ipmitool shell

root@iqtestfb:~# ipmitool shell
ipmitool> help
Commands:
raw Send a RAW IPMI request and print response
i2c Send an I2C Master Write-Read command and print response
spd Print SPD info from remote I2C device
lan Configure LAN Channels
chassis Get chassis status and set power state
power Shortcut to chassis power commands
event Send pre-defined events to MC
mc Management Controller status and global enables
sdr Print Sensor Data Repository entries and readings
sensor Print detailed sensor information
fru Print built-in FRU and scan SDR for FRU locators
gendev Read/Write Device associated with Generic Device locators sdr
sel Print System Event Log (SEL)
pef Configure Platform Event Filtering (PEF)
sol Configure and connect IPMIv2.0 Serial-over-LAN
tsol Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
isol Configure IPMIv1.5 Serial-over-LAN
user Configure Management Controller users
channel Configure Management Controller channels
session Print session information
dcmi Data Center Management Interface
sunoem OEM Commands for Sun servers
kontronoem OEM Commands for Kontron devices
picmg Run a PICMG/ATCA extended cmd
fwum Update IPMC using Kontron OEM Firmware Update Manager
firewall Configure Firmware Firewall
delloem OEM Commands for Dell systems
shell Launch interactive IPMI shell
exec Run list of commands from file
set Set runtime variable for shell and exec
hpm Update HPM components using PICMG HPM.1 file
ekanalyzer run FRU-Ekeying analyzer using FRU files
ime Update Intel Manageability Engine Firmware
ipmitool>

16. Changing BMC / DRAC time setting

# ipmitool -H XXX.XXX.XXX.XXX -U root -P pass sel time set "01/21/2011 16:20:44"

17. Loading script of IPMI commands

# ipmitool exec /path-to-script/script-with-instructions.txt
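
The file given to exec holds one ipmitool sub-command per line, written without the leading "ipmitool". A minimal sketch that creates such a file; the function name write_ipmi_script, the file path and the choice of commands are placeholders for illustration:

```shell
# Write a small command file for `ipmitool exec`: one sub-command per line,
# without the leading "ipmitool". The target path is given as argument.
write_ipmi_script() {
    cat > "$1" <<'EOF'
chassis power status
sdr type Temperature
sel list
EOF
}

# Usage:
#   write_ipmi_script /tmp/ipmi-report.txt
#   ipmitool exec /tmp/ipmi-report.txt
```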

Closure

As you saw, ipmitool can do plenty of cool things both locally and remotely on a server that has an IPMI interface available. The tool is especially useful when the ILO console hangs, as it can be used to reset it.
I briefly explained what the Intelligent Platform Management Interface is and how it can be accessed and used on Linux via ipmitool. I went through some of its basic uses: how to print the configured ILO access IP, how this admin IP and network configuration can be changed, how to print the existing IPMI users and how to add new admin and non-privileged users.
Then I showed how system hardware and firmware information can be displayed, how the IPMI management controller (BMC) can be reset if it hangs, how the hardware system event log can be printed (useful in case of hardware failures), and how to print reports on the current system fan, power supply and temperature sensors. Finally I explained how the chassis commands can be used for soft and cold server reboots, locally or remotely over the LAN, and how the boot order of the system can be modified.

ipmitool is a great tool for automating sysadmin tasks with shell scripts, such as tracking servers for failing hardware and auto-rebooting inaccessible servers to guarantee a higher level of availability.
Hope you enjoyed the article. It would be interesting to hear of any other known ipmitool scripts or uses; if you know of any, please share.


How to check who is flooding your Apache, NGinx Webserver – Real time Monitor statistics about IPs doing most URL requests and Stopping DoS attacks with Fail2Ban

Wednesday, August 20th, 2014


If you're a Linux system administrator in a web hosting company providing WordPress / Joomla / Drupal site hosting, and your UNIX servers suffer periodic denial of service attacks because a customer's business is targeted by a competitor trying to ruin it through DoS or DDoS attacks, then the best thing you can do is identify who is hammering the Linux server and how. If you find out that the DoS is not on the network level, but Apache keeps crashing because of memory leaks and there are so many connections to Apache that the CPU is overloaded, the next step is to check which IP addresses are causing the excessive GET / POST / HEAD requests in the logs.

There is the Apachetop tool that can give you the most accessed webserver URLs in a refreshed screen like UNIX top command, however Apachetop does not show which IP does most URL hits on Apache / Nginx webserver.

1. Get basic information on which IPs access Apache / Nginx the most using shell commands

Before examining the Webserver logs it is useful to get a general picture on who is flooding you on a TCP / IP network level, with netstat like so:

# here is howto check clients count connected to your server
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

If you get a very large number of connected IPs / hosts (10000 or more), whether the server can handle it depends on its hardware and on the capacity planned for the system. If, as in most cases, the server was planned to serve a few hundred or a few thousand clients and you see over 10000 hanging connections, then either your server is under attack or, if it is an Internet-facing server, your website suddenly became famous, for example because someone posted an article about it on a major website and you received tons of hits.
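
The netstat one-liner above also counts the two header lines and gets confused by IPv6 addresses. Here is a minimal sketch that skips the headers, looks only at IPv4 tcp / udp lines and prints only the addresses at or above a threshold; the function name conn_per_ip and the default limit of 100 are my own assumptions:

```shell
# Count connections per remote IPv4 address from `netstat -ntu` output on stdin
# and print only the addresses at or above a threshold (default 100).
conn_per_ip() {
    limit=${1:-100}
    awk -v limit="$limit" '
        $1 == "tcp" || $1 == "udp" {   # skips header lines and tcp6 / udp6
            split($5, a, ":")          # foreign address is "ip:port"
            count[a[1]]++
        }
        END {
            for (ip in count)
                if (count[ip] >= limit)
                    print count[ip], ip
        }' | sort -rn
}

# Usage: netstat -ntu | conn_per_ip 200
```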

There is a way using standard shell tools, to get some basic information on which IP accesses the webserver the most with:

tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr

Or if you want to keep it refreshing periodically every few seconds run it through watch command:

watch "tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr"


Another useful combination of shell commands is to Monitor POST / GET / HEAD requests number in access.log :

awk '{print $6}' access.log | sort | uniq -c | sort -n

1 "alihack<%eval
 1 "CONNECT
 1 "fhxeaxb0xeex97x0fxe2-x19Fx87xd1xa0x9axf5x^xd0x125x0fx88x19"x84xc1xb3^v2xe9xpx98`X'dxcd.7ix8fx8fxd6_xcdx834x0c"
 1 "x16x03x01"
 1 "xe2
 2 "mgmanager&file=imgmanager&version=1576&cid=20
 6 "4–"
 7 "PUT
 22 "–"
 22 "OPTIONS
 38 "PROPFIND
 1476 "HEAD
 1539 "-"
65113 "POST
537122 "GET

However, combining shell commands means plenty of typing and is hard to remember; besides, the commands above do not show approximately how frequently an IP hits the webserver.
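
If installing an extra tool is not an option, the request frequency can still be approximated with awk by grouping hits per IP and per minute. A minimal sketch, assuming the usual common / combined log format where the first field is the client IP and the fourth is the timestamp; the function name hits_per_minute is hypothetical:

```shell
# Requests per client IP per minute, from an access log in the common/combined
# format on stdin. Prints "count ip minute", busiest first.
hits_per_minute() {
    awk '{
        minute = substr($4, 2, 17)   # "[20/Aug/2014:10:40:11" -> "20/Aug/2014:10:40"
        hits[$1 " " minute]++
    }
    END {
        for (k in hits) print hits[k], k
    }' | sort -rn
}

# Usage: tail -n 5000 /var/log/apache2/access.log | hits_per_minute | head
```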

2. Real-time monitoring of IP addresses with the highest URL requests with logtop

Real-time monitoring on IP addresses with highest URL requests is possible with no need of "console ninja skills" through – logtop.

2.1 Install logtop on Debian / Ubuntu and deb derivatives Linux

a) Installing Logtop the debian way

LogTop is easily installable on Debian and Ubuntu: in newer releases (Debian 7.0 and Ubuntu 13 / 14) it is part of the default package repositories and can be installed straight away with apt-get:

apt-get install --yes logtop

b) Installing Logtop from source code (install on older deb based Linuxes)

On older Debian 6 and Ubuntu 7-12 servers, logtop has to be compiled from source code. Read the README installation instructions or, if lazy, copy / paste the commands below:

cd /usr/local/src
wget https://github.com/JulienPalard/logtop/tarball/master
mv master JulienPalard-logtop.tar.gz
tar -zxf JulienPalard-logtop.tar.gz
cd JulienPalard-logtop-*/
aptitude install libncurses5-dev uthash-dev
…
aptitude install python-dev swig
…

make python-module
…

python setup.py install
…

make
…

make install

mkdir -p /usr/bin/
cp logtop /usr/bin/

2.2 Install Logtop on CentOS 6.5 / 7.0 / Fedora / RHEL and rest of RPM based Linux-es

a) Install logtop on CentOS 6.5 and CentOS 7 Linux

– For CentOS 6.5 you need to rpm install epel-release-6-8.noarch.rpm

wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
links http://dl.fedoraproject.org/pub/epel/6/SRPMS/uthash-1.9.9-6.el6.src.rpm
rpmbuild --rebuild uthash-1.9.9-6.el6.src.rpm
cd /root/rpmbuild/RPMS/noarch
rpm -ivh uthash-devel-1.9.9-6.el6.noarch.rpm

– For CentOS 7 you need to rpm install epel-release-7-0.2.noarch.rpm

links http://download.fedoraproject.org/pub/epel/beta/7/x86_64/repoview/epel-release.html

Click on and download epel-release-7-0.2.noarch.rpm

rpm -ivh epel-release-7-0.2.noarch.rpm
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
yum -y install git ncurses-devel uthash-devel
git clone https://github.com/JulienPalard/logtop.git
cd logtop
make
make install

2.3 Some Logtop use examples and short explanation

logtop shows 4 columns as follows – Line number, Count, Frequency, and Actual line

The quickest way to visualize which IP is stoning your Apache / Nginx webserver on Debian?

tail -f access.log | awk '{print $1; fflush();}' | logtop


On CentOS / RHEL

tail -f /var/log/httpd/access_log | awk '{print $1; fflush();}' | logtop

Using LogTop even Squid Proxy caching server access.log can be monitored.
To get squid Top users by IP listed:

tail -f /var/log/squid/access.log | awk '{print $1; fflush();}' | logtop


Or you might visualize in real-time squid cache top requested URLs

tail -f /var/log/squid/access.log | awk '{print $7; fflush();}' | logtop


3. Automatically Filter IP addresses causing Apache / Nginx Webservices Denial of Service with fail2ban

Once you identify the problem, if the sites hosted on the server are the target of a distributed DoS, probably the best thing to do is use fail2ban to automatically filter (ban) IP addresses doing excessive queries to system services. If fail2ban is not installed yet, on Debian / Ubuntu Linux install it with:

apt-get install --yes fail2ban

To make fail2ban start filtering DoS attack IP addresses, you will have to set the following configurations:

vim /etc/fail2ban/jail.conf

Paste in file:

[http-get-dos]

enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/apache2/WEB_SERVER-access.log
# maxretry is how many GETs we can have in the findtime period before getting narky
maxretry = 300
# findtime is the time period in seconds in which we're counting "retries" (300 seconds = 5 mins)
findtime = 300
# bantime is how long we should drop incoming GET requests for a given IP for, in this case it's 5 minutes
bantime = 300
action = iptables[name=HTTP, port=http, protocol=tcp]

Before you paste, make sure logpath points to the proper location of the webserver log (the default one is /var/log/apache2/access.log). If you use a separate log for each hosted website, you will probably want to write a script that loops through the logs directory, collects the log file names and adds an adjusted copy of the above [http-get-dos] configuration for each. Also tune maxretry per IP, findtime and bantime: the values in the above example are a bit low, and for heavily loaded websites serving thousands of simultaneous connections that originate from office networks behind Network Address Translation (NAT) they should be raised, to prevent situations where even your own customers can't access their websites 🙂
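
A script of the kind described could look roughly like the sketch below. It assumes that every per-site log in the directory ends in access.log; the function name gen_dos_jails and the http-get-dos-<site> jail naming are my own invention. It only prints the stanzas, so the result can be reviewed before it goes into the fail2ban configuration:

```shell
# Print one [http-get-dos-<site>] jail per *access.log found in a directory.
# The jail naming scheme and the directory layout are assumptions.
gen_dos_jails() {
    logdir=$1
    for log in "$logdir"/*access.log; do
        [ -e "$log" ] || continue      # no match: the glob stays literal
        site=$(basename "$log" | sed 's/[-_.]*access\.log$//')
        [ -n "$site" ] || site=default
        cat <<EOF
[http-get-dos-$site]
enabled = true
port = http,https
filter = http-get-dos
logpath = $log
maxretry = 300
findtime = 300
bantime = 300
action = iptables[name=HTTP-$site, port=http, protocol=tcp]

EOF
    done
}

# Usage: gen_dos_jails /var/log/apache2 > /tmp/http-get-dos-jails.conf
```

Very long site names may run into the iptables chain name length limit, so shorten the name= value if fail2ban complains when starting the jail.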

To finalize the fail2ban configuration, you have to create the fail2ban filter file:

vim /etc/fail2ban/filter.d/http-get-dos.conf

Paste:

# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]

# Option: failregex
# Note: This regex will match any GET or POST entry in your logs, so basically all valid and not valid entries are a match.
# You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.

failregex = ^<HOST> -.*"(GET|POST).*

# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =
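
Before restarting, the failregex above can be sanity-checked against a few log lines. The sketch below is only a rough stand-in: it replaces <HOST> with a simple IPv4 pattern, while fail2ban's real <HOST> expansion is broader (hostnames, IPv6). The function name matches_get_dos is hypothetical:

```shell
# Rough stand-in for the filter: <HOST> replaced by a simple IPv4 pattern.
# Reads log lines on stdin and prints the ones the filter would count.
matches_get_dos() {
    grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3} -.*"(GET|POST).*'
}

# Usage: matches_get_dos < /var/log/apache2/access.log | wc -l
# The tool shipped with fail2ban does the real check:
#   fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/http-get-dos.conf
```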

To make fail2ban load the newly created configs, restart it:

/etc/init.d/fail2ban restart

If you want to test whether it is working, you can use webserver benchmark tools such as ab (Apache Benchmark) or siege.
The quickest way to test whether excessive requests from one IP get filtered, and to get your own IP temporarily banned:

ab -n 1000 -c 20 http://your-web-site-dot-com/

This will make 1000 page loads over 20 concurrent connections and will get your IP temporarily banned for 300 seconds (5 minutes). The ban will be logged in /var/log/fail2ban.log, where you will get something like:

2014-08-20 10:40:11,943 fail2ban.actions: WARNING [http-get-dos] Ban 192.168.100.5
2014-08-20 10:44:12,341 fail2ban.actions: WARNING [http-get-dos] Unban 192.168.100.5


Howto Fix “sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux

Monday, February 15th, 2016

I really love sysstat and, as a console maniac, I tend to install it on every server. However, once installed it needs a little tuning to make it work. For those unfamiliar with sysstat I warmly recommend checking it out; here in short is the package description:

server:~# apt-cache show sysstat|grep -i desc -A 15
Description: system performance tools for Linux
The sysstat package contains the following system performance tools:
– sar: collects and reports system activity information;
– iostat: reports CPU utilization and disk I/O statistics;
– mpstat: reports global and per-processor statistics;
– pidstat: reports statistics for Linux tasks (processes);
– sadf: displays data collected by sar in various formats;
– nfsiostat: reports I/O statistics for network filesystems;
– cifsiostat: reports I/O statistics for CIFS filesystems.
.
The statistics reported by sar deal with I/O transfer rates,
paging activity, process-related activities, interrupts,
network activity, memory and swap space utilization, CPU
utilization, kernel activities and TTY statistics, among
others. Both UP and SMP machines are fully supported.
Homepage: http://pagesperso-orange.fr/sebastien.godard/

If you install sysstat on a Debian / Ubuntu server with:

server:~# apt-get install --yes sysstat

and then try to get some statistics with the sar command, you get this ugly error:

server:~# sar
Cannot open /var/log/sysstat/sa20: No such file or directory

To fix it, so that the server periodically logs the nice sar CPU stats (%idle, %iowait, %system, %nice, %user) into its data files, you need to enable data collection. Edit:

server:~# vim /etc/default/sysstat

By default you will find that sysstat data collection is disabled, e.g.:

ENABLED="false"

Switch the value to "true"

ENABLED="true"

Then restart sysstat init script with:

server:~# /etc/init.d/sysstat restart
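
On many servers the same change is easier to script than to make by hand. A minimal sketch, assuming the file contains the exact line ENABLED="false" as shown above; the function name enable_sysstat is hypothetical, and it takes the file path as an argument so it can be tried on a copy first:

```shell
# Flip ENABLED="false" to ENABLED="true" in a sysstat defaults file.
enable_sysstat() {
    conf=$1
    tmp=$(mktemp) || return 1
    sed 's/^ENABLED="false"/ENABLED="true"/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"       # overwrite in place, keeps owner and mode
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage:
#   enable_sysstat /etc/default/sysstat
#   /etc/init.d/sysstat restart
```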

However, for those who prefer menu-driven ncurses interfaces and are not familiar with Vi Improved, the easiest way is to run dpkg-reconfigure on the sysstat package:

server:~# dpkg-reconfigure sysstat

root@server:/# sar
Linux 2.6.32-5-amd64 (pcfreak) 15.02.2016 _x86_64_ (2 CPU)

0,00,01 CPU %user %nice %system %iowait %steal %idle
0,15,01 all 24,32 0,54 3,10 0,62 0,00 71,42
1,15,01 all 18,69 0,53 2,10 0,48 0,00 78,20
10,05,01 all 22,13 0,54 2,81 0,51 0,00 74,01
10,15,01 all 17,14 0,53 2,44 0,40 0,00 79,49
10,25,01 all 24,03 0,63 2,93 0,45 0,00 71,97
10,35,01 all 18,88 0,54 2,44 1,08 0,00 77,07
10,45,01 all 25,60 0,54 3,33 0,74 0,00 69,79
10,55,01 all 36,78 0,78 4,44 0,89 0,00 57,10
16,05,01 all 27,10 0,54 3,43 1,14 0,00 67,79

Well, that's it: the sysstat error is resolved and stats reporting works again. Hooray! 🙂

