Archive for the ‘Monitoring’ Category

Report haproxy node switch script useful for Zabbix or other monitoring

Tuesday, June 9th, 2020

zabbix-monitoring-logo
For those who administer corosync clustered haproxy and needs to build monitoring in case if the main configured Haproxy node in the cluster is changed, I've developed a small script to be integrated with zabbix-agent installed to report to a central zabbix server via a zabbix proxy.
The script  is very simple it assumed DC1 variable is the default used haproxy node and DC2 and DC3 are 2 backup nodes. The script is made to use crm_mon which is not installed by default on each server by default so if you'll be using it you'll have to install it first, but anyways the script can easily be adapted to use pcs cmd instead.

Below is the bash shell script:

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }'); do ((f++)); DC[$f]="$i"; done; \
DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); \
if [ “$DC” == “${DC[1]}” ]; then echo “1 Default DC Switched to ${DC[1]}”; elif [ “$DC” == “${DC[2]}” ]; then \
echo "2 Default DC Switched to ${DC[2]}”; elif [ “$DC” == “${DC[3]}” ]; then echo “3 Default DC: ${DC[3]}"; fi


To configure it with zabbix monitoring it can be configured via UserParameterScript.

The way I configured  it in Zabbix is as so:


1. Create the userpameter_active_node.conf

Below script is 3 nodes Haproxy cluster

# cat > /etc/zabbix/zabbix_agentd.d/userparameter_active_node.conf

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }'); do ((f++)); DC[$f]="$i"; done; \
DC=$(sudo /usr/sbin/crm_mon -n -1 | grep 'Current DC' | awk '{ print $1 " " $2 " " $3}' | awk '{ print $3 }'); \
if [ “$DC” == “${DC[1]}” ]; then echo “1 Default DC Switched to ${DC[1]}”; elif [ “$DC” == “${DC[2]}” ]; then \
echo "2 Default DC Switched to ${DC[2]}”; elif [ “$DC” == “${DC[3]}” ]; then echo “3 Default DC: ${DC[3]}"; fi

Once pasted to save the file press CTRL + D


The version of the script with 2 nodes slightly improved is like so:
 

UserParameter=active.dc,f=0; for i in $(sudo /usr/sbin/crm_mon -n -1|grep -i 'Node ' |awk '{ print $2 }' | sed -e 's#:##g'); do DC_ARRAY[$f]=”$i”; ((f++)); done; GET_CURR_DC=$(sudo /usr/sbin/crm_mon -n -1 | grep ‘Current DC’ | awk ‘{ print $1 ” ” $2 ” ” $3}’ | awk ‘{ print $3 }’); if [ “$GET_CURR_DC” == “${DC_ARRAY[0]}” ]; then echo “1 Default DC ${DC_ARRAY[0]}”; fi; if [ “$GET_CURR_DC” == “${DC_ARRAY[1]}” ]; then echo “2 Default Current DC Switched to ${DC_ARRAY[1]} Please check “; fi; if [ -z “$GET_CURR_DC” ] || [ -z “$DC_ARRAY[1]” ]; then printf "Error something might be wrong with HAProxy Cluster on  $HOSTNAME "; fi;


The haproxy_active_DC_zabbix.sh script with a bit of more comments as explanations is available here 
2. Configure access for /usr/sbin/crm_mon for zabbix user in sudoers

# vim /etc/sudoers

zabbix          ALL=NOPASSWD: /usr/sbin/crm_mon


3. Configure in Zabbix for active.dc key Trigger and Item

active-node-switch1

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020

monitoring-linux-hardware-with-software-temperature-disk-cpu-health-zabbix-userparameter-script

I'm part of a  SysAdmin Team that is partially doing some minor Zabbix imrovements on a custom corporate installed Zabbix in an ongoing project to substitute the previous HP OpenView monitoring for a bunch of Legacy Linux hosts.
As one of the necessery checks to have is regarding system Hardware, the task was to invent some simplistic way to monitor hardware with the Zabbix Monitoring tool.  Monitoring Bare Metal servers hardware of HP / Dell / Fujituse etc. servers  in Linux usually is done with a third party software provided by the Hardware vendor. But as this requires an additional services to run and sometimes is not desired. It was interesting to find out some alternative Linux native ways to do the System hardware monitoring.
Monitoring statistics from the system hardware components can be obtained directly from the server components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent  Platform Management remote board article).
With ipmi
 hardware health info could be received straight from the ILO / IDRAC / HPMI of the server. However as often the Admin-Lan of the server is in a seperate DMZ secured network and available via only a certain set of routed IPs, ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

The tools to use are perhaps many but I know of two which gives you most of the information you ever need to have a prelimitary hardware damage warning system before the crash, these are:
 

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Disk monitoring is handled by a special service the package provides called smartd that does query the Hard Drives periodically aiming to find a warning signs of hardware failures.
The downside of smartd use is that it implies a little bit of extra load on Hard Drive read / writes and if misconfigured could reduce the the Hard disk life time.

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       –       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       –       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       –       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       –       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      –       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       –       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      –       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       –       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       –       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       –       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       –       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       –       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      –       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       –       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       –       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       –       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

2. hddtemp

Usually if smartd is used it is useful to also use hddtemp which relies on smartd data.
 The hddtemp program monitors and reports the temperature of PATA, SATA
 or SCSI hard drives by reading Self-Monitoring Analysis and Reporting
 Technology (S.M.A.R.T.)
information on drives that support this feature.
 

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

3. lm-sensors / i2c-tools 

 Lm-sensors is a hardware health monitoring package for Linux. It allows you
 to access information from temperature, voltage, and fan speed sensors.
i2c-tools
was historically bundled in the same package as lm_sensors but has been seperated cause not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)

 


On CentOS Linux useful tool is also  lm_sensors-sensord.x86_64 – A Daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also the psensors-server (an HTTP server providing JSON Web service which can be used by GTK+ Application to remotely monitor sensors) useful for developers
psesors-server

psensor-linux-graphical-tool-to-check-cpu-hard-disk-temperature-unix

If you have a Xserver installed on the Server accessed with Xclient or via VNC though quite rare,
You can use xsensors or Psensora GTK+ (Widget Toolkit for creating Graphical User Interface) application software.

With this 3 tools it is pretty easy to script one liners and use the Zabbix UserParameters functionality to send hardware report data to a Company's Zabbix Sserver, though Zabbix has already some templates to do so in my case, I couldn't import this templates cause I don't have Zabbix Super-Admin credentials, thus to work around that a sample work around is use script to monitor for higher and critical considered temperature.
Here is a tiny sample script I came up in 1 min time it can be used to used as 1 liner UserParameter and built upon something more complex.

SENSORS_HIGH=`sensors | awk '{ print $6 }'| grep '^+' | uniq`;
SENSORS_CRIT=`sensors | awk '{ print $9 }'| grep '^+' | uniq`; ;SENSORS_STAT=`sensors|grep -E 'Core\s' | awk '{ print $1" "$2" "$3 }' | grep "$SENSORS_HIGH|$SENSORS_CRIT"`;
if [ ! -z $SENSORS_STAT ]; then
echo 'Temperature HIGH';
else 
echo 'Sensors OK';
fi 

Of course there is much more sophisticated stuff to use for monitoring out there


Below script can be easily adapted and use on other Monitoring Platforms such as Nagios / Munin / Cacti / Icinga and there are plenty of paid solutions, but for anyone that wants to develop something from scratch just like me I hope this
article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

ipmitool: Reset and manage IPMI (Intelligent Platform Management Interface) / ILO (Integrated Lights Out) remote board on Linux servers

Friday, December 20th, 2019

ipmitool-how-to-get-information-about-hardware-and-reset-ipmi-bmc-linux-access-ipmi-ilo-interface-logo

As a system administration nomatter whether you manage a bunch of server in a own brew and run Data Center location with some Rack mounted Hardware like PowerEdge M600 / ProLiant DL360e G8 / ProLiant DL360 Gen9 (755258-B21) or you're managing a bunch of Dedicated Servers, you're or will be faced  at some point to use the embedded in many Rack mountable rack servers IPMI / ILO interface remote console board management. If IPMI / ILO terms are new for you I suggest you quickly read my earlier article What is IPMI / IPKVM / ILO /  DRAC Remote Management interfaces to server .

hp-proliant-bl460c-ILO-Interface-screenshot

HP Proliant BL460 C IPMI (ILO) Web management interface 

In short Remote Management Interface is a way that gives you access to the server just like if you had a Monitor and a Keyboard plugged in directly to server.
When a remote computer is down the sysadmin can access it through IPMI and utilize a text console to the boot screen.
The IPMI protocol specification is led by Intel and was first published on September 16, 1998. and currently is supported by more than 200 computer system vendors, such as Cisco, Dell, Hewlett Packard Enterprise, Intel, NEC Corporation, SuperMicro and Tyan and is a standard for remote board management for servers.

IPMI-Block-Diagram-how-ipmi-works-and-its-relation-to-BMC
As you can see from diagram Baseboard Management Controllers (BMCs) is like the heart of IPMI.

Having this ILO / IPMI access is usually via a Web Interface Java interface that gives you the console and usually many of the machines also have an IP address via which a normal SSH command prompt is available giving you ability to execute diagnostic commands to the ILO on the status of attached hardware components of the server / get information about the attached system sensors to get report about things such as:

  • The System Overall heat
  • CPU heat temperature
  • System fan rotation speed cycles
  • Extract information about the server chassis
  • Query info about various system peripherals
  • Configure BIOS or UEFI on a remote system with no monitor / keyboard attached

Having a IPMI (Intelligent Platform Management Interface) firmware embedded into the server Motherboard is essential for system administration because besides this goodies it allows you to remotely Install Operating System to a server without any pre-installed OS right after it is bought and mounted to the planned Data Center Rack nest, just like if you have a plugged Monitor / Keyboard and Mouse and being physically in the remote location.

IPMI is mega useful for system administration also in case of Linux / Windows system updates that requires reboot in which essential System Libraries or binaries are updated and a System reboot is required, because often after system Large bundle updates or Release updates the system fails to boot and you need a way to run a diagnostic stuff from a System rescue Operating System living on a plugged in via a USB stick or CD Drive.
As prior said IPMI remote board is usually accessed and used via some Remote HTTPS encrypted web interface or via Secure Shell crypted session but sometimes the Web server behind the IPMI Web Interface is hanging especially when multiple sysadmins try to access it or due to other stuff and at times due to strange stuff even console SSH access might not be there, thansfully those who run a GNU / Linux Operating system on the Hardware node can use ipmitool tool http://ipmitool.sourceforge.net/ written for Linux that is capable to do a number of useful things with the IPMI management board including a Cold Reset of it so it turns back to working state / adding users / grasping the System hardware and components information health status, changing the Listener address of the IPMI access Interface and even having ability to update the IPMI version firmware.

Prior to be able to access IPMI remotely it has to be enabled usually via a UTP cable connected to the Network from which you expect it to be accesible. The location of the IPMI port on different server vendors is different.

ibm-power9-server-ipmi

IBM Power 9 Server IPMI port

HP-ILO-Bladeserver-Management-port-MGMT-yellow-cabled

HP IPMI console called ILO (Integrated Lights-Out) Port cabled with yellow cable (usually labelled as
Management Port MGMT)

Supermicro-SSG-5029P-E1CTR12L-Rear-Annotated-dedicated-IPMI-lan-port

Supermicro server IPMI Dedicated Lan Port

 In this article I'll shortly explain how IPMITool is available and can be installed and used across GNU / Linux Debian / Ubuntu and other deb based Linuxes with apt or on Fedora / CentOS (RPM) based with yum etc.

1. Install IPMITool

– On Debian

# apt-get install –yes ipmitool 

– On CentOS

# yum install ipmitool OpenIPMI-tools

# ipmitool -V
ipmitool version 1.8.14

On CentOS ipmitool can run as a service and collect data and do some nice stuff to run it:

[root@linux ~]# chkconfig ipmi on 

[root@linux ~]# service ipmi start

Before start using it is worthy to give here short description from ipmitool man page
 

DESCRIPTION
       This program lets you manage Intelligent Platform Management Interface (IPMI) functions of either the local system, via a kernel device driver, or a remote system, using IPMI v1.5 and IPMI v2.0.
       These functions include printing FRU information, LAN configuration, sensor readings, and remote chassis power control.

IPMI management of a local system interface requires a compatible IPMI kernel driver to be installed and configured.  On Linux this driver is called OpenIPMI and it is included in standard  dis‐
       tributions.   On Solaris this driver is called BMC and is included in Solaris 10.  Management of a remote station requires the IPMI-over-LAN interface to be enabled and configured.  Depending on
       the particular requirements of each system it may be possible to enable the LAN interface using ipmitool over the system interface.

2. Get ADMIN IP configured for access

https://3.bp.blogspot.com/-jojgWqj7acg/Wo6bSP0Av1I/AAAAAAAAGdI/xaHewnmAujkprCiDXoBxV7uHonPFjtZDwCLcBGAs/s1600/22-02-2018%2B15-31-09

To get a list of what is the current listener IP with no access to above Web frontend via which IPMI can be accessed (if it is cabled to the Access / Admin LAN port).

# ipmitool lan print 1
Set in Progress         : Set Complete
Auth Type Support       : NONE MD2 MD5 PASSWORD
Auth Type Enable        : Callback : MD2 MD5 PASSWORD
                        : User     : MD2 MD5 PASSWORD
                        : Operator : MD2 MD5 PASSWORD
                        : Admin    : MD2 MD5 PASSWORD
                        : OEM      :
IP Address Source       : Static Address
IP Address              : 10.253.41.127
Subnet Mask             : 255.255.254.0
MAC Address             : 0c:c4:7a:4b:1f:70
SNMP Community String   : public
IP Header               : TTL=0x00 Flags=0x00 Precedence=0x00 TOS=0x00
BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
Default Gateway IP      : 10.253.41.254
Default Gateway MAC     : 00:00:0c:07:ac:7b
Backup Gateway IP       : 10.253.41.254
Backup Gateway MAC      : 00:00:00:00:00:00
802.1q VLAN ID          : 8
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 1,2,3,6,7,8,11,12
Cipher Suite Priv Max   : aaaaXXaaaXXaaXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

 

3. Configure custom access IP and gateway for IPMI

[root@linux ~]# ipmitool lan set 1 ipsrc static

[root@linux ~]# ipmitool lan set 1 ipaddr 192.168.1.211
Setting LAN IP Address to 192.168.1.211

[root@linux ~]# ipmitool lan set 1 netmask 255.255.255.0
Setting LAN Subnet Mask to 255.255.255.0

[root@linux ~]# ipmitool lan set 1 defgw ipaddr 192.168.1.254
Setting LAN Default Gateway IP to 192.168.1.254

[root@linux ~]# ipmitool lan set 1 defgw macaddr 00:0e:0c:aa:8e:13
Setting LAN Default Gateway MAC to 00:0e:0c:aa:8e:13

[root@linux ~]# ipmitool lan set 1 arp respond on
Enabling BMC-generated ARP responses

[root@linux ~]# ipmitool lan set 1 auth ADMIN MD5

[root@linux ~]# ipmitool lan set 1 access on

4. Getting a list of IPMI existing users

# ipmitool user list 1
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
2   admin1           false   false      true       ADMINISTRATOR
3   ovh_dontchange   true    false      true       ADMINISTRATOR
4   ro_dontchange    true    true       true       USER
6                    true    true       true       NO ACCESS
7                    true    true       true       NO ACCESS
8                    true    true       true       NO ACCESS
9                    true    true       true       NO ACCESS
10                   true    true       true       NO ACCESS


– To get summary of existing users

# ipmitool user summary
Maximum IDs         : 10
Enabled User Count  : 4
Fixed Name Count    : 2

5. Create new Admin username into IPMI board
 

[root@linux ~]# ipmitool user set name 2 Your-New-Username

[root@linux ~]# ipmitool user set password 2
Password for user 2: 
Password for user 2: 

[root@linux ~]# ipmitool channel setaccess 1 2 link=on ipmi=on callin=on privilege=4

[root@linux ~]# ipmitool user enable 2
[root@linux ~]# 

6. Configure non-privilege user into IPMI board

If a user should only be used for querying sensor data, a custom privilege level can be setup for that. This user then has no rights for activating or deactivating the server, for example. A user named monitor will be created for this in the following example:

[root@linux ~]# ipmitool user set name 3 monitor

[root@linux ~]# ipmitool user set password 3
Password for user 3: 
Password for user 3: 

[root@linux ~]# ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=2

[root@linux ~]# ipmitool user enable 3

The importance of the various privilege numbers will be displayed when ipmitool channel is called without any additional parameters.

[root@linux ~]# ipmitool channel
Channel Commands: authcap   <channel number> <max privilege>
                  getaccess <channel number> [user id]
                  setaccess <channel number> <user id> [callin=on|off] [ipmi=on|off] [link=on|off] [privilege=level]
                  info      [channel number]
                  getciphers <ipmi | sol> [channel]

Possible privilege levels are:
   1   Callback level
   2   User level
   3   Operator level
   4   Administrator level
   5   OEM Proprietary level
  15   No access
[root@linux ~]# 

The user just created (named 'monitor') has been assigned the USER privilege level. So that LAN access is allowed for this user, you must activate MD5 authentication for LAN access for this user group (USER privilege level).

[root@linux ~]# ipmitool channel getaccess 1 3
Maximum User IDs     : 15
Enabled User IDs     : 2

User ID              : 3
User Name            : monitor
Fixed Name           : No
Access Available     : call-in / callback
Link Authentication  : enabled
IPMI Messaging       : enabled
Privilege Level      : USER

[root@linux ~]# 

7. Check server firmware version on a server via IPMI

# ipmitool mc info
Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 3.31
IPMI Version              : 2.0
Manufacturer ID           : 10876
Manufacturer Name         : Supermicro
Product ID                : 1579 (0x062b)
Product Name              : Unknown (0x62B)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device


ipmitool mc info is actually an alias for the ipmitool bmc info cmd.

8. Reset IPMI management controller or BMC if hanged

As earlier said if for some reason Web GUI access or SSH to IPMI is lost, reset with:

root@linux:/root#  ipmitool mc reset
[ warm | cold ]

If you want to stop electricity for a second to IPMI and bring it on use the cold reset (this usually
should be done if warm reset does not work).

root@linux:/root# ipmitool mc reset cold

otherwise soft / warm is with:

ipmitool mc reset warm

Sometimes the BMC component of IPMI hangs and only fix to restore access to server Remote board is to reset also BMC

root@linux:/root# ipmitool bmc reset cold

9. Print hardware system event log

root@linux:/root# ipmitool sel info
SEL Information
Version          : 1.5 (v1.5, v2 compliant)
Entries          : 0
Free Space       : 10240 bytes
Percent Used     : 0%
Last Add Time    : Not Available
Last Del Time    : 07/02/2015 17:22:34
Overflow         : false
Supported Cmds   : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 512
Alloc Unit Size  : 20
# Free Units     : 512
Largest Free Blk : 512
Max Record Size  : 20

 ipmitool sel list
SEL has no entries

In this particular case the system shows no entres as it was run on a tiny Microtik 1U machine, however usually on most Dell PowerEdge / HP Proliant / Lenovo System X machines this will return plenty of messages.

ipmitool sel elist

ipmitool sel clear

To clear anything if such logged

ipmitool sel clear

10.  Print Field Replaceable Units ( FRUs ) on the server 

[root@linux ~]# ipmitool fru print
 

FRU Device Description : Builtin FRU Device (ID 0)
 Chassis Type          : Other
 Chassis Serial        : KD5V59B
 Chassis Extra         : c3903ebb6237363698cdbae3e991bbed
 Board Mfg Date        : Mon Sep 24 02:00:00 2012
 Board Mfg             : IBM
 Board Product         : System Board
 Board Serial          : XXXXXXXXXXX
 Board Part Number     : 00J6528
 Board Extra           : 00W2671
 Board Extra           : 1400
 Board Extra           : 0000
 Board Extra           : 5000
 Board Extra           : 10

 Product Manufacturer  : IBM
 Product Name          : System x3650 M4
 Product Part Number   : 1955B2G
 Product Serial        : KD7V59K
 Product Asset Tag     :

FRU Device Description : Power Supply 1 (ID 1)
 Board Mfg Date        : Mon Jan  1 01:00:00 1996
 Board Mfg             : ACBE
 Board Product         : IBM Designed Device
 Board Serial          : YK151127R1RN
 Board Part Number     : ZZZZZZZ
 Board Extra           : ZZZZZZ<FF><FF><FF><FF><FF>
 Board Extra           : 0200
 Board Extra           : 00
 Board Extra           : 0080
 Board Extra           : 1

FRU Device Description : Power Supply 2 (ID 2)
 Board Mfg Date        : Mon Jan  1 01:00:00 1996
 Board Mfg             : ACBE
 Board Product         : IBM Designed Device
 Board Serial          : YK131127M1LE
 Board Part Number     : ZZZZZ
 Board Extra           : ZZZZZ<FF><FF><FF><FF><FF>
 Board Extra           : 0200
 Board Extra           : 00
 Board Extra           : 0080
 Board Extra           : 1

FRU Device Description : DASD Backplane 1 (ID 3)
….

Worthy to mention here is some cheaper server vendors such as Trendmicro might show no data here (no idea whether this is a protocol incompitability or IPMItool issue).

11. Get output about system sensors Temperature / Fan / Power Supply

Most newer servers have sensors to track temperature / voltage / fanspeed peripherals temp overall system temp etc.
To get a full list of sensors statistics from IPMI 
 

# ipmitool sensor
CPU Temp         | 29.000     | degrees C  | ok    | 0.000     | 0.000     | 0.000     | 95.000    | 98.000    | 100.000
System Temp      | 40.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 80.000    | 85.000    | 90.000
Peripheral Temp  | 41.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 80.000    | 85.000    | 90.000
PCH Temp         | 56.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 90.000    | 95.000    | 100.000
FAN 1            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN 2            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN 3            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN 4            | na         |            | na    | na        | na        | na        | na        | na        | na
FAN A            | na         |            | na    | na        | na        | na        | na        | na        | na
Vcore            | 0.824      | Volts      | ok    | 0.480     | 0.512     | 0.544     | 1.488     | 1.520     | 1.552
3.3VCC           | 3.296      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
12V              | 12.137     | Volts      | ok    | 10.494    | 10.600    | 10.706    | 13.091    | 13.197    | 13.303
VDIMM            | 1.496      | Volts      | ok    | 1.152     | 1.216     | 1.280     | 1.760     | 1.776     | 1.792
5VCC             | 4.992      | Volts      | ok    | 4.096     | 4.320     | 4.576     | 5.344     | 5.600     | 5.632
CPU VTT          | 1.008      | Volts      | ok    | 0.872     | 0.896     | 0.920     | 1.344     | 1.368     | 1.392
VBAT             | 3.200      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
VSB              | 3.328      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
AVCC             | 3.312      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
Chassis Intru    | 0x1        | discrete   | 0x0100| na        | na        | na        | na        | na        | na

 

To get only partial sensors data from the SDR (Sensor Data Repositry) entries and readings

[root@linux ~]# ipmitool sdr list 

Planar 3.3V      | 3.31 Volts        | ok
Planar 5V        | 5.06 Volts        | ok
Planar 12V       | 12.26 Volts       | ok
Planar VBAT      | 3.14 Volts        | ok
Avg Power        | 80 Watts          | ok
PCH Temp         | 45 degrees C      | ok
Ambient Temp     | 19 degrees C      | ok
PCI Riser 1 Temp | 25 degrees C      | ok
PCI Riser 2 Temp | no reading        | ns
Mezz Card Temp   | no reading        | ns
Fan 1A Tach      | 3071 RPM          | ok
Fan 1B Tach      | 2592 RPM          | ok
Fan 2A Tach      | 3145 RPM          | ok
Fan 2B Tach      | 2624 RPM          | ok
Fan 3A Tach      | 3108 RPM          | ok
Fan 3B Tach      | 2592 RPM          | ok
Fan 4A Tach      | no reading        | ns
Fan 4B Tach      | no reading        | ns
CPU1 VR Temp     | 27 degrees C      | ok
CPU2 VR Temp     | 27 degrees C      | ok
DIMM AB VR Temp  | 24 degrees C      | ok
DIMM CD VR Temp  | 23 degrees C      | ok
DIMM EF VR Temp  | 25 degrees C      | ok
DIMM GH VR Temp  | 24 degrees C      | ok
Host Power       | 0x00              | ok
IPMI Watchdog    | 0x00              | ok

[root@linux ~]# ipmitool sdr type Temperature
PCH Temp         | 31h | ok  | 45.1 | 45 degrees C
Ambient Temp     | 32h | ok  | 12.1 | 19 degrees C
PCI Riser 1 Temp | 3Ah | ok  | 16.1 | 25 degrees C
PCI Riser 2 Temp | 3Bh | ns  | 16.2 | No Reading
Mezz Card Temp   | 3Ch | ns  | 44.1 | No Reading
CPU1 VR Temp     | F7h | ok  | 20.1 | 27 degrees C
CPU2 VR Temp     | F8h | ok  | 20.2 | 27 degrees C
DIMM AB VR Temp  | F9h | ok  | 20.3 | 25 degrees C
DIMM CD VR Temp  | FAh | ok  | 20.4 | 23 degrees C
DIMM EF VR Temp  | FBh | ok  | 20.5 | 26 degrees C
DIMM GH VR Temp  | FCh | ok  | 20.6 | 24 degrees C
Ambient Status   | 8Eh | ok  | 12.1 |
CPU 1 OverTemp   | A0h | ok  |  3.1 | Transition to OK
CPU 2 OverTemp   | A1h | ok  |  3.2 | Transition to OK

[root@linux ~]# ipmitool sdr type Fan
Fan 1A Tach      | 40h | ok  | 29.1 | 3034 RPM
Fan 1B Tach      | 41h | ok  | 29.1 | 2592 RPM
Fan 2A Tach      | 42h | ok  | 29.2 | 3145 RPM
Fan 2B Tach      | 43h | ok  | 29.2 | 2624 RPM
Fan 3A Tach      | 44h | ok  | 29.3 | 3108 RPM
Fan 3B Tach      | 45h | ok  | 29.3 | 2592 RPM
Fan 4A Tach      | 46h | ns  | 29.4 | No Reading
Fan 4B Tach      | 47h | ns  | 29.4 | No Reading
PS 1 Fan Fault   | 73h | ok  | 10.1 | Transition to OK
PS 2 Fan Fault   | 74h | ok  | 10.2 | Transition to OK

[root@linux ~]# ipmitool sdr type ‘Power Supply’
Sensor Type "‘Power" not found.
Sensor Types:
        Temperature               (0x01)   Voltage                   (0x02)
        Current                   (0x03)   Fan                       (0x04)
        Physical Security         (0x05)   Platform Security         (0x06)
        Processor                 (0x07)   Power Supply              (0x08)
        Power Unit                (0x09)   Cooling Device            (0x0a)
        Other                     (0x0b)   Memory                    (0x0c)
        Drive Slot / Bay          (0x0d)   POST Memory Resize        (0x0e)
        System Firmwares          (0x0f)   Event Logging Disabled    (0x10)
        Watchdog1                 (0x11)   System Event              (0x12)
        Critical Interrupt        (0x13)   Button                    (0x14)
        Module / Board            (0x15)   Microcontroller           (0x16)
        Add-in Card               (0x17)   Chassis                   (0x18)
        Chip Set                  (0x19)   Other FRU                 (0x1a)
        Cable / Interconnect      (0x1b)   Terminator                (0x1c)
        System Boot Initiated     (0x1d)   Boot Error                (0x1e)
        OS Boot                   (0x1f)   OS Critical Stop          (0x20)
        Slot / Connector          (0x21)   System ACPI Power State   (0x22)
        Watchdog2                 (0x23)   Platform Alert            (0x24)
        Entity Presence           (0x25)   Monitor ASIC              (0x26)
        LAN                       (0x27)   Management Subsys Health  (0x28)
        Battery                   (0x29)   Session Audit             (0x2a)
        Version Change            (0x2b)   FRU State                 (0x2c)

12. Using System Chassis to initiate power on / off / reset / soft shutdown

!!!!!  Beware only run this if you know what you're realling doing don't just paste into a production system, If you do so it is your responsibility !!!!! 

–  do a soft-shutdown via acpi 

ipmitool [chassis] power soft

– issue a hard power off, wait 1s, power on 

ipmitool [chassis] power cycle

– run a hard power off

ipmitool [chassis] power off

 
– do a hard power on 

ipmitool [chassis] power on

–  issue a hard reset

ipmitool [chassis] power reset


– Get system power status
 

ipmitool chassis power status

13. Use IPMI (SoL) Serial over Lan to execute commands remotely


Besides using ipmitool locally on server that had its IPMI / ILO / DRAC console disabled it could be used also to query and make server do stuff remotely.

If not loaded you will have to load lanplus kernel module.
 

modprobe lanplus

 ipmitool -I lanplus -H 192.168.99.1 -U user -P pass chassis power status

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power status

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power reset

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass chassis power reset

ipmitool -I lanplus -H 192.168.98.1 -U user -P pass password sol activate

– Deactivating Sol server capabilities
 

 ipmitool -I lanplus -H 192.168.99.1 -U user -P pass sol deactivate

14. Modify boot device order on next boot

!!!!! Do not run this except you want to really modify Boot device order, carelessly copy pasting could leave your server unbootable on next boot !!!!!

– Set first boot device to be as BIOS

ipmitool chassis bootdev bios

– Set first boot device to be CD Drive

ipmitool chassis bootdev cdrom 

– Set first boot device to be via Network Boot PXE protocol

ipmitool chassis bootdev pxe 

15. Using ipmitool shell

root@iqtestfb:~# ipmitool shell
ipmitool> 
help
Commands:
        raw           Send a RAW IPMI request and print response
        i2c           Send an I2C Master Write-Read command and print response
        spd           Print SPD info from remote I2C device
        lan           Configure LAN Channels
        chassis       Get chassis status and set power state
        power         Shortcut to chassis power commands
        event         Send pre-defined events to MC
        mc            Management Controller status and global enables
        sdr           Print Sensor Data Repository entries and readings
        sensor        Print detailed sensor information
        fru           Print built-in FRU and scan SDR for FRU locators
        gendev        Read/Write Device associated with Generic Device locators sdr
        sel           Print System Event Log (SEL)
        pef           Configure Platform Event Filtering (PEF)
        sol           Configure and connect IPMIv2.0 Serial-over-LAN
        tsol          Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
        isol          Configure IPMIv1.5 Serial-over-LAN
        user          Configure Management Controller users
        channel       Configure Management Controller channels
        session       Print session information
        dcmi          Data Center Management Interface
        sunoem        OEM Commands for Sun servers
        kontronoem    OEM Commands for Kontron devices
        picmg         Run a PICMG/ATCA extended cmd
        fwum          Update IPMC using Kontron OEM Firmware Update Manager
        firewall      Configure Firmware Firewall
        delloem       OEM Commands for Dell systems
        shell         Launch interactive IPMI shell
        exec          Run list of commands from file
        set           Set runtime variable for shell and exec
        hpm           Update HPM components using PICMG HPM.1 file
        ekanalyzer    run FRU-Ekeying analyzer using FRU files
        ime           Update Intel Manageability Engine Firmware
ipmitool>

16. Changing BMC / DRAC time setting

# ipmitool -H XXX.XXX.XXX.XXX -U root -P pass sel time set "01/21/2011 16:20:44"

17. Loading script of IPMI commands

# ipmitool exec /path-to-script/script-with-instructions.txt  

 

Closure

As you saw ipmitool can be used to do plenty of cool things both locally or remotely on a server that had IPMI server interface available. The tool is mega useful in case if ILO console gets hanged as it can be used to reset it.
I explained shortly what is Intelligent Platform Management Interface, how it can be accessed and used on Linux via ipmitool. I went through some of its basic use, how it can be used to print the configured ILO access IP how
this Admin IP and Network configuration can be changed, how to print the IPMI existing users and how to add new Admin and non-privileged users.
Then I've shown how a system hardware and firmware could be shown, how IPMI management BMC could be reset in case if it hanging and how hardware system even logs can be printed (useful in case of hardware failure errors etc.), how to print reports on current system fan / power supply  and temperature. Finally explained how server chassis could be used for soft and cold server reboots locally or via SoL (Serial Over Lan) and how boot order of system could be modified.

ipmitool is a great tool to further automate different sysadmin tasks with shell scrpts for stuff such as tracking servers for a failing hardware and auto-reboot of inacessible failed servers to guarantee Higher Level of availability.
Hope you enjoyed artcle .. It wll be interested to hear of any other known ipmitool scripts or use, if you know such please share it.

Howto Fix “sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux

Monday, February 15th, 2016

sysstast-no-such-file-or-directory-fix-Debian-Ubuntu-Linux-howto
I really love sysstat and as a console maniac I tend to install it on every server however by default there is some <b>sysstat</b> tuning once installed to make it work, for those unfamiliar with <i>sysstat</i> I warmly recommend to check, it here is in short the package description:<br /><br />
 

server:~# apt-cache show sysstat|grep -i desc -A 15
Description: system performance tools for Linux
 The sysstat package contains the following system performance tools:
  – sar: collects and reports system activity information;
  – iostat: reports CPU utilization and disk I/O statistics;
  – mpstat: reports global and per-processor statistics;
  – pidstat: reports statistics for Linux tasks (processes);
  – sadf: displays data collected by sar in various formats;
  – nfsiostat: reports I/O statistics for network filesystems;
  – cifsiostat: reports I/O statistics for CIFS filesystems.
 .
 The statistics reported by sar deal with I/O transfer rates,
 paging activity, process-related activities, interrupts,
 network activity, memory and swap space utilization, CPU
 utilization, kernel activities and TTY statistics, among
 others. Both UP and SMP machines are fully supported.
Homepage: http://pagesperso-orange.fr/sebastien.godard/

If you happen to install sysstat on a Debian / Ubuntu server with:

server:~# apt-get install –yes sysstat


, and you try to get some statistics with sar command but you get some ugly error output from:

server:~# sar Cannot open /var/log/sysstat/sa20: No such file or directory


And you wonder how to resolve it and to be able to have the server log in text databases periodically the nice sar stats load avarages – %idle, %iowait, %system, %nice, %user, then to FIX that Cannot open /var/log/sysstat/sa20: No such file or directory

You need to:

server:~# vim /etc/default/sysstat


By Default value you will find out sysstat stats it is disabled, e.g.:

ENABLED="false"

Switch the value to "true"

ENABLED="true"


Then restart sysstat init script with:

server:~# /etc/init.d/sysstat restart

However for those who prefer to do things from menu Ncurses interfaces and are not familiar with Vi Improved, the easiest way is to run dpkg reconfigure of the sysstat:

server:~# dpkg –reconfigure


sysstat-reconfigure-on-gnu-linux

root@server:/# sar
Linux 2.6.32-5-amd64 (pcfreak) 15.02.2016 _x86_64_ (2 CPU)

0,00,01 CPU %user %nice %system %iowait %steal %idle
0,15,01 all 24,32 0,54 3,10 0,62 0,00 71,42
1,15,01 all 18,69 0,53 2,10 0,48 0,00 78,20
10,05,01 all 22,13 0,54 2,81 0,51 0,00 74,01
10,15,01 all 17,14 0,53 2,44 0,40 0,00 79,49
10,25,01 all 24,03 0,63 2,93 0,45 0,00 71,97
10,35,01 all 18,88 0,54 2,44 1,08 0,00 77,07
10,45,01 all 25,60 0,54 3,33 0,74 0,00 69,79
10,55,01 all 36,78 0,78 4,44 0,89 0,00 57,10
16,05,01 all 27,10 0,54 3,43 1,14 0,00 67,79


Well that's it now sysstat error resolved, text reporting stats data works again, Hooray! 🙂

How to find and Delete Duplicate files in directory on Linux server with find and fdupes command

Monday, March 16th, 2015

search-duplcate-files-linux-command-and-graphical-tools-how-to-find-duplicate-files-on-linux-mac-and-windows-os

Linux / UNIX find command is very helpful to do a lot of tasks to us admins such as Deleting empty directories to free up occupied inodes or finding and printing only empty files within a root file system within all sub-directories
There is too much of uses of find, however one that is probably rarely used known by sysadmins find command use is how to search for duplicate files on a Linux server:
 

find -not -empty -type f -printf “%s\n” | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 –all-repeated=separate

If you're curious how does duplicate files finding works, they are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison.

Most common application of below command is when you want to search and get rid of some old obsolete files which you forgot to delete such as old /etc/ configurations, old SQL backups and PHP / Java / Python programming code files etc.

If you have to do a regular duplicate file find on multiple servers Linux servers perhaps you should install and use  fdupes command.
On Debian Linux to install it:

root@pcfreak:/# apt-cache show fdupes|grep -i descr -A 4
Description: identifies duplicate files within given directories
 FDupes uses md5sums and then a byte by byte comparison to find
 duplicate files within a set of directories. It has several useful
 options including recursion.
Homepage: http://code.google.com/p/fdupes/

root@pc-freak.net:/# apt-get install –yes fdupes

To search for duplicate files with fdupes in lets /etc/ just run fdupes without arguments:

root@pcfreak:/# fdupes /etc/
/etc/magic
/etc/magic.mime

/etc/odbc.ini
/etc/.pwd.lock
/etc/environment
/etc/odbcinst.ini

/etc/shadow-
/etc/shadow


If you want to look up for all duplicate files within root directory:
 

root@pcfreak:/# fdupes -r /etc/
Building file list /

You can also find duplicate files for multiple directories by just passing all directories as arguments to fdupes

root@pcfreak:/# fdupes -r /etc/ /usr/ /root /disk /nfs_mount /nas


The -r argument (makes a recursive subdirectory search for duplicates), if you want to also see what is the size of duplicate files found add -S option

 

fdupes -r -S /etc/ /usr/ /root /disk /nfs_mount /nas


If you want to delete all duplicate files within lets say /etc/

root@pcfreak:/# fdupes -d /etc/

fdupes is also available and installable also on RPM based Linux distros Fedora / RHEL / CentOS etc., install on CentOS with:
 

[root@centos~ ]# yum -y install fdupes


There is also a port available for those who want to run it on FreeBSD on BSD install it from ports:

freebsd# cd /usr/ports/sysutils/fdupes
freebsd# make install clean


If you have a GUI environment installed on the server and you don't want to bother with command line to search for all duplicate files under main filesystem and other lint (junk) files take a look at FSlint

FSlint-2.02-search-for-duplicate-and-lint-files-linux-gui-tool

If you're looking for a GUI cross platform duplicate file finder tool that runs on all major used Operating Systems Mac OS X / Windows / Linux take a look at dupeGuru

How to check Apache Webserver and MySQL server uptime – Check uptime of a running daemon with PS (process) command

Tuesday, March 10th, 2015

check_Apache_Webserver_and_MySQL_server_uptime_-_Check-uptime-of-running-daemon-service-with-PS-process-command

Something very useful that most Apache LAMP (Linux Apache MySQL PHP) admins should know is how to check Apache Webserver uptime and MySQL server running (uptime).
Checking Apache / MySQL uptime is primary useful for scripting purposes – creating auto Apache / MySQL service restart scripts, or just as a quick console way to check what is the status and uptime of Webserver / SQL.

My experience as a sysadmin shows that lack of Periodic Apache and MySQL restart every week or every month often creates sys-admin a lot of a headaches cause (Apache / NGINX / SQL  server) starts eating too much memory or under some circumstances leads to service or system crashes. Periodic system main services restart is especially helpful in case if Website's backend programming code is writetn in a bad and buggy uneffient way by unprofessional (novice) programmers.
While I was still working as Senior SysAdmin in Design.BG, I've encountered many such Crappy Web applications developed by dozen of different programmers (because company's programmers changed too frequently and many of the hired Web Developers ,were still learning to program, I guess same is true also for other Start-UP Web / IT Company where crappy programming code is developed you will certainly need to keep an eye on Apache / MYSQL uptime.  If that's the case below 2 quick one liners with PS command will help you keep an eye on Apache / MYSQL uptime

ps -eo "%U %c %t"| grep apache2 | grep -v grep|grep root
root     apache2            02:30:05

Note that above example is Debian specific on RPM based distributions you will have to grep for httpd instead of apache2
 

ps -eo "%U %c %t"| grep http| grep -v grep|grep root

root     apache2            10:30:05

To check MySQL uptine:
 

ps -eo "%U %c %t"| grep mysqld
root     mysqld_safe        20:42:53
mysql    mysqld             20:42:53


Though example is for mysql and Apache you can easily use ps cmd in same way to check any other Linux service uptime such as Java / Qmail / PostgreSQL / Postfix etc.
 

ps -eo "%U %c %t"|grep qmail
qmails   qmail-send      19-01:10:48
qmaill   multilog        19-01:10:48
qmaill   multilog        19-01:10:48
qmaill   multilog        19-01:10:48
root     qmail-lspawn    19-01:10:48
qmailr   qmail-rspawn    19-01:10:48
qmailq   qmail-clean     19-01:10:48
qmails   qmail-todo      19-01:10:48
qmailq   qmail-clean     19-01:10:48
qmaill   multilog        40-18:02:53

 ps -eo "%U %c %t"|grep -i nginx|grep -v root|uniq
nobody   nginx           55-01:22:44

ps -eo "%U %c %t"|grep -i java|grep -v root |uniq
hipo   java            27-22:02:07

Linux: How to see / change supported network bandwidth of NIC interface and get various eth network statistics with ethtool

Monday, January 19th, 2015

linux-how-to-see-change-supported-network-bandwidth-of-NIC-interface-and-view-network-statistics
If you're a novice Linux sysadmin and inherited some dedicated servers without any documentation and hence on of the first things you have to do to start a new server documentation is to check the supported TCP/IP network speed of servers Network (ethernet) Interfaces. On Linux this is very easy task to verify the speed of LAN card supported Local / Internet traffic install ethtool (if not already preseont on the servers) – assuming you're dealing with Debian / Ubuntu Linux servers.

1. Install ethtool on Deb and RPM based distros

dedi-server1:~# apt-cache show ethtool|grep -i desc -A 3
Description: display or change Ethernet device settings
 ethtool can be used to query and change settings such as speed, auto-
 negotiation and checksum offload on many network devices, especially
 Ethernet devices.

dedi-server1:~# apt-get install –yes ethtool
..

ethtool should be installed by default on CentOS / Fedora / RHEL and  syntax is same like on Debs. If you happen to miss ethtool on any (SuSE) / RedHat / RPM based distro install it with yum

[root@centos:~] # yum -y install ethtool


2. Get ethernet configurations

To check the current eth0 / eth1 / ethX network (Speed / Duplex) and other network related configuration configuration:
 

dedi-server5:~# ethtool eth0

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full

        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

Having a NIC configured to act as Duplex is very important as Duplex communication enables LAN card to communicate both sides (Sent / Receive) packets simultaneously.

full-duplex-half-duplex-explained-picture

Probably most interesting parameters for most admins are the ones that are telling whether the NIC UpLink is 10megabyte / 100 megabyte or 1Gigabyte as well as supported Receive / Send ( Transfer ) speeds of LAN, a common useful ethtool admin use to just show current LAN ethernet interface speed:

server-admin1:~# ethtool eth0 |grep -i speed
        Speed: 1000Mb/s

To get info about NIC (kernel module / driver) used with ethtool:

dedi-server3:~# ethtool -i eth0 driver: e1000e
version: 1.2.20-k2
firmware-version: 1.8-0
bus-info: 0000:06:00.0

3. Make LAN Card blink to recognize eth is mapped to which Physical LAN

Besides that ethtool has many other useful use cases, for example if you have a server with 5 lan or more LAN cards, but you're not sure to which of all different EthX interfaces correspond, a very useful thing is to make eth0, eth1, eth2, eth3, etc. blink for 5 seconds in order to identify which static IP is binded physically to which NIC , here is how:

ethtool -p eth0 5


Then you can follow the procedure for any interface on the server and map them with a sticker 🙂

Ethtool is also useful for getting "deep" (thorough) statistics on Server LAN cards, this could be useful to identify sometimes hard to determine broadcast flood attacks:
 

4. Get network statistics with ethtool for interfaces
 

dedi-server5:~# ethtool -S eth0|grep -vw 0
NIC statistics:
     rx_packets: 6196644448
     tx_packets: 7197385158
     rx_bytes: 2038559235701
     tx_bytes: 8281206569250
     rx_broadcast: 357508947
     tx_broadcast: 172
     rx_multicast: 34731963
     tx_multicast: 20
     rx_errors: 115
     multicast: 34731963
     rx_length_errors: 115
     rx_no_buffer_count: 26391
     rx_missed_errors: 10059
     tx_timeout_count: 3
     tx_restart_queue: 2590
     rx_short_length_errors: 115
     tx_tcp_seg_good: 964136993
     rx_long_byte_count: 2038559235701
     rx_csum_offload_good: 5824813965
     rx_csum_offload_errors: 42186
     rx_smbus: 383640020

5. Turn on Auto Negotiation and change NIC set speed to 10 / 100 / 1000 Mb/s

Auto-negotiation is important as an ethernet procedure by which two communication devices (2 network cards) choose common transmission parameters such as speed, duplex mode, and flow control in order to achieve maximum transmission speed over the network. On 1000BASE-T basednetworks the standard is a mandatory. There is also backward compatability for older 10BASE-T Networks.

a) To raise up NIC to use 1000 Mb/s in case if the bandwidth was raised to 1Gb/s but NIC settings were not changed:

dedi-server1:~# ethtool -s eth0 speed 1000 duplex half autoneg off


b) In case if LAN speed has to be reduced for some weird reason to 10 / 100Mb/s

dedi-server1:~# ethtool -s eth0 speed 10 duplex half autoneg off

dedi-server1:~# ethtool -s eth0 speed 100 duplex half autoneg off

c) To enable disable NIC Autonegotiation:

dedi-server1:~# ethtool -s eth0 autoneg on


6. Change Speed / Duplex settings to load on boot

a) Set Network to Duplex on Fedora / CentOS etc.

Quickest way to do it is of course to use /etc/rc.local. If you want to do it following distribution logic on CentOS / RHEL Linux:

Add to /etc/sysconfig/network-scripts/ifcfg-eth0

vim /etc/sysconfig/network/-scripts/ifcfg-eth0

ETHTOOL_OPTS="speed 1000 duplex full autoneg off"

To load the new settings restart networking (be careful to have physical access to server if something goes wrong 🙂 )

service network restart

b) Change network speed / duplex setting on Debian / Ubuntu Linux

Add at the end of /etc/network/interfaces

vim /etc/network/interfaces

post-up ethtool -s eth0 speed 100 duplex full autoneg off

7. Tune NIC ring buffers

dedi-server1:~# ethtool -g eth0

Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             256
RX Mini:        0
RX Jumbo:       0
TX:             256

As you can see the default setting of RX (receive) buffer size is low 256 and on busy servers with high traffic loads, depending on the hardware NIC vendor this RX buffer size varies.
Through increasing the Rx/Tx ring buffer size , you can decrease the probability of discarding packets in the NIC during a scheduling delay.
A change in rx buffer ring requires NIC restart so (be careful not to loose connection to remote server), be sure to have iLO access to it.

Here is how to raise Rx ring buffer size 4 times from default value:

ethtool -G eth0 rx 4096 tx 4069

How to check who is flooding your Apache, NGinx Webserver – Real time Monitor statistics about IPs doing most URL requests and Stopping DoS attacks with Fail2Ban

Wednesday, August 20th, 2014

check-who-is-flooding-your-apache-nginx-webserver-real-time-monitoring-ips-doing-most-url-requests-to-webserver-and-protecting-your-webserver-with-fail2ban

If you're Linux ystem administrator in Webhosting company providing WordPress / Joomla / Drupal web-sites hosting and your UNIX servers suffer from periodic denial of service attacks, because some of the site customers business is a target of competitor company who is trying to ruin your client business sites through DoS or DDOS attacks, then the best thing you can do is to identify who and how is the Linux server being hammered. If you find out DoS is not on a network level but Apache gets crashing because of memory leaks and connections to Apache are so much that the CPU is being stoned, the best thing to do is to check which IP addresses are causing the excessive GET / POST / HEAD requests in logged.
 

There is the Apachetop tool that can give you the most accessed webserver URLs in a refreshed screen like UNIX top command, however Apachetop does not show which IP does most URL hits on Apache / Nginx webserver. 

1. Get basic information on which IPs accesses Apache / Nginx the most using shell cmds


Before examining the Webserver logs it is useful to get a general picture on who is flooding you on a TCP / IP network level, with netstat like so:
 

# here is howto check clients count connected to your server
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n


If you get an extensive number of connected various IPs / hosts (like 10000 or something huge as a number), depending on the type of hardware the server is running and the previous scaling planned for the system you can determine whether the count as huge as this can be handled normally by server, if like in most cases the server is planned to serve a couple of hundreds or thousands of clients and you get over 10000 connections hanging, then your server is under attack or if its Internet server suddenly your website become famous like someone posted an article on some major website and you suddenly received a tons of hits.


There is a way using standard shell tools, to get some basic information on which IP accesses the webserver the most with:

tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr

Or if you want to keep it refreshing periodically every few seconds run it through watch command:

watch "tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr"

monioring-access-hits-to-webserver-by-ip-show-most-visiting-apache-nginx-ip-with-shell-tools-tail-cut-uniq-sort-tools-refreshed-with-watch-cmd


Another useful combination of shell commands is to Monitor POST / GET / HEAD requests number in access.log :
 

 awk '{print $6}' access.log | sort | uniq -c | sort -n

     1 "alihack<%eval
      1 "CONNECT
      1 "fhxeaxb0xeex97x0fxe2-x19Fx87xd1xa0x9axf5x^xd0x125x0fx88x19"x84xc1xb3^v2xe9xpx98`X'dxcd.7ix8fx8fxd6_xcdx834x0c"
      1 "x16x03x01"
      1 "xe2
      2 "mgmanager&file=imgmanager&version=1576&cid=20
      6 "4–"
      7 "PUT
     22 "–"
     22 "OPTIONS
     38 "PROPFIND
   1476 "HEAD
   1539 "-"
  65113 "POST
 537122 "GET


However using shell commands combination is plenty of typing and hard to remember, plus above tools does not show you, approximately how frequenty IP hits the webserver

2. Real-time monitoring IP addresses with highest URL reqests with logtop


Real-time monitoring on IP addresses with highest URL requests is possible with no need of "console ninja skills"  through – logtop.

 

2.1 Install logtop on Debian / Ubuntu and deb derivatives Linux


a) Installing Logtop the debian way

LogTop is easily installable on Debian and Ubuntu in newer releases of Debian – Debian 7.0 and Ubuntu 13/14 Linux it is part of default package repositories and can be straightly apt-get-ed with:

apt-get install –yes logtop

b) Installing Logtop from source code (install on older deb based Linuxes)

On older Debian – Debian 6 and Ubuntu 7-12 servers to install logtop compile from source code – read the README installation instructions or if lazy copy / paste below:

cd /usr/local/src
wget https://github.com/JulienPalard/logtop/tarball/master
mv master JulienPalard-logtop.tar.gz
tar -zxf JulienPalard-logtop.tar.gz

cd JulienPalard-logtop-*/
aptitude install libncurses5-dev uthash-dev

aptitude install python-dev swig

make python-module

python setup.py install

make

make install

mkdir -p /usr/bin/
cp logtop /usr/bin/


2.2 Install Logtop on CentOS 6.5 / 7.0 / Fedora / RHEL and rest of RPM based Linux-es

b) Install logtop on CentOS 6.5 and CentOS 7 Linux

– For CentOS 6.5 you need to rpm install epel-release-6-8.noarch.rpm
 

wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
links http://dl.fedoraproject.org/pub/epel/6/SRPMS/uthash-1.9.9-6.el6.src.rpm
rpmbuild –rebuild
uthash-1.9.9-6.el6.src.rpm
cd /root/rpmbuild/RPMS/noarch
rpm -ivh uthash-devel-1.9.9-6.el6.noarch.rpm


– For CentOS 7 you need to rpm install epel-release-7-0.2.noarch.rpm

links http://download.fedoraproject.org/pub/epel/beta/7/x86_64/repoview/epel-release.html
 

Click on and download epel-release-7-0.2.noarch.rpm

rpm -ivh epel-release-7-0.2.noarch
rpm –import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
yum -y install git ncurses-devel uthash-devel
git clone https://github.com/JulienPalard/logtop.git
cd logtop
make
make install

2.3 Some Logtop use examples and short explanation

logtop shows 4 columns as follows – Line number, Count, Frequency, and Actual line

The quickest way to visualize which IP is stoning your Apache / Nginx webserver on Debian?

tail -f access.log | awk {'print $1; fflush();'} | logtop

logtop-check-which-ip-is-making-most-requests-to-your-apache-nginx-webserver-linux-screenshot

On CentOS / RHEL

tail -f /var/log/httpd/access_log | awk {'print $1; fflush();'} | logtop

Using LogTop even Squid Proxy caching server access.log can be monitored.
To get squid Top users by IP listed:

tail -f /var/log/squid/access.log | awk {'print $1; fflush();'} | logtop


logtop-visualizing-top-users-using-squid-proxy-cache
 

Or you might visualize in real-time squid cache top requested URLs
 

tail -f /var/log/squid/access.log | awk {'print $7; fflush();'} | logtop


visualizing-top-requested-urls-in-squid-proxy-cache-howto-screenshot

3. Automatically Filter IP addresses causing Apache / Nginx Webservices Denial of Service with fail2ban
 

Once you identify the problem if the sites hosted on server are target of Distributed DoS, probably your best thing to do is to use fail2ban to automatically filter (ban) IP addresses doing excessive queries to system services. Assuming that you have already installed fail2ban as explained in above link (On Debian / Ubuntu Linux) with:
 

apt-get install –yes fail2ban


To make fail2ban start filtering DoS attack IP addresses, you will have to set the following configurations:
 

vim /etc/fail2ban/jail.conf


Paste in file:
 

[http-get-dos]
 
enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/apache2/WEB_SERVER-access.log
# maxretry is how many GETs we can have in the findtime period before getting narky
maxretry = 300
# findtime is the time period in seconds in which we're counting "retries" (300 seconds = 5 mins)
findtime = 300
# bantime is how long we should drop incoming GET requests for a given IP for, in this case it's 5 minutes
bantime = 300
action = iptables[name=HTTP, port=http, protocol=tcp]


Before you paste make sure you put the proper logpath = location of webserver (default one is /var/log/apache2/access.log), if you're using multiple logs for each and every of hosted websites, you will probably want to write a script to automatically loop through all logs directory get log file names and automatically add auto-modified version of above [http-get-dos] configuration. Also configure maxtretry per IP, findtime and bantime, in above example values are a bit low and for heavy loaded websites which has to serve thousands of simultaneous connections originating from office networks using Network address translation (NAT), this might be low and tuned to prevent situations, where even the customer of yours can't access there websites 🙂

To finalize fail2ban configuration, you have to create fail2ban filter file:

vim /etc/fail2ban/filters.d/http-get-dos.conf


Paste:
 

# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]
 
# Option: failregex
# Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.
# You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.
 
failregex = ^<HOST> -.*"(GET|POST).*
 
# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =


To make fail2ban load new created configs restart it:
 

/etc/init.d/fail2ban restart


If you want to test whether it is working you can use Apache webserver Benchmark tools such as ab or siege.
The quickest way to test, whether excessive IP requests get filtered – and make your IP banned temporary:
 

ab -n 1000 -c 20 http://your-web-site-dot-com/

This will make 1000 page loads in 20 concurrent connections and will add your IP to temporary be banned for (300 seconds) = 5 minutes. The ban will be logged in /var/log/fail2ban.log, there you will get smth like:

2014-08-20 10:40:11,943 fail2ban.actions: WARNING [http-get-dos] Ban 192.168.100.5
2013-08-20 10:44:12,341 fail2ban.actions: WARNING [http-get-dos] Unban 192.168.100.5

Monitoring Disk use, CPU Load, Memory use and Network in one console ncurses interface – Glance

Thursday, August 14th, 2014

monitoring-disk-use-memory-cpu-load-and-network-in-one-common-interfaces-with-glances-Linux-BSD-UNIX
If you're Linux / UNIX / BSD system administrator you already have experience with basic admin's system monitoring:

  •     CPU load
  •     OS Name/Kernel version
  •     System load avarage and Uptime
  •     Disk and Network Input/Output I/O operations by interface
  •     Process statistics / Top loading processes etc.
  •     Memory / SWAP usage and free memory
  •     Mounted partitions


Such info is provided by command line tools such as:

top, df, free, sensors, ifconfig, iotop, hddtemp, mount, nfsstat, nfsiostat, dstat, uptime, nethogs iptraf

etc.

There are plenty of others advanced tools also Web based server monitoring visualization  tools, such as Monit, Icanga, PHPSysInfo, Cacti which provide you statistics on computer hardware and network utilization

So far so good, if you already are used to convenience of web *NIX based monitoring but you don't want to put load on the servers with such and you're lazy to write custom scripts that show most important monitoring information – necessery for daily system administration monitoring and prevention from downtimes and tracking bottlenecks you will be glad to hear about Glances
 

Glances is a free (LGPL) cross-platform curses-based monitoring tool which aims to present a maximum of information in a minimum of space, ideally to fit in a classical 80×24 terminal or higher to have additionnal information. Glances can adapt dynamically the displayed information depending on the terminal size. It can also work in a client/server mode for remote monitoring.


1. Installing Glances curses-based monitoring tool on Debian 7 / Ubuntu 13+ / Mint  Linux

We have to install python-pip (python package installer tool) to later install Glances

apt-get install –yes 'python-dev' 'python-jinja2' 'python-psutil'
                        'python-setuptools' 'hddtemp' 'python-pip' 'lm-sensors'


Before proceeding to install Glances to make Thermal sensors working (if supported by hardware) run:

 sensors-detect

Glances is written in Python and uses psutil library to obtain monitoring statistic values, thus it is necessery to install few more Python libraries:

pip install 'batinfo' 'pysensors'

If you're about to use pip – Python package installer tool, behind a proxy server use instead:
 

pip install –proxy=http://your-proxy-host.com:8080 'batinfo' 'pysensors'

Then install Glances script itself again using pip
 

pip install 'Glances'

Downloading/unpacking Glances
  Downloading Glances-2.0.1.tar.gz (3.3Mb): 3.3Mb downloaded
  Running setup.py egg_info for package Glances
    
Downloading/unpacking psutil>=2.0.0 (from Glances)
  Downloading psutil-2.1.1.tar.gz (216Kb): 216Kb downloaded
  Running setup.py egg_info for package psutil

Successfully installed Glances psutil

Then run glances from terminal
 

glances -t 3

-t 3 option tells glances to refresh collected statistics every 3 seconds

glances-console-monitoring-tool-every-systemad-ministrator-should-know-and-use-show-memory-disk-cpu-mount-point-statistics-in-common-shared-screen-linux-freebsd-unix

2. Installing Glances monitoring console tool on CentOS / RHEL / Fedora / Scientific Linux

Installing glances on CentOS 7 / Fedora and rest of RPM based distributions can be done by adding external RPM repositories, cause glances is not available in default yum repositories.

To enable Extra-packages repositories:
 

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm


Then update yum to include new repository's packages into package list and install python-pip and python-devel rpms
 

yum update
yum install python-pip python-devel


Glances-console-server-stateScreenhot-on-CentOS-Linux-monitoring-in-ncurses-Linux-BSD

There is also FreeBSD port to install Glances on FreeBSD:
 

cd /usr/sysutils/py-glances
make install


Enjoy 🙂 !