Posts Tagged ‘dev’

Monitor cluster heartbeat lines IP reachability via ICMP ping protocol with Zabbix

Wednesday, April 12th, 2023

https://pc-freak.net/images/zabbix-monitoring-icmp-ping-on-application-crm-clusters-with-userparameter-script-howto

Say you're running an haproxy load balancer cluster with two or more nodes inside a complex hybrid organizational network (a combination of local DMZ LANs, many switches and dedicated connectivity lines) and every now and then the network mysteriously goes down. Usually, simply monitoring the CISCO network devices or the smart switches themselves is enough to give you an overview of what is going on. But if haproxy sits in the middle, between the end application servers and other load balancers and network equipment, it might happen that due to a network equipment failure, a routing issue or some other strange unexpected reason, the connectivity of one of the 2 nodes fails over the configured dedicated additional Heartbeat lines that are meant to keep the haproxy CRM (Cluster Resource Manager) cluster consistent, ending up in a split-brain scenario.

Assuming that this is the case, as it is with us, you would definitely want to keep an eye on the connectivity of Connect Line1 and Connect Line2 in some monitoring software like Zabbix. As Zabbix is the main monitoring software used for our company infrastructure, in this little article I'll briefly explain how to monitor the network connectivity status between the haproxy node1 and haproxy node2 load balancer cluster members with simple ICMP ping echo checks.

Of course the easiest way to configure an ICMP monitor via Zabbix is to set EnableRemoteCommands=1 inside /etc/zabbix/zabbix_agentd.conf, but if your infrastructure has to meet high security or PCI requirements this option is most likely prohibited on the servers. To still achieve the ICMP ping checks with EnableRemoteCommands=0, a separate simple bash UserParameter script can be used. Read further to find out one way ICMP monitoring with a UserParameter script can be achieved with Zabbix.


1. Create the userparameter check for heartbeat lines

[root@haproxy1 zabbix_agentd.d]# cat userparameter_check_heartbeat_lines.conf
UserParameter=heartbeat.check,\
/etc/zabbix/scripts/check_heartbeat_lines.sh

[root@haproxy2 zabbix_agentd.d]# cat userparameter_check_heartbeat_lines.conf
UserParameter=heartbeat.check,\
/etc/zabbix/scripts/check_heartbeat_lines.sh

2. Create check_heartbeat_lines.sh script which will be actually checking connectivity with simple ping

[root@haproxy1 zabbix_agentd.d]# cat /etc/zabbix/scripts/check_heartbeat_lines.sh
#!/bin/bash
hb1=haproxy2-hb1
hb2=haproxy2-hb2
if ping -c 1 $hb1  &> /dev/null
then
  echo "$hb1 1"
else
  echo "$hb1 0"
fi
if ping -c 1 $hb2  &> /dev/null
then
  echo "$hb2 1"
else
  echo "$hb2 0"
fi

[root@haproxy1 zabbix_agentd.d]#

[root@haproxy2 zabbix_agentd.d]# cat /etc/zabbix/scripts/check_heartbeat_lines.sh
#!/bin/bash
hb1=haproxy1-hb1
hb2=haproxy1-hb2
if ping -c 1 $hb1  &> /dev/null
then
  echo "$hb1 1"
else
  echo "$hb1 0"
fi
if ping -c 1 $hb2  &> /dev/null
then
  echo "$hb2 1"
else
  echo "$hb2 0"
fi

[root@haproxy2 zabbix_agentd.d]#


3. Test script heartbeat lines first time

Each of the cluster nodes should be properly reachable (pingable) via the ICMP protocol.

The script has to be run on both haproxy1 and haproxy2 cluster load balancer nodes:

[root@haproxy-hb1 zabbix_agentd.d]# /etc/zabbix/scripts/check_heartbeat_lines.sh
haproxy2-hb1 1
haproxy2-hb2 1

[root@haproxy-hb2 zabbix_agentd.d]# /etc/zabbix/scripts/check_heartbeat_lines.sh
haproxy1-hb1 1
haproxy1-hb2 1


A status of 1 returned by the script means the defined remote haproxy node is reachable; 0 means the ping command did not receive any ICMP echo replies back.

4. Restart the zabbix-agent on both cluster node machines that will be conducting the ICMP ping check

[root@haproxy zabbix_agentd.d]# systemctl restart zabbix-agentd
[root@haproxy zabbix_agentd.d]# systemctl status zabbix-agentd

[root@haproxy zabbix_agentd.d]# tail -n 100 /var/log/zabbix_agentd.log
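Before creating the items in the web interface it might be worth testing the new key end to end. A quick sanity check could be done with zabbix_get from the Zabbix server / proxy (assuming the zabbix-get package is installed there and the agent port 10050 is reachable; replace haproxy1 with your node's address), or locally with the agent binary itself:

# from the Zabbix server / proxy
zabbix_get -s haproxy1 -k heartbeat.check

# or locally on the load balancer node
zabbix_agentd -t heartbeat.check

Both should print the two "hostname 0/1" lines produced by the script.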


5. Create Item to process the userparam script

Create Item as follows:

6. Create the Dependent Item required
 

[Screenshot: heartbeat line1 dependent item preprocessing]

For the preprocessing of the dependent item you need to add the following simple regular expression step:

Name: Regular expression
Parameters (pattern): hb1(\s+)(\d+)
Output: \2
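To visualize what this preprocessing step extracts, here is a small shell simulation of the master item value being run through an equivalent regex (just an illustration, assuming GNU grep with PCRE support; Zabbix does the matching internally):

printf 'haproxy2-hb1 1\nhaproxy2-hb2 1\n' | grep -oP 'hb1\s+\K\d+'
# prints: 1  (the \2 capture group, i.e. the reachability status of heartbeat line 1)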

[Screenshot: heartbeat line2 dependent item preprocessing]

[Screenshot: heartbeat lines triggers]

 

7. Create triggers that will be generating the Alert

Create the required triggers as well

[Screenshot: heartbeat2 line trigger]
The main thing to configure here in Zabbix is the below trigger expression:

Expression: {FQDN:heartbeat2.last()}<1

[Screenshot: heartbeat1 trigger]

You can further configure Zabbix alerts to mail you or to send alarms via Slack / Mattermost or Teams in case of problems.

How to extend a full LVM partition to a bigger size on a Linux Virtual Machine Guest running in VMware vSphere

Tuesday, September 20th, 2022

[Image: LVM physical volume / volume group / logical volume layout on a Linux virtual machine in VMware]

Let's say you have to resize a partition that was wrongly sized by some kind of automation like Ansible or Puppet, because the Linux RHEL-family OS template on the VMware vSphere hypervisor hosting the guest Linux VM was prepared with a /home (or another partition) of some very small size, and the partition quickly ran out of space.

To resolve this, two questions come up for the sysadmin:

I. How to extend the LVM partition that ran out of space (without rebooting the VM Guest Linux host).

II. How to add new disk partition space from the vSphere hypervisor to the guest OS.

In the article below I'll shortly describe the trivial steps to take to achieve that. The article won't show anything new or original, but I wrote it because I want to have it logged for myself in case I need to extend the LVM space of my own virtual machines, and because hopefully it might be of help to someone else from the Linux community who has to complete the same task.
 

I. Extending an LVM partition that ran out of space on a Linux Guest VM
 
1. Check the current size of the partition that you want to extend

[root@linux-hostname home]# df -h /home/
 Filesystem            Size  Used Avail Use% Mounted on
 /dev/mapper/vg00-home
                       4.7G  4.5G     0 100% /home

2. Check the Virtualization platform

[root@vm-hostname ~]# lshw |head -3
linux-hostname
    description: Computer
    product: VMware Virtual Platform

3. Check the Operating System Linux OS type and version 

In this specific case it is a somewhat old RedHat-like CentOS 6.9 Linux.
 

[root@vmware-host ~]# cat /etc/*release*
CentOS release 6.9 (Final)
CentOS release 6.9 (Final)
CentOS release 6.9 (Final)
cpe:/o:centos:linux:6:GA

4. Find out the type of the target filesystem (EXT3, EXT4, XFS, etc.)

[root@vm-hostname ~]# grep home /proc/mounts
/dev/mapper/vg00-home /home ext3 rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0


Filesystem is handled by LVM thus

5. Check the size of the LVM volume we want to extend

[root@vm-hostname ~]# lvs |grep home
home vg00 -wi-ao---- 5.00g

6. Check whether free space is available in the volume group

[root@vm-hostname ~]# vgdisplay vg00
  --- Volume group ---
  VG Name               vg00
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  15
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                10
  Open LV               10
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               128.99 GiB
  PE Size               4.00 MiB
  Total PE              33022
  Alloc PE / Size       30976 / 121.00 GiB
  Free  PE / Size       2046 / 7.99 GiB
  VG UUID               1F89PB-nIP2-7Hgu-zEVR-5H0R-7GdB-Lfj7t4


Extend the VMware space configured for the additional hard disk on the hypervisor (if necessary)

In order to extend the LVM you of course need a pre-existing additional hard drive attached to the VM (sdb, sdc, etc.).

– If you need to extend it on the VMware vSphere hypervisor:
Extend the additional hard drive by entering the new size and validate.

If you have previously extended the size of the virtual disk from VMware, then to make the Linux guest VM find out about the change you have to rescan the respective device that was grown on the HV.

7. Rescan on Linux VM host for changes in disk size from Hypervisor

Rescan disk for new size :

[root@vm-hostname ~]# echo 1 > /sys/block/sdX/device/rescan

(where sdX is the extended additional hard drive)

8. Resize LVM physical volume

[root@vm-hostname ~]# pvresize /dev/sdX

9. Enlarge Logical Volume size 

[root@vm-hostname ~]# lvextend -L+5G /dev/mapper/vg00-home
     Extending logical volume LogVol00 to 10.77 GiB
     Logical volume LogVol00 successfully resized
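As a side note, on reasonably recent LVM versions steps 9 and 10 can be merged by passing -r (--resizefs) to lvextend, which calls the matching filesystem grow tool for you. A sketch with the same volume as above:

[root@vm-hostname ~]# lvextend -r -L +5G /dev/mapper/vg00-home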

10. Enlarge LVM hosted filesystem size

Filesystem is ext3 or ext4 :

[root@vm-hostname ~]# resize2fs /dev/mapper/vg00-home

– If the filesystem is not ext3 / ext4 but XFS you have to use xfs_growfs to let the FS know about the change.

Filesystem is XFS :
 

[root@vm-hostname ~]# xfs_growfs /dev/mapper/vg00-home

11. Check the additional filespace is already active on the Linux Guest VM

[root@vm-hostname ~]# df -h /home/
 Filesystem            Size  Used Avail Use% Mounted on
 /dev/mapper/vg_cloud-LogVol00
                        10G  4.2G  4.9G  48% /home


12. Verify the filesystem extension completed without errors


Check of system log:

[root@vm-hostname ~]# grep -i error /var/log/messages

Check if filesystem is writable.

[root@vm-hostname ~]# touch /home/test

[root@vm-hostname ~]# ls -al /home/test
-rw-r----- 1 root root 0 Sep 20 13:39 /home/test
[root@vm-hostname ~]# rm -f /home/test


II. How to add an additional drive (let's say sdb) to a Linux host from the vSphere HV


1. On the vSphere GUI interface

-> Select New hard drive and click Add

Enter the desired size for the new disk, then expand the disk parameters and choose Thin provision. Validate and apply the recommendations.

[Image: basic LVM volume group creation diagram on Linux]

2. On the Linux VM guest, detect the newly added sdb available space

Discover new disk :

[root@vm-hostname ~]# echo "- - -" > /sys/class/scsi_host/host2/scan && echo "- - -" > /sys/class/scsi_host/host1/scan && echo "- - -" > /sys/class/scsi_host/host0/scan
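The number of hostX entries differs from machine to machine, so instead of listing host0, host1, host2 by hand you could loop over whatever is present (a small sketch doing the same thing as the chained command above):

[root@vm-hostname ~]# for h in /sys/class/scsi_host/host*; do echo "- - -" > "$h/scan"; done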

See  if discovered disk is found in /var/log/messages :

[…]
Nov 8 17:33:26 bict4004s kernel: scsi 2:0:2:0: Direct-Access VMware Virtual disk 1.0 PQ: 0 ANSI: 2
Nov 8 17:33:26 bict4004s kernel: scsi target2:0:2: Beginning Domain Validation
Nov 8 17:33:26 bict4004s kernel: scsi target2:0:2: Domain Validation skipping write tests
Nov 8 17:33:26 bict4004s kernel: scsi target2:0:2: Ending Domain Validation
Nov 8 17:33:26 bict4004s kernel: scsi target2:0:2: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
Nov 8 17:33:26 bict4004s kernel: sd 2:0:2:0: Attached scsi generic sg3 type 0
Nov 8 17:33:26 bict4004s kernel: sd 2:0:2:0: [sdb] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
Nov 8 17:33:26 bict4004s kernel: sd 2:0:2:0: [sdb] Write Protect is off
Nov 8 17:33:26 bict4004s kernel: sd 2:0:2:0: [sdb] Cache data unavailable
Nov 8 17:33:26 bict4004s kernel: sd 2:0:2:0: [sdb] Assuming drive cache: write through
Nov 8 17:33:26 bict4004s kernel: sd 2:0:2:0: [sdb] Attached SCSI disk
[…]

3. Create new LVM Physical Volume

[root@vm-hostname ~]# pvcreate /dev/sdb

4. Enlarge LVM Volume Group to the max available size of /dev/sdb

[root@vm-hostname ~]# vgextend vg00 /dev/sdb

Enlarge LVM Logical Volume

[root@vm-hostname ~]# lvextend -L+10G /dev/mapper/vg00-home

5. Enlarge the filesystem to the max size of the just-extended LVM volume

If Filesystem is ext3 or ext4 :

[root@vm-hostname ~]# resize2fs /dev/mapper/vg00-home


Again if we work with XFS additionally do:

[root@vm-hostname ~]# xfs_growfs /dev/mapper/vg00-home

6. Check that the filesystem extension completed correctly

 [root@vm-hostname ~]# df -h /home


7. Check the filesystem is writable and no errors were produced in the logs

Check of system log:

[root@vm-hostname ~]# grep -i error /var/log/messages


Check if filesystem is writable.

[root@vm-hostname ~]# touch /home/test

How to configure multiple haproxies and frontends to log to separate log files via rsyslog

Monday, September 5th, 2022

[Image: logging multiple haproxy instances and frontends to separate files via rsyslog, diagram]
In my last article, How to create multiple haproxy instance separate processes for different configuration listeners, I shortly explained how to create multiple haproxy instances by cloning the default haproxy.service systemd unit and the haproxy.cfg into haproxyX.cfg.
But what if you also need to configure separate logging for both the haproxy.service and haproxy-customname.service instances? How can this be achieved?

The simplest way is to use one of the syslog local facilities, starting from local0 up to local7. As local1, local2 and local3 are usually used by system services, a good local facility to start with would be at least local4.
Let's say we already have the 2 haproxies running, e.g.:

[root@haproxy2:/usr/lib/systemd/system ]# ps -ef|grep -i hapro|grep -v grep
root      128464       1  0 Aug11 ?        00:01:19 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock
haproxy   128466  128464  0 Aug11 ?        00:49:29 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock

root      346637       1  0 13:15 ?        00:00:00 /usr/sbin/haproxy-customname-wrapper -Ws -f /etc/haproxy/haproxy_customname_prod.cfg -p /run/haproxy_customname_prod.pid -S /run/haproxy-customname-master.sock
haproxy   346639  346637  0 13:15 ?        00:00:00 /usr/sbin/haproxy-customname-wrapper -Ws -f /etc/haproxy/haproxy_customname_prod.cfg -p /run/haproxy_customname_prod.pid -S /run/haproxy-customname-master.sock


1. Configure local messaging handlers to work via /dev/log inside both haproxy instance config files
 

To configure the separate logging we need to have the respective facility lines in /etc/haproxy/haproxy.cfg and in /etc/haproxy/haproxy_customname_prod.cfg.

To log to separate files you should already have configured in /etc/haproxy/haproxy.cfg something like:

 

global
        stats socket /var/run/haproxy/haproxy.sock mode 0600 level admin #Creates Unix-Like socket to fetch stats
        log /dev/log    local0
        log /dev/log    local1 notice

#       nbproc 1
#       nbthread 2
#       cpu-map auto:1/1-2 0-1
        nbproc          1
        nbthread 2
        cpu-map         1 0
        cpu-map         2 1
        chroot /var/lib/haproxy
        user haproxy
        group haproxy
        daemon
        maxconn 99999

defaults
        log     global
        mode    tcp


        timeout connect 5000
        timeout client 30s
        timeout server 10s

    timeout queue 5s
    timeout tunnel 2m
    timeout client-fin 1s
    timeout server-fin 1s

    option forwardfor
        maxconn 3000
    retries                 15

frontend http-in
        mode tcp

        option tcplog
        log global

 

        option logasap
        option forwardfor
        bind 0.0.0.0:80

default_backend webservers_http
backend webservers_http
    fullconn 20000
        balance source
stick match src
    stick-table type ip size 200k expire 30m

        server server-1 192.168.1.50:80 check send-proxy weight 255 backup
        server server-2 192.168.1.54:80 check send-proxy weight 254
        server server-3 192.168.0.219:80 check send-proxy weight 252 backup
        server server-4 192.168.0.210:80 check send-proxy weight 253 backup
        server server-5 192.168.0.5:80 maxconn 3000 check send-proxy weight 251 backup

For the second /etc/haproxy/haproxy_customname_prod.cfg the logging configuration should be similar to:
 

global
        stats socket /var/run/haproxy/haproxycustname.sock mode 0600 level admin #Creates Unix-Like socket to fetch stats
        log /dev/log    local5
        log /dev/log    local5 notice

#       nbproc 1
#       nbthread 2
#       cpu-map auto:1/1-2 0-1
        nbproc          1
        nbthread 2
        cpu-map         1 0
        cpu-map         2 1
        chroot /var/lib/haproxy
        user haproxy
        group haproxy
        daemon
        maxconn 99999

defaults
        log     global
        mode    tcp

 

2. Configure separate haproxy Frontend logging via local5 inside haproxy.cfg
 

As a minimum you need a configuration for frontend like:

 

frontend http-in
        mode tcp

        option tcplog
        log /dev/log    local5 debug
…..
….

..
.

Of course the mode tcp is specific to my case; you might be using mode http, etc.


3. Optionally (but preferably) make the local5 / local6 facilities work via rsyslog's UDP imudp module

 

In the example above /dev/log is read directly by haproxy instead of the messages being sent first to rsyslog. This is a good thing in case you have doubts that rsyslog might stop working and you would respectively end up with no logging at all. However, if you prefer to use rsyslog instead, as most people usually do, you will have to use in /etc/haproxy/haproxy.cfg a config like:

global
    log          127.0.0.1 local6 debug

defaults
        log     global
        mode    tcp

And for /etc/haproxy/haproxy_customname_prod.cfg a config like:

global
    log          127.0.0.1 local5 debug

defaults
        log     global
        mode    tcp

If you're about to send the haproxy logs via rsyslog rather than directly via /dev/log, rsyslog should have the imudp module enabled in /etc/rsyslog.conf:

# provides UDP syslog reception
module(load="imudp")
input(type="imudp" port="514")

 

4. Prepare the first and second log files plus the custom frontend output file, and set the right permissions
 

Presumably you already have /var/log/haproxy.log, and this will remain the initial haproxy log if you don't want to change it. Normally it is created at haproxy package install time on Linux and should have permissions like the following:

root@haproxy2:/etc/rsyslog.d# ls -al /var/log/haproxy.log
-rw-r--r-- 1 haproxy haproxy 6681522 Sep  1 16:05 /var/log/haproxy.log


To create the second log file with exactly the same permissions as haproxy.log, run:

root@haproxy2:/etc/rsyslog.d# touch /var/log/haproxy_customname.log
root@haproxy2:/etc/rsyslog.d# chown haproxy:haproxy /var/log/haproxy_customname.log

Create the haproxy_custom_frontend.log file that will only keep the output of the exact frontend (or of a matched string from the logs):
 

root@haproxy2:/etc/rsyslog.d# touch  /var/log/haproxy_custom_frontend.log
root@haproxy2:/etc/rsyslog.d# chown haproxy:haproxy  /var/log/haproxy_custom_frontend.log


5. Create the rsyslog config for haproxy.service to log via local6 to /var/log/haproxy.log
 

root@haproxy2:/etc/rsyslog.d# cat 49-haproxy.conf
# Create an additional socket in haproxy's chroot in order to allow logging via
# /dev/log to chroot'ed HAProxy processes
$AddUnixListenSocket /var/lib/haproxy/dev/log

# Send HAProxy messages to a dedicated logfile
:programname, startswith, "haproxy" {
  /var/log/haproxy.log
  stop
}

 

The above config will make anything logged to /dev/log with a program name starting with the string haproxy (e.g. the process /usr/sbin/haproxy) be written into /var/log/haproxy.log and then trigger a stop. By the way, the stop statement works exactly like the old tilde '~' discard action, except that in newer rsyslog versions the '~' is obsolete and you are supposed to use stop instead (bear in mind that '~', even though obsolete, proved to be working for me whereas stop did not, for example on the latest Debian Linux 11 as of September 2022 with haproxy package 2.2.9-2+deb11u3; such is the Linux mess sometimes).
 

6. Create the rsyslog configuration to log the second haproxy instance, outputting to local5, into /var/log/haproxy_customname.log
 

root@haproxy2:/etc/rsyslog.d# cat 48-haproxy.conf
# Create an additional socket in haproxy's chroot in order to allow logging via
# /dev/log to chroot'ed HAProxy processes
$AddUnixListenSocket /var/lib/haproxy/dev/log

# Send HAProxy messages to a dedicated logfile
#:programname, startswith, "haproxy" {
#  /var/log/haproxy.log
#  stop
#}
# GGE/DPA 2022/08/02: HAProxy logs to local2, save the messages
local5.*                                                /var/log/haproxy_customname.log
 


You might also explicitly filter on the binary that will be providing the logs inside 48-haproxy.conf. As we have a separate /usr/sbin/haproxy-customname-wrapper, that way you can log the output from that haproxy instance only, based on its binary command, and you can omit writing to local5 so it stays free to log something else 🙂

root@haproxy2:/etc/rsyslog.d# cat 48-haproxy.conf
# Create an additional socket in haproxy's chroot in order to allow logging via
# /dev/log to chroot'ed HAProxy processes
$AddUnixListenSocket /var/lib/haproxy/dev/log

# Send HAProxy messages to a dedicated logfile
#:programname, startswith, "haproxy" {
#  /var/log/haproxy.log
#  stop
#}
# GGE/DPA 2022/08/02: HAProxy logs to local2, save the messages

:programname, startswith, "haproxy-customname-wrapper " {
 
/var/log/haproxy_customname.log
  stop
}

 

7. Create the log file for the custom frontend of your preference, e.g. /var/log/haproxy_custom_frontend.log, and prepare the rsyslog config for it
 

root@haproxy2:/etc/rsyslog.d# cat 47-haproxy-custom-frontend.conf
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
#2022/02/02: HAProxy logs to local6, save the messages
local4.*                                                /var/log/haproxy_custom_frontend.log
:msg, contains, "https-in" ~

The 'https-in' is my frontend inside /etc/haproxy/haproxy.cfg, and its name shows up in every line it writes to /var/log/haproxy.log. Therefore I log the frontend to its own local facility, and to prevent double logging of connections towards the same frontend inside /var/log/haproxy.log, I use the tilde symbol '~', which instructs rsyslog to discard any incoming message containing the "https-in" string right after the frontend operations have been written to the dedicated log file.


!!! Note that for rsyslog it is very important to have the right order of configurations; the configuration order is determined by the file numbering. !!!
 

Hence notice that my filter file numbered 47-* precedes the other two configured rsyslog configs.
 

root@haproxy2:/etc/rsyslog.d# ls -1
47-haproxy-custom-frontend.conf
48-haproxy.conf
49-haproxy.conf

This will make 47-haproxy-custom-frontend.conf be read and processed first, 48-haproxy.conf second and 49-haproxy.conf third.


8. Reload rsyslog and haproxy and test

 

root@haproxy2: ~# systemctl restart rsyslog
root@haproxy2: ~# systemctl restart haproxy
root@haproxy2: ~# systemctl status rsyslog

● rsyslog.service – System Logging Service
     Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-09-01 17:34:51 EEST; 1s ago
TriggeredBy: ● syslog.socket
       Docs: man:rsyslogd(8)
             man:rsyslog.conf(5)
             https://www.rsyslog.com/doc/
   Main PID: 372726 (rsyslogd)
      Tasks: 6 (limit: 4654)
     Memory: 980.0K
        CPU: 8ms
     CGroup: /system.slice/rsyslog.service
             └─372726 /usr/sbin/rsyslogd -n -iNONE

Sep 01 17:34:51 haproxy2 systemd[1]: Stopped System Logging Service.
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: warning: ~ action is deprecated, consider using the 'stop' statement instead [v8.210>
Sep 01 17:34:51 haproxy2 systemd[1]: Starting System Logging Service…
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: [198B blob data]
Sep 01 17:34:51 haproxy2 systemd[1]: Started System Logging Service.
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: [198B blob data]
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: [198B blob data]
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: [198B blob data]
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [>
Sep 01 17:34:51 haproxy2 rsyslogd[372726]: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="372726" x-info="https://www.

Do some testing with some tool like curl / wget / lynx / elinks etc. on each of the configured haproxy listeners and frontends and check whether everything ends up in the correct log files.
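For instance, a minimal check could look like the one below (a sketch; it assumes the frontend is bound on port 80 of the balancer itself, so adjust the address and port to your listeners):

root@haproxy2: ~# curl -sS -o /dev/null http://127.0.0.1:80/
root@haproxy2: ~# tail -n 3 /var/log/haproxy.log
root@haproxy2: ~# tail -n 3 /var/log/haproxy_customname.log
root@haproxy2: ~# tail -n 3 /var/log/haproxy_custom_frontend.log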
That's all folks enjoy ! 🙂
 

How to fresh upgrade a mistakenly installed 32-bit Windows 10 Professional to 64-bit Windows / A failure to disk clone an old 120GB SSD to a 512GB drive due to a failed Solid State Drive

Wednesday, November 17th, 2021

[Image: upgrade Windows 10 32-bit to 64-bit howto]

I've been setting up a somewhat old PC with a Windows OS, an 11-year-old Lenovo ThinkCentre model M90P with 8 GB of memory, an Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz and an Intel Q57 Express chipset. The machine came to me with Windows 7 preinstalled, and the initial goal was to migrate Windows as it is, with its data, from the old 120GB SSD to a new 512GB SSD, and then to keep the machine at least a bit more up to date by upgrading the old Windows 7 to Windows 10.

This, as usual, seemed like a very trivial task for a system administrator, and even for someone like me who hasn't touched much Windows it looked like a piece of cake. However, as always with computers, once you think you'll be done in 2 hours it usually takes 20+. Some call it Murphy's law: "If something can go wrong, it will go wrong." But thinking "that's easy, let's just do it" is a kind of proud thought for a man, and to save us from this passion of pride, which according to the Church fathers is the worst passion one can have, and to humble us a bit,

God allows some unforeseen stuff to happen 🙂 So the case with this machine, where my original idea was simply "I'll duplicate the old hard drive to the new one and place the new one in the ThinkCentre, not a big deal", turned into a small adventure 🙂

For this machine's hardware I have to say the old English saying "old but gold" is pretty true, especially after I attached the Samsung 512GB NVMe SSD drive, which my dear friend and brother in Christ "Uncle Emilian" had received as a gift from another friend called Angel. To add even more to the rant, the name Emilian stems from the Greek Emilianos, which translated to English means Adversary.. But anyway, the old Intel 120 GB SSD drive, besides being already completely full of data, turned out to have memory DATA chips that had perhaps burned out or worn out, so parts of the drive were unreadable.
I realized the faulty SSD fact after trying first to clone the drives with my hardware disk clone device, an Orico Dual Bay 2.5" 6629US3-C, and then using a simple bit-to-bit copy with the dd command.

[Image: Orico 6629US3-C dual bay USB3 2.5" / 3.5" SATA dock]


At first, for some weird reason, the cloning of the 120GB SSD towards the newer 512 GB one was unsuccessful: one of the 2 lamp indicators on the Source and Destination drives was continuously blinking orange, as it seemed data could not be read, even though I tried a few times and waited about 1 hour for the cloning to complete. So at first I suspected it might be an issue with my disk clone hardware device bought last year. I then attached the 2 hard drives to my Debian GNU / Linux 10 as USB drives using the "toaster" device and tried a classical bit copy from the terminal with dd, e.g.:


# dd if=/dev/sdb2 of=/dev/sdc2 bs=180M status=progress conv=noerror,sync

 
dd: error reading '/dev/sdb2': Input/output error
1074889+17746 records in
1092635+0 records out
559429120 bytes (559 MB, 534 MiB) copied, 502933 s, 1.1 kB/s
dd: writing to '/dev/sdc2': Input/output error
1074889+17747 records in
1092635+0 records out
559429120 bytes (559 MB, 534 MiB) copied, 502933 s, 1.1 kB/s

Finally I did a manual copy of files from /dev/sdb2 to /dev/sdc2 with rsync, and part of the files managed to be successfully copied; about 55 Gigabytes out of 110 made it. Luckily the data on the broken Intel 320 Series 120GB was not top secret stuff, so losing some bits wasn't the end of the world 🙂
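For reference, the salvage copy was nothing fancy, roughly something along these lines (a sketch; the mount points are arbitrary examples, and rsync simply reports the files it could not read and carries on with the rest):

# mount /dev/sdb2 /mnt/old-ssd
# mount /dev/sdc2 /mnt/new-ssd
# rsync -avhP /mnt/old-ssd/ /mnt/new-ssd/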

Next, I removed the broken 120GB SSD, which was perhaps at least 9+ years old, attached the new drive to the Lenovo ThinkCentre, and since my dear friend wanted to have Windows again (his computer has a Microsoft "Certificate of Authenticity", e.g. the OEM registration serial key for Windows 7):

[Image: Lenovo ThinkCentre M90p Certificate of Authenticity]

I jumped in and used some old USB flash stick drive to install Windows 7 again (in order to use the same active license), and from there on I used another old Windows 10 installation bootable stick of mine to upgrade the Windows 7 to Windows 10 (by using this Win 7 to Win 10 upgrade trick it is possible to keep using your old Windows 7 license key on Windows 10). So far so good; now I had Windows 10 Professional Edition installed on the machine, but I faced another issue: out of the machine's 8GB of memory only 3.22 GB got detected, for some weird reason.

[Screenshot: Windows 10 reporting only part of the installed RAM as usable, caused by the 32-bit install]

After a few minutes of investigation online, I realized I had installed by mistake a 32-bit version of Windows 10 Pro… So the next step was of course to upgrade to 64-bit to work around the unrecognized memory… To make sure my Windows 10 installation is up to date, I downloaded the latest image with the Media Creation Tool from Microsoft's website, used the tool to burn the downloaded image to an empty USB stick (mine is 16GB but the minimum required would be 4GB), and proceeded to reboot the Lenovo desktop machine and boot from the Windows 10 install flash drive. From there on I had to select that I need to install a 64-bit version of Windows and skip the licensing key prompt twice (acting as if I have no license), as Windows already recognizes the older OEM-installed 32-bit Windows key and automatically fetches the key from there.

Before proceeding to install the 64-bit Windows, of course double check that the machine you have at hand already has its license key recognized by Microsoft and is 64-bit capable:

To check that the 32-bit version of Windows is properly licensed before the attempted upgrade:

Settings > Update & security > Activation

[Screenshot: Settings > Update & security > Activation, checking if Windows is already activated]

 

To check whether the hardware is 64-bit capable:

Settings -> System -> About

 

[Screenshot: Settings > System > About showing whether the processor is 64-bit capable]

32-bit Windows on an x64-based processor (the machine supports a 64-bit OS)

 

[Screenshot: Windows 10 Media Creation Tool installation media step]

Windows 10 Media Creation Tool (make sure you select 64-bit (x64) instead of the default)

From the installer, I installed Windows just like I would install a brand new fresh Windows OS, and after answering the few trivial installation program questions I landed in the new working OS and proceeded to install the usual software that is a must-have on a freshly installed Windows. For some of them check my previous article, Essential must-have software to install on a fresh new Windows installation host.

Fix Out of inodes on Postfix Linux Mail Cluster. How to clean up filesystem running out of Inodes, Filesystem inodes on partition is 100% full

Wednesday, August 25th, 2021

[Image: inode entry / inode table content]

Recently we faced a strange issue with one of our clustered Postfix mail servers (the cluster has 2 nodes, each running a Postfix daemon mail server on an OpenVZ virtualized environment, plus a heartbeat that checks the liveability of the cluster and switches nodes in case one of the two gets broken for some reason), pretty much a standard SMTP cluster.

So far so good, but since the cluster is kind of abandoned, pretty much legacy nowadays and used just for some monitoring emails from different scripts and systems on servers, it was not really checked thoroughly for years, and logically, all of a sudden the alarming email content sent via the cluster stopped working.

The normal sysadmin job here was to analyze what is going on with the cluster and fix it ASAP. After some very basic analysis we caught the problem: it was caused by an "inodes full" condition (100% of the available inodes were occupied), e.g. the file system ran out of inodes on both machines, apparently due to a pengine heartbeat process bug leading to a huge number of .bz2 pengine recovery archive files stored in /var/lib/pengine.

Below are the few steps taken to analyze and fix the problem.
 

1. Finding out that the system ran out of inodes


After logging on to the system and not immediately finding anything wrong with the inodes, all I could see from crm_mon was that the cluster was broken.
Plenty of emails were left inside the postfix mail queue, visible with a standard command:

[root@smtp1: ~ ]# postqueue -p

It took me a while to find out that the problem is with the inodes, because a simple df -h was showing that the systems have enough space, yet the cluster quorum was still not complete.
A bit of further investigation led me to a simple df -i, reporting that the number of inodes on the local filesystems of both our SMTP1 and SMTP2 had all been used up.

[root@smtp1: ~ ]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/simfs            500000   500000  0   100% /
none                   65536      61   65475    1% /dev

As you can see, the number of inodes on the Virtual Machine is unfortunately depleted.

The next step was to check which directories occupy most of the inodes, as that is the place from which files can be temporarily moved to a remote server filesystem or to another partition with free space on locally attached drives.
The command below gives an ordered list of the directories under the root filesystem / together with their respective number of files / inodes;
the more files under a directory, the more inodes are being occupied by the files on the filesystem.

 

[Image: Linux inode diagram, finding out which filesystem or directory is eating up all your system inodes]
1.1 Getting which directory consumes most of the inodes on the systems

 

[root@smtp1: ~ ]# { find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n; } 2>/dev/null
….
…..

…….
    586 /usr/lib64/python2.4
    664 /usr/lib64
    671 /usr/share/man/man8
    860 /usr/bin
   1006 /usr/share/man/man1
   1124 /usr/share/man/man3p
   1246 /var/lib/Pegasus/prev_repository_2009-03-10-1236698426.308128000.rpmsave/root#cimv2/classes
   1246 /var/lib/Pegasus/prev_repository_2009-05-18-1242636104.524113000.rpmsave/root#cimv2/classes
   1246 /var/lib/Pegasus/prev_repository_2009-11-06-1257494054.380244000.rpmsave/root#cimv2/classes
   1246 /var/lib/Pegasus/prev_repository_2010-08-04-1280907760.750543000.rpmsave/root#cimv2/classes
   1381 /var/lib/Pegasus/prev_repository_2010-11-15-1289811714.398469000.rpmsave/root#cimv2/classes
   1381 /var/lib/Pegasus/prev_repository_2012-03-19-1332151633.572875000.rpmsave/root#cimv2/classes
   1398 /var/lib/Pegasus/repository/root#cimv2/classes
   1696 /usr/share/man/man3
   400816 /var/lib/pengine

Note, the above command orders the directories from bottom to top, and obviously the bottleneck directory that is over-eating filesystem inodes with an exceeding amount of files is
/var/lib/pengine
 

2. Back up the old multitude of files, just in case something goes wrong with the cluster after some files are wiped out


The next logical step of course is to check what is going on inside /var/lib/pengine, just to find that a very, very large amount of pe-input-*NUMBER*.bz2 files had suddenly been produced.

 

[root@smtp1: ~ ]# ls -1 pe-input*.bz2 | wc -l
 400816


The files are produced by the pengine process, one of the processes controlling the heartbeat cluster state, presumably this running process:

[root@smtp1: ~ ]# ps -ef|grep -i pengine
24        5649  5521  0 Aug10 ?        00:00:26 /usr/lib64/heartbeat/pengine


Hence, in order to fix the issue and to prevent inconsistencies in the cluster due to the file deletion, I copied the whole directory to another mounted partition (you can mount one remotely with sshfs, for example) or use a local one if you have it:

[root@smtp1: ~ ]# cp -rpf /var/lib/pengine /mnt/attached_storage


and proceeded to clean up the old multitude of files that are older than 2 years (720 days):


3. Clean  up /var/lib/pengine files that are older than two years with short loop and find command

 


First I made a list of all the files to be removed in an external text file and quickly reviewed it by paging through it with less, like so:

[root@smtp1: ~ ]#  cd /var/lib/pengine
[root@smtp1: ~ ]# find . -type f -mtime +720 | grep -v pe-error.last | grep -v pe-input.last | grep -v pe-warn.last > /home/myuser/pengine_older_than_720days.txt
[root@smtp1: ~ ]# less /home/myuser/pengine_older_than_720days.txt


Once the list was reviewed, I used the command below; it only echoes each file older than 2 years that is different from pe-error.last / pe-input.last / pe-warn.last (which might be needed for proper cluster operation), so you can review once more what would be removed and then simply replace echo with rm -f to actually delete the files.

[root@smtp1: ~ ]#  for i in $(find . -type f -mtime +720 -exec echo '{}' \;|grep -v pe-error.last | grep -v pe-input.last |grep -v pe-warn.last); do echo $i; done
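An equivalent one-shot alternative is to let find delete the matches itself while excluding the three .last files (a sketch; double-check you're inside /var/lib/pengine before running it, as -delete is unforgiving):

[root@smtp1: ~ ]# cd /var/lib/pengine
[root@smtp1: ~ ]# find . -type f -mtime +720 ! -name 'pe-error.last' ! -name 'pe-input.last' ! -name 'pe-warn.last' -delete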


Another approach to the situation is to simply review all the files inside /var/lib/pengine and delete files based on year of creation, for example to delete all files in /var/lib/pengine from 2010, you can run something like:
 

[root@smtp1: ~ ]# for i in $(ls -al|grep -i ' 2010 ' | awk '{ print $9 }' |grep -v 'pe-warn.last'); do rm -f $i; done


4. Monitor real time inodes freeing

While doing the clearance of the old unnecessary pengine heartbeat archives, you can open another SSH console to the server and watch how the inodes get freed up with a command like:

 

# check if inodes is not being rapidly decreased

[root@csmtp1: ~ ]# watch 'df -i'


5. Restart the basic Linux services producing pid files, logs, etc., to make them work again (some services might not notice that inodes on the hard drive were freed up)

Because the filesystem on the system had been full, some services started misbehaving and logging to /var/log was impacted, so I had to restart them as well. In our case this is the heartbeat itself,
which checks the cluster nodes' availability, as well as the logging daemon service rsyslog:

 

# restart rsyslog and heartbeat services
[root@csmtp1: ~ ]# /etc/init.d/heartbeat restart
[root@csmtp1: ~ ]# /etc/init.d/rsyslog restart

The systems also had a legacy data integrity service, samhain, so I had to restart this service as well to force the /var/log/samhain log file to again continuously write data to the HDD.

# Restart samhain service init script 
[root@csmtp1: ~ ]# /etc/init.d/samhain restart


6. Check up enough inodes are freed up with df

[root@smtp1 log]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/simfs 500000 410531 19469 91% /
none 65536 61 65475 1% /dev


I had to repeat the same process on the second Postfix cluster node smtp2, and after going through the same steps as above and checking the status of the smtp2 node and the postfix queue, the second smtp2 cluster member was back to its expected state 🙂

 

7. Check the cluster node quorum is complete, e.g. postfix cluster is operating normally

 

# Test if the email cluster is OK with the pacemaker cluster resource manager crm_mon
 

[root@csmtp1: ~ ]# crm_mon -1
============
Last updated: Tue Aug 10 18:10:48 2021
Stack: Heartbeat
Current DC: smtp2.fqdn.com (bfb3d029-89a8-41f6-a9f0-52d377cacd83) – partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
4 Resources configured.
============

Online: [ smtp2.fqdn.com smtp1.fqdn.com ]

failover-ip (ocf::heartbeat:IPaddr2): Started csmtp1.ikossvan.de
Clone Set: postfix_clone
Started: [ smtp2.fqdn.com smtp1.fqdn.com ]
Clone Set: pingd_clone
Started: [ smtp2.fqdn.com smtp1.fqdn.com ]
Clone Set: mailto_clone
Started: [ smtp2.fqdn.com smtp1.fqdn.com ]

 

8. Force resending the few hundred thousand emails left in the email queue


After some inodes got freed up due to the file deletion, I forced a couple of times the queued mails to be immediately resent to the remote mail destinations with the command:

 

# force emails in queue to be resend with postfix

[root@smtp1: ~ ]# sendmail -q
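The sendmail binary here is only Postfix's sendmail-compatibility interface; the native way to flush the deferred queue is:

# force delivery of all queued mail with the native postfix tool
[root@smtp1: ~ ]# postqueue -f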


– It was useful to watch in real time how the number of queued emails quickly decreases (queued mails successfully sent to the destination addresses) with:

 

# Monitor the decreasing size of the email queue
[root@smtp1: ~ ]# watch 'postqueue -p|grep -i '@'|wc -l'

Set all logs to log to physical console /dev/tty12 (tty12) on Linux

Wednesday, August 12th, 2020

[Image: Linux tty logo, how to log everything to the last console terminal tty12]

Those who have administered servers since the early days of Linux, and who have actively used GNU / Linux or any other UNIX over the years, know how practical it can be to configure logging of all running services / kernel messages / errors and warnings to a physical console.

Traditionally, from the days I was learning Linux basics, I was shown how to do this on an old Debian Sarge 3.0 Linux without systemd, and on all the Linux distributions I've used over the years, Redhat 9.0 / Calderas and Mandrakes, either as home systems or for servers. I've always configured the output of all messages to go to the last, easy-to-access console /dev/tty12 (for those who have never used it, console switching under Linux in plain text console mode is done with the key combination CTRL + ALT + F1 .. F12).

In recent times, however, with the introduction of systemd, things changed quite a bit, as consoles are no longer handled by /etc/inittab, which was used to add and refresh the physical consoles tty1, tty2 … tty7 (the default number added on Linux was usually 7); back then I had to manually include one more respawn line per extra console in /etc/inittab.
Nowadays, as of year 2020, in Linux distros /etc/inittab is no longer there, being obsoleted, and the console print-out of INPUT / OUTPUT messages is handled by systemd.
 

1. Enable Physical TTYs from TTY8 till TTY12 etc.


The number of default consoles existing in most Linux distributions I've seen is still from tty1 to tty7. Hence, to add more tty consoles and be able to switch not only up to tty7 but up to tty12 once you're connected to the server via a remote ILO (Integrated Lights Out) / iDRAC (Dell Remote Access Controller) / IPMI / IMM (Integrated Management Module), you have to do it by telling systemd with the systemctl commands below:
 

 

# systemctl enable getty@tty8.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty8.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty9.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty9.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty10.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty10.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty11.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty11.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty12.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty12.service -> /lib/systemd/system/getty@.service.
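If typing the five commands one by one feels tedious, the same can be achieved with a small loop (a sketch; getty@ttyN units behave the same way on any systemd-based distro):

# for n in $(seq 8 12); do systemctl enable getty@tty$n.service; systemctl start getty@tty$n.service; done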


Once the TTYs tty8 to tty12 are enabled, you will be able to switch to these consoles either if you have a physical LCD / CRT monitor or a KVM switch connected to the machine mounted on the rack shelf, once you're in the Data Center, or you will be able to see them once connected remotely via the Management IP interface (ILO) remote console.
 

2. Taking screenshot of the physical console TTY with fbcat


For example below is a screenshot of the 10th enabled tty10:

[Screenshot: tty10 console captured with fbcat]

As you can see in the screenshot, I've used the nice tool fbcat, which can be used to take a screenshot of a console. This is very useful especially if remote access via an SSH client such as PuTTY / MobaXterm is not available, but you only have physical monitor access in a DC that is under a heavy firewall preventing anyone from reaching the system remotely. For example, screenshotting the physical console is handy if a major hardware failure occurs and you need to dump a hardware error message to a flash drive that will later be handed to technicians to analyze and to exchange the broken server hardware part.

Screenshots of the CLI with fbcat are possible across most Linux distributions, as usual.

In Debian you first have to install the tool via:
 

# apt install --yes fbcat


and on RedHats / CentOS / Fedoras

# yum install -y fbcat


Once the tool is on the server, taking a screenshot of whatever is printed on the console is as easy as:

# fbcat > tty_name.ppm


Note that you might want to convert the created .ppm picture to PNG with any converter, such as ImageMagick's convert command, or, if you have a GUI, perhaps with the GNU Image Manipulation Program (GIMP).
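A minimal capture-and-convert sequence could look like this (assuming ImageMagick is installed; the file names are arbitrary examples):

# fbcat > /tmp/tty10.ppm
# convert /tmp/tty10.ppm /tmp/tty10.png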

3. Enabling every rsyslog handled message to log to Physical TTY12


To make everything, such as errors, notices, debug and warning messages, instantly log to the newly added /dev/tty12:

Open /etc/rsyslog.conf and append the below lines to the end of the file:
 

daemon.*;mail.*;\
   news.=crit;news.=err;news.=notice;\
   *.=debug;*.=info;\
   *.=notice;*.=warn   /dev/tty12


To make rsyslog load its new config, restart it and check that the service is running:

 

# systemctl restart rsyslog

# systemctl status rsyslog
rsyslog.service – System Logging Service
   Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-08-10 04:09:36 EEST; 2 days ago
     Docs: man:rsyslogd(8)
           https://www.rsyslog.com/doc/
 Main PID: 671 (rsyslogd)
    Tasks: 4 (limit: 4915)
   Memory: 12.5M
   CGroup: /system.slice/rsyslog.service
           └─671 /usr/sbin/rsyslogd -n -iNONE

 

Aug 12 00:00:05 pcfreak rsyslogd[671]:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="671" x-info="https://www.rsyslo
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

 



That's all folks! Navigate by pressing CTRL + ALT + F12 simultaneously to get to TTY12, or use ALT + LEFT / ALT + RIGHT ARROW (console switching keys) until you get to the console where everything should now be logged.

Enjoy and if you like this article share to tell your sysadmin friends about this nice hack  ! 🙂


How to make sure your Linux system users won’t hide or delete their .bash_history / Securing .bash_history file – Protect Linux system users shell history

Monday, July 19th, 2010

[Image: GNU bash logo]
If you're running a multi-user login Linux system, you have probably realized that there are some clever users who prefer to prevent their command-line executed commands from being logged in .bash_history.
To achieve that they use a number of generally known methods to prevent the Linux system from logging into their $HOME/.bash_history file (of course, if bash is running as their default user shell).
This, though nice for the user, is a real nightmare for the sysadmin, since he cannot keep track of all system command events executed by users. For instance, sometimes an unprivileged user might be responsible for executing malicious code which crashes or breaks your server.
This is especially unpleasant, because you will find your system crashed, and if it's not one of the system services that caused the issue, you won't even be able to identify which of all the users is the malicious user account, and respectively which executed code brought the system to the ground.
In this post I will try to show the basic ways that some malevolent users might use to hide their bash history from the system administrator.
I will also discuss a few possible ways to ensure your users' .bash_history stays intact and the commands executed by your users do get logged.
The most basic approach that even an inexperienced shell user will apply, if he wants to hide his .bash_history from the sysadmin's review, is to directly wipe out the .bash_history file from his login account or alternatively empty it with commands like:

malicious-user@server:~$ rm -f ~/.bash_history
or
malicious-user@server:~$ cat /dev/null > ~/.bash_history

In order to protect the .bash_history against this type of cleaning attack you can use the chattr command.
To counter this type of history-tossing method you can set the (append only) flag on the malicious user's .bash_history file with chattr like so:

root@server:~# cd /home/malicious-user/
root@server:~# chattr +a .bash_history
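You can verify that the flag really took effect with lsattr; an 'a' should show up in the attribute column for the file:

root@server:~# lsattr /home/malicious-user/.bash_history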

It's also recommended that the immutable flag is placed on the ~/.profile file in the user's home:

root@server:~# chattr +i ~/.profile

It would probably also be nice to take a look at all the chattr command attributes, since the command is like a swiss army knife for the Linux admin.
Here are all the available flags that can be passed to chattr:
append only (a)
compressed (c)
don't update atime (A)
synchronous directory updates (D)
synchronous updates (S)
data journalling (j)
no dump (d)
top of directory hierarchy (T)
no tail-merging (t)
secure deletion (s)
undeletable (u)
immutable (i)

It's also nice that setting the "append only" flag on the user's .bash_history file prevents the user from symlinking the .bash_history file to /dev/null, like so:

malicious-user@server:~$ ln -sf /dev/null ~/.bash_history
ln: cannot remove `.bash_history': Operation not permitted

malicious-user@server:~$ echo > .bash_history
bash: .bash_history: Operation not permitted

However this will just make your .bash_history append only, so the user trying to execute cat /dev/null > .bash_history won’t be able to truncate the content of .bash_history.

Unfortunately he will still be able to delete the file with rm, so this way of securing your .bash_history file from being overwritten does not completely guarantee you that user commands will get logged.
Also, in order to prevent the user from playing tricks and escaping the .bash_history logging by changing the default bash shell variables HISTFILE and HISTFILESIZE, exporting them either to a different file location or to a null file size,
you have to put the following bash variables to be loaded in /etc/bash.bashrc or in /etc/profile:
# Prevent unset of histfile, /etc/profile
HISTFILE=~/.bash_history
HISTSIZE=10000
HISTFILESIZE=999999
# Don't let the users enter commands that are ignored
# in the history file
HISTIGNORE=""
HISTCONTROL=""
readonly HISTFILE
readonly HISTSIZE
readonly HISTFILESIZE
readonly HISTIGNORE
readonly HISTCONTROL
export HISTFILE HISTSIZE HISTFILESIZE HISTIGNORE HISTCONTROL

Every time a user logs in to your Linux system the bash variables above will be set.
The above tip is directly taken from the Securing Debian howto, which by the way is quite an interesting and nice read for system administrators 🙂
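With the variables marked readonly, any later attempt by the user to override them in his session simply fails, for example:

malicious-user@server:~$ HISTFILE=/dev/null
bash: HISTFILE: readonly variable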

If you want to apply the append-only attribute to the .bash_history of all your existing Linux server system users, assuming the default users' home directory is /home, you can execute the following shell one-liner:

# Set .bash_history as attr +a
find /home/ -maxdepth 3 | grep -i bash_history | while read line; do chattr +a "$line"; done

Though the above steps will stop some of the users from voluntarily cleaning their .bash_history files, it won't give a 100% guarantee that a good cracker won't be able to come up with a way to get around the imposed .bash_history security measures.

One possible way for a user to get around the command history restrictions is to simply use another shell from the ones available on the system.
Here is an example:

malicious-user:~$ /bin/csh
malicious-user:~>

csh shell logs by default to the file .history

Also, as far as I know, it should still be possible for a user to simply delete the .bash_history file, defeating all of the .bash_history protection attempts shown above.
If you need complete accounting statistics, you'd better take a look at the GNU Accounting Utilities.

In Debian the GNU Accounting Utilities are available as a package called acct, so installation of acct on Debian is as simple as:

debian:~# apt-get install acct

I won't get into much detail about acct and will probably take a look at it in my future posts.
For complete .bash_history delete prevention, maybe the best practice is to use grsecurity (grsec).

Hopefully this article is gonna be a step further in tightening up your Server or Desktop Linux based system security and will also give you some insight on .bash_history files 🙂 .

Nginx increase security by putting websites into Linux jails howto

Monday, August 27th, 2018

[Image: Nginx webserver and its data put into a Linux jail environment for increased security]

If you're sysadmining a large number of shared hosted websites which use the Nginx webserver to interpret PHP scripts and serve HTML, Javascript, CSS … whatever data,

you realize the high amount of risk that comes with a possible successful security breach / hack into the server by a malicious cracker. Compromising the Nginx webserver would automatically mean that not only all users' web data gets compromised, but that the attacker would also get immediate access to other data such as Email or SQL (if the server is running multiple services).

Nowadays it is not such a common thing to have multiple shared websites on the same server together with other services, but historically there are many legacy servers / webservers left which host some 50 or 100+ websites.

Of course the best thing to do is to isolate each and every website into a separate virtual container; however, as this is a lot of work, and small and mid-sized companies refuse to spend money on mostly anything, this might not be an option for you.

Considering that this might be your case, and that you're running Nginx as a load balancer, reverse proxy server, etc., even though Nginx is considered to be among the most secure webservers out there, there is absolutely no guarantee it would not get hacked and the server wouldn't get rooted by a script-kiddie freak that just got some 0day exploit from the darknet.

To minimize the impact of a possible Webserver hack it is a good idea to place all websites into Linux Jails.

[Image: simple chroot jail diagram]

For those who hear about a Linux jail for the first time:
a chroot() jail is a way to isolate a process / processes and their forked children from the rest of the *nix system. It should / could be used only for UNIX processes that aren't running as root (the administrator user), because the superuser could break out (escape) of the jail pretty easily.

Jailing processes is a pretty old concept that was first introduced in UNIX version 7 back in the distant year 1979, and it was first implemented in the BSD Operating System ver. 4.2 by Bill Joy (a notorious computer scientist and co-founder of Sun Microsystems). One of its original uses was the creation of the so-called HoneyPot: a computer security mechanism set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems, which appears to be a completely legitimate service or part of a website, whose only goal is to track, isolate, and monitor intruders, very similar to police sting (baiting) operations against a suspect. It is pretty much like a bait set to collect the fish (which in this case is the possible cracker).

[Image: chroot jail environment explained, jailing hackers and intruders on UNIX]

BSD jails nowadays became very popular through iPhones, whose environment where applications are deployed is a custom-created chroot jail; the principle is exactly the same as in Linux.

But anyway, enough talk; let's create a new jail and deploy a set of system binaries for our Nginx installation. Here are the things you will need:

1. You need to set up a directory where a copy of /bin/ls, /bin/bash, /bin/cat … the /usr/bin binaries, /lib and other base Linux system binaries will reside.

 

server:~# mkdir -p /usr/local/chroot/nginx

 


2. You need to create the isolated environment backbone structure: /etc/, /dev, /var/, /usr/, /lib64/ (in case you are deploying on a 64-bit architecture Operating System).

 

server:~# export DIR_N=/usr/local/chroot/nginx;
server:~# mkdir -p $DIR_N/etc
server:~# mkdir -p $DIR_N/dev
server:~# mkdir -p $DIR_N/var
server:~# mkdir -p $DIR_N/usr
server:~# mkdir -p $DIR_N/usr/local/nginx
server:~# mkdir -p $DIR_N/tmp
server:~# chmod 1777 $DIR_N/tmp
server:~# mkdir -p $DIR_N/var/tmp
server:~# chmod 1777 $DIR_N/var/tmp
server:~# mkdir -p $DIR_N/lib64
server:~# mkdir -p $DIR_N/usr/local/

 

3. Create required device files for the new chroot environment

 

server:~# /bin/mknod -m 0666 $DIR_N/dev/null c 1 3
server:~# /bin/mknod -m 0666 $DIR_N/dev/random c 1 8
server:~# /bin/mknod -m 0666 $DIR_N/dev/urandom c 1 9

 

The mknod command is used instead of the usual /bin/touch command because it can create block or character special files.

Once created, the permissions of /usr/local/chroot/nginx/{dev/null, dev/random, dev/urandom} should look like so:

 

server:~# ls -l /usr/local/chroot/nginx/dev/{null,random,urandom}
crw-rw-rw- 1 root root 1, 3 Aug 17 09:13 /usr/local/chroot/nginx/dev/null
crw-rw-rw- 1 root root 1, 8 Aug 17 09:13 /usr/local/chroot/nginx/dev/random
crw-rw-rw- 1 root root 1, 9 Aug 17 09:13 /usr/local/chroot/nginx/dev/urandom

 

4. Install nginx files into the chroot directory (copy all files of current nginx installation into the jail)
 

If your NGINX webserver was installed from source (to keep it at the latest version)
and lives in, let's say, directory location /usr/local/nginx, you have to copy /usr/local/nginx to /usr/local/chroot/nginx/usr/local/nginx, i.e.:

 

server:~# /bin/cp -varf /usr/local/nginx/* /usr/local/chroot/nginx/usr/local/nginx

 


5. Copy necessary Linux system libraries to the newly created jail
 

The NGINX webserver is compiled to depend on various libraries from the Linux system root, e.g. /lib/* and /lib64/*, therefore in order for the server to work inside the chroot-ed environment you need to transfer these libraries to the jail folder /usr/local/chroot/nginx.

If you are curious to find out exactly which libraries the nginx binary depends on, run:

server:~# ldd /usr/local/nginx/sbin/nginx

        linux-vdso.so.1 (0x00007ffe3e952000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2b4762c000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f2b473f4000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f2b47181000)
        libcrypto.so.0.9.8 => /usr/local/lib/libcrypto.so.0.9.8 (0x00007f2b46ddf000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2b46bc5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2b46826000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2b47849000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2b46622000)


For best security, the recommended way is to copy only the libraries listed in the ldd command output, like so:

 

server: ~# mkdir -p /usr/local/chroot/nginx/lib/x86_64-linux-gnu
server: ~# cp -rpf /lib/x86_64-linux-gnu/libpthread.so.0 /usr/local/chroot/nginx/lib/x86_64-linux-gnu/
server: ~# cp -rpf library chroot_location

etc.
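
To avoid copying each library by hand, here is a minimal hedged sketch that parses the ldd output and copies every reported library into the jail while preserving the original directory layout (it assumes the jail root /usr/local/chroot/nginx, the nginx binary location /usr/local/nginx/sbin/nginx and GNU cp with the --parents flag):

#!/bin/bash
# copy every library ldd reports for the nginx binary into the chroot jail,
# keeping the /lib, /lib64, /usr/lib ... directory structure intact
DIR_N=/usr/local/chroot/nginx
for lib in $(ldd /usr/local/nginx/sbin/nginx | grep -o '/[^ ]*'); do
    cp -v --parents "$lib" "$DIR_N"/
done

The grep -o '/[^ ]*' part picks up every absolute path ldd prints, including the dynamic loader line (/lib64/ld-linux-x86-64.so.2), which must also be present inside the jail.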

 

However if you're in a hurry (not a recommended practice) and you don't care about maximum security anyway (you don't worry that the jail could be exploited through some of the many lib files not used by nginx and you don't care about HDD space), you can also copy the whole /lib into the jail, like so:

 

server: ~# cp -rpf /lib/ /usr/local/chroot/nginx/lib

 

NOTE! Once again, copying the whole /lib directory is a very bad practice, but for time-pressing activities you can sometimes do it …


6. Copy some base /etc/ files and the ld.so.conf.d , prelink.conf.d directories to the jail environment
 

 

server:~# cp -rfv /etc/{group,prelink.cache,services,adjtime,shells,gshadow,shadow,hosts.deny,localtime,nsswitch.conf,nscd.conf,prelink.conf,protocols,hosts,passwd,ld.so.cache,ld.so.conf,resolv.conf,host.conf}  \
/usr/local/chroot/nginx/etc

 

server:~# cp -avr /etc/{ld.so.conf.d,prelink.conf.d} /usr/local/chroot/nginx/etc


7. Copy HTML, CSS, Javascript websites data from the root directory to the chrooted nginx environment

 

server:~# nice -n 10 cp -rpf /usr/local/websites/ /usr/local/chroot/nginx/usr/local/


This could take really long if the websites are multiple gigabytes with millions of files, but anyway the nice command should reduce the load on the server a little bit. It is best practice to set some kind of temporary maintenance page on the websites' index in order to prevent the accessing clients from seeing interruptions (that's especially the case on older 7200 / 7400 RPM non-SSD HDDs). A hedged rsync alternative for very large copies is shown right below.
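
If the copy turns out to be too heavy even with nice, a hedged alternative is rsync run at idle I/O priority (ionice class 3), which can also be safely re-run if interrupted; the paths are the same as in the cp example above:

server:~# nice -n 10 ionice -c3 rsync -aH /usr/local/websites/ /usr/local/chroot/nginx/usr/local/websites/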
 

 

8. Stop old Nginx server outside of Chroot environment and start the new one inside the jail


a) Stop old nginx server

Either stop the old nginx using its start / stop / restart script in /etc/init.d/nginx (if you have such installed) or directly kill the running webserver with:

 

server:~# killall -9 nginx

 

b) Test the chrooted nginx installation is correct and ready to run inside the chroot environment

 

server:~# /usr/sbin/chroot /usr/local/chroot/nginx /usr/local/nginx/sbin/nginx -t
server:~# /usr/sbin/chroot /usr/local/chroot/nginx /usr/local/nginx/sbin/nginx
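
To quickly confirm the chrooted nginx is actually up and answering, a couple of hedged checks (a listening port 80 on localhost is assumed):

server:~# ss -tlnp | grep nginx
server:~# curl -I http://127.0.0.1/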

 

c) Restart the chrooted nginx webserver – when necessary later

 

server:~# /usr/sbin/chroot /usr/local/chroot/nginx /usr/local/nginx/sbin/nginx -s reload

 

d) Edit the chrooted nginx conf

If you need to edit the nginx configuration, be aware that the chrooted NGINX will read its configuration from /usr/local/chroot/nginx/usr/local/nginx/conf/nginx.conf (I'm mentioning this in case you by mistake try to edit the old config that is usually under /usr/local/nginx/conf/nginx.conf).
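
To double-check that the running nginx really uses the jail as its root filesystem, you can look at the /proc entry of one of its processes (a hedged check; the root symlink should point to /usr/local/chroot/nginx):

server:~# ls -ld /proc/$(pidof nginx | awk '{print $1}')/root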

 

 

Recover/Restore unbootable GRUB boot loader on Debian Testing GNU/Linux using Linux LiveCD (Debian Install CD1)

Tuesday, July 6th, 2010

I’ve recently broken my grub unintentionally while wiping out one of my disk partitions which was prepared to run a hackintosh.
Thus yesterday while switching on my notebook I was unpleasantly surprised with a Grub Error 17 and the boot process was hanging before it would even get to grub’s OS select menu.

That was nasty and gave me a big headache, since I wasn’t even sure if my partitions were still present.
What made things even worse was that I hadn’t created any backups to prepare for an emergency!
Thus restoring my system was absolutely compulsory at any cost.
In recovering my grub boot manager I used as a basis for my recovery an article called How to install Grub from a live Ubuntu cd
Though the article is quite comprehensive, it’s written a bit foolishly, probably because it targets Ubuntu novice users 🙂
Another interesting article that gave me a hand during solving my issues was HOWTO: install grub with a chroot
Anyways, my first unsuccessful attempt was following a mix of the aforementioned articles and desperately trying to chroot to my mounted unbootable partition in order to be able to rewrite the grub loader in my MBR.
The error that slapped me in the face during chroot was:

chroot: cannot execute /bin/sh: exec format error

Ghh Terrible … After reasoning on the shitty error I came to the conclusion that the livecd I was using to chroot into my debian linux partition was probably using a different, incompatible version of glibc. If that kind of logic is true, it's most likely that the glibc on my Linux system is newer (I came to that assumption because I was booting from livecds – Elive, Live CentOS as well as Sabayon Linux – which were burnt about two years ago).

To prove my guess I decided to try using the Debian testing Squeeze/Sid install cd. That is the time to mention that I'm running Debian testing/unstable under the code name (Squeeze / Sid).
I downloaded the latest built Debian testing amd64 image from here and burnt it to a cd on another pc.
Then I booted it on my notebook. I wasn't completely sure if the Install CD would have all the necessary recovery tools that I would need to rebuild my grub, but it turned out that the debian install cd1 has everything necessary for emergency situations like this one.

After I booted from the newly burned Debian install cd I followed the recovery route below to bring my system back to normal. It took me a while until I came up with the steps described here, but I won't get into details for brevity.

1. Make a new dir where you intend to mount your Linux partition and mount the /proc, /dev, /dev/pts filesystems and the partition itself

noah:~# mkdir /mnt/root
noah:~# mount -t ext3 /dev/sda8 /mnt/root
noah:~# mount -t proc proc /mnt/root/proc
noah:~# mount -o bind /dev /mnt/root/dev
noah:~# mount -t devpts devpts /mnt/root/dev/pts

Replace /dev/sda8 in the above example commands with your partition name and number.
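
If you are not sure which device actually holds your root filesystem, a couple of hedged helper commands to list the partitions and their filesystems are:

noah:~# fdisk -l /dev/sda
noah:~# blkid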
2. chroot to the mounted partition in order to be able to use your filesystem exactly like you normally do when booted into your Linux partition

noah:~# chroot /mnt/root /bin/bash

Hopefully now you should be locked in your filesystem and able to use your non-bootable Linux system as usual.

Being able to access your /boot/grub directory, I suggest you first check that everything inside /boot/grub/menu.lst is well defined and there are no problems with the paths to the Linux partitions.

Next issue the following commands which will hopefully recover your broken grub boot loader.

noah:~# grub
grub> find /boot/grub/stage1

The second command, find /boot/grub/stage1, should provide you with your partition's location, e.g. it should return something like:

root (hd0,7)

Nevertheless, in my case instead of the expected root (hd0,7), I got:

/boot/grub/stage1 not found

Useless to say this is uncool 🙂
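
For completeness: if find had returned a location such as root (hd0,7), the usual grub-legacy sequence to re-embed the boot loader into the MBR would have been something along these lines (a hedged sketch; adjust the device to whatever find reports on your system):

grub> root (hd0,7)
grub> setup (hd0)
grub> quit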

As a normal reaction I tried experimenting in order to fix the mess. Logically enough I tried to reinstall grub using:

noah:~# grub-install --root-directory=/boot /dev/sda
noah:~# update-grub

To check if that would fix my grub issues I restarted my notebook. Well, now the grub menu appeared with some error generated by splashy.
Trying to boot any of the installed Linux kernels failed with an error where the root file system was being loaded from the /root directory instead of the normal /. Because of that the /proc, /dev and /sys filesystems could not be mounted and the boot process was interrupted in some kind of rescue mode similar to busybox, though it was way less flexible than a normal busybox shell.

To solve that shitty issue I once again booted with the Debian Testing (Sid / Squeeze ) Install CD1 and used the commands displayed above to mount my linux partition.

Next I reinstalled the following packages:

noah:~# apt-get update
noah:~# apt-get install --reinstall linux-image-amd64 uswsusp hibernate grub grub-common initramfs-tools

Here the grub reinstall actually required me to install the new grub generation 2 (version 2).
It was also necessary to remove splashy:

noah:~# apt-get remove splashy
Then I had to grep through all my /etc/, look for /dev/sda6 and substitute it with my changed partition name /dev/sda8.
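
A hedged way to locate all files under /etc that still reference the old device name, and then rewrite them in one go (keeping .bak backups), is:

noah:~# grep -rl '/dev/sda6' /etc/
noah:~# sed -i.bak 's|/dev/sda6|/dev/sda8|g' $(grep -rl '/dev/sda6' /etc/)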

One major place where I substituted /dev/sda6 with my actual linux partition, now named /dev/sda8, was:

/etc/initramfs-tools/conf.d/resume

The kernel reinstall (and consequently the update) also offered to substitute my normal /dev/sda* entries in /etc/fstab with UUIDs like UUID=ba6058da-37f8-4065-854b-e3d0a874fb4e
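
If you prefer to keep the UUID syntax, a hedged way to verify which UUID really belongs to which partition before touching /etc/fstab is:

noah:~# blkid /dev/sda8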

Including these UUIDs and restarting rendered my system completely unbootable … So I booted once again from the debian install cd .. arrgh 🙂, removed the newly included UUID lines in /etc/fstab and left the good old declarations.
After rebooting, my system booted once again! Hooray! All my data and everything is completely intact, thank God! 🙂

How to detect failing storage LUN on Linux – multipath

Wednesday, April 16th, 2014

detect-failing-lun-on-linux-failing-scsi-detection

If you log in to a server and after running the dmesg (show kernel log) command you get tons of messages like:

# dmesg

 

end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0

 

In /var/log/messages there are also log messages present like:

# cat /var/log/messages
...

 

Apr 13 09:45:49 sl02785 kernel: end_request: I/O error, dev sda, sector 0
Apr 13 09:45:49 sl02785 kernel: klogd 1.4.1, ———- state change ———-
Apr 13 09:45:50 sl02785 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Apr 13 09:45:50 sl02785 kernel: end_request: I/O error, dev sdb, sector 0
Apr 13 09:45:55 sl02785 kernel: sd 0:0:0:2: SCSI error: return code = 0x00010000
Apr 13 09:45:55 sl02785 kernel: end_request: I/O error, dev sdc, sector 0

 

 

This is a sure sign something is wrong either with the SCSI hard drives or with the SCSI controller. To further debug the situation you can use the multipath command, i.e.:


# multipath -ll | grep -i failed
 \_ 0:0:0:2 sdc 8:32 [failed][faulty]
 \_ 0:0:0:1 sdb 8:16 [failed][faulty]
 \_ 0:0:0:0 sda 8:0 [failed][faulty]

 

As you can see, all 3 drives (sdc, sdb and sda), which are merged to show up as one physical drive via the remote network-connected LUN on the server, are reported as faulty. This is a clear sign that either something is wrong with the 3 hard drive members of the LUN (Logical Unit Number), which is less probable, or most likely there is a problem with the LUN's network SCSI controller. It is a common error that the LUN SCSI controller optics cable gets dirty and needs a physical clean-up to solve it.
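
A quick hedged way to ask the kernel what it currently thinks about each path is to read the SCSI device state from sysfs (device names as in the multipath output above):

# for disk in sda sdb sdc; do echo -n "$disk: "; cat /sys/block/$disk/device/state; done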

In case you don't know what a storage LUN is – in computer storage, a logical unit number, or LUN, is a number used to identify a logical unit, which is a device addressed by the SCSI protocol or protocols which encapsulate SCSI, such as Fibre Channel or iSCSI. A LUN may be used with any device which supports read/write operations, such as a tape drive, but is most often used to refer to a logical disk as created on a SAN. Though not technically correct, the term "LUN" is often also used to refer to the logical disk itself.

What LUNs do is very similar to Linux's Software LVM (Logical Volume Manager).
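
Once the controller / cabling problem has been fixed, a hedged way to make the kernel re-probe the LUNs without a reboot is to trigger a rescan on every SCSI host and then re-check multipath (host numbering differs from system to system):

# for host in /sys/class/scsi_host/host*; do echo "- - -" > "$host"/scan; done
# multipath -ll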