Posts Tagged ‘health’

Exaltation of the Holy Cross day in the Bulgarian Orthodox Church / Veneration of the Holy cross church feast

Saturday, April 3rd, 2021

Icon: The Universal Exaltation of the Precious and Life-Giving Cross, XVII century, from the village of Belovo, Trqvna region

The day of the Universal Exaltation of the Precious and Life-Giving Cross (Elevation of the Cross) is celebrated on the 14th of September each year according to the new calendar; according to the old church calendar, the Church celebrates it on the 27th of September.

The day is marked by strict fasting. For short, in Bulgarian we refer to this great Church feast as the Day of the Cross (Krystov den).
The Holy Orthodox Christian Church venerates the cross 4 times a year, on:

  • Third Sunday after the start of the Great Lent
  • Holy Friday (Good Friday)
  • 1st of August
  • 14th of September – The Exaltation of the Holy Life-Giving Cross

On this day, we the Orthodox Christians go to church and reverently bow before and kiss the Holy Cross on which our Saviour Jesus Christ suffered for our salvation. Through his suffering on the cross our Lord has granted all of us Christians an unbeatable "weapon" against evil and sin.
Through the cross Christ has been victorious over sin and death.

According to old Church tradition, on this day Christian people asked the priest to come to their homes and perform a vodosvet (sprinkling their houses with blessed holy water). Vodosvet is one of the Orthodox Church's sacred rites; it has specific prayers begging God for forgiveness of sins and asking for good health and blessing. The prayers are read over a vessel containing clean water. Finally the priest blesses the vessel of water with the life-giving cross 3 times (as a symbol of the Holy Trinity). Then he uses the cross and a small twig to sprinkle all the people and objects in the house.

On the feast of the Exaltation of the Holy Cross we also celebrate the following 3 events:

1. The miraculous appearance of the Holy Cross to emperor St. Constantine
2. The finding of the Holy Life-Giving Cross on Golgotha in Jerusalem
3. The return of the Life-Giving Cross from Persian captivity

According to Church tradition, saint John Chrysostom gave up his spirit and joined the assembly of God's saints on the 14th of September; however, because of the great significance of the Exaltation, the holy fathers of the Church decided that his memory be celebrated on the 13th of September.

The Antiphon for the Exaltation of the Cross feast is sung in Church Slavonic; translated, its meaning goes like this:


Troparion of the Exaltation of the Holy Cross voice 1 / Тропар на светия Кръст, глас 1

Спаси, Господи, люди Твоя и благослови достояние Твое, победы на сопротивныя даруя, и Твое сохраняя Крестом Твоим жительство.

Troparion voice 1

Save, O Lord, your people and bless your possession; grant us, O Lord, victory over our enemies and preserve your inheritance with your Cross

Troparion voice 2

You were lifted willingly upon the cross; grant your mercies to your inheritance, O Christ our Lord,
strengthen the spirits of the pious king and of your people,
grant us victory against our enemies,
surround us with peace and with peace give us unbeatable victory

In the Glorification part of the feast's Holy Liturgy service it is sung:

We magnify you, oh Christ life giver,
and your Holy cross, because you have saved us from the enemy.

What does Church tradition say about the finding of the holy cross of Christ's sufferings?

After the crucifixion of Christ, according to the custom of those times, the instrument of this kind of punishment, the wooden cross, was buried in the ground at the same place where the punishment was carried out.
Following this custom, the cross used for the crucifixion was buried on Golgotha, where Christ was crucified.
In later times, emperor Hadrian, in his attempt to destroy Christianity and the place of pilgrimage at Golgotha, issued an order to build a pagan shrine on the same place.
Later, under the reign of Emperor St. Constantine, the cross appeared in the sky in a miraculous way, and again under his reign the place of Golgotha (which literally translated means "the place of the Skull") was discovered.

Third Sunday after start of the Great Lent – Sunday of the Veneration of the Holy Cross

Today, 03.04.2021, we the Orthodox are in the blessed period of the Great Lent. It is no coincidence that the Church has set this feast on exactly this date. It falls on the 3rd week of the 43 days (7 weeks) that the fasting period lasts in the Eastern Orthodox Christian Church, which is a little less than half of the Lent period. We know from the experience of the spiritual fathers that once we start a task, the hardest period comes when about 40% of the work is done; at that point a person desires to leave and quit, but if he perseveres and the goal keeps progressing, this is overcome. Then, near the end, the desire to quit returns, threatening to waste all the energy already invested; this is a plan of the evil one, who wants us always to lose energy (both spiritual and physical) and never gain anything. Thus the Church set the feast of the Exaltation of the Cross to give us a way to draw new energy from the cross, so that we can continue well in the labor of the Lent. By the cross and its glorious power the spirit of despondency is crushed, and we are strengthened and rejoice in the great glory our God has given us.

The exaltation of the cross is also a feast of everyone celebrating his own cross. The victory over death and everything else was fulfilled once by Christ on the Cross. Humanity is already saved, but it is up to everyone's free will to accept this salvation or not. The path is set: it is the path of the Cross of Christ, meaning acceptance (humility) of all the unpleasant life events and situations, accepting every day's unexpected changes in the belief that this is God's providence and cross for each one of us, accepting the pain and suffering that is part of the personal cross we carry, accepting that one day our beloved ones and friends will pass away from this life, accepting the fact that we age and that aging guarantees sufferings of the body while the spirit is refreshed by the grace of God, accepting all and enduring everything for the sake of the cross …


The cross is the holder of the Universe and there is no power that will ever overwhelm it; as it is said in the Church hymns, the Cross is the Holder of (it binds together) the whole universe. It is by the Cross that all evil has been conquered and life eternal has been given. The path of the cross is suffering; this is hard for modern man to accept, as we have been taught to believe the only measure of success is prosperity, personal well-being, physical health and possessing things. On the contrary, the Christian says the most blessed and best thing one can have is the cross, meaning personal suffering for and with Christ. By the suffering of the Cross Christ has glorified the bodily flesh he possessed while he was on earth in the body. By the Cross Christ has become the one begotten among all the sons of God. By the cross the saints have conquered all evils and have been sanctified; by the cross we still continue to progress in goodness.
May God, through the holy miracle-working power of our Saviour's cross and by the prayers of all the Saints and our Theotokos (the Holy Virgin Mary), grant all of us Christians victory over our enemies! Amen

 

Vodka! :)

Wednesday, September 12th, 2007

Yesterday night I drank 200 gr. of Vodka; it was pretty refreshing for me but I got a little drunk. I'm smoking again … Things have been going badly in my life recently. I have health issues, and I intend to go to the doctor today. Yesterday I went to the polyclinic but my personal Dr. Nikolay was not there (I was angry: I go to a doctor once in years and he is not there), so I'll try again today. I had pains somewhere around the stomach. At least at work things are going smoothly; at least God hears my prayers about this. I'm very confused and I have absolutely no idea what to do with my life. Yesterday I was out with Lily and Kiril at the fountain. The previous day Nomen, Yavor, Kiro, Bino and I went to the "Kobaklyka" (a wooded place close to Dobrich). Well, that's most of what's been happening in my life lately. I wrote a little script to restart nautilus if it starts burning the CPU. It's a dumb script (the bad thing is that I'm losing my scripting form; well, I don't script much lately). Here is the script: http://pcfreak.d-bg.net/bshscr/restart_nautilus.sh https://www.pc-freak.net/bshscr/restart_nautilus.sh. In the days before the 4-day weekend, I had to spend a lot of time on one of the servers fighting spammers. I really hate spammers! I ended up removing bounce messages altogether for one of the domains, which fixed the bounce spam method spammers use (btw qmail's chkuser seems not to work properly for some reason) … Also I started watching Stargate – SG1. At first I thought it was a stupid sci-fi series, but after the first episode I now think it has its good moments :]. Also I had something like a Mortification Day on Monday. The whole day I listened to Mortification (the first Christian Death Metal band). I liked the "Hammer of God" album a lot. In the evening Sabin (Bino) came over and we watched some Mortification videos on Youtube. Right now I'm listening again to "Ever – Idyll", a pretty great song.
And yeah, I keep listening to ChristianIndustrial.net a lot, a great radio. Try it if you haven't!

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020


I'm part of a SysAdmin Team that is doing some minor Zabbix improvements on a custom corporate Zabbix installation, as part of an ongoing project to replace the previous HP OpenView monitoring for a bunch of legacy Linux hosts.
One of the necessary checks concerns system hardware, so the task was to invent a simple way to monitor hardware with the Zabbix monitoring tool. Monitoring the hardware of bare metal HP / Dell / Fujitsu etc. servers in Linux is usually done with third party software provided by the hardware vendor. But as this requires additional services to run, it is sometimes not desired, so it was interesting to find some alternative Linux native ways to do the system hardware monitoring.
Monitoring statistics can be obtained directly from the server's hardware components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent Platform Management remote board).
With ipmi, hardware health info can be received straight from the ILO / IDRAC / HPMI of the server. However, the Admin-LAN of the server is often in a separate DMZ secured network, reachable only via a certain set of routed IPs, and in that case ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

There are perhaps many tools to use, but I know of three which give you most of the information you will ever need for a preliminary hardware damage warning system before the crash. These are:
 

1. smartmontools (smartd)

Smartd is part of the smartmontools package, which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks.

Disk monitoring is handled by a special service the package provides, called smartd, which queries the hard drives periodically, looking for warning signs of hardware failure.
The downside of using smartd is that it puts a little extra read / write load on the hard drive and, if misconfigured, could reduce the hard disk's lifetime.

 

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       -       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      -       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
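
For monitoring, usually only a few values from this long report matter. Here is a minimal sketch, assuming the attribute table layout shown above (RAW_VALUE is the 10th column), of how a single attribute can be pulled out with awk. The function name smart_attr_raw and the embedded sample rows are only for illustration; on a real host the input would come from smartctl -A on your own device.

```shell
#!/bin/sh
# Print the RAW_VALUE column (10th field) of one attribute from
# `smartctl -A` style output read on stdin. Only the first match is used.
smart_attr_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# A few rows copied from the report above, so the sketch runs without a disk.
sample_attrs() {
cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2820
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       -       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
EOF
}

for attr in Power_On_Hours Temperature_Celsius Reallocated_Event_Count; do
    echo "$attr=$(sample_attrs | smart_attr_raw "$attr")"
done

# On a real host (the device name is an assumption, adjust it):
#   smartctl -A /dev/sdb | smart_attr_raw Temperature_Celsius
```

Attribute names and their meaning are vendor specific, so it is worth checking which IDs your particular drive model actually reports before alerting on them.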

 

2. hddtemp

 

If smartd is used, it is usually useful to also use hddtemp, which reads the same S.M.A.R.T. data.
The hddtemp program monitors and reports the temperature of PATA, SATA or SCSI hard drives by reading Self-Monitoring Analysis and Reporting Technology (S.M.A.R.T.) information on drives that support this feature.
 

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available
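
To feed such readings to a monitoring system it helps to reduce each line to a device name and a plain number. Below is a small sketch, assuming the "device: model: temperature" line format shown above; the function name hddtemp_to_kv is made up for this example, and drives without S.M.A.R.T. are reported as NA.

```shell
#!/bin/sh
# Convert hddtemp output lines ("device: model: 31°C") read on stdin
# into "device temperature" pairs; non-numeric readings become NA.
hddtemp_to_kv() {
    awk -F': ' '{
        temp = $NF
        if (temp ~ /^[0-9]+/) {
            sub(/[^0-9].*$/, "", temp)   # drop the unit, keep the digits
        } else {
            temp = "NA"
        }
        print $1 " " temp
    }'
}

# Sample lines copied from the output above, so the sketch runs anywhere.
cat <<'EOF' | hddtemp_to_kv
/dev/sda1: Hitachi HDS721050CLA360: 31°C
/dev/sdb2: KINGSTON SA400S37240G: 34°C
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available
EOF

# On a real host (device names are assumptions):
#   /usr/sbin/hddtemp /dev/sda /dev/sdb | hddtemp_to_kv
```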

 

 

3. lm-sensors / i2c-tools 

Lm-sensors is a hardware health monitoring package for Linux. It allows you to access information from temperature, voltage, and fan speed sensors.
i2c-tools was historically bundled in the same package as lm_sensors but has been separated, because not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command:

 

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

 

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)

 


On CentOS Linux a useful tool is also lm_sensors-sensord.x86_64, a daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also psensor-server (an HTTP server providing a JSON Web service which can be used by a GTK+ application to remotely monitor sensors), useful for developers.

If you have an X server installed on the server, accessed with an X client or via VNC (though this is quite rare), you can use xsensors or Psensor, a GTK+ (widget toolkit for creating graphical user interfaces) application.

With these 3 tools it is pretty easy to script one-liners and use the Zabbix UserParameter functionality to send hardware report data to a company's Zabbix server. Zabbix already has some templates to do so, but in my case I couldn't import these templates because I don't have Zabbix Super-Admin credentials; a simple workaround is to use a script that monitors for temperatures considered high or critical.
Here is a tiny sample script I came up with in 1 minute; it can be used as a one-liner UserParameter and built upon for something more complex.

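# Collect the "high" and "crit" thresholds as printed by sensors, e.g. +78.0°C
SENSORS_HIGH=$(sensors | awk '{ print $6 }' | tr -d ',' | grep '^+' | sort -u)
SENSORS_CRIT=$(sensors | awk '{ print $9 }' | tr -d ')' | grep '^+' | sort -u)
SENSORS_LIMITS=$(printf '%s\n' "$SENSORS_HIGH" "$SENSORS_CRIT" | grep '^+')
# Keep only Core lines whose current reading is printed exactly like a threshold
SENSORS_STAT=$(sensors | grep -E '^Core\s' | awk '{ print $1" "$2" "$3 }' | grep -F -e "${SENSORS_LIMITS:-none}")
if [ -n "$SENSORS_STAT" ]; then
echo 'Temperature HIGH'
else
echo 'Sensors OK'
fi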
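
The script above only fires when a reading is printed exactly like one of the thresholds. A slightly more robust variant compares the numbers instead. The sketch below is only an illustration, under the assumption that every Core line lists its limits in the order "high, crit" as in the sensors output shown earlier; the function name sensors_check is hypothetical.

```shell
#!/bin/sh
# Read `sensors` output on stdin and compare each Core reading numerically
# against the high / crit limits printed on the same line.
sensors_check() {
    awk '
    /^Core [0-9]+:/ {
        line = $0
        sub(/^Core [0-9]+:/, "", line)   # drop the label so its digits are not counted
        gsub(/[^0-9.+-]+/, " ", line)    # keep only the numeric tokens
        n = split(line, v, " ")          # v[1]=current, v[2]=high, v[3]=crit
        cur = v[1] + 0
        if (n >= 3 && cur >= v[3] + 0) crit = 1
        else if (n >= 2 && cur >= v[2] + 0) high = 1
    }
    END {
        if (crit) print "Temperature CRITICAL"
        else if (high) print "Temperature HIGH"
        else print "Sensors OK"
    }'
}

# Sample lines in the format shown earlier, so the sketch runs without lm-sensors.
cat <<'EOF' | sensors_check
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
EOF

# On a real host:
#   sensors | sensors_check
```

Saved as a script on the monitored host, it could be wired into the Zabbix agent with a line of the form UserParameter=<key>,<command>, for example UserParameter=hw.sensors.status,sensors | /usr/local/bin/sensors_check.sh (both the key name and the path are made-up examples).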

Of course there is much more sophisticated stuff to use for monitoring out there.


The above script can be easily adapted and used on other monitoring platforms such as Nagios / Munin / Cacti / Icinga, and there are plenty of paid solutions, but for anyone who wants to develop something from scratch just like me, I hope this article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

Monitor General Server / Desktop system health in console on Linux and FreeBSD

Tuesday, October 4th, 2011

saidar is a text based ncurses program to display live statistics about general system health.

It displays on one refreshable screen (similar to top) statistics about the server state:
CPU, Load, Memory, Swap, Network, I/O disk operations
Besides that, saidar supports ncurses console colors, which makes it more fun to look at.
Saidar extracts the system state statistics using libstatgrab, a cross platform library for system health statistics.

On Debian, Ubuntu, Fedora and CentOS Linux, saidar is available for install straight from the distribution repositories.
On Debian and Ubuntu saidar is installed with the command:

debian:~# apt-get install saidar
...

On CentOS and Fedora saidar is bundled as a part of statgrab-tools rpm package.
Installing it on 64 bit CentOS with yum is with command:

[root@centos ~]# yum install statgrab-tools.x86_64

Saidar is also available on FreeBSD as part of the /usr/ports/devel/libstatgrab port, hence to use it on my FreeBSD I had to install the libstatgrab port:

freebsd# cd /usr/ports/devel/libstatgrab
freebsd# make install clean

Here is saidar running on my Desktop Debian on Thinkpad in color output:

debian:~# saidar -c

Saidar Linux General statistics Screenshot

I've seen many people who use various shell scripts to output system monitoring information; these scripts, however, are often written just to run, without efficiency in mind, and they put, let's say, 1% extra load on the system CPU. This is not the case with saidar, which is written in C and hence is well optimized for what it does.

Update: Next to saidar I recommend you check out Slurm (Real Time Network Interface Monitor); it visualizes network interface traffic using an ASCII graph such as the one at the top of the article. On Debian and Ubuntu Slurm is available and easily installable via a simple:
 

apt-get install --yes slurm

 

Save data from failing hard disk on Linux – Rescuing data from failing disk with bad blocks

Wednesday, April 16th, 2014

Sooner or later your Linux desktop or Linux server hard drive will start breaking down. If you have a hardware or software RAID 1, 6 or 10 and good hard disk health monitoring software, you can react in time; but sometimes as admins we have to take care of old servers which either have RAID 0 or no RAID configuration at all, or whose disk firmware is unable to recognize failing blocks in time and remap them. Thus it is quite useful to have techniques to save data from failing hard disk drives with physical bad blocks.

With the ddrescue tool there is still hope for your Linux data, even though the disk is full of unrecoverable I/O errors.


apt-cache show ddrescue|grep -i description -A 12

Description: copy data from one file or block device to another
 dd_rescue is a tool to help you to save data from crashed
 partition. Like dd, dd_rescue does copy data from one file or
 block device to another. But dd_rescue does not abort on errors
 on the input file (unless you specify a maximum error number).
 It uses two block sizes, a large (soft) block size and a small
 (hard) block size. In case of errors, the size falls back to the
 small one and is promoted again after a while without errors.
 If the copying process is interrupted by the user it is possible
 to continue at any position later. It also does not truncate
 the output file (unless asked to). It allows you to start from
 the end of a file and move backwards as well. dd_rescue does
 not provide character conversions.

 

To use ddrescue for saving data, the first thing is to shut down the Linux host and boot the system with a rescue LiveCD such as SystemRescueCD (Linux system rescue disk), Knoppix (the most famous bootable LiveCD / LiveDVD), Ubuntu Rescue Remix or BackTrack LiveCD (a security centered "hackers" distro which can also be used for forensics and data recovery), then mount the failing disk (I assume the disk is still mountable :). Note that it is very important to mount the disk read only, because any write operation on the hard drive increases the chance that it becomes completely unusable before you have saved your data!

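To back up a whole failing partition to a secondary disk mounted under /mnt/second_disk:

# mkdir -p /mnt/second_disk/rescue /mnt/backup_img
# mount /dev/sdb1 /mnt/second_disk/rescue
# ddrescue -d -r 10 /dev/sda1 /mnt/second_disk/rescue/backup.img /mnt/second_disk/rescue/backup.log
# mount -o loop,ro /mnt/second_disk/rescue/backup.img /mnt/backup_img

In the above example /dev/sda1 is the failing partition and /dev/sdb1 is a partition on the healthy second disk; change both to whatever your devices are named. /mnt/backup_img is just an example mount point for inspecting the recovered image.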

If you already have an identical secondary drive attached to the Linux host and you would like to copy the whole failing Linux disk (/dev/sda) to the identical drive (/dev/sdb), issue:

ddrescue -d -f -r3 /dev/sda /dev/sdb /media/PNY_usb/rescue.logfile

If you got just a few unreadable files and you would like to recover only them then run ddrescue just on the damaged files:

ddrescue -d -R -r 100 /damaged/disk/some_dir/damaged_file /mnt/secondary_disk/some_dir/recoveredfile

-d instructs to use direct I/O
-R retrims the error area on each retry (option meanings differ between ddrescue releases, so verify with ddrescue --help on your system)
-r 100 sets the retry limit to 100 (tries to read the data 100 times before giving up)

Of course this does not always work, as on some HDDs recovery is impossible due to severe physical damage; if the above command can't recover a file in 100 attempts, it is very likely that it never will …

A small note to make here is that there is another tool, dd_rescue (make sure you don't confuse them), which is also for recovery, but GNU ddrescue performs better at recovery.
The way ddrescue works is that it keeps track of the bad sectors, then goes back and tries a slow read of them in order to recover their data.
By the way, BSD users will be happy to know there is already a ddrescue port, so data recovery works on BSD *NIX filesystems too, and if you're a Windows user you can use ddrescue to recover data via Cygwin.
Of course final data recovery is also very much in God's hands, so before launching ddrescue, don't forget to say a prayer 🙂

Preventive measures against hard disk failures with smartd / Installing smartmontools on Linux

Friday, March 15th, 2013

Many admins might not know about the smartmontools Linux package. It provides two useful tools, smartctl and smartd, which use the Self-Monitoring, Analysis and Reporting Technology system, often abbreviated as S.M.A.R.T. SMART support is nowadays available on virtually all modern ATA, SATA and SCSI hard disks. The smartmontools package is installable via the default package repositories on virtually all Linux distributions. Having smartmontools installed on every critical production server is a must, because it serves as an early notification system in case a hard disk is on the verge of breaking down (i.e. the physical media of the hard disk starts getting damaged). Over the last 14 years I have worked as a Linux sysadmin, I've used smartmontools on hundreds of servers, and many times it saved companies hundreds of dollars by simply reporting that a system hdd was dying, so the server or hard disk could be replaced with an identically configured one. smartmontools supports monitoring of single hard disks as well as ones configured on a hardware level to work in a RAID array. As of time of writing you can check the list of smartmontools supported hardware RAID controllers here.

1. Installing smartmontools

a) To install smartmontools on Debian and Ubuntu and other .deb based servers:

debian:~# apt-get install --yes smartmontools
.....

b) On CentOS, Fedora, RHEL and other RPM based distributions install with:

[root@centos ~]# yum -y install smartmontools
.....

2. Configuring and Enabling smartd hard disk health monitoring

a) on Debian and derivatives

Edit /etc/default/smartmontools:

debian:~# vim /etc/default/smartmontools

By default the file looks something like:

 

# Defaults for smartmontools initscript (/etc/init.d/smartmontools)
# This is a POSIX shell fragment

# List of devices you want to explicitly enable S.M.A.R.T. for
# Not needed (and not recommended) if the device is monitored by smartd
#enable_smart="/dev/hda /dev/hdb"
#enable_smart="/dev/hda"
# uncomment to start smartd on system startup
#start_smartd=yes

# uncomment to pass additional options to smartd on startup
#smartd_opts="--interval=1800"

The config file should look something like:

 

# Defaults for smartmontools initscript (/etc/init.d/smartmontools)
# This is a POSIX shell fragment

# List of devices you want to explicitly enable S.M.A.R.T. for
# Not needed (and not recommended) if the device is monitored by smartd
#enable_smart="/dev/hda /dev/hdb"
enable_smart="/dev/sda"
# uncomment to start smartd on system startup
start_smartd=yes

# uncomment to pass additional options to smartd on startup
#smartd_opts="--interval=1800"

 

b) on CentOS, RHEL, Fedora: smartd options

By default on RPM based distros there is no need for special configuration. However, for some custom cases, edit /etc/sysconfig/smartmontools and /etc/smartd.conf.

c) Enabling smartmontools

[root@centos default]# /etc/init.d/smartd start
Starting smartd:           [  OK  ]

3. Checking hard disk failure status with smartctl

The simplest way to check whether a hard disk passes the SMART health check is:

debian:~# /usr/sbin/smartctl -H /dev/sda

smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SMART Health Status: OK
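
In scripts it is more convenient to look at the smartctl exit status than to parse the text. According to the smartctl manual page the exit status is a bitmask, where for example bit 3 means the SMART status check returned "DISK FAILING". Below is a minimal sketch that decodes such a status; the function name smartctl_status_flags and the shortened messages are only for illustration.

```shell
#!/bin/sh
# Decode the smartctl exit status bitmask (bits 0-7, as documented in the
# smartctl manual page) into one human readable line per set bit.
smartctl_status_flags() {
    status=$1
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "SMART command failed or checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Example: status 24 has bits 3 and 4 set.
smartctl_status_flags 24

# On a real host (the device name is an assumption):
#   /usr/sbin/smartctl -H /dev/sda >/dev/null; smartctl_status_flags $?
```

An exit status of 0 prints nothing, which makes the function easy to use in a cron job that mails only when there is output.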

 

 

debian:~# /usr/sbin/smartctl -i /dev/sda1

smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST340014AS
Serial Number:    4MQ0LV3B
Firmware Version: 3.43
User Capacity:    40,020,664,320 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Fri Mar 15 15:27:12 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

To print as much information as possible about the hard disk health status:

 

[root@centos default]# /usr/sbin/smartctl -a /dev/sda1

smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST340014AS
Serial Number:    4MQ0LV3B
Firmware Version: 3.43
User Capacity:    40,020,664,320 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Fri Mar 15 15:14:53 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:          ( 423) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (  19) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   052   045   006    Pre-fail  Always       -       172137473
  3 Spin_Up_Time            0x0002   098   098   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0033   096   096   020    Pre-fail  Always       -       4198
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       945095084
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       22769
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0033   099   099   020    Pre-fail  Always       -       1084
194 Temperature_Celsius     0x0022   038   046   000    Old_age   Always       -       38 (0 15 0 0)
195 Hardware_ECC_Recovered  0x001a   052   045   000    Old_age   Always       -       172137473
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 33 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 33 occurred at disk power-on lifetime: 21588 hours (899 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 77 c3 6a e0  Error: UNC at LBA = 0x006ac377 = 6996855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 77 c3 6a e0 00      14:07:39.385  READ DMA
  ec 00 00 00 00 00 a0 00      14:07:35.553  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      14:07:35.550  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      14:07:35.547  IDENTIFY DEVICE
  c8 00 08 77 c3 6a e0 00      14:07:35.543  READ DMA

Error 32 occurred at disk power-on lifetime: 21588 hours (899 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 77 c3 6a e0  Error: UNC at LBA = 0x006ac377 = 6996855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 77 c3 6a e0 00      14:07:23.940  READ DMA
  ec 00 00 00 00 00 a0 00      14:07:35.553  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      14:07:35.550  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      14:07:35.547  IDENTIFY DEVICE
  c8 00 08 77 c3 6a e0 00      14:07:35.543  READ DMA

Error 31 occurred at disk power-on lifetime: 21588 hours (899 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 77 c3 6a e0  Error: UNC at LBA = 0x006ac377 = 6996855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 77 c3 6a e0 00      14:07:23.940  READ DMA
  ec 00 00 00 00 00 a0 00      14:07:23.937  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      14:07:20.071  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      14:07:20.057  IDENTIFY DEVICE
  c8 00 08 77 c3 6a e0 00      14:07:20.044  READ DMA

Error 30 occurred at disk power-on lifetime: 21588 hours (899 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 77 c3 6a e0  Error: UNC at LBA = 0x006ac377 = 6996855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 77 c3 6a e0 00      14:07:23.940  READ DMA
  ec 00 00 00 00 00 a0 00      14:07:23.937  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      14:07:20.071  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      14:07:20.057  IDENTIFY DEVICE
  c8 00 08 77 c3 6a e0 00      14:07:20.044  READ DMA

Error 29 occurred at disk power-on lifetime: 21588 hours (899 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 77 c3 6a e0  Error: UNC at LBA = 0x006ac377 = 6996855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 77 c3 6a e0 00      14:07:23.940  READ DMA
  ec 00 00 00 00 00 a0 00      14:07:23.937  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      14:07:20.071  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      14:07:20.057  IDENTIFY DEVICE
  c8 00 08 77 c3 6a e0 00      14:07:20.044  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
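
The attribute table in the output above is the part most worth watching over time. As a rough sketch (the function name smart_flag_attributes is hypothetical, and the column positions are assumed to match the table layout printed above), a small awk filter can list every attribute whose normalized VALUE has dropped to or below its THRESH:

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` or `smartctl -a` output from
# standard input and print the attributes whose normalized VALUE is at or
# below a non-zero THRESH.
# Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
smart_flag_attributes() {
    awk '
        # Attribute rows start with a numeric ID, carry a hex FLAG in the
        # third column and have at least ten fields.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 {
            value  = $4 + 0
            thresh = $6 + 0
            if (thresh > 0 && value <= thresh)
                printf "%s %s value=%d thresh=%d\n", $1, $2, value, thresh
        }
    '
}

# Usage on a real disk (needs root and smartmontools installed):
#   /usr/sbin/smartctl -A /dev/sda | smart_flag_attributes
```

For the drive shown above the filter prints nothing, since every VALUE is still above its THRESH. A threshold of 000 is skipped on purpose, because such attributes can never trip.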

4. Visualizing smartd collected data in GUI with gsmartcontrol

For people who prefer a graphical environment, the hard disk health data reported by SMART can be viewed in a nice graphical interface with the gsmartcontrol tool. Most Linux servers have no graphical environment, since running an X server with any graphics manager is a waste of system resources, so installing gsmartcontrol there doesn't make much sense. However, for monitoring and reporting upcoming hard disk issues gsmartcontrol is a good tool to have.

a) To install gsmartcontrol on Debian and Ubuntu Linux:

debian:~# apt-get install --yes gsmartcontrol
....

 

b) Installing gsmartcontrol on CentOS, Fedora, RHEL and SuSE:

gsmartcontrol has binary package builds for all major Linux distributions except Slackware Linux. For any RPM-based distro, download the binaries for your distro version and type from here, then install the RPMs one by one with the usual:

[root@centos ~]# rpm -ivh glibmm*
....
[root@centos ~]# rpm -ivh libglademm*
....
[root@centos ~]# rpm -ivh libsigc*
....
[root@centos ~]# rpm -ivh cairomm*
....
[root@centos ~]# rpm -ivh gsmartcontrol*
....

Below, are 2 screenshots of GSmartControl taken from my

gsmartmontools Debian stable Linux screenshot monitor hard disk health in graphical environment

Lenovo gsmartcontrol Thinkpad Device information /dev/sda ST9160824AS screenshot 
If you get anything other than PASSED for the overall health self-assessment test, the hard disk most likely has surface damage and needs to be replaced ASAP. If the HDD hits I/O errors during normal operation and you can't afford a GUI environment just for gsmartcontrol, the errors also get logged by the kernel, so dmesg can give you information about a failing hard drive.
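
A rough sketch of the dmesg check: the helper below filters kernel log text for lines that commonly accompany disk read or write failures. The function name disk_errors_from_log is hypothetical, and the patterns are assumptions about typical kernel messages; the exact wording varies between kernel versions and drivers.

```shell
#!/bin/sh
# Hypothetical helper: keep only the kernel log lines (read from standard
# input) that look like disk I/O failures. The patterns are assumptions.
# Returns non-zero when no line matches, as grep does.
disk_errors_from_log() {
    grep -Ei 'I/O error|medium error|failed command: (READ|WRITE)'
}

# Usage on a live system:
#   dmesg | disk_errors_from_log
```

An empty result only means that none of these patterns matched; it does not prove the disk is healthy.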

One more day passed

Thursday, November 29th, 2007

One more day passed. I have pains and I scream to God for guidance and help. As usual, I’m not sure where I am going. Today we were supposed to have German. I went to the college only to find out that the German lesson had been removed from the schedule. A friend of mine who is in Germany, Shaltev, sent me a video of his band. The band is called viamala; here is their website: http://viamala.org. I really liked their music, by the way. Tomorrow I have Dutch. The day was rather quiet for me, thanks and praise be to the Lord, the Creator. By the way, my health is not good. I sometimes have pains in different organs. You know, life is hard. I’m loving FreeBSD more and more :). I watched Ice Age and I’m currently watching Ice Age 2. Great animation (I’m having fun with it). I often think of becoming a monk. Life is such a vanity.

6 days in sickness

Friday, August 10th, 2007

My physical health was not good during the last 6 or 7 days. Today was a quiet day. I haven’t prayed seriously for a few days, but I can’t, since my life looks like it is going nowhere. There is almost nothing in this town that still keeps me here. I went to the Old Dobrich, to Mino’s coffee. But after a little argument, and after being a little rude to a girl, I left this awful mess. These guys are not good company or a good match for me. It seems I don’t have friends except Lily. Well, I hope that at least I haven’t been building all this time for nothing. Thank goodness there isn’t a lot of work at work, so I’m in a period of recovery. The world is going mad. I’m starting to scare myself. It seems life is created to be lived, not to think about its purpose.

In Rusalka (a.k.a. Mermaid) and Shabla Camping

Monday, September 3rd, 2007

I spent the weekend with Megi, Niki and Nomen in Rusalka (we went to the beach there). Although there was no sun at all, the water was warm and it was a good experience (this happened in the late evening). At 06:00 or 07:00 o’clock we decided to go to Tulenovo’s caves, stay there and make a wood fire, but the caves were already taken by others, so in the end we went to Shablenska Tuzla. We pitched the 2 tents, lit a fire on the beach and started having supper. Unfortunately it started to rain, and we had to pack up the 2 tents and the food and go to the car. We waited to see if the rain would stop, but it kept raining, so we went to a nearby family hotel, where Mitko, Megi and Niki slept in a room and I slept in the car (this is the first time I have had to sleep in a car). In the morning we went to the beach. I stayed out of the sea because it was windy and I was scared of getting sick again. Around 12:30 we were in Dobrich. So this is how most of the weekend passed. At night we went with my father to my Grandma and Grandpa’s (peace be upon him) village and stayed there for 30 minutes or so. During the weekend I successfully made a binary upgrade of my xorg 6.9 -> 7.2 (it was a complete mess); it took me 2 days! As usual, upgrades under FBSD are a real nightmare. Speaking about faith, I’m not sure what I believe anymore. I still hope that God will fix my health issues, but I’m really tired of waiting :[ The bad thing about the weekend was that once more I felt like I was not in my right place. I soon realized that I can’t hear the voice of God, and currently I’m praying that God will give me this ability. But of course only time will tell.

Management Games and Theatre Sports with Joop Vinke

Friday, April 11th, 2008

Yesterday and today we had Management Games and Theatre Games with Joop Vinke. In the management game we play a sort of Human Resources Management game. All the students are divided into groups and we play a simulator game in which we have to manage a company. First we set up our 2-year goals and then we play the game in quarters (6 quarters). Every quarter we have to make some managerial decisions (invest money in different things, hire personnel, promote people, etc.).

Basically the company consists of 660 employees. There are 5 levels in the company, from level 1, the unqualified specialists, up to level 5, the top management.

Once we make our choices, all this data is entered into a computer, which gives us feedback that helps us make the decisions for the next quarter. In the meantime Vinke organizes fun games to entertain us and make us feel comfortable with him, and through these games he tries to show us basic concepts in business. I really enjoyed the last two days.

Today the game that impressed me the most was called “The Werewolves from Wackedan”. Basically it’s a strategy game with roles. In it you’ve got a bunch of people who play different roles: 3 of them are werewolves, some are citizens, and others are people with special abilities to foresee who the werewolves are.

We had cards in front of us, turned face down so that nobody but us could see them. Some of the cards are citizens and people who belong to the citizens; the other 3 are werewolves.

Every night the werewolves kill a person (by selecting somebody from the crowd while they sleep), because the werewolves are out at night when everybody sleeps. In the morning the citizens awake and one of their friends is dead, so they try to take revenge by pointing out someone to be killed (it may be another citizen or it may be a werewolf).

In the end only werewolves or citizens should survive 🙂 It was great fun to play this simple game today. At the end of the day, at 18:00, we had a session of the so-called Theatre Games. Theatre Games include different entertaining games which are designed to improve our communication skills and teach us to act like actors, plus they are pretty entertaining 🙂 That’s all. Thanks to God, everything in my life seems to run smoothly, except my health. I’m still having some health issues; although I can say there is an improvement, I am still not healed and I still drink herbs.

At 20:00 I was out with Narf and we went to the fountain. A little later Kimmo and Yavor joined us and we spent some time there. Well, that’s most of the day. At night I went to my grandma just to see how she is doing, and now I am writing this post. Tomorrow the Management Game continues at 09:00. So probably in a few minutes I’ll go for the night prayers and then I’ll go to sleep.