Posts Tagged ‘broken’

Debug and fix Virtuozzo / KVM broken Hypervisor error: ‘PrlSDKError(‘SDK error: 0x80000249: Unable to connect to Virtuozzo. You may experience a connection problem or the server may be down.’ on CentOS Linux howto

Thursday, January 28th, 2021

Reading Time: 6minutes

fix-sdkerror-virtuozzo-kvm-how-to-debug-problems-with-hypervisor-host-linux

I've recently yum upgraded a CentOS Linux server runinng Virtuozzo kernel and Virtuozzo virtualization Virtual Machines to the latest available CentOS Linux release 7.9.2009 (Core) just to find out after the upgrade there was issues with both virtuozzo (VZ) way to list installed VZ enabled VMs reporting Unable to connect to Virtuozzo error like below:
 

[root@CENTOS etc]# prlctl list -a
Unable to connect to Virtuozzo. You may experience a connection problem or the server may be down. Contact your Virtuozzo administrator for assistance.


Even the native QEMU KVM VMs installed on the Hypervisor system failed to work to list and bring up the VMs producing another unexplainable error with virsh unable to connect to the hypervisor socket

[root@CENTOS etc]# virsh list –all
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory


In dmesg cmd kernel log messages the error found looked as so:

[root@CENTOS etc]# dmesg|grep -i sdk


[    5.314601] PrlSDKError('SDK error: 0x80000249: Unable to connect to Virtuozzo. You may experience a connection problem or the server may be down. Contact your Virtuozzo administrator for assistance.',)

To fix it I had to experiment a bit based on some suggestions from Google results as usual and what turned to be the cause is a now obsolete setting for disk probing that is breaking libvirtd

Disable allow_disk_format_probing in /etc/libvirt/qemu.conf

The fix to PrlSDKError('SDK error: 0x80000249: Unable to connect to Virtuozzo comes to commenting a parameter inside 

/etc/libvirt/qemu.conf

which for historical reasons seems to be turned on by default it is like this

allow_disk_format_probing = 1


Resolution is to either change the value to 0 or completely comment the line:

[root@CENTOS etc]# grep allow_disk_format_probing /etc/libvirt/qemu.conf
# If allow_disk_format_probing is enabled, libvirt will probe disk
#allow_disk_format_probing = 1
#allow_disk_format_probing = 1


Debug problems with Virtuozzo services and validate virtualization setup

What really helped to debug the issue was to check the extended status info of libvirtd.service vzevent vz.service libvirtguestd.service prl-disp systemd services

[root@CENTOS etc]# systemctl -l status libvirtd.service vzevent vz.service libvirtguestd.service prl-disp

Here I had to analyze the errors and googled a little bit about it


Once this is changed I had to of course restart libvirtd.service and rest of virtuozzo / kvm services

[root@CENTOS etc]# systemctl restart libvirtd.service ibvirtd.service vzevent vz.service libvirtguest.service prl-disp


Another useful tool part of a standard VZ install that I've used to make sure each of the Host OS Hypervisor components is running smoothly is virt-host-validate(tool is part of libvirt-client rpm package)

[root@CENTOS etc]# virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpu' controller mount-point                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'cpuset' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'devices' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for cgroup 'blkio' controller mount-point                   : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : WARN (IOMMU appears to be disabled in kernel. Add intel_iommu=on to kernel cmdline arguments)
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller mount-point                  : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpu' controller mount-point                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'cpuset' controller mount-point                  : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'devices' controller mount-point                 : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking for cgroup 'blkio' controller mount-point                   : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS


One thing to note here that virt-host-validate helped me to realize the  fuse (File system in userspace) module kernel support enabled on the HV was missing so I've enabled temporary for this boot with modprobe and permanently via a configuration like so:

# to load it one time
[root@CENTOS etc]#  modprobe fuse
# to load fuse permnanently on next boot

[root@CENTOS etc]#  echo fuse >> /etc/modules-load.d/fuse.conf

Disable selinux on CentOS HV

Another thing was selinux was enabled on the HV. Selinux is really annoying thing and to be honest I never used it on any server and though its idea is quite good the consequences it creates for daily sysadmin work are terrible so I usually disable it. It could be that a Hypervisor Host OS might work just normal with the selinux enabled but just in case I decided to remove it. This is how

[root@CENTOS etc]#  sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31

To temporarily change the SELinux mode from targeted to permissive with the following command:

[root@CENTOS etc]#  setenforce 0

Edit /etc/selinux/config file and set the SELINUX mod to disabled

[root@CENTOS etc]# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing – SELinux security policy is enforced.
#       permissive – SELinux prints warnings instead of enforcing.
#       disabled – No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#       targeted – Targeted processes are protected,
#       mls – Multi Level Security protection.
SELINUXTYPE=targeted

Finally rebooted graceously the machine just in case with the good recommended way to reboot servers with shutdown command instead of /sbin/reboot

[root@CENTOS etc]# shutdown -r now

The advantage of shutdown is that it tries to shutdown each service by sending stop requests but usually this takes some time and even a shutdown request could take longer to proccess as each service such as a WebServer application is being waited to close all its network connections etc. |
However if you want to have a quick reboot and you don't care about any established network connections to third party IPs you can go for the brutal old fashioned /sbin/reboot 🙂

How to check Linux server power supply state is Okay / How to find out a Linux Power Supply is broken

Wednesday, January 6th, 2021

Reading Time: 4minutes

2U-power-supplies-get-status-if-Power-supply-broken-information-linux-ipmitool

If you're a sysadmin and managing remotely Linux servers, every now and then if a machine is hanging without a reason it useful to check the server Power Supply state. I say that because often if the machine is mysteriously hanging and a standard Root Cause Analysis (RCA) on /var/log/messages /var/log/dmesg /var/log/boot etc. did not bring you to any different conclusion. The next step after you send a technician to reboot the machine is to check on Linux OS level whether Power Supply Unit (PSU) hardware on the machine does not have some issues.
As blogged earlier on how to use ipmitool to manage remote ILO remote boards etc. the ipmitool can also be used to check status of Server PSUs.

Below is example output of 2 PSU server whose Power Supplies are functioning normally.
 

[root@linux-server ~]# ipmitool sdr type "Power Supply"

PS Heavy Load    | 2Bh | ok  | 19.1 | State Deasserted
Power Supply 1   | 70h | ok  | 10.1 | Presence detected
Power Supply 2   | 71h | ok  | 10.2 | Presence detected
PS Configuration | 72h | ok  | 19.1 |
PS 1 Therm Fault | 75h | ok  | 10.1 | Transition to OK
PS 2 Therm Fault | 76h | ok  | 10.2 | Transition to OK
PS1 12V OV Fault | 77h | ok  | 10.1 | Transition to OK
PS2 12V OV Fault | 78h | ok  | 10.2 | Transition to OK
PS1 12V UV Fault | 79h | ok  | 10.1 | Transition to OK
PS2 12V UV Fault | 7Ah | ok  | 10.2 | Transition to OK
PS1 12V OC Fault | 7Bh | ok  | 10.1 | Transition to OK
PS2 12V OC Fault | 7Ch | ok  | 10.2 | Transition to OK
PS1 12Vaux Fault | 7Dh | ok  | 10.1 | Transition to OK
PS2 12Vaux Fault | 7Eh | ok  | 10.2 | Transition to OK
Power Unit       | 7Fh | ok  | 19.1 | Fully Redundant

Now if you have a server lets say on an old ProLiant DL360e Gen8 whose Power Supply is damaged, you will get an from ipmitool similar to:

[root@linux-server  systemd]# ipmitool sdr type "Power Supply"
Power Supply 1   | 30h | ok  | 10.1 | 100 Watts, Presence detected
Power Supply 2   | 31h | ok  | 10.2 | 0 Watts, Presence detected, Failure detected, Power Supply AC lost
Power Supplies   | 33h | ok  | 10.3 | Redundancy Lost


If you don't have ipmitool installed due to security or whatever but you have the hardware detection software dmidecode you can use it too to get the Power Supply state

[root@linux-server  systemd]# dmidecode -t chassis
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

 

Handle 0x0300, DMI type 3, 21 bytes
Chassis Information
        Manufacturer: HP
        Type: Rack Mount Chassis
        Lock: Not Present
        Version: Not Specified
        Serial Number: CZJ38201ZH
        Asset Tag:
        Boot-up State: Critical
        Power Supply State: Critical

        Thermal State: Safe
        Security Status: Unknown
        OEM Information: 0x00000000
        Height: 1 U
        Number Of Power Cords: 2
        Contained Elements: 0

To find only Power Supply info status on a server with dmideode.

# dmidecode –type 39

monitoring-power-supply-hardware-information-linux-ipmitool

Plug between the power supply and the mainboard voltage / coms ATX specification

This can also be used on a normal Linux desktop PCs which usually have only 1U (one power supply) on many of Ubuntus and Linux desktops where lshw (list hardaware information) is installed to get the machine PSUs status with lshw 

 root@ubuntu:~# lshw -c power
  *-battery               
       product: 45N1111
       vendor: SONY
       physical id: 1
       slot: Front
       capacity: 23200mWh
       configuration: voltage=11.1V
        Thermal State: Safe
        Security Status: Unknown
        OEM Information: 0x00000000
        Height: 1 U
        Number Of Power Cords: 2
        Contained Elements: 0


Finally to get an extensive information on the voltages of the Power Supply you can use the good old lm_sensors.

# apt-get install lm-sensors
# sensors-detect 
# service kmod start

# sensors
# watch sensors


As manually monitoring Power Supplies and other various data is dubious, finally you might want to use some centralized monitoring. For one example on that you might want to check my prior Zabbix to Monitor Hardware Hard Drive / Temperature and Disk with lm_sensors / smartd on Linux with Zabbix.

Common commands to Repair Broken or Unbootable Windows XP / Vista / 7 without system Re-install

Monday, October 22nd, 2012

Reading Time: 3minutesCommon commands to repair broken unbootable Windows XP 7 and Vista, the famous genuine great! mfc command :)
If you have severe problems with Windows 7 / Vista / XP whatever and you don’t want to re-install. It is handy to know about the existence of few commands, which can help you fix your basis Windows system without re-install. This commands are not to fix 100% of messed up Windows installs but in most cases I know, they either improved the state of the system or fixed it completely, so here are commands:

1. Change permissions of C:\boot.ini and delete it

Many Viruses install via standard Windows boot.ini file and change permissions of the file to make it hard to delete by programs and by
Administrator user. To solve that in Windows Safe Mode (without networking) exec:


C:\> COPY boot.ini boot.bak
C:\> ATTRIB -RSH C:\boot.ini
C:\> DEL boot.ini

-RSH ATTRIB cmd options instruct it to remove Read Only, System and Hidden flags from boot.ini.

2. Re-build boot.ini and other essential Win boot components


C:\> CD WINDOWS
C:\WINDOWS> BOOTCFG /REBUILD

BOOTCFG /Rebuild is a very important for recovering. The command will do complete evaluation with diagnostic tests trying to replace / repair whatever files are preventing OS boot.

3. Fix problems with Unbootable Windows Systems

If the system is completely unbootable you need to use the Windows Install (Setup) CDRecovery Console and in first boot up (blue screen), type R key to enter recovery console. There is option of automated recovery console but for me Automated System Recovery – ASR, never worked.
Once in Recovery Console to repair broken Windows boot up (fix winboot loader):


C:\WINDOWS> BOOTCFG /Rebuild
C:\WINDOWS> CHKDSK /R /F
C:\WINDOWS> FIXBOOT

You should already know from the MS-DOS, DR-DOS times CHKDSK (Check Disk) is thanksfully still on every next Windows release. As CHKDSK does a hard drive check for irregularities and BAD Blocks (depending on the size of HDD) it takes time usually from 30 minutes to 1 hour.
4. System File Checker (SFC) command – restore basis .DLLs and others to Setup CD (install) originals


C:\WINDOWS> SFC /SCANNOW

SFC – has been useful in many, many Windows installs, I fixed it is really precious cmd. It does check the system essential (DLL – Dynamic Loadable Libraries) and matches them against a clean working copy which was copied on the system by Windows Install (CD) SETUP program. If some of the primary win .EXE or .DLL files checksums are not matching, the file is substituted with a clean (working) copy of the Install CD original ones. Some Viruses and Spyware might change those original (clean) binary files placed by Windows during install time. So intelligent Virus progs are very rare so in lets say 90% of broken Windows installs SFC /SCANNOW solves problems with main win files 🙂

If you have doubt that those binaries which SFC matches with are changed, you can always use a Setup Install CD with same Service Pack version as installed on the host. To restore main Windows binaries and libs using the external recovery CD use:


C:\WINDOWS> sfc /offbootdir=c:\ /offwindir=c:\windows /scannow

This tutorial should solve also all kind of start-up errors like:


Windows could not start because the following file is missing or corrupt:
\\WINDOWS\\SYSTEM32\\CONFIG\\SYSTEM


You can attempt to repair this file by starting Windows Setup
using the original Setup CD-ROM.
Select ‘R’ at the first screen to start repair.

Windows NT could not start because the below file is missing or corrupt:
X:\\WINNT\\System32\\Ntoskrnl.exe


Windows NT could not start because the below file is missing or corrupt:
X:\\WINNT\\System32\\HAL.dll


NTLDR is Missing
Press any key to restart


Invalid boot.ini
Press any key to restart