Firstly, I’d like to thank Bence Bertalan for the inspiration for this blog post (you can see his original blog post here: https://www.linkedin.com/pulse/qucik-tip-vmware-bringing-back-semi-dead-esxi-live-bence-bertalan/).
I have actually experienced this issue myself recently with a host that was still running ESXi 6.0. It appears that some activities had taken place on the host which resulted in the storage connection to a LUN becoming unavailable. This resulted in an ‘All Paths Down’ scenario, which would eventually progress to a ‘Permanent Device Loss’ situation.
With the ‘All Paths Down’ scenario, the ESXi kernel keeps trying to reconnect to the lost device. Unfortunately, it continues to retry until all of the kernel resources are focused purely on recovering connectivity to the lost volume.
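If the host still responds over SSH, a few esxcli queries can help confirm whether it is in this state. The snippet below is a minimal sketch rather than a definitive procedure: the function name `apd_overview` is my own, and it assumes the `/Misc/APDHandlingEnable` and `/Misc/APDTimeout` advanced settings are present on your build. On a badly affected host these commands may themselves be very slow to return.

```shell
# Sketch only: apd_overview is a hypothetical helper name, not a VMware tool.
# It prints the APD handling settings and the status of each storage device.
apd_overview() {
    # Bail out early if this is not an ESXi shell.
    if ! command -v esxcli >/dev/null 2>&1; then
        echo "esxcli not found; run this in an SSH session on the ESXi host" >&2
        return 1
    fi

    # Is APD handling enabled, and how long does the host wait before timing out?
    esxcli system settings advanced list -o /Misc/APDHandlingEnable
    esxcli system settings advanced list -o /Misc/APDTimeout

    # Keep only the device identifier lines (no leading space) and
    # each device's own "Status:" line (not "VAAI Status:").
    esxcli storage core device list | grep -E '^[^ ]|^ +Status:'
}
```

After pasting the function into the SSH session, run `apd_overview`. A device whose Status is anything other than `on` is a candidate for the lost LUN.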
This resource exhaustion matters because you will begin to notice the following symptoms on the host:
There are a number of ways that you could go about correcting this issue.
One option would be to reboot the host. This would require you to RDP onto each of the desktops and power them down, so that they are shut down gracefully, before using a remote management tool (DRAC, IMM, iLO) to reboot the host if it is remote. As you can tell, this requires downtime on both the virtual machines and the host, which is not always possible.
An alternative solution, expanded on below, is to restart the relevant services using SSH. Please note that this may take several hours to complete, depending on the responsiveness of your host: