Investigating disk failures
When a disk fails in VSAN, the immediate fix is to replace it. It is also worth determining why the failure was triggered. Identifying which disks have failed is straightforward in vSphere Web Client, but determining the cause of the failure involves investigating the ESXi system logs.
Getting ready
You should be logged in to vSphere Web Client as an administrator.
You should be logged in to the affected ESXi host as root, preferably via SSH.
How to do it…
If you have configured VSAN alarms as described in Chapter 4, Monitoring VSAN, a disk failure will display an alert icon on the ESXi host, and Triggered Alarms will show a disk error:
The failed disk will also be reflected in the Disk Management view:
From here, it is fairly straightforward to remove the failed disks and, if desired, replace them. However, finding out why the disks failed requires examining the affected host's
/var/log/vmkernel...
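When the log is long, narrowing it down to error-like lines for the failed device saves time. The following is a minimal sketch, not a VMware tool. It assumes you pass the log path mentioned above on the command line, and optionally the identifier of the failed disk as shown in the Disk Management view (for example an `naa.` identifier). The keyword pattern (`error|fail|lost|timeout`) and the helper name `find_disk_errors` are hypothetical choices for illustration; adjust the pattern to the messages your hosts actually log.

```python
import re
import sys
from typing import Iterable, List, Optional, Tuple

# Hypothetical keyword pattern: an assumption, not a list of real ESXi messages.
DEFAULT_PATTERN = re.compile(r"error|fail|lost|timeout", re.IGNORECASE)


def find_disk_errors(
    lines: Iterable[str],
    device_id: Optional[str] = None,
    pattern: "re.Pattern[str]" = DEFAULT_PATTERN,
) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs whose text matches `pattern` and,
    when `device_id` is given, also contains that device identifier."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Skip lines that do not mention the device we are investigating.
        if device_id is not None and device_id not in line:
            continue
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def main(argv: List[str]) -> int:
    if len(argv) < 2:
        print(f"usage: {argv[0]} LOGFILE [DEVICE_ID]", file=sys.stderr)
        return 2
    device_id = argv[2] if len(argv) > 2 else None
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(argv[1], encoding="utf-8", errors="replace") as log:
        for number, line in find_disk_errors(log, device_id):
            print(f"{number}: {line}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it wherever a Python 3 interpreter is available, for example after copying the log off the host: `python3 scan_log.py <log file> <device id>`. A plain `grep -n` does much the same job; the script simply keeps the device filter and the keyword list in one place so they can be reused across hosts.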