System recovery of a Proxmox cluster failure
Normally, the cluster can recover automatically after a network or system failure on a cluster node. However, when you need to upgrade or replace an existing cluster node, you have to follow the procedures listed in the following sections.
Replacing a failed cluster node
In this case, you probably have one broken machine inside your cluster. When you log in to the web management console, you might see the status of the broken cluster node turn red and show that the node is offline; this can be seen on the Summary page under Datacenter. Assume that the broken machine is vmsrv02, as shown in the following screenshot:
When you check the status of the cluster with the clustat command, you will get the following result:
root@vmsrv01# clustat
Member Name        ID   Status
------ ----        ---- ------
vmsrv01            1    Online, Local, rgmanager
vmsrv02            2    Offline
/dev/block/8:16    ...
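If you want to pick out the offline nodes from this output in a script, a filter along the following lines may help. This is only a minimal sketch: the function name offline_members is hypothetical, and it assumes the column layout shown above (member name, numeric ID, then status), which may differ between versions.

```shell
# Hypothetical helper: reads clustat-style output on stdin and prints
# the name of every member whose status column starts with "Offline".
# Assumption: columns are "name  numeric-ID  status", as in the output above.
offline_members() {
    # Requiring a numeric ID in field 2 skips the header, the separator
    # line, and entries such as the quorum disk that carry no plain ID.
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^Offline/ { print $1 }'
}
```

On a cluster node, it could be used as clustat | offline_members; with the output shown above, it would print vmsrv02.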