This is a cache of https://docs.okd.io/4.12/virt/virtual_machines/virt-triggering-vm-failover-resolving-failed-node.html. It is a snapshot of the page at 2024-11-24T19:39:24.985+0000.
Triggering virtual machine failover by resolving a failed node - Virtual machines | Virtualization | OKD 4.12
×

If a node fails and machine health checks are not deployed on your cluster, virtual machines (VMs) with RunStrategy: Always configured are not automatically relocated to healthy nodes. To trigger VM failover, you must manually delete the Node object.

If you installed your cluster by using installer-provisioned infrastructure and you properly configured machine health checks:

  • Failed nodes are automatically recycled.

  • Virtual machines with RunStrategy set to Always or RerunOnFailure are automatically scheduled on healthy nodes.

Prerequisites

  • A node where a virtual machine was running has the NotReady condition.

  • The virtual machine that was running on the failed node has RunStrategy set to Always.

  • You have installed the OpenShift cli (oc).

Deleting nodes from a bare metal cluster

When you delete a node using the cli, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OKD. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.

Procedure

Delete a node from an OKD cluster running on bare metal by completing the following steps:

  1. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
  2. Drain all pods on the node:

    $ oc adm drain <node_name> --force=true

    This step might fail if the node is offline or unresponsive. Even if the node does not respond, it might still be running a workload that writes to shared storage. To avoid data corruption, power down the physical hardware before you proceed.

  3. Delete the node from the cluster:

    $ oc delete node <node_name>

    Although the node object is now deleted from the cluster, it can still rejoin the cluster after reboot or if the kubelet service is restarted. To permanently delete the node and all its data, you must decommission the node.

  4. If you powered down the physical hardware, turn it back on so that the node can rejoin the cluster.

Verifying virtual machine failover

After all resources are terminated on the unhealthy node, a new virtual machine instance (VMI) is automatically created on a healthy node for each relocated VM. To confirm that the VMI was created, view all VMIs by using the oc cli.

Listing all virtual machine instances using the cli

You can list all virtual machine instances (VMIs) in your cluster, including standalone VMIs and those owned by virtual machines, by using the oc command-line interface (cli).

Procedure
  • List all VMIs by running the following command:

    $ oc get vmis -A