Quorum restoration

You can use the quorum-restore.sh script to restore etcd quorum on clusters that are offline due to quorum loss. When quorum is lost, the OKD API becomes read-only. After quorum is restored, the OKD API returns to read/write mode.

Restoring etcd quorum for high availability clusters

You can use the quorum-restore.sh script to instantly bring back a new single-member etcd cluster based on its local data directory, marking all other members as invalid by retiring the previous cluster identifier. No prior backup of the control plane is required.

On a three-node high availability (HA) cluster, you must shut down etcd on two hosts to avoid a cluster split; on four-node and five-node HA clusters, you must shut down three hosts. Quorum requires a simple majority of nodes: two nodes on a three-node HA cluster, and three nodes on four-node and five-node HA clusters. If you start a new cluster from a backup on your recovery host, the other etcd members might still be able to form quorum and continue service.
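The minimums above follow from the quorum formula floor(n/2) + 1 for an n-member cluster. You can confirm the value for a given cluster size with simple shell arithmetic (illustrative only, not part of the procedure):

$ for n in 3 4 5; do echo "$n nodes -> quorum $(( n / 2 + 1 ))"; done
3 nodes -> quorum 2
4 nodes -> quorum 3
5 nodes -> quorum 3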

You might experience data loss if the host that runs the restoration does not have all data replicated to it.

Quorum restoration should not be used to decrease the number of nodes outside of the restoration process. Decreasing the number of nodes results in an unsupported cluster configuration.

Prerequisites
  • You have SSH access to the node used to restore quorum.

Procedure
  1. Select a control plane host to use as the recovery host. You run the restore operation on this host.

    1. List the running etcd pods by running the following command:

      $ oc get pods -n openshift-etcd -l app=etcd --field-selector="status.phase==Running"
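      The output lists one running etcd pod per reachable control plane node. Example output (names and values are illustrative):
      NAME                                READY   STATUS    RESTARTS   AGE
      etcd-ip-10-0-143-125.ec2.internal   4/4     Running   0          3h37m
      etcd-ip-10-0-154-194.ec2.internal   4/4     Running   0          3h37m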
    2. Choose a pod and run the following command to obtain its IP address:

      $ oc exec -n openshift-etcd <etcd-pod> -c etcdctl -- etcdctl endpoint status -w table

      Note the IP address of a member that is not a learner and has the highest Raft index.
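      For example, the status table might look like the following (abridged and illustrative; the exact columns depend on the etcd version):
      +---------------------------+------------------+-----------+------------+------------+
      |         ENDPOINT          |        ID        | IS LEADER | IS LEARNER | RAFT INDEX |
      +---------------------------+------------------+-----------+------------+------------+
      | https://10.0.143.125:2379 | 6fc1e7c9db35841d | false     | false      | 54321      |
      | https://10.0.154.194:2379 | 757b6793e2408b6c | true      | false      | 54322      |
      +---------------------------+------------------+-----------+------------+------------+
      In this illustration, you would note 10.0.154.194, because that member is not a learner and has the highest RAFT INDEX.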

    3. Run the following command and note the node name that corresponds to the IP address of the chosen etcd member:

      $ oc get nodes -o jsonpath='{range .items[*]}[{.metadata.name},{.status.addresses[?(@.type=="InternalIP")].address}]{end}'
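      The output is a single line of [node-name,internal-IP] pairs, which you match against the IP address from the previous step. Example output (illustrative):
      [ip-10-0-131-183.ec2.internal,10.0.131.183][ip-10-0-143-125.ec2.internal,10.0.143.125][ip-10-0-154-194.ec2.internal,10.0.154.194]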
  2. Using SSH, connect to the chosen recovery node and run the following command to restore etcd quorum:

    $ sudo -E /usr/local/bin/quorum-restore.sh

    Within a few minutes, the nodes that went down are automatically synchronized with the node that you ran the recovery script on, and any remaining online nodes automatically rejoin the new etcd cluster created by the quorum-restore.sh script.
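    From a terminal that has access to the cluster, you can optionally watch the etcd pods return to a Running state while the members rejoin (a convenience check, not a required step):

    $ oc get pods -n openshift-etcd -l app=etcd -w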

  3. Exit the SSH session.

  4. Return to a three-node configuration if any nodes are offline. Repeat the following steps for each offline node to delete and re-create its machine. After the machines are re-created, a new revision is forced and etcd automatically scales up.

    • If you use a user-provisioned bare-metal installation, you can re-create a control plane machine by using the same method that you used to originally create it. For more information, see "Installing a user-provisioned cluster on bare metal".

      Do not delete and re-create the machine for the recovery host.

    • If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps:

      Do not delete and re-create the machine for the recovery host.

      For bare-metal installations on installer-provisioned infrastructure, control plane machines are not re-created. For more information, see "Replacing a bare-metal control plane node".

      1. Obtain the machine for one of the offline nodes.

        In a terminal that has access to the cluster as a cluster-admin user, run the following command:

        $ oc get machines -n openshift-machine-api -o wide
        Example output:
        NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
        clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped (1)
        clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
        clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba  running
        clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
        clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
        clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
        1 This is the control plane machine for the offline node, ip-10-0-131-183.ec2.internal.
      2. Delete the machine of the offline node by running:

        $ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 (1)
        1 Specify the name of the control plane machine for the offline node.

        A new machine is automatically provisioned after deleting the machine of the offline node.
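        You can optionally watch the replacement machine being provisioned (a convenience check; requires access to the cluster as a cluster-admin user):

        $ oc get machines -n openshift-machine-api -w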

  5. Verify that a new machine has been created by running:

    $ oc get machines -n openshift-machine-api -o wide
    Example output:
    NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
    clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-143-125.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
    clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-154-194.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba  running
    clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-173-171.ec2.internal    aws:///us-east-1a/i-015b0888fe17bc2c8  running (1)
    clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
    clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
    clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running
    1 The new machine, clustername-8qw5l-master-3, is being created and is ready after the phase changes from Provisioning to Running.

    It might take a few minutes for the new machine to be created. The etcd cluster Operator will automatically synchronize when the machine or node returns to a healthy state.

    Repeat these steps for each node that is offline.
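    After all machines are re-created, you can optionally confirm that the etcd cluster Operator reports every member as available. This check reads the EtcdMembersAvailable condition from the Operator status; the condition name might differ between versions:

    $ oc get etcd cluster -o jsonpath='{.status.conditions[?(@.type=="EtcdMembersAvailable")].message}{"\n"}'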

  6. Wait until the control plane recovers by running the following command:

    $ oc adm wait-for-stable-cluster

    It can take up to 15 minutes for the control plane to recover.
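    To follow progress while you wait, you can check the cluster Operators from another terminal; each Operator should eventually report Available as True and Degraded as False:

    $ oc get clusteroperators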

Troubleshooting
  • If you see no progress rolling out the etcd static pods, you can force redeployment from the etcd cluster Operator by running the following command:

    $ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$(date --rfc-3339=ns )"'"}}' --type=merge
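    After forcing a redeployment, you can confirm that a new revision is rolling out by checking the latest available revision in the Operator status; the revision number should increase and the etcd static pods should restart on each node (the field name comes from the etcd Operator's static pod status and might vary between versions):

    $ oc get etcd cluster -o jsonpath='{.status.latestAvailableRevision}{"\n"}'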