You can use the quorum-restore.sh script to restore etcd quorum on clusters that are offline due to quorum loss. When quorum is lost, the OKD API becomes read-only. After quorum is restored, the OKD API returns to read/write mode.
You can use the quorum-restore.sh script to instantly bring back a new single-member etcd cluster based on its local data directory and mark all other members as invalid by retiring the previous cluster identifier. No prior backup of the control plane is required.
For high availability (HA) clusters, a three-node cluster requires you to shut down etcd on two hosts to avoid a cluster split. On four-node and five-node HA clusters, you must shut down three hosts. Quorum requires a simple majority of nodes: two nodes on a three-node HA cluster, and three nodes on four-node and five-node HA clusters. If you start a new cluster from backup on your recovery host, the other etcd members might still be able to form quorum and continue service.
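As a quick illustration only, not part of the procedure, the majority for an n-member cluster is floor(n/2) + 1, which you can compute with Bash integer arithmetic:

$ echo $(( 3 / 2 + 1 ))   # quorum for a three-node cluster
2
$ echo $(( 5 / 2 + 1 ))   # quorum for a five-node cluster
3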
You might experience data loss if the host that runs the restoration does not have all data replicated to it.
Quorum restoration should not be used to decrease the number of nodes outside of the restoration process. Decreasing the number of nodes results in an unsupported cluster configuration.
You have SSH access to the node used to restore quorum.
Select a control plane host to use as the recovery host. You run the restore operation on this host.
List the running etcd pods by running the following command:
$ oc get pods -n openshift-etcd -l app=etcd --field-selector="status.phase==Running"
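The output lists the etcd pods that are currently in the Running phase; pods on offline nodes do not appear. The pod names, ready counts, and ages in this example are illustrative:

NAME                                READY   STATUS    RESTARTS   AGE
etcd-ip-10-0-143-125.ec2.internal   4/4     Running   0          3h12m
etcd-ip-10-0-154-194.ec2.internal   4/4     Running   0          3h11m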
Choose a pod and run the following command to obtain its IP address:
$ oc exec -n openshift-etcd <etcd-pod> -c etcdctl -- etcdctl endpoint status -w table
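The output is a table of member statuses. The exact columns depend on the etcd version, and the endpoints, member IDs, and Raft values in this example are illustrative:

+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.131.183:2379 | 6fc1e7c9db35841d |   3.5.9 |  104 MB |     false |      false |         7 |      91624 |              91624 |        |
| https://10.0.143.125:2379 | 9fd306c0ebfb1f5b |   3.5.9 |  104 MB |      true |      false |         7 |      91629 |              91629 |        |
| https://10.0.154.194:2379 | cd2e1d71b6a9f25c |   3.5.9 |  104 MB |     false |      false |         7 |      91629 |              91629 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

In this example, the members at 10.0.143.125 and 10.0.154.194 have the highest Raft index (91629) and are not learners, so either would be a suitable choice.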
Note the IP address of a member that is not a learner and has the highest Raft index.
Run the following command and note the node name that corresponds to the IP address of the chosen etcd member:
$ oc get nodes -o jsonpath='{range .items[*]}[{.metadata.name},{.status.addresses[?(@.type=="InternalIP")].address}]{end}'
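The jsonpath expression prints every node as a [name,address] pair, run together on one line. The control plane entries in this example are illustrative, and worker nodes also appear in the real output:

[ip-10-0-131-183.ec2.internal,10.0.131.183][ip-10-0-143-125.ec2.internal,10.0.143.125][ip-10-0-154-194.ec2.internal,10.0.154.194]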
Using SSH, connect to the chosen recovery node and run the following command to restore etcd quorum:
$ sudo -E /usr/local/bin/quorum-restore.sh
After a few minutes, the nodes that went down automatically synchronize with the node on which you ran the recovery script, and any remaining online nodes automatically rejoin the new etcd cluster created by the quorum-restore.sh script.
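If you want to confirm that the members have rejoined, you can optionally re-run the earlier pod listing from a terminal that has cluster access; this check is not part of the documented steps:

$ oc get pods -n openshift-etcd -l app=etcd --field-selector="status.phase==Running"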
Exit the SSH session.
If any nodes are offline, return to a three-node configuration by deleting and re-creating the machine for each offline node, as described in the following steps. After the machines are re-created, a new revision is forced and etcd automatically scales up.
If you use a user-provisioned bare-metal installation, you can re-create a control plane machine by using the same method that you used to originally create it. For more information, see "Installing a user-provisioned cluster on bare metal".
Do not delete and re-create the machine for the recovery host.
If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps:
Do not delete and re-create the machine for the recovery host. For bare-metal installations on installer-provisioned infrastructure, control plane machines are not re-created. For more information, see "Replacing a bare-metal control plane node".
Obtain the machine for one of the offline nodes.
In a terminal that has access to the cluster as a cluster-admin user, run the following command:
$ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
clustername-8qw5l-master-0 Running m4.xlarge us-east-1 us-east-1a 3h37m ip-10-0-131-183.ec2.internal aws:///us-east-1a/i-0ec2782f8287dfb7e stopped (1)
clustername-8qw5l-master-1 Running m4.xlarge us-east-1 us-east-1b 3h37m ip-10-0-143-125.ec2.internal aws:///us-east-1b/i-096c349b700a19631 running
clustername-8qw5l-master-2 Running m4.xlarge us-east-1 us-east-1c 3h37m ip-10-0-154-194.ec2.internal aws:///us-east-1c/i-02626f1dba9ed5bba running
clustername-8qw5l-worker-us-east-1a-wbtgd Running m4.large us-east-1 us-east-1a 3h28m ip-10-0-129-226.ec2.internal aws:///us-east-1a/i-010ef6279b4662ced running
clustername-8qw5l-worker-us-east-1b-lrdxb Running m4.large us-east-1 us-east-1b 3h28m ip-10-0-144-248.ec2.internal aws:///us-east-1b/i-0cb45ac45a166173b running
clustername-8qw5l-worker-us-east-1c-pkg26 Running m4.large us-east-1 us-east-1c 3h28m ip-10-0-170-181.ec2.internal aws:///us-east-1c/i-06861c00007751b0a running
(1) This is the control plane machine for the offline node, ip-10-0-131-183.ec2.internal.
Delete the machine of the offline node by running:
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 (1)
(1) Specify the name of the control plane machine for the offline node.
A new machine is automatically provisioned after deleting the machine of the offline node.
Verify that a new machine has been created by running:
$ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
clustername-8qw5l-master-1 Running m4.xlarge us-east-1 us-east-1b 3h37m ip-10-0-143-125.ec2.internal aws:///us-east-1b/i-096c349b700a19631 running
clustername-8qw5l-master-2 Running m4.xlarge us-east-1 us-east-1c 3h37m ip-10-0-154-194.ec2.internal aws:///us-east-1c/i-02626f1dba9ed5bba running
clustername-8qw5l-master-3 Provisioning m4.xlarge us-east-1 us-east-1a 85s ip-10-0-173-171.ec2.internal aws:///us-east-1a/i-015b0888fe17bc2c8 running (1)
clustername-8qw5l-worker-us-east-1a-wbtgd Running m4.large us-east-1 us-east-1a 3h28m ip-10-0-129-226.ec2.internal aws:///us-east-1a/i-010ef6279b4662ced running
clustername-8qw5l-worker-us-east-1b-lrdxb Running m4.large us-east-1 us-east-1b 3h28m ip-10-0-144-248.ec2.internal aws:///us-east-1b/i-0cb45ac45a166173b running
clustername-8qw5l-worker-us-east-1c-pkg26 Running m4.large us-east-1 us-east-1c 3h28m ip-10-0-170-181.ec2.internal aws:///us-east-1c/i-06861c00007751b0a running
(1) The new machine, clustername-8qw5l-master-3, is being created and is ready after the phase changes from Provisioning to Running.
It might take a few minutes for the new machine to be created. The etcd cluster Operator will automatically synchronize when the machine or node returns to a healthy state.
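As an optional check that is not part of the documented steps, you can watch the etcd cluster Operator while it synchronizes; the AVAILABLE, PROGRESSING, and DEGRADED columns indicate whether it is still rolling out changes:

$ oc get clusteroperator etcd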
Repeat these steps for each node that is offline.
Wait until the control plane recovers by running the following command:
$ oc adm wait-for-stable-cluster
It can take up to 15 minutes for the control plane to recover.
If you see no progress rolling out the etcd static pods, you can force redeployment from the etcd cluster Operator by running the following command:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$(date --rfc-3339=ns )"'"}}' --type=merge
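One way to verify that the forced redeployment finished is to check the NodeInstallerProgressing condition on the etcd operator resource and confirm that all nodes are at the latest revision; adjust this check as needed for your environment:

$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'

A reason of AllNodesAtLatestRevision, with a message such as "3 nodes are at revision 7", indicates that the rollout is complete.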