This is a cache of https://docs.okd.io/4.17/edge_computing/day_2_core_cnf_clusters/updating/telco-update-completing-the-control-plane-only-update.html. It is a snapshot of the page at 2025-10-22T01:35:22.356+0000.
Completing the Control Plane Only update - Day 2 operations for telco core CNF clusters | Edge computing | OKD 4.17
×

Follow these steps to perform the Control Plane Only cluster update and monitor the update through to completion.

Control Plane Only updates were previously known as EUS-to-EUS updates. Control Plane Only updates are only viable between even-numbered minor versions of OKD.

Acknowledging the Control Plane Only or y-stream update

When you update to all versions from 4.11 and later, you must manually acknowledge that the update can continue.

Before you acknowledge the update, verify that you are not using any of the Kubernetes APIs that are removed from the version you are updating to. For example, in OKD 4.17, there are no API removals. See "Kubernetes API removals" for more information.

Prerequisites
  • You have verified that APIs for all of the applications running on your cluster are compatible with the next Y-stream release of OKD. For more details about compatibility, see "Verifying cluster API versions between update versions".

Procedure
  • Complete the administrative acknowledgment to start the cluster update by running the following command:

    $ oc adm upgrade

    If the cluster update does not complete successfully, more details about the update failure are provided in the Reason and Message sections.

    Example output
    Cluster version is 4.15.45
    
    Upgradeable=False
    
      Reason: MultipleReasons
      Message: Cluster should not be upgraded between minor versions for multiple reasons: AdminAckRequired,ResourceDeletesInProgress
      * Kubernetes 1.29 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/7031404 for details and instructions.
      * Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=PrometheusRule "openshift-kube-apiserver/kube-apiserver-recording-rules"
    
    ReleaseAccepted=False
    
      Reason: PreconditionChecks
      Message: Preconditions failed for payload loaded version="4.16.34" image="quay.io/openshift-release-dev/ocp-release@sha256:41bb08c560f6db5039ccdf242e590e8b23049b5eb31e1c4f6021d1d520b353b8": Precondition "ClusterVersionUpgradeable" failed because of "MultipleReasons": Cluster should not be upgraded between minor versions for multiple reasons: AdminAckRequired,ResourceDeletesInProgress
      * Kubernetes 1.29 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/7031404 for details and instructions.
      * Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=PrometheusRule "openshift-kube-apiserver/kube-apiserver-recording-rules"
    
    Upstream is unset, so the cluster will use an appropriate default.
    Channel: eus-4.16 (available channels: candidate-4.15, candidate-4.16, eus-4.16, fast-4.15, fast-4.16, stable-4.15, stable-4.16)
    
    Recommended updates:
    
      VERSION     IMAGE
      4.16.34     quay.io/openshift-release-dev/ocp-release@sha256:41bb08c560f6db5039ccdf242e590e8b23049b5eb31e1c4f6021d1d520b353b8

    In this example, a linked Red Hat Knowledgebase article (Preparing to upgrade to OKD 4.16) provides more detail about verifying API compatibility between releases.

Verification
  • Verify the update by running the following command:

    $ oc get configmap admin-acks -n openshift-config -o json | jq .data
    Example output
    {
      "ack-4.14-kube-1.28-api-removals-in-4.15": "true",
      "ack-4.15-kube-1.29-api-removals-in-4.16": "true"
    }

    In this example, the cluster is updated from version 4.14 to 4.15, and then from 4.15 to 4.16 in a Control Plane Only update.

Starting the cluster update

When updating from one y-stream release to the next, you must ensure that the intermediate z-stream releases are also compatible.

You can verify that you are updating to a viable release by running the oc adm upgrade command. The oc adm upgrade command lists the compatible update releases.

Procedure
  1. Start the update:

    $ oc adm upgrade --to=4.15.33
    • Control Plane Only update: Make sure you point to the interim <y+1> release path

    • Y-stream update - Make sure you use the correct <y.z> release that follows the Kubernetes version skew policy.

    • Z-stream update - Verify that there are no problems moving to that specific release

    Example output
    Requested update to 4.15.33 (1)
    
    1 The Requested update value changes depending on your particular update.
Additional resources

Monitoring the cluster update

You should check the cluster health often during the update. Check for the node status, cluster Operators status and failed pods.

Procedure
  • Monitor the cluster update. For example, to monitor the cluster update from version 4.14 to 4.15, run the following command:

    $ watch "oc get clusterversion; echo; oc get co | head -1; oc get co | grep 4.14; oc get co | grep 4.15; echo; oc get no; echo; oc get po -A | grep -E -iv 'running|complete'"
    Example output
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.34   True        True          4m6s    Working towards 4.15.33: 111 of 873 done (12% complete), waiting on kube-apiserver
    
    NAME                           VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                 4.14.34   True        False         False      4d22h
    baremetal                      4.14.34   True        False         False      4d23h
    cloud-controller-manager       4.14.34   True        False         False      4d23h
    cloud-credential               4.14.34   True        False         False      4d23h
    cluster-autoscaler             4.14.34   True        False         False      4d23h
    console                        4.14.34   True        False         False      4d22h
    
    ...
    
    storage                        4.14.34   True        False         False      4d23h
    config-operator                4.15.33   True        False         False      4d23h
    etcd                           4.15.33   True        False         False      4d23h
    
    NAME           STATUS   ROLES                  AGE     VERSION
    ctrl-plane-0   Ready    control-plane,master   4d23h   v1.27.15+6147456
    ctrl-plane-1   Ready    control-plane,master   4d23h   v1.27.15+6147456
    ctrl-plane-2   Ready    control-plane,master   4d23h   v1.27.15+6147456
    worker-0       Ready    mcp-1,worker           4d23h   v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           4d23h   v1.27.15+6147456
    
    NAMESPACE               NAME                       READY   STATUS              RESTARTS   AGE
    openshift-marketplace   redhat-marketplace-rf86t   0/1     ContainerCreating   0          0s
Verification

During the update the watch command cycles through one or several of the cluster Operators at a time, providing a status of the Operator update in the MESSAGE column.

When the cluster Operators update process is complete, each control plane nodes is rebooted, one at a time.

During this part of the update, messages are reported that state cluster Operators are being updated again or are in a degraded state. This is because the control plane node is offline while it reboots nodes.

As soon as the last control plane node reboot is complete, the cluster version is displayed as updated.

When the control plane update is complete a message such as the following is displayed. This example shows an update completed to the intermediate y-stream release.

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.33   True        False         28m     Cluster version is 4.15.33

NAME                         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication               4.15.33   True        False         False	  5d
baremetal                    4.15.33   True        False         False	  5d
cloud-controller-manager     4.15.33   True        False         False	  5d1h
cloud-credential             4.15.33   True        False         False	  5d1h
cluster-autoscaler           4.15.33   True        False         False	  5d
config-operator              4.15.33   True        False         False	  5d
console                      4.15.33   True        False         False	  5d

...

service-ca                   4.15.33   True        False         False	  5d
storage                      4.15.33   True        False         False	  5d

NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   5d    v1.28.13+2ca1a23
ctrl-plane-1   Ready    control-plane,master   5d    v1.28.13+2ca1a23
ctrl-plane-2   Ready    control-plane,master   5d    v1.28.13+2ca1a23
worker-0       Ready    mcp-1,worker           5d    v1.28.13+2ca1a23
worker-1       Ready    mcp-2,worker           5d    v1.28.13+2ca1a23

Updating the OLM Operators

In telco environments, software needs to vetted before it is loaded onto a production cluster. Production clusters are also configured in a disconnected network, which means that they are not always directly connected to the internet. Because the clusters are in a disconnected network, the OpenShift Operators are configured for manual update during installation so that new versions can be managed on a cluster-by-cluster basis. Perform the following procedure to move the Operators to the newer versions.

Procedure
  1. Check to see which Operators need to be updated:

    $ oc get installplan -A | grep -E 'APPROVED|false'
    Example output
    NAMESPACE           NAME            CSV                                               APPROVAL   APPROVED
    metallb-system      install-nwjnh   metallb-operator.v4.16.0-202409202304             Manual     false
    openshift-nmstate   install-5r7wr   kubernetes-nmstate-operator.4.16.0-202409251605   Manual     false
  2. Patch the InstallPlan resources for those Operators:

    $ oc patch installplan -n metallb-system install-nwjnh --type merge --patch \
    '{"spec":{"approved":true}}'
    Example output
    installplan.operators.coreos.com/install-nwjnh patched
  3. Monitor the namespace by running the following command:

    $ oc get all -n metallb-system
    Example output
    NAME                                                       READY   STATUS              RESTARTS   AGE
    pod/metallb-operator-controller-manager-69b5f884c-8bp22    0/1     ContainerCreating   0          4s
    pod/metallb-operator-controller-manager-77895bdb46-bqjdx   1/1     Running             0          4m1s
    pod/metallb-operator-webhook-server-5d9b968896-vnbhk       0/1     ContainerCreating   0          4s
    pod/metallb-operator-webhook-server-d76f9c6c8-57r4w        1/1     Running             0          4m1s
    
    ...
    
    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/metallb-operator-controller-manager-69b5f884c    1         1         0       4s
    replicaset.apps/metallb-operator-controller-manager-77895bdb46   1         1         1       4m1s
    replicaset.apps/metallb-operator-controller-manager-99b76f88     0         0         0       4m40s
    replicaset.apps/metallb-operator-webhook-server-5d9b968896       1         1         0       4s
    replicaset.apps/metallb-operator-webhook-server-6f7dbfdb88       0         0         0       4m40s
    replicaset.apps/metallb-operator-webhook-server-d76f9c6c8        1         1         1       4m1s

    When the update is complete, the required pods should be in a Running state, and the required ReplicaSet resources should be ready:

    NAME                                                      READY   STATUS    RESTARTS   AGE
    pod/metallb-operator-controller-manager-69b5f884c-8bp22   1/1     Running   0          25s
    pod/metallb-operator-webhook-server-5d9b968896-vnbhk      1/1     Running   0          25s
    
    ...
    
    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/metallb-operator-controller-manager-69b5f884c    1         1         1       25s
    replicaset.apps/metallb-operator-controller-manager-77895bdb46   0         0         0       4m22s
    replicaset.apps/metallb-operator-webhook-server-5d9b968896       1         1         1       25s
    replicaset.apps/metallb-operator-webhook-server-d76f9c6c8        0         0         0       4m22s
Verification
  • Verify that the Operators do not need to be updated for a second time:

    $ oc get installplan -A | grep -E 'APPROVED|false'

    There should be no output returned.

    Sometimes you have to approve an update twice because some Operators have interim z-stream release versions that need to be installed before the final version.

Additional resources

Performing the second y-stream update

After completing the first y-stream update, you must update the y-stream control plane version to the new EUS version.

Procedure
  1. Verify that the <4.y.z> release that you selected is still listed as a good channel to move to:

    $ oc adm upgrade
    Example output
    Cluster version is 4.15.33
    
    Upgradeable=False
    
      Reason: AdminAckRequired
      Message: Kubernetes 1.29 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/7031404 for details and instructions.
    
    Upstream is unset, so the cluster will use an appropriate default.
    Channel: eus-4.16 (available channels: candidate-4.15, candidate-4.16, eus-4.16, fast-4.15, fast-4.16, stable-4.15, stable-4.16)
    
    Recommended updates:
    
      VERSION     IMAGE
      4.16.14     quay.io/openshift-release-dev/ocp-release@sha256:0521a0f1acd2d1b77f76259cb9bae9c743c60c37d9903806a3372c1414253658
      4.16.13     quay.io/openshift-release-dev/ocp-release@sha256:6078cb4ae197b5b0c526910363b8aff540343bfac62ecb1ead9e068d541da27b
      4.15.34     quay.io/openshift-release-dev/ocp-release@sha256:f2e0c593f6ed81250c11d0bac94dbaf63656223477b7e8693a652f933056af6e

    If you update soon after the initial GA of a new Y-stream release, you might not see new y-stream releases available when you run the oc adm upgrade command.

  2. Optional: View the potential update releases that are not recommended. Run the following command:

    $ oc adm upgrade --include-not-recommended
    Example output
    Cluster version is 4.15.33
    
    Upgradeable=False
    
      Reason: AdminAckRequired
      Message: Kubernetes 1.29 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/7031404 for details and instructions.
    
    Upstream is unset, so the cluster will use an appropriate default.Channel: eus-4.16 (available channels: candidate-4.15, candidate-4.16, eus-4.16, fast-4.15, fast-4.16, stable-4.15, stable-4.16)
    
    Recommended updates:
    
      VERSION     IMAGE
      4.16.14     quay.io/openshift-release-dev/ocp-release@sha256:0521a0f1acd2d1b77f76259cb9bae9c743c60c37d9903806a3372c1414253658
      4.16.13     quay.io/openshift-release-dev/ocp-release@sha256:6078cb4ae197b5b0c526910363b8aff540343bfac62ecb1ead9e068d541da27b
      4.15.34     quay.io/openshift-release-dev/ocp-release@sha256:f2e0c593f6ed81250c11d0bac94dbaf63656223477b7e8693a652f933056af6e
    
    Supported but not recommended updates:
    
      Version: 4.16.15
      Image: quay.io/openshift-release-dev/ocp-release@sha256:671bc35e
      Recommended: Unknown
      Reason: EvaluationFailed
      Message: Exposure to AzureRegistryImagePreservation is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 0
      In Azure clusters, the in-cluster image registry may fail to preserve images on update. https://issues.redhat.com/browse/IR-461

    The example shows a potential error that can affect clusters hosted in Microsoft Azure. It does not show risks for bare-metal clusters.

Acknowledging the y-stream release update

When moving between y-stream releases, you must run a patch command to explicitly acknowledge the update. In the output of the oc adm upgrade command, a URL is provided that shows the specific command to run.

Before you acknowledge the update, verify that you are not using any of the Kubernetes APIs that are removed from the version you are updating to. For example, in OKD 4.17, there are no API removals. See "Kubernetes API removals" for more information.

Prerequisites
  • You have verified that APIs for all of the applications running on your cluster are compatible with the next Y-stream release of OKD. For more details about compatibility, see "Verifying cluster API versions between update versions".

Procedure
  • Complete the administrative acknowledgment to start the cluster update by running the following command:

    $ oc adm upgrade

    If the cluster update does not complete successfully, more details about the update failure are provided in the Reason and Message sections.

    Example output
    Cluster version is 4.15.45
    
    Upgradeable=False
    
      Reason: MultipleReasons
      Message: Cluster should not be upgraded between minor versions for multiple reasons: AdminAckRequired,ResourceDeletesInProgress
      * Kubernetes 1.29 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/7031404 for details and instructions.
      * Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=PrometheusRule "openshift-kube-apiserver/kube-apiserver-recording-rules"
    
    ReleaseAccepted=False
    
      Reason: PreconditionChecks
      Message: Preconditions failed for payload loaded version="4.16.34" image="quay.io/openshift-release-dev/ocp-release@sha256:41bb08c560f6db5039ccdf242e590e8b23049b5eb31e1c4f6021d1d520b353b8": Precondition "ClusterVersionUpgradeable" failed because of "MultipleReasons": Cluster should not be upgraded between minor versions for multiple reasons: AdminAckRequired,ResourceDeletesInProgress
      * Kubernetes 1.29 and therefore OpenShift 4.16 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/7031404 for details and instructions.
      * Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=PrometheusRule "openshift-kube-apiserver/kube-apiserver-recording-rules"
    
    Upstream is unset, so the cluster will use an appropriate default.
    Channel: eus-4.16 (available channels: candidate-4.15, candidate-4.16, eus-4.16, fast-4.15, fast-4.16, stable-4.15, stable-4.16)
    
    Recommended updates:
    
      VERSION     IMAGE
      4.16.34     quay.io/openshift-release-dev/ocp-release@sha256:41bb08c560f6db5039ccdf242e590e8b23049b5eb31e1c4f6021d1d520b353b8

    In this example, a linked Red Hat Knowledgebase article (Preparing to upgrade to OKD 4.16) provides more detail about verifying API compatibility between releases.

Verification
  • Verify the update by running the following command:

    $ oc get configmap admin-acks -n openshift-config -o json | jq .data
    Example output
    {
      "ack-4.14-kube-1.28-api-removals-in-4.15": "true",
      "ack-4.15-kube-1.29-api-removals-in-4.16": "true"
    }

    In this example, the cluster is updated from version 4.14 to 4.15, and then from 4.15 to 4.16 in a Control Plane Only update.

Starting the y-stream control plane update

After you have determined the full new release that you are moving to, you can run the oc adm upgrade –to=x.y.z command.

Procedure
  • Start the y-stream control plane update. For example, run the following command:

    $ oc adm upgrade --to=4.16.14
    Example output
    Requested update to 4.16.14

    You might move to a z-stream release that has potential issues with platforms other than the one you are running on. The following example shows a potential problem for cluster updates on Microsoft Azure:

    $ oc adm upgrade --to=4.16.15
    Example output
    error: the update 4.16.15 is not one of the recommended updates, but is available as a conditional update. To accept the Recommended=Unknown risk and to proceed with update use --allow-not-recommended.
      Reason: EvaluationFailed
      Message: Exposure to AzureRegistryImagePreservation is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 0
      In Azure clusters, the in-cluster image registry may fail to preserve images on update. https://issues.redhat.com/browse/IR-461

    The example shows a potential error that can affect clusters hosted in Microsoft Azure. It does not show risks for bare-metal clusters.

    $ oc adm upgrade --to=4.16.15 --allow-not-recommended
    Example output
    warning: with --allow-not-recommended you have accepted the risks with 4.14.11 and bypassed Recommended=Unknown EvaluationFailed: Exposure to AzureRegistryImagePreservation is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 0
    In Azure clusters, the in-cluster image registry may fail to preserve images on update. https://issues.redhat.com/browse/IR-461
    
    Requested update to 4.16.15

Monitoring the second part of a <y+1> cluster update

Monitor the second part of the cluster update to the <y+1> version.

Procedure
  • Monitor the progress of the second part of the <y+1> update. For example, to monitor the update from 4.15 to 4.16, run the following command:

    $ watch "oc get clusterversion; echo; oc get co | head -1; oc get co | grep 4.15; oc get co | grep 4.16; echo; oc get no; echo; oc get po -A | grep -E -iv 'running|complete'"
    Example output
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.15.33   True        True          10m     Working towards 4.16.14: 132 of 903 done (14% complete), waiting on kube-controller-manager, kube-scheduler
    
    NAME                         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication               4.15.33   True        False         False      5d3h
    baremetal                    4.15.33   True        False         False      5d4h
    cloud-controller-manager     4.15.33   True        False         False      5d4h
    cloud-credential             4.15.33   True        False         False      5d4h
    cluster-autoscaler           4.15.33   True        False         False      5d4h
    console                      4.15.33   True        False         False      5d3h
    ...
    config-operator              4.16.14   True        False         False      5d4h
    etcd                         4.16.14   True        False         False      5d4h
    kube-apiserver               4.16.14   True        True          False      5d4h    NodeInstallerProgressing: 1 node is at revision 15; 2 nodes are at revision 17
    
    NAME           STATUS   ROLES                  AGE    VERSION
    ctrl-plane-0   Ready    control-plane,master   5d4h   v1.28.13+2ca1a23
    ctrl-plane-1   Ready    control-plane,master   5d4h   v1.28.13+2ca1a23
    ctrl-plane-2   Ready    control-plane,master   5d4h   v1.28.13+2ca1a23
    worker-0       Ready    mcp-1,worker           5d4h   v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           5d4h   v1.27.15+6147456
    
    NAMESPACE                                          NAME                                                              READY   STATUS      RESTARTS       AGE
    openshift-kube-apiserver                           kube-apiserver-ctrl-plane-0                                       0/5     Pending     0              <invalid>

    As soon as the last control plane node is complete, the cluster version is updated to the new EUS release. For example:

    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.16.14   True        False         123m    Cluster version is 4.16.14
    
    NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                             4.16.14   True        False         False	  5d6h
    baremetal                                  4.16.14   True        False         False	  5d7h
    cloud-controller-manager                   4.16.14   True        False         False	  5d7h
    cloud-credential                           4.16.14   True        False         False	  5d7h
    cluster-autoscaler                         4.16.14   True        False         False	  5d7h
    config-operator                            4.16.14   True        False         False	  5d7h
    console                                    4.16.14   True        False         False	  5d6h
    #...
    operator-lifecycle-manager-packageserver   4.16.14   True        False         False	  5d7h
    service-ca                                 4.16.14   True        False         False	  5d7h
    storage                                    4.16.14   True        False         False	  5d7h
    
    NAME           STATUS   ROLES                  AGE     VERSION
    ctrl-plane-0   Ready    control-plane,master   5d7h    v1.29.8+f10c92d
    ctrl-plane-1   Ready    control-plane,master   5d7h    v1.29.8+f10c92d
    ctrl-plane-2   Ready    control-plane,master   5d7h    v1.29.8+f10c92d
    worker-0       Ready    mcp-1,worker           5d7h    v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           5d7h    v1.27.15+6147456
Additional resources

Updating all the OLM Operators

In the second phase of a multi-version upgrade, you must approve all of the Operators and additionally add installations plans for any other Operators that you want to upgrade.

Follow the same procedure as outlined in "Updating the OLM Operators". Ensure that you also update any non-OLM Operators as required.

Procedure
  1. Monitor the cluster update. For example, to monitor the cluster update from version 4.14 to 4.15, run the following command:

    $ watch "oc get clusterversion; echo; oc get co | head -1; oc get co | grep 4.14; oc get co | grep 4.15; echo; oc get no; echo; oc get po -A | grep -E -iv 'running|complete'"
  2. Check to see which Operators need to be updated:

    $ oc get installplan -A | grep -E 'APPROVED|false'
  3. Patch the InstallPlan resources for those Operators:

    $ oc patch installplan -n metallb-system install-nwjnh --type merge --patch \
    '{"spec":{"approved":true}}'
  4. Monitor the namespace by running the following command:

    $ oc get all -n metallb-system

    When the update is complete, the required pods should be in a Running state, and the required ReplicaSet resources should be ready.

Verification

During the update the watch command cycles through one or several of the cluster Operators at a time, providing a status of the Operator update in the MESSAGE column.

When the cluster Operators update process is complete, each control plane nodes is rebooted, one at a time.

During this part of the update, messages are reported that state cluster Operators are being updated again or are in a degraded state. This is because the control plane node is offline while it reboots nodes.

Updating the worker nodes

You upgrade the worker nodes after you have updated the control plane by unpausing the relevant mcp groups you created. Unpausing the mcp group starts the upgrade process for the worker nodes in that group. Each of the worker nodes in the cluster reboot to upgrade to the new EUS, y-stream or z-stream version as required.

In the case of Control Plane Only upgrades note that when a worker node is updated it will only require one reboot and will jump <y+2>-release versions. This is a feature that was added to decrease the amount of time that it takes to upgrade large bare-metal clusters.

This is a potential holding point. You can have a cluster version that is fully supported to run in production with the control plane that is updated to a new EUS release while the worker nodes are at a <y-2>-release. This allows large clusters to upgrade in steps across several maintenance windows.

  1. You can check how many nodes are managed in an mcp group. Run the following command to get the list of mcp groups:

    $ oc get mcp
    Example output
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-c9a52144456dbff9c9af9c5a37d1b614   True      False      False      3              3                   3                     0                      36d
    mcp-1    rendered-mcp-1-07fe50b9ad51fae43ed212e84e1dcc8e    False     False      False      1              0                   0                     0                      47h
    mcp-2    rendered-mcp-2-07fe50b9ad51fae43ed212e84e1dcc8e    False     False      False      1              0                   0                     0                      47h
    worker   rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0   True      False      False      0              0                   0                     0                      36d

    You decide how many mcp groups to upgrade at a time. This depends on how many CNF pods can be taken down at a time and how your pod disruption budget and anti-affinity settings are configured.

  2. Get the list of nodes in the cluster:

    $ oc get nodes
    Example output
    NAME           STATUS   ROLES                  AGE    VERSION
    ctrl-plane-0   Ready    control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-1   Ready    control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-2   Ready    control-plane,master   5d8h   v1.29.8+f10c92d
    worker-0       Ready    mcp-1,worker           5d8h   v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           5d8h   v1.27.15+6147456
  3. Confirm the MachineConfigPool groups that are paused:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   true
    mcp-2   true

    Each MachineConfigPool can be unpaused independently. Therefore, if a maintenance window runs out of time other MCPs do not need to be unpaused immediately. The cluster is supported to run with some worker nodes still at <y-2>-release version.

  4. Unpause the required mcp group to begin the upgrade:

    $ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'
    Example output
    machineconfigpool.machineconfiguration.openshift.io/mcp-1 patched
  5. Confirm that the required mcp group is unpaused:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   false
    mcp-2   true
  6. As each mcp group is upgraded, continue to unpause and upgrade the remaining nodes.

    $ oc get nodes
    Example output
    NAME           STATUS                        ROLES                  AGE    VERSION
    ctrl-plane-0   Ready                         control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-1   Ready                         control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-2   Ready                         control-plane,master   5d8h   v1.29.8+f10c92d
    worker-0       Ready                         mcp-1,worker           5d8h   v1.29.8+f10c92d
    worker-1       NotReady,SchedulingDisabled   mcp-2,worker           5d8h   v1.27.15+6147456

Verifying the health of the newly updated cluster

Run the following commands after updating the cluster to verify that the cluster is back up and running.

Procedure
  1. Check the cluster version by running the following command:

    $ oc get clusterversion
    Example output
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.16.14   True        False         4h38m   Cluster version is 4.16.14

    This should return the new cluster version and the PROGRESSING column should return False.

  2. Check that all nodes are ready:

    $ oc get nodes
    Example output
    NAME           STATUS   ROLES                  AGE    VERSION
    ctrl-plane-0   Ready    control-plane,master   5d9h   v1.29.8+f10c92d
    ctrl-plane-1   Ready    control-plane,master   5d9h   v1.29.8+f10c92d
    ctrl-plane-2   Ready    control-plane,master   5d9h   v1.29.8+f10c92d
    worker-0       Ready    mcp-1,worker           5d9h   v1.29.8+f10c92d
    worker-1       Ready    mcp-2,worker           5d9h   v1.29.8+f10c92d

    All nodes in the cluster should be in a Ready status and running the same version.

  3. Check that there are no paused mcp resources in the cluster:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   false
    mcp-2   false
  4. Check that all cluster Operators are available:

    $ oc get co
    Example output
    NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                             4.16.14   True        False         False      5d9h
    baremetal                                  4.16.14   True        False         False      5d9h
    cloud-controller-manager                   4.16.14   True        False         False      5d10h
    cloud-credential                           4.16.14   True        False         False      5d10h
    cluster-autoscaler                         4.16.14   True        False         False      5d9h
    config-operator                            4.16.14   True        False         False      5d9h
    console                                    4.16.14   True        False         False      5d9h
    control-plane-machine-set                  4.16.14   True        False         False      5d9h
    csi-snapshot-controller                    4.16.14   True        False         False      5d9h
    dns                                        4.16.14   True        False         False      5d9h
    etcd                                       4.16.14   True        False         False      5d9h
    image-registry                             4.16.14   True        False         False      85m
    ingress                                    4.16.14   True        False         False      5d9h
    insights                                   4.16.14   True        False         False      5d9h
    kube-apiserver                             4.16.14   True        False         False      5d9h
    kube-controller-manager                    4.16.14   True        False         False      5d9h
    kube-scheduler                             4.16.14   True        False         False      5d9h
    kube-storage-version-migrator              4.16.14   True        False         False      4h48m
    machine-api                                4.16.14   True        False         False      5d9h
    machine-approver                           4.16.14   True        False         False      5d9h
    machine-config                             4.16.14   True        False         False      5d9h
    marketplace                                4.16.14   True        False         False      5d9h
    monitoring                                 4.16.14   True        False         False      5d9h
    network                                    4.16.14   True        False         False      5d9h
    node-tuning                                4.16.14   True        False         False      5d7h
    openshift-apiserver                        4.16.14   True        False         False      5d9h
    openshift-controller-manager               4.16.14   True        False         False      5d9h
    openshift-samples                          4.16.14   True        False         False      5h24m
    operator-lifecycle-manager                 4.16.14   True        False         False      5d9h
    operator-lifecycle-manager-catalog         4.16.14   True        False         False      5d9h
    operator-lifecycle-manager-packageserver   4.16.14   True        False         False      5d9h
    service-ca                                 4.16.14   True        False         False      5d9h
    storage                                    4.16.14   True        False         False      5d9h

    All cluster Operators should report True in the AVAILABLE column.

  5. Check that all pods are healthy:

    $ oc get po -A | grep -E -iv 'complete|running'

    This should not return any pods.

    You might see a few pods still moving after the update. Watch this for a while to make sure all pods are cleared.