Completing the y-stream update

Follow these steps to perform the y-stream cluster update and monitor the update through to completion. Completing a y-stream update is more straightforward than a Control Plane Only update.

Acknowledging the Control Plane Only or y-stream update

When you update to any version from 4.11 and later, you must manually acknowledge that the update can continue.

Before you acknowledge the update, verify that there are no Kubernetes APIs in use that are removed in the version you are updating to. For example, in OpenShift Container Platform 4.17, there are no API removals. See "Kubernetes API removals" for more information.
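
You can check whether any APIs that are scheduled for removal are still in use by reviewing the APIRequestCount resources on the cluster. The following command is one way to do this; it assumes that the status.removedInRelease field is populated for APIs that are planned for removal:

    $ oc get apirequestcounts -o jsonpath='{range .items[?(@.status.removedInRelease!="")]}{.status.removedInRelease}{"\t"}{.metadata.name}{"\n"}{end}'

Migrate any API that this command reports before you acknowledge the update.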

Procedure
  1. Run the following command:

    $ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-<update_version_from>-kube-<kube_api_version>-api-removals-in-<update_version_to>":"true"}}' --type=merge

    where:

    <update_version_from>
        Is the cluster version you are moving from, for example, 4.14.
    <kube_api_version>
        Is the Kubernetes API version, for example, 1.28.
    <update_version_to>
        Is the cluster version you are moving to, for example, 4.15.
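
    For example, when updating from version 4.14 (Kubernetes 1.28) to version 4.15, the command with the placeholders filled in is:

    $ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.14-kube-1.28-api-removals-in-4.15":"true"}}' --type=merge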

Verification
  • Verify that the update was acknowledged by running the following command:

    $ oc get configmap admin-acks -n openshift-config -o json | jq .data
    Example output
    {
      "ack-4.14-kube-1.28-api-removals-in-4.15": "true",
      "ack-4.15-kube-1.29-api-removals-in-4.16": "true"
    }

    In this example, the cluster is updated from version 4.14 to 4.15, and then from 4.15 to 4.16 in a Control Plane Only update.


Starting the cluster update

When updating from one y-stream release to the next, you must ensure that the intermediate z-stream releases are also compatible.

You can verify that you are updating to a viable release by running the oc adm upgrade command, which lists the compatible update releases.
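
For example, running the command with no arguments before starting the update prints the current version and the recommended update targets for the configured channel. The following output is abbreviated and illustrative; the channel, versions, and image digests on your cluster will differ:

    $ oc adm upgrade
    Cluster version is 4.14.34

    Recommended updates:

      VERSION     IMAGE
      4.15.33     quay.io/openshift-release-dev/ocp-release@sha256:...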

Procedure
  1. Start the update:

    $ oc adm upgrade --to=4.15.33
    • Control Plane Only update: Make sure that you point to the interim <y+1> release path.

    • Y-stream update: Make sure that you use the correct <y.z> release that follows the Kubernetes version skew policy.

    • Z-stream update: Verify that there are no problems moving to that specific release.

    Example output
    Requested update to 4.15.33 (1)
    
    1 The Requested update value changes depending on your particular update.

Monitoring the cluster update

You should check the cluster health frequently during the update. Check the status of the nodes and cluster Operators, and check for failed pods.

Procedure
  • Monitor the cluster update. For example, to monitor the cluster update from version 4.14 to 4.15, run the following command:

    $ watch "oc get clusterversion; echo; oc get co | head -1; oc get co | grep 4.14; oc get co | grep 4.15; echo; oc get no; echo; oc get po -A | grep -E -iv 'running|complete'"
    Example output
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.34   True        True          4m6s    Working towards 4.15.33: 111 of 873 done (12% complete), waiting on kube-apiserver
    
    NAME                           VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                 4.14.34   True        False         False      4d22h
    baremetal                      4.14.34   True        False         False      4d23h
    cloud-controller-manager       4.14.34   True        False         False      4d23h
    cloud-credential               4.14.34   True        False         False      4d23h
    cluster-autoscaler             4.14.34   True        False         False      4d23h
    console                        4.14.34   True        False         False      4d22h
    
    ...
    
    storage                        4.14.34   True        False         False      4d23h
    config-operator                4.15.33   True        False         False      4d23h
    etcd                           4.15.33   True        False         False      4d23h
    
    NAME           STATUS   ROLES                  AGE     VERSION
    ctrl-plane-0   Ready    control-plane,master   4d23h   v1.27.15+6147456
    ctrl-plane-1   Ready    control-plane,master   4d23h   v1.27.15+6147456
    ctrl-plane-2   Ready    control-plane,master   4d23h   v1.27.15+6147456
    worker-0       Ready    mcp-1,worker           4d23h   v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           4d23h   v1.27.15+6147456
    
    NAMESPACE               NAME                       READY   STATUS              RESTARTS   AGE
    openshift-marketplace   redhat-marketplace-rf86t   0/1     ContainerCreating   0          0s
Verification

During the update, the watch command cycles through one or several cluster Operators at a time, providing the status of each Operator update in the MESSAGE column.

When the cluster Operator update process is complete, each control plane node is rebooted, one at a time.

During this part of the update, messages are reported stating that cluster Operators are being updated again or are in a degraded state. This is because the control plane node is offline while it reboots.

As soon as the last control plane node reboot is complete, the cluster version is displayed as updated.

When the control plane update is complete, a message such as the following is displayed. This example shows an update completed to the intermediate y-stream release.

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.33   True        False         28m     Cluster version is 4.15.33

NAME                         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication               4.15.33   True        False         False      5d
baremetal                    4.15.33   True        False         False      5d
cloud-controller-manager     4.15.33   True        False         False      5d1h
cloud-credential             4.15.33   True        False         False      5d1h
cluster-autoscaler           4.15.33   True        False         False      5d
config-operator              4.15.33   True        False         False      5d
console                      4.15.33   True        False         False      5d

...

service-ca                   4.15.33   True        False         False      5d
storage                      4.15.33   True        False         False      5d

NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   5d    v1.28.13+2ca1a23
ctrl-plane-1   Ready    control-plane,master   5d    v1.28.13+2ca1a23
ctrl-plane-2   Ready    control-plane,master   5d    v1.28.13+2ca1a23
worker-0       Ready    mcp-1,worker           5d    v1.28.13+2ca1a23
worker-1       Ready    mcp-2,worker           5d    v1.28.13+2ca1a23

Updating the OLM Operators

In telco environments, software needs to be vetted before it is loaded onto a production cluster. Production clusters are also configured in a disconnected network, which means that they are not always directly connected to the internet. Because the clusters are in a disconnected network, the OpenShift Operators are configured for manual update during installation so that new versions can be managed on a cluster-by-cluster basis. Perform the following procedure to move the Operators to the newer versions.
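
You can optionally confirm that the Operator subscriptions are set to manual approval, for example by listing each Subscription resource together with its installPlanApproval setting:

    $ oc get subscriptions.operators.coreos.com -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.installPlanApproval}{"\n"}{end}'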

Procedure
  1. Check to see which Operators need to be updated:

    $ oc get installplan -A | grep -E 'APPROVED|false'
    Example output
    NAMESPACE           NAME            CSV                                               APPROVAL   APPROVED
    metallb-system      install-nwjnh   metallb-operator.v4.16.0-202409202304             Manual     false
    openshift-nmstate   install-5r7wr   kubernetes-nmstate-operator.4.16.0-202409251605   Manual     false
  2. Patch the InstallPlan resources for those Operators:

    $ oc patch installplan -n metallb-system install-nwjnh --type merge --patch \
    '{"spec":{"approved":true}}'
    Example output
    installplan.operators.coreos.com/install-nwjnh patched
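
    Repeat the patch for each InstallPlan reported in the previous step. For example, the openshift-nmstate install plan from the earlier output is approved in the same way:

    $ oc patch installplan -n openshift-nmstate install-5r7wr --type merge --patch \
    '{"spec":{"approved":true}}'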
  3. Monitor the namespace by running the following command:

    $ oc get all -n metallb-system
    Example output
    NAME                                                       READY   STATUS              RESTARTS   AGE
    pod/metallb-operator-controller-manager-69b5f884c-8bp22    0/1     ContainerCreating   0          4s
    pod/metallb-operator-controller-manager-77895bdb46-bqjdx   1/1     Running             0          4m1s
    pod/metallb-operator-webhook-server-5d9b968896-vnbhk       0/1     ContainerCreating   0          4s
    pod/metallb-operator-webhook-server-d76f9c6c8-57r4w        1/1     Running             0          4m1s
    
    ...
    
    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/metallb-operator-controller-manager-69b5f884c    1         1         0       4s
    replicaset.apps/metallb-operator-controller-manager-77895bdb46   1         1         1       4m1s
    replicaset.apps/metallb-operator-controller-manager-99b76f88     0         0         0       4m40s
    replicaset.apps/metallb-operator-webhook-server-5d9b968896       1         1         0       4s
    replicaset.apps/metallb-operator-webhook-server-6f7dbfdb88       0         0         0       4m40s
    replicaset.apps/metallb-operator-webhook-server-d76f9c6c8        1         1         1       4m1s

    When the update is complete, the required pods should be in a Running state, and the required ReplicaSet resources should be ready:

    NAME                                                      READY   STATUS    RESTARTS   AGE
    pod/metallb-operator-controller-manager-69b5f884c-8bp22   1/1     Running   0          25s
    pod/metallb-operator-webhook-server-5d9b968896-vnbhk      1/1     Running   0          25s
    
    ...
    
    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/metallb-operator-controller-manager-69b5f884c    1         1         1       25s
    replicaset.apps/metallb-operator-controller-manager-77895bdb46   0         0         0       4m22s
    replicaset.apps/metallb-operator-webhook-server-5d9b968896       1         1         1       25s
    replicaset.apps/metallb-operator-webhook-server-d76f9c6c8        0         0         0       4m22s
Verification
  • Verify that the Operators do not need to be updated for a second time:

    $ oc get installplan -A | grep -E 'APPROVED|false'

    There should be no output returned.

    Sometimes you have to approve an update twice because some Operators have interim z-stream release versions that need to be installed before the final version.
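
    If a second install plan appears, approve it in the same way as the first. You can also confirm which Operator version is now installed, for example by listing the cluster service versions in the Operator namespace:

    $ oc get csv -n metallb-system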


Updating the worker nodes

After you have updated the control plane, you upgrade the worker nodes by unpausing the relevant mcp groups that you created. Unpausing an mcp group starts the upgrade process for the worker nodes in that group. Each worker node in the cluster reboots to upgrade to the new EUS, y-stream, or z-stream version as required.

In the case of Control Plane Only upgrades, note that when a worker node is updated, it requires only one reboot and jumps <y+2> release versions. This feature was added to decrease the amount of time that it takes to upgrade large bare-metal clusters.

This is a potential holding point. You can have a cluster version that is fully supported to run in production, with the control plane updated to a new EUS release while the worker nodes remain at a <y-2> release. This allows large clusters to upgrade in steps across several maintenance windows.

Procedure
  1. You can check how many nodes are managed in an mcp group. Run the following command to get the list of mcp groups:

    $ oc get mcp
    Example output
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-c9a52144456dbff9c9af9c5a37d1b614   True      False      False      3              3                   3                     0                      36d
    mcp-1    rendered-mcp-1-07fe50b9ad51fae43ed212e84e1dcc8e    False     False      False      1              0                   0                     0                      47h
    mcp-2    rendered-mcp-2-07fe50b9ad51fae43ed212e84e1dcc8e    False     False      False      1              0                   0                     0                      47h
    worker   rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0   True      False      False      0              0                   0                     0                      36d

    You decide how many mcp groups to upgrade at a time. This depends on how many CNF pods can be taken down at a time and how your pod disruption budget and anti-affinity settings are configured.
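
    For example, you can optionally review the pod disruption budgets that are configured across the cluster before deciding how many groups to unpause:

    $ oc get pdb -A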

  2. Get the list of nodes in the cluster:

    $ oc get nodes
    Example output
    NAME           STATUS   ROLES                  AGE    VERSION
    ctrl-plane-0   Ready    control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-1   Ready    control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-2   Ready    control-plane,master   5d8h   v1.29.8+f10c92d
    worker-0       Ready    mcp-1,worker           5d8h   v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           5d8h   v1.27.15+6147456
  3. Confirm the MachineConfigPool groups that are paused:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   true
    mcp-2   true

    Each MachineConfigPool can be unpaused independently. Therefore, if a maintenance window runs out of time, other MCPs do not need to be unpaused immediately. The cluster is supported to run with some worker nodes still at the <y-2> release version.

  4. Unpause the required mcp group to begin the upgrade:

    $ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'
    Example output
    machineconfigpool.machineconfiguration.openshift.io/mcp-1 patched
  5. Confirm that the required mcp group is unpaused:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   false
    mcp-2   true
  6. As each mcp group is upgraded, continue to unpause and upgrade the remaining nodes. Monitor the node status by running the following command. An example of unpausing the next group is shown after the output.

    $ oc get nodes
    Example output
    NAME           STATUS                        ROLES                  AGE    VERSION
    ctrl-plane-0   Ready                         control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-1   Ready                         control-plane,master   5d8h   v1.29.8+f10c92d
    ctrl-plane-2   Ready                         control-plane,master   5d8h   v1.29.8+f10c92d
    worker-0       Ready                         mcp-1,worker           5d8h   v1.29.8+f10c92d
    worker-1       NotReady,SchedulingDisabled   mcp-2,worker           5d8h   v1.27.15+6147456
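
    When the nodes in the unpaused group have finished upgrading, unpause the next group by using the same patch as before. For example, the following command unpauses the mcp-2 group:

    $ oc patch mcp/mcp-2 --type merge --patch '{"spec":{"paused":false}}'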

Verifying the health of the newly updated cluster

Run the following commands after updating the cluster to verify that the cluster is back up and running.

Procedure
  1. Check the cluster version by running the following command:

    $ oc get clusterversion
    Example output
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.16.14   True        False         4h38m   Cluster version is 4.16.14

    This should return the new cluster version, and the PROGRESSING column should show False.

  2. Check that all nodes are ready:

    $ oc get nodes
    Example output
    NAME           STATUS   ROLES                  AGE    VERSION
    ctrl-plane-0   Ready    control-plane,master   5d9h   v1.29.8+f10c92d
    ctrl-plane-1   Ready    control-plane,master   5d9h   v1.29.8+f10c92d
    ctrl-plane-2   Ready    control-plane,master   5d9h   v1.29.8+f10c92d
    worker-0       Ready    mcp-1,worker           5d9h   v1.29.8+f10c92d
    worker-1       Ready    mcp-2,worker           5d9h   v1.29.8+f10c92d

    All nodes in the cluster should be in a Ready status and running the same version.

  3. Check that there are no paused mcp resources in the cluster:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   false
    mcp-2   false
  4. Check that all cluster Operators are available:

    $ oc get co
    Example output
    NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                             4.16.14   True        False         False      5d9h
    baremetal                                  4.16.14   True        False         False      5d9h
    cloud-controller-manager                   4.16.14   True        False         False      5d10h
    cloud-credential                           4.16.14   True        False         False      5d10h
    cluster-autoscaler                         4.16.14   True        False         False      5d9h
    config-operator                            4.16.14   True        False         False      5d9h
    console                                    4.16.14   True        False         False      5d9h
    control-plane-machine-set                  4.16.14   True        False         False      5d9h
    csi-snapshot-controller                    4.16.14   True        False         False      5d9h
    dns                                        4.16.14   True        False         False      5d9h
    etcd                                       4.16.14   True        False         False      5d9h
    image-registry                             4.16.14   True        False         False      85m
    ingress                                    4.16.14   True        False         False      5d9h
    insights                                   4.16.14   True        False         False      5d9h
    kube-apiserver                             4.16.14   True        False         False      5d9h
    kube-controller-manager                    4.16.14   True        False         False      5d9h
    kube-scheduler                             4.16.14   True        False         False      5d9h
    kube-storage-version-migrator              4.16.14   True        False         False      4h48m
    machine-api                                4.16.14   True        False         False      5d9h
    machine-approver                           4.16.14   True        False         False      5d9h
    machine-config                             4.16.14   True        False         False      5d9h
    marketplace                                4.16.14   True        False         False      5d9h
    monitoring                                 4.16.14   True        False         False      5d9h
    network                                    4.16.14   True        False         False      5d9h
    node-tuning                                4.16.14   True        False         False      5d7h
    openshift-apiserver                        4.16.14   True        False         False      5d9h
    openshift-controller-manager               4.16.14   True        False         False      5d9h
    openshift-samples                          4.16.14   True        False         False      5h24m
    operator-lifecycle-manager                 4.16.14   True        False         False      5d9h
    operator-lifecycle-manager-catalog         4.16.14   True        False         False      5d9h
    operator-lifecycle-manager-packageserver   4.16.14   True        False         False      5d9h
    service-ca                                 4.16.14   True        False         False      5d9h
    storage                                    4.16.14   True        False         False      5d9h

    All cluster Operators should report True in the AVAILABLE column.

  5. Check that all pods are healthy:

    $ oc get po -A | grep -E -iv 'complete|running'

    This should not return any pods.

    You might see a few pods that are still transitioning after the update. Continue to watch for a while to make sure all pods are cleared.
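
    For example, you can keep the check running until it no longer returns any pods by reusing the same filter with the watch command:

    $ watch "oc get po -A | grep -E -iv 'running|complete'"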