Updating a cluster using the CLI - Performing a cluster update | Updating clusters

You can perform minor version and patch updates on an OKD cluster by using the OpenShift CLI (oc).

Prerequisites

Have access to the cluster as a user with admin privileges. See Using RBAC to define and apply permissions.
Have a recent etcd backup in case your update fails and you must restore your cluster to a previous state.
Have a recent Container Storage Interface (CSI) volume snapshot in case you need to restore persistent volumes due to a pod failure.
Your Fedora7 workers are replaced with Fedora8 or FCOS workers. Red Hat does not support in-place Fedora7 to Fedora8 updates for Fedora workers; those hosts must be replaced with a clean operating system install.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See Updating installed Operators for more information on how to check compatibility and, if necessary, update the installed Operators.
Ensure that all machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
If your cluster uses manually maintained credentials, update the cloud provider resources for the new release. For more information, including how to determine if this is a requirement for your cluster, see Preparing to update a cluster with manually maintained credentials.
Ensure that you address all Upgradeable=False conditions so the cluster allows an update to the next minor version. An alert displays at the top of the Cluster Settings page when you have one or more cluster Operators that cannot be updated. You can still update to the next available patch update for the minor release you are currently on.
If you run an Operator or have configured any application with a pod disruption budget, you might experience an interruption during the update process. If minAvailable is set to 1 in a PodDisruptionBudget, draining the nodes to apply pending machine configs can stall on pod eviction. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget can prevent the node drain.
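
Two of the prerequisites above, paused machine config pools and Upgradeable=False conditions, can be checked from the command line. The following is a minimal sketch rather than an official procedure. The function names are hypothetical, and it assumes that the MachineConfigPool `spec.paused` field and the ClusterOperator `Upgradeable` status condition are populated on your cluster.

```shell
# Hypothetical pre-update checks; the function names are illustrative only.

# Print the names of machine config pools whose spec.paused field is true.
paused_pools() {
  oc get machineconfigpool \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.paused}{"\n"}{end}' |
    awk -F'\t' '$2 == "true" { print $1 }'
}

# Print the names of cluster Operators that report Upgradeable=False.
non_upgradeable_operators() {
  oc get clusteroperators \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Upgradeable")].status}{"\n"}{end}' |
    awk -F'\t' '$2 == "False" { print $1 }'
}
```

If both functions print nothing, neither check found a blocker. The other prerequisites still need to be reviewed manually.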

When an update is failing to complete, the Cluster Version Operator (CVO) reports the status of any blocking components while attempting to reconcile the update. Rolling your cluster back to a previous version is not supported. If your update is failing to complete, contact Red Hat support.
Using the unsupportedConfigOverrides section to modify the configuration of an Operator is unsupported and might block cluster updates. You must remove this setting before you can update your cluster.

Additional resources

Support policy for unmanaged Operators

Pausing a MachineHealthCheck resource

During the update process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck resources before updating the cluster.

Prerequisites

Install the OpenShift CLI (oc).

Procedure

To list all the available MachineHealthCheck resources that you want to pause, run the following command:
```
$ oc get machinehealthcheck -n openshift-machine-api
```

To pause the machine health checks, add the cluster.x-k8s.io/paused="" annotation to the MachineHealthCheck resource. Run the following command:

```
$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused=""
```

The annotated MachineHealthCheck resource resembles the following YAML file:

```
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
  annotations:
    cluster.x-k8s.io/paused: ""
spec:
  selector:
    matchLabels:
      role: worker
  unhealthyConditions:
  - type:    "Ready"
    status:  "Unknown"
    timeout: "300s"
  - type:    "Ready"
    status:  "False"
    timeout: "300s"
  maxUnhealthy: "40%"
status:
  currentHealthy: 5
  expectedMachines: 5
```

Resume the machine health checks after updating the cluster. To resume the check, remove the pause annotation from the MachineHealthCheck resource by running the following command:

```
$ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused-
```
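
When a cluster has several MachineHealthCheck resources, the pause and resume commands can be applied to all of them in a loop. The following is a sketch under that assumption. The helper names and the two variables are hypothetical, and `--overwrite` is added only so that the pause can be re-run without error.

```shell
# Hypothetical helpers: apply the pause annotation to, or remove it from,
# every MachineHealthCheck in the openshift-machine-api namespace.
MHC_NAMESPACE="openshift-machine-api"
PAUSE_ANNOTATION="cluster.x-k8s.io/paused"

# Add the pause annotation (with an empty value) to each resource.
pause_all_mhc() {
  oc get machinehealthcheck -n "$MHC_NAMESPACE" -o name |
    while read -r mhc; do
      oc -n "$MHC_NAMESPACE" annotate --overwrite "$mhc" "${PAUSE_ANNOTATION}=" </dev/null
    done
}

# Remove the pause annotation from each resource after the update.
resume_all_mhc() {
  oc get machinehealthcheck -n "$MHC_NAMESPACE" -o name |
    while read -r mhc; do
      oc -n "$MHC_NAMESPACE" annotate "$mhc" "${PAUSE_ANNOTATION}-" </dev/null
    done
}
```

Run `pause_all_mhc` before requesting the update and `resume_all_mhc` after the update completes.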

About updating single node OKD

You can update, or upgrade, a single-node OKD cluster by using either the console or CLI.

However, note the following limitations:

The prerequisite to pause the MachineHealthCheck resources is not required because there is no other node to perform the health check.
Restoring a single-node OKD cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your update fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.
Updating a single-node OKD cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:
- If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.
- If the update payload contains machine configuration changes that do not require a reboot, the downtime is shorter and the impact on cluster management and user workloads is reduced. In this case, the node draining step is skipped with single-node OKD because there is no other node in the cluster to reschedule the workloads to.
- If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.

Some conditions, such as bugs in an updated package, can prevent the single node from restarting after a reboot. In this case, the update does not roll back automatically.

Additional resources

For information on which machine configuration changes require a reboot, see the note in About the Machine Config Operator.

Updating a cluster by using the CLI

You can use the OpenShift CLI (oc) to review and request cluster updates.

You can find information about available OKD advisories and updates in the errata section of the Customer Portal.

Prerequisites

Install the version of the OpenShift CLI (oc) that matches the version you are updating to.
Log in to the cluster as a user with cluster-admin privileges.
Pause all MachineHealthCheck resources.

Procedure

View the available updates and note the version number of the update that you want to apply:

```
$ oc adm upgrade
```

Example output

```
Cluster version is 4.13.0-0.okd-2023-10-28-065448

Upstream: https://amd64.origin.releases.ci.openshift.org/graph
Channel: stable-4

Recommended updates:

  VERSION                        IMAGE
  4.14.0-0.okd-2024-01-06-084517 registry.ci.openshift.org/origin/release@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227
  4.14.0-0.okd-2023-11-14-101924 registry.ci.openshift.org/origin/release@sha256:72d40c51e7c4d1b9c31e9b0d276d045f1b2b93def5ecee49186df856d40bcb5c
  4.14.0-0.okd-2023-11-12-042703 registry.ci.openshift.org/origin/release@sha256:2242d1df4e4cbcc0cd27191ab9ad5f55ac4f0c60c3cda2a186181a2435e3bd00
  4.14.0-0.okd-2023-10-28-073550 registry.ci.openshift.org/origin/release@sha256:7a6200e347a1b857e47f2ab0735eb1303af7d796a847d79ef9706f217cd12f5c
```

If there are no recommended updates, updates that have known issues might still be available. See Updating along a conditional update path for more information.

Apply an update:

To update to the latest version:
```
$ oc adm upgrade --to-latest=true
```

To update to a specific version:

```
$ oc adm upgrade --to=<version> (1)
```

1	`<version>` is the update version that you obtained from the output of the `oc adm upgrade` command.
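
To guard against typing a version that is not among the recommended updates, the request can be wrapped in a check. The following is a minimal sketch with hypothetical function names. It assumes that the recommended updates shown by `oc adm upgrade` correspond to the `status.availableUpdates` field of the ClusterVersion resource.

```shell
# Print one recommended update version per line, read from the
# ClusterVersion status.availableUpdates field.
available_versions() {
  oc get clusterversion version \
    -o jsonpath='{range .status.availableUpdates[*]}{.version}{"\n"}{end}'
}

# Hypothetical guard: request the update only if the version is listed.
update_to() {
  target="$1"
  if available_versions | grep -Fxq -- "$target"; then
    oc adm upgrade --to="$target"
  else
    echo "version ${target} is not in the recommended updates" >&2
    return 1
  fi
}
```

For example, `update_to 4.14.0-0.okd-2024-01-06-084517` requests the update only when that exact version string is listed.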

The output of oc adm upgrade --help lists a --force option. Using this option is heavily discouraged because --force bypasses cluster-side guards, including release verification and precondition checks. Using --force does not guarantee a successful update, and bypassing guards puts the cluster at risk.

Review the status of the Cluster Version Operator:

```
$ oc adm upgrade
```

Example output

```
info: An upgrade is in progress. Working towards 4.14.0-0.okd-2024-01-06-084517: 117 of 864 done (13% complete), waiting on etcd, kube-apiserver

Upstream: https://amd64.origin.releases.ci.openshift.org/graph
Channel: stable-4
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
```

After the update completes, you can confirm that the cluster version has updated to the new version:

```
$ oc adm upgrade
```

Example output

```
Cluster version is 4.14.0-0.okd-2024-01-06-084517

Upstream: https://amd64.origin.releases.ci.openshift.org/graph
Channel: stable-4
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
```
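
Instead of re-running `oc adm upgrade` by hand, the status can be polled. The following is a sketch only: the function name, default interval, and attempt limit are assumptions. It treats a `Progressing` condition of `False` on the ClusterVersion resource as "no update in progress", which is also true before an update starts, so call it only after you have requested the update.

```shell
# Hypothetical helper: poll the ClusterVersion Progressing condition.
# Arguments: poll interval in seconds (default 60), maximum checks (default 120).
wait_for_update() {
  interval="${1:-60}"
  attempts="${2:-120}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status="$(oc get clusterversion version \
      -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')"
    if [ "$status" = "False" ]; then
      # Print the version that the cluster currently targets.
      oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "update still progressing after ${attempts} checks" >&2
  return 1
}
```

Compare the printed version with the version you requested. A non-zero return only means the time limit was reached, not that the update failed.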

If you are updating your cluster to the next minor version, such as version X.y to X.(y+1), confirm that your nodes are updated before you deploy workloads that rely on a new feature:

```
$ oc get nodes
```

Example output

```
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-168-251.ec2.internal   Ready    master   82m   v1.30.3
ip-10-0-170-223.ec2.internal   Ready    master   82m   v1.30.3
ip-10-0-179-95.ec2.internal    Ready    worker   70m   v1.30.3
ip-10-0-182-134.ec2.internal   Ready    worker   70m   v1.30.3
ip-10-0-211-16.ec2.internal    Ready    master   82m   v1.30.3
ip-10-0-250-100.ec2.internal   Ready    worker   69m   v1.30.3
```
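
On larger clusters, scanning the node table by eye is error-prone. The following sketch summarizes it instead. Both function names are hypothetical, and the parsing assumes the default five-column `oc get nodes` layout shown in the example output.

```shell
# Read `oc get nodes` table output on stdin and print each distinct
# value of the VERSION column (column 5), skipping the header row.
kubelet_versions() {
  awk 'NR > 1 { print $5 }' | sort -u
}

# Read the same table on stdin and print the names of nodes whose
# STATUS column is anything other than exactly "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

For example, `oc get nodes | kubelet_versions` prints a single line when every node reports the same version, and `oc get nodes | not_ready_nodes` prints nothing when every node is Ready and schedulable.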

Gathering cluster update status using oc adm upgrade status (Technology Preview)

When updating your cluster, it is useful to understand how your update is progressing. While the oc adm upgrade command returns limited information about the status of your update, this release introduces the oc adm upgrade status command as a Technology Preview feature. This command decouples status information from the oc adm upgrade command and provides specific information regarding a cluster update, including the status of the control plane and worker node updates.

The oc adm upgrade status command is read-only and will never alter any state in your cluster.

The oc adm upgrade status command is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The oc adm upgrade status command can be used for clusters from version 4.12 up to the latest supported release.

While your cluster does not need to be a Technology Preview-enabled cluster, you must set the OC_ENABLE_CMD_UPGRADE_STATUS Technology Preview environment variable. Otherwise, the OpenShift CLI (oc) does not recognize the command and you cannot use the feature.

Procedure

Set the OC_ENABLE_CMD_UPGRADE_STATUS environment variable to true by running the following command:
```
$ export OC_ENABLE_CMD_UPGRADE_STATUS=true
```

Run the oc adm upgrade status command:

```
$ oc adm upgrade status
```

Example output for an update progressing successfully

```
= Control Plane =
Assessment:      Progressing
Target Version:  4.14.1 (from 4.14.0)
Completion:      97%
Duration:        54m
Operator Status: 32 Healthy, 1 Unavailable

Control Plane Nodes
NAME                                        ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-53-40.us-east-2.compute.internal    Progressing   Draining   4.14.0    +10m
ip-10-0-30-217.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?
ip-10-0-92-180.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?

= Worker Upgrade =

= Worker Pool =
Worker Pool:     worker
Assessment:      Progressing
Completion:      0%
Worker Status:   3 Total, 2 Available, 1 Progressing, 3 Outdated, 1 Draining, 0 Excluded, 0 Degraded

Worker Pool Nodes
NAME                                        ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-4-159.us-east-2.compute.internal    Progressing   Draining   4.14.0    +10m
ip-10-0-20-162.us-east-2.compute.internal   Outdated      Pending    4.14.0    ?
ip-10-0-99-40.us-east-2.compute.internal    Outdated      Pending    4.14.0    ?

= Worker Pool =
Worker Pool:     infra
Assessment:      Progressing
Completion:      0%
Worker Status:   1 Total, 0 Available, 1 Progressing, 1 Outdated, 1 Draining, 0 Excluded, 0 Degraded

Worker Pool Node
NAME                                             ASSESSMENT    PHASE      VERSION   EST    MESSAGE
ip-10-0-4-159-infra.us-east-2.compute.internal   Progressing   Draining   4.14.0    +10m

= Update Health =
SINCE   LEVEL   IMPACT   MESSAGE
14m4s   Info    None     Update is proceeding well
```

With this information, you can make informed decisions on how to proceed with your update.
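
If you prefer not to leave the Technology Preview variable exported in your shell session, the two steps of this procedure can be combined. The following sketch uses a hypothetical wrapper function that exports the variable inside a subshell, so the setting applies only to that one invocation.

```shell
# Hypothetical wrapper: enable the Technology Preview variable only for
# the duration of a single `oc adm upgrade status` call. The parentheses
# start a subshell, so the export does not leak into the calling shell.
upgrade_status() {
  (
    export OC_ENABLE_CMD_UPGRADE_STATUS=true
    oc adm upgrade status "$@"
  )
}
```

Any arguments passed to `upgrade_status` are forwarded unchanged to `oc adm upgrade status`.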

Additional resources

Updating along a conditional update path

Updating along a conditional update path

You can update along a recommended conditional update path using the web console or the OpenShift CLI (oc). When a conditional update is not recommended for your cluster, you can update along a conditional update path using the OpenShift CLI (oc) 4.10 or later.

Procedure

To view the description of the update when it is not recommended because a risk might apply, run the following command:
```
$ oc adm upgrade --include-not-recommended
```

If the cluster administrator evaluates the potential known risks and decides that they are acceptable for the current cluster, the administrator can waive the safety guards and proceed with the update by running the following command:
```
$ oc adm upgrade --allow-not-recommended --to <version> (1)
```
1 <version> is the update version that you obtained from the output of the previous command, which is supported but also has known issues or risks.
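
To review the named risks for one candidate version before waiving the guards, the conditional update data can be read from the ClusterVersion resource. The following is a sketch with hypothetical function names. It assumes that your cluster populates the `status.conditionalUpdates` field, with a `release.version` and a list of `risks` for each entry.

```shell
# Print each conditional update as "<version><TAB><risk names>", read from
# the ClusterVersion status.conditionalUpdates field.
conditional_update_risks() {
  oc get clusterversion version \
    -o jsonpath='{range .status.conditionalUpdates[*]}{.release.version}{"\t"}{.risks[*].name}{"\n"}{end}'
}

# Hypothetical helper: print only the risk names for the given version.
risks_for_version() {
  conditional_update_risks | awk -F'\t' -v v="$1" '$1 == v { print $2 }'
}
```

The risk names are identifiers only. Read the full descriptions in the `oc adm upgrade --include-not-recommended` output before you decide whether a risk applies to your cluster.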

Changing the update server by using the CLI

Changing the update server is optional. If you have an OpenShift Update Service (OSUS) installed and configured locally, you must set the URL for the server as the upstream to use the local server during updates. The default value for upstream is https://api.openshift.com/api/upgrades_info/v1/graph.

Procedure

Change the upstream parameter value in the cluster version:

```
$ oc patch clusterversion/version --patch '{"spec":{"upstream":"<update-server-url>"}}' --type=merge
```

The <update-server-url> variable specifies the URL for the update server.

Example output

```
clusterversion.config.openshift.io/version patched
```
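
The patch can be wrapped so that the URL is checked before it is applied and the result is read back afterward. The following is a minimal sketch. Both function names are hypothetical, and the URL check only confirms an http or https scheme, not that the server is reachable.

```shell
# Hypothetical helper: set spec.upstream on the ClusterVersion resource
# after a basic check that the argument looks like an http(s) URL.
set_update_server() {
  url="$1"
  case "$url" in
    http://*|https://*) ;;
    *)
      echo "expected an http(s) URL, got: ${url}" >&2
      return 1
      ;;
  esac
  oc patch clusterversion/version --type=merge \
    --patch "{\"spec\":{\"upstream\":\"${url}\"}}"
}

# Print the upstream value that is currently set in the cluster version.
current_update_server() {
  oc get clusterversion version -o jsonpath='{.spec.upstream}{"\n"}'
}
```

For example, run `set_update_server <update-server-url>` and then `current_update_server` to confirm the change.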