This is a cache of https://docs.okd.io/4.14/updating/preparing_for_updates/updating-cluster-prepare.html. It is a snapshot of the page at 2024-11-23T22:21:57.633+0000.
Preparing to update to OKD 4.14 - Preparing to update a cluster | Updating clusters | OKD 4.14
×

Fedora 9.2 micro-architecture requirement change

OKD is now based on the Fedora 9.2 host operating system. The micro-architecture requirements are now increased to x86_64-v2, Power9, and Z14. See the RHEL micro-architecture requirements documentation. You can verify compatibility before updating by following the procedures outlined in this KCS article.

Without the correct micro-architecture requirements, the update process will fail. Make sure you purchase the appropriate subscription for each architecture. For more information, see Get Started with Red Hat Enterprise Linux - additional architectures

Kubernetes API deprecations and removals

OKD 4.14 uses Kubernetes 1.27, which removed several deprecated APIs.

A cluster administrator must provide a manual acknowledgment before the cluster can be updated from OKD 4.13 to 4.14. This is to help prevent issues after upgrading to OKD 4.14, where APIs that have been removed are still in use by workloads, tools, or other components running on or interacting with the cluster. Administrators must evaluate their cluster for any APIs in use that will be removed and migrate the affected components to use the appropriate new API version. After this evaluation and migration is complete, the administrator can provide the acknowledgment.

Before you can update your OKD 4.13 cluster to 4.14, you must provide the administrator acknowledgment.

Removed Kubernetes APIs

OKD 4.14 uses Kubernetes 1.27, which removed the following deprecated APIs. You must migrate manifests and API clients to use the appropriate API version. For more information about migrating removed APIs, see the Kubernetes documentation.

Table 1. APIs removed from Kubernetes 1.27
Resource Removed API Migrate to

CSIStorageCapacity

storage.k8s.io/v1beta1

storage.k8s.io/v1

Evaluating your cluster for removed APIs

There are several methods to help administrators identify where APIs that will be removed are in use. However, OKD cannot identify all instances, especially workloads that are idle or external tools that are used. It is the responsibility of the administrator to properly evaluate all workloads and other integrations for instances of removed APIs.

Reviewing alerts to identify uses of removed APIs

Two alerts fire when an API is in use that will be removed in the next release:

  • APIRemovedInNextReleaseInUse - for APIs that will be removed in the next OKD release.

  • APIRemovedInNextEUSReleaseInUse - for APIs that will be removed in the next OKD Extended Update Support (EUS) release.

If either of these alerts are firing in your cluster, review the alerts and take action to clear the alerts by migrating manifests and API clients to use the new API version.

Use the APIRequestCount API to get more information about which APIs are in use and which workloads are using removed APIs, because the alerts do not provide this information. Additionally, some APIs might not trigger these alerts but are still captured by APIRequestCount. The alerts are tuned to be less sensitive to avoid alerting fatigue in production systems.

Using APIRequestCount to identify uses of removed APIs

You can use the APIRequestCount API to track API requests and review whether any of them are using one of the removed APIs.

Prerequisites
  • You must have access to the cluster as a user with the cluster-admin role.

Procedure
  • Run the following command and examine the REMOVEDINRELEASE column of the output to identify the removed APIs that are currently in use:

    $ oc get apirequestcounts
    Example output
    NAME                                                                 REMOVEDINRELEASE   REQUESTSINCURRENTHOUR   REQUESTSINLAST24H
    ...
    csistoragecapacities.v1.storage.k8s.io                                                  14                      380
    csistoragecapacities.v1beta1.storage.k8s.io                          1.27               0                       16
    custompolicydefinitions.v1beta1.capabilities.3scale.net                                 8                       158
    customresourcedefinitions.v1.apiextensions.k8s.io                                       1407                    30148
    ...

    You can safely ignore the following entries that appear in the results:

    • The system:serviceaccount:kube-system:generic-garbage-collector and the system:serviceaccount:kube-system:namespace-controller users might appear in the results because these services invoke all registered APIs when searching for resources to remove.

    • The system:kube-controller-manager and system:cluster-policy-controller users might appear in the results because they walk through all resources while enforcing various policies.

    You can also use -o jsonpath to filter the results:

    $ oc get apirequestcounts -o jsonpath='{range .items[?(@.status.removedInRelease!="")]}{.status.removedInRelease}{"\t"}{.metadata.name}{"\n"}{end}'
    Example output
    1.27	csistoragecapacities.v1beta1.storage.k8s.io
    1.29	flowschemas.v1beta2.flowcontrol.apiserver.k8s.io
    1.29	prioritylevelconfigurations.v1beta2.flowcontrol.apiserver.k8s.io

Using APIRequestCount to identify which workloads are using the removed APIs

You can examine the APIRequestCount resource for a given API version to help identify which workloads are using the API.

Prerequisites
  • You must have access to the cluster as a user with the cluster-admin role.

Procedure
  • Run the following command and examine the username and userAgent fields to help identify the workloads that are using the API:

    $ oc get apirequestcounts <resource>.<version>.<group> -o yaml

    For example:

    $ oc get apirequestcounts csistoragecapacities.v1beta1.storage.k8s.io -o yaml

    You can also use -o jsonpath to extract the username and userAgent values from an APIRequestCount resource:

    $ oc get apirequestcounts csistoragecapacities.v1beta1.storage.k8s.io \
      -o jsonpath='{range .status.currentHour..byUser[*]}{..byVerb[*].verb}{","}{.username}{","}{.userAgent}{"\n"}{end}' \
      | sort -k 2 -t, -u | column -t -s, -NVERBS,USERNAME,USERAGENT
    Example output
    VERBS       USERNAME                        USERAGENT
    list watch  system:kube-controller-manager  cluster-policy-controller/v0.0.0
    list watch  system:kube-controller-manager  kube-controller-manager/v1.26.5+0abcdef
    list watch  system:kube-scheduler           kube-scheduler/v1.26.5+0abcdef

Migrating instances of removed APIs

For information about how to migrate removed Kubernetes APIs, see the Deprecated API Migration Guide in the Kubernetes documentation.

Providing the administrator acknowledgment

After you have evaluated your cluster for any removed APIs and have migrated any removed APIs, you can acknowledge that your cluster is ready to upgrade from OKD 4.13 to 4.14.

Be aware that all responsibility falls on the administrator to ensure that all uses of removed APIs have been resolved and migrated as necessary before providing this administrator acknowledgment. OKD can assist with the evaluation, but cannot identify all possible uses of removed APIs, especially idle workloads or external tools.

Prerequisites
  • You must have access to the cluster as a user with the cluster-admin role.

Procedure
  • Run the following command to acknowledge that you have completed the evaluation and your cluster is ready for the Kubernetes API removals in OKD 4.14:

    $ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-1.27-api-removals-in-4.14":"true"}}' --type=merge

Assessing the risk of conditional updates

A conditional update is an update target that is available but not recommended due to a known risk that applies to your cluster. The Cluster Version Operator (CVO) periodically queries the OpenShift Update Service (OSUS) for the most recent data about update recommendations, and some potential update targets might have risks associated with them.

The CVO evaluates the conditional risks, and if the risks are not applicable to the cluster, then the target version is available as a recommended update path for the cluster. If the risk is determined to be applicable, or if for some reason CVO cannot evaluate the risk, then the update target is available to the cluster as a conditional update.

When you encounter a conditional update while you are trying to update to a target version, you must assess the risk of updating your cluster to that version. Generally, if you do not have a specific need to update to that target version, it is best to wait for a recommended update path from Red Hat.

However, if you have a strong reason to update to that version, for example, if you need to fix an important CVE, then the benefit of fixing the CVE might outweigh the risk of the update being problematic for your cluster. You can complete the following tasks to determine whether you agree with the Red Hat assessment of the update risk:

  • Complete extensive testing in a non-production environment to the extent that you are comfortable completing the update in your production environment.

  • Follow the links provided in the conditional update description, investigate the bug, and determine if it is likely to cause issues for your cluster. If you need help understanding the risk, contact Red Hat Support.

Additional resources

etcd backups before cluster updates

etcd backups record the state of your cluster and all of its resource objects. You can use backups to attempt restoring the state of a cluster in disaster scenarios where you cannot recover a cluster in its currently dysfunctional state.

In the context of updates, you can attempt an etcd restoration of the cluster if an update introduced catastrophic conditions that cannot be fixed without reverting to the previous cluster version. etcd restorations might be destructive and destabilizing to a running cluster, use them only as a last resort.

Due to their high consequences, etcd restorations are not intended to be used as a rollback solution. Rolling your cluster back to a previous version is not supported. If your update is failing to complete, contact Red Hat support.

There are several factors that affect the viability of an etcd restoration. For more information, see "Backing up etcd data" and "Restoring to a previous cluster state".

Best practices for cluster updates

OKD provides a robust update experience that minimizes workload disruptions during an update. Updates will not begin unless the cluster is in an upgradeable state at the time of the update request.

This design enforces some key conditions before initiating an update, but there are a number of actions you can take to increase your chances of a successful cluster update.

The OpenShift Update Service (OSUS) provides update recommendations based on cluster characteristics such as the cluster’s subscribed channel. The Cluster Version Operator saves these recommendations as either recommended or conditional updates. While it is possible to attempt an update to a version that is not recommended by OSUS, following a recommended update path protects users from encountering known issues or unintended consequences on the cluster.

Choose only update targets that are recommended by OSUS to ensure a successful update.

Address all critical alerts on the cluster

Critical alerts must always be addressed as soon as possible, but it is especially important to address these alerts and resolve any problems before initiating a cluster update. Failing to address critical alerts before beginning an update can cause problematic conditions for the cluster.

In the Administrator perspective of the web console, navigate to ObserveAlerting to find critical alerts.

Ensure that duplicated encoding headers are removed

Before updating, you will receive a DuplicateTransferEncodingHeadersDetected alert if any route records a duplicate Transfer-Encoding header issue. This is due to the upgrade from HAProxy 2.2 in previous OKD releases to HAProxy 2.6 in OKD 4.14. Failing to address this alert will result in applications that send multiple Transfer-Encoding headers becoming unreachable through routes.

To mitigate this issue, update any problematic applications to no longer send multiple Transfer-Encoding headers. For example, this could require removing duplicated headers in your application configuration file.

For more information, see this Red Hat Knowledgebase article.

Ensure that the cluster is in an Upgradable state

When one or more Operators have not reported their Upgradeable condition as True for more than an hour, the ClusterNotUpgradeable warning alert is triggered in the cluster. In most cases this alert does not block patch updates, but you cannot perform a minor version update until you resolve this alert and all Operators report Upgradeable as True.

For more information about the Upgradeable condition, see "Understanding cluster Operator condition types" in the additional resources section.

Ensure that enough spare nodes are available

A cluster should not be running with little to no spare node capacity, especially when initiating a cluster update. Nodes that are not running and available may limit a cluster’s ability to perform an update with minimal disruption to cluster workloads.

Depending on the configured value of the cluster’s maxUnavailable spec, the cluster might not be able to apply machine configuration changes to nodes if there is an unavailable node. Additionally, if compute nodes do not have enough spare capacity, workloads might not be able to temporarily shift to another node while the first node is taken offline for an update.

Make sure that you have enough available nodes in each worker pool, as well as enough spare capacity on your compute nodes, to increase the chance of successful node updates.

The default setting for maxUnavailable is 1 for all the machine config pools in OKD. It is recommended to not change this value and update one control plane node at a time. Do not change this value to 3 for the control plane pool.

Ensure that the cluster’s PodDisruptionBudget is properly configured

You can use the PodDisruptionBudget object to define the minimum number or percentage of pod replicas that must be available at any given time. This configuration protects workloads from disruptions during maintenance tasks such as cluster updates.

However, it is possible to configure the PodDisruptionBudget for a given topology in a way that prevents nodes from being drained and updated during a cluster update.

When planning a cluster update, check the configuration of the PodDisruptionBudget object for the following factors:

  • For highly available workloads, make sure there are replicas that can be temporarily taken offline without being prohibited by the PodDisruptionBudget.

  • For workloads that aren’t highly available, make sure they are either not protected by a PodDisruptionBudget or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.