Typically, telco clusters run on bare-metal hardware. Often you must update the firmware to apply important security fixes, take on new functionality, or maintain compatibility with the new release of OKD.
You are responsible for the firmware versions that you run in your clusters. Updating host firmware is not a part of the OKD update process, and updating firmware in conjunction with the OKD version is not recommended.
Hardware vendors advise that it is best to apply the latest certified firmware version for the specific hardware that you are running. For telco use cases, always verify firmware updates in test environments before applying them in production. The high throughput nature of telco CNF workloads can be adversely affected by sub-optimal host firmware. Thoroughly test new firmware updates to ensure that they work as expected with the current version of OKD. Ideally, you test the latest firmware version with the target OKD update version.
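If your hosts are managed through the Bare Metal Operator, one way to record firmware levels before and after an update is to read them from the BareMetalHost resources. The following is only a sketch; it assumes that the BIOS firmware version is populated under status.hardware.firmware in your environment:
$ oc get bmh -n openshift-machine-api \
    -o custom-columns=NAME:.metadata.name,BIOS_VERSION:.status.hardware.firmware.bios.version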
Verify that all layered products run on the version of OKD that you are updating to before you begin the update. This generally includes all Operators.
Verify the currently installed Operators in the cluster. For example, run the following command:
$ oc get csv -A
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
gitlab-operator-kubernetes.v0.17.2 GitLab 0.17.2 gitlab-operator-kubernetes.v0.17.1 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.19.0 Succeeded
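To quickly spot Operators that are not healthy, you can filter out cluster service versions (CSVs) that are in the Succeeded phase. This is an optional convenience, not a required step:
$ oc get csv -A --no-headers | grep -vi succeeded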
Check that Operators that you install with OLM are compatible with the update version.
Operators that are installed with the Operator Lifecycle Manager (OLM) are not part of the standard cluster Operators set.
Use the Operator Update Information Checker to understand if you must update an Operator after each y-stream update or if you can wait until you have fully updated to the next EUS release.
You can also use the Operator Update Information Checker to see what versions of OKD are compatible with specific releases of an Operator.
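To see which update channel each OLM-installed Operator follows, you can list the Subscription resources in the cluster. The following is a minimal sketch; the chosen columns are only an example:
$ oc get subscriptions -A \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CHANNEL:.spec.channel,SOURCE:.spec.source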
Check that Operators that you install outside of OLM are compatible with the update version.
For Operators that are installed outside of OLM and are not directly supported by Red Hat, contact the Operator vendor to ensure release compatibility.
Some Operators are compatible with several releases of OKD. You might not need to update the Operators until after you complete the cluster update. See "Updating the worker nodes" for more information.
See "Updating all the OLM Operators" for information about updating an Operator after performing the first y-stream control plane update.
Prepare MachineConfigPool (mcp) node labels to group nodes together in groups of roughly 8 to 10 nodes.
With mcp groups, you can reboot groups of nodes independently from the rest of the cluster.
You use the mcp node labels to pause and unpause the set of nodes during the update process so that you can do the update and reboot at a time of your choosing.
Sometimes there are problems during the update.
Often the problem is related to hardware failure or nodes needing to be reset.
Using mcp node labels, you can update nodes in stages by pausing the update at critical moments, tracking paused and unpaused nodes as you proceed.
When a problem occurs, you use the nodes that are in an unpaused state to ensure that there are enough nodes running to keep all application pods running.
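Pausing and unpausing is controlled through the paused field in the MachineConfigPool spec. The following is a minimal sketch, assuming a pool named mcp-1 like the example groups created later in this procedure:
# Pause the mcp-1 group so that its nodes do not update or reboot
$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":true}}'
# Unpause the mcp-1 group when you are ready for its nodes to update
$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'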
How you divide worker nodes into mcp groups can vary depending on how many nodes are in the cluster or how many nodes you assign to a node role.
By default, the 2 roles in a cluster are control plane and worker.
In clusters that run telco workloads, you can further split the worker nodes between CNF control plane and CNF data plane roles.
Add mcp role labels that split the worker nodes into each of these two groups.
Larger clusters can have as many as 100 worker nodes in the CNF control plane role.
No matter how many nodes there are in the cluster, keep each mcp group to roughly 8 to 10 nodes.
Consider a cluster with 15 worker nodes:
10 worker nodes are CNF control plane nodes.
5 worker nodes are CNF data plane nodes.
Split the CNF control plane and data plane worker node roles into at least 2 mcp groups each.
Having 2 mcp groups per role means that you can have one set of nodes that are not affected by the update.
Consider a cluster with 6 worker nodes:
Split the worker nodes into 3 mcp groups of 2 nodes each.
Upgrade one of the mcp groups.
Allow the updated nodes to run for a day to allow for verification of CNF compatibility before completing the update on the other 4 nodes.
The process and pace at which you unpause the mcp groups is determined by your CNF applications and configuration.
If your CNF pod can handle being scheduled across nodes in a cluster, you can unpause several mcp groups at a time.
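If your workloads tolerate more disruption, you can also control how many nodes within a group update at the same time through the maxUnavailable field in the MachineConfigPool spec. The value below is only an example; set it based on your application requirements:
$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"maxUnavailable":1}}'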
Review the currently configured MachineConfigPool roles in the cluster.
Get the currently configured mcp groups in the cluster:
$ oc get mcp
NAME     CONFIG                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-bere83   True      False      False      3              3                   3                     0                      25d
worker   rendered-worker-245c4f   True      False      False      2              2                   2                     0                      25d
Compare the list of mcp roles to the list of nodes in the cluster:
$ oc get nodes
NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   39d   v1.27.15+6147456
ctrl-plane-1   Ready    control-plane,master   39d   v1.27.15+6147456
ctrl-plane-2   Ready    control-plane,master   39d   v1.27.15+6147456
worker-0       Ready    worker                 39d   v1.27.15+6147456
worker-1       Ready    worker                 39d   v1.27.15+6147456
When you apply an mcp role label to a node, the new role is reflected in the ROLES column for that node.
Determine how you want to separate the worker nodes into mcp groups.
Creating mcp groups is a 2-step process:
Add an mcp label to the nodes in the cluster.
Apply an mcp CR to the cluster that organizes the nodes based on their labels.
Label the nodes so that they can be put into mcp groups.
Run the following commands:
$ oc label node worker-0 node-role.kubernetes.io/mcp-1=
$ oc label node worker-1 node-role.kubernetes.io/mcp-2=
The mcp-1 and mcp-2 labels are applied to the nodes.
For example:
NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   39d   v1.27.15+6147456
ctrl-plane-1   Ready    control-plane,master   39d   v1.27.15+6147456
ctrl-plane-2   Ready    control-plane,master   39d   v1.27.15+6147456
worker-0       Ready    mcp-1,worker           39d   v1.27.15+6147456
worker-1       Ready    mcp-2,worker           39d   v1.27.15+6147456
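As an optional quick check, you can list only the nodes that carry one of the new labels, for example:
$ oc get nodes -l node-role.kubernetes.io/mcp-1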
Create YAML custom resources (CRs) that apply the labels as mcp CRs in the cluster.
Save the following YAML in the mcps.yaml file:
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-2
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,mcp-2]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-2: ""
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-1
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,mcp-1]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-1: ""
Create the MachineConfigPool resources:
$ oc apply -f mcps.yaml
machineconfigpool.machineconfiguration.openshift.io/mcp-2 created
Monitor the MachineConfigPool resources as they are applied in the cluster.
After you apply the mcp resources, the nodes are added into the new machine config pools.
This takes a few minutes.
The nodes do not reboot while being added into the mcp groups.
Check the status of the new mcp resources:
$ oc get mcp
NAME     CONFIG                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-be3e83   True      False      False      3              3                   3                     0                      25d
mcp-1    rendered-mcp-1-2f4c4f    False     True       True       1              0                   0                     0                      10s
mcp-2    rendered-mcp-2-2r4s1f    False     True       True       1              0                   0                     0                      10s
worker   rendered-worker-23fc4f   False     True       True       0              0                   0                     2                      25d
Eventually, the resources are fully applied:
NAME     CONFIG                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-be3e83   True      False      False      3              3                   3                     0                      25d
mcp-1    rendered-mcp-1-2f4c4f    True      False      False      1              1                   1                     0                      7m33s
mcp-2    rendered-mcp-2-2r4s1f    True      False      False      1              1                   1                     0                      51s
worker   rendered-worker-23fc4f   True      False      False      0              0                   0                     0                      25d
In telco environments, most clusters are in disconnected networks. To update clusters in these environments, you must update your offline image repository.
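The exact mirroring workflow depends on the tooling you use for your mirror registry. As a minimal sketch only, the following mirrors a target release into a local registry with oc adm release mirror; the registry host, repository, and release tag are placeholders that you must replace with your own values:
# Replace <target-release-tag> and the mirror registry details with your own values
$ oc adm release mirror \
    --from=quay.io/openshift/okd:<target-release-tag> \
    --to=mirror.example.com:5000/okd/release \
    --to-release-image=mirror.example.com:5000/okd/release:<target-release-tag>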
Before you update the cluster, perform some basic checks and verifications to make sure that the cluster is ready for the update.
Verify that there are no failed or in progress pods in the cluster by running the following command:
$ oc get pods -A | grep -E -vi 'complete|running'
You might have to run this command more than once if there are pods that are in a pending state.
Verify that all nodes in the cluster are available:
$ oc get nodes
NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   32d   v1.27.15+6147456
ctrl-plane-1   Ready    control-plane,master   32d   v1.27.15+6147456
ctrl-plane-2   Ready    control-plane,master   32d   v1.27.15+6147456
worker-0       Ready    mcp-1,worker           32d   v1.27.15+6147456
worker-1       Ready    mcp-2,worker           32d   v1.27.15+6147456
Verify that all bare-metal nodes are provisioned and ready:
$ oc get bmh -n openshift-machine-api
NAME           STATE         CONSUMER                   ONLINE   ERROR   AGE
ctrl-plane-0   unmanaged     cnf-58879-master-0         true             33d
ctrl-plane-1   unmanaged     cnf-58879-master-1         true             33d
ctrl-plane-2   unmanaged     cnf-58879-master-2         true             33d
worker-0       unmanaged     cnf-58879-worker-0-45879   true             33d
worker-1       progressing   cnf-58879-worker-0-dszsh   false            1d (1)
(1) An error occurred while provisioning the worker-1 node.
Verify that all cluster Operators are ready:
$ oc get co
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.14.34   True        False         False      17h
baremetal        4.14.34   True        False         False      32d
...
service-ca       4.14.34   True        False         False      32d
storage          4.14.34   True        False         False      32d
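As a convenience, you can filter the output to show only cluster Operators that are not Available, are still Progressing, or are Degraded. This is only a sketch; the awk column positions assume the default oc get co output format:
$ oc get co --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"'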