This is a cache of https://docs.okd.io/latest/edge_computing/day_2_core_cnf_clusters/updating/telco-update-before-the-update.html. It is a snapshot of the page at 2024-11-19T19:20:09.232+0000.
Before you update the cluster - Day 2 operations for telco core CNF clusters | Edge computing | OKD 4
×

Before you start the cluster update, you must pause worker nodes, back up the etcd database, and do a final cluster health check before proceeding.

Pausing worker nodes before the update

You must pause the worker nodes before you proceed with the update. In the following example, there are 2 mcp groups, mcp-1 and mcp-2. You patch the spec.paused field to true for each of these MachineConfigPool groups.

Procedure
  1. Patch the mcp CRs to pause the nodes and drain and remove the pods from those nodes by running the following command:

    $ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":true}}'
    $ oc patch mcp/mcp-2 --type merge --patch '{"spec":{"paused":true}}'
  2. Get the status of the paused mcp groups:

    $ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    Example output
    MCP     Paused
    ---     ------
    master  false
    mcp-1   true
    mcp-2   true

The default control plane and worker mcp groups are not changed during an update.

Backup the etcd database before you proceed with the update

You must backup the etcd database before you proceed with the update.

Backing up etcd data

Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.

Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have checked whether the cluster-wide proxy is enabled.

    You can check whether the proxy is enabled by reviewing the output of oc get proxy cluster -o yaml. The proxy is enabled if the httpProxy, httpsProxy, and noProxy fields have values set.

Procedure
  1. Start a debug session as root for a control plane node:

    $ oc debug --as-root node/<node_name>
  2. Change your root directory to /host in the debug shell:

    sh-4.4# chroot /host
  3. If the cluster-wide proxy is enabled, export the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables by running the following commands:

    $ export HTTP_PROXY=http://<your_proxy.example.com>:8080
    $ export HTTPS_PROXY=https://<your_proxy.example.com>:8080
    $ export NO_PROXY=<example.com>
  4. Run the cluster-backup.sh script in the debug shell and pass in the location to save the backup to.

    The cluster-backup.sh script is maintained as a component of the etcd Cluster Operator and is a wrapper around the etcdctl snapshot save command.

    sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
    Example script output
    found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-6
    found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7
    found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
    found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
    ede95fe6b88b87ba86a03c15e669fb4aa5bf0991c180d3c6895ce72eaade54a1
    etcdctl version: 3.4.14
    API version: 3.4
    {"level":"info","ts":1624647639.0188997,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db.part"}
    {"level":"info","ts":"2021-06-25T19:00:39.030Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
    {"level":"info","ts":1624647639.0301006,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.0.5:2379"}
    {"level":"info","ts":"2021-06-25T19:00:40.215Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
    {"level":"info","ts":1624647640.6032252,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.0.5:2379","size":"114 MB","took":1.584090459}
    {"level":"info","ts":1624647640.6047094,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db"}
    Snapshot saved at /home/core/assets/backup/snapshot_2021-06-25_190035.db
    {"hash":3866667823,"revision":31407,"totalKey":12828,"totalSize":114446336}
    snapshot db and kube resources are successfully saved to /home/core/assets/backup

    In this example, two files are created in the /home/core/assets/backup/ directory on the control plane host:

    • snapshot_<datetimestamp>.db: This file is the etcd snapshot. The cluster-backup.sh script confirms its validity.

    • static_kuberesources_<datetimestamp>.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.

      If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required to restore from the etcd snapshot.

      Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted.

Creating a single etcd backup

Follow these steps to create a single etcd backup by creating and applying a custom resource (CR).

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have access to the OpenShift CLI (oc).

Procedure
  • If dynamically-provisioned storage is available, complete the following steps to create a single automated etcd backup:

    1. Create a persistent volume claim (PVC) named etcd-backup-pvc.yaml with contents such as the following example:

      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: etcd-backup-pvc
        namespace: openshift-etcd
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi (1)
        volumeMode: Filesystem
      1 The amount of storage available to the PVC. Adjust this value for your requirements.
    2. Apply the PVC by running the following command:

      $ oc apply -f etcd-backup-pvc.yaml
    3. Verify the creation of the PVC by running the following command:

      $ oc get pvc
      Example output
      NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      etcd-backup-pvc   Bound                                                       51s

      Dynamic PVCs stay in the Pending state until they are mounted.

    4. Create a CR file named etcd-single-backup.yaml with contents such as the following example:

      apiVersion: operator.openshift.io/v1alpha1
      kind: etcdBackup
      metadata:
        name: etcd-single-backup
        namespace: openshift-etcd
      spec:
        pvcName: etcd-backup-pvc (1)
      1 The name of the PVC to save the backup to. Adjust this value according to your environment.
    5. Apply the CR to start a single backup:

      $ oc apply -f etcd-single-backup.yaml
  • If dynamically-provisioned storage is not available, complete the following steps to create a single automated etcd backup:

    1. Create a StorageClass CR file named etcd-backup-local-storage.yaml with the following contents:

      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: etcd-backup-local-storage
      provisioner: kubernetes.io/no-provisioner
      volumeBindingMode: Immediate
    2. Apply the StorageClass CR by running the following command:

      $ oc apply -f etcd-backup-local-storage.yaml
    3. Create a PV named etcd-backup-pv-fs.yaml with contents such as the following example:

      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: etcd-backup-pv-fs
      spec:
        capacity:
          storage: 100Gi (1)
        volumeMode: Filesystem
        accessModes:
        - ReadWriteOnce
        persistentVolumeReclaimPolicy: Retain
        storageClassName: etcd-backup-local-storage
        local:
          path: /mnt
        nodeAffinity:
          required:
            nodeSelectorTerms:
            - matchExpressions:
            - key: kubernetes.io/hostname
               operator: In
               values:
               - <example_master_node> (2)
      1 The amount of storage available to the PV. Adjust this value for your requirements.
      2 Replace this value with the node to attach this PV to.
    4. Verify the creation of the PV by running the following command:

      $ oc get pv
      Example output
      NAME                    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS                REASON   AGE
      etcd-backup-pv-fs       100Gi      RWO            Retain           Available           etcd-backup-local-storage            10s
    5. Create a PVC named etcd-backup-pvc.yaml with contents such as the following example:

      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: etcd-backup-pvc
        namespace: openshift-etcd
      spec:
        accessModes:
        - ReadWriteOnce
        volumeMode: Filesystem
        resources:
          requests:
            storage: 10Gi (1)
      1 The amount of storage available to the PVC. Adjust this value for your requirements.
    6. Apply the PVC by running the following command:

      $ oc apply -f etcd-backup-pvc.yaml
    7. Create a CR file named etcd-single-backup.yaml with contents such as the following example:

      apiVersion: operator.openshift.io/v1alpha1
      kind: etcdBackup
      metadata:
        name: etcd-single-backup
        namespace: openshift-etcd
      spec:
        pvcName: etcd-backup-pvc (1)
      1 The name of the persistent volume claim (PVC) to save the backup to. Adjust this value according to your environment.
    8. Apply the CR to start a single backup:

      $ oc apply -f etcd-single-backup.yaml
Additional resources

Checking the cluster health

You should check the cluster health often during the update. Check for the node status, cluster Operators status and failed pods.

Procedure
  1. Check the status of the cluster Operators by running the following command:

    $ oc get co
    Example output
    NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                             4.14.34   True        False         False      4d22h
    baremetal                                  4.14.34   True        False         False      4d22h
    cloud-controller-manager                   4.14.34   True        False         False      4d23h
    cloud-credential                           4.14.34   True        False         False      4d23h
    cluster-autoscaler                         4.14.34   True        False         False      4d22h
    config-operator                            4.14.34   True        False         False      4d22h
    console                                    4.14.34   True        False         False      4d22h
    ...
    service-ca                                 4.14.34   True        False         False      4d22h
    storage                                    4.14.34   True        False         False      4d22h
  2. Check the status of the cluster nodes:

    $ oc get nodes
    Example output
    NAME           STATUS   ROLES                  AGE     VERSION
    ctrl-plane-0   Ready    control-plane,master   4d22h   v1.27.15+6147456
    ctrl-plane-1   Ready    control-plane,master   4d22h   v1.27.15+6147456
    ctrl-plane-2   Ready    control-plane,master   4d22h   v1.27.15+6147456
    worker-0       Ready    mcp-1,worker           4d22h   v1.27.15+6147456
    worker-1       Ready    mcp-2,worker           4d22h   v1.27.15+6147456
  3. Check that there are no in-progress or failed pods. There should be no pods returned when you run the following command.

    $ oc get po -A | grep -E -iv 'running|complete'