Managing live CNF pods during the cluster update

Follow the guidance in Red Hat best practices for Kubernetes when developing cloud-native network functions (CNFs) to ensure that the cluster can schedule pods during an update.

Always deploy pods in groups by using Deployment resources. Deployment resources spread the workload across all of the available pods, ensuring that there is no single point of failure. When a pod that is managed by a Deployment resource is deleted, a new pod takes its place automatically.
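
For example, the following Deployment is a minimal sketch of this pattern; the cnf-example name, namespace, label, and image are placeholders for your own workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cnf-example            # placeholder name
  namespace: cnf-example-ns    # placeholder namespace
spec:
  replicas: 4                  # the Deployment automatically replaces any pod that is deleted
  selector:
    matchLabels:
      app: cnf-example
  template:
    metadata:
      labels:
        app: cnf-example
    spec:
      containers:
      - name: cnf-example
        image: registry.example.com/cnf-example:latest   # placeholder image
```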

Ensuring that CNF workloads run uninterrupted with pod disruption budgets

You can configure the minimum number of pods in a deployment that must remain available, so that the CNF workload runs uninterrupted, by setting a pod disruption budget in a PodDisruptionBudget custom resource (CR) that you apply. Be careful when setting this value; setting it improperly can cause an update to fail.

For example, if you have 4 pods in a deployment and you set the pod disruption budget to 4, the cluster scheduler keeps 4 pods running at all times. No pods can be evicted, so the worker nodes that run them cannot be drained and the update fails.

Instead, set the pod disruption budget to 2, which allows 2 of the 4 pods to be down at the same time. Then, the worker nodes where those pods are located can be drained and rebooted.

Setting the pod disruption budget to 2 does not mean that your deployment runs on only 2 pods for a period of time, for example, during an update. The cluster scheduler creates 2 new pods to replace the 2 older pods. However, there is a short period of time between the new pods coming online and the old pods being deleted.
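
The following PodDisruptionBudget CR is a minimal sketch of this example, with minAvailable set to 2 so that 2 of the 4 pods can be disrupted at the same time; the name, namespace, and app: cnf-example selector are placeholders that must match your own deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cnf-example-pdb        # placeholder name
  namespace: cnf-example-ns    # placeholder namespace
spec:
  minAvailable: 2              # at least 2 of the 4 pods must stay available; the other 2 can be evicted during a node drain
  selector:
    matchLabels:
      app: cnf-example         # placeholder label that matches the pods in the deployment
```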

Ensuring that pods do not run on the same cluster node

High availability in Kubernetes requires duplicate processes to be running on separate nodes in the cluster. This ensures that the application continues to run even if one node becomes unavailable. In OpenShift Container Platform, processes can be automatically duplicated in separate pods in a deployment. You configure anti-affinity in the Pod spec to ensure that the pods in a deployment do not run on the same cluster node.

During an update, setting pod anti-affinity ensures that pods are distributed evenly across nodes in the cluster, so that nodes can be drained and rebooted more quickly. For example, if there are 4 pods from a single deployment on one node, and the pod disruption budget allows only 1 pod to be deleted at a time, it takes 4 times as long for that node to reboot. Setting pod anti-affinity spreads the pods across the cluster to prevent such occurrences.
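
The following fragment of a Deployment pod template (spec.template.spec) is a minimal sketch of a required anti-affinity rule; the app: cnf-example label is a placeholder and must match the labels on the pods in the deployment:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: cnf-example                  # placeholder label
      topologyKey: kubernetes.io/hostname   # schedule at most one matching pod per node
```

If you use preferredDuringSchedulingIgnoredDuringExecution instead of the required rule, the scheduler spreads the pods when possible but can still co-locate them when no other node is available.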

Application liveness, readiness, and startup probes

You can use liveness, readiness, and startup probes to check the health of your live application containers before you schedule an update. These probes are especially useful with pods that depend on keeping state for their application containers.

Liveness probe

Determines if a container is running. If the liveness probe fails for a container, the pod responds based on the restart policy.

Readiness probe

Determines if a container is ready to accept service requests. If the readiness probe fails for a container, the kubelet removes the container from the list of available service endpoints.

Startup probe

A startup probe indicates whether the application within a container is started. All other probes are disabled until the startup probe succeeds. If the startup probe does not succeed, the kubelet kills the container, and the container is subject to the pod restartPolicy setting.
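
The following container spec fragment is a minimal sketch showing all three probes together; the /healthz and /ready HTTP endpoints and port 8080 are placeholders for whatever health endpoints your application exposes:

```yaml
containers:
- name: cnf-example
  image: registry.example.com/cnf-example:latest   # placeholder image
  startupProbe:              # liveness and readiness probes are disabled until this probe succeeds
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30     # allow up to 30 x 10s = 300s for the application to start
    periodSeconds: 10
  livenessProbe:             # on failure, the container is restarted according to the pod restartPolicy
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
  readinessProbe:            # on failure, the pod is removed from the list of available service endpoints
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 10
```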
