Control plane architecture | Architecture | Red Hat OpenShift Service on AWS

Machine roles in Red Hat OpenShift Service on AWS
- Cluster workers
- Cluster control planes
Operators in Red Hat OpenShift Service on AWS
- Add-on Operators
Overview of etcd
- Benefits of using etcd
- How etcd works
Automatic updates to the control plane configuration
Recovery of failed control plane machines

The control plane, which is composed of control plane machines, manages the Red Hat OpenShift Service on AWS cluster. The control plane machines manage workloads on the compute machines, which are also known as worker machines. The cluster itself manages all upgrades to the machines by the actions of the Cluster Version Operator (CVO), and a set of individual Operators.

This control plane architecture is not supported on Red Hat OpenShift Service on AWS (ROSA) with hosted control planes (HCP).

Machine roles in Red Hat OpenShift Service on AWS

Red Hat OpenShift Service on AWS assigns hosts different roles. These roles define the function of the machine within the cluster. The cluster contains definitions for the standard master and worker role types.

edit

Cluster workers

In a Kubernetes cluster, worker nodes run and manage the actual workloads requested by Kubernetes users. The worker nodes advertise their capacity and the scheduler, which is a control plane service, determines on which nodes to start pods and containers. The following important services run on each worker node:

CRI-O, which is the container engine.
kubelet, which is the service that accepts and fulfills requests for running and stopping container workloads.
A service proxy, which manages communication for pods across workers.
The runC or crun low-level container runtime, which creates and runs containers.

For information about how to enable crun instead of the default runC, see the documentation for creating a ContainerRuntimeConfig CR.

In Red Hat OpenShift Service on AWS, compute machine sets control the compute machines, which are assigned the worker machine role. Machines with the worker role drive compute workloads that are governed by a specific machine pool that autoscales them. Because Red Hat OpenShift Service on AWS has the capacity to support multiple machine types, the machines with the worker role are classed as compute machines. In this release, the terms worker machine and compute machine are used interchangeably because the only default type of compute machine is the worker machine. In future versions of Red Hat OpenShift Service on AWS, different types of compute machines, such as infrastructure machines, might be used by default.

Compute machine sets are groupings of compute machine resources under the machine-api namespace. Compute machine sets are configurations that are designed to start new compute machines on a specific cloud provider. Conversely, machine config pools (MCPs) are part of the Machine Config Operator (MCO) namespace. An MCP is used to group machines together so the MCO can manage their configurations and facilitate their upgrades.

Cluster control planes

In a Kubernetes cluster, the master nodes run services that are required to control the Kubernetes cluster. In Red Hat OpenShift Service on AWS, the control plane is comprised of control plane machines that have a master machine role. They contain more than just the Kubernetes services for managing the Red Hat OpenShift Service on AWS cluster.

For most Red Hat OpenShift Service on AWS clusters, control plane machines are defined by a series of standalone machine API resources. Control planes are managed with control plane machine sets. Extra controls apply to control plane machines to prevent you from deleting all of the control plane machines and breaking your cluster.

Single availability zone clusters and multiple availability zone clusters require a minimum of three control plane nodes.

Services that fall under the Kubernetes category on the control plane include the Kubernetes API server, etcd, the Kubernetes controller manager, and the Kubernetes scheduler.

Table 1. Kubernetes services that run on the control plane
Component	Description
Kubernetes API server	The Kubernetes API server validates and configures the data for pods, services, and replication controllers. It also provides a focal point for the shared state of the cluster.
etcd	etcd stores the persistent control plane state while other components watch etcd for changes to bring themselves into the specified state.
Kubernetes controller manager	The Kubernetes controller manager watches etcd for changes to objects such as replication, namespace, and service account controller objects, and then uses the API to enforce the specified state. Several such processes create a cluster with one active leader at a time.
Kubernetes scheduler	The Kubernetes scheduler watches for newly created pods without an assigned node and selects the best node to host the pod.

There are also OpenShift services that run on the control plane, which include the OpenShift API server, OpenShift controller manager, OpenShift OAuth API server, and OpenShift OAuth server.

Table 2. OpenShift services that run on the control plane
Component	Description
OpenShift API server	The OpenShift API server validates and configures the data for OpenShift resources, such as projects, routes, and templates. The OpenShift API server is managed by the OpenShift API Server Operator.
OpenShift controller manager	The OpenShift controller manager watches etcd for changes to OpenShift objects, such as project, route, and template controller objects, and then uses the API to enforce the specified state. The OpenShift controller manager is managed by the OpenShift Controller Manager Operator.
OpenShift OAuth API server	The OpenShift OAuth API server validates and configures the data to authenticate to Red Hat OpenShift Service on AWS, such as users, groups, and OAuth tokens. The OpenShift OAuth API server is managed by the Cluster Authentication Operator.
OpenShift OAuth server	Users request tokens from the OpenShift OAuth server to authenticate themselves to the API. The OpenShift OAuth server is managed by the Cluster Authentication Operator.

Some of these services on the control plane machines run as systemd services, while others run as static pods.

Systemd services are appropriate for services that you need to always come up on that particular system shortly after it starts. For control plane machines, those include sshd, which allows remote login. It also includes services such as:

The CRI-O container engine (crio), which runs and manages the containers. Red Hat OpenShift Service on AWS uses CRI-O instead of the Docker Container Engine.
Kubelet (kubelet), which accepts requests for managing containers on the machine from control plane services.

CRI-O and Kubelet must run directly on the host as systemd services because they need to be running before you can run other containers.

The installer-* and revision-pruner-* control plane pods must run with root permissions because they write to the /etc/kubernetes directory, which is owned by the root user. These pods are in the following namespaces:

openshift-etcd
openshift-kube-apiserver
openshift-kube-controller-manager
openshift-kube-scheduler

edit

Operators in Red Hat OpenShift Service on AWS

Operators are among the most important components of Red Hat OpenShift Service on AWS. They are the preferred method of packaging, deploying, and managing services on the control plane. They can also provide advantages to applications that users run.

Operators integrate with Kubernetes APIs and CLI tools such as kubectl and the OpenShift CLI (oc). They provide the means of monitoring applications, performing health checks, managing over-the-air (OTA) updates, and ensuring that applications remain in your specified state.

Operators also offer a more granular configuration experience. You configure each component by modifying the API that the Operator exposes instead of modifying a global configuration file.

Because CRI-O and the Kubelet run on every node, almost every other cluster function can be managed on the control plane by using Operators. Components that are added to the control plane by using Operators include critical networking and credential services.

While both follow similar Operator concepts and goals, Operators in Red Hat OpenShift Service on AWS are managed by two different systems, depending on their purpose:

Cluster Operators: Managed by the Cluster Version Operator (CVO) and installed by default to perform cluster functions.
Optional add-on Operators: Managed by Operator Lifecycle Manager (OLM) and can be made accessible for users to run in their applications. Also known as OLM-based Operators.

edit

Add-on Operators

Operator Lifecycle Manager (OLM) and OperatorHub are default components in Red Hat OpenShift Service on AWS that help manage Kubernetes-native applications as Operators. Together they provide the system for discovering, installing, and managing the optional add-on Operators available on the cluster.

Using OperatorHub in the Red Hat OpenShift Service on AWS web console, administrators with the dedicated-admin role and authorized users can select Operators to install from catalogs of Operators. After installing an Operator from OperatorHub, it can be made available globally or in specific namespaces to run in user applications.

Default catalog sources are available that include Red Hat Operators, certified Operators, and community Operators. Administrators with the dedicated-admin role can also add their own custom catalog sources, which can contain a custom set of Operators.

All Operators listed in the Operator Hub marketplace should be available for installation. These Operators are considered customer workloads, and are not monitored by Red Hat Site Reliability Engineering (SRE).

Developers can use the Operator SDK to help author custom Operators that take advantage of OLM features, as well. Their Operator can then be bundled and added to a custom catalog source, which can be added to a cluster and made available to users.

OLM does not manage the cluster Operators that comprise the Red Hat OpenShift Service on AWS architecture.

Additional resources

For more details on running add-on Operators in Red Hat OpenShift Service on AWS, see the Operators guide sections on Operator Lifecycle Manager (OLM) and OperatorHub.
For more details on the Operator SDK, see Developing Operators.

Overview of etcd

etcd is a consistent, distributed key-value store that holds small amounts of data that can fit entirely in memory. Although etcd is a core component of many projects, it is the primary data store for Kubernetes, which is the standard system for container orchestration.

edit

Benefits of using etcd

By using etcd, you can benefit in several ways:

Maintain consistent uptime for your cloud-native applications, and keep them working even if individual servers fail
Store and replicate all cluster states for Kubernetes
Distribute configuration data to provide redundancy and resiliency for the configuration of nodes

How etcd works

To ensure a reliable approach to cluster configuration and management, etcd uses the etcd Operator. The Operator simplifies the use of etcd on a Kubernetes container platform like Red Hat OpenShift Service on AWS. With the etcd Operator, you can create or delete etcd members, resize clusters, perform backups, and upgrade etcd.

The etcd Operator observes, analyzes, and acts:

It observes the cluster state by using the Kubernetes API.
It analyzes differences between the current state and the state that you want.
It fixes the differences through the etcd cluster management APIs, the Kubernetes API, or both.

etcd holds the cluster state, which is constantly updated. This state is continuously persisted, which leads to a high number of small changes at high frequency. As a result, Red Hat Site Reliability Engineering (SRE) backs the etcd cluster member with fast, low-latency I/O.

edit

Automatic updates to the control plane configuration

On Red Hat OpenShift Service on AWS clusters, control plane machine sets automatically propagate changes to your control plane configuration. When a control plane machine needs to be replaced, the Control Plane Machine Set Operator creates a replacement machine based on the configuration specified by the ControlPlaneMachineSet custom resource (CR). When the new control plane machine is ready, the Operator safely drains and terminates the old control plane machine in a way that mitigates any potential negative effects on cluster API or workload availability.

You cannot request that control plane machine replacements happen only during maintenance windows. The Control Plane Machine Set Operator acts to ensure cluster stability. Waiting for a maintenance window could result in cluster stability being compromised.

A control plane machine can be marked for replacement at any time, typically because the machine has fallen out of spec or entered an unhealthy state. Such replacements are a normal part of a cluster’s lifecycle and not a cause for concern. SRE will be alerted to the issue automatically if any part of a control plane node replacement fails.

Depending on when the Red Hat OpenShift Service on AWS cluster was originally created, the introduction of control plane machine sets might leave one or two control plane nodes with labels or machine names that are inconsistent with the other control plane nodes. For example clustername-master-0, clustername-master-1,and clustername-master-2-abcxyz. Such naming inconsistencies do not affect the workings of the cluster and are not a cause for concern.

edit

Recovery of failed control plane machines

The Control Plane Machine Set Operator automates the recovery of control plane machines. When a control plane machine is deleted, the Operator creates a replacement with the configuration that is specified in the ControlPlaneMachineSet custom resource (CR).