$ oc get machineconfigpool
There are times when you need to make changes to the operating systems running on OpenShift Container Platform nodes. This can include changing settings for network time service, adding kernel arguments, or configuring journaling in a specific way.
Aside from a few specialized features, most changes to operating systems on OpenShift Container Platform nodes can be done by creating what are referred to as MachineConfig
objects that are managed by the Machine Config Operator.
Tasks in this section describe how to use features of the Machine Config Operator to configure operating system features on OpenShift Container Platform nodes.
The Machine Config Operator manages and applies configuration and updates of the base operating system and container runtime, including everything between the kernel and kubelet.
There are four components:
machine-config-server
: Provides Ignition configuration to new machines joining the cluster.
machine-config-controller
: Coordinates the upgrade of machines to the desired configurations defined by a MachineConfig
object. Options are provided to control the upgrade for sets of machines individually.
machine-config-daemon
: Applies new machine configuration during update. Validates and verifies the state of the machine to the requested machine configuration.
machine-config
: Provides a complete source of machine configuration at installation, first start up, and updates for a machine.
The Machine Config Operator (MCO) manages updates to systemd, CRI-O and Kubelet, the kernel, Network Manager and other system features. It also offers a MachineConfig
CRD that can write configuration files onto the host (see machine-config-operator). Understanding what MCO does and how it interacts with other components is critical to making advanced, system-level changes to an OpenShift Container Platform cluster. Here are some things you should know about MCO, machine configs, and how they are used:
A machine config can make a specific change to a file or service on the operating system of each system representing a pool of OpenShift Container Platform nodes.
MCO applies changes to operating systems in pools of machines. All OpenShift Container Platform clusters start with worker and control plane node (also known as the master node) pools. By adding more role labels, you can configure custom pools of nodes. For example, you can set up a custom pool of worker nodes that includes particular hardware features needed by an application. However, examples in this section focus on changes to the default pool types.
Some machine configuration must be in place before OpenShift Container Platform is installed to disk. In most cases, this can be accomplished by creating a machine config that is injected directly into the OpenShift Container Platform installer process, instead of running as a post-installation machine config. In other cases, you might need to do bare metal installation where you pass kernel arguments at OpenShift Container Platform installer startup, to do such things as setting per-node individual IP addresses or advanced disk partitioning.
MCO manages items that are set in machine configs. Manual changes you do to your systems will not be overwritten by MCO, unless MCO is explicitly told to manage a conflicting file. In other words, MCO only makes specific updates you request, it does not claim control over the whole node.
Manual changes to nodes are strongly discouraged. If you need to decommission a node and start a new one, those direct changes would be lost.
MCO is only supported for writing to files in /etc
and /var
directories, although there are symbolic links to some directories that can be writeable by being symbolically linked to one of those areas. The /opt
and /usr/local
directories are examples.
Ignition is the configuration format used in MachineConfigs. See the Ignition Configuration Specification v3.1.0 for details.
Although Ignition config settings can be delivered directly at OpenShift Container Platform installation time, and are formatted in the same way that MCO delivers Ignition configs, MCO has no way of seeing what those original Ignition configs are. Therefore, you should wrap Ignition config settings into a machine config before deploying them.
When a file managed by MCO changes outside of MCO, the Machine Config Daemon (MCD) sets the node as degraded
. It will not overwrite the
offending file, however, and should continue to operate in a degraded
state.
A key reason for using a machine config is that it will be applied when you spin up new nodes for a pool in your OpenShift Container Platform cluster. The machine-api-operator
provisions a new machine and MCO configures it.
MCO uses Ignition as the configuration format. OpenShift Container Platform 4.6 moved from Ignition config specification version 2 to version 3.
The kinds of components that MCO can change include:
config: Create Ignition config objects (see the Ignition configuration specification) to do things like modify files, systemd services, and other features on OpenShift Container Platform machines, including:
Configuration files: Create or overwrite files in the /var
or /etc
directory.
systemd units: Create and set the status of a systemd service or add to an existing systemd service by dropping in additional settings.
users and groups: Change ssh keys in the passwd section post-installation.
kernelArguments: Add arguments to the kernel command line when OpenShift Container Platform nodes boot.
kernelType: Optionally identify a non-standard kernel to use instead of the standard kernel. Use realtime
to use the RT kernel (for RAN). This is only supported on select platforms.
fips: Enable fips mode. fips should be set at installation-time setting and not a post-installation procedure.
The use of fips Validated / Modules in Process cryptographic libraries is only supported on OpenShift Container Platform deployments on the |
extensions: Extend RHCOS features by adding selected pre-packaged software. For this feature (new in OpenShift Container Platform 4.6), available extensions include usbguard and kernel modules.
Custom resources (for ContainerRuntime
and Kubelet
): Outside of machine configs, MCO manages two special custom resources for modifying
CRI-O container runtime settings (ContainerRuntime
CR) and the Kubelet service (Kubelet
CR).
The MCO is not the only Operator that can change operating system components on OpenShift Container Platform nodes. Other Operators can modify operating system-level features as well. One example is the Node Tuning Operator, which allows you to do node-level tuning through Tuned daemon profiles.
Tasks for the MCO configuration that can be done post-installation are included in the following procedures. See descriptions of RHCOS bare metal installation for system configuration tasks that must be done during or before OpenShift Container Platform installation.
See the openshift-machine-config-operator GitHub site for details.
To see the status of the Machine Config Operator (MCO), its sub-components, and the resources it manages, use the following oc
commands:
To see the number of MCO-managed nodes available on your cluster for each machine config pool (MCP), run the following command:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-06c9c4… True False False 3 3 3 0 4h42m
worker rendered-worker-f4b64… False True False 3 2 2 0 4h42m
where:
The True
status indicates that the MCO has applied the current machine config to the nodes in that MCP. The current machine config is specified in the STATUS
field in the oc get mcp
output. The False
status indicates a node in the MCP is updating.
The True
status indicates that the MCO is applying the desired machine config, as specified in the MachineConfigPool
custom resource, to at least one of the nodes in that MCP. The desired machine config is the new, edited machine config. Nodes that are updating might not be available for scheduling. The False
status indicates that all nodes in the MCP are updated.
A True
status indicates the MCO is blocked from applying the current or desired machine config to at least one of the nodes in that MCP, or the configuration is failing. Nodes that are degraded might not be available for scheduling. A False
status indicates that all nodes in the MCP are ready.
Indicates the total number of machines in that MCP.
Indicates the total number of machines in that MCP that are ready for scheduling.
Indicates the total number of machines in that MCP that have the current machine config.
Indicates the total number of machines in that MCP that are marked as degraded or unreconcilable.
In the previous output, there are three control plane (master) nodes and three worker nodes. The control plane MCP and the associated nodes are updated to the current machine config. The nodes in the worker MCP are being updated to the desired machine config. Two of the nodes in the worker MCP are updated and one is still updating, as indicated by the UPDATEDMACHINECOUNT
being 2
. There are no issues, as indicated by the DEGRADEDMACHINECOUNT
being 0
and DEGRADED
being False
.
While the nodes in the MCP are updating, the machine config listed under CONFIG
is the current machine config, which the MCP is being updated from. When the update is complete, the listed machine config is the desired machine config, which the MCP was updated to.
If a node is being cordoned, that node is not included in the Example output
|
To check the status of the nodes in an MCP by examining the MachineConfigPool
custom resource, run the following command:
:
$ oc describe mcp worker
...
Degraded Machine Count: 0
Machine Count: 3
Observed Generation: 2
Ready Machine Count: 3
Unavailable Machine Count: 0
Updated Machine Count: 3
Events: <none>
If a node is being cordoned, the node is not included in the Example output
|
To see each existing MachineConfig
object, run the following command:
$ oc get machineconfigs
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
00-worker 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
01-master-container-runtime 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
01-master-kubelet 2c9371fbb673b97a6fe8b1c52… 3.2.0 5h18m
...
rendered-master-dde... 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
rendered-worker-fde... 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
Note that the MachineConfig
objects listed as rendered
are not meant to be changed or deleted.
To view the contents of a particular machine config (in this case, 01-master-kubelet
), run the following command:
$ oc describe machineconfigs 01-master-kubelet
The output from the command shows that this MachineConfig
object contains both configuration files (cloud.conf
and kubelet.conf
) and a systemd service (Kubernetes Kubelet):
Name: 01-master-kubelet
...
Spec:
Config:
Ignition:
Version: 3.1.0
Storage:
Files:
Contents:
Source: data:,
Mode: 420
Overwrite: true
Path: /etc/kubernetes/cloud.conf
Contents:
Source: data:,kind%3A%20KubeletConfiguration%0AapiVersion%3A%20kubelet.config.k8s.io%2Fv1beta1%0Aauthentication%3A%0A%20%20x509%3A%0A%20%20%20%20clientCAFile%3A%20%2Fetc%2Fkubernetes%2Fkubelet-ca.crt%0A%20%20anonymous...
Mode: 420
Overwrite: true
Path: /etc/kubernetes/kubelet.conf
Systemd:
Units:
Contents: [Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
ExecStart=/usr/bin/hyperkube \
kubelet \
--config=/etc/kubernetes/kubelet.conf \ ...
If something goes wrong with a machine config that you apply, you can always back out that change. For example, if you had run oc create -f ./myconfig.yaml
to apply a machine config, you could remove that machine config by running the following command:
$ oc delete -f ./myconfig.yaml
If that was the only problem, the nodes in the affected pool should return to a non-degraded state. This actually causes the rendered configuration to roll back to its previously rendered state.
If you add your own machine configs to your cluster, you can use the commands shown in the previous example to check their status and the related status of the pool to which they are applied.
You can use the tasks in this section to create MachineConfig
objects that modify files, systemd unit files, and other operating system features running on OpenShift Container Platform nodes. For more ideas on working with machine configs, see content related to updating SSH authorized keys, verifying image signatures, enabling SCTP, and configuring iSCSI initiatornames for OpenShift Container Platform.
OpenShift Container Platform version 4.6 supports Ignition specification version 3.1. All new machine configs you create going forward should be based on Ignition specification version 3.1. If you are upgrading your OpenShift Container Platform cluster, any existing Ignition specification version 2.x machine configs will be translated automatically to specification version 3.1.
Use the following "Configuring chrony time service" procedure as a model for how to go about adding other configuration files to OpenShift Container Platform nodes. |
You
can
set the time server and related settings used by the chrony time service (chronyd
)
by modifying the contents of the chrony.conf
file and passing those contents
to your nodes as a machine config.
Create the contents of the chrony.conf
file and encode it as base64. For example:
$ cat << EOF | base64
pool 0.rhel.pool.ntp.org iburst (1)
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF
1 | Specify any valid, reachable time source, such as the one provided by your DHCP server.
Alternately, you can specify any of the following NTP servers: 1.rhel.pool.ntp.org , 2.rhel.pool.ntp.org , or 3.rhel.pool.ntp.org . |
ICAgIHNlcnZlciBjbG9jay5yZWRoYXQuY29tIGlidXJzdAogICAgZHJpZnRmaWxlIC92YXIvbGli
L2Nocm9ueS9kcmlmdAogICAgbWFrZXN0ZXAgMS4wIDMKICAgIHJ0Y3N5bmMKICAgIGxvZ2RpciAv
dmFyL2xvZy9jaHJvbnkK
Create the MachineConfig
object file, replacing the base64 string with the one you just created.
This example adds the file to master
nodes. You can change it to worker
or make an
additional MachineConfig for the worker
role. Create MachineConfig files for each type of machine that your cluster uses:
$ cat << EOF > ./99-masters-chrony-configuration.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: master
name: 99-masters-chrony-configuration
spec:
config:
ignition:
config: {}
security:
tls: {}
timeouts: {}
version: 3.1.0
networkd: {}
passwd: {}
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,ICAgIHNlcnZlciBjbG9jay5yZWRoYXQuY29tIGlidXJzdAogICAgZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAogICAgbWFrZXN0ZXAgMS4wIDMKICAgIHJ0Y3N5bmMKICAgIGxvZ2RpciAvdmFyL2xvZy9jaHJvbnkK
mode: 420 (1)
overwrite: true
path: /etc/chrony.conf
osImageURL: ""
EOF
1 | Specify an octal value mode for the mode field in the machine config file. After creating the file and applying the changes, the mode is converted to a decimal value. You can check the YAML file with the command oc get mc <mc-name> -o yaml . |
Make a backup copy of the configuration files.
Apply the configurations in one of two ways:
If the cluster is not up yet, after you generate manifest files, add this file to the <installation_directory>/openshift
directory, and then continue to create the cluster.
If the cluster is already running, apply the file:
$ oc apply -f ./99-masters-chrony-configuration.yaml
In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.
Improper use of kernel arguments can result in your systems becoming unbootable. |
Examples of kernel arguments you could set include:
enforcing=0: Configures Security Enhanced Linux (SELinux) to run in permissive mode. In permissive mode, the system acts as if SELinux is enforcing the loaded security policy, including labeling objects and emitting access denial entries in the logs, but it does not actually deny any operations. While not supported for production systems, permissive mode can be helpful for debugging.
nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt
in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.
See Kernel.org kernel parameters for a list and descriptions of kernel arguments.
In the following procedure, you create a MachineConfig
object that identifies:
A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.
Kernel arguments that are appended to the end of the existing kernel arguments.
A label that indicates where in the list of machine configs the change is applied.
Have administrative privilege to a working OpenShift Container Platform cluster.
List existing MachineConfig
objects for your OpenShift Container Platform cluster to determine how to
label your machine config:
$ oc get MachineConfig
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
00-worker 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-master-container-runtime 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-master-kubelet 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-worker-container-runtime 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-worker-kubelet 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
99-master-generated-registries 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
99-master-ssh 3.1.0 77m
99-worker-generated-registries 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
99-worker-ssh 3.1.0 77m
rendered-master-0f314bb55448c47e6776e16e608c5912 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 42m
rendered-master-c7761e6162e6c9538b0cdd7eef567d38 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
Create a MachineConfig
object file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxpermissive.yaml
)
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker(1)
name: 05-worker-kernelarg-selinuxpermissive(2)
spec:
config:
ignition:
version: 3.1.0
kernelArguments:
- enforcing=0(3)
1 | Applies the new kernel argument only to worker nodes. |
2 | Named to identify where it fits among the machine configs (05) and what it does (adds a kernel argument to configure SELinux permissive mode). |
3 | Identifies the exact kernel argument as enforcing=0 . |
Create the new machine config:
$ oc create -f 05-worker-kernelarg-selinuxpermissive.yaml
Check the machine configs to see that the new one was added:
$ oc get MachineConfig
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
00-worker 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-master-container-runtime 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-master-kubelet 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-worker-container-runtime 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
01-worker-kubelet 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
05-worker-kernelarg-selinuxpermissive 3.1.0 105s
99-master-generated-registries 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
99-master-ssh 3.1.0 77m
99-worker-generated-registries 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
99-worker-ssh 3.1.0 77m
rendered-master-0f314bb55448c47e6776e16e608c5912 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 42m
rendered-master-c7761e6162e6c9538b0cdd7eef567d38 5ce9351ceb24e721e28cd82de3a44fc7cc27137c 3.1.0 65m
Check the nodes:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-136-161.ec2.internal Ready worker 28m v1.19.0
ip-10-0-136-243.ec2.internal Ready master 34m v1.19.0
ip-10-0-141-105.ec2.internal Ready,SchedulingDisabled worker 28m v1.19.0
ip-10-0-142-249.ec2.internal Ready master 34m v1.19.0
ip-10-0-153-11.ec2.internal Ready worker 28m v1.19.0
ip-10-0-153-150.ec2.internal Ready master 34m v1.19.0
You can see that scheduling on each worker node is disabled as the change is being applied.
Check that the kernel argument worked by going to one of the worker nodes and listing
the kernel command line arguments (in /proc/cmdline
on the host):
$ oc debug node/ip-10-0-141-105.ec2.internal
Starting pod/ip-10-0-141-105ec2internal-debug ...
To use host binaries, run `chroot /host`
sh-4.2# cat /host/proc/cmdline
BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 enforcing=0
sh-4.2# exit
You should see the enforcing=0
argument added to the other kernel arguments.
Some OpenShift Container Platform workloads require a high degree of determinism.While Linux is not a real-time operating system, the Linux real-time kernel includes a preemptive scheduler that provides the operating system with real-time characteristics.
If your OpenShift Container Platform workloads require these real-time characteristics, you can switch your machines to the Linux real-time kernel. For OpenShift Container Platform, 4.6 you can make this switch using a MachineConfig
object. Although making the change is as simple as changing a machine config kernelType
setting to realtime
, there are a few other considerations before making the change:
Currently, real-time kernel is supported only on worker nodes, and only for radio access network (RAN) use.
The following procedure is fully supported with bare metal installations that use systems that are certified for Red Hat Enterprise Linux for Real Time 8.
Real-time support in OpenShift Container Platform is limited to specific subscriptions.
The following procedure is also supported for use with Google Cloud Platform.
Have a running OpenShift Container Platform cluster (version 4.4 or later).
Log in to the cluster as a user with administrative privileges.
Create a machine config for the real-time kernel: Create a YAML file (for example, 99-worker-realtime.yaml
) that contains a MachineConfig
object for the realtime
kernel type. This example tells the cluster to use a real-time kernel for all worker nodes:
$ cat << EOF > 99-worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: "worker"
name: 99-worker-realtime
spec:
kernelType: realtime
EOF
Add the machine config to the cluster. Type the following to add the machine config to the cluster:
$ oc create -f 99-worker-realtime.yaml
Check the real-time kernel: Once each impacted node reboots, log in to the cluster and run the following commands to make sure that the real-time kernel has replaced the regular kernel for the set of nodes you configured:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-143-147.us-east-2.compute.internal Ready worker 103m v1.19.0
ip-10-0-146-92.us-east-2.compute.internal Ready worker 101m v1.19.0
ip-10-0-169-2.us-east-2.compute.internal Ready worker 102m v1.19.0
$ oc debug node/ip-10-0-143-147.us-east-2.compute.internal
Starting pod/ip-10-0-143-147us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
sh-4.4# uname -a
Linux <worker_node> 4.18.0-147.3.1.rt24.96.el8_1.x86_64 #1 SMP PREEMPT RT
Wed Nov 27 18:29:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The kernel name contains rt
and text “PREEMPT RT” indicates that this is a real-time kernel.
To go back to the regular kernel, delete the MachineConfig
object:
$ oc delete -f 99-worker-realtime.yaml
If you need to configure settings for the journald
service on OpenShift Container Platform nodes, you can do that by modifying the appropriate configuration file and passing the file to the appropriate pool of nodes as a machine config.
This procedure describes how to modify journald
rate limiting settings in the /etc/systemd/journald.conf
file and apply them to worker nodes. See the journald.conf
man page for information on how to use that file.
Have a running OpenShift Container Platform cluster (version 4.4 or later).
Log in to the cluster as a user with administrative privileges.
Create the contents of the /etc/systemd/journald.conf
file and encode it as base64. For example:
$ cat > /tmp/jrnl.conf <<EOF
# Disable rate limiting
RateLimitInterval=1s
RateLimitBurst=10000
Storage=volatile
Compress=no
MaxRetentionSec=30s
EOF
Convert the temporary journal.conf
file to base64 and save it into a variable (jrnl_cnf
):
$ export jrnl_cnf=$( cat /tmp/jrnl.conf | base64 -w0 )
$ echo $jrnl_cnf
IyBEaXNhYmxlIHJhdGUgbGltaXRpbmcKUmF0ZUxpbWl0SW50ZXJ2YWw9MXMKUmF0ZUxpbWl0QnVyc3Q9MTAwMDAKU3RvcmFnZT12b2xhdGlsZQpDb21wcmVzcz1ubwpNYXhSZXRlbnRpb25TZWM9MzBzCg==
Create the machine config, including the encoded contents of journald.conf
(jrnl_cnf
variable):
$ cat > /tmp/40-worker-custom-journald.yaml <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 40-worker-custom-journald
spec:
config:
ignition:
config: {}
security:
tls: {}
timeouts: {}
version: 3.1.0
networkd: {}
passwd: {}
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,${jrnl_cnf}
verification: {}
filesystem: root
mode: 420
path: /etc/systemd/journald.conf
systemd: {}
osImageURL: ""
EOF
Apply the machine config to the pool:
$ oc apply -f /tmp/40-worker-custom-journald.yaml
Check that the new machine config is applied and that the nodes are not in a degraded state. It might take a few minutes. The worker pool will show the updates in progress, as each node successfully has the new machine config applied:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-35 True False False 3 3 3 0 34m
worker rendered-worker-d8 False True False 3 1 1 0 34m
To check that the change was applied, you can log in to a worker node:
$ oc get node | grep worker
ip-10-0-0-1.us-east-2.compute.internal Ready worker 39m v0.0.0-master+$Format:%h$
$ oc debug node/ip-10-0-0-1.us-east-2.compute.internal
Starting pod/ip-10-0-141-142us-east-2computeinternal-debug ...
...
sh-4.2# chroot /host
sh-4.4# cat /etc/systemd/journald.conf
# Disable rate limiting
RateLimitInterval=1s
RateLimitBurst=10000
Storage=volatile
Compress=no
MaxRetentionSec=30s
sh-4.4# exit
Settings that define the registries that OpenShift Container Platform uses to get container images are held in the /etc/containers/registries.conf
file by default. In that file, you can set registries to not require authentication (insecure), point to mirrored registries, or set which registries are searched for unqualified container image requests.
Rather than change registries.conf
directly, you can drop configuration files into the /etc/containers/registries.conf.d
directory that are then automatically appended to the system’s existing registries.conf
settings.
This procedure describes how to create a registries.d
file (/etc/containers/registries/99-worker-unqualified-search-registries.conf
) that adds quay.io
as an unqualified search registry (one that OpenShift Container Platform can search when it tries to pull an image name that does not include the registry name). It includes base64-encoded content that you can examine as follows:
$ echo dW5xdWFsaWZpZWQtc2VhcmNoLXJlZ2lzdHJpZXMgPSBbJ3JlZ2lzdHJ5LmFjY2Vzcy5yZWRoYXQuY29tJywgJ2RvY2tlci5pbycsICdxdWF5LmlvJ10K | base64 -d
unqualified-search-registries = ['registry.access.redhat.com', 'docker.io', 'quay.io']
See the containers-registries.conf
man page for the format for the registries.conf
and registries.d
directory files.
Have a running OpenShift Container Platform cluster (version 4.4 or later).
Log in to the cluster as a user with administrative privileges.
Create a YAML file (myregistry.yaml
) to hold the contents of the /etc/containers/registries.conf.d/99-worker-unqualified-search-registries.conf
file, including the encoded base64 contents for that file. For example:
$ cat > /tmp/myregistry.yaml <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 99-worker-unqualified-search-registries
spec:
config:
ignition:
version: 3.1.0
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,dW5xdWFsaWZpZWQtc2VhcmNoLXJlZ2lzdHJpZXMgPSBbJ3JlZ2lzdHJ5LmFjY2Vzcy5yZWRoYXQuY29tJywgJ2RvY2tlci5pbycsICdxdWF5LmlvJ10K
filesystem: root
mode: 0644
path: /etc/containers/registries.conf.d/99-worker-unqualified-search-registries.conf
EOF
Apply the machine config to the pool:
$ oc apply -f /tmp/myregistry.yaml
Check that the new machine config has been applied and that the nodes are not in a degraded state. It might take a few minutes. The worker pool will show the updates in progress, as each machine successfully has the new machine config applied:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-35 True False False 3 3 3 0 34m
worker rendered-worker-d8 False True False 3 1 1 0 34m
To check that the change was applied, you can log in to a worker node:
$ oc get node | grep worker
ip-10-0-0-1.us-east-2.compute.internal Ready worker 39m v0.0.0-master+$Format:%h$
$ oc debug node/ip-10-0-0-1.us-east-2.compute.internal
Starting pod/ip-10-0-141-142us-east-2computeinternal-debug ...
...
sh-4.2# chroot /host
sh-4.4# cat /etc/containers/registries.conf.d/99-worker-unqualified-search-registries.conf
unqualified-search-registries = ['registry.access.redhat.com', 'docker.io', 'quay.io']
sh-4.4# exit
RHCOS is a minimal container-oriented RHEL operating system, designed to provide a common set of capabilities to OpenShift Container Platform clusters across all platforms. While adding software packages to RHCOS systems is generally discouraged, the MCO provides an extensions
feature you can use to add a minimal set of features to RHCOS nodes.
Currently, the following extension is available:
usbguard: Adding the usbguard
extension protects RHCOS systems from attacks from intrusive USB devices. See USBGuard for details.
The following procedure describes how to use a machine config to add one or more extensions to your RHCOS nodes.
Have a running OpenShift Container Platform cluster (version 4.6 or later).
Log in to the cluster as a user with administrative privileges.
Create a machine config for extensions: Create a YAML file (for example, 80-extensions.yaml
) that contains a MachineConfig
extensions
object. This example tells the cluster to add the usbguard
extension.
$ cat << EOF > 80-extensions.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 80-worker-extensions
spec:
config:
ignition:
version: 3.1.0
extensions:
- usbguard
EOF
Add the machine config to the cluster. Type the following to add the machine config to the cluster:
$ oc create -f 80-extensions.yaml
This sets all worker nodes to have rpm packages for usbguard
installed.
Check that the extensions were applied:
$ oc get machineconfig 80-worker-extensions
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
80-worker-extensions 3.1.0 57s
Check that the new machine config is now applied and that the nodes are not in a degraded state. It may take a few minutes. The worker pool will show the updates in progress, as each machine successfully has the new machine config applied:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-35 True False False 3 3 3 0 34m
worker rendered-worker-d8 False True False 3 1 1 0 34m
Check the extensions. To check that the extension was applied, run:
$ oc get node | grep worker
NAME STATUS ROLES AGE VERSION
ip-10-0-169-2.us-east-2.compute.internal Ready worker 102m v1.18.3
$ oc debug node/ip-10-0-169-2.us-east-2.compute.internal
...
To use host binaries, run `chroot /host`
sh-4.4# chroot /host
sh-4.4# rpm -q usbguard
usbguard-0.7.4-4.el8.x86_64.rpm
Because the default location for firmware blobs in /usr/lib
is read-only, you can locate a custom firmware blob by updating the search path. This enables you to load local firmware blobs in the machine config manifest when the blobs are not managed by RHCOS.
Create a Butane config file, 98-worker-firmware-blob.bu
, that updates the search path so that it is root-owned and writable to local storage. The following example places the custom blob file from your local workstation onto nodes under /var/lib/firmware
.
See "Creating machine configs with Butane" for information about Butane. |
variant: openshift
version: 4.9.0
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 98-worker-firmware-blob
storage:
files:
- path: /var/lib/firmware/<package_name> (1)
contents:
local: <package_name> (2)
mode: 0644 (3)
openshift:
kernel_arguments:
- 'firmware_class.path=/var/lib/firmware' (4)
1 | Sets the path on the node where the firmware package is copied to. |
2 | Specifies a file with contents that are read from a local file directory on the system running Butane. The path of the local file is relative to a files-dir directory, which must be specified by using the --files-dir option with Butane in the following step. |
3 | Sets the permissions for the file on the RHCOS node. It is recommended to set 0644 permissions. |
4 | The firmware_class.path parameter customizes the kernel search path of where to look for the custom firmware blob that was copied from your local workstation onto the root file system of the node. This example uses /var/lib/firmware as the customized path. |
Run Butane to generate a MachineConfig
object file that uses a copy of the firmware blob on your local workstation named 98-worker-firmware-blob.yaml
. The firmware blob contains the configuration to be delivered to the nodes. The following example uses the --files-dir
option to specify the directory on your workstation where the local file or files are located:
$ butane 98-worker-firmware-blob.bu -o 98-worker-firmware-blob.yaml --files-dir <directory_including_package_name>
Apply the configurations to the nodes in one of two ways:
If the cluster is not running yet, after you generate manifest files, add the MachineConfig
object file to the <installation_directory>/openshift
directory, and then continue to create the cluster.
If the cluster is already running, apply the file:
$ oc apply -f 98-worker-firmware-blob.yaml
A MachineConfig
object YAML file is created for you to finish configuring your machines.
Save the Butane config in case you need to update the MachineConfig
object in the future.
Besides managing MachineConfig
objects, the MCO manages two custom resources (CRs): KubeletConfig
and ContainerRuntimeConfig
. Those CRs let you change node-level settings impacting how the Kubelet and CRI-O container runtime services behave.
The kubelet configuration is currently serialized as an Ignition configuration, so it can be directly edited. However, there is also a new
kubelet-config-controller
added to the Machine Config Controller (MCC). This allows you to create a KubeletConfig
custom resource (CR) to edit the kubelet parameters.
As the fields in the |
View the available machine configuration objects that you can select:
$ oc get machineconfig
By default, the two kubelet-related configs are 01-master-kubelet
and 01-worker-kubelet
.
To check the current value of max pods per node, run:
# oc describe node <node-ip> | grep Allocatable -A6
Look for value: pods: <value>
.
For example:
# oc describe node ip-172-31-128-158.us-east-2.compute.internal | grep Allocatable -A6
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 3500m
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15341844Ki
pods: 250
To set the max pods per node on the worker nodes, create a custom resource file that contains the kubelet configuration. For example, change-maxPods-cr.yaml
:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
name: set-max-pods
spec:
machineConfigPoolSelector:
matchLabels:
custom-kubelet: large-pods
kubeletConfig:
maxPods: 500
The rate at which the kubelet talks to the API server depends on queries per second (QPS) and burst values. The default values, 50
for kubeAPIQPS
and 100
for kubeAPIBurst
, are good enough if there are limited pods running on each node. Updating the kubelet QPS and burst rates is recommended if there are enough CPU and memory resources on the node:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
name: set-max-pods
spec:
machineConfigPoolSelector:
matchLabels:
custom-kubelet: large-pods
kubeletConfig:
maxPods: <pod_count>
kubeAPIBurst: <burst_rate>
kubeAPIQPS: <QPS>
Update the machine config pool for workers with the label:
$ oc label machineconfigpool worker custom-kubelet=large-pods
Create the KubeletConfig
object:
$ oc create -f change-maxPods-cr.yaml
Verify that the KubeletConfig
object is created:
$ oc get kubeletconfig
This should return set-max-pods
.
Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.
Check for maxPods
changing for the worker nodes:
$ oc describe node
Verify the change by running:
$ oc get kubeletconfigs set-max-pods -o yaml
This should show a status of True
and type:Success
By default, only one machine is allowed to be unavailable when applying the kubelet-related configuration to the available worker nodes. For a large cluster, it can take a long time for the configuration change to be reflected. At any time, you can adjust the number of machines that are updating to speed up the process.
Edit the worker
machine config pool:
$ oc edit machineconfigpool worker
Set maxUnavailable
to the desired value.
spec:
maxUnavailable: <node_count>
When setting the value, consider the number of worker nodes that can be unavailable without affecting the applications running on the cluster. |
The ContainerRuntimeConfig
custom resource definition (CRD) provides a structured way of changing settings associated with the OpenShift Container Platform CRI-O runtime. Using a ContainerRuntimeConfig
custom resource (CR), you select the configuration values you want and the MCO handles rebuilding the crio.conf
and storage.conf
configuration files.
You can modify the following settings by using a ContainerRuntimeConfig
CR:
PIDs limit: The pidsLimit
parameter sets the CRI-O pids_limit
parameter, which is maximum number of processes allowed in a container. The default is 1024 (pids_limit = 1024
).
Log level: The logLevel
parameter sets the CRI-O log_level
parameter, which is the level of verbosity for log messages. The default is info
(log_level = info
). Other options include fatal
, panic
, error
, warn
, debug
, and trace
.
Overlay size: The overlaySize
parameter sets the CRI-O Overlay storage driver size
parameter, which is the maximum size of a container image.
Maximum log size: The logSizeMax
parameter sets the CRI-O log_size_max
parameter, which is the maximum size allowed for the container log file. The default is unlimited (log_size_max = -1
). If set to a positive number, it must be at least 8192 to not be smaller than the ConMon read buffer. ConMon is a program that monitors communications between a container manager (such as Podman or CRI-O) and the OCI runtime (such as runc or crun) for a single container.
The following procedure describes how to change CRI-O settings using the ContainerRuntimeConfig
CR.
To raise the pidsLimit
to 2048, set the logLevel
to debug
, and set the overlaySize
to 8 GB, create a CR file (for example, overlay-size.yaml
) that contains that setting:
$ cat << EOF > /tmp/overlay-size.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
name: overlay-size
spec:
machineConfigPoolSelector:
matchLabels:
custom-crio: overlay-size
containerRuntimeConfig:
pidsLimit: 2048
logLevel: debug
overlaySize: 8G
EOF
To apply the ContainerRuntimeConfig
object settings, run:
$ oc create -f /tmp/overlay-size.yaml
To verify that the YAML file applied the settings, run the following command:
$ oc get ContainerRuntimeConfig
NAME AGE
overlay-size 3m19s
To edit a pool of machines, such as worker
, run the following command to open a machine config pool:
$ oc edit machineconfigpool worker
Check that a new containerruntime
object has appeared under the machineconfigs
:
$ oc get machineconfigs | grep containerrun
99-worker-generated-containerruntime 2c9371fbb673b97a6fe8b1c52691999ed3a1bfc2 3.1.0 31s
Monitor the machine config pool as the changes are rolled into the machines until all are shown as ready:
$ oc get mcp worker
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-169 False True False 3 1 1 0 9h
Open an oc debug
session to a worker node and run chroot /host
.
Verify the changes by running:
$ crio config | egrep 'log_level|pids_limit'
pids_limit = 2048
log_level = "debug"
$ head -n 7 /etc/containers/storage.conf
[storage] driver = "overlay" runroot = "/var/run/containers/storage" graphroot = "/var/lib/containers/storage" [storage.options] additionalimagestores = [] size = "8G"
The root partition of each container shows all of the available disk space of the underlying host. Follow this guidance to set a maximum partition size for the root disk of all containers.
To configure the maximum Overlay size, as well as other CRI-O options like the log level and PID limit, you can create the following ContainerRuntimeConfig
custom resource definition (CRD):
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
name: overlay-size
spec:
machineConfigPoolSelector:
matchLabels:
custom-crio: overlay-size
containerRuntimeConfig:
pidsLimit: 2048
logLevel: debug
overlaySize: 8G
Create the configuration object:
$ oc apply -f overlaysize.yml
To apply the new CRI-O configuration to your worker nodes, edit the worker machine config pool:
$ oc edit machineconfigpool worker
Add the custom-crio
label based on the matchLabels
name you set in the ContainerRuntimeConfig
CRD:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
creationTimestamp: "2020-07-09T15:46:34Z"
generation: 3
labels:
custom-crio: overlay-size
machineconfiguration.openshift.io/mco-built-in: ""
Save the changes, then view the machine configs:
$ oc get machineconfigs
New 99-worker-generated-containerruntime
and rendered-worker-xyz
objects are created:
99-worker-generated-containerruntime 4173030d89fbf4a7a0976d1665491a4d9a6e54f1 2.2.0 7m42s
rendered-worker-xyz 4173030d89fbf4a7a0976d1665491a4d9a6e54f1 2.2.0 7m36s
After those objects are created, monitor the machine config pool for the changes to be applied:
$ oc get mcp worker
The worker nodes show UPDATING
as True
, as well as the number of machines, the number updated, and other details:
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-xyz False True False 3 2 2 0 20h
When complete, the worker nodes transition back to UPDATING
as False
, and the UPDATEDMACHINECOUNT
number matches the MACHINECOUNT
:
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-xyz True False False 3 3 3 0 20h
Looking at a worker machine, you see that the new 8 GB max size configuration is applied to all of the workers:
head -n 7 /etc/containers/storage.conf
[storage]
driver = "overlay"
runroot = "/var/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = []
size = "8G"
Looking inside a container, you see that the root partition is now 8 GB:
~ $ df -h
Filesystem Size Used Available Use% Mounted on
overlay 8.0G 8.0K 8.0G 0% /