As an administrator, you can perform a number of tasks to make your clusters more efficient.
Evacuating pods allows you to migrate all or selected pods from a given node or nodes.
You can only evacuate pods backed by a replication controller. The replication controller creates new pods on other nodes and removes the existing pods from the specified node(s).
Bare pods, meaning those not backed by a replication controller, are unaffected by default. You can evacuate a subset of pods by specifying a pod-selector. Pod selectors are based on labels, so all the pods with the specified label will be evacuated.
Nodes must first be marked unschedulable to perform pod evacuation.
$ oc adm cordon <node1>
NAME      STATUS                        ROLES    AGE   VERSION
<node1>   NotReady,SchedulingDisabled   worker   1d    v1.14.6+c4799753c
Use oc adm uncordon to mark the node as schedulable again when done:
$ oc adm uncordon <node1>
The following command evacuates all or selected pods on one or more nodes:
$ oc adm drain <node1> <node2> [--pod-selector=<pod_selector>]
The following command forces deletion of bare pods using the --force option. When set to true, deletion continues even if there are pods not managed by a replication controller, ReplicaSet, job, DaemonSet, or StatefulSet:
$ oc adm drain <node1> <node2> --force=true
The following command uses the --grace-period option to set the time, in seconds, that each pod is given to terminate gracefully. If the value is negative, the default value specified in the pod is used:
$ oc adm drain <node1> <node2> --grace-period=-1
The following command ignores DaemonSet-managed pods using the --ignore-daemonsets flag set to true:
$ oc adm drain <node1> <node2> --ignore-daemonsets=true
The following command sets the length of time to wait before giving up using the --timeout flag. A value of 0 sets an infinite length of time:
$ oc adm drain <node1> <node2> --timeout=5s
The following command deletes pods even if some of them use emptyDir volumes, using the --delete-local-data flag set to true. Local data is deleted when the node is drained:
$ oc adm drain <node1> <node2> --delete-local-data=true
The following command lists objects that will be migrated without actually performing the evacuation, using the --dry-run option set to true:
$ oc adm drain <node1> <node2> --dry-run=true
Instead of specifying specific node names (for example, <node1> <node2>), you can use the --selector=<node_selector> option to evacuate pods on selected nodes.
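The evacuation steps above can be chained into one routine. The following is a minimal sketch, assuming a POSIX shell and an oc client that accepts the flags shown in this section. The function name drain_node and its optional timeout argument are illustrative and are not part of the oc CLI.

```shell
# Hypothetical helper (not part of the oc CLI): cordon a node, preview the
# evacuation, then drain it. Only flags described in this section are used.
#   $1 = node name
#   $2 = optional timeout such as 120s (default 0s, which waits indefinitely)
drain_node() {
  # Block new pods from being scheduled on the node.
  oc adm cordon "$1" || return 1
  # List what would be evacuated without actually evicting anything.
  oc adm drain "$1" --ignore-daemonsets=true --dry-run=true || return 1
  # Perform the evacuation; emptyDir data on the node is deleted.
  oc adm drain "$1" --ignore-daemonsets=true --delete-local-data=true \
    --timeout="${2:-0s}"
}
```

For example, drain_node <node1> 120s gives up after two minutes, while omitting the second argument waits indefinitely. Bare pods still block the drain unless --force=true is added.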
You can update any label on a node.
Node labels are not persisted after a node is deleted even if the node is backed up by a Machine.
Any change to a MachineSet is not applied to existing machines owned by the MachineSet. For example, labels edited or added to an existing MachineSet are not propagated to existing machines and Nodes associated with the MachineSet.
The following command adds or updates labels on a node:
$ oc label node <node> <key_1>=<value_1> ... <key_n>=<value_n>
For example:
$ oc label nodes webconsole-7f7f6 unhealthy=true
The following command updates all pods in the namespace:
$ oc label pods --all <key_1>=<value_1>
For example:
$ oc label pods --all status=unhealthy
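The labeling commands above can be wrapped for repeated use. This is a minimal sketch with hypothetical function names. It relies on two standard behaviors of oc label: the --overwrite flag is needed to replace the value of a label that already exists, and a trailing hyphen after the key removes the label.

```shell
# Hypothetical helpers built on "oc label"; the names are illustrative only.

# Set a label on a node, replacing any existing value for the same key.
#   $1 = node, $2 = key, $3 = value
set_node_label() {
  oc label node "$1" "$2=$3" --overwrite
}

# Remove a label from a node: a trailing "-" after the key deletes it.
#   $1 = node, $2 = key
remove_node_label() {
  oc label node "$1" "$2-"
}

# List the nodes that currently carry a given label.
#   $1 = key, $2 = value
nodes_with_label() {
  oc get nodes -l "$1=$2"
}
```

For example, set_node_label webconsole-7f7f6 unhealthy true followed by nodes_with_label unhealthy true should list that node.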
By default, healthy nodes with a Ready status are marked as schedulable, meaning that new pods are allowed for placement on the node. Manually marking a node as unschedulable blocks any new pods from being scheduled on the node. Existing pods on the node are not affected.
The following command marks a node or nodes as unschedulable:
$ oc adm cordon <node>
For example:
$ oc adm cordon node1.example.com
node/node1.example.com cordoned
NAME                LABELS                                      STATUS
node1.example.com   kubernetes.io/hostname=node1.example.com   Ready,SchedulingDisabled
The following command marks a currently unschedulable node or nodes as schedulable:
$ oc adm uncordon <node1>
Alternatively, instead of specifying specific node names (for example, <node>), you can use the --selector=<node_selector> option to mark selected nodes as schedulable or unschedulable.
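Selector-based cordoning can be sketched as follows. The function names are hypothetical, and the example assumes that the nodes of interest share a label such as maintenance=true, which is not defined anywhere in this document. The last helper uses the spec.unschedulable field selector to list the nodes that are currently cordoned.

```shell
# Hypothetical helpers; names and the example label are illustrative only.

# Mark every node matching a label selector as unschedulable.
cordon_by_selector() {
  oc adm cordon --selector="$1"
}

# Mark every node matching a label selector as schedulable again.
uncordon_by_selector() {
  oc adm uncordon --selector="$1"
}

# Show which nodes are currently unschedulable.
list_cordoned_nodes() {
  oc get nodes --field-selector spec.unschedulable=true
}
```

For example, cordon_by_selector maintenance=true blocks scheduling on all nodes carrying that label, and list_cordoned_nodes confirms the result.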
As of OpenShift Container Platform 4.2, you can configure master nodes to be schedulable, meaning that new Pods are allowed for placement on the master nodes. By default, master nodes are not schedulable. However, if your cluster does not contain any worker nodes, then master nodes are marked schedulable by default.
In version 4.2, the ability to create a cluster that does not have worker nodes is available only to clusters that are deployed on bare metal, as a technology preview. For all other cluster types, you can set the masters to be schedulable but must retain worker nodes.
You can allow or disallow master nodes to be schedulable by configuring the mastersSchedulable field.
Edit the schedulers.config.openshift.io resource:
$ oc edit schedulers.config.openshift.io cluster
Configure the mastersSchedulable field:
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: "2019-09-10T03:04:05Z"
  generation: 1
  name: cluster
  resourceVersion: "433"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: a636d30a-d377-11e9-88d4-0a60097bee62
spec:
  mastersSchedulable: false (1)
  policy:
    name: ""
status: {}
(1) Set to true to allow master nodes to be schedulable, or false to disallow master nodes to be schedulable.
Save the file to apply the changes.
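As a non-interactive alternative to oc edit, the same field can be set with a merge patch. This is a minimal sketch rather than the documented procedure. The function name set_masters_schedulable is hypothetical, and the sketch assumes that the Scheduler resource is named cluster as shown above.

```shell
# Hypothetical helper: set spec.mastersSchedulable without opening an editor.
#   $1 = true or false
set_masters_schedulable() {
  # Reject anything other than the two valid values before calling oc.
  case "$1" in
    true|false) ;;
    *) echo "usage: set_masters_schedulable true|false" >&2; return 2 ;;
  esac
  # A merge patch changes only the named field and leaves the rest intact.
  oc patch schedulers.config.openshift.io cluster --type merge \
    --patch "{\"spec\":{\"mastersSchedulable\":$1}}"
}
```

For example, set_masters_schedulable true allows new Pods to be placed on master nodes.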
When you delete a node using the CLI, the node object is deleted in Kubernetes, but the Pods that exist on the node are not deleted. Any bare Pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest Pods.
To delete a node from the OpenShift Container Platform cluster, edit the appropriate MachineSet:
If you are running a cluster on bare metal, you cannot delete a node by editing MachineSets. MachineSets are only available when a cluster is integrated with a cloud provider. Instead, you must unschedule and drain the node before manually deleting it.
View the MachineSets that are in the cluster:
$ oc get machinesets -n openshift-machine-api
The MachineSets are listed in the form of <clusterid>-worker-<aws-region-az>.
Scale the MachineSet:
$ oc scale --replicas=2 machineset <machineset> -n openshift-machine-api
For more information on scaling your cluster using a MachineSet, see Manually scaling a MachineSet.
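The replica count passed to oc scale is the desired total, so removing one node means scaling to the current count minus one. The following sketch reads the current value first. The function name scale_machineset_by is hypothetical, and the sketch assumes that the MachineSet exposes its replica count at .spec.replicas.

```shell
# Hypothetical helper: change a MachineSet's replica count by a delta.
#   $1 = MachineSet name
#   $2 = delta, for example -1 to remove one machine
scale_machineset_by() {
  ns=openshift-machine-api
  # Read the current desired replica count.
  current=$(oc get machineset "$1" -n "$ns" -o jsonpath='{.spec.replicas}') || return 1
  target=$((current + $2))
  # Never ask for a negative number of replicas.
  if [ "$target" -lt 0 ]; then
    echo "refusing to scale $1 below 0 replicas" >&2
    return 1
  fi
  oc scale --replicas="$target" machineset "$1" -n "$ns"
}
```

For example, scale_machineset_by <machineset> -1 on a MachineSet with three replicas issues the same oc scale --replicas=2 command shown above.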
When you delete a node using the CLI, the node object is deleted in Kubernetes, but the Pods that exist on the node are not deleted. Any bare Pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest Pods.
Delete a node from an OpenShift Container Platform cluster running on bare metal by completing the following steps:
Mark the node as unschedulable:
$ oc adm cordon <node_name>
Drain all Pods on your node:
$ oc adm drain <node_name> --force=true
Delete your node from the cluster:
$ oc delete node <node_name>
Although the node object is now deleted from the cluster, it can still rejoin the cluster after reboot or if the kubelet service is restarted. To permanently delete the node and all its data, you must decommission the node.
In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. Do this only with caution and a clear understanding of the implications of the arguments you set.
Improper use of kernel arguments can result in your systems becoming unbootable.
Examples of kernel arguments you could set include:
selinux=0: Disables Security Enhanced Linux (SELinux). While not recommended for production, disabling SELinux can improve performance by 2% - 3%.
nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.
See Kernel.org kernel parameters for a list and descriptions of kernel arguments.
In the following procedure, you create a MachineConfig that identifies:
A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.
Kernel arguments that are appended to the end of the existing kernel arguments.
A label that indicates where in the list of MachineConfigs the change is applied.
You have administrative privileges on a working OpenShift Container Platform cluster.
List existing MachineConfigs for your OpenShift Container Platform cluster to determine how to label your MachineConfig:
$ oc get MachineConfig
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
00-worker                                                   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
01-master-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
01-master-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
01-worker-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
01-worker-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
99-master-1131169f-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
99-master-ssh                                                                                          2.2.0             30m
99-worker-114e8ac7-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
99-worker-ssh                                                                                          2.2.0             30m
rendered-master-b3729e5f6124ca3678188071343115d0            577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
rendered-worker-18ff9506c718be1e8bd0a066850065b7            577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             30m
Create a MachineConfig file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxoff.yaml):
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker (1)
  name: 05-worker-kernelarg-selinuxoff (2)
spec:
  config:
    ignition:
      version: 2.2.0
  kernelArguments:
    - selinux=0 (3)
(1) Applies the new kernel argument only to worker nodes.
(2) Named to identify where it fits among the MachineConfigs (05) and what it does (adds a kernel argument to turn off SELinux).
(3) Identifies the exact kernel argument as selinux=0.
Create the new MachineConfig:
$ oc create -f 05-worker-kernelarg-selinuxoff.yaml
Check the MachineConfigs to see that the new one was added:
$ oc get MachineConfig
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
00-worker                                                   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
01-master-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
01-master-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
01-worker-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
01-worker-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
05-worker-kernelarg-selinuxoff                                                                         2.2.0             105s
99-master-1131169f-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
99-master-ssh                                                                                          2.2.0             30m
99-worker-114e8ac7-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
99-worker-ssh                                                                                          2.2.0             31m
rendered-master-b3729e5f6124ca3678188071343115d0            577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
rendered-worker-18ff9506c718be1e8bd0a066850065b7            577c2d527b09cd7a481a162c50592139caa15e20   2.2.0             31m
Check the nodes:
$ oc get node
NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.14.6+90fadebfa
ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.14.6+90fadebfa
ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.14.6+90fadebfa
ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.14.6+90fadebfa
ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.14.6+90fadebfa
ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.14.6+90fadebfa
You can see that scheduling on each worker node is disabled as the change is being applied.
Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):
$ oc debug node/ip-10-0-141-105.ec2.internal
Starting pod/ip-10-0-141-105ec2internal-debug ...
To use host binaries, run `chroot /host`
sh-4.2# cat /host/proc/cmdline
BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16... coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 selinux=0
sh-4.2# exit
You should see the selinux=0 argument added to the other kernel arguments.
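The check above can also be scripted rather than run in an interactive debug shell. The following is a minimal sketch, assuming that oc debug can run a one-off command on the node after a double hyphen. The function name node_has_kernel_arg is hypothetical.

```shell
# Hypothetical helper: succeed if a node's kernel command line contains
# the given argument as a whole word.
#   $1 = node name
#   $2 = kernel argument, for example selinux=0
node_has_kernel_arg() {
  # Read /proc/cmdline from the host, split it into one argument per line,
  # and look for an exact, literal match.
  oc debug "node/$1" -- chroot /host cat /proc/cmdline 2>/dev/null |
    tr ' ' '\n' | grep -qxF -- "$2"
}
```

For example, node_has_kernel_arg ip-10-0-141-105.ec2.internal selinux=0 returns success once the MachineConfig has been applied and the node has rebooted.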