This is a cache of https://docs.openshift.com/container-platform/4.16/support/troubleshooting/verifying-node-health.html. It is a snapshot of the page at 2024-11-27T08:56:44.019+0000.
Verifying node health - Troubleshooting | Support | OpenShift Container Platform 4.16
×

Reviewing node status, resource usage, and configuration

Review cluster node health status, resource consumption statistics, and node logs. Additionally, query kubelet status on individual nodes.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure
  • List the name, status, and role for all nodes in the cluster:

    $ oc get nodes
  • Summarize CPU and memory usage for each node within the cluster:

    $ oc adm top nodes
  • Summarize CPU and memory usage for a specific node:

    $ oc adm top node my-node

Querying the kubelet’s status on a node

You can review cluster node health status, resource consumption statistics, and node logs. Additionally, you can query kubelet status on individual nodes.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • Your API service is still functional.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. The kubelet is managed using a systemd service on each node. Review the kubelet’s status by querying the kubelet systemd service within a debug pod.

    1. Start a debug pod for a node:

      $ oc debug node/my-node

      If you are running oc debug on a control plane node, you can find administrative kubeconfig files in the /etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs directory.

    2. Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

      # chroot /host

      OpenShift Container Platform 4.16 cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. However, if the OpenShift Container Platform API is not available, or kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain> instead.

    3. Check whether the kubelet systemd service is active on the node:

      # systemctl is-active kubelet
    4. Output a more detailed kubelet.service status summary:

      # systemctl status kubelet

Querying cluster node journal logs

You can gather journald unit logs and other logs within /var/log on individual cluster nodes.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

  • Your API service is still functional.

  • You have SSH access to your hosts.

Procedure
  1. Query kubelet journald unit logs from OpenShift Container Platform cluster nodes. The following example queries control plane nodes only:

    $ oc adm node-logs --role=master -u kubelet  (1)
    1 Replace kubelet as appropriate to query other unit logs.
  2. Collect logs from specific subdirectories under /var/log/ on cluster nodes.

    1. Retrieve a list of logs contained within a /var/log/ subdirectory. The following example lists files in /var/log/openshift-apiserver/ on all control plane nodes:

      $ oc adm node-logs --role=master --path=openshift-apiserver
    2. Inspect a specific log within a /var/log/ subdirectory. The following example outputs /var/log/openshift-apiserver/audit.log contents from all control plane nodes:

      $ oc adm node-logs --role=master --path=openshift-apiserver/audit.log
    3. If the API is not functional, review the logs on each node using SSH instead. The following example tails /var/log/openshift-apiserver/audit.log:

      $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo tail -f /var/log/openshift-apiserver/audit.log

      OpenShift Container Platform 4.16 cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. Before attempting to collect diagnostic data over SSH, review whether the data collected by running oc adm must gather and other oc commands is sufficient instead. However, if the OpenShift Container Platform API is not available, or the kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain>.