Diagnostics Tool | Cluster Administration | OpenShift Container Platform 3.5

Overview

The oc adm diagnostics command runs a series of checks for error conditions in the host or cluster. Specifically, it:

Verifies that the default registry and router are running and correctly configured.
Checks ClusterRoleBindings and ClusterRoles for consistency with base policy.
Checks that all of the client configuration contexts are valid and can be connected to.
Checks that SkyDNS is working properly and the pods have SDN connectivity.
Validates master and node configuration on the host.
Checks that nodes are running and available.
Analyzes host logs for known errors.
Checks that systemd units are configured as expected for the host.

Using the Diagnostics Tool

OpenShift Container Platform can be deployed in many ways: built from source, included in a VM image, in a container image, or as enterprise RPMs. Each method implies a different configuration and environment. To minimize environment assumptions, the diagnostics were added to the openshift binary so that wherever there is an OpenShift Container Platform server or client, the diagnostics can run in the exact same environment.

To use the diagnostics tool, preferably on a master host and as cluster administrator, run:

$ oc adm diagnostics

This runs all available diagnostics, skipping any that do not apply. For example, the NodeConfigCheck does not run unless a node configuration is available. You can also run specific diagnostics by name as you work to address issues. For example:

$ oc adm diagnostics NodeConfigCheck UnitStatus

Diagnostics look for configuration files in standard locations:

Client:
- As indicated by the $KUBECONFIG environment variable
- ~/.kube/config file
Master:
- /etc/origin/master/master-config.yaml
Node:
- /etc/origin/node/node-config.yaml

Non-standard locations can be specified with flags (respectively, --config, --master-config, and --node-config). If a configuration file is not found or specified, related diagnostics are skipped.
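Before running the diagnostics, it can help to see which of these files are actually present on the host. The following is a minimal sketch, not part of the tool itself: the helper names check_file and list_diag_configs and the optional root-prefix argument are hypothetical, and it assumes $KUBECONFIG, if set, names a single file.

```shell
#!/bin/sh
# Report whether one configuration file exists.
# $1 is a label (client, master, node); $2 is the path to test.
check_file() {
    if [ -f "$2" ]; then
        printf '%s: found %s\n' "$1" "$2"
    else
        printf '%s: missing %s\n' "$1" "$2"
    fi
}

# Check the standard locations listed above.
# The optional argument is a hypothetical root prefix, useful only for
# trying the sketch against a scratch directory; omit it on a real host.
list_diag_configs() {
    root="${1:-}"
    # Assumption: $KUBECONFIG, when set, points at a single file.
    if [ -n "${KUBECONFIG:-}" ]; then
        check_file client "$KUBECONFIG"
    fi
    check_file client "${HOME:-}/.kube/config"
    check_file master "$root/etc/origin/master/master-config.yaml"
    check_file node "$root/etc/origin/node/node-config.yaml"
}

list_diag_configs
```

A file reported as missing suggests that the related diagnostics will be skipped unless its location is passed with the matching flag.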

Run the command with the --help flag to see all available options.

Running Diagnostics in a Server Environment

Master and node diagnostics are most useful in one specific target environment: an RPM-based deployment installed with Ansible. This environment provides several diagnostic benefits:

Master and node configuration is based on a configuration file in a standard location.
Systemd units are configured to manage the server(s).
All components log to journald.

Having configuration files where Ansible places them means that you will generally not need to specify where to find them. Running oc adm diagnostics without flags will look for master and node configurations in the standard locations and use them if found; this should make the Ansible-installed use case as simple as possible. Also, it is easy to specify configuration files that are not in the expected locations:

$ oc adm diagnostics --master-config=<file_path> --node-config=<file_path>

Systemd units and log entries in journald are necessary for the current log diagnostic logic. For other deployment types, logs may go to files or to stdout, or node and master logs may be combined. In these situations, log diagnostics currently cannot work properly and are skipped.
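As a rough way to anticipate whether the log diagnostics have what they need, you can check for the systemd and journald command-line tools on the host. This is only a sketch under that assumption: log_diag_precheck is a hypothetical helper, and the presence of the tools does not guarantee that the OpenShift Container Platform units are configured or logging to journald.

```shell
#!/bin/sh
# Rough pre-check (hypothetical helper, not part of oc):
# are systemctl and journalctl available on this host?
log_diag_precheck() {
    if command -v systemctl >/dev/null 2>&1 &&
       command -v journalctl >/dev/null 2>&1; then
        echo "systemctl and journalctl found; log diagnostics may be able to run"
    else
        echo "systemctl or journalctl not found; expect log diagnostics to be skipped"
    fi
}

log_diag_precheck
```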

Running Diagnostics in a Client Environment

You may have access as an ordinary user, as a cluster-admin user, or both, and you may be running on a host where OpenShift Container Platform master or node servers are operating. The diagnostics attempt to use as much access as the user has available.

A client with ordinary access should be able to diagnose its connection to the master and run a diagnostic pod. If multiple users or masters are configured, connections will be tested for all, but the diagnostic pod only runs against the current user, server, or project.

A client with cluster-admin access available (for any user, but only the current master) should be able to diagnose the status of infrastructure such as nodes, registry, and router. In each case, running oc adm diagnostics looks for the client configuration in its standard location and uses it if available.
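If the client configuration is not in its standard location, the --config flag described earlier can point the diagnostics at it. The sketch below only prints the command it would run rather than running it. The helper name client_diag_cmd is hypothetical, and the fallback to ~/.kube/config simply mirrors the standard location listed above.

```shell
#!/bin/sh
# Print (do not run) the diagnostics command for a client configuration.
# client_diag_cmd is a hypothetical helper; --config is the flag named in the text.
client_diag_cmd() {
    kubeconfig="${1:-${HOME:-}/.kube/config}"
    if [ -f "$kubeconfig" ]; then
        # The file exists, so pass it explicitly.
        printf '%s\n' "oc adm diagnostics --config=$kubeconfig"
    else
        # No such file: let the tool search its standard locations.
        printf '%s\n' "oc adm diagnostics"
    fi
}

client_diag_cmd
```

Printing the command first makes it easy to confirm which configuration file will be used before running the diagnostics against a cluster.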