This is a cache of https://docs.openshift.com/rosa/virt/monitoring/virt-monitoring-vm-health.html. It is a snapshot of the page at 2024-11-24T03:29:27.582+0000.
Virtual machine health checks - Monitoring | Virtualization | Red Hat OpenShift <strong>service</strong> on AWS
×

About readiness and liveness probes

Use readiness and liveness probes to detect and handle unhealthy virtual machines (VMs). You can include one or more probes in the specification of the VM to ensure that traffic does not reach a VM that is not ready for it and that a new VM is created when a VM becomes unresponsive.

A readiness probe determines whether a VM is ready to accept service requests. If the probe fails, the VM is removed from the list of available endpoints until the VM is ready.

A liveness probe determines whether a VM is responsive. If the probe fails, the VM is deleted and a new VM is created to restore responsiveness.

You can configure readiness and liveness probes by setting the spec.readinessProbe and the spec.livenessProbe fields of the VirtualMachine object. These fields support the following tests:

HTTP GET

The probe determines the health of the VM by using a web hook. The test is successful if the HTTP response code is between 200 and 399. You can use an HTTP GET test with applications that return HTTP status codes when they are completely initialized.

TCP socket

The probe attempts to open a socket to the VM. The VM is only considered healthy if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until initialization is complete.

Guest agent ping

The probe uses the guest-ping command to determine if the QEMU guest agent is running on the virtual machine.

Defining an HTTP readiness probe

Define an HTTP readiness probe by setting the spec.readinessProbe.httpGet field of the virtual machine (VM) configuration.

Procedure
  1. Include details of the readiness probe in the VM configuration file.

    Sample readiness probe with an HTTP GET test
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      annotations:
      name: fedora-vm
      namespace: example-namespace
    # ...
    spec:
      template:
        spec:
          readinessProbe:
            httpGet: (1)
              port: 1500 (2)
              path: /healthz (3)
              httpHeaders:
              - name: Custom-Header
                value: Awesome
            initialDelaySeconds: 120 (4)
            periodSeconds: 20 (5)
            timeoutSeconds: 10 (6)
            failureThreshold: 3 (7)
            successThreshold: 3 (8)
    # ...
    1 The HTTP GET request to perform to connect to the VM.
    2 The port of the VM that the probe queries. In the above example, the probe queries port 1500.
    3 The path to access on the HTTP server. In the above example, if the handler for the server’s /healthz path returns a success code, the VM is considered to be healthy. If the handler returns a failure code, the VM is removed from the list of available endpoints.
    4 The time, in seconds, after the VM starts before the readiness probe is initiated.
    5 The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
    6 The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
    7 The number of times that the probe is allowed to fail. The default is 3. After the specified number of attempts, the pod is marked Unready.
    8 The number of times that the probe must report success, after a failure, to be considered successful. The default is 1.
  2. Create the VM by running the following command:

    $ oc create -f <file_name>.yaml

Defining a TCP readiness probe

Define a TCP readiness probe by setting the spec.readinessProbe.tcpSocket field of the virtual machine (VM) configuration.

Procedure
  1. Include details of the TCP readiness probe in the VM configuration file.

    Sample readiness probe with a TCP socket test
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      annotations:
      name: fedora-vm
      namespace: example-namespace
    # ...
    spec:
      template:
        spec:
          readinessProbe:
            initialDelaySeconds: 120 (1)
            periodSeconds: 20 (2)
            tcpSocket: (3)
              port: 1500 (4)
            timeoutSeconds: 10 (5)
    # ...
    1 The time, in seconds, after the VM starts before the readiness probe is initiated.
    2 The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
    3 The TCP action to perform.
    4 The port of the VM that the probe queries.
    5 The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
  2. Create the VM by running the following command:

    $ oc create -f <file_name>.yaml

Defining an HTTP liveness probe

Define an HTTP liveness probe by setting the spec.livenessProbe.httpGet field of the virtual machine (VM) configuration. You can define both HTTP and TCP tests for liveness probes in the same way as readiness probes. This procedure configures a sample liveness probe with an HTTP GET test.

Procedure
  1. Include details of the HTTP liveness probe in the VM configuration file.

    Sample liveness probe with an HTTP GET test
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      annotations:
      name: fedora-vm
      namespace: example-namespace
    # ...
    spec:
      template:
        spec:
          livenessProbe:
            initialDelaySeconds: 120 (1)
            periodSeconds: 20 (2)
            httpGet: (3)
              port: 1500 (4)
              path: /healthz (5)
              httpHeaders:
              - name: Custom-Header
                value: Awesome
            timeoutSeconds: 10 (6)
    # ...
    1 The time, in seconds, after the VM starts before the liveness probe is initiated.
    2 The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
    3 The HTTP GET request to perform to connect to the VM.
    4 The port of the VM that the probe queries. In the above example, the probe queries port 1500. The VM installs and runs a minimal HTTP server on port 1500 via cloud-init.
    5 The path to access on the HTTP server. In the above example, if the handler for the server’s /healthz path returns a success code, the VM is considered to be healthy. If the handler returns a failure code, the VM is deleted and a new VM is created.
    6 The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
  2. Create the VM by running the following command:

    $ oc create -f <file_name>.yaml

Defining a watchdog

You can define a watchdog to monitor the health of the guest operating system by performing the following steps:

  1. Configure a watchdog device for the virtual machine (VM).

  2. Install the watchdog agent on the guest.

The watchdog device monitors the agent and performs one of the following actions if the guest operating system is unresponsive:

  • poweroff: The VM powers down immediately. If spec.running is set to true or spec.runStrategy is not set to manual, then the VM reboots.

  • reset: The VM reboots in place and the guest operating system cannot react.

    The reboot time might cause liveness probes to time out. If cluster-level protections detect a failed liveness probe, the VM might be forcibly rescheduled, increasing the reboot time.

  • shutdown: The VM gracefully powers down by stopping all services.

Watchdog is not available for Windows VMs.

Configuring a watchdog device for the virtual machine

You configure a watchdog device for the virtual machine (VM).

Prerequisites
  • The VM must have kernel support for an i6300esb watchdog device. Red Hat Enterprise Linux (RHEL) images support i6300esb.

Procedure
  1. Create a YAML file with the following contents:

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      labels:
        kubevirt.io/vm: vm2-rhel84-watchdog
      name: <vm-name>
    spec:
      running: false
      template:
        metadata:
          labels:
            kubevirt.io/vm: vm2-rhel84-watchdog
        spec:
          domain:
            devices:
              watchdog:
                name: <watchdog>
                i6300esb:
                  action: "poweroff" (1)
    # ...
    1 Specify poweroff, reset, or shutdown.

    The example above configures the i6300esb watchdog device on a RHEL8 VM with the poweroff action and exposes the device as /dev/watchdog.

    This device can now be used by the watchdog binary.

  2. Apply the YAML file to your cluster by running the following command:

    $ oc apply -f <file_name>.yaml
Verification

This procedure is provided for testing watchdog functionality only and must not be run on production machines.

  1. Run the following command to verify that the VM is connected to the watchdog device:

    $ lspci | grep watchdog -i
  2. Run one of the following commands to confirm the watchdog is active:

    • Trigger a kernel panic:

      # echo c > /proc/sysrq-trigger
    • Stop the watchdog service:

      # pkill -9 watchdog

Installing the watchdog agent on the guest

You install the watchdog agent on the guest and start the watchdog service.

Procedure
  1. Log in to the virtual machine as root user.

  2. Install the watchdog package and its dependencies:

    # yum install watchdog
  3. Uncomment the following line in the /etc/watchdog.conf file and save the changes:

    #watchdog-device = /dev/watchdog
  4. Enable the watchdog service to start on boot:

    # systemctl enable --now watchdog.service