Configuring virtual GPUs

About using virtual GPUs with OKD Virtualization

You can create vGPUs for your VMs using supported GPU cards. You can use the NVIDIA GPU Operator to manage the lifecycle and creation of these vGPUs on the cluster nodes. You must add these devices to the HyperConverged custom resource (CR) so that OKD Virtualization can discover and make them available to virtual machines.

Refer to your hardware vendor’s documentation for functionality and support details.

Mediated device

A physical device that is divided into one or more virtual devices. A vGPU is a type of mediated device (mdev); the performance of the physical GPU is divided among the virtual devices. You can assign mediated devices to one or more virtual machines (VMs), but the number of guests must be compatible with your GPU. Some GPUs do not support multiple guests.

Preparing hosts for mediated devices

You must enable the Input-Output Memory Management Unit (IOMMU) driver before you can configure mediated devices.

Adding kernel arguments to enable the IOMMU driver

To enable the IOMMU driver in the kernel, create the MachineConfig object and add the kernel arguments.

Prerequisites
  • You have cluster administrator permissions.

  • Your CPU hardware is Intel or AMD.

  • You enabled Intel Virtualization Technology for Directed I/O extensions or AMD IOMMU in the BIOS.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Create a MachineConfig object that identifies the kernel argument. The following example shows a kernel argument for an Intel CPU.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 100-worker-iommu
    spec:
      config:
        ignition:
          version: 3.2.0
      kernelArguments:
        - intel_iommu=on
    # ...
    • metadata.labels.machineconfiguration.openshift.io/role specifies that the new kernel argument is applied only to worker nodes.

    • metadata.name specifies the ranking of this kernel argument (100) among the machine configs and its purpose. If you have an AMD CPU, specify the kernel argument as amd_iommu=on.

    • spec.kernelArguments specifies the kernel argument as intel_iommu=on for an Intel CPU.
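For an AMD host, the AMD note above applies: the MachineConfig layout is identical except for the kernel argument. A sketch (the object name is kept the same as the Intel example; rename it as you prefer):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 100-worker-iommu
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - amd_iommu=on
```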

  2. Create the new MachineConfig object:

    $ oc create -f 100-worker-kernel-arg-iommu.yaml
Verification
  1. Verify that the new MachineConfig object was added by entering the following command and observing the output:

    $ oc get MachineConfig

    Example output:

    NAME                                       IGNITIONVERSION                    AGE
    00-master                                   3.5.0                             164m
    00-worker                                   3.5.0                             164m
    01-master-container-runtime                 3.5.0                             164m
    01-master-kubelet                           3.5.0                             164m
    01-worker-container-runtime                 3.5.0                             164m
    01-worker-kubelet                           3.5.0                             164m
    100-master-chrony-configuration             3.5.0                             169m
    100-master-set-core-user-password           3.5.0                             169m
    100-worker-chrony-configuration             3.5.0                             169m
    100-worker-iommu                            3.5.0                             14s
  2. Verify that IOMMU is enabled at the operating system (OS) level by entering the following command:

    $ dmesg | grep -i iommu
    • If IOMMU is enabled, output is displayed as shown in the following example:

      Example output:

      Intel: [ 0.000000] DMAR: Intel(R) IOMMU Driver
      AMD: [ 0.000000] AMD-Vi: IOMMU Initialized
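The dmesg check above can be simulated locally against sample output. The sample lines below mirror the example output; real messages vary by kernel version and hardware.

```shell
# Sample dmesg lines (assumptions for illustration only).
dmesg_sample='[    0.000000] DMAR: Intel(R) IOMMU Driver
[    0.000000] AMD-Vi: IOMMU Initialized'

# The same case-insensitive filter used in the verification step:
printf '%s\n' "$dmesg_sample" | grep -i iommu
```

On a real node, an empty result from `dmesg | grep -i iommu` indicates that IOMMU is not enabled and the BIOS and kernel-argument prerequisites should be rechecked.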

Configuring the NVIDIA GPU Operator

You can use the NVIDIA GPU Operator to provision worker nodes for running GPU-accelerated virtual machines (VMs) in OKD Virtualization.

The NVIDIA GPU Operator is supported only by NVIDIA. For more information, see Obtaining Support from NVIDIA in the Red Hat Knowledgebase.

Using the NVIDIA GPU Operator

You can use the NVIDIA GPU Operator with OKD Virtualization to accelerate the deployment of worker nodes for running GPU-enabled virtual machines (VMs). The NVIDIA GPU Operator manages NVIDIA GPU resources in an OKD cluster and automates tasks when preparing nodes for GPU workloads.

The NVIDIA GPU Operator can also facilitate provisioning complex artificial intelligence and machine learning (AI/ML) workloads.

Procedure
  1. Configure your ClusterPolicy manifest to match the following example:

    apiVersion: nvidia.com/v1
    kind: ClusterPolicy
    metadata:
      name: gpu-cluster-policy
    spec:
      daemonsets:
        updateStrategy: RollingUpdate
      dcgm:
        enabled: true
      dcgmExporter: {}
      devicePlugin: {}
      driver:
        enabled: false
        kernelModuleType: auto
      gfd: {}
      mig:
        strategy: single
      migManager:
        enabled: true
      nodeStatusExporter:
        enabled: true
      operator:
        defaultRuntime: crio
        initContainer: {}
        runtimeClass: nvidia
        use_ocp_driver_toolkit: true
      sandboxDevicePlugin:
        enabled: true
      sandboxWorkloads:
        defaultWorkload: vm-vgpu
        enabled: true
      toolkit:
        enabled: true
        installDir: /usr/local/nvidia
      validator:
        plugin:
          env:
          - name: WITH_WORKLOAD
            value: "true"
      vfioManager:
        enabled: true
      vgpuDeviceManager:
        config:
          default: default
          name: vgpu-devices-config
        enabled: true
      vgpuManager:
        enabled: true
        image: <vgpu_image_name>
        repository: <vgpu_container_registry>
        version: <nvidia_vgpu_manager_version>

    where:

    <vgpu_image_name>

    Specifies the vGPU image name.

    <vgpu_container_registry>

    Specifies the vGPU container registry value.

    <nvidia_vgpu_manager_version>

    Specifies the version of the vGPU driver you have downloaded from the NVIDIA website and used to build the image.

  2. Use the NVIDIA GPU Operator to configure mediated devices. For more information, see NVIDIA GPU Operator with OpenShift Virtualization.

Labeling nodes with a MIG-backed vGPU profile

If you have GPUs that support NVIDIA Multi-Instance GPU (MIG), you can select a MIG-backed vGPU instance instead of time-sliced vGPU instances. When you use MIG, you give a partition of dedicated hardware to selected VMs.

Prerequisites
  • You have configured vGPU support. For more information, see MIG Support in OKD.

  • You have installed the NVIDIA GPU Operator version 25.10 or later.

  • You are using the NVIDIA AI Enterprise (AIE) vGPU Manager image.

Procedure
  • Label the node with the name of the MIG-backed vGPU profile:

    $ oc label node <node> --overwrite nvidia.com/vgpu.config=<profile>
    • Replace <node> with the fully qualified domain name (FQDN) of your compute node.

    • Replace <profile> with a supported MIG profile.

Example command
$ oc label node worker-1 --overwrite nvidia.com/vgpu.config=A30-1-6C

For more information about MIG profiles, see the MIG User Guide.

Managing mediated devices

Before you can assign mediated devices to virtual machines, you must create the devices and expose them to the cluster. You can also reconfigure and remove mediated devices.

Creating and exposing mediated devices

As an administrator, you can create mediated devices and expose them to the cluster by editing the HyperConverged custom resource (CR). Before you edit the CR, explore a worker node to find the configuration values that are specific to your hardware devices.

Prerequisites
  • You installed the OpenShift CLI (oc).

  • You enabled the Input-Output Memory Management Unit (IOMMU) driver.

  • If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices.

Procedure
  1. Identify the name selector and resource name values for the mediated devices by exploring a worker node:

    1. Start a debugging session with the worker node by using the oc debug command. For example:

      $ oc debug node/node-11.redhat.com
    2. Change the root directory of the shell process to the file system of the host node by running the following command:

      # chroot /host
    3. Navigate to the mdev_bus directory and view its contents. Each subdirectory name is a PCI address of a physical GPU. For example:

      # cd /sys/class/mdev_bus && ls

      Example output:

      0000:4b:00.4
    4. Go to the directory for your physical device and list the supported mediated device types as defined by the hardware vendor. For example:

      # cd 0000:4b:00.4 && ls mdev_supported_types

      Example output:

      nvidia-742  nvidia-744	nvidia-746  nvidia-748	nvidia-750  nvidia-752
      nvidia-743  nvidia-745	nvidia-747  nvidia-749	nvidia-751  nvidia-753
    5. Select the mediated device type that you want to use and identify its name selector value by viewing the contents of its name file. For example:

      # cat nvidia-745/name

      Example output:

      NVIDIA A2-2Q
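The per-type name lookup above can be sketched against a mock sysfs tree. The temporary path and the single `nvidia-745` type are assumptions for illustration; on a real node the tree lives under `/sys/class/mdev_bus/<pci_address>/mdev_supported_types`.

```shell
# Build a mock mdev_supported_types tree (illustrative only).
mock=$(mktemp -d)/mdev_supported_types
mkdir -p "$mock/nvidia-745"
echo "NVIDIA A2-2Q" > "$mock/nvidia-745/name"

# Print each supported type alongside its name selector value:
for t in "$mock"/*/; do
  printf '%s -> %s\n' "$(basename "$t")" "$(cat "$t/name")"
done
```

The same loop, run on a real node inside the `mdev_supported_types` directory, lists every vendor-defined type with its human-readable name so you can pick the mdevNameSelector value.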
  2. Open the HyperConverged CR in your default editor by running the following command:

    $ oc edit hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged
  3. Create and expose the mediated devices by updating the configuration:

    1. Expose the mediated devices to the cluster by adding the mdevNameSelector and resourceName values to the spec.permittedHostDevices.mediatedDevices stanza. The resourceName value is based on the mdevNameSelector value, but you use underscores instead of spaces.

      Example HyperConverged CR:

      apiVersion: hco.kubevirt.io/v1
      kind: HyperConverged
      metadata:
        name: kubevirt-hyperconverged
        namespace: kubevirt-hyperconverged
      spec:
        permittedHostDevices:
          mediatedDevices:
          - mdevNameSelector: NVIDIA A2-2Q
            resourceName: nvidia.com/NVIDIA_A2-2Q
            externalResourceProvider: true
          - mdevNameSelector: NVIDIA A2-4Q
            resourceName: nvidia.com/NVIDIA_A2-4Q
            externalResourceProvider: true
      # ...

      where:

      mdevNameSelector

      Specifies the mediated devices that map to this value on the host.

      resourceName

      Specifies the matching resource name that is allocated on the node.

      externalResourceProvider

      Specifies that the device is handled by an external provider, such as the NVIDIA GPU Operator.
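The resourceName convention described above (vendor-domain prefix plus the name selector with spaces replaced by underscores) can be sketched as:

```shell
# Derive a resourceName from an mdevNameSelector value
# (the nvidia.com prefix matches the example CR above).
selector='NVIDIA A2-2Q'
resource="nvidia.com/$(printf '%s' "$selector" | tr ' ' '_')"
echo "$resource"   # nvidia.com/NVIDIA_A2-2Q
```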

  4. Save your changes and exit the editor.

Verification
  • Confirm that the virtual GPU is attached to the node by running the following command:

    $ oc get node <node_name> -o json | jq '.status.allocatable
      | with_entries(select(.key | startswith("nvidia.com/")))
      | with_entries(select(.value != "0"))'

Removing mediated devices from the cluster

As a cluster administrator, you can remove mediated devices from the cluster so that you can reallocate the GPU hardware. To remove a mediated device from the cluster, delete its information from the HyperConverged CR.

Prerequisites
  • You have installed the OpenShift CLI (oc).

Procedure
  1. Edit the HyperConverged CR in your default editor by running the following command:

    $ oc edit hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged
  2. Remove the device information from the spec.permittedHostDevices stanza of the HyperConverged CR. For example:

    apiVersion: hco.kubevirt.io/v1
    kind: HyperConverged
    metadata:
      name: kubevirt-hyperconverged
      namespace: kubevirt-hyperconverged
    spec:
      permittedHostDevices:
        mediatedDevices:
        - mdevNameSelector: GRID T4-2Q
          resourceName: nvidia.com/GRID_T4-2Q
          externalResourceProvider: true
    • To remove the GRID T4-2Q device, delete the entire list entry, including the mdevNameSelector, resourceName, and externalResourceProvider fields.
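After the entry is deleted, the stanza no longer lists the device. A sketch of the resulting fragment (an empty mediatedDevices list can also be omitted entirely):

```yaml
spec:
  permittedHostDevices:
    mediatedDevices: []
```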

  3. Save your changes and exit the editor.

Using mediated devices

You can assign mediated devices to one or more virtual machines.

Assigning a vGPU to a VM by using the CLI

Assign mediated devices such as virtual GPUs (vGPUs) to virtual machines (VMs).

Prerequisites
  • The mediated device is configured in the HyperConverged custom resource.

  • The virtual machine (VM) is stopped.

Procedure
  • Assign the mediated device to a VM by editing the spec.template.spec.domain.devices.gpus stanza of the VirtualMachine manifest.

    Example virtual machine manifest:

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    spec:
      template:
        spec:
          domain:
            devices:
              gpus:
              - deviceName: nvidia.com/TU104GL_Tesla_T4
                name: gpu1
              - deviceName: nvidia.com/GRID_T4-2Q
                name: gpu2
    • spec.template.spec.domain.devices.gpus.deviceName specifies the resource name associated with the mediated device.

    • spec.template.spec.domain.devices.gpus.name specifies a name to identify the device on the VM.

Verification
  • To verify that the device is available from the virtual machine, run the following command, substituting <device_name> with the deviceName value from the VirtualMachine manifest:

    $ lspci -nnk | grep <device_name>
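The guest-side check can be simulated against a sample lspci line. The sample below is an assumption for illustration; real output depends on the assigned vGPU model, and in practice you grep for the vendor or model string that appears in the device name.

```shell
# Sample lspci output from inside a guest (illustrative only).
lspci_sample='06:00.0 3D controller: NVIDIA Corporation GA107GL (rev a1)'

# A matching line confirms the vGPU is visible to the VM:
printf '%s\n' "$lspci_sample" | grep -i nvidia
```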

Assigning a vGPU to a VM by using the web console

You can assign virtual GPUs to virtual machines by using the OKD web console.

You can add hardware devices to virtual machines created from customized templates or a YAML file. You cannot add devices to pre-supplied boot source templates for specific operating systems.

Prerequisites
  • The vGPU is configured as a mediated device in your cluster.

    • To view the devices that are connected to your cluster, click Compute → Hardware Devices from the side menu.

  • The VM is stopped.

Procedure
  1. In the OKD web console, click Virtualization → VirtualMachines from the side menu.

  2. Select the VM that you want to assign the device to.

  3. On the Details tab, click GPU devices.

  4. Click Add GPU device.

  5. Enter an identifying value in the Name field.

  6. From the Device name list, select the device that you want to add to the VM.

  7. Click Save.

Verification
  • To confirm that the devices were added to the VM, click the YAML tab and review the VirtualMachine configuration. Mediated devices are added to the spec.domain.devices stanza.