This is a cache of https://docs.okd.io/latest/virt/managing_vms/advanced_vm_management/virt-enabling-descheduler-evictions.html. It is a snapshot of the page at 2025-05-20T19:57:10.539+0000.
Enabling descheduler evictions on virtual machines - Managing VMs | Virtualization | OKD 4
×

You can use the descheduler to evict pods so that the pods can be rescheduled onto more appropriate nodes. If the pod is a virtual machine, the pod eviction causes the virtual machine to be live migrated to another node.

Descheduler profiles

Use the DevKubeVirtRelieveAndMigrate or LongLifecycle profile to enable the descheduler on a virtual machine.

You can not have both DevKubeVirtRelieveAndMigrate and LongLifeCycle enabled at the same time.

DevKubeVirtRelieveAndMigrate

This profile is an enhanced version of the LongLifeCycle profile.

The DevKubeVirtRelieveAndMigrate profile is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The DevKubeVirtRelieveAndMigrate profile evicts pods from high-cost nodes to reduce overall resource expenses and enable workload migration. It also periodically rebalances workloads to help maintain similar spare capacity across nodes, which supports better handling of sudden workload spikes. Nodes can experience the following costs:

  • Resource utilization: Increased resource pressure raises the overhead for running applications.

  • Node maintenance: A higher number of containers on a node increases resource consumption and maintenance costs.

The profile enables the LowNodeUtilization strategy with the EvictionsInBackground alpha feature. The profile also exposes the following customization fields:

  • devActualUtilizationProfile: Enables load-aware descheduling.

  • devLowNodeUtilizationThresholds: Sets experimental thresholds for the LowNodeUtilization strategy. Do not use this field with devDeviationThresholds.

  • devDeviationThresholds: Treats nodes with below-average resource usage as underutilized to help redistribute workloads from overutilized nodes. Do not use this field with devLowNodeUtilizationThresholds. Supported values are: Low (10%:10%), Medium (20%:20%), High (30%:30%), AsymmetricLow (0%:10%), AsymmetricMedium (0%:20%), AsymmetricHigh (0%:30%).

  • devEnableSoftTainter: Enables the soft-tainting component to dynamically apply or remove soft taints as scheduling hints.

Example configuration
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  deschedulingIntervalSeconds: 30
  mode: "Automatic"
  profiles:
    - DevKubeVirtRelieveAndMigrate
  profileCustomizations:
    devEnableSoftTainter: true
    devDeviationThresholds: AsymmetricLow
    devActualUtilizationProfile: PrometheusCPUCombined

The DevKubeVirtRelieveAndMigrate profile requires PSI metrics to be enabled on all worker nodes. You can enable this by applying the following MachineConfig custom resource (CR):

Example MachineConfig CR
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-machineconfig-worker-psi-karg
spec:
  kernelArguments:
    - psi=1

You can use this profile with the SoftTopologyAndDuplicates profile to also rebalance pods based on soft topology constraints, which can be useful in hosted control plane environments.

LongLifecycle

This profile balances resource usage between nodes and enables the following strategies:

  • RemovePodsHavingTooManyRestarts: removes pods whose containers have been restarted too many times and pods where the sum of restarts over all containers (including Init Containers) is more than 100. Restarting the VM guest operating system does not increase this count.

  • LowNodeUtilization: evicts pods from overutilized nodes when there are any underutilized nodes. The destination node for the evicted pod will be determined by the scheduler.

    • A node is considered underutilized if its usage is below 20% for all thresholds (CPU, memory, and number of pods).

    • A node is considered overutilized if its usage is above 50% for any of the thresholds (CPU, memory, and number of pods).

Installing the descheduler

The descheduler is not available by default. To enable the descheduler, you must install the Kube Descheduler Operator from OperatorHub and enable one or more descheduler profiles.

By default, the descheduler runs in predictive mode, which means that it only simulates pod evictions. You must change the mode to automatic for the descheduler to perform the pod evictions.

If you have enabled hosted control planes in your cluster, set a custom priority threshold to lower the chance that pods in the hosted control plane namespaces are evicted. Set the priority threshold class name to hypershift-control-plane, because it has the lowest priority value (100000000) of the hosted control plane priority classes.

Prerequisites
  • You are logged in to OKD as a user with the cluster-admin role.

  • Access to the OKD web console.

  • Ensure that you have downloaded the pull secret from Red Hat OpenShift Cluster Manager as shown in Obtaining the installation program in the installation documentation for your platform.

    If you have the pull secret, add the redhat-operators catalog to the OperatorHub custom resource (CR) as shown in Configuring OKD to use Red Hat Operators.

Procedure
  1. Log in to the OKD web console.

  2. Create the required namespace for the Kube Descheduler Operator.

    1. Navigate to AdministrationNamespaces and click Create Namespace.

    2. Enter openshift-kube-descheduler-operator in the Name field, enter openshift.io/cluster-monitoring=true in the Labels field to enable descheduler metrics, and click Create.

  3. Install the Kube Descheduler Operator.

    1. Navigate to OperatorsOperatorHub.

    2. Type Kube Descheduler Operator into the filter box.

    3. Select the Kube Descheduler Operator and click Install.

    4. On the Install Operator page, select A specific namespace on the cluster. Select openshift-kube-descheduler-operator from the drop-down menu.

    5. Adjust the values for the Update Channel and Approval Strategy to the desired values.

    6. Click Install.

  4. Create a descheduler instance.

    1. From the OperatorsInstalled Operators page, click the Kube Descheduler Operator.

    2. Select the Kube Descheduler tab and click Create KubeDescheduler.

    3. Edit the settings as necessary.

      1. To evict pods instead of simulating the evictions, change the Mode field to Automatic.

      2. Expand the Profiles section and select LongLifecycle. The AffinityAndTaints profile is enabled by default.

        The only profile currently available for OKD Virtualization is LongLifecycle.

You can also configure the profiles and settings for the descheduler later using the OpenShift CLI (oc).

Enabling descheduler evictions on a virtual machine (VM)

After the descheduler is installed, you can enable descheduler evictions on your VM by adding an annotation to the VirtualMachine custom resource (CR).

Prerequisites
  • Install the descheduler in the OKD web console or OpenShift CLI (oc).

  • Ensure that the VM is not running.

Procedure
  1. Before starting the VM, add the descheduler.alpha.kubernetes.io/evict annotation to the VirtualMachine CR:

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    spec:
      template:
        metadata:
          annotations:
            descheduler.alpha.kubernetes.io/evict: "true"
  2. Configure the KubeDescheduler object with the LongLifecycle profile and enable background evictions for improved VM eviction stability during live migration:

    apiVersion: operator.openshift.io/v1
    kind: KubeDescheduler
    metadata:
      name: cluster
      namespace: openshift-kube-descheduler-operator
    spec:
      deschedulingIntervalSeconds: 3600
      profiles:
      - LongLifecycle (1)
      mode: Predictive (2)
      profileCustomizations:
        devEnableEvictionsInBackground: true (3)
    1 You can only set the LongLifecycle profile. This profile balances resource usage between nodes.
    2 By default, the descheduler does not evict pods. To evict pods, set mode to Automatic.
    3 Enabling devEnableEvictionsInBackground allows evictions to occur in the background, improving stability and mitigating oscillatory behavior during live migrations.

The descheduler is now enabled on the VM.

Additional resources