This is a cache of https://docs.okd.io/latest/hardware_enablement/kmm-release-notes.html. It is a snapshot of the page at 2025-07-08T19:19:43.998+0000.
Kernel Module Management Operator release notes | Specialized hardware and driver enablement | OKD 4
×

Release notes for Kernel Module Management Operator 2.2

New features

  • KMM is now using the CRI-O container engine to pull container images in the worker pod instead of using HTTP calls directly from the worker container. For more information, see Example Module CR.

  • The Kernel Module Management (KMM) Operator images are now based on rhel-els-minimal container images instead of the rhel-els images. This change results in a greatly reduced image footprint, while still maintaining FIPS compliance.

  • In this release, the firmware search path has been updated to copy the contents of the specified path into the path specified in worker.setFirmwareClassPath (default: /var/lib/firmware). For more information, see Example Module CR.

  • For each node running a kernel matching the regular expression, KMM now checks if you have included a tag or a digest. If you have not specified a tag or digest in the container image, then the validation webhook returns an error and does not apply the module. For more information, see Example Module CR.

Release notes for Kernel Module Management Operator 2.3

New features

  • In this release, KMM uses version 1.23 of the Golang programming language to ensure test continuity for partners.

Release notes for Kernel Module Management Operator 2.4

New features and enhancements

  • In this release, you now have the option to configure the Kernel Module Management (KMM) module to not load an out-of-tree kernel driver and use the in-tree driver instead, and run only the device plugin. For more information see Using in-tree modules with the device plugin.

  • In this release, KMM configurations are now persistent after cluster and KMM Operator upgrades and redeployments of KMM.

    In earlier releases, a cluster or KMM upgrade, or any other action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of such actions.

  • Improvements have been added to KMM so that GPU Operator vendors do not need to replicate KMM functionality in their code, but instead use KMM as is. This change greatly improves Operators' code size, tests, and reliability.

  • In this release, KMM no longer uses HTTP(S) direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP(S) requests and manually handle tasks such as reading /etc/containers/registries.conf for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining your own cache in Hub & Spoke.

  • The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the Red Hat Catalog.

  • You can now install KMM on compute nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the compute nodes do not have the node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master labels, the Kernel Module Management Operator might need further configurations. An internal code change has resolved this issue.

  • In this release, the heartbeat filter for the NMC reconciler has been updated to filter the following events on nodes:

    • node.spec

    • metadata.labels

    • status.nodeInfo

    • status.conditions[] (NodeReady only) and still filtering heartbeats

Notable technical changes

  • In this release, the preflight validation resource in the cluster has been modified. You can use the preflight validation to verify kernel modules to be installed on the nodes after cluster upgrades and possible kernel upgrades. Preflight validation also reports on the status and progress of each module in the cluster that it attempts or has attempted to validate. For more information, see Preflight validation for Kernel Module Management (KMM) Modules.

  • A requirement when creating a kmod image is that both the .ko kernel module files and the cp binary must be included, which is required for copying files during the image loading process. For more information, see Creating a kmod image.

  • The capabilities field that refers to the Operator maturity level has been changed from Basic Install to Seamless upgrades. Basic Install indicates that the Operator does not have an upgrade option. This is not the case for KMM, where seamless upgrades are supported.

Bug fixes

  • Webhook deployment has been renamed from webhook-server to webhook.

    • Cause: Generating files with controller-gen generated a service called webhook-service that is not configurable. And, when deploying KMM with Operator Lifecycle Manager (OLM), OLM deploys a service for the webhook called -service.

    • Consequence: Two services were generated for the same deployment. One generated by controller-gen and added to the bundle manifests and the other that the OLM created.

    • Fix: Make OLM find an already existing service called webhook-service in the cluster because the deployment is called webhook.

    • Result: A second service is no longer created.

  • Using imageRepoSecret object in conjunction with DTK as the image stream results in authorization required error.

    • Cause: On the Kernel Module Management (KMM) Operator, when you set imageRepoSecret object in the KMM module, and the build’s resulting container image is defined to be stored in the cluster’s internal registry, the build fails to push the final image and generate an authorization required error.

    • Consequence: The KMM Operator does not work as expected.

    • Fix: When the imageRepoSecret object is user-defined, it is used as both a pull and push secret by the build process. To support using the cluster’s internal registry, you must add the authorization token for that registry to the imageRepoSecret object. You can obtain the token from the "build" service account of the KMM module’s namespace.

    • Result: The KMM Operator works as expected.

  • Creating or deleting the image or creating an MCM module does not load the module on the spoke.

    • Cause: In a hub and spoke environment, when creating or deleting the image in registry, or when creating a ManagedClusterModule (MCM), the module on the spoke cluster is not loaded.

    • Consequence: The module on the spoke is not created.

    • Fix: Remove the cache package and image translation from the hub and spoke environment.

    • Result: The module on the spoke is created for the second time the MCM object is created.

  • KMM cannot pull images from the private registry while doing in-cluster builds.

    • Cause: The Kernel Module Management (KMM) Operator cannot pull images from private registry while doing in-cluster builds.

    • Consequence: Images in private registries that are used in the build process can not be pulled.

    • Fix: The imageRepoSecret object configuration is now also used in the build process. The imageRepoSecret object specified must include all registries that are being used.

    • Result: You can now use private registries when doing in-cluster builds.

  • KMM worker pod is orphaned when deleting a module with a container image that can not be pulled.

    • Cause: A Kernel Module Management (KMM) Operator worker pod is orphaned when deleting a module with a container image that can not be pulled.

    • Consequence: Failing worker pods are left on the cluster and at no point being collected for garbage.

    • Fix: KMM, now collects orphaned failing pods upon the modules deletion for garbage.

    • Result: The module is successfully deleted, and all associated orphaned failing pods are also deleted.

  • The KMM Operator tries to create a MIC even when the node selector does not match.

    • Cause: The Kernel Module Management (KMM) Operator tries to create a 'ModuleImagesConfig' (MIC) resource even when the node selector does not match with any actual nodes and fails.

    • Consequence: The KMM Operator reports an error when reconciling a module that does not target any node.

    • Fix: The Images field in the MIC resource is now optional.

    • Result: The KMM Operator can successfully create the MIC resource even when there are no images in it.

  • KMM does not reload the kernel module in case the node reboot sequence is too quick.

    • Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module in case the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.

    • Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.

    • Fix: Instead of relying on the condition state, NMC can rely on the Status.NodeInfo.BootID field. This field is set by kubelet based on the /proc/sys/kernel/random/boot_id file of the server node, so it is updated after each reboot.

    • Result: The more accurate timestamps enable the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.

  • Filtering out node heartbeats events for the Node Machine Configuration (NMC) controller.

    • Cause: The NMC controller gets spammed with events from node heartbeats. The node heartbeats let the Kubernetes API server know that the node is still connected and functional.

    • Consequence: The spamming causes a constant reconciliation even when there is no module, and therefore no NMC, are applied to the cluster.

    • Fix: The NMC controller now filter the node’s heartbeat from its reconciliation loop.

    • Result: The NMC controller only gets real events and filters out node heartbeats.

  • NMC status contains toleration values, even though there are no tolerations in the NMC.spec or in the module.

    • Cause: The Node Machine Configuration (NMC) status contains toleration values, even though there are no tolerations in the NMC.spec or in the module.

    • Consequence: Tolerations other than Kernel Module Management-specific tolerations can appear in the status.

    • Fix: The NMC status now gets its toleration from a dedicated annotation rather than from the worker pod.

    • Result: The NMC status only contains the module’s tolerations.

  • The KMM Operator version 2.4 fails to start properly and cannot list the \modulebuildsignconfigs\ resource.

    • Cause: On the Kernel Module Management (KMM) Operator, when the Operator is installed using Red Hat Konflux, it does not start properly because the log files contain errors.

    • Consequence: The KMM Operator does not work as expected.

    • Fix: The Cluster Service Version (CSV) file is updated to list the \modulebuildsignconfigs\ and the moduleimagesconfig resources .

    • Result: The KMM Operator works as expected.

  • The Red Hat Konflux build does not include version and git commit ID in the Operator logs.

    • Cause: On the Kernel Module Management (KMM) Operator, when the Operator was built using Communications Platform as a Service (CPaas), the build included the Operator version and git commit ID in the log files. However, with Red Hat Konflux these details are not included in the log files.

    • Consequence: Important information is missing from the log files.

    • Fix: Some modifications are introduced in Konflux to resolve this issue.

    • Result: The KMM Operator build now includes the Operator version and git commit ID in the log files.

  • The KMM Operator does not load the module after node with taint is rebooted.

    • Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module in case the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.

    • Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.

    • Fix: Instead of relying on the condition state, NMC can rely on the Status.NodeInfo.BootID field. This field is set by kubelet based on the /proc/sys/kernel/random/boot_id file of the server node, so it is updated after each reboot.

    • Result: The more accurate timestamps enable the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.

  • Redeploying a module that uses in-cluster builds fails with the ImagePullBackOff policy.

    • Cause: On the Kernel Module Management (KMM) Operator, the image pull policy for the puller pod and the worker pod is different.

    • Consequence: An image can be considered as existing while, in fact, it is not.

    • Fix: Make the image pull policy of the pull pod the same as the pull policy defined in the KMM module since its the same policy that is used by the worker pod.

    • Result: The MIC represents the state of the image in the same way the worker pod accesses it.

  • The MIC controller creates two pull-pods when it should create just one.

    • Cause: On the Kernel Module Management (KMM) Operator, the ModuleImagesConfig (MIC) controller may create multiple pull-pods for the same image.

    • Consequence: Resources are not used appropriately or as intended.

    • Fix: The CreateOrPatch MIC API receives a slice of ImageSpecs, as the input is created by going over the the target nodes and adding their images to the slice, so any duplicate ImageSpecs, are now filtered out.

    • Result: The KMM Operator works as expected.

  • The job.dcDelay example in the documentation should specify 0s instead of 0.

    • Cause: The Kernel Module Management (KMM) Operator default job.gcDelay duration field is 0s but the documentation mentions the value as 0.

    • Consequence: Entering a custom value of 60 instead of 60s or 1m might result in an error due to the wrong input type.

    • Fix: The job.gcDelay field in the documentation is updated to default value of 0s.

    • Result: Users are less likely to get confused.

  • The KMM Operator Hub environment does not work because of missing MIC and MBSC CRDs.

    • Cause: The Kernel Module Management (KMM) Operator hub environment only generates Custom Resource Definitions (CRD) files based on the api-hub/ directory. As a result, this does not contain some CRDs that are required for the KMM Operator Hub environment, such as, ModuleImagesConfig (MIC) resource and Managed Kubernetes Service (MBSC).

    • Consequence: The KMM Operator hub environment cannot work because it tries to start controllers reconciling CRDs that do not exist in the cluster.

    • Fix: The fix generates all CRD files into the config/crd-hub/bases directory, but only applies the resources to the cluster that it actually needs.

    • Result: The KMM Operator hub environment works as expected.

  • The KMM OperatorHub environment cannot build when finalisers are not set on a resource.

    • Cause: The Kernel Module Management (KMM) Operator displays an error with the ManagedClusterModule controller failing to build. This is due to the missing ModuleImagesConfig (MIC) resource finalizers and Role-based Action Control (RBAC) permissions for the KMM OperatorHub environment.

    • Consequence: The KMM OperatorHub environment cannot build images.

    • Fix: The RBAC permissions are updated to allow updating finalizers on the MIC resource, and then the appropriate rules created.

    • Result: The KMM OperatorHub environment builds images without errors with the ManagedClusterModule controller.

  • The PreflightValidationOCP custom resource, with a kernelVersion: tesdt causes the KMM Operator to panic.

    • Cause: Creating a PreflightValidationOCP custom resource (CR), with a kernelVersion flag that is set to tesdt, causes the Kernel Module Management (KMM) Operator to generate a panic runtime error.

    • Consequence: Entering invalid kernel versions causes the KMM Operator to panic.

    • Fix: A webhook - a method for one application to automatically send real-time data to another application when a specific event occurs - is now added to the PreflightValidationOCP CR.

    • Result: The PreflightValidationOCP CR with invalid kernel versions can no longer be applied to the cluster, therefore, preventing the Operator from generating a panic runtime error.

  • The PreFflightValidationOCP custom resource, with a kernelVersion flag that is different that the one of the cluster, does not work.

    • Cause: Creating a PreflightValidationOCP custom resource (CR), with a kernelVersion flag that is different from the one of the cluster, does not work.

    • Consequence: The Kernel Module Management (KMM) Operator is unable to find the Driver Toolkit (DTK) input image for the new kernel version.

    • Fix: You must use the PreflightValidationOCP CR and explicitly set the dtkImage field in the CR.

    • Result: Using the fields kernelVersion and dtkImage the feature can build installed modules for target OKD versions.

  • The KMM Operator version 2.4 documentation is updated with PreflightValidationOCP information.

    • Cause: Previously, when creating an PreflightValidationOCP CR, you were required to supply the release-image. This has now changed and you need to set the kernelVersion the dtkImage fields.

    • Consequence: The documentation was outdated and required an update.

    • Fix: The documentation is updated with the new support details.

    • Result: The KMM preflight feature is documented as expected.

Known issues

  • The ModuleUnloaded event does not appear when a module is Unloaded`.

    • Cause: When a module is Loaded (using the create a ModuleLoad event) or Unloaded ` (using the create a `ModuleUnloaded event) the events might not appear. This happens when you load and unload the kernel module in a quick succession.

    • Consequence: The ModuleLoad and the ModuleUnloaded events might not appear in OKD.

    • Fix: Introduce an alerting mechanism for this potential behavior and for awareness when working with modules.

    • Result: Not yet available.