KMM now uses the CRI-O container engine to pull container images in the worker pod instead of making HTTP calls directly from the worker container. For more information, see Example Module CR.
The Kernel Module Management (KMM) Operator images are now based on rhel-els-minimal
container images instead of the rhel-els
images. This change results in a greatly reduced image footprint, while still maintaining FIPS compliance.
In this release, the firmware search path has been updated to copy the contents of the specified path into the path specified in worker.setFirmwareClassPath (default: /var/lib/firmware). For more information, see Example Module CR.
For each node running a kernel matching the regular expression, KMM now checks if you have included a tag or a digest. If you have not specified a tag or digest in the container image, then the validation webhook returns an error and does not apply the module. For more information, see Example Module CR.
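The tag-or-digest check described above can be sketched as follows. This is a minimal illustration, not KMM's webhook code: the function name `has_tag_or_digest` is hypothetical, and it assumes the usual image reference layout in which a digest follows `@` and a tag follows a `:` in the last path component.

```python
def has_tag_or_digest(image: str) -> bool:
    """Return True if the image reference carries an explicit tag or digest.

    Hypothetical helper for illustration; KMM's real validation may differ.
    """
    # A digest is introduced by '@', for example repo/name@sha256:<hex>.
    if "@" in image:
        return True
    # A tag is a ':' in the last path component. A ':' in an earlier
    # component is a registry port, for example registry.example.com:5000.
    last_component = image.rsplit("/", 1)[-1]
    return ":" in last_component


if __name__ == "__main__":
    for ref in ("quay.io/org/kmod:5.14", "registry.example.com:5000/org/kmod"):
        print(ref, has_tag_or_digest(ref))
```

Under this sketch, a reference that has only a registry port and no tag is treated as untagged.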
In this release, KMM uses version 1.23 of the Golang programming language to ensure test continuity for partners.
You can now schedule KMM pods by defining taints and tolerations. For more information, see Using tolerations for kernel module scheduling.
In this release, you can configure the Kernel Module Management (KMM) module to skip loading an out-of-tree kernel driver, use the in-tree driver instead, and run only the device plugin. For more information, see Using in-tree modules with the device plugin.
In this release, KMM configurations are now persistent after cluster and KMM Operator upgrades and redeployments of KMM.
In earlier releases, a cluster or KMM upgrade, or any other action that redeploys KMM, such as changing a non-default configuration like the firmware path, could require you to reconfigure KMM. In this release, KMM configurations persist regardless of such actions.
For more information, see Configuring the Kernel Module Management Operator.
Improvements have been added to KMM so that GPU Operator vendors do not need to replicate KMM functionality in their code, but can instead use KMM as is. This change reduces the amount of code and tests that those Operators must maintain and improves their reliability.
In this release, KMM no longer uses direct HTTP(S) requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This removes the need to access container image registries directly through HTTP(S) requests and to manually handle tasks such as reading /etc/containers/registries.conf
for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining your own cache in hub and spoke environments.
The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the Red Hat Catalog.
You can now install KMM on compute nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the compute nodes do not have the node-role.kubernetes.io/control-plane
or node-role.kubernetes.io/master
labels, the Kernel Module Management Operator might need further configuration. An internal code change has resolved this issue.
In this release, the heartbeat filter for the NMC reconciler has been updated. The reconciler still filters out node heartbeats, but now reacts to changes in the following node fields:
node.spec
metadata.labels
status.nodeInfo
status.conditions[] (NodeReady only)
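The filter above can be sketched as a comparison between the old and new versions of a node object. This is a minimal sketch, not the reconciler's code: nodes are represented as plain dictionaries, the function names are hypothetical, and it assumes that only the `status` value of the `Ready` condition matters, so that a changed heartbeat timestamp alone does not trigger reconciliation.

```python
from typing import Any, Dict, Optional

Node = Dict[str, Any]


def _ready_status(node: Node) -> Optional[str]:
    """Return the status of the node's Ready condition, if present."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status")
    return None


def node_update_is_relevant(old: Node, new: Node) -> bool:
    """Return True if the update touches a field listed above.

    Hypothetical helper: heartbeat-only updates (for example a new
    lastHeartbeatTime on a condition) return False.
    """
    if old.get("spec") != new.get("spec"):
        return True
    if old.get("metadata", {}).get("labels") != new.get("metadata", {}).get("labels"):
        return True
    if old.get("status", {}).get("nodeInfo") != new.get("status", {}).get("nodeInfo"):
        return True
    # Compare only the Ready condition's status, ignoring its timestamps.
    return _ready_status(old) != _ready_status(new)
```

Comparing only the selected fields, rather than whole objects, is what keeps periodic heartbeat updates from being treated as changes in this sketch.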
In this release, the preflight validation resource in the cluster has been modified. You can use the preflight validation to verify kernel modules to be installed on the nodes after cluster upgrades and possible kernel upgrades. Preflight validation also reports on the status and progress of each module in the cluster that it attempts or has attempted to validate. For more information, see Preflight validation for Kernel Module Management (KMM) Modules.
When you create a kmod image, you must include both the .ko
kernel module files and the cp
binary. The cp binary is required for copying files during the image loading process.
For more information, see Creating a kmod image.
The capabilities
field that refers to the Operator maturity level has been changed from Basic Install
to Seamless upgrades
. Basic Install
indicates that the Operator does not have an upgrade option. This is not the case for KMM, where seamless upgrades are supported.
Webhook deployment has been renamed from webhook-server
to webhook
.
Cause: Generating files with controller-gen
generated a service called webhook-service
that is not configurable. In addition, when you deploy KMM with Operator Lifecycle Manager (OLM), OLM deploys a service for the webhook called -service
.
Consequence: Two services were generated for the same deployment: one generated by controller-gen
and added to the bundle manifests, and another created by OLM.
Fix: OLM now finds the existing service called webhook-service
in the cluster because the deployment is called webhook
.
Result: A second service is no longer created.
Using the imageRepoSecret
object in conjunction with DTK as the image stream results in an authorization required
error.
Cause: On the Kernel Module Management (KMM) Operator, when you set the imageRepoSecret
object in the KMM module and the build’s resulting container image is defined to be stored in the cluster’s internal registry, the build fails to push the final image and generates an authorization required
error.
Consequence: The KMM Operator does not work as expected.
Fix: When the imageRepoSecret
object is user-defined, it is used as both a pull and push secret by the build process. To support using the cluster’s internal registry, you must add the authorization token for that registry to the imageRepoSecret
object. You can obtain the token from the "build" service account of the KMM module’s namespace.
Result: The KMM Operator works as expected.
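As an illustration of a pull and push secret that covers more than one registry, the following sketch builds the JSON payload used by registry credential secrets, where each registry maps to a base64-encoded `user:password` pair under `auths`. The function name, registry host names, user names, and tokens are all placeholders, and the sketch does not show how to obtain the "build" service account token.

```python
import base64
import json
from typing import Dict, Tuple


def build_docker_config(credentials: Dict[str, Tuple[str, str]]) -> str:
    """Build a docker config JSON document for several registries.

    Hypothetical helper: `credentials` maps a registry host to a
    (user, token) pair supplied by the caller.
    """
    auths = {}
    for registry, (user, token) in credentials.items():
        # Each entry stores base64("user:token") under the "auth" key.
        encoded = base64.b64encode(f"{user}:{token}".encode()).decode()
        auths[registry] = {"auth": encoded}
    return json.dumps({"auths": auths}, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values only; replace with real registries and tokens.
    print(build_docker_config({
        "registry.example.com": ("example-user", "example-password"),
        "internal-registry.example:5000": ("unused", "example-token"),
    }))
```

The resulting document would then be stored in the secret that the imageRepoSecret object refers to.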
Creating or deleting the image or creating an MCM module does not load the module on the spoke.
Cause: In a hub and spoke environment, when creating or deleting the image in the registry, or when creating a ManagedClusterModule
(MCM), the module on the spoke cluster is not loaded.
Consequence: The module on the spoke is not created.
Fix: Remove the cache package and image translation from the hub and spoke environment.
Result: The module on the spoke is created the second time that the MCM object is created.
KMM cannot pull images from the private registry while doing in-cluster builds.
Cause: The Kernel Module Management (KMM) Operator cannot pull images from a private registry while doing in-cluster builds.
Consequence: Images in private registries that are used in the build process cannot be pulled.
Fix: The imageRepoSecret
object configuration is now also used in the build process. The imageRepoSecret
object specified must include all registries that are being used.
Result: You can now use private registries when doing in-cluster builds.
KMM worker pod is orphaned when deleting a module with a container image that cannot be pulled.
Cause: A Kernel Module Management (KMM) Operator worker pod is orphaned when deleting a module with a container image that cannot be pulled.
Consequence: Failing worker pods are left on the cluster and are never garbage collected.
Fix: KMM now garbage collects orphaned failing pods when the module is deleted.
Result: The module is successfully deleted, and all associated orphaned failing pods are also deleted.
The KMM Operator tries to create a MIC even when the node selector does not match.
Cause: The Kernel Module Management (KMM) Operator tries to create a ModuleImagesConfig (MIC) resource even when the node selector does not match any actual nodes, and the creation fails.
Consequence: The KMM Operator reports an error when reconciling a module that does not target any node.
Fix: The Images
field in the MIC resource is now optional.
Result: The KMM Operator can successfully create the MIC resource even when there are no images in it.
KMM does not reload the kernel module if the node reboot sequence is too quick.
Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module if the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.
Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.
Fix: Instead of relying on the condition state, NMC can rely on the Status.NodeInfo.BootID
field. This field is set by kubelet based on the /proc/sys/kernel/random/boot_id
file of the server node, so it is updated after each reboot.
Result: The more reliable reboot detection enables the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.
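The boot ID comparison described in this fix can be sketched as follows. This is an illustration under stated assumptions, not the NMC controller's code: the function names are hypothetical, and the sketch assumes that a previously recorded boot ID is available for comparison.

```python
from pathlib import Path

# File that the kernel regenerates on every boot.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def read_boot_id(path: Path = BOOT_ID_PATH) -> str:
    """Read the current boot ID of the local machine (Linux only)."""
    return path.read_text().strip()


def node_rebooted(recorded_boot_id: str, current_boot_id: str) -> bool:
    """Return True if the boot ID changed since it was last recorded.

    Hypothetical helper: unlike a timestamp comparison with a grace
    period, this detects a reboot no matter how quickly it completed.
    """
    return recorded_boot_id != current_boot_id
```

For example, `node_rebooted(saved_id, read_boot_id())` returns True after any reboot, because the value in the file differs between boots.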
Filtering out node heartbeats events for the Node Machine Configuration (NMC) controller.
Cause: The NMC controller gets spammed with events from node heartbeats. The node heartbeats let the Kubernetes API server know that the node is still connected and functional.
Consequence: The spamming causes constant reconciliation even when no module, and therefore no NMC, is applied to the cluster.
Fix: The NMC controller now filters the node’s heartbeat from its reconciliation loop.
Result: The NMC controller only gets real events and filters out node heartbeats.
NMC status contains toleration values, even though there are no tolerations in the NMC.spec
or in the module.
Cause: The Node Machine Configuration (NMC) status contains toleration values, even though there are no tolerations in the NMC.spec
or in the module.
Consequence: Tolerations other than Kernel Module Management-specific tolerations can appear in the status.
Fix: The NMC status now gets its toleration from a dedicated annotation rather than from the worker pod.
Result: The NMC status only contains the module’s tolerations.
The KMM Operator version 2.4 fails to start properly and cannot list the modulebuildsignconfigs
resource.
Cause: On the Kernel Module Management (KMM) Operator, when the Operator is installed using Red Hat Konflux, it does not start properly because the log files contain errors.
Consequence: The KMM Operator does not work as expected.
Fix: The Cluster Service Version (CSV) file is updated to list the modulebuildsignconfigs
and the moduleimagesconfig
resources.
Result: The KMM Operator works as expected.
The Red Hat Konflux build does not include version and git commit ID in the Operator logs.
Cause: On the Kernel Module Management (KMM) Operator, when the Operator was built using Communications Platform as a Service (CPaas), the build included the Operator version and git commit ID in the log files. However, with Red Hat Konflux these details are not included in the log files.
Consequence: Important information is missing from the log files.
Fix: Some modifications are introduced in Konflux to resolve this issue.
Result: The KMM Operator build now includes the Operator version and git commit ID in the log files.
The KMM Operator does not load the module after a node with a taint is rebooted.
Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module if the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.
Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.
Fix: Instead of relying on the condition state, NMC can rely on the Status.NodeInfo.BootID
field. This field is set by kubelet based on the /proc/sys/kernel/random/boot_id file of the server node, so it is updated after each reboot.
Result: The more reliable reboot detection enables the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.
Redeploying a module that uses in-cluster builds fails with the ImagePullBackOff
policy.
Cause: On the Kernel Module Management (KMM) Operator, the image pull policy for the puller pod and the worker pod is different.
Consequence: An image can be considered as existing while, in fact, it is not.
Fix: The image pull policy of the pull pod is now the same as the pull policy defined in the KMM module, because that is the policy that the worker pod uses.
Result: The MIC represents the state of the image in the same way the worker pod accesses it.
The MIC controller creates two pull-pods when it should create just one.
Cause: On the Kernel Module Management (KMM) Operator, the ModuleImagesConfig
(MIC) controller may create multiple pull-pods for the same image.
Consequence: Resources are not used appropriately or as intended.
Fix: The CreateOrPatch
MIC API receives a slice of ImageSpecs
as input. Because the input is built by iterating over the target nodes and adding their images to the slice, it can contain duplicates. Any duplicate ImageSpecs
are now filtered out.
Result: The KMM Operator works as expected.
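The duplicate filtering in this fix can be sketched as an order-preserving deduplication. This is an illustration only: the `ImageSpec` class and its `image` and `kernel_version` fields are hypothetical stand-ins, not the fields of the real MIC API type.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical stand-in for an image entry collected from a node."""
    image: str
    kernel_version: str


def dedupe_image_specs(specs: Iterable[ImageSpec]) -> List[ImageSpec]:
    """Drop duplicate specs while keeping first-seen order."""
    # Frozen dataclasses are hashable, and dict keys keep insertion order.
    return list(dict.fromkeys(specs))


if __name__ == "__main__":
    # Two nodes running the same kernel contribute the same spec twice.
    collected = [
        ImageSpec("registry.example.com/kmod:5.14", "5.14"),
        ImageSpec("registry.example.com/kmod:5.14", "5.14"),
        ImageSpec("registry.example.com/kmod:6.1", "6.1"),
    ]
    print(dedupe_image_specs(collected))
```

With the duplicates removed before the request is made, each distinct image would need only one pull pod.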
The job.gcDelay
example in the documentation should specify 0s
instead of 0
.
Cause: The Kernel Module Management (KMM) Operator default job.gcDelay
duration field is 0s
but the documentation mentions the value as 0
.
Consequence: Entering a custom value of 60
instead of 60s
or 1m
might result in an error due to the wrong input type.
Fix: The job.gcDelay
field in the documentation is updated to show the default value of 0s
.
Result: Users are less likely to get confused.
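To illustrate why 60 and 60s are not interchangeable, the following sketch parses duration strings that require a unit after every number. It is a simplified, hypothetical validator that supports only the ms, s, m, and h units; it is not the parser that KMM uses, and a real parser might treat a bare 0 differently.

```python
import re
from datetime import timedelta

# Seconds per supported unit. "ms" is listed first in the pattern so that
# it is tried before "m" and "s".
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(value: str) -> timedelta:
    """Parse strings such as '0s', '60s', '1m', or '1m30s'.

    Raises ValueError if any number is missing its unit.
    """
    if not value:
        raise ValueError("empty duration")
    pos = 0
    total = 0.0
    while pos < len(value):
        match = _TOKEN.match(value, pos)
        if match is None:
            raise ValueError(
                f"invalid duration {value!r}: expected a number followed by a unit"
            )
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return timedelta(seconds=total)
```

In this sketch, `parse_duration("60s")` and `parse_duration("1m")` return the same value, while `parse_duration("60")` raises `ValueError`.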
The KMM Operator Hub environment does not work because of missing MIC and MBSC CRDs.
Cause: The Kernel Module Management (KMM) Operator hub environment only generates Custom Resource Definitions (CRD) files based on the api-hub/
directory. As a result, some CRDs that are required for the KMM Operator Hub environment are missing, such as the ModuleImagesConfig
(MIC) and ModuleBuildSignConfig (MBSC) resources.
Consequence: The KMM Operator hub environment cannot work because it tries to start controllers reconciling CRDs that do not exist in the cluster.
Fix: All CRD files are now generated into the config/crd-hub/bases
directory, but only the resources that are actually needed are applied to the cluster.
Result: The KMM Operator hub environment works as expected.
The KMM OperatorHub environment cannot build when finalizers are not set on a resource.
Cause: The Kernel Module Management (KMM) Operator displays an error with the ManagedClusterModule
controller failing to build. This is due to the missing ModuleImagesConfig
(MIC) resource finalizers and role-based access control (RBAC) permissions for the KMM OperatorHub environment.
Consequence: The KMM OperatorHub environment cannot build images.
Fix: The RBAC permissions are updated to allow updating finalizers on the MIC resource, and the appropriate rules are created.
Result: The KMM OperatorHub environment builds images without errors with the ManagedClusterModule
controller.
The PreflightValidationOCP
custom resource, with a kernelVersion: tesdt
causes the KMM Operator to panic.
Cause: Creating a PreflightValidationOCP
custom resource (CR), with a kernelVersion
flag that is set to tesdt
, causes the Kernel Module Management (KMM) Operator to generate a panic runtime error.
Consequence: Entering invalid kernel versions causes the KMM Operator to panic.
Fix: A validation webhook, which rejects invalid resources before they are applied to the cluster, is now added to the PreflightValidationOCP
CR.
Result: The PreflightValidationOCP
CR with invalid kernel versions can no longer be applied to the cluster, which prevents the Operator from generating a panic runtime error.
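The kind of check that such a webhook could perform can be sketched as follows. The pattern is an assumption made for illustration: it accepts three dot-separated numbers with an optional release suffix, and the rule that KMM actually applies might differ. The function name is also hypothetical.

```python
import re

# Hypothetical pattern: "<major>.<minor>.<patch>" optionally followed by
# "-<release>", where the release may contain word characters and . + ~ -
_KERNEL_VERSION = re.compile(r"\d+\.\d+\.\d+(-[\w.+~-]+)?")


def is_plausible_kernel_version(value: str) -> bool:
    """Return True if the value looks like a kernel version string."""
    # fullmatch requires the whole string to match the pattern.
    return _KERNEL_VERSION.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("5.14.0-284.11.1.el9_2.x86_64", "tesdt"):
        print(candidate, is_plausible_kernel_version(candidate))
```

Rejecting a malformed value at admission time means that the Operator never has to parse it.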
The PreflightValidationOCP
custom resource, with a kernelVersion
flag that is different from that of the cluster, does not work.
Cause: Creating a PreflightValidationOCP
custom resource (CR), with a kernelVersion
flag that is different from the one of the cluster, does not work.
Consequence: The Kernel Module Management (KMM) Operator is unable to find the Driver Toolkit (DTK) input image for the new kernel version.
Fix: You must use the PreflightValidationOCP
CR and explicitly set the dtkImage
field in the CR.
Result: When you set the kernelVersion
and dtkImage
fields, the feature can build installed modules for target OKD versions.
The KMM Operator version 2.4 documentation is updated with PreflightValidationOCP
information.
Cause: Previously, when creating a PreflightValidationOCP
CR, you were required to supply the release-image. This has now changed and you need to set the kernelVersion
and the dtkImage
fields.
Consequence: The documentation was outdated and required an update.
Fix: The documentation is updated with the new support details.
Result: The KMM preflight feature is documented as expected.
The ModuleUnloaded
event does not appear when a module is Unloaded
.
Cause: When a module is Loaded
(which creates a ModuleLoad
event) or Unloaded
(which creates a ModuleUnloaded
event), the events might not appear. This happens when you load and unload the kernel module in quick succession.
Consequence: The ModuleLoad
and the ModuleUnloaded
events might not appear in OKD.
Fix: Introduce an alerting mechanism for this potential behavior and for awareness when working with modules.
Result: Not yet available.