Managing Compliance Operator remediation - Compliance Operator | Security and compliance

Filters for compliance check results
Reviewing a remediation
Applying remediation when using customized machine config pools
Evaluating KubeletConfig rules against default configuration values
Scanning custom node pools
Remediating KubeletConfig sub pools
Applying a remediation
Remediating a platform check manually
Updating remediations
Unapplying a remediation
Removing a KubeletConfig remediation
Inconsistent ComplianceScan
Additional resources

Each ComplianceCheckResult represents a result of one compliance rule check. If the rule can be remediated automatically, a ComplianceRemediation object with the same name, owned by the ComplianceCheckResult is created. Unless requested, the remediations are not applied automatically, which gives an OKD administrator the opportunity to review what the remediation does and only apply a remediation once it has been verified.

Filters for compliance check results

By default, the ComplianceCheckResult objects are labeled with several useful labels that allow you to query the checks and decide on the next steps after the results are generated.

List checks that belong to a specific suite:

$ oc get -n openshift-compliance compliancecheckresults \
  -l compliance.openshift.io/suite=workers-compliancesuite

List checks that belong to a specific scan:

$ oc get -n openshift-compliance compliancecheckresults \
-l compliance.openshift.io/scan=workers-scan

Not all ComplianceCheckResult objects create ComplianceRemediation objects. Only ComplianceCheckResult objects that can be remediated automatically do. A ComplianceCheckResult object has a related remediation if it is labeled with the compliance.openshift.io/automated-remediation label. The name of the remediation is the same as the name of the check.

List all failing checks that can be remediated automatically:

$ oc get -n openshift-compliance compliancecheckresults \
-l 'compliance.openshift.io/check-status=FAIL,compliance.openshift.io/automated-remediation'

List all failing checks sorted by severity:

$ oc get compliancecheckresults -n openshift-compliance \
-l 'compliance.openshift.io/check-status=FAIL,compliance.openshift.io/check-severity=high'

Example output

NAME                                                           STATUS   SEVERITY
nist-moderate-modified-master-configure-crypto-policy          FAIL     high
nist-moderate-modified-master-coreos-pti-kernel-argument       FAIL     high
nist-moderate-modified-master-disable-ctrlaltdel-burstaction   FAIL     high
nist-moderate-modified-master-disable-ctrlaltdel-reboot        FAIL     high
nist-moderate-modified-master-enable-fips-mode                 FAIL     high
nist-moderate-modified-master-no-empty-passwords               FAIL     high
nist-moderate-modified-master-selinux-state                    FAIL     high
nist-moderate-modified-worker-configure-crypto-policy          FAIL     high
nist-moderate-modified-worker-coreos-pti-kernel-argument       FAIL     high
nist-moderate-modified-worker-disable-ctrlaltdel-burstaction   FAIL     high
nist-moderate-modified-worker-disable-ctrlaltdel-reboot        FAIL     high
nist-moderate-modified-worker-enable-fips-mode                 FAIL     high
nist-moderate-modified-worker-no-empty-passwords               FAIL     high
nist-moderate-modified-worker-selinux-state                    FAIL     high
ocp4-moderate-configure-network-policies-namespaces            FAIL     high
ocp4-moderate-fips-mode-enabled-on-all-nodes                   FAIL     high

List all failing checks that must be remediated manually:

$ oc get -n openshift-compliance compliancecheckresults \
-l 'compliance.openshift.io/check-status=FAIL,!compliance.openshift.io/automated-remediation'

The manual remediation steps are typically stored in the description attribute in the ComplianceCheckResult object.

Table 1. ComplianceCheckResult Status
ComplianceCheckResult Status	Description
PASS	Compliance check ran to completion and passed.
FAIL	Compliance check ran to completion and failed.
INFO	Compliance check ran to completion and found something not severe enough to be considered an error.
MANUAL	Compliance check does not have a way to automatically assess the success or failure and must be checked manually.
INCONSISTENT	Compliance check reports different results from different sources, typically cluster nodes.
ERROR	Compliance check ran, but could not complete properly.
NOT-APPLICABLE	Compliance check did not run because it is not applicable or not selected.

Reviewing a remediation

Review both the ComplianceRemediation object and the ComplianceCheckResult object that owns the remediation. The ComplianceCheckResult object contains human-readable descriptions of what the check does and the hardening trying to prevent, as well as other metadata like the severity and the associated security controls. The ComplianceRemediation object represents a way to fix the problem described in the ComplianceCheckResult. After first scan, check for remediations with the state MissingDependencies.

Below is an example of a check and a remediation called sysctl-net-ipv4-conf-all-accept-redirects. This example is redacted to only show spec and status and omits metadata:

spec:
  apply: false
  current:
  object:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
            - path: /etc/sysctl.d/75-sysctl_net_ipv4_conf_all_accept_redirects.conf
              mode: 0644
              contents:
                source: data:,net.ipv4.conf.all.accept_redirects%3D0
  outdated: {}
status:
  applicationState: NotApplied

The remediation payload is stored in the spec.current attribute. The payload can be any Kubernetes object, but because this remediation was produced by a node scan, the remediation payload in the above example is a MachineConfig object. For Platform scans, the remediation payload is often a different kind of an object (for example, a ConfigMap or Secret object), but typically applying that remediation is up to the administrator, because otherwise the Compliance Operator would have required a very broad set of permissions to manipulate any generic Kubernetes object. An example of remediating a Platform check is provided later in the text.

To see exactly what the remediation does when applied, the MachineConfig object contents use the Ignition objects for the configuration. See the Ignition specification for further information about the format. In our example, the spec.config.storage.files[0].path attribute specifies the file that is being create by this remediation (/etc/sysctl.d/75-sysctl_net_ipv4_conf_all_accept_redirects.conf) and the spec.config.storage.files[0].contents.source attribute specifies the contents of that file.

The contents of the files are URL-encoded.

Use the following Python script to view the contents:

$ echo "net.ipv4.conf.all.accept_redirects%3D0" | python3 -c "import sys, urllib.parse; print(urllib.parse.unquote(''.join(sys.stdin.readlines())))"

Example output

net.ipv4.conf.all.accept_redirects=0

Applying remediation when using customized machine config pools

When you create a custom MachineConfigPool, add a label to the MachineConfigPool so that machineConfigPoolSelector present in the KubeletConfig can match the label with MachineConfigPool.

Do not set protectKernelDefaults: false in the KubeletConfig file, because the MachineConfigPool object might fail to unpause unexpectedly after the Compliance Operator finishes applying remediation.

Procedure

List the nodes.

$ oc get nodes -n openshift-compliance

Example output

NAME                                       STATUS  ROLES  AGE    VERSION
ip-10-0-128-92.us-east-2.compute.internal  Ready   master 5h21m  v1.23.3+d99c04f
ip-10-0-158-32.us-east-2.compute.internal  Ready   worker 5h17m  v1.23.3+d99c04f
ip-10-0-166-81.us-east-2.compute.internal  Ready   worker 5h17m  v1.23.3+d99c04f
ip-10-0-171-170.us-east-2.compute.internal Ready   master 5h21m  v1.23.3+d99c04f
ip-10-0-197-35.us-east-2.compute.internal  Ready   master 5h22m  v1.23.3+d99c04f

Add a label to nodes.

$ oc -n openshift-compliance \
label node ip-10-0-166-81.us-east-2.compute.internal \
node-role.kubernetes.io/<machine_config_pool_name>=

Example output

node/ip-10-0-166-81.us-east-2.compute.internal labeled

Create custom MachineConfigPool CR.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: <machine_config_pool_name>
  labels:
    pools.operator.machineconfiguration.openshift.io/<machine_config_pool_name>: '' (1)
spec:
  machineConfigSelector:
  matchExpressions:
  - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,<machine_config_pool_name>]}
  nodeSelector:
  matchLabels:
    node-role.kubernetes.io/<machine_config_pool_name>: ""

1	The `labels` field defines label name to add for Machine config pool(MCP).

Verify MCP created successfully.
```
$ oc get mcp -w
```

Evaluating KubeletConfig rules against default configuration values

OKD infrastructure might contain incomplete configuration files at run time, and nodes assume default configuration values for missing configuration options. Some configuration options can be passed as command line arguments. As a result, the Compliance Operator cannot verify if the configuration file on the node is complete because it might be missing options used in the rule checks.

To prevent false negative results where the default configuration value passes a check, the Compliance Operator uses the Node/Proxy API to fetch the configuration for each node in a node pool, then all configuration options that are consistent across nodes in the node pool are stored in a file that represents the configuration for all nodes within that node pool. This increases the accuracy of the scan results.

No additional configuration changes are required to use this feature with default master and worker node pools configurations.

Scanning custom node pools

The Compliance Operator does not maintain a copy of each node pool configuration. The Compliance Operator aggregates consistent configuration options for all nodes within a single node pool into one copy of the configuration file. The Compliance Operator then uses the configuration file for a particular node pool to evaluate rules against nodes within that pool.

If your cluster uses custom node pools outside the default worker and master node pools, you must supply additional variables to ensure the Compliance Operator aggregates a configuration file for that node pool.

Procedure

To check the configuration against all pools in an example cluster containing master, worker, and custom example node pools, set the value of the ocp-var-role-master and opc-var-role-worker fields to example in the TailoredProfile object:

apiVersion: compliance.openshift.io/v1alpha1
kind: TailoredProfile
metadata:
  name: cis-example-tp
spec:
  extends: ocp4-cis
  title: My modified NIST profile to scan example nodes
  setValues:
  - name: ocp4-var-role-master
    value: example
    rationale: test for example nodes
  - name: ocp4-var-role-worker
    value: example
    rationale: test for example nodes
  description: cis-example-scan

Add the example role to the ScanSetting object that will be stored in the ScanSettingBinding CR:

apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: default
  namespace: openshift-compliance
rawResultStorage:
  rotation: 3
  size: 1Gi
roles:
- worker
- master
- example
scanTolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
schedule: '0 1 * * *'

Create a scan that uses the ScanSettingBinding CR:

apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: cis
  namespace: openshift-compliance
profiles:
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-cis
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-cis-node
- apiGroup: compliance.openshift.io/v1alpha1
  kind: TailoredProfile
  name: cis-example-tp
settingsRef:
  apiGroup: compliance.openshift.io/v1alpha1
  kind: ScanSetting
  name: default

The Compliance Operator checks the runtime KubeletConfig through the Node/Proxy API object and then uses variables such as ocp-var-role-master and ocp-var-role-worker to determine the nodes it performs the check against. In the ComplianceCheckResult, the KubeletConfig rules are shown as ocp4-cis-kubelet-*. The scan passes only if all selected nodes pass this check.

Verification

The Platform KubeletConfig rules are checked through the Node/Proxy object. You can find those rules by running the following command:

$ oc get rules -o json | jq '.items[] | select(.checkType == "Platform") | select(.metadata.name | contains("ocp4-kubelet-")) | .metadata.name'

Remediating `KubeletConfig` sub pools

KubeletConfig remediation labels can be applied to MachineConfigPool sub-pools.

Procedure

Add a label to the sub-pool MachineConfigPool CR:

$ oc label mcp <sub-pool-name> pools.operator.machineconfiguration.openshift.io/<sub-pool-name>=

Applying a remediation

The boolean attribute spec.apply controls whether the remediation should be applied by the Compliance Operator. You can apply the remediation by setting the attribute to true:

$ oc -n openshift-compliance \
patch complianceremediations/<scan-name>-sysctl-net-ipv4-conf-all-accept-redirects \
--patch '{"spec":{"apply":true}}' --type=merge

After the Compliance Operator processes the applied remediation, the status.ApplicationState attribute would change to Applied or to Error if incorrect. When a machine config remediation is applied, that remediation along with all other applied remediations are rendered into a MachineConfig object named 75-$scan-name-$suite-name. That MachineConfig object is subsequently rendered by the Machine Config Operator and finally applied to all the nodes in a machine config pool by an instance of the machine control daemon running on each node.

Note that when the Machine Config Operator applies a new MachineConfig object to nodes in a pool, all the nodes belonging to the pool are rebooted. This might be inconvenient when applying multiple remediations, each of which re-renders the composite 75-$scan-name-$suite-name MachineConfig object. To prevent applying the remediation immediately, you can pause the machine config pool by setting the .spec.paused attribute of a MachineConfigPool object to true.

The Compliance Operator can apply remediations automatically. Set autoApplyRemediations: true in the ScanSetting top-level object.

Applying remediations automatically should only be done with careful consideration.

Remediating a platform check manually

Checks for Platform scans typically have to be remediated manually by the administrator for two reasons:

It is not always possible to automatically determine the value that must be set. One of the checks requires that a list of allowed registries is provided, but the scanner has no way of knowing which registries the organization wants to allow.
Different checks modify different API objects, requiring automated remediation to possess root or superuser access to modify objects in the cluster, which is not advised.

Procedure

The example below uses the ocp4-ocp-allowed-registries-for-import rule, which would fail on a default OKD installation. Inspect the rule oc get rule.compliance/ocp4-ocp-allowed-registries-for-import -oyaml, the rule is to limit the registries the users are allowed to import images from by setting the allowedRegistriesForImport attribute, The warning attribute of the rule also shows the API object checked, so it can be modified and remediate the issue:

$ oc edit image.config.openshift.io/cluster

Example output

apiVersion: config.openshift.io/v1
kind: Image
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: "2020-09-10T10:12:54Z"
  generation: 2
  name: cluster
  resourceVersion: "363096"
  selfLink: /apis/config.openshift.io/v1/images/cluster
  uid: 2dcb614e-2f8a-4a23-ba9a-8e33cd0ff77e
spec:
  allowedRegistriesForImport:
  - domainName: registry.redhat.io
status:
  externalRegistryHostnames:
  - default-route-openshift-image-registry.apps.user-cluster-09-10-12-07.devcluster.openshift.com
  internalRegistryHostname: image-registry.openshift-image-registry.svc:5000

Re-run the scan:

$ oc -n openshift-compliance \
annotate compliancescans/rhcos4-e8-worker compliance.openshift.io/rescan=

Updating remediations

When a new version of compliance content is used, it might deliver a new and different version of a remediation than the previous version. The Compliance Operator will keep the old version of the remediation applied. The OKD administrator is also notified of the new version to review and apply. A ComplianceRemediation object that had been applied earlier, but was updated changes its status to Outdated. The outdated objects are labeled so that they can be searched for easily.

The previously applied remediation contents would then be stored in the spec.outdated attribute of a ComplianceRemediation object and the new updated contents would be stored in the spec.current attribute. After updating the content to a newer version, the administrator then needs to review the remediation. As long as the spec.outdated attribute exists, it would be used to render the resulting MachineConfig object. After the spec.outdated attribute is removed, the Compliance Operator re-renders the resulting MachineConfig object, which causes the Operator to push the configuration to the nodes.

Procedure

Search for any outdated remediations:
```
$ oc -n openshift-compliance get complianceremediations \
-l complianceoperator.openshift.io/outdated-remediation=
```
Example output
```
NAME                              STATE
workers-scan-no-empty-passwords   Outdated
```
The currently applied remediation is stored in the Outdated attribute and the new, unapplied remediation is stored in the Current attribute. If you are satisfied with the new version, remove the Outdated field. If you want to keep the updated content, remove the Current and Outdated attributes.

Apply the newer version of the remediation:

$ oc -n openshift-compliance patch complianceremediations workers-scan-no-empty-passwords \
--type json -p '[{"op":"remove", "path":/spec/outdated}]'

The remediation state will switch from Outdated to Applied:

$ oc get -n openshift-compliance complianceremediations workers-scan-no-empty-passwords

Example output

NAME                              STATE
workers-scan-no-empty-passwords   Applied

The nodes will apply the newer remediation version and reboot.

Unapplying a remediation

It might be required to unapply a remediation that was previously applied.

Procedure

Set the apply flag to false:

$ oc -n openshift-compliance \
patch complianceremediations/rhcos4-moderate-worker-sysctl-net-ipv4-conf-all-accept-redirects \
--patch '{"spec":{"apply":false}}' --type=merge

The remediation status will change to NotApplied and the composite MachineConfig object would be re-rendered to not include the remediation.

All affected nodes with the remediation will be rebooted.

Removing a KubeletConfig remediation

KubeletConfig remediations are included in node-level profiles. In order to remove a KubeletConfig remediation, you must manually remove it from the KubeletConfig objects. This example demonstrates how to remove the compliance check for the one-rule-tp-node-master-kubelet-eviction-thresholds-set-hard-imagefs-available remediation.

Procedure

Locate the scan-name and compliance check for the one-rule-tp-node-master-kubelet-eviction-thresholds-set-hard-imagefs-available remediation:

$ oc -n openshift-compliance get remediation \ one-rule-tp-node-master-kubelet-eviction-thresholds-set-hard-imagefs-available -o yaml

Example output

apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceRemediation
metadata:
  annotations:
    compliance.openshift.io/xccdf-value-used: var-kubelet-evictionhard-imagefs-available
  creationTimestamp: "2022-01-05T19:52:27Z"
  generation: 1
  labels:
    compliance.openshift.io/scan-name: one-rule-tp-node-master (1)
    compliance.openshift.io/suite: one-rule-ssb-node
  name: one-rule-tp-node-master-kubelet-eviction-thresholds-set-hard-imagefs-available
  namespace: openshift-compliance
  ownerReferences:
  - apiVersion: compliance.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ComplianceCheckResult
    name: one-rule-tp-node-master-kubelet-eviction-thresholds-set-hard-imagefs-available
    uid: fe8e1577-9060-4c59-95b2-3e2c51709adc
  resourceVersion: "84820"
  uid: 5339d21a-24d7-40cb-84d2-7a2ebb015355
spec:
  apply: true
  current:
    object:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      spec:
        kubeletConfig:
          evictionHard:
            imagefs.available: 10% (2)
  outdated: {}
  type: Configuration
status:
  applicationState: Applied

1	The scan name of the remediation.
2	The remediation that was added to the `KubeletConfig` objects.

If the remediation invokes an evictionHard kubelet configuration, you must specify all of the evictionHard parameters: memory.available, nodefs.available, nodefs.inodesFree, imagefs.available, and imagefs.inodesFree. If you do not specify all parameters, only the specified parameters are applied and the remediation will not function properly.

Remove the remediation:

Set apply to false for the remediation object:

$ oc -n openshift-compliance patch \
complianceremediations/one-rule-tp-node-master-kubelet-eviction-thresholds-set-hard-imagefs-available \
-p '{"spec":{"apply":false}}' --type=merge

Using the scan-name, find the KubeletConfig object that the remediation was applied to:

$ oc -n openshift-compliance get kubeletconfig \
--selector compliance.openshift.io/scan-name=one-rule-tp-node-master

Example output

NAME                                 AGE
compliance-operator-kubelet-master   2m34s

Manually remove the remediation, imagefs.available: 10%, from the KubeletConfig object:
```
$ oc edit -n openshift-compliance KubeletConfig compliance-operator-kubelet-master
```
All affected nodes with the remediation will be rebooted.

You must also exclude the rule from any scheduled scans in your tailored profiles that auto-applies the remediation, otherwise, the remediation will be re-applied during the next scheduled scan.

Inconsistent ComplianceScan

The ScanSetting object lists the node roles that the compliance scans generated from the ScanSetting or ScanSettingBinding objects would scan. Each node role usually maps to a machine config pool.

It is expected that all machines in a machine config pool are identical and all scan results from the nodes in a pool should be identical.

If some of the results are different from others, the Compliance Operator flags a ComplianceCheckResult object where some of the nodes will report as INCONSISTENT. All ComplianceCheckResult objects are also labeled with compliance.openshift.io/inconsistent-check.

Because the number of machines in a pool might be quite large, the Compliance Operator attempts to find the most common state and list the nodes that differ from the common state. The most common state is stored in the compliance.openshift.io/most-common-status annotation and the annotation compliance.openshift.io/inconsistent-source contains pairs of hostname:status of check statuses that differ from the most common status. If no common state can be found, all the hostname:status pairs are listed in the compliance.openshift.io/inconsistent-source annotation.

If possible, a remediation is still created so that the cluster can converge to a compliant status. However, this might not always be possible and correcting the difference between nodes must be done manually. The compliance scan must be re-run to get a consistent result by annotating the scan with the compliance.openshift.io/rescan= option:

$ oc -n openshift-compliance \
annotate compliancescans/rhcos4-e8-worker compliance.openshift.io/rescan=

Additional resources

Modifying nodes.

Managing Compliance Operator result and remediation

Filters for compliance check results

Reviewing a remediation

Applying remediation when using customized machine config pools

Evaluating KubeletConfig rules against default configuration values

Scanning custom node pools

Remediating KubeletConfig sub pools

Applying a remediation

Remediating a platform check manually

Updating remediations

Unapplying a remediation

Removing a KubeletConfig remediation

Inconsistent ComplianceScan

Additional resources

Remediating `KubeletConfig` sub pools