This is a cache of https://docs.okd.io/4.15/edge_computing/ztp-advanced-policy-config.html. It is a snapshot of the page at 2024-11-26T21:19:13.022+0000.
Advanced managed cluster configuration with PolicyGenTemplate resources | Edge computing | OKD 4.15
×

Deploying additional changes to clusters

If you require cluster configuration changes outside of the base GitOps Zero Touch Provisioning (ZTP) pipeline configuration, there are three options:

Apply the additional configuration after the GitOps ZTP pipeline is complete

When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.

Add content to the GitOps ZTP library

The base source custom resources (CRs) that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.

Create extra manifests for the cluster installation

Extra manifests are applied during installation and make the installation process more efficient.

Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of OKD.

Using PolicyGenTemplate CRs to override source CRs content

PolicyGenTemplate custom resources (CRs) allow you to overlay additional configuration details on top of the base source CRs provided with the GitOps plugin in the ztp-site-generate container. You can think of PolicyGenTemplate CRs as a logical merge or patch to the base CR. Use PolicyGenTemplate CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.

The following example procedure describes how to update fields in the generated PerformanceProfile CR for the reference configuration based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file. Use the procedure as a basis for modifying other parts of the PolicyGenTemplate based on your requirements.

Prerequisites
  • Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.

Procedure
  1. Review the baseline source CR for existing content. You can review the source CRs listed in the reference PolicyGenTemplate CRs by extracting them from the GitOps Zero Touch Provisioning (ZTP) container.

    1. Create an /out folder:

      $ mkdir -p ./out
    2. Extract the source CRs:

      $ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.15.1 extract /home/ztp --tar | tar x -C ./out
  2. Review the baseline PerformanceProfile CR in ./out/source-crs/PerformanceProfile.yaml:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: $name
      annotations:
        ran.openshift.io/ztp-deploy-wave: "10"
    spec:
      additionalKernelArgs:
      - "idle=poll"
      - "rcupdate.rcu_normal_after_boot=0"
      cpu:
        isolated: $isolated
        reserved: $reserved
      hugepages:
        defaultHugepagesSize: $defaultHugepagesSize
        pages:
          - size: $size
            count: $count
            node: $node
      machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/$mcp: ""
      net:
        userLevelNetworking: true
      nodeSelector:
        node-role.kubernetes.io/$mcp: ''
      numa:
        topologyPolicy: "restricted"
      realTimeKernel:
        enabled: true

    Any fields in the source CR which contain $…​ are removed from the generated CR if they are not provided in the PolicyGenTemplate CR.

  3. Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file. The following example PolicyGenTemplate CR stanza supplies appropriate CPU specifications, sets the hugepages configuration, and adds a new field that sets globallyDisableIrqLoadBalancing to false.

    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      metadata:
        name: openshift-node-performance-profile
      spec:
        cpu:
          # These must be tailored for the specific hardware platform
          isolated: "2-19,22-39"
          reserved: "0-1,20-21"
        hugepages:
          defaultHugepagesSize: 1G
          pages:
            - size: 1G
              count: 10
        globallyDisableIrqLoadBalancing: false
  4. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP argo CD application.

Example output

The GitOps ZTP application generates an RHACM policy that contains the generated PerformanceProfile CR. The contents of that CR are derived by merging the metadata and spec contents from the PerformanceProfile entry in the PolicyGenTemplate onto the source CR. The resulting CR has the following content:

---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
    name: openshift-node-performance-profile
spec:
    additionalKernelArgs:
        - idle=poll
        - rcupdate.rcu_normal_after_boot=0
    cpu:
        isolated: 2-19,22-39
        reserved: 0-1,20-21
    globallyDisableIrqLoadBalancing: false
    hugepages:
        defaultHugepagesSize: 1G
        pages:
            - count: 10
              size: 1G
    machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/master: ""
    net:
        userLevelNetworking: true
    nodeSelector:
        node-role.kubernetes.io/master: ""
    numa:
        topologyPolicy: restricted
    realTimeKernel:
        enabled: true

In the /source-crs folder that you extract from the ztp-site-generate container, the $ syntax is not used for template substitution as implied by the syntax. Rather, if the policyGen tool sees the $ prefix for a string and you do not specify a value for that field in the related PolicyGenTemplate CR, the field is omitted from the output CR entirely.

An exception to this is the $mcp variable in /source-crs YAML files that is substituted with the specified value for mcp from the PolicyGenTemplate CR. For example, in example/policygentemplates/group-du-standard-ranGen.yaml, the value for mcp is worker:

spec:
  bindingRules:
    group-du-standard: ""
  mcp: "worker"

The policyGen tool replace instances of $mcp with worker in the output CRs.

Adding custom content to the GitOps ZTP pipeline

Perform the following procedure to add new content to the GitOps ZTP pipeline.

Procedure
  1. Create a subdirectory named source-crs in the directory that contains the kustomization.yaml file for the PolicyGenTemplate custom resource (CR).

  2. Add your user-provided CRs to the source-crs subdirectory, as shown in the following example:

    example
    └── policygentemplates
        ├── dev.yaml
        ├── kustomization.yaml
        ├── mec-edge-sno1.yaml
        ├── sno.yaml
        └── source-crs (1)
            ├── PaoCatalogSource.yaml
            ├── PaoSubscription.yaml
            ├── custom-crs
            |   ├── apiserver-config.yaml
            |   └── disable-nic-lldp.yaml
            └── elasticsearch
                ├── ElasticsearchNS.yaml
                └── ElasticsearchOperatorGroup.yaml
    1 The source-crs subdirectory must be in the same directory as the kustomization.yaml file.
  3. Update the required PolicyGenTemplate CRs to include references to the content you added in the source-crs/custom-crs and source-crs/elasticsearch directories. For example:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: "group-dev"
      namespace: "ztp-clusters"
    spec:
      bindingRules:
        dev: "true"
      mcp: "master"
      sourceFiles:
        # These policies/CRs come from the internal container Image
        #Cluster Logging
        - fileName: ClusterLogNS.yaml
          remediationAction: inform
          policyName: "group-dev-cluster-log-ns"
        - fileName: ClusterLogOperGroup.yaml
          remediationAction: inform
          policyName: "group-dev-cluster-log-operator-group"
        - fileName: ClusterLogSubscription.yaml
          remediationAction: inform
          policyName: "group-dev-cluster-log-sub"
        #Local Storage Operator
        - fileName: StorageNS.yaml
          remediationAction: inform
          policyName: "group-dev-lso-ns"
        - fileName: StorageOperGroup.yaml
          remediationAction: inform
          policyName: "group-dev-lso-operator-group"
        - fileName: StorageSubscription.yaml
          remediationAction: inform
          policyName: "group-dev-lso-sub"
        #These are custom local polices that come from the source-crs directory in the git repo
        # Performance Addon Operator
        - fileName: PaoSubscriptionNS.yaml
          remediationAction: inform
          policyName: "group-dev-pao-ns"
        - fileName: PaoSubscriptionCatalogSource.yaml
          remediationAction: inform
          policyName: "group-dev-pao-cat-source"
          spec:
            image: <image_URL_here>
        - fileName: PaoSubscription.yaml
          remediationAction: inform
          policyName: "group-dev-pao-sub"
        #Elasticsearch Operator
        - fileName: elasticsearch/ElasticsearchNS.yaml (1)
          remediationAction: inform
          policyName: "group-dev-elasticsearch-ns"
        - fileName: elasticsearch/ElasticsearchOperatorGroup.yaml
          remediationAction: inform
          policyName: "group-dev-elasticsearch-operator-group"
        #Custom Resources
        - fileName: custom-crs/apiserver-config.yaml (1)
          remediationAction: inform
          policyName: "group-dev-apiserver-config"
        - fileName: custom-crs/disable-nic-lldp.yaml
          remediationAction: inform
          policyName: "group-dev-disable-nic-lldp"
    1 Set fileName to include the relative path to the file from the /source-crs parent directory.
  4. Commit the PolicyGenTemplate change in Git, and then push to the Git repository that is monitored by the GitOps ZTP Argo CD policies application.

  5. Update the ClusterGroupUpgrade CR to include the changed PolicyGenTemplate and save it as cgu-test.yaml. The following example shows a generated cgu-test.yaml file.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: custom-source-cr
      namespace: ztp-clusters
    spec:
      managedPolicies:
        - group-dev-config-policy
      enable: true
      clusters:
      - cluster1
      remediationStrategy:
        maxConcurrency: 2
        timeout: 240
  6. Apply the updated ClusterGroupUpgrade CR by running the following command:

    $ oc apply -f cgu-test.yaml
Verification
  • Check that the updates have succeeded by running the following command:

    $ oc get cgu -A
    Example output
    NAMESPACE     NAME               AGE   STATE        DETAILS
    ztp-clusters  custom-source-cr   6s    InProgress   Remediating non-compliant policies
    ztp-install   cluster1           19h   Completed    All clusters are compliant with all the managed policies

Configuring policy compliance evaluation timeouts for PolicyGenTemplate CRs

Use Red Hat Advanced Cluster Management (RHACM) installed on a hub cluster to monitor and report on whether your managed clusters are compliant with applied policies. RHACM uses policy templates to apply predefined policy controllers and policies. Policy controllers are Kubernetes custom resource definition (CRD) instances.

You can override the default policy evaluation intervals with PolicyGenTemplate custom resources (CRs). You configure duration settings that define how long a ConfigurationPolicy CR can be in a state of policy compliance or non-compliance before RHACM re-evaluates the applied cluster policies.

The GitOps Zero Touch Provisioning (ZTP) policy generator generates ConfigurationPolicy CR policies with pre-defined policy evaluation intervals. The default value for the noncompliant state is 10 seconds. The default value for the compliant state is 10 minutes. To disable the evaluation interval, set the value to never.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in to the hub cluster as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data.

Procedure
  1. To configure the evaluation interval for all policies in a PolicyGenTemplate CR, add evaluationInterval to the spec field, and then set the appropriate compliant and noncompliant values. For example:

    spec:
      evaluationInterval:
        compliant: 30m
        noncompliant: 20s
  2. To configure the evaluation interval for the spec.sourceFiles object in a PolicyGenTemplate CR, add evaluationInterval to the sourceFiles field, for example:

    spec:
      sourceFiles:
       - fileName: SriovSubscription.yaml
         policyName: "sriov-sub-policy"
         evaluationInterval:
           compliant: never
           noncompliant: 10s
  3. Commit the PolicyGenTemplate CRs files in the Git repository and push your changes.

Verification

Check that the managed spoke cluster policies are monitored at the expected intervals.

  1. Log in as a user with cluster-admin privileges on the managed cluster.

  2. Get the pods that are running in the open-cluster-management-agent-addon namespace. Run the following command:

    $ oc get pods -n open-cluster-management-agent-addon
    Example output
    NAME                                         READY   STATUS    RESTARTS        AGE
    config-policy-controller-858b894c68-v4xdb    1/1     Running   22 (5d8h ago)   10d
  3. Check the applied policies are being evaluated at the expected interval in the logs for the config-policy-controller pod:

    $ oc logs -n open-cluster-management-agent-addon config-policy-controller-858b894c68-v4xdb
    Example output
    2022-05-10T15:10:25.280Z       info   configuration-policy-controller controllers/configurationpolicy_controller.go:166      Skipping the policy evaluation due to the policy not reaching the evaluation interval  {"policy": "compute-1-config-policy-config"}
    2022-05-10T15:10:25.280Z       info   configuration-policy-controller controllers/configurationpolicy_controller.go:166      Skipping the policy evaluation due to the policy not reaching the evaluation interval  {"policy": "compute-1-common-compute-1-catalog-policy-config"}

Signalling GitOps ZTP cluster deployment completion with validator inform policies

Create a validator inform policy that signals when the GitOps Zero Touch Provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy can be used for deployments of single-node OpenShift clusters, three-node clusters, and standard clusters.

Procedure
  1. Create a standalone PolicyGenTemplate custom resource (CR) that contains the source file validatorCRs/informDuValidator.yaml. You only need one standalone PolicyGenTemplate CR for each cluster type. For example, this CR applies a validator inform policy for single-node OpenShift clusters:

    Example single-node cluster validator inform policy CR (group-du-sno-validator-ranGen.yaml)
    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: "group-du-sno-validator" (1)
      namespace: "ztp-group" (2)
    spec:
      bindingRules:
        group-du-sno: "" (3)
      bindingExcludedRules:
        ztp-done: "" (4)
      mcp: "master" (5)
      sourceFiles:
        - fileName: validatorCRs/informDuValidator.yaml
          remediationAction: inform (6)
          policyName: "du-policy" (7)
    1 The name of PolicyGenTemplates object. This name is also used as part of the names for the placementBinding, placementRule, and policy that are created in the requested namespace.
    2 This value should match the namespace used in the group PolicyGenTemplates.
    3 The group-du-* label defined in bindingRules must exist in the SiteConfig files.
    4 The label defined in bindingExcludedRules must be`ztp-done:`. The ztp-done label is used in coordination with the Topology Aware Lifecycle Manager.
    5 mcp defines the MachineConfigPool object that is used in the source file validatorCRs/informDuValidator.yaml. It should be master for single node and three-node cluster deployments and worker for standard cluster deployments.
    6 Optional. The default value is inform.
    7 This value is used as part of the name for the generated RHACM policy. The generated validator policy for the single node example is group-du-sno-validator-du-policy.
  2. Commit the PolicyGenTemplate CR file in your Git repository and push the changes.

Additional resources

Configuring power states using PolicyGenTemplates CRs

For low latency and high-performance edge deployments, it is necessary to disable or limit C-states and P-states. With this configuration, the CPU runs at a constant frequency, which is typically the maximum turbo frequency. This ensures that the CPU is always running at its maximum speed, which results in high performance and low latency. This leads to the best latency for workloads. However, this also leads to the highest power consumption, which might not be necessary for all workloads.

Workloads can be classified as critical or non-critical, with critical workloads requiring disabled C-state and P-state settings for high performance and low latency, while non-critical workloads use C-state and P-state settings for power savings at the expense of some latency and performance. You can configure the following three power states using GitOps Zero Touch Provisioning (ZTP):

  • High-performance mode provides ultra low latency at the highest power consumption.

  • Performance mode provides low latency at a relatively high power consumption.

  • Power saving balances reduced power consumption with increased latency.

The default configuration is for a low latency, performance mode.

PolicyGenTemplate custom resources (CRs) allow you to overlay additional configuration details onto the base source CRs provided with the GitOps plugin in the ztp-site-generate container.

Configure the power states by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml.

The following common prerequisites apply to configuring all three power states.

Prerequisites
  • You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.

  • You have followed the procedure described in "Preparing the GitOps ZTP site configuration repository".

Configuring performance mode using PolicyGenTemplate CRs

Follow this example to set performance mode by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml.

Performance mode provides low latency at a relatively high power consumption.

Prerequisites
  • You have configured the BIOS with performance related settings by following the guidance in "Configuring host firmware for low latency and high performance".

Procedure
  1. Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates as follows to set performance mode.

    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      metadata:
        [...]
      spec:
        [...]
        workloadHints:
             realTime: true
             highPowerConsumption: false
             perPodPowerManagement: false
  2. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Configuring high-performance mode using PolicyGenTemplate CRs

Follow this example to set high performance mode by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml.

High performance mode provides ultra low latency at the highest power consumption.

Prerequisites
  • You have configured the BIOS with performance related settings by following the guidance in "Configuring host firmware for low latency and high performance".

Procedure
  1. Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates as follows to set high-performance mode.

    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      metadata:
        [...]
      spec:
        [...]
        workloadHints:
             realTime: true
             highPowerConsumption: true
             perPodPowerManagement: false
  2. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Configuring power saving mode using PolicyGenTemplate CRs

Follow this example to set power saving mode by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml.

The power saving mode balances reduced power consumption with increased latency.

Prerequisites
  • You enabled C-states and OS-controlled P-states in the BIOS.

Procedure
  1. Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates as follows to configure power saving mode. It is recommended to configure the CPU governor for the power saving mode through the additional kernel arguments object.

    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      metadata:
        [...]
      spec:
        [...]
        workloadHints:
             realTime: true
             highPowerConsumption: false
             perPodPowerManagement: true
        [...]
        additionalKernelArgs:
           - [...]
           - "cpufreq.default_governor=schedutil" (1)
    1 The schedutil governor is recommended, however, other governors that can be used include ondemand and powersave.
  2. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Verification
  1. Select a worker node in your deployed cluster from the list of nodes identified by using the following command:

    $ oc get nodes
  2. Log in to the node by using the following command:

    $ oc debug node/<node-name>

    Replace <node-name> with the name of the node you want to verify the power state on.

  3. Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths as shown in the following example:

    # chroot /host
  4. Run the following command to verify the applied power state:

    # cat /proc/cmdline
Expected output
  • For power saving mode the intel_pstate=passive.

Maximizing power savings

Limiting the maximum CPU frequency is recommended to achieve maximum power savings. Enabling C-states on the non-critical workload CPUs without restricting the maximum CPU frequency negates much of the power savings by boosting the frequency of the critical CPUs.

Maximize power savings by updating the sysfs plugin fields, setting an appropriate value for max_perf_pct in the TunedPerformancePatch CR for the reference configuration. This example based on the group-du-sno-ranGen.yaml describes the procedure to follow to restrict the maximum CPU frequency.

Prerequisites
  • You have configured power savings mode as described in "Using PolicyGenTemplate CRs to configure power savings mode".

Procedure
  1. Update the PolicyGenTemplate entry for TunedPerformancePatch in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates. To maximize power savings, add max_perf_pct as shown in the following example:

    - fileName: TunedPerformancePatch.yaml
          policyName: "config-policy"
          spec:
            profile:
              - name: performance-patch
                data: |
                  [...]
                  [sysfs]
                  /sys/devices/system/cpu/intel_pstate/max_perf_pct=<x> (1)
    1 The max_perf_pct controls the maximum frequency the cpufreq driver is allowed to set as a percentage of the maximum supported CPU frequency. This value applies to all CPUs. You can check the maximum supported frequency in /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq. As a starting point, you can use a percentage that caps all CPUs at the All Cores Turbo frequency. The All Cores Turbo frequency is the frequency that all cores will run at when the cores are all fully occupied.

    To maximize power savings, set a lower value. Setting a lower value for max_perf_pct limits the maximum CPU frequency, thereby reducing power consumption, but also potentially impacting performance. Experiment with different values and monitor the system’s performance and power consumption to find the optimal setting for your use-case.

  2. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Configuring LVM Storage using PolicyGenTemplate CRs

You can configure Logical Volume Manager (LVM) Storage for managed clusters that you deploy with GitOps Zero Touch Provisioning (ZTP).

You use LVM Storage to persist event subscriptions when you use PTP events or bare-metal hardware events with HTTP transport.

Use the Local Storage Operator for persistent storage that uses local volumes in distributed units.

Prerequisites
  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Create a Git repository where you manage your custom site configuration data.

Procedure
  1. To configure LVM Storage for new managed clusters, add the following YAML to spec.sourceFiles in the common-ranGen.yaml file:

    - fileName: StorageLVMOSubscriptionNS.yaml
      policyName: subscription-policies
    - fileName: StorageLVMOSubscriptionOperGroup.yaml
      policyName: subscription-policies
    - fileName: StorageLVMOSubscription.yaml
      spec:
        name: lvms-operator
        channel: stable-4.15
      policyName: subscription-policies

    The Storage LVMO subscription is deprecated. In future releases of OKD, the storage LVMO subscription will not be available. Instead, you must use the Storage LVMS subscription.

    In OKD 4.15, you can use the Storage LVMS subscription instead of the LVMO subscription. The LVMS subscription does not require manual overrides in the common-ranGen.yaml file. Add the following YAML to spec.sourceFiles in the common-ranGen.yaml file to use the Storage LVMS subscription:

    - fileName: StorageLVMSubscriptionNS.yaml
      policyName: subscription-policies
    - fileName: StorageLVMSubscriptionOperGroup.yaml
      policyName: subscription-policies
    - fileName: StorageLVMSubscription.yaml
      policyName: subscription-policies
  2. Add the LVMCluster CR to spec.sourceFiles in your specific group or individual site configuration file. For example, in the group-du-sno-ranGen.yaml file, add the following:

    - fileName: StorageLVMCluster.yaml
      policyName: "lvms-config" (1)
      spec:
        storage:
          deviceClasses:
          - name: vg1
            thinPoolConfig:
              name: thin-pool-1
              sizePercent: 90
              overprovisionRatio: 10
    1 This example configuration creates a volume group (vg1) with all the available devices, except the disk where OKD is installed. A thin-pool logical volume is also created.
  3. Merge any other required changes and files with your custom site repository.

  4. Commit the PolicyGenTemplate changes in Git, and then push the changes to your site configuration repository to deploy LVM Storage to new sites using GitOps ZTP.

Configuring PTP events with PolicyGenTemplate CRs

You can use the GitOps ZTP pipeline to configure PTP events that use HTTP or AMQP transport.

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.

Configuring PTP events that use HTTP transport

You can configure PTP events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data.

Procedure
  1. Apply the following PolicyGenTemplate changes to group-du-3node-ranGen.yaml, group-du-sno-ranGen.yaml, or group-du-standard-ranGen.yaml files according to your requirements:

    1. In .sourceFiles, add the PtpOperatorConfig CR file that configures the transport host:

      - fileName: PtpOperatorConfigForEvent.yaml
        policyName: "config-policy"
        spec:
          daemonNodeSelector: {}
          ptpEventConfig:
            enableEventPublisher: true
            transportHost: http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043

      In OKD 4.13 or later, you do not need to set the transportHost field in the PtpOperatorConfig resource when you use HTTP transport with PTP events.

    2. Configure the linuxptp and phc2sys for the PTP clock type and interface. For example, add the following stanza into .sourceFiles:

      - fileName: PtpConfigSlave.yaml (1)
        policyName: "config-policy"
        metadata:
          name: "du-ptp-slave"
        spec:
          profile:
          - name: "slave"
            interface: "ens5f1" (2)
            ptp4lOpts: "-2 -s --summary_interval -4" (3)
            phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" (4)
          ptpClockThreshold: (5)
            holdOverTimeout: 30 #secs
            maxOffsetThreshold: 100  #nano secs
            minOffsetThreshold: -100 #nano secs
      1 Can be one of PtpConfigMaster.yaml, PtpConfigSlave.yaml, or PtpConfigSlaveCvl.yaml depending on your requirements. PtpConfigSlaveCvl.yaml configures linuxptp services for an Intel E810 Columbiaville NIC. For configurations based on group-du-sno-ranGen.yaml or group-du-3node-ranGen.yaml, use PtpConfigSlave.yaml.
      2 Device specific interface name.
      3 You must append the --summary_interval -4 value to ptp4lOpts in .spec.sourceFiles.spec.profile to enable PTP fast events.
      4 Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
      5 Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. The stanza shows default ptpClockThreshold values. The ptpClockThreshold values configure how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.
  2. Merge any other required changes and files with your custom site repository.

  3. Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.

Configuring PTP events that use AMQP transport

You can configure PTP events that use AMQP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data.

Procedure
  1. Add the following YAML into .spec.sourceFiles in the common-ranGen.yaml file to configure the AMQP Operator:

    #AMQ interconnect operator for fast events
    - fileName: AmqSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscription.yaml
      policyName: "subscriptions-policy"
  2. Apply the following PolicyGenTemplate changes to group-du-3node-ranGen.yaml, group-du-sno-ranGen.yaml, or group-du-standard-ranGen.yaml files according to your requirements:

    1. In .sourceFiles, add the PtpOperatorConfig CR file that configures the AMQ transport host to the config-policy:

      - fileName: PtpOperatorConfigForEvent.yaml
        policyName: "config-policy"
        spec:
          daemonNodeSelector: {}
          ptpEventConfig:
            enableEventPublisher: true
            transportHost: "amqp://amq-router.amq-router.svc.cluster.local"
    2. Configure the linuxptp and phc2sys for the PTP clock type and interface. For example, add the following stanza into .sourceFiles:

      - fileName: PtpConfigSlave.yaml (1)
        policyName: "config-policy"
        metadata:
          name: "du-ptp-slave"
        spec:
          profile:
          - name: "slave"
            interface: "ens5f1" (2)
            ptp4lOpts: "-2 -s --summary_interval -4" (3)
            phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" (4)
          ptpClockThreshold: (5)
            holdOverTimeout: 30 #secs
            maxOffsetThreshold: 100  #nano secs
            minOffsetThreshold: -100 #nano secs
      1 Can be one PtpConfigMaster.yaml, PtpConfigSlave.yaml, or PtpConfigSlaveCvl.yaml depending on your requirements. PtpConfigSlaveCvl.yaml configures linuxptp services for an Intel E810 Columbiaville NIC. For configurations based on group-du-sno-ranGen.yaml or group-du-3node-ranGen.yaml, use PtpConfigSlave.yaml.
      2 Device specific interface name.
      3 You must append the --summary_interval -4 value to ptp4lOpts in .spec.sourceFiles.spec.profile to enable PTP fast events.
      4 Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
      5 Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. The stanza shows default ptpClockThreshold values. The ptpClockThreshold values configure how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.
  3. Apply the following PolicyGenTemplate changes to your specific site YAML files, for example, example-sno-site.yaml:

    1. In .sourceFiles, add the Interconnect CR file that configures the AMQ router to the config-policy:

      - fileName: AmqInstance.yaml
        policyName: "config-policy"
  4. Merge any other required changes and files with your custom site repository.

  5. Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.

Additional resources

Configuring bare-metal events with PolicyGenTemplate CRs

You can use the GitOps ZTP pipeline to configure bare-metal events that use HTTP or AMQP transport.

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.

Configuring bare-metal events that use HTTP transport

You can configure bare-metal events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data.

Procedure
  1. Configure the Bare Metal Event Relay Operator by adding the following YAML to spec.sourceFiles in the common-ranGen.yaml file:

    # Bare Metal Event Relay operator
    - fileName: BareMetalEventRelaySubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: BareMetalEventRelaySubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: BareMetalEventRelaySubscription.yaml
      policyName: "subscriptions-policy"
  2. Add the HardwareEvent CR to spec.sourceFiles in your specific group configuration file, for example, in the group-du-sno-ranGen.yaml file:

    - fileName: HardwareEvent.yaml (1)
      policyName: "config-policy"
      spec:
        nodeSelector: {}
        transportHost: "http://hw-event-publisher-service.openshift-bare-metal-events.svc.cluster.local:9043"
        logLevel: "info"
    1 Each baseboard management controller (BMC) requires a single HardwareEvent CR only.

    In OKD 4.13 or later, you do not need to set the transportHost field in the HardwareEvent custom resource (CR) when you use HTTP transport with bare-metal events.

  3. Merge any other required changes and files with your custom site repository.

  4. Push the changes to your site configuration repository to deploy bare-metal events to new sites with GitOps ZTP.

  5. Create the Redfish Secret by running the following command:

    $ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \
    --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \
    --from-literal=hostaddr="<bmc_host_ip_addr>"

Configuring bare-metal events that use AMQP transport

You can configure bare-metal events that use AMQP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data.

Procedure
  1. To configure the AMQ Interconnect Operator and the Bare Metal Event Relay Operator, add the following YAML to spec.sourceFiles in the common-ranGen.yaml file:

    # AMQ interconnect operator for fast events
    - fileName: AmqSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscription.yaml
      policyName: "subscriptions-policy"
    # Bare Metal Event Rely operator
    - fileName: BareMetalEventRelaySubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: BareMetalEventRelaySubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: BareMetalEventRelaySubscription.yaml
      policyName: "subscriptions-policy"
  2. Add the Interconnect CR to .spec.sourceFiles in the site configuration file, for example, the example-sno-site.yaml file:

    - fileName: AmqInstance.yaml
      policyName: "config-policy"
  3. Add the HardwareEvent CR to spec.sourceFiles in your specific group configuration file, for example, in the group-du-sno-ranGen.yaml file:

    - fileName: HardwareEvent.yaml
      policyName: "config-policy"
      spec:
        nodeSelector: {}
        transportHost: "amqp://<amq_interconnect_name>.<amq_interconnect_namespace>.svc.cluster.local" (1)
        logLevel: "info"
    1 The transportHost URL is composed of the existing AMQ Interconnect CR name and namespace. For example, in transportHost: "amqp://amq-router.amq-router.svc.cluster.local", the AMQ Interconnect name and namespace are both set to amq-router.

    Each baseboard management controller (BMC) requires a single HardwareEvent resource only.

  4. Commit the PolicyGenTemplate change in Git, and then push the changes to your site configuration repository to deploy bare-metal events monitoring to new sites using GitOps ZTP.

  5. Create the Redfish Secret by running the following command:

    $ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \
    --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \
    --from-literal=hostaddr="<bmc_host_ip_addr>"

Configuring the Image Registry Operator for local caching of images

OKD manages image caching using a local registry. In edge computing use cases, clusters are often subject to bandwidth restrictions when communicating with centralized image registries, which might result in long image download times.

Long download times are unavoidable during initial deployment. Over time, there is a risk that CRI-O will erase the /var/lib/containers/storage directory in the case of an unexpected shutdown. To address long image download times, you can create a local image registry on remote managed clusters using GitOps Zero Touch Provisioning (ZTP). This is useful in Edge computing scenarios where clusters are deployed at the far edge of the network.

Before you can set up the local image registry with GitOps ZTP, you need to configure disk partitioning in the SiteConfig CR that you use to install the remote managed cluster. After installation, you configure the local image registry using a PolicyGenTemplate CR. Then, the GitOps ZTP pipeline creates Persistent Volume (PV) and Persistent Volume Claim (PVC) CRs and patches the imageregistry configuration.

The local image registry can only be used for user application images and cannot be used for the OKD or Operator Lifecycle Manager operator images.

Additional resources

Configuring disk partitioning with SiteConfig

Configure disk partitioning for a managed cluster using a SiteConfig CR and GitOps Zero Touch Provisioning (ZTP). The disk partition details in the SiteConfig CR must match the underlying disk.

You must complete this procedure at installation time.

Prerequisites
  • Install Butane.

Procedure
  1. Create the storage.bu file by using the following example YAML file:

    variant: fcos
    version: 1.3.0
    storage:
      disks:
      - device: /dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0 (1)
        wipe_table: false
        partitions:
        - label: var-lib-containers
          start_mib: <start_of_partition> (2)
          size_mib: <partition_size> (3)
      filesystems:
        - path: /var/lib/containers
          device: /dev/disk/by-partlabel/var-lib-containers
          format: xfs
          wipe_filesystem: true
          with_mount_unit: true
          mount_options:
            - defaults
            - prjquota
    1 Specify the root disk.
    2 Specify the start of the partition in MiB. If the value is too small, the installation fails.
    3 Specify the size of the partition. If the value is too small, the deployments fails.
  2. Convert the storage.bu to an Ignition file by running the following command:

    $ butane storage.bu
    Example output
    {"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}
  3. Use a tool such as JSON Pretty Print to convert the output into JSON format.

  4. Copy the output into the .spec.clusters.nodes.ignitionConfigOverride field in the SiteConfig CR.

    Example
    [...]
    spec:
      clusters:
        - nodes:
            - ignitionConfigOverride: |
              {
                "ignition": {
                  "version": "3.2.0"
                },
                "storage": {
                  "disks": [
                    {
                      "device": "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0",
                      "partitions": [
                        {
                          "label": "var-lib-containers",
                          "sizeMiB": 0,
                          "startMiB": 250000
                        }
                      ],
                      "wipeTable": false
                    }
                  ],
                  "filesystems": [
                    {
                      "device": "/dev/disk/by-partlabel/var-lib-containers",
                      "format": "xfs",
                      "mountOptions": [
                        "defaults",
                        "prjquota"
                      ],
                      "path": "/var/lib/containers",
                      "wipeFilesystem": true
                    }
                  ]
                },
                "systemd": {
                  "units": [
                    {
                      "contents": "# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target",
                      "enabled": true,
                      "name": "var-lib-containers.mount"
                    }
                  ]
                }
              }
    [...]

    If the .spec.clusters.nodes.ignitionConfigOverride field does not exist, create it.

Verification
  1. During or after installation, verify on the hub cluster that the BareMetalHost object shows the annotation by running the following command:

    $ oc get bmh -n my-sno-ns my-sno -ojson | jq '.metadata.annotations["bmac.agent-install.openshift.io/ignition-config-overrides"]
    Example output
    "{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62\",\"partitions\":[{\"label\":\"var-lib-containers\",\"sizeMiB\":0,\"startMiB\":250000}],\"wipeTable\":false}],\"filesystems\":[{\"device\":\"/dev/disk/by-partlabel/var-lib-containers\",\"format\":\"xfs\",\"mountOptions\":[\"defaults\",\"prjquota\"],\"path\":\"/var/lib/containers\",\"wipeFilesystem\":true}]},\"systemd\":{\"units\":[{\"contents\":\"# Generated by Butane\\n[Unit]\\nRequires=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\nAfter=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\n\\n[Mount]\\nWhere=/var/lib/containers\\nWhat=/dev/disk/by-partlabel/var-lib-containers\\nType=xfs\\nOptions=defaults,prjquota\\n\\n[Install]\\nRequiredBy=local-fs.target\",\"enabled\":true,\"name\":\"var-lib-containers.mount\"}]}}"
  2. After installation, check the single-node OpenShift disk status.

    1. Enter into a debug session on the single-node OpenShift node by running the following command.

      This step instantiates a debug pod called <node_name>-debug:

      $ oc debug node/my-sno-node
      1. Set /host as the root directory within the debug shell by running the following command.

        The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

        # chroot /host
      2. List information about all available block devices by running the following command:

        # lsblk
        Example output
        NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
        sda      8:0    0 446.6G  0 disk
        ├─sda1   8:1    0     1M  0 part
        ├─sda2   8:2    0   127M  0 part
        ├─sda3   8:3    0   384M  0 part /boot
        ├─sda4   8:4    0 243.6G  0 part /var
        │                                /sysroot/ostree/deploy/rhcos/var
        │                                /usr
        │                                /etc
        │                                /
        │                                /sysroot
        └─sda5   8:5    0 202.5G  0 part /var/lib/containers
      3. Display information about the file system disk space usage by running the following command:

        # df -h
        Example output
        Filesystem      Size  Used Avail Use% Mounted on
        devtmpfs        4.0M     0  4.0M   0% /dev
        tmpfs           126G   84K  126G   1% /dev/shm
        tmpfs            51G   93M   51G   1% /run
        /dev/sda4       244G  5.2G  239G   3% /sysroot
        tmpfs           126G  4.0K  126G   1% /tmp
        /dev/sda5       203G  119G   85G  59% /var/lib/containers
        /dev/sda3       350M  110M  218M  34% /boot
        tmpfs            26G     0   26G   0% /run/user/1000

Configuring the image registry using PolicyGenTemplate CRs

Use PolicyGenTemplate (PGT) CRs to apply the CRs required to configure the image registry and patch the imageregistry configuration.

Prerequisites
  • You have configured a disk partition in the managed cluster.

  • You have installed the OpenShift CLI (oc).

  • You have logged in to the hub cluster as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data for use with GitOps Zero Touch Provisioning (ZTP).

Procedure
  1. Configure the storage class, persistent volume claim, persistent volume, and image registry configuration in the appropriate PolicyGenTemplate CR. For example, to configure an individual site, add the following YAML to the file example-sno-site.yaml:

    sourceFiles:
      # storage class
      - fileName: StorageClass.yaml
        policyName: "sc-for-image-registry"
        metadata:
          name: image-registry-sc
          annotations:
            ran.openshift.io/ztp-deploy-wave: "100" (1)
      # persistent volume claim
      - fileName: StoragePVC.yaml
        policyName: "pvc-for-image-registry"
        metadata:
          name: image-registry-pvc
          namespace: openshift-image-registry
          annotations:
            ran.openshift.io/ztp-deploy-wave: "100"
        spec:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 100Gi
          storageClassName: image-registry-sc
          volumeMode: Filesystem
      # persistent volume
      - fileName: ImageRegistryPV.yaml (2)
        policyName: "pv-for-image-registry"
        metadata:
          annotations:
            ran.openshift.io/ztp-deploy-wave: "100"
      - fileName: ImageRegistryConfig.yaml
        policyName: "config-for-image-registry"
        complianceType: musthave
        metadata:
          annotations:
            ran.openshift.io/ztp-deploy-wave: "100"
        spec:
          storage:
            pvc:
              claim: "image-registry-pvc"
    1 Set the appropriate value for ztp-deploy-wave depending on whether you are configuring image registries at the site, common, or group level. ztp-deploy-wave: "100" is suitable for development or testing because it allows you to group the referenced source files together.
    2 In ImageRegistryPV.yaml, ensure that the spec.local.path field is set to /var/imageregistry to match the value set for the mount_point field in the SiteConfig CR.

    Do not set complianceType: mustonlyhave for the - fileName: ImageRegistryConfig.yaml configuration. This can cause the registry pod deployment to fail.

  2. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.

Verification

Use the following steps to troubleshoot errors with the local image registry on the managed clusters:

  • Verify successful login to the registry while logged in to the managed cluster. Run the following commands:

    1. Export the managed cluster name:

      $ cluster=<managed_cluster_name>
    2. Get the managed cluster kubeconfig details:

      $ oc get secret -n $cluster $cluster-admin-password -o jsonpath='{.data.password}' | base64 -d > kubeadmin-password-$cluster
    3. Download and export the cluster kubeconfig:

      $ oc get secret -n $cluster $cluster-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-$cluster && export KUBECONFIG=./kubeconfig-$cluster
    4. Verify access to the image registry from the managed cluster. See "Accessing the registry".

  • Check that the Config CRD in the imageregistry.operator.openshift.io group instance is not reporting errors. Run the following command while logged in to the managed cluster:

    $ oc get image.config.openshift.io cluster -o yaml
    Example output
    apiVersion: config.openshift.io/v1
    kind: Image
    metadata:
      annotations:
        include.release.openshift.io/ibm-cloud-managed: "true"
        include.release.openshift.io/self-managed-high-availability: "true"
        include.release.openshift.io/single-node-developer: "true"
        release.openshift.io/create-only: "true"
      creationTimestamp: "2021-10-08T19:02:39Z"
      generation: 5
      name: cluster
      resourceVersion: "688678648"
      uid: 0406521b-39c0-4cda-ba75-873697da75a4
    spec:
      additionalTrustedCA:
        name: acm-ice
  • Check that the PersistentVolumeClaim on the managed cluster is populated with data. Run the following command while logged in to the managed cluster:

    $ oc get pv image-registry-sc
  • Check that the registry* pod is running and is located under the openshift-image-registry namespace.

    $ oc get pods -n openshift-image-registry | grep registry*
    Example output
    cluster-image-registry-operator-68f5c9c589-42cfg   1/1     Running     0          8d
    image-registry-5f8987879-6nx6h                     1/1     Running     0          8d
  • Check that the disk partition on the managed cluster is correct:

    1. Open a debug shell to the managed cluster:

      $ oc debug node/sno-1.example.com
    2. Run lsblk to check the host disk partitions:

      sh-4.4# lsblk
      NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
      sda      8:0    0 446.6G  0 disk
        |-sda1   8:1    0     1M  0 part
        |-sda2   8:2    0   127M  0 part
        |-sda3   8:3    0   384M  0 part /boot
        |-sda4   8:4    0 336.3G  0 part /sysroot
        `-sda5   8:5    0 100.1G  0 part /var/imageregistry (1)
      sdb      8:16   0 446.6G  0 disk
      sr0     11:0    1   104M  0 rom
      1 /var/imageregistry indicates that the disk is correctly partitioned.
Additional resources

Using hub templates in PolicyGenTemplate CRs

Topology Aware Lifecycle Manager supports partial Red Hat Advanced Cluster Management (RHACM) hub cluster template functions in configuration policies used with GitOps Zero Touch Provisioning (ZTP).

Hub-side cluster templates allow you to define configuration policies that can be dynamically customized to the target clusters. This reduces the need to create separate policies for many clusters with similiar configurations but with different values.

Policy templates are restricted to the same namespace as the namespace where the policy is defined. This means that you must create the objects referenced in the hub template in the same namespace where the policy is created.

The following supported hub template functions are available for use in GitOps ZTP with TALM:

  • fromConfigmap returns the value of the provided data key in the named ConfigMap resource.

    There is a 1 MiB size limit for ConfigMap CRs. The effective size for ConfigMap CRs is further limited by the last-applied-configuration annotation. To avoid the last-applied-configuration limitation, add the following annotation to the template ConfigMap:

    argocd.argoproj.io/sync-options: Replace=true
  • base64enc returns the base64-encoded value of the input string

  • base64dec returns the decoded value of the base64-encoded input string

  • indent returns the input string with added indent spaces

  • autoindent returns the input string with added indent spaces based on the spacing used in the parent template

  • toInt casts and returns the integer value of the input value

  • toBool converts the input string into a boolean value, and returns the boolean

Various Open source community functions are also available for use with GitOps ZTP.

Example hub templates

The following code examples are valid hub templates. Each of these templates return values from the ConfigMap CR with the name test-config in the default namespace.

  • Returns the value with the key common-key:

    {{hub fromConfigMap "default" "test-config" "common-key" hub}}
  • Returns a string by using the concatenated value of the .ManagedClusterName field and the string -name:

    {{hub fromConfigMap "default" "test-config" (printf "%s-name" .ManagedClusterName) hub}}
  • Casts and returns a boolean value from the concatenated value of the .ManagedClusterName field and the string -name:

    {{hub fromConfigMap "default" "test-config" (printf "%s-name" .ManagedClusterName) | toBool hub}}
  • Casts and returns an integer value from the concatenated value of the .ManagedClusterName field and the string -name:

    {{hub (printf "%s-name" .ManagedClusterName) | fromConfigMap "default" "test-config" | toInt hub}}

Specifying group and site configuration in group PolicyGenTemplate CRs with hub templates

You can manage the configuration of fleets of clusters with ConfigMap CRs by using hub templates to populate the group and site values in the generated policies that get applied to the managed clusters. Using hub templates in site PolicyGenTemplate (PGT) CRs means that you do not need to create a PolicyGenTemplate CR for each site.

You can group the clusters in a fleet in various categories, depending on the use case, for example hardware type or region. Each cluster should have a label corresponding to the group or groups that the cluster is in. If you manage the configuration values for each group in different ConfigMap CRs, then you require only one group PolicyGenTemplate CR to apply the changes to all the clusters in the group by using hub templates.

The following example shows you how to use three ConfigMap CRs and one group PolicyGenTemplate CR to apply both site and group configuration to clusters grouped by hardware type and region.

When you use the fromConfigmap function, the printf variable is only available for the template resource data key fields. You cannot use it with name and namespace fields.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in to the hub cluster as a user with cluster-admin privileges.

  • You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the GitOps ZTP ArgoCD application.

Procedure
  1. Create three ConfigMap CRs that contain the group and site configuration:

    1. Create a ConfigMap CR named group-hardware-types-configmap to hold the hardware-specific configuration. For example:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: group-hardware-types-configmap
        namespace: ztp-group
        annotations:
          argocd.argoproj.io/sync-options: Replace=true (1)
      data:
        # SriovNetworkNodePolicy.yaml
        hardware-type-1-sriov-node-policy-pfNames-1: "[\"ens5f0\"]"
        hardware-type-1-sriov-node-policy-pfNames-2: "[\"ens7f0\"]"
        # PerformanceProfile.yaml
        hardware-type-1-cpu-isolated: "2-31,34-63"
        hardware-type-1-cpu-reserved: "0-1,32-33"
        hardware-type-1-hugepages-default: "1G"
        hardware-type-1-hugepages-size: "1G"
        hardware-type-1-hugepages-count: "32"
      1 The argocd.argoproj.io/sync-options annotation is required only if the ConfigMap is larger than 1 MiB in size.
    2. Create a ConfigMap CR named group-zones-configmap to hold the regional configuration. For example:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: group-zones-configmap
        namespace: ztp-group
      data:
        # ClusterLogForwarder.yaml
        zone-1-cluster-log-fwd-outputs: "[{\"type\":\"kafka\", \"name\":\"kafka-open\", \"url\":\"tcp://10.46.55.190:9092/test\"}]"
        zone-1-cluster-log-fwd-pipelines: "[{\"inputRefs\":[\"audit\", \"infrastructure\"], \"labels\": {\"label1\": \"test1\", \"label2\": \"test2\", \"label3\": \"test3\", \"label4\": \"test4\"}, \"name\": \"all-to-default\", \"outputRefs\": [\"kafka-open\"]}]"
    3. Create a ConfigMap CR named site-data-configmap to hold the site-specific configuration. For example:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: site-data-configmap
        namespace: ztp-group
      data:
        # SriovNetwork.yaml
        du-sno-1-zone-1-sriov-network-vlan-1: "140"
        du-sno-1-zone-1-sriov-network-vlan-2: "150"

    Each ConfigMap CR must be in the same namespace as the policy to be generated from the group PolicyGenTemplate CR.

  2. Commit the ConfigMap CRs in Git, and then push to the Git repository being monitored by the Argo CD application.

  3. Apply the hardware type and region labels to the clusters. The following command applies to a single cluster named du-sno-1-zone-1 and the labels chosen are "hardware-type": "hardware-type-1" and "group-du-sno-zone": "zone-1":

    $ oc patch managedclusters.cluster.open-cluster-management.io/du-sno-1-zone-1 --type merge -p '{"metadata":{"labels":{"hardware-type": "hardware-type-1", "group-du-sno-zone": "zone-1"}}}'
  4. Create a group PolicyGenTemplate CR that uses hub templates to obtain the required data from the ConfigMap objects. This example PolicyGenTemplate CR configures logging, VLAN IDs, NICs and Performance Profile for the clusters that match the labels listed under spec.bindingRules:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: group-du-sno-pgt
      namespace: ztp-group
    spec:
      bindingRules:
        # These policies will correspond to all clusters with these labels
        group-du-sno-zone: "zone-1"
        hardware-type: "hardware-type-1"
      mcp: "master"
      sourceFiles:
        - fileName: ClusterLogForwarder.yaml # wave 10
          policyName: "group-du-sno-cfg-policy"
          spec:
            outputs: '{{hub fromConfigMap "" "group-zones-configmap" (printf "%s-cluster-log-fwd-outputs" (index .ManagedClusterLabels "group-du-sno-zone")) | toLiteral hub}}'
            pipelines: '{{hub fromConfigMap "" "group-zones-configmap" (printf "%s-cluster-log-fwd-pipelines" (index .ManagedClusterLabels "group-du-sno-zone")) | toLiteral hub}}'
    
        - fileName: PerformanceProfile.yaml # wave 10
          policyName: "group-du-sno-cfg-policy"
          metadata:
            name: openshift-node-performance-profile
          spec:
            additionalKernelArgs:
            - rcupdate.rcu_normal_after_boot=0
            - vfio_pci.enable_sriov=1
            - vfio_pci.disable_idle_d3=1
            - efi=runtime
            cpu:
              isolated: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-cpu-isolated" (index .ManagedClusterLabels "hardware-type")) hub}}'
              reserved: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-cpu-reserved" (index .ManagedClusterLabels "hardware-type")) hub}}'
            hugepages:
              defaultHugepagesSize: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-hugepages-default" (index .ManagedClusterLabels "hardware-type")) hub}}'
              pages:
                - size: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-hugepages-size" (index .ManagedClusterLabels "hardware-type")) hub}}'
                  count: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-hugepages-count" (index .ManagedClusterLabels "hardware-type")) | toInt hub}}'
            realTimeKernel:
              enabled: true
    
        - fileName: SriovNetwork.yaml # wave 100
          policyName: "group-du-sno-sriov-policy"
          metadata:
            name: sriov-nw-du-fh
          spec:
            resourceName: du_fh
            vlan: '{{hub fromConfigMap "" "site-data-configmap" (printf "%s-sriov-network-vlan-1" .ManagedClusterName) | toInt hub}}'
    
        - fileName: SriovNetworkNodePolicy.yaml # wave 100
          policyName: "group-du-sno-sriov-policy"
          metadata:
            name: sriov-nnp-du-fh
          spec:
            deviceType: netdevice
            isRdma: false
            nicSelector:
              pfNames: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-sriov-node-policy-pfNames-1" (index .ManagedClusterLabels "hardware-type")) | toLiteral hub}}'
            numVfs: 8
            priority: 10
            resourceName: du_fh
    
        - fileName: SriovNetwork.yaml # wave 100
          policyName: "group-du-sno-sriov-policy"
          metadata:
            name: sriov-nw-du-mh
          spec:
            resourceName: du_mh
            vlan: '{{hub fromConfigMap "" "site-data-configmap" (printf "%s-sriov-network-vlan-2" .ManagedClusterName) | toInt hub}}'
    
        - fileName: SriovNetworkNodePolicy.yaml # wave 100
          policyName: "group-du-sno-sriov-policy"
          metadata:
            name: sriov-nw-du-fh
          spec:
            deviceType: netdevice
            isRdma: false
            nicSelector:
              pfNames: '{{hub fromConfigMap "" "group-hardware-types-configmap" (printf "%s-sriov-node-policy-pfNames-2" (index .ManagedClusterLabels "hardware-type")) | toLiteral hub}}'
            numVfs: 8
            priority: 10
            resourceName: du_fh

    To retrieve site-specific configuration values, use the .ManagedClusterName field. This is a template context value set to the name of the target managed cluster.

    To retrieve group-specific configuration, use the .ManagedClusterLabels field. This is a template context value set to the value of the managed cluster’s labels.

  5. Commit the site PolicyGenTemplate CR in Git and push to the Git repository that is monitored by the ArgoCD application.

    Subsequent changes to the referenced ConfigMap CR are not automatically synced to the applied policies. You need to manually sync the new ConfigMap changes to update existing PolicyGenTemplate CRs. See "Syncing new ConfigMap changes to existing PolicyGenTemplate CRs".

    You can use the same PolicyGenTemplate CR for multiple clusters. If there is a configuration change, then the only modifications you need to make are to the ConfigMap objects that hold the configuration for each cluster and the labels of the managed clusters.

Syncing new ConfigMap changes to existing PolicyGenTemplate CRs

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have logged in to the hub cluster as a user with cluster-admin privileges.

  • You have created a PolicyGenTemplate CR that pulls information from a ConfigMap CR using hub cluster templates.

Procedure
  1. Update the contents of your ConfigMap CR, and apply the changes in the hub cluster.

  2. To sync the contents of the updated ConfigMap CR to the deployed policy, do either of the following:

    1. Option 1: Delete the existing policy. ArgoCD uses the PolicyGenTemplate CR to immediately recreate the deleted policy. For example, run the following command:

      $ oc delete policy <policy_name> -n <policy_namespace>
    2. Option 2: Apply a special annotation policy.open-cluster-management.io/trigger-update to the policy with a different value every time when you update the ConfigMap. For example:

      $ oc annotate policy <policy_name> -n <policy_namespace> policy.open-cluster-management.io/trigger-update="1"

      You must apply the updated policy for the changes to take effect. For more information, see Special annotation for reprocessing.

  3. Optional: If it exists, delete the ClusterGroupUpdate CR that contains the policy. For example:

    $ oc delete clustergroupupgrade <cgu_name> -n <cgu_namespace>
    1. Create a new ClusterGroupUpdate CR that includes the policy to apply with the updated ConfigMap changes. For example, add the following YAML to the file cgr-example.yaml:

      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        name: <cgr_name>
        namespace: <policy_namespace>
      spec:
        managedPolicies:
          - <managed_policy>
        enable: true
        clusters:
        - <managed_cluster_1>
        - <managed_cluster_2>
        remediationStrategy:
          maxConcurrency: 2
          timeout: 240
    2. Apply the updated policy:

      $ oc apply -f cgr-example.yaml