The telco hub reference design specification (RDS) describes the configuration for a hub cluster that deploys and operates fleets of OKD clusters in a telco environment.
The telco core, telco RAN and telco hub reference design specifications (RDS) capture the recommended, tested, and supported configurations to get reliable and repeatable performance for clusters running the telco core and telco RAN profiles.
Each RDS includes the released features and supported configurations that are engineered and validated for clusters to run the individual profiles. The configurations provide a baseline OKD installation that meets feature and KPI targets. Each RDS also describes expected variations for each individual configuration. Validation of each RDS includes many long duration and at-scale tests.
The validated reference configurations are updated for each major Y-stream release of OKD. Z-stream patch releases are periodically re-tested against the reference configurations.
Deviating from the validated telco core, telco RAN DU, and telco hub reference design specifications (RDS) can have significant impact beyond the specific component or feature that you change. Deviations require analysis and engineering in the context of the complete solution.
All deviations from the RDS should be analyzed and documented with clear action tracking information. Due diligence is expected from partners to understand how to bring deviations into line with the reference design. This might require partners to provide additional resources to engage with Red Hat to work towards enabling their use case to achieve a best-in-class outcome with the platform. This is critical for the supportability of the solution and for ensuring alignment across Red Hat and with partners.
Deviation from the RDS can have some or all of the following consequences:
It can take longer to resolve issues.
There is a risk of missing project service-level agreements (SLAs), project deadlines, end provider performance requirements, and so on.
Unapproved deviations may require escalation at executive levels.
Red Hat prioritizes the servicing of requests for deviations based on partner engagement priorities.
Use the features and components running on the management hub cluster to manage many other clusters in a hub-and-spoke topology. The hub cluster provides a highly available and centralized interface for managing the configuration, lifecycle, and observability of the fleet of deployed clusters.
All management hub functionality can be deployed on a dedicated OKD cluster or as applications that are co-resident on an existing cluster.
Using a combination of Day 2 Operators, the hub cluster provides the necessary infrastructure to deploy and configure the fleet of clusters by using a GitOps methodology. Over the lifetime of the deployed clusters, further management of upgrades, scaling the number of clusters, node replacement, and other lifecycle management functions can be declaratively defined and rolled out. You can control the timing and progression of the rollout across the fleet.
The hub cluster provides monitoring and status reporting for the managed clusters through the Observability pillar of the RHACM Operator. This includes aggregated metrics, alerts, and compliance monitoring through the Governance policy framework.
The telco management hub reference design specification (RDS) and the associated reference custom resources (CRs) describe the telco engineering and QE validated method for deploying, configuring and managing the lifecycle of telco managed cluster infrastructure. The reference configuration includes the installation and configuration of the hub cluster components on top of OKD.
The hub cluster provides managed cluster installation, configuration, observability and ongoing lifecycle management for telco application and workload clusters.
For more information about core clusters or far edge clusters that host RAN distributed unit (DU) workloads, see the following:
For more information about lifecycle management for the fleet of managed clusters, see:
For more information about declarative cluster provisioning with GitOps ZTP, see:
For more information about observability metrics and alerts, see:
The resource requirements for the hub cluster are directly dependent on the number of clusters being managed by the hub, the number of policies used for each managed cluster, and the set of features that are configured in Red Hat Advanced Cluster Management (RHACM).
The hub cluster reference configuration can support up to 3500 managed single-node OpenShift clusters under the following conditions:
5 policies for each cluster, with hub-side templating configured with a 10-minute evaluation interval.
Only the following RHACM add-ons are enabled:
Policy controller
Observability with the default configuration
You deploy managed clusters by using GitOps ZTP in batches of up to 500 clusters at a time.
The reference configuration is also validated for deployment and management of a mix of managed cluster topologies. The specific limits depend on the mix of cluster topologies, enabled RHACM features, and so on. In a mixed topology scenario, the reference hub configuration is validated with a combination of 1200 single-node OpenShift clusters, 400 compact clusters (3 nodes combined control plane and compute nodes), and 230 standard clusters (3 control plane and 2 worker nodes).
A hub cluster conforming to this reference specification can support synchronization of 1000 single-node ClusterInstance CRs for each ArgoCD application.
You can use multiple applications to achieve the maximum number of clusters supported by a single hub cluster.
Specific dimensioning requirements are highly dependent on the cluster topology and workload. For more information, see "Storage requirements". Adjust cluster dimensions for the specific characteristics of your fleet of managed clusters.
Resource utilization was measured for deploying hub clusters in the following scenario:
Under reference load managing 3500 single-node OpenShift clusters.
3-node compact cluster for management hub running on dual socket bare-metal servers.
Network impairment of 50 ms round-trip latency, 100 Mbps bandwidth limit and 0.02% packet loss.
Observability was not enabled.
Only local storage was used.
Metric | Peak Measurement
---|---
OpenShift Platform CPU | 106 cores (52 cores peak per node)
OpenShift Platform memory | 504 GB (168 GB peak per node)
In production environments, the OKD hub cluster must be highly available to maintain high availability of the management functions.
Use a highly available cluster topology for the hub cluster, for example:
Compact (3 nodes combined control plane and compute nodes)
Standard (3 control plane nodes + N compute nodes)
In non-production environments, a single-node OpenShift cluster can be used for limited hub cluster functionality.
Certain capabilities, for example Red Hat OpenShift Data Foundation, are not supported on single-node OpenShift. In this configuration, some hub cluster features might not be available.
The number of optional compute nodes can vary depending on the scale of the specific use case.
Compute nodes can be added later as required.
The reference hub cluster is designed to operate in a disconnected networking environment where direct access to the internet is not possible. As with all OKD clusters, the hub cluster requires access to an image registry hosting all OpenShift and Day 2 Operator Lifecycle Manager (OLM) images.
The hub cluster supports dual-stack networking for IPv6 and IPv4 networks. IPv6 is typical in edge or far-edge network segments, while IPv4 is more prevalent for use with legacy equipment in the data center.
Regardless of the installation method, you must configure the following network types for the hub cluster:
clusterNetwork
serviceNetwork
machineNetwork
You must configure the following IP addresses for the hub cluster:
apiVIP
ingressVIP
Depending on the chosen architecture and DHCP configuration, some of these networking values are required and others can be auto-assigned.
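The following is a minimal, illustrative sketch of the networking portion of an install-config.yaml file for a bare-metal hub cluster. The CIDR ranges and VIP addresses are placeholders, not reference values, and must be replaced with values from your environment:
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 192.168.10.0/24
platform:
  baremetal:
    # Virtual IP addresses for the API and ingress endpoints
    apiVIPs:
    - 192.168.10.5
    ingressVIPs:
    - 192.168.10.6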
You must use the default OKD network provider OVN-Kubernetes.
Networking between the managed cluster and hub cluster must meet the networking requirements in the Red Hat Advanced Cluster Management (RHACM) documentation, for example:
Hub cluster access to managed cluster API service, Ironic Python agent, and baseboard management controller (BMC) port.
Managed cluster access to hub cluster API service, ingress IP and control plane node IP addresses.
Managed cluster BMC access to hub cluster control plane node IP addresses.
An image registry must be accessible throughout the lifetime of the hub cluster.
All required container images must be mirrored to the disconnected registry.
The hub cluster must be configured to use a disconnected registry.
The hub cluster cannot host its own image registry. The registry must remain available in scenarios that affect all cluster nodes, for example, a power failure that affects the entire hub cluster.
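As an illustrative sketch, one common way to point hub cluster nodes at a disconnected registry is an ImageDigestMirrorSet CR. The registry address is a placeholder, and the reference configuration also supplies equivalent mirror configuration to managed clusters through the mirror registry ConfigMap shown later in this document:
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: release-mirror
spec:
  imageDigestMirrors:
  # Mirror OKD release images from the disconnected registry
  - source: quay.io/openshift-release-dev/ocp-release
    mirrors:
    - <registry.example.com:8443>/openshift-release-dev/ocp-release
  - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    mirrors:
    - <registry.example.com:8443>/openshift-release-dev/ocp-v4.0-art-dev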
When deploying a hub cluster, ensure that you define appropriately sized CIDR ranges.
The memory and CPU requirements of the hub cluster vary depending on the configuration of the hub cluster, the number of resources on the cluster, and the number of managed clusters.
Ensure that the hub cluster meets the underlying memory and CPU requirements for OKD and Red Hat Advanced Cluster Management (RHACM).
Before deploying a telco hub cluster, ensure that your cluster host meets cluster requirements.
For more information about scaling the number of managed clusters, see "Hub cluster scaling target".
The total amount of storage required by the management hub cluster is dependent on the storage requirements for each of the applications deployed on the cluster.
The main components that require storage through highly available PersistentVolume resources are described in the following sections.
The storage required for the underlying OKD installation is separate from these requirements.
The Assisted Service is deployed with the multicluster engine and Red Hat Advanced Cluster Management (RHACM).
Persistent volume resource | Size (GB)
---|---
 | 50
 | 700
 | 20
Cluster Observability is provided by the multicluster engine and Red Hat Advanced Cluster Management (RHACM).
Observability storage needs several PV resources and an S3-compatible bucket storage for long-term retention of the metrics.
Storage requirements calculation is complex and dependent on the specific workloads and characteristics of managed clusters.
Requirements for PV resources and the S3 bucket depend on many aspects including data retention, the number of managed clusters, managed cluster workloads, and so on.
Estimate the required storage for observability by using the observability sizing calculator in the RHACM capacity planning repository. See the Red Hat Knowledgebase article Calculating storage need for MultiClusterHub Observability on telco environments for an explanation of using the calculator to estimate observability storage requirements. The following table uses inputs derived from the telco RAN DU RDS and the hub cluster RDS as representative values.
The following numbers are estimates. Tune the values for more accurate results. Add an engineering margin, for example +20%, to the results to account for potential estimation inaccuracies.
Capacity planner input | Data source | Example value
---|---|---
Number of control plane nodes | Hub cluster RDS (scale) and telco RAN DU RDS (topology) | 3500
Number of additional worker nodes | Hub cluster RDS (scale) and telco RAN DU RDS (topology) | 0
Days for storage of data | Hub cluster RDS | 15
Total number of pods per cluster | Telco RAN DU RDS | 120
Number of namespaces (excluding OKD) | Telco RAN DU RDS | 4
Number of metric samples per hour | Default value | 12
Number of hours of retention in receiver persistent volume (PV) | Default value | 24
With these input values, the sizing calculator as described in the Red Hat Knowledgebase article Calculating storage need for MultiClusterHub Observability on telco environments indicates the following storage needs:
alertmanager PV (per replica) | alertmanager PV (total) | thanos receive PV (per replica) | thanos receive PV (total) | thanos compact PV (total)
---|---|---|---|---
10 GiB | 30 GiB | 10 GiB | 30 GiB | 100 GiB

thanos rule PV (per replica) | thanos rule PV (total) | thanos store PV (per replica) | thanos store PV (total) | Object bucket[1] (per day) | Object bucket[1] (total)
---|---|---|---|---|---
30 GiB | 90 GiB | 100 GiB | 300 GiB | 15 GiB | 101 GiB
[1] For the object bucket, it is assumed that downsampling is disabled, so that only raw data is calculated for storage requirements.
Minimum OKD and Red Hat Advanced Cluster Management (RHACM) limits apply.
High availability should be provided through a storage backend. The hub cluster reference configuration provides storage through Red Hat OpenShift Data Foundation.
Object bucket storage is provided through OpenShift Data Foundation.
Use SSD or NVMe disks with low latency and high throughput for etcd storage.
The storage solution for telco hub clusters is OpenShift Data Foundation.
The Local Storage Operator provides the storage class that OpenShift Data Foundation uses to provide block, file, and object storage as needed by other components on the hub cluster.
The Local Storage Operator LocalVolume configuration includes setting forceWipeDevicesAndDestroyAllData: true to support the reinstallation of hub cluster nodes where OpenShift Data Foundation has previously been used.
The telco management hub cluster supports a GitOps-driven methodology for installing and managing the configuration of OpenShift clusters for various telco applications. This methodology requires an accessible Git repository that serves as the authoritative source of truth for cluster definitions and configuration artifacts.
Red Hat does not offer a commercially supported Git server. An existing Git server provided in the production environment can be used. Gitea and Gogs are examples of self-hosted Git servers that you can use.
The Git repository is typically provided in the production network external to the hub cluster. In a large-scale deployment, multiple hub clusters can use the same Git repository for maintaining the definitions of managed clusters. Using this approach, you can easily review the state of the complete network. As the source of truth for cluster definitions, the Git repository should be highly available and recoverable in disaster scenarios.
For disaster recovery and multi-hub considerations, run the Git repository separately from the hub cluster.
A Git repository is required to support the GitOps ZTP functions of the hub cluster, including installation, configuration, and lifecycle management of the managed clusters.
The Git repository must be accessible from the management cluster.
The Git repository is used by the GitOps Operator to ensure continuous deployment and a single source of truth for the applied configuration.
The reference method for installing OKD for the hub cluster is through the Agent-based Installer.
Agent-based Installer provides installation capabilities without additional centralized infrastructure. The Agent-based Installer creates an ISO image, which you mount to the server to be installed. When you boot the server, OKD is installed alongside optionally supplied extra manifests, such as the Red Hat OpenShift GitOps Operator.
You can also install OKD for the hub cluster by using other installation methods.
If hub cluster functions are being applied to an existing OKD cluster, the Agent-based Installer installation is not required. The remaining steps to install Day 2 Operators and configure the cluster for these functions remain the same. When OKD installation is complete, the set of additional Operators and their configuration must be installed on the hub cluster.
The reference configuration includes all of these custom resources (CRs), which you can apply manually, for example:
$ oc apply -f <reference_cr>
You can also add the reference configuration to the Git repository and apply it using ArgoCD.
If you apply the CRs manually, ensure that you apply the CRs in the order of their dependencies. For example, apply namespaces before Operators and apply Operators before configurations.
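For example, a Namespace and OperatorGroup must exist before the Subscription that installs an Operator into them. The following illustrative sketch, with a hypothetical Operator name, shows this dependency order:
apiVersion: v1
kind: Namespace
metadata:
  name: example-operator-ns
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: example-operator-group
  namespace: example-operator-ns
spec:
  targetNamespaces:
  - example-operator-ns
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator
  namespace: example-operator-ns
spec:
  channel: stable
  name: example-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace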
Agent-based Installer requires an accessible image repository containing all required OKD and Day 2 Operator images.
The Agent-based Installer builds ISO images based on a specific OpenShift release and specific cluster details. Installation of a second hub cluster requires a separate ISO image to be built.
Agent-based Installer provides a baseline OKD installation. You apply Day 2 Operators and other configuration CRs after the cluster is installed.
The reference configuration supports Agent-based Installer installation in a disconnected environment.
A limited set of additional manifests can be supplied at installation time.
The management hub cluster relies on a set of Day 2 Operators to provide critical management services and infrastructure. Use Operator versions that match the set of managed cluster versions in your fleet.
Install Day 2 Operators by using Operator Lifecycle Manager (OLM) and Subscription custom resources (CRs).
Subscription CRs identify the specific Day 2 Operator to install, the catalog in which the Operator is found, and the appropriate version channel for the Operator.
By default, OLM installs and attempts to keep Operators updated with the latest z-stream version available in the channel.
By default, all Subscription CRs are set with an installPlanApproval: Automatic value. In this mode, OLM automatically installs new Operator versions when they are available in the catalog and channel.
Setting installPlanApproval: Manual requires an administrator to approve each update before it is applied, which gives you control over the timing of Operator updates, for example, so that updates occur only during scheduled maintenance windows.
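The following is a minimal sketch of a Subscription CR that uses manual install plan approval. The Operator name, namespace, and channel are illustrative placeholders:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator            # hypothetical Operator name
  namespace: example-operator-ns
spec:
  channel: stable                   # choose the channel that matches your managed cluster versions
  name: example-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual       # updates wait for manual approval, for example during a maintenance window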
When upgrading a telco hub cluster, the versions of OKD and Operators must meet the requirements of all relevant compatibility matrixes.
Red Hat Advanced Cluster Management for Kubernetes 2.11 Support Matrix
For more information about telco hub cluster update requirements, see:
For more information about updating the hub cluster, see:
The Red Hat Advanced Cluster Management (RHACM) multicluster engine Observability component provides centralized aggregation and visualization of metrics and alerts for all managed clusters. To balance performance and data analysis, the monitoring service maintains a subset list of aggregated metrics that are collected at a downsampled interval. The metrics can be accessed on the hub through a set of different preconfigured dashboards.
The primary custom resource (CR) to enable and configure the Observability service is the MulticlusterObservability CR, which defines the following settings:
Configurable retention settings.
Storage for the different components: thanos receive, thanos compact, thanos rule, thanos store sharding, and alertmanager.
The metadata.annotations.mco-disable-alerting="true" annotation, which enables tuning of the monitoring configuration on managed clusters.
Without this annotation, the Observability component attempts to manage the monitoring configuration of the managed clusters.
With the annotation set, you can merge your desired configuration with the Observability alert forwarding configuration in the managed cluster monitoring configuration.
The hub cluster provides an Observability Alertmanager that can be configured to push alerts to external systems, for example, email. The Alertmanager is enabled by default.
You must configure alert forwarding, as shown in the example at the end of this section.
When the Alertmanager is enabled but not configured, the hub Alertmanager does not forward alerts externally.
When Observability is enabled, the managed clusters can be configured to send alerts to any endpoint including the hub Alertmanager.
When a managed cluster is configured to forward alerts to external sources, alerts are not routed through the hub cluster Alertmanager.
Alert state is available as a metric.
When observability is enabled, the managed cluster alert states are included in the subset of metrics forwarded to the hub cluster and are available through Observability dashboards.
Observability requires persistent object storage for long-term metrics. For more information, see "Storage requirements".
Forwarding of metrics is a subset of the full metric data. It includes only the metrics defined in the observability-metrics-allowlist config map and any custom metrics added by the user (see the example after this list).
Metrics are forwarded at a downsampled rate. Metrics are forwarded by taking the latest datapoint at a 5-minute interval, or as defined by the MultiClusterObservability CR configuration.
A network outage might lead to a loss of metrics forwarded to the hub cluster during that interval. This can be mitigated if metrics are also forwarded directly from managed clusters to an external metrics collector in the provider's network. Full-resolution metrics are available on the managed cluster.
In addition to the default metrics dashboards on the hub cluster, users can define custom dashboards.
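Custom metrics are added through a config map on the hub cluster. The following is a minimal sketch, assuming the RHACM observability-metrics-custom-allowlist config map name; the metric name is an illustrative placeholder:
apiVersion: v1
kind: ConfigMap
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names:
      - node_memory_MemTotal_bytes   # example metric added to the forwarded subset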
The reference configuration is sized based on 15 days of metrics storage by the hub cluster for 3500 single-node OpenShift clusters. If you require longer retention, or a different managed cluster topology or sizing, update the storage calculations and maintain sufficient storage capacity. For more information about calculating new values, see "Storage requirements".
For more information about observability, see:
For more information about custom metrics, see Adding custom metrics
For more information about forwarding alerts to other external systems, see Forwarding alerts
For more information about CPU and memory requirements, see: Observability pod capacity requests
For more information about custom dashboards, see Using Grafana dashboards
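As an illustrative sketch of the alert forwarding configuration referenced above, the hub Alertmanager can be configured through the alertmanager-config secret in the open-cluster-management-observability namespace. The receiver and SMTP details are placeholders, and the embedded file follows the standard Alertmanager configuration format:
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: open-cluster-management-observability
type: Opaque
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      receiver: noc-email
      group_by: ['alertname', 'cluster']
    receivers:
    - name: noc-email
      email_configs:
      - to: noc@example.com
        from: alerts@example.com
        smarthost: smtp.example.com:587
        auth_username: alerts@example.com
        auth_password: <smtp_password>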
To provision and manage sites at the far edge of the network, use GitOps ZTP in a hub-and-spoke architecture, where a single hub cluster manages many managed clusters.
Lifecycle management for spoke clusters can be divided into two different stages: cluster deployment, including OKD installation, and cluster configuration.
As of Red Hat Advanced Cluster Management (RHACM) 2.12, using the SiteConfig Operator is the recommended method for deploying managed clusters.
The SiteConfig Operator introduces a unified ClusterInstance API that decouples the parameters that define the cluster from the manner in which it is deployed.
The SiteConfig Operator uses a set of cluster templates that are instantiated with the data from a ClusterInstance custom resource (CR) to dynamically generate installation manifests.
Following the GitOps methodology, the ClusterInstance CR is sourced from a Git repository through ArgoCD.
The ClusterInstance CR can be used to initiate cluster installation by using either the Assisted Installer or the image-based installation available in multicluster engine.
The SiteConfig ArgoCD plugin, which handles SiteConfig CRs, is deprecated from OKD 4.18.
You must create a Secret CR with the login information for the cluster baseboard management controller (BMC). This Secret CR is then referenced in the ClusterInstance CR. Integration with a secret store, such as Vault, can be used to manage the secrets.
Besides offering deployment method isolation and unification of Git and non-Git workflows, the SiteConfig Operator provides better scalability, greater flexibility with the use of custom templates, and an enhanced troubleshooting experience.
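The following is a minimal, illustrative sketch of a BMC credentials Secret and a ClusterInstance CR for a single-node cluster. All names, addresses, template references, and image set names are placeholders and are not part of the reference configuration:
apiVersion: v1
kind: Secret
metadata:
  name: example-sno-bmc-secret
  namespace: example-sno
data:
  username: <base64_encoded_username>
  password: <base64_encoded_password>
type: Opaque
---
apiVersion: siteconfig.open-cluster-management.io/v1alpha1
kind: ClusterInstance
metadata:
  name: example-sno
  namespace: example-sno
spec:
  clusterName: example-sno
  baseDomain: example.com
  clusterImageSetNameRef: img-4.19-x86-64
  pullSecretRef:
    name: pull-secret
  templateRefs:                      # cluster-level installation templates
  - name: ai-cluster-templates-v1    # assumed Assisted Installer cluster template name
    namespace: open-cluster-management
  nodes:
  - hostName: example-sno.example.com
    role: master
    bmcAddress: redfish-virtualmedia://<bmc_address>/redfish/v1/Systems/1
    bmcCredentialsName:
      name: example-sno-bmc-secret   # references the Secret above
    bootMACAddress: "AA:BB:CC:DD:EE:11"
    templateRefs:                    # node-level installation templates
    - name: ai-node-templates-v1
      namespace: open-cluster-management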
You can upgrade versions of OKD, Day 2 Operators, and managed cluster configurations by declaring the required version in the Policy custom resources (CRs) that target the clusters to be upgraded.
Policy controllers periodically check for policy compliance. If the result is negative, a violation report is created.
If the policy remediation action is set to enforce, the violations are remediated according to the updated policy.
If the policy remediation action is set to inform, the process ends with a non-compliant status report, and responsibility to initiate the upgrade is left to the user to perform during an appropriate maintenance window.
The Topology Aware Lifecycle Manager (TALM) extends Red Hat Advanced Cluster Management (RHACM) with features to manage the rollout of upgrades or configuration throughout the lifecycle of the fleet of clusters. It operates in progressive, limited-size batches of clusters. When upgrades to OKD or the Day 2 Operators are required, TALM progressively rolls out the updates by stepping through the set of policies and switching their remediation action to enforce to push the configuration to the managed clusters.
The custom resource (CR) that TALM uses to build the remediation plan is the ClusterGroupUpgrade CR.
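The following is a minimal sketch of a ClusterGroupUpgrade CR. The policy and cluster names are illustrative placeholders:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-platform-upgrade
  namespace: default
spec:
  enable: false                  # set to true to start remediation during the maintenance window
  clusters:
  - sno-site-1
  - sno-site-2
  managedPolicies:               # inform policies that TALM enforces during the rollout
  - du-upgrade-platform-upgrade
  remediationStrategy:
    maxConcurrency: 100          # number of clusters updated in each batch
    timeout: 240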
You can use image-based upgrade (IBU) with the Lifecycle Agent as an alternative upgrade path for the single-node OpenShift cluster platform version. IBU uses an OCI image generated from a dedicated seed cluster to install single-node OpenShift on the target cluster.
TALM uses the ImageBasedGroupUpgrade CR to roll out image-based upgrades to a set of identified clusters.
You can perform direct upgrades for single-node OpenShift clusters by using image-based upgrade for OKD <4.y> to <4.y+2>, and <4.y.z> to <4.y.z+n>.
Image-based upgrade uses custom images that are specific to the hardware platform that the clusters are running on. Different hardware platforms require separate seed images.
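The following is a sketch of an ImageBasedGroupUpgrade CR, assuming the API shape provided by TALM together with the Lifecycle Agent. The seed image reference, label selector, and rollout values are illustrative placeholders:
apiVersion: lcm.openshift.io/v1alpha1
kind: ImageBasedGroupUpgrade
metadata:
  name: ibu-example
  namespace: default
spec:
  ibuSpec:
    seedImageRef:
      # Seed image must match the hardware platform of the target clusters
      image: <registry.example.com:8443>/ibu/seed-image:<4.y.z>
      version: <4.y.z>
  clusterLabelSelectors:
  - matchLabels:
      hardware-type: hw-type-platform-1
  plan:
  - actions: ["Prep", "Upgrade", "FinalizeUpgrade"]
    rolloutStrategy:
      maxConcurrency: 200
      timeout: 2400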
In edge deployments, you can minimize the disruption to managed clusters by managing the timing and rollout of changes.
Set all policies to inform to monitor compliance without triggering automatic enforcement.
Similarly, configure Day 2 Operator subscriptions for manual install plan approval to prevent updates from occurring outside of scheduled maintenance windows.
The recommended upgrade approach for single-node OpenShift clusters is the image-based upgrade.
For multi-node cluster upgrades, consider the following MachineConfigPool CR configurations to reduce upgrade times:
Pause configuration deployments to nodes during a maintenance window by setting the paused field to true.
Adjust the maxUnavailable field to control how many nodes in the pool can be updated simultaneously. The maxUnavailable field defines the percentage of nodes in the pool that can be simultaneously unavailable during a MachineConfig object update.
Set maxUnavailable to the maximum tolerable value. This reduces the number of reboots in a cluster during upgrades, which results in shorter upgrade times.
Resume configuration deployments by setting the paused field to false. The configuration changes are applied in a single reboot.
During cluster installation, you can pause MachineConfigPool CRs by setting the paused field to true and setting maxUnavailable to 100% to improve installation times.
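The following is a minimal sketch of the relevant fields on a worker MachineConfigPool CR; the pool name and values are illustrative:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  paused: true                 # set to false during the maintenance window to apply queued changes
  maxUnavailable: 2            # number or percentage of nodes that can update simultaneously
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""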
Note that loss of the hub cluster does not typically create a service outage on the managed clusters. However, functions provided by the hub cluster, such as observability, configuration, and lifecycle management updates driven through the hub cluster, are unavailable until the hub cluster is restored.
Backup, restore, and disaster recovery are provided by the cluster backup and restore Operator, which depends on the OpenShift API for Data Protection (OADP) Operator.
You can extend the cluster backup and restore Operator to include third-party resources of the hub cluster based on your configuration.
The cluster backup and restore Operator is not enabled by default in Red Hat Advanced Cluster Management (RHACM). The reference configuration enables this feature.
No reference design updates in this release.
Red Hat Advanced Cluster Management (RHACM) provides multicluster engine installation and ongoing lifecycle management functionality for deployed clusters.
You can manage cluster configuration and upgrades declaratively by applying Policy custom resources (CRs) to clusters during maintenance windows.
RHACM provides functionality such as the following:
Zero touch provisioning (ZTP) and ongoing scaling of clusters using the multicluster engine component in RHACM.
Configuration, upgrades, and cluster status through the RHACM policy controller.
During managed cluster installation, RHACM can apply labels to individual nodes as configured through the ClusterInstance CR.
The Topology Aware Lifecycle Manager component of RHACM provides phased rollout of configuration changes to managed clusters.
The RHACM multicluster engine Observability component provides selective monitoring, dashboards, alerts, and metrics.
The recommended method for single-node OpenShift cluster installation is the image-based installation method in multicluster engine, which uses the ClusterInstance CR for cluster definition.
The recommended method for single-node OpenShift upgrade is the image-based upgrade method.
The RHACM multicluster engine Observability component brings you a centralized view of the health and status of all the managed clusters. By default, every managed cluster is enabled to send metrics and alerts, created by their Cluster Monitoring Operator (CMO), back to Observability. For more information, see "Observability".
For more information about limits on number of clusters managed by a single hub cluster, see "Telco management hub cluster use model".
The number of managed clusters that can be effectively managed by the hub depends on various factors, including:
Resource availability at each managed cluster
Policy complexity and cluster size
Network utilization
Workload demands and distribution
The hub and managed clusters must maintain sufficient bi-directional connectivity.
You can configure the cluster backup and restore Operator to include third-party resources.
The use of RHACM hub-side templating when defining configuration through policy is strongly recommended. This feature reduces the number of policies needed to manage the fleet by enabling per-cluster or per-group content, for example regional settings or hardware-type-specific values, to be templated in a policy and substituted on a per-cluster or per-group basis.
Managed clusters typically have some number of configuration values which are specific to an individual cluster.
Manage these values by using RHACM policy hub-side templating, with values pulled from ConfigMap CRs based on the cluster name.
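The following is an illustrative sketch of hub-side templating, assuming per-cluster values are stored in ConfigMap CRs named after each managed cluster in a hypothetical ztp-site-data namespace. The policy, namespace, and key names are placeholders:
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: site-specific-config
  namespace: ztp-policies
spec:
  disabled: false
  remediationAction: inform
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: site-specific-config
      spec:
        remediationAction: inform
        severity: low
        object-templates:
        - complianceType: musthave
          objectDefinition:
            apiVersion: v1
            kind: ConfigMap
            metadata:
              name: site-data
              namespace: example-app
            data:
              # The hub template is resolved on the hub cluster before the policy
              # is propagated, pulling the value from a ConfigMap named after the
              # managed cluster.
              vlan: '{{hub fromConfigMap "ztp-site-data" .ManagedClusterName "vlan" hub}}'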
No reference design updates in this release.
TALM is an Operator that runs only on the hub cluster for managing how changes like cluster upgrades, Operator upgrades, and cluster configuration are rolled out to the network. TALM supports the following features:
Progressive rollout of policy updates to fleets of clusters in user configurable batches.
Per-cluster actions that add ztp-done labels or other user-configurable labels following configuration changes to managed clusters.
TALM supports optional pre-caching of OKD, OLM Operator, and additional images to single-node OpenShift clusters before initiating an upgrade. The pre-caching feature is not applicable when using the recommended image-based upgrade method for upgrading single-node OpenShift clusters.
Specifying optional pre-caching configurations with PreCachingConfig CRs.
Configurable image filtering to exclude unused content.
Storage validation before and after pre-caching, using defined space requirement parameters.
TALM supports concurrent cluster upgrades in batches of 500.
Pre-caching is limited to single-node OpenShift cluster topology.
The PreCachingConfig custom resource (CR) is optional. You do not need to create it if you want to pre-cache platform-related images only, such as OKD and OLM images.
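The following is a minimal sketch of a PreCachingConfig CR; the index image, package names, exclude patterns, and space value are illustrative placeholders:
apiVersion: ran.openshift.io/v1alpha1
kind: PreCachingConfig
metadata:
  name: example-precachingconfig
  namespace: default
spec:
  overrides:
    platformImage: <registry.example.com:8443>/openshift-release-dev/ocp-release@<digest>
    operatorsIndexes:
    - <registry.example.com:8443>/redhat-operator-index:<tag>
    operatorsPackagesAndChannels:
    - local-storage-operator: stable
  spaceRequired: 45 GiB          # storage validated on each cluster before pre-caching starts
  excludePrecachePatterns:       # image name patterns to exclude from pre-caching
  - aws
  - azure
  additionalImages:
  - <registry.example.com:8443>/custom/app-image:<tag>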
TALM supports the use of hub-side templating with Red Hat Advanced Cluster Management policies.
No reference design updates in this release.
GitOps Operator and GitOps ZTP provide a GitOps-based infrastructure for managing cluster deployment and configuration.
Cluster definitions and configurations are maintained as a declarative state in Git.
You can apply ClusterInstance custom resources (CRs) to the hub cluster, where the SiteConfig Operator renders them as installation CRs.
In earlier releases, a GitOps ZTP plugin supported the generation of installation CRs from SiteConfig CRs. This plugin is now deprecated.
A separate GitOps ZTP plugin is available to enable automatic wrapping of configuration CRs into policies based on the PolicyGenerator or PolicyGenTemplate CRs.
You can deploy and manage multiple versions of OKD on managed clusters by using the baseline reference configuration CRs.
You can use custom CRs alongside the baseline CRs.
To maintain multiple per-version policies simultaneously, use Git to manage the versions of the source and policy CRs by using the PolicyGenerator or PolicyGenTemplate CRs.
To ensure consistent and complete cleanup of managed clusters and their associated resources during cluster or node deletion, you must configure ArgoCD to use background deletion mode.
To avoid confusion or unintentional overwrites when updating content, use unique and distinguishable names for custom CRs in the source-crs directory and extra manifests.
Keep reference source CRs in a separate directory from custom CRs. This facilitates easy update of reference CRs as required.
To help with multiple versions, keep all source CRs and policy creation CRs in versioned Git repositories to ensure consistent generation of policies for each OKD version.
No reference design updates in this release.
You can use the Local Storage Operator to create persistent volumes that applications consume as PVC resources.
The number and type of PV resources that you create depend on your requirements.
Create backing storage for PV CRs before creating the persistent volume. This can be a partition, a local volume, an LVM volume, or a full disk.
Refer to the devices in LocalVolume CRs by the hardware path used to access each device, for example /dev/disk/by-path/<id>, to ensure correct allocation of disks and partitions.
Logical names (for example, /dev/sda) are not guaranteed to be consistent across node reboots.
No reference design updates in this release.
Red Hat OpenShift Data Foundation provides file, block, and object storage services to the hub cluster.
Red Hat OpenShift Data Foundation (ODF) in internal mode requires the Local Storage Operator to define a storage class that provides the necessary underlying storage.
When planning a telco management hub cluster, consider the ODF infrastructure and networking requirements.
Dual-stack support is limited: ODF supports IPv4 on dual-stack clusters.
Address capacity warnings promptly, because recovery can be difficult if storage capacity is exhausted. See Capacity planning.
No reference design updates in this release.
Use the Cluster Logging Operator to collect and ship logs off the node for remote archival and analysis. The reference configuration uses Kafka to ship audit and infrastructure logs to a remote archive.
The reference configuration does not include local log storage.
The reference configuration does not include aggregation of managed cluster logs at the hub cluster.
The impact on cluster CPU use depends on the number and size of logs generated and the amount of log filtering configured.
The reference configuration does not include shipping of application logs. Including application logs in the configuration requires you to evaluate the application logging rate and allocate sufficient additional CPU resources to the reserved set.
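The following is an illustrative sketch of a ClusterLogForwarder CR that ships audit and infrastructure logs to Kafka, assuming the observability.openshift.io/v1 API of the Cluster Logging Operator. The Kafka URL, topic, and service account name are placeholders:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: collector
  outputs:
  - name: kafka-output
    type: kafka
    kafka:
      url: tcp://<kafka_broker_address>:9092/<topic>
  pipelines:
  - name: audit-infra-logs
    inputRefs:
    - audit
    - infrastructure
    outputRefs:
    - kafka-output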
No reference design updates in this release.
The OpenShift API for Data Protection (OADP) Operator is automatically installed and managed by Red Hat Advanced Cluster Management (RHACM) when the backup feature is enabled.
The OADP Operator facilitates the backup and restore of workloads in OKD clusters. Based on the upstream open source project Velero, it allows you to back up and restore all Kubernetes resources for a given project, including persistent volumes.
While it is not mandatory to install OADP on the hub cluster, it is highly recommended for cluster backup, disaster recovery, and high availability of the hub cluster.
The OADP Operator must be enabled to use the disaster recovery solutions for RHACM.
The reference configuration enables backup (OADP) through the MultiClusterHub custom resource (CR) provided by the RHACM Operator.
Only one version of OADP can be installed on a cluster. The version installed by RHACM must be used for RHACM disaster recovery features.
No engineering consideration updates in this release.
The following is the complete YAML reference of all the custom resources (CRs) for the telco management hub reference configuration in 4.19.
---
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
name: agent
annotations:
argocd.argoproj.io/sync-wave: "7"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
databaseStorage:
storageClassName: # your-fs-storageclass-here
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
filesystemStorage:
storageClassName: # your-fs-storageclass-here
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
imageStorage:
storageClassName: # your-fs-storageclass-here
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
mirrorRegistryRef:
name: mirror-registry-config
osImages:
# Replace <http-server-address:port> with the address of the local web server that stores the RHCOS images.
# The images can be downloaded from "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/".
- cpuArchitecture: "x86_64"
openshiftVersion: "4.17"
rootFSUrl: http://<http-server-address:port>/rhcos-4.17.0-x86_64-live-rootfs.x86_64.img
url: http://<http-server-address:port>/rhcos-4.17.0-x86_64-live.x86_64.iso
version: "417.94.202409121747-0"
---
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
annotations:
argocd.argoproj.io/sync-wave: "4"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
installer.open-cluster-management.io/mce-subscription-spec: '{"source": "redhat-operators-disconnected", "installPlanApproval": "Automatic"}'
installer.open-cluster-management.io/oadp-subscription-spec: '{"source": "redhat-operators-disconnected", "installPlanApproval": "Automatic"}'
name: multiclusterhub
namespace: open-cluster-management
spec:
availabilityConfig: High
enableClusterbackup: false
ingress: {}
overrides:
components:
- configOverrides: {}
enabled: true
name: app-lifecycle
- configOverrides: {}
enabled: true
name: cluster-lifecycle
- configOverrides: {}
enabled: true
name: cluster-permission
- configOverrides: {}
enabled: true
name: console
- configOverrides: {}
enabled: true
name: grc
- configOverrides: {}
enabled: true
name: insights
- configOverrides: {}
enabled: true
name: multicluster-engine
- configOverrides: {}
enabled: true
name: multicluster-observability
- configOverrides: {}
enabled: true
name: search
- configOverrides: {}
enabled: true
name: submariner-addon
- configOverrides: {}
enabled: true
name: volsync
- configOverrides: {}
enabled: true
name: cluster-backup
- configOverrides: {}
enabled: true
name: siteconfig
- configOverrides: {}
enabled: false
name: edge-manager-preview
separateCertificateManagement: false
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mirror-registry-config
annotations:
argocd.argoproj.io/sync-wave: "5"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
namespace: multicluster-engine
labels:
app: assisted-service
data:
# Add the mirror registry SSL certificate chain up to the CA itself.
ca-bundle.crt: |
-----BEGIN CERTIFICATE-----
MIID7jCCAtagAwXXX...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDvTCCAqWgAwXXX...
-----END CERTIFICATE-----
# The registries.conf field has been populated using the registries.conf file found in "/etc/containers/registries.conf" on each node.
# Replace <registry.example.com:8443> with the mirror registry's address.
registries.conf: |
unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
[[registry]]
prefix = ""
location = "quay.io/openshift-release-dev"
[[registry.mirror]]
location = "<registry.example.com:8443>/openshift-release-dev"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "quay.io/openshift-release-dev/ocp-release"
[[registry.mirror]]
location = "<registry.example.com:8443>/openshift-release-dev/ocp-release"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
[[registry.mirror]]
location = "<registry.example.com:8443>/openshift-release-dev/ocp-v4.0-art-dev"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/multicluster-engine"
[[registry.mirror]]
location = "<registry.example.com:8443>/multicluster-engine"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/odf4"
[[registry.mirror]]
location = "<registry.example.com:8443>/odf4"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/openshift4"
[[registry.mirror]]
location = "<registry.example.com:8443>/openshift4"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/rhacm2"
[[registry.mirror]]
location = "<registry.example.com:8443>/rhacm2"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/rhceph"
[[registry.mirror]]
location = "<registry.example.com:8443>/rhceph"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/rhel8"
[[registry.mirror]]
location = "<registry.example.com:8443>/rhel8"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/rhel9"
[[registry.mirror]]
location = "<registry.example.com:8443>/rhel9"
pull-from-mirror = "digest-only"
[[registry]]
prefix = ""
location = "registry.redhat.io/ubi8"
[[registry.mirror]]
location = "<registry.example.com:8443>/ubi8"
pull-from-mirror = "tag-only"
---
apiVersion: v1
kind: Namespace
metadata:
labels:
openshift.io/cluster-monitoring: "true"
name: open-cluster-management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: open-cluster-management-group
namespace: open-cluster-management
spec:
targetNamespaces:
- open-cluster-management
---
apiVersion: search.open-cluster-management.io/v1alpha1
kind: Search
metadata:
name: search-v2-operator
namespace: open-cluster-management
annotations:
argocd.argoproj.io/sync-wave: "10"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
dbStorage:
size: 10Gi
deployments:
collector:
resources:
limits:
memory: 8Gi
requests:
cpu: 25m
memory: 64Mi
database:
envVar:
- name: POSTGRESQL_EFFECTIVE_CACHE_SIZE
value: 1024MB
- name: POSTGRESQL_SHARED_BUFFERS
value: 512MB
- name: WORK_MEM
value: 128MB
resources:
limits:
memory: 16Gi
requests:
cpu: 25m
memory: 32Mi
indexer:
resources:
limits:
memory: 4Gi
requests:
cpu: 25m
memory: 128Mi
queryapi:
replicaCount: 2
resources:
limits:
memory: 4Gi
requests:
cpu: 25m
memory: 1Gi
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
operator: Exists
---
apiVersion: metal3.io/v1alpha1
kind: Provisioning
metadata:
name: provisioning-configuration
annotations:
argocd.argoproj.io/sync-wave: "6"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
watchAllNamespaces: true
# some servers do not support virtual media installations
# when the image is served using the https protocol
# disableVirtualMediaTLS: true
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: open-cluster-management-subscription
namespace: open-cluster-management
spec:
channel: release-2.13
installPlanApproval: Automatic
name: advanced-cluster-management
source: redhat-operators-disconnected
sourceNamespace: openshift-marketplace
---
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
name: observability
annotations:
argocd.argoproj.io/sync-wave: "10"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    # Prevents the MultiClusterHub Observability component from owning and
    # managing the Alertmanager forwarding configuration on the managed clusters.
    # ZTP policies are in charge of configuring it.
    # https://issues.redhat.com/browse/CNF-13398
mco-disable-alerting: "true"
spec:
  # Based on the data provided by the acm-capacity tool:
  # https://github.com/stolostron/capacity-planning/blob/main/calculation/ObsSizingTemplate-Rev1.ipynb
  # for a scenario with:
  # 3500 SNOs, 125 pods, and 4 namespaces (apart from the OpenShift namespaces)
  # storage retention of 15 days
  # downsampling disabled
  # default MCO add-on configuration for samples_per_hour and pv_retention_hrs.
  # More on how to estimate: https://access.redhat.com/articles/7103886
advanced:
retentionConfig:
blockDuration: 2h
deleteDelay: 48h
retentionInLocal: 24h
retentionResolutionRaw: 15d
enableDownsampling: false
observabilityAddonSpec:
enableMetrics: true
interval: 300
storageConfig:
storageClass: # your-fs-storageclass-here
alertmanagerStorageSize: 10Gi
compactStorageSize: 100Gi
metricObjectStorage:
key: thanos.yaml
name: thanos-object-storage
receiveStorageSize: 10Gi
ruleStorageSize: 30Gi
storeStorageSize: 100Gi
  # In addition to these storage settings, the `metricObjectStorage` field
  # points to an object storage bucket. With the reference configuration scale
  # and retention, the estimated object storage requirement is about 101Gi.
---
apiVersion: v1
kind: Namespace
metadata:
labels:
openshift.io/cluster-monitoring: "true"
name: open-cluster-management-observability
---
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
name: observability-obc
annotations:
argocd.argoproj.io/sync-wave: "8"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
namespace: open-cluster-management-observability
spec:
generateBucketName: observability-object-bucket
storageClassName: openshift-storage.noobaa.io
---
apiVersion: v1
kind: Secret
metadata:
annotations:
argocd.argoproj.io/sync-wave: "9"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
labels:
cluster.open-cluster-management.io/backup: ""
name: multiclusterhub-operator-pull-secret
namespace: open-cluster-management-observability
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: '' # Value provided by user or by pull-secret-openshift-config-copy policy
# This content creates a policy which copies the necessary data from
# the generated Object Bucket Claim into the necessary secret for
# observability to connect to thanos.
---
apiVersion: v1
kind: Namespace
metadata:
name: hub-policies
---
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
annotations:
policy.open-cluster-management.io/categories: CM Configuration Management
policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
policy.open-cluster-management.io/description: ""
policy.open-cluster-management.io/standards: NIST SP 800-53
argocd.argoproj.io/sync-wave: "9"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
name: obs-thanos-secret
namespace: hub-policies
spec:
disabled: false
policy-templates:
- objectDefinition:
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: thanos-secret-cp
spec:
remediationAction: enforce
severity: high
object-templates-raw: |
{{- /* read the bucket data and noobaa endpoint access data */ -}}
{{- $objBucket := (lookup "v1" "ConfigMap" "open-cluster-management-observability" "observability-obc") }}
{{- $awsAccess := (lookup "v1" "Secret" "open-cluster-management-observability" "observability-obc") }}
{{- /* create the thanos config file as a template */ -}}
{{- $thanosConfig := `
type: s3
config:
bucket: %[1]s
endpoint: %[2]s
insecure: true
access_key: %[3]s
secret_key: %[4]s
`
}}
{{- /* create the secret using the thanos configuration template created above. */ -}}
- complianceType: mustonlyhave
objectDefinition:
apiVersion: v1
kind: Secret
metadata:
name: thanos-object-storage
namespace: open-cluster-management-observability
type: Opaque
data:
thanos.yaml: {{ (printf $thanosConfig $objBucket.data.BUCKET_NAME
$objBucket.data.BUCKET_HOST
($awsAccess.data.AWS_ACCESS_KEY_ID | base64dec)
($awsAccess.data.AWS_SECRET_ACCESS_KEY | base64dec)
) | base64enc }}
---
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
name: obs-thanos-pl
namespace: hub-policies
annotations:
argocd.argoproj.io/sync-wave: "9"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
predicates:
- requiredClusterSelector:
labelSelector:
matchExpressions:
- key: name
operator: In
values:
- local-cluster
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
name: obs-thanos-binding
namespace: hub-policies
annotations:
argocd.argoproj.io/sync-wave: "9"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
placementRef:
name: obs-thanos-pl
apiGroup: cluster.open-cluster-management.io
kind: Placement
subjects:
- name: obs-thanos-secret
apiGroup: policy.open-cluster-management.io
kind: Policy
---
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
name: default
namespace: hub-policies
annotations:
argocd.argoproj.io/sync-wave: "8"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
clusterSet: default
# For reference this is the secret which is being generated (with
# appropriate values in the fields):
# ---
# apiVersion: v1
# kind: Secret
# metadata:
# name: thanos-object-storage
# namespace: open-cluster-management-observability
# type: Opaque
# stringData:
# thanos.yaml: |
# type: s3
# config:
# bucket: "<BUCKET_NAME>"
# endpoint: "<BUCKET_HOST>"
# insecure: true
# access_key: "<AWS_ACCESS_KEY_ID>"
# secret_key: "<AWS_SECRET_ACCESS_KEY>"
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: openshift-topology-aware-lifecycle-manager-subscription
namespace: openshift-operators
spec:
channel: stable
installPlanApproval: Automatic
name: topology-aware-lifecycle-manager
source: redhat-operators-disconnected
sourceNamespace: openshift-marketplace
---
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "local-disks"
namespace: "openshift-local-storage"
annotations:
argocd.argoproj.io/sync-wave: "2"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: cluster.ocs.openshift.io/openshift-storage
operator: In
values:
- ""
storageClassDevices:
- storageClassName: "local-sc"
forceWipeDevicesAndDestroyAllData: true
volumeMode: Block
devicePaths:
- /dev/disk/by-path/pci-xxx
---
apiVersion: v1
kind: Namespace
metadata:
name: openshift-local-storage
labels:
openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: local-operator-group
namespace: openshift-local-storage
spec:
targetNamespaces:
- openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: local-storage-operator
namespace: openshift-local-storage
spec:
channel: stable
installPlanApproval: Automatic
name: local-storage-operator
source: redhat-operators-disconnected
sourceNamespace: openshift-marketplace
---
apiVersion: v1
kind: Namespace
metadata:
name: openshift-storage
annotations:
workload.openshift.io/allowed: management
labels:
openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: openshift-storage-operatorgroup
namespace: openshift-storage
spec:
targetNamespaces:
- openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: odf-operator
namespace: openshift-storage
spec:
channel: "stable-4.18"
name: odf-operator
source: redhat-operators-disconnected
sourceNamespace: openshift-marketplace
installPlanApproval: Automatic
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
name: ocs-storagecluster
namespace: openshift-storage
annotations:
argocd.argoproj.io/sync-wave: "3"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
manageNodes: false
resources:
mds:
limits:
cpu: "3"
memory: "8Gi"
requests:
cpu: "3"
memory: "8Gi"
monDataDirHostPath: /var/lib/rook
storageDeviceSets:
- count: 1 # <-- Modify count to desired value. For each set of 3 disks increment the count by 1.
dataPVCTemplate:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "600Gi" # <-- This should be changed as per storage size. Minimum 100 GiB and Maximum 4 TiB
storageClassName: "local-sc" # match this with the storage block created at the LSO step
volumeMode: Block
name: ocs-deviceset
placement: {}
portable: false
replica: 3
resources:
limits:
cpu: "2"
memory: "5Gi"
requests:
cpu: "2"
memory: "5Gi"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-ssh-known-hosts-cm
namespace: openshift-gitops
data:
ssh_known_hosts: |
#############################################################
# by default empty known hosts, because of usual #
# disconnected environments. #
# #
# Manually add needed ssh known hosts: #
# example: $> ssh-keyscan my-github.com #
# Copy the output here
#############################################################
# my-github.com sh-rsa AAAAB3NzaC1y...J4i36KV/aCl4Ixz
# my-github.com ecdsa-sha2-nistp256...GGtLKqmwLLeKhe6xgc=
# my-github-com ssh-ed25519 AAAAC3N...lNrvWjBQ2u
---
apiVersion: v1
kind: Namespace
metadata:
name: openshift-gitops-operator
labels:
openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: openshift-gitops-operator
namespace: openshift-gitops-operator
spec:
upgradeStrategy: Default
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: openshift-gitops-operator
namespace: openshift-gitops-operator
spec:
channel: gitops-1.15
installPlanApproval: Automatic
name: openshift-gitops-operator
source: redhat-operators-disconnected
sourceNamespace: openshift-marketplace
---
apiVersion: v1
kind: Secret
metadata:
name: ztp-repo
namespace: openshift-gitops
labels:
argocd.argoproj.io/secret-type: repository
stringData:
# use following for ssh repo access
url: git@gitlab.example.com:namespace/repo.git
insecure: "false"
sshPrivateKey: |
-----BEGIN OPENSSH PRIVATE KEY-----
INSERT PRIVATE KEY
-----END OPENSSH PRIVATE KEY-----
# uncomment and use following for https repo access
# url: https://gitlab.example.com/namespace/repo
# insecure: "false"
# password: password
# username: username
# forceHttpBasicAuth: "true"
# more examples: https://argo-cd.readthedocs.io/en/stable/operator-manual/argocd-repositories-yaml/
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: ztp-app-project
namespace: openshift-gitops
annotations:
argocd.argoproj.io/sync-wave: "100"
spec:
clusterResourceWhitelist:
- group: 'hive.openshift.io'
kind: ClusterImageSet
- group: 'cluster.open-cluster-management.io'
kind: ManagedCluster
- group: ''
kind: Namespace
destinations:
- namespace: '*'
server: '*'
namespaceResourceWhitelist:
- group: ''
kind: ConfigMap
- group: ''
kind: Namespace
- group: ''
kind: Secret
- group: 'agent-install.openshift.io'
kind: InfraEnv
- group: 'agent-install.openshift.io'
kind: NMStateConfig
- group: 'extensions.hive.openshift.io'
kind: AgentClusterInstall
- group: 'extensions.hive.openshift.io'
kind: ImageClusterInstall
- group: 'hive.openshift.io'
kind: ClusterDeployment
- group: 'metal3.io'
kind: BareMetalHost
- group: 'metal3.io'
kind: HostFirmwareSettings
- group: 'metal3.io'
kind: DataImage
- group: 'agent.open-cluster-management.io'
kind: KlusterletAddonConfig
- group: 'cluster.open-cluster-management.io'
kind: ManagedCluster
- group: 'ran.openshift.io'
kind: SiteConfig
- group: 'siteconfig.open-cluster-management.io'
kind: ClusterInstance
sourceRepos:
- '*'
{
"spec": {
"controller": {
"resources": {
"limits": {
"cpu": "16",
"memory": "32Gi"
},
"requests": {
"cpu": "1",
"memory": "2Gi"
}
}
},
"kustomizeBuildOptions": "--enable-alpha-plugins",
"repo": {
"volumes": [
{
"name": "kustomize",
"emptyDir": {}
}
],
"initContainers": [
{
"resources": {
},
"terminationMessagePath": "/dev/termination-log",
"name": "kustomize-plugin",
"command": [
"/exportkustomize.sh"
],
"args": [
"/.config"
],
"imagePullPolicy": "Always",
"volumeMounts": [
{
"name": "kustomize",
"mountPath": "/.config"
}
],
"terminationMessagePolicy": "File",
"image": "registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17.0"
},
{
"args": [
"-c",
"mkdir -p /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator && cp /policy-generator/PolicyGenerator-not-fips-compliant /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator/PolicyGenerator"
],
"command": [
"/bin/bash"
],
"image": "registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel9:v2.11",
"name": "policy-generator-install",
"imagePullPolicy": "Always",
"volumeMounts": [
{
"mountPath": "/.config",
"name": "kustomize"
}
]
}
],
"volumeMounts": [
{
"name": "kustomize",
"mountPath": "/.config"
}
],
"env": [
{
"name": "ARGOCD_EXEC_TIMEOUT",
"value": "360s"
},
{
"name": "KUSTOMIZE_PLUGIN_HOME",
"value": "/.config/kustomize/plugin"
}
],
"resources": {
"limits": {
"cpu": "8",
"memory": "16Gi"
},
"requests": {
"cpu": "1",
"memory": "2Gi"
}
}
}
}
}
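A minimal sketch of applying the patch, assuming it is saved locally as `argocd-openshift-gitops-patch.json` and that the target is the default `openshift-gitops` ArgoCD instance:

```
# Merge the patch into the default ArgoCD instance (assumed file name)
oc patch argocd openshift-gitops -n openshift-gitops \
  --type=merge --patch-file argocd-openshift-gitops-patch.json
```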
---
apiVersion: v1
kind: Namespace
metadata:
name: clusters-sub
annotations:
argocd.argoproj.io/sync-wave: "100"
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: clusters
namespace: openshift-gitops
annotations:
argocd.argoproj.io/sync-wave: "100"
spec:
destination:
server: https://kubernetes.default.svc
namespace: clusters-sub
project: ztp-app-project
source:
path: ztp/gitops-subscriptions/argocd/example/siteconfig
repoURL: https://github.com/openshift-kni/cnf-features-deploy
targetRevision: master
# uncomment the plugin below if you add the plugin binaries to the same repository directory where
# the siteconfig.yaml files exist AND use the ../../hack/patch-argocd-dev.sh script to re-patch the repo-server deployment
# plugin:
# name: kustomize-with-local-plugins
  ignoreDifferences: # Recommended way to allow the ACM controller to manage its fields; an alternative approach is documented below (1)
- group: cluster.open-cluster-management.io
kind: ManagedCluster
managedFieldsManagers:
- controller
# (1) alternatively you can choose to ignore a specific path like so (replace managedFieldsManagers with jsonPointers)
# jsonPointers:
# - /metadata/labels/cloud
# - /metadata/labels/vendor
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=background
- RespectIgnoreDifferences=true
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: gitops-cluster
annotations:
argocd.argoproj.io/sync-wave: "100"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: openshift-gitops-argocd-application-controller
namespace: openshift-gitops
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: gitops-policy
annotations:
argocd.argoproj.io/sync-wave: "100"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: open-cluster-management:cluster-manager-admin
subjects:
- kind: ServiceAccount
name: openshift-gitops-argocd-application-controller
namespace: openshift-gitops
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- app-project.yaml
- policies-app-project.yaml
- gitops-policy-rolebinding.yaml
- gitops-cluster-rolebinding.yaml
- clusters-app.yaml
- policies-app.yaml
- AddPluginsPolicy.yaml
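Assuming the files listed in this kustomization are kept together in a local directory, for example `./argocd/deployment`, they can be applied as a single unit:

```
# Build and apply the kustomization from the assumed directory
oc apply -k ./argocd/deployment
```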
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: policy-app-project
namespace: openshift-gitops
annotations:
argocd.argoproj.io/sync-wave: "100"
spec:
clusterResourceWhitelist:
- group: ''
kind: Namespace
destinations:
- namespace: 'ztp*'
server: '*'
- namespace: 'policies-sub'
server: '*'
namespaceResourceWhitelist:
- group: ''
kind: ConfigMap
- group: ''
kind: Namespace
- group: 'apps.open-cluster-management.io'
kind: PlacementRule
- group: 'policy.open-cluster-management.io'
kind: Policy
- group: 'policy.open-cluster-management.io'
kind: PlacementBinding
- group: 'ran.openshift.io'
kind: PolicyGenTemplate
- group: cluster.open-cluster-management.io
kind: Placement
- group: policy.open-cluster-management.io
kind: PolicyGenerator
- group: policy.open-cluster-management.io
kind: PolicySet
- group: cluster.open-cluster-management.io
kind: ManagedClusterSetBinding
sourceRepos:
- '*'
---
apiVersion: v1
kind: Namespace
metadata:
name: policies-sub
annotations:
argocd.argoproj.io/sync-wave: "100"
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: policies
namespace: openshift-gitops
annotations:
argocd.argoproj.io/sync-wave: "100"
spec:
destination:
server: https://kubernetes.default.svc
namespace: policies-sub
project: policy-app-project
source:
path: ztp/gitops-subscriptions/argocd/example/policygentemplates
repoURL: https://github.com/openshift-kni/cnf-features-deploy
targetRevision: master
# uncomment the plugin below if you add the plugin binaries to the same repository directory where
# the policyGenTemplate.yaml files exist AND use the ../../hack/patch-argocd-dev.sh script to re-patch the repo-server deployment
# plugin:
# name: kustomize-with-local-plugins
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
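After the `clusters` and `policies` applications are created, their sync and health status can be inspected from the hub cluster, for example:

```
# List the Argo CD applications with their sync and health status
oc get applications.argoproj.io -n openshift-gitops

# Show detailed sync results for a single application
oc describe application.argoproj.io clusters -n openshift-gitops
```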
---
apiVersion: v1
kind: Namespace
metadata:
name: openshift-logging
annotations:
workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: cluster-logging
namespace: openshift-logging
spec:
targetNamespaces:
- openshift-logging
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: cluster-logging
namespace: openshift-logging
spec:
channel: "stable-6.2"
name: cluster-logging
source: redhat-operators-disconnected
sourceNamespace: openshift-marketplace
installPlanApproval: Automatic
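The subscription state can be used to confirm that the Cluster Logging Operator installed from the selected channel, for example:

```
# A state of AtLatestKnown indicates the subscription resolved its latest CSV
oc get subscription cluster-logging -n openshift-logging -o jsonpath='{.status.state}{"\n"}'
```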
---
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: hub # must match the cluster name set in install-config.yaml
rendezvousIP: 192.168.125.20 # the IP address of one of the control plane (master) nodes
# Replace the fields below with your network details
hosts:
- hostname: hub-ctl-0
role: master
interfaces:
- name: ens3
macAddress: aa:aa:aa:aa:01:01
networkConfig:
interfaces:
- name: ens3
mac-address: aa:aa:aa:aa:01:01
ipv4:
enabled: true
dhcp: true
ipv6:
enabled: true
dhcp: false
address:
- ip: fd01::20
prefix-length: 64
routes:
config:
- destination: ::/0
next-hop-address: fd01::1
next-hop-interface: ens3
table-id: 254
rootDeviceHints:
deviceName: "/dev/disk/by-path/pci-0000:00:07.0"
- hostname: hub-ctl-1
role: master
interfaces:
- name: ens3
macAddress: aa:aa:aa:aa:01:02
networkConfig:
interfaces:
- name: ens3
mac-address: aa:aa:aa:aa:01:02
ipv4:
enabled: true
dhcp: true
ipv6:
enabled: true
dhcp: false
address:
- ip: fd01::21
prefix-length: 64
routes:
config:
- destination: ::/0
next-hop-address: fd01::1
next-hop-interface: ens3
table-id: 254
rootDeviceHints:
deviceName: "/dev/disk/by-path/pci-0000:00:07.0"
- hostname: hub-ctl-2
role: master
interfaces:
- name: ens3
macAddress: aa:aa:aa:aa:01:03
networkConfig:
interfaces:
- name: ens3
mac-address: aa:aa:aa:aa:01:03
ipv4:
enabled: true
dhcp: true
ipv6:
enabled: true
dhcp: false
address:
- ip: fd01::22
prefix-length: 64
routes:
config:
- destination: ::/0
next-hop-address: fd01::1
next-hop-interface: ens3
table-id: 254
rootDeviceHints:
deviceName: "/dev/disk/by-path/pci-0000:00:07.0"
---
apiVersion: v1
metadata:
name: hub # replace with your hub name
baseDomain: example.com # replace with your domain name
compute:
- architecture: amd64
hyperthreading: Enabled
name: worker
replicas: 0
controlPlane:
architecture: amd64
hyperthreading: Enabled
name: master
replicas: 3
networking:
clusterNetwork:
- cidr: 10.128.0.0/14
hostPrefix: 23
- cidr: fd02::/48
hostPrefix: 64
machineNetwork:
- cidr: 192.168.125.0/24 # replace with your machine network CIDR
- cidr: fd01::/64
networkType: OVNKubernetes
serviceNetwork:
- 172.30.0.0/16
- fd03::/112
# Replace the fields below with your network details
platform:
baremetal:
provisioningNetwork: "Disabled"
apiVIPs:
- 192.168.125.10
- fd01::10
ingressVIPs:
- 192.168.125.11
- fd01::11
# Replace <registry.example.com:8443> with the mirror registry's address.
imageDigestSources:
- mirrors:
- <registry.example.com:8443>/openshift-release-dev/ocp-release
source: quay.io/openshift-release-dev/ocp-release
- mirrors:
- <registry.example.com:8443>/openshift-release-dev/ocp-v4.0-art-dev
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
# Add the mirror registry SSL certificate chain up to the CA itself.
additionalTrustBundle: |
-----BEGIN CERTIFICATE-----
MIID7jCCAtagAwXXX...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDvTCCAqWgAwIBAgIUcXQpXXX...
-----END CERTIFICATE-----
# Add the mirror registry credentials to the pull secret.
pullSecret: '{"auths":{"<registry.example.com:8443>":{"auth": "aW5pdDo0R1XXXXXjdCbUoweUNuMWI1OTZBMmhkcEhjMw==","email": "user@redhat.com"},...}}}'
# Add the SSH public key to connect to the OCP nodes
sshKey: |
ssh-rsa AAAAB3NzaC1yc2EA...
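With the AgentConfig and install-config examples above saved as `agent-config.yaml` and `install-config.yaml` in a working directory, a bootable agent ISO for the hub cluster can be generated with the Agent-based Installer. This is a sketch only; the directory name is an assumption:

```
# Generate the agent ISO from agent-config.yaml and install-config.yaml in ./hub-cluster
openshift-install agent create image --dir ./hub-cluster --log-level=info
```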
The telco hub 4.19 solution has been validated using the following Red Hat software products for OKD clusters.
Component | Software version |
---|---|
OKD | 4.19 |
Local Storage Operator | 4.19 |
Red Hat OpenShift Data Foundation (ODF) | 4.18 |
Red Hat Advanced Cluster Management (RHACM) | 2.13 |
Red Hat OpenShift GitOps | 1.16 |
GitOps Zero Touch Provisioning (ZTP) plugins | 4.19 |
multicluster engine Operator PolicyGenerator plugin | 2.12 |
Topology Aware Lifecycle Manager (TALM) | 4.19 |
Cluster Logging Operator | 6.2 |
OpenShift API for Data Protection (OADP) | The version aligned with the RHACM release. |