Telco hub reference design specifications

Reference design scope

The telco core, telco RAN, and telco hub reference design specifications (RDS) capture the recommended, tested, and supported configurations that provide reliable and repeatable performance for clusters running the telco core and telco RAN profiles.

Each RDS includes the released features and supported configurations that are engineered and validated for clusters to run the individual profiles. The configurations provide a baseline OKD installation that meets feature and KPI targets. Each RDS also describes expected variations for each individual configuration. Validation of each RDS includes many long-duration and at-scale tests.

The validated reference configurations are updated for each major Y-stream release of OKD. Z-stream patch releases are periodically re-tested against the reference configurations.

Deviations from the reference design

Deviating from the validated telco core, telco RAN DU, and telco hub reference design specifications (RDS) can have significant impact beyond the specific component or feature that you change. Deviations require analysis and engineering in the context of the complete solution.

All deviations from the RDS should be analyzed and documented with clear action tracking information. Due diligence is expected from partners to understand how to bring deviations into line with the reference design. This might require partners to provide additional resources to engage with Red Hat to work towards enabling their use case to achieve a best-in-class outcome with the platform. This is critical for the supportability of the solution and for ensuring alignment across Red Hat and with partners.

Deviation from the RDS can have some or all of the following consequences:

  • It can take longer to resolve issues.

  • There is a risk of missing project service-level agreements (SLAs), project deadlines, end provider performance requirements, and so on.

  • Unapproved deviations may require escalation at executive levels.

    Red Hat prioritizes the servicing of requests for deviations based on partner engagement priorities.

Hub cluster architecture overview

Use the features and components running on the management hub cluster to manage many other clusters in a hub-and-spoke topology. The hub cluster provides a highly available and centralized interface for managing the configuration, lifecycle, and observability of the fleet of deployed clusters.

All management hub functionality can be deployed on a dedicated OKD cluster or as applications that are co-resident on an existing cluster.

Managed cluster lifecycle

Using a combination of Day 2 Operators, the hub cluster provides the necessary infrastructure to deploy and configure the fleet of clusters by using a GitOps methodology. Over the lifetime of the deployed clusters, further management of upgrades, scaling the number of clusters, node replacement, and other lifecycle management functions can be declaratively defined and rolled out. You can control the timing and progression of the rollout across the fleet.

Monitoring

The hub cluster provides monitoring and status reporting for the managed clusters through the Observability pillar of the RHACM Operator. This includes aggregated metrics, alerts, and compliance monitoring through the Governance policy framework.

The telco management hub reference design specification (RDS) and the associated reference custom resources (CRs) describe the telco engineering and QE validated method for deploying, configuring, and managing the lifecycle of telco managed cluster infrastructure. The reference configuration includes the installation and configuration of the hub cluster components on top of OKD.

Figure 1. Hub cluster reference design components

Figure 2. Hub cluster reference design architecture

Telco management hub cluster use model

The hub cluster provides managed cluster installation, configuration, observability and ongoing lifecycle management for telco application and workload clusters.

Additional resources

Hub cluster scaling target

The resource requirements for the hub cluster are directly dependent on the number of clusters being managed by the hub, the number of policies used for each managed cluster, and the set of features that are configured in Red Hat Advanced Cluster Management (RHACM).

The hub cluster reference configuration can support up to 3500 managed single-node OpenShift clusters under the following conditions:

  • 5 policies for each cluster, with hub-side templating configured with a 10-minute evaluation interval.

  • Only the following RHACM add-ons are enabled:

    • Policy controller

    • Observability with the default configuration

  • You deploy managed clusters by using GitOps ZTP in batches of up to 500 clusters at a time.

The reference configuration is also validated for deployment and management of a mix of managed cluster topologies. The specific limits depend on the mix of cluster topologies, enabled RHACM features, and so on. In a mixed topology scenario, the reference hub configuration is validated with a combination of 1200 single-node OpenShift clusters, 400 compact clusters (3 nodes combined control plane and compute nodes), and 230 standard clusters (3 control plane and 2 worker nodes).

A hub cluster conforming to this reference specification can support synchronization of 1000 single-node ClusterInstance CRs for each ArgoCD application. You can use multiple applications to achieve the maximum number of clusters supported by a single hub cluster.

Specific dimensioning requirements are highly dependent on the cluster topology and workload. For more information, see "Storage requirements". Adjust cluster dimensions for the specific characteristics of your fleet of managed clusters.

Hub cluster resource utilization

Resource utilization was measured for deploying hub clusters in the following scenario:

  • Under reference load managing 3500 single-node OpenShift clusters.

  • 3-node compact cluster for the management hub running on dual-socket bare-metal servers.

  • Network impairment of 50 ms round-trip latency, 100 Mbps bandwidth limit, and 0.02% packet loss.

  • Observability was not enabled.

  • Only local storage was used.

Table 1. Resource utilization values

  Metric                     Peak measurement
  OpenShift Platform CPU     106 cores (52 cores peak per node)
  OpenShift Platform memory  504 G (168 G peak per node)

Hub cluster topology

In production environments, the OKD hub cluster must be highly available to maintain high availability of the management functions.

Limits and requirements

Use a highly available cluster topology for the hub cluster, for example:

  • Compact (3 nodes combined control plane and compute nodes)

  • Standard (3 control plane nodes + N compute nodes)

Engineering considerations
  • In non-production environments, a single-node OpenShift cluster can be used for limited hub cluster functionality.

  • Certain capabilities, for example Red Hat OpenShift Data Foundation, are not supported on single-node OpenShift. In this configuration, some hub cluster features might not be available.

  • The number of optional compute nodes can vary depending on the scale of the specific use case.

  • Compute nodes can be added later as required.

Hub cluster networking

The reference hub cluster is designed to operate in a disconnected networking environment where direct access to the internet is not possible. As with all OKD clusters, the hub cluster requires access to an image registry hosting all OpenShift and Day 2 Operator Lifecycle Manager (OLM) images.

The hub cluster supports dual-stack networking for IPv6 and IPv4 networks. IPv6 is typical in edge or far-edge network segments, while IPv4 is more prevalent for use with legacy equipment in the data center.

Limits and requirements
  • Regardless of the installation method, you must configure the following network types for the hub cluster:

    • clusterNetwork

    • serviceNetwork

    • machineNetwork

  • You must configure the following IP addresses for the hub cluster:

    • apiVIP

    • ingressVIP

For the above networking configurations, some values are required or can be auto-assigned, depending on the chosen architecture and DHCP configuration.

  • You must use the default OKD network provider OVN-Kubernetes.

  • Networking between the managed cluster and hub cluster must meet the networking requirements in the Red Hat Advanced Cluster Management (RHACM) documentation, for example:

    • Hub cluster access to managed cluster API service, Ironic Python agent, and baseboard management controller (BMC) port.

    • Managed cluster access to hub cluster API service, ingress IP and control plane node IP addresses.

    • Managed cluster BMC access to hub cluster control plane node IP addresses.

  • An image registry must be accessible throughout the lifetime of the hub cluster.

    • All required container images must be mirrored to the disconnected registry.

    • The hub cluster must be configured to use a disconnected registry.

    • The hub cluster cannot host its own image registry because the registry must remain available in scenarios that affect all cluster nodes, for example, a power failure.

Engineering considerations
  • When deploying a hub cluster, ensure that you define appropriately sized CIDR ranges.
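For example, the following is a minimal sketch of how these network types and virtual IP addresses might appear in an install-config.yaml file used with the Agent-based Installer on a bare-metal platform. All CIDR ranges and IP addresses are illustrative placeholders; size and assign them for your environment.

networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 192.168.10.0/24
platform:
  baremetal:
    apiVIPs:
    - 192.168.10.5
    ingressVIPs:
    - 192.168.10.6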

Hub cluster memory and CPU requirements

The memory and CPU requirements of the hub cluster vary depending on the configuration of the hub cluster, the number of resources on the cluster, and the number of managed clusters.

Limits and requirements
  • Ensure that the hub cluster meets the underlying memory and CPU requirements for OKD and Red Hat Advanced Cluster Management (RHACM).

Engineering considerations
  • Before deploying a telco hub cluster, ensure that your cluster host meets cluster requirements.

For more information about scaling the number of managed clusters, see "Hub cluster scaling target".

Hub cluster storage requirements

The total amount of storage required by the management hub cluster depends on the storage requirements of each application deployed on the cluster. The main components that require storage through highly available PersistentVolume resources are described in the following sections.

The storage required for the underlying OKD installation is separate from these requirements.

Assisted Service

The Assisted Service is deployed with the multicluster engine and Red Hat Advanced Cluster Management (RHACM).

Table 2. Assisted Service storage requirements

  Persistent volume resource  Size (GB)
  imageStorage                50
  filesystemStorage           700
  dataBaseStorage             20

RHACM Observability

Cluster Observability is provided by the multicluster engine and Red Hat Advanced Cluster Management (RHACM).

  • Observability storage requires several PV resources and S3-compatible bucket storage for long-term retention of the metrics.

  • Storage requirements calculation is complex and dependent on the specific workloads and characteristics of managed clusters. Requirements for PV resources and the S3 bucket depend on many aspects including data retention, the number of managed clusters, managed cluster workloads, and so on.

  • Estimate the required storage for observability by using the observability sizing calculator in the RHACM capacity planning repository. See the Red Hat Knowledgebase article Calculating storage need for MultiClusterHub Observability on telco environments for an explanation of how to use the calculator to estimate observability storage requirements. The following table uses inputs derived from the telco RAN DU RDS and the hub cluster RDS as representative values.

The following numbers are estimated. Tune the values for more accurate results. Add an engineering margin, for example +20%, to the results to account for potential estimation inaccuracies.

Table 3. Cluster requirements

  Capacity planner input                                            Data source                                              Example value
  Number of control plane nodes                                     Hub cluster RDS (scale) and telco RAN DU RDS (topology)  3500
  Number of additional worker nodes                                 Hub cluster RDS (scale) and telco RAN DU RDS (topology)  0
  Days for storage of data                                          Hub cluster RDS                                          15
  Total number of pods per cluster                                  Telco RAN DU RDS                                         120
  Number of namespaces (excluding OKD)                              Telco RAN DU RDS                                         4
  Number of metric samples per hour                                 Default value                                            12
  Number of hours of retention in receiver persistent volume (PV)   Default value                                            24

With these input values, the sizing calculator as described in the Red Hat Knowledgebase article Calculating storage need for MultiClusterHub Observability on telco environments indicates the following storage needs:

Table 4. Storage requirements

  Component          Per replica   Total
  alertmanager PV    10 GiB        30 GiB
  thanos receive PV  10 GiB        30 GiB
  thanos compact PV  -             100 GiB

Table 5. Storage requirements

  Component          Per replica   Per day   Total
  thanos rule PV     30 GiB        -         90 GiB
  thanos store PV    100 GiB       -         300 GiB
  Object bucket[1]   -             15 GiB    101 GiB

[1] For the object bucket, it is assumed that downsampling is disabled, so that only raw data is calculated for storage requirements.

Storage considerations

Limits and requirements
  • Minimum OKD and Red Hat Advanced Cluster Management (RHACM) limits apply.

  • High availability should be provided through a storage backend. The hub cluster reference configuration provides storage through Red Hat OpenShift Data Foundation.

  • Object bucket storage is provided through OpenShift Data Foundation.

Engineering considerations
  • Use SSD or NVMe disks with low latency and high throughput for etcd storage.

  • The storage solution for telco hub clusters is OpenShift Data Foundation.

    • The Local Storage Operator provides the storage class used by OpenShift Data Foundation to provide block, file, and object storage as needed by other components on the hub cluster.

  • The Local Storage Operator LocalVolume configuration includes setting forceWipeDevicesAndDestroyAllData: true to support the reinstallation of hub cluster nodes where OpenShift Data Foundation has previously been used.

Git repository

The telco management hub cluster supports a GitOps-driven methodology for installing and managing the configuration of OpenShift clusters for various telco applications. This methodology requires an accessible Git repository that serves as the authoritative source of truth for cluster definitions and configuration artifacts.

Red Hat does not offer a commercially supported Git server. An existing Git server provided in the production environment can be used. Gitea and Gogs are examples of self-hosted Git servers that you can use.

The Git repository is typically provided in the production network external to the hub cluster. In a large-scale deployment, multiple hub clusters can use the same Git repository for maintaining the definitions of managed clusters. Using this approach, you can easily review the state of the complete network. As the source of truth for cluster definitions, the Git repository should be highly available and recoverable in disaster scenarios.

For disaster recovery and multi-hub considerations, run the Git repository separately from the hub cluster.

Limits and requirements
  • A Git repository is required to support the GitOps ZTP functions of the hub cluster, including installation, configuration, and lifecycle management of the managed clusters.

  • The Git repository must be accessible from the management cluster.

Engineering considerations
  • The Git repository is used by the GitOps Operator to ensure continuous deployment and a single source of truth for the applied configuration.

OKD installation on the hub cluster

Description

The reference method for installing OKD for the hub cluster is through the Agent-based Installer.

Agent-based Installer provides installation capabilities without additional centralized infrastructure. The Agent-based Installer creates an ISO image, which you mount to the server to be installed. When you boot the server, OKD is installed alongside optionally supplied extra manifests, such as the Red Hat OpenShift GitOps Operator.

You can also install OKD in the hub cluster by using other installation methods.

If hub cluster functions are being applied to an existing OKD cluster, the Agent-based Installer installation is not required. The remaining steps to install Day 2 Operators and configure the cluster for these functions remain the same. When OKD installation is complete, the set of additional Operators and their configuration must be installed on the hub cluster.

The reference configuration includes all of these custom resources (CRs), which you can apply manually, for example:

$ oc apply -f <reference_cr>

You can also add the reference configuration to the Git repository and apply it using ArgoCD.

If you apply the CRs manually, ensure you apply the CRs in the order of their dependencies. For example, apply namespaces before Operators and apply Operators before configurations.
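For example, a minimal sketch of a dependency-ordered, manual application of some of the reference CRs listed later in this document:

$ oc apply -f acmNS.yaml            # Namespace first
$ oc apply -f acmOperGroup.yaml     # OperatorGroup in that namespace
$ oc apply -f acmSubscription.yaml  # Operator Subscription
$ oc apply -f acmMCH.yaml           # Operator configuration (MultiClusterHub)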

Limits and requirements
  • Agent-based Installer requires an accessible image repository containing all required OKD and Day 2 Operator images.

  • Agent-based Installer builds ISO images based on a specific OpenShift release and specific cluster details. Installation of a second hub cluster requires a separate ISO image to be built.

Engineering considerations
  • Agent-based Installer provides a baseline OKD installation. You apply Day 2 Operators and other configuration CRs after the cluster is installed.

  • The reference configuration supports Agent-based Installer installation in a disconnected environment.

  • A limited set of additional manifests can be supplied at installation time.

Day 2 Operators in the hub cluster

The management hub cluster relies on a set of Day 2 Operators to provide critical management services and infrastructure. Use Operator versions that match the set of managed cluster versions in your fleet.

Install Day 2 Operators by using Operator Lifecycle Manager (OLM) and Subscription custom resources (CRs). Subscription CRs identify the specific Day 2 Operator to install, the catalog in which the Operator is found, and the appropriate version channel for the Operator. By default, all Subscriptions are set with installPlanApproval: Automatic. In this mode, OLM automatically installs new Operator versions when they are available in the catalog and channel, keeping Operators updated with the latest z-stream version.

Setting installPlanApproval to Automatic exposes the risk of an Operator being updated outside of defined maintenance windows if the catalog index is updated to include newer Operator versions. In a disconnected environment where you build and maintain a curated set of Operators and versions in the catalog, and where you follow a strategy of creating a new catalog index for updated versions, the risk of Operators being inadvertently updated is largely removed. However, to further reduce this risk, you can set the Subscription CRs to installPlanApproval: Manual, which prevents Operators from being updated without explicit administrator approval.
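For example, the following sketch shows the reference TALM Subscription CR with the install plan approval changed to Manual; only the installPlanApproval value differs from the reference YAML later in this document:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-topology-aware-lifecycle-manager-subscription
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Manual  # updates wait for explicit administrator approval
  name: topology-aware-lifecycle-manager
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace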

Limits and requirements
  • When upgrading a telco hub cluster, the versions of OKD and Operators must meet the requirements of all relevant compatibility matrixes.

Observability

The Red Hat Advanced Cluster Management (RHACM) multicluster engine Observability component provides centralized aggregation and visualization of metrics and alerts for all managed clusters. To balance performance and data analysis, the monitoring service maintains a subset list of aggregated metrics that are collected at a downsampled interval. The metrics can be accessed on the hub through a set of different preconfigured dashboards.

Observability installation

The primary custom resource (CR) to enable and configure the Observability service is the MultiClusterObservability CR, which defines the following settings:

  • Configurable retention settings.

  • Storage for the different components: thanos receive, thanos compact, thanos rule, thanos store sharding, alertmanager.

  • The mco-disable-alerting: "true" annotation in metadata.annotations, which enables tuning of the monitoring configuration on managed clusters.

    Without this setting, the Observability component attempts to configure the managed cluster monitoring configuration. With this annotation set, you can merge your desired configuration with the Observability configuration for alert forwarding in the managed cluster monitoring ConfigMap object. When the Observability service is enabled, RHACM deploys a workload to each managed cluster that pushes metrics and alerts generated by local monitoring to the hub cluster. The metrics and alerts to be forwarded from the managed cluster to the hub are defined by a ConfigMap CR in the open-cluster-management-addon-observability namespace. You can also specify custom metrics. For more information, see Adding custom metrics.
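For example, custom metrics can be added through an allowlist ConfigMap on the hub cluster. The following is a minimal sketch; verify the ConfigMap name, namespace, and format against the Adding custom metrics documentation for your RHACM version, and treat the metric name as a placeholder.

apiVersion: v1
kind: ConfigMap
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names:
      - node_memory_MemTotal_bytes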

Alertmanager configuration
  • The hub cluster provides an Observability Alertmanager that can be configured to push alerts to external systems, for example, email. The Alertmanager is enabled by default.

  • You must configure alert forwarding.

  • When the Alertmanager is enabled but not configured, the hub Alertmanager does not forward alerts externally.

  • When Observability is enabled, the managed clusters can be configured to send alerts to any endpoint, including the hub Alertmanager.

  • When a managed cluster is configured to forward alerts to external sources, alerts are not routed through the hub cluster Alertmanager.

  • Alert state is available as a metric.

  • When observability is enabled, the managed cluster alert states are included in the subset of metrics forwarded to the hub cluster and are available through Observability dashboards.
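For example, alert forwarding is typically configured by replacing the alertmanager-config secret in the open-cluster-management-observability namespace with an Alertmanager configuration that defines your external receivers. The following shell sketch assumes a local alertmanager.yaml file containing your receiver configuration; verify the secret name against the RHACM Observability documentation for your release.

$ oc -n open-cluster-management-observability create secret generic alertmanager-config \
    --from-file=alertmanager.yaml --dry-run=client -o yaml | \
    oc -n open-cluster-management-observability replace -f -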

Limits and requirements
  • Observability requires persistent object storage for long-term metrics. For more information, see "Storage requirements".

Engineering considerations
  • Forwarding of metrics is a subset of the full metric data. It includes only the metrics defined in the observability-metrics-allowlist config map and any custom metrics added by the user.

  • Metrics are forwarded at a downsampled rate by taking the latest data point at a 5-minute interval, or at the interval defined by the MultiClusterObservability CR configuration.

  • A network outage may lead to a loss of metrics forwarded to the hub cluster during that interval. This can be mitigated if metrics are also forwarded directly from managed clusters to an external metrics collector in the provider's network. Full resolution metrics are available on the managed cluster.

  • In addition to default metrics dashboards on the hub, users may define custom dashboards.

  • The reference configuration is sized based on 15 days of metrics storage by the hub cluster for 3500 single-node OpenShift clusters. If longer retention, or a different managed cluster topology or sizing, is required, the storage calculations must be updated and sufficient storage capacity must be maintained. For more information about calculating new values, see "Storage requirements".

Additional resources

Managed cluster lifecycle management

To provision and manage sites at the far edge of the network, use GitOps ZTP in a hub-and-spoke architecture, where a single hub cluster manages many managed clusters.

Lifecycle management for spoke clusters can be divided into two different stages: cluster deployment, including OKD installation, and cluster configuration.

Additional resources

Managed cluster deployment

Description

As of Red Hat Advanced Cluster Management (RHACM) 2.12, using the SiteConfig Operator is the recommended method for deploying managed clusters. The SiteConfig Operator introduces a unified ClusterInstance API that decouples the parameters that define the cluster from the manner in which it is deployed. The SiteConfig Operator uses a set of cluster templates that are instantiated with the data from a ClusterInstance custom resource (CR) to dynamically generate installation manifests. Following the GitOps methodology, the ClusterInstance CR is sourced from a Git repository through ArgoCD. The ClusterInstance CR can be used to initiate cluster installation by using either the Assisted Installer or the image-based installation available in multicluster engine.
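The following is an abbreviated sketch of a ClusterInstance CR for a single-node OpenShift deployment. All names, template references, and addresses are illustrative placeholders; refer to the SiteConfig Operator documentation for the full schema.

apiVersion: siteconfig.open-cluster-management.io/v1alpha1
kind: ClusterInstance
metadata:
  name: sno-example
  namespace: sno-example
spec:
  clusterName: sno-example
  baseDomain: example.com
  clusterImageSetNameRef: img-4.17-x86-64
  pullSecretRef:
    name: assisted-deployment-pull-secret
  networkType: OVNKubernetes
  templateRefs:
  - name: ai-cluster-templates-v1      # cluster-level installation templates
    namespace: open-cluster-management
  nodes:
  - hostName: node1.example.com
    role: master
    bmcAddress: redfish-virtualmedia://<bmc_address>/redfish/v1/Systems/1
    bmcCredentialsName:
      name: node1-bmc-secret           # Secret CR with the BMC login information
    bootMACAddress: "AA:BB:CC:DD:EE:11"
    templateRefs:
    - name: ai-node-templates-v1       # node-level installation templates
      namespace: open-cluster-management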

Limits and requirements
  • The SiteConfig ArgoCD plugin, which handles SiteConfig CRs, is deprecated from OKD 4.18.

Engineering considerations
  • You must create a Secret CR with the login information for the cluster baseboard management controller (BMC). This Secret CR is then referenced in the SiteConfig CR. Integration with a secret store, such as Vault, can be used to manage the secrets.

  • Besides offering deployment method isolation and unification of Git and non-Git workflows, the SiteConfig Operator provides better scalability, greater flexibility with the use of custom templates, and an enhanced troubleshooting experience.

Managed cluster updates

Description

You can upgrade versions of OKD, Day 2 Operators, and managed cluster configurations by declaring the required version in the Policy custom resources (CRs) that target the clusters to be upgraded.

Policy controllers periodically check for policy compliance. If the result is negative, a violation report is created. If the policy remediation action is set to enforce, the violations are remediated according to the updated policy. If the policy remediation action is set to inform, the process ends with a non-compliant status report, and responsibility for initiating the upgrade is left to the user to perform during an appropriate maintenance window.

The Topology Aware Lifecycle Manager (TALM) extends Red Hat Advanced Cluster Management (RHACM) with features to manage the rollout of upgrades or configuration throughout the lifecycle of the fleet of clusters. It operates in progressive, limited size batches of clusters. When upgrades to OKD or the Day 2 Operators are required, TALM progressively rolls out the updates by stepping through the set of policies and switching them to an "enforce" policy to push the configuration to the managed cluster.

The custom resource (CR) that TALM uses to build the remediation plan is the ClusterGroupUpgrade CR.
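The following is a minimal sketch of a ClusterGroupUpgrade CR; the cluster names, policy name, and batch sizing are illustrative placeholders.

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-platform-upgrade
  namespace: default
spec:
  clusters:                      # clusters included in the remediation plan
  - sno-cluster-1
  - sno-cluster-2
  managedPolicies:               # inform policies that TALM enforces during the rollout
  - du-upgrade-platform-upgrade
  remediationStrategy:
    maxConcurrency: 2            # batch size
    timeout: 240                 # minutes allowed for the complete remediation plan
  enable: true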

You can use image-based upgrade (IBU) with the Lifecycle Agent as an alternative upgrade path for the single-node OpenShift cluster platform version. IBU uses an OCI image generated from a dedicated seed cluster to install single-node OpenShift on the target cluster.

TALM uses the ImageBasedGroupUpgrade CR to roll out image-based upgrades to a set of identified clusters.

Limits and requirements
  • You can perform direct upgrades for single-node OpenShift clusters using image-based upgrade for OKD <4.y> to <4.y+2>, and <4.y.z> to <4.y.z+n>.

  • Image-based upgrade uses custom images that are specific to the hardware platform that the clusters are running on. Different hardware platforms require separate seed images.

Engineering considerations
  • In edge deployments, you can minimize the disruption to managed clusters by managing the timing and rollout of changes. Set all policies to inform to monitor compliance without triggering automatic enforcement. Similarly, configure Day 2 Operator subscriptions to manual to prevent updates from occurring outside of scheduled maintenance windows.

  • The recommended upgrade approach for single-node OpenShift clusters is the image-based upgrade.

  • For multi-node cluster upgrades, consider the following MachineConfigPool CR configurations to reduce upgrade times:

    • Pause configuration deployments to nodes during a maintenance window by setting the paused field to true.

    • Adjust the maxUnavailable field to control how many nodes in the pool can be updated simultaneously. The maxUnavailable field defines the percentage of nodes in the pool that can be simultaneously unavailable during a MachineConfig object update. Set maxUnavailable to the maximum tolerable value. This reduces the number of reboots in a cluster during upgrades, which results in shorter upgrade times.

    • Resume configuration deployments by setting the paused field to false. The configuration changes are applied in a single reboot.

  • During cluster installation, you can pause MachineConfigPool CRs by setting the paused field to true and setting maxUnavailable to 100% to improve installation times.
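For example, a sketch of pausing and resuming a worker MachineConfigPool with oc patch; the pool name and maxUnavailable value are illustrative.

$ oc patch mcp worker --type merge --patch '{"spec":{"paused":true,"maxUnavailable":2}}'   # pause rollouts and widen the update batch
$ oc patch mcp worker --type merge --patch '{"spec":{"paused":false}}'                     # resume; pending changes apply together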

Hub cluster disaster recovery

Note that loss of the hub cluster does not typically create a service outage on the managed clusters. However, functions provided by the hub cluster are lost, such as observability, configuration, and lifecycle management updates driven through the hub cluster.

Limits and requirements
  • Backup, restore, and disaster recovery are offered by the cluster backup and restore Operator, which depends on the OpenShift API for Data Protection (OADP) Operator.

Engineering considerations
  • You can extend the cluster backup and restore Operator to include third-party resources of the hub cluster, based on your configuration.

  • The cluster backup and restore Operator is not enabled by default in Red Hat Advanced Cluster Management (RHACM). The reference configuration enables this feature.

Additional resources

Hub cluster components

Red Hat Advanced Cluster Management (RHACM)

New in this release
  • No reference design updates in this release.

Description

Red Hat Advanced Cluster Management (RHACM) provides multicluster engine installation and ongoing lifecycle management functionality for deployed clusters. You can manage cluster configuration and upgrades declaratively by applying Policy custom resources (CRs) to clusters during maintenance windows.

RHACM provides functionality such as the following:

  • Zero touch provisioning (ZTP) and ongoing scaling of clusters using the multicluster engine component in RHACM.

  • Configuration, upgrades, and cluster status through the RHACM policy controller.

  • During managed cluster installation, RHACM can apply labels to individual nodes as configured through the ClusterInstance CR.

  • The Topology Aware Lifecycle Manager component of RHACM provides phased rollout of configuration changes to managed clusters.

  • The RHACM multicluster engine Observability component provides selective monitoring, dashboards, alerts, and metrics.

The recommended method for single-node OpenShift cluster installation is the image-based installation method in multicluster engine, which uses the ClusterInstance CR for cluster definition.

The recommended method for single-node OpenShift upgrade is the image-based upgrade method.

The RHACM multicluster engine Observability component brings you a centralized view of the health and status of all the managed clusters. By default, every managed cluster is enabled to send metrics and alerts, created by their Cluster Monitoring Operator (CMO), back to Observability. For more information, see "Observability".

Limits and requirements
  • For more information about limits on number of clusters managed by a single hub cluster, see "Telco management hub cluster use model".

  • The number of managed clusters that can be effectively managed by the hub depends on various factors, including:

    • Resource availability at each managed cluster

    • Policy complexity and cluster size

    • Network utilization

    • Workload demands and distribution

  • The hub and managed clusters must maintain sufficient bi-directional connectivity.

Engineering considerations
  • You can configure the cluster backup and restore Operator to include third-party resources.

  • The use of RHACM hub-side templating when defining configuration through policy is strongly recommended. This feature reduces the number of policies needed to manage the fleet by enabling per-cluster or per-group content, for example regional or hardware-type content, to be templated in a policy and substituted on a per-cluster or per-group basis.

  • Managed clusters typically have some configuration values that are specific to an individual cluster. Manage these by using RHACM policy hub-side templating with values pulled from ConfigMap CRs based on the cluster name.
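For example, the following fragment of a ConfigurationPolicy uses a hub-side template to pull a per-cluster value from a ConfigMap named after the managed cluster; the namespace, ConfigMap key, and target object are illustrative placeholders.

object-templates:
- complianceType: musthave
  objectDefinition:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-specific-config
      namespace: example-config
    data:
      # Resolved on the hub before the policy is propagated to the managed cluster
      ntpServer: '{{hub fromConfigMap "ztp-site-data" .ManagedClusterName "ntpServer" hub}}'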

Topology Aware Lifecycle Manager

New in this release
  • No reference design updates in this release.

Description

TALM is an Operator that runs only on the hub cluster for managing how changes like cluster upgrades, Operator upgrades, and cluster configuration are rolled out to the network. TALM supports the following features:

  • Progressive rollout of policy updates to fleets of clusters in user configurable batches.

  • Per-cluster actions that add ztp-done or other user-configurable labels following configuration changes to managed clusters.

  • TALM supports optional pre-caching of OKD, OLM Operator, and additional images to single-node OpenShift clusters before initiating an upgrade. The pre-caching feature is not applicable when using the recommended image-based upgrade method for upgrading single-node OpenShift clusters.

    • Specifying optional pre-caching configurations with PreCachingConfig CRs.

    • Configurable image filtering to exclude unused content.

    • Storage validation before and after pre-caching, using defined space requirement parameters.

Limits and requirements
  • TALM supports concurrent cluster upgrades in batches of 500.

  • Pre-caching is limited to single-node OpenShift cluster topology.

Engineering considerations
  • The PreCachingConfig custom resource (CR) is optional. You do not need to create it if you want to pre-cache platform-related images only, such as OKD and OLM.

  • TALM supports the use of hub-side templating with Red Hat Advanced Cluster Management policies.

GitOps Operator and GitOps ZTP

New in this release
  • No reference design updates in this release

Description

GitOps Operator and GitOps ZTP provide a GitOps-based infrastructure for managing cluster deployment and configuration. Cluster definitions and configurations are maintained as a declarative state in Git. You can apply ClusterInstance custom resources (CRs) to the hub cluster where the SiteConfig Operator renders them as installation CRs. In earlier releases, a GitOps ZTP plugin supported the generation of installation CRs from SiteConfig CRs. This plugin is now deprecated. A separate GitOps ZTP plugin is available to enable automatic wrapping of configuration CRs into policies based on the PolicyGenerator or the PolicyGenTemplate CRs.

You can deploy and manage multiple versions of OKD on managed clusters by using the baseline reference configuration CRs. You can use custom CRs alongside the baseline CRs. To maintain multiple per-version policies simultaneously, use Git to manage the versions of the source and policy CRs by using the PolicyGenerator or the PolicyGenTemplate CRs.

Limits and requirements
  • To ensure consistent and complete cleanup of managed clusters and their associated resources during cluster or node deletion, you must configure ArgoCD to use background deletion mode.

Engineering considerations
  • To avoid confusion or unintentional overwrite when updating content, use unique and distinguishable names for custom CRs in the source-crs directory and extra manifests.

  • Keep reference source CRs in a separate directory from custom CRs. This facilitates easy update of reference CRs as required.

  • To help with multiple versions, keep all source CRs and policy creation CRs in versioned Git repositories to ensure consistent generation of policies for each OKD version.

Local Storage Operator

New in this release
  • No reference design updates in this release

Description

With the Local Storage Operator, you can create persistent volumes that applications can use as PVC resources. The number and type of PV resources that you create depends on your requirements.

Engineering considerations
  • Create backing storage for PV CRs before creating the persistent volume. This can be a partition, a local volume, an LVM volume, or a full disk.

  • Refer to the device listing in LocalVolume CRs by the hardware path used to access each device to ensure correct allocation of disks and partitions, for example, /dev/disk/by-path/<id>. Logical names (for example, /dev/sda) are not guaranteed to be consistent across node reboots.

Red Hat OpenShift Data Foundation

New in this release
  • No reference design updates in this release

Description

Red Hat OpenShift Data Foundation provides file, block, and object storage services to the hub cluster.

Limits and requirements
  • Red Hat OpenShift Data Foundation (ODF) in internal mode requires the Local Storage Operator to define a storage class that provides the necessary underlying storage.

  • When planning a telco management hub cluster, consider the ODF infrastructure and networking requirements.

  • Dual-stack support is limited: ODF supports IPv4 on dual-stack clusters.

Engineering considerations
  • Address capacity warnings promptly because recovery can be difficult if storage capacity is exhausted. For more information, see Capacity planning.

Logging

New in this release
  • No reference design updates in this release

Description

Use the Cluster Logging Operator to collect and ship logs off the node for remote archival and analysis. The reference configuration uses Kafka to ship audit and infrastructure logs to a remote archive.

Limits and requirements
  • The reference configuration does not include local log storage.

  • The reference configuration does not include aggregation of managed cluster logs at the hub cluster.

Engineering considerations
  • The impact on cluster CPU use is based on the number and size of logs generated and the amount of log filtering configured.

  • The reference configuration does not include shipping of application logs. The inclusion of application logs in the configuration requires you to evaluate the application logging rate and have sufficient additional CPU resources allocated to the reserved set.
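For example, the following is a minimal sketch of a ClusterLogForwarder CR that ships audit and infrastructure logs to Kafka. It assumes the observability.openshift.io/v1 API used by recent Cluster Logging Operator releases; the Kafka URL and service account name are placeholders, and the schema should be verified against your installed Operator version.

apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: collector
  outputs:
  - name: kafka-output
    type: kafka
    kafka:
      url: tcp://kafka.example.com:9092/hub-cluster-logs
  pipelines:
  - name: audit-infra-logs
    inputRefs:
    - audit
    - infrastructure
    outputRefs:
    - kafka-output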

OpenShift API for Data Protection

New in this release
  • No reference design updates in this release

Description

The OpenShift API for Data Protection (OADP) Operator is automatically installed and managed by Red Hat Advanced Cluster Management (RHACM) when the backup feature is enabled.

The OADP Operator facilitates the backup and restore of workloads in OKD clusters. Based on the upstream open source project Velero, it allows you to back up and restore all Kubernetes resources for a given project, including persistent volumes.

While OADP is not mandatory on the hub cluster, it is highly recommended for cluster backup, disaster recovery, and a highly available hub cluster architecture. The OADP Operator must be enabled to use the disaster recovery solutions for RHACM. The reference configuration enables backup (OADP) through the MultiClusterHub custom resource (CR) provided by the RHACM Operator.

Limits and requirements
  • Only one version of OADP can be installed on a cluster. The version installed by RHACM must be used for RHACM disaster recovery features.

Engineering considerations
  • No engineering consideration updates in this release.

Hub cluster reference configuration CRs

The following is the complete YAML reference of all the custom resources (CRs) for the telco management hub reference configuration in 4.19.

RHACM reference YAML

acmAgentServiceConfig.yaml
---
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
  annotations:
    argocd.argoproj.io/sync-wave: "7"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  databaseStorage:
    storageClassName:  # your-fs-storageclass-here
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
  filesystemStorage:
    storageClassName:  # your-fs-storageclass-here
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
  imageStorage:
    storageClassName:  # your-fs-storageclass-here
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
  mirrorRegistryRef:
    name: mirror-registry-config
  osImages:
  # Replace <http-server-address:port> with the address of the local web server that stores the RHCOS images.
  # The images can be downloaded from "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/".
  - cpuArchitecture: "x86_64"
    openshiftVersion: "4.17"
    rootFSUrl: http://<http-server-address:port>/rhcos-4.17.0-x86_64-live-rootfs.x86_64.img
    url: http://<http-server-address:port>/rhcos-4.17.0-x86_64-live.x86_64.iso
    version: "417.94.202409121747-0"
acmMCH.yaml
---
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "4"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    installer.open-cluster-management.io/mce-subscription-spec: '{"source": "redhat-operators-disconnected", "installPlanApproval": "Automatic"}'
    installer.open-cluster-management.io/oadp-subscription-spec: '{"source": "redhat-operators-disconnected", "installPlanApproval": "Automatic"}'
  name: multiclusterhub
  namespace: open-cluster-management
spec:
  availabilityConfig: High
  enableClusterbackup: false
  ingress: {}
  overrides:
    components:
    - configOverrides: {}
      enabled: true
      name: app-lifecycle
    - configOverrides: {}
      enabled: true
      name: cluster-lifecycle
    - configOverrides: {}
      enabled: true
      name: cluster-permission
    - configOverrides: {}
      enabled: true
      name: console
    - configOverrides: {}
      enabled: true
      name: grc
    - configOverrides: {}
      enabled: true
      name: insights
    - configOverrides: {}
      enabled: true
      name: multicluster-engine
    - configOverrides: {}
      enabled: true
      name: multicluster-observability
    - configOverrides: {}
      enabled: true
      name: search
    - configOverrides: {}
      enabled: true
      name: submariner-addon
    - configOverrides: {}
      enabled: true
      name: volsync
    - configOverrides: {}
      enabled: true
      name: cluster-backup
    - configOverrides: {}
      enabled: true
      name: siteconfig
    - configOverrides: {}
      enabled: false
      name: edge-manager-preview
  separateCertificateManagement: false
acmMirrorRegistryCM.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: mirror-registry-config
  annotations:
    argocd.argoproj.io/sync-wave: "5"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
  namespace: multicluster-engine
  labels:
    app: assisted-service
data:
  # Add the mirror registry SSL certificate chain up to the CA itself.
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    MIID7jCCAtagAwXXX...
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    MIIDvTCCAqWgAwXXX...
    -----END CERTIFICATE-----
  # The registries.conf field has been populated using the registries.conf file found in "/etc/containers/registries.conf" on each node.
  # Replace <registry.example.com:8443> with the mirror registry's address.
  registries.conf: |
    unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/openshift-release-dev"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-release"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/openshift-release-dev/ocp-release"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/openshift-release-dev/ocp-v4.0-art-dev"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/multicluster-engine"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/multicluster-engine"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/odf4"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/odf4"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/openshift4"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/openshift4"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/rhacm2"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/rhacm2"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/rhceph"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/rhceph"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/rhel8"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/rhel8"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/rhel9"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/rhel9"
        pull-from-mirror = "digest-only"

    [[registry]]
      prefix = ""
      location = "registry.redhat.io/ubi8"

      [[registry.mirror]]
        location = "<registry.example.com:8443>/ubi8"
        pull-from-mirror = "tag-only"
acmNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: open-cluster-management
acmOperGroup.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: open-cluster-management-group
  namespace: open-cluster-management
spec:
  targetNamespaces:
  - open-cluster-management
acmPerfSearch.yaml
---
apiVersion: search.open-cluster-management.io/v1alpha1
kind: Search
metadata:
  name: search-v2-operator
  namespace: open-cluster-management
  annotations:
    argocd.argoproj.io/sync-wave: "10"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  dbStorage:
    size: 10Gi
  deployments:
    collector:
      resources:
        limits:
          memory: 8Gi
        requests:
          cpu: 25m
          memory: 64Mi
    database:
      envVar:
      - name: POSTGRESQL_EFFECTIVE_CACHE_SIZE
        value: 1024MB
      - name: POSTGRESQL_SHARED_BUFFERS
        value: 512MB
      - name: WORK_MEM
        value: 128MB
      resources:
        limits:
          memory: 16Gi
        requests:
          cpu: 25m
          memory: 32Mi
    indexer:
      resources:
        limits:
          memory: 4Gi
        requests:
          cpu: 25m
          memory: 128Mi
    queryapi:
      replicaCount: 2
      resources:
        limits:
          memory: 4Gi
        requests:
          cpu: 25m
          memory: 1Gi
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra
    operator: Exists
acmProvisioning.yaml
---
apiVersion: metal3.io/v1alpha1
kind: Provisioning
metadata:
  name: provisioning-configuration
  annotations:
    argocd.argoproj.io/sync-wave: "6"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  watchAllNamespaces: true
  # some servers do not support virtual media installations
  # when the image is served using the https protocol
  # disableVirtualMediaTLS: true
acmSubscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: open-cluster-management-subscription
  namespace: open-cluster-management
spec:
  channel: release-2.13
  installPlanApproval: Automatic
  name: advanced-cluster-management
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
observabilityMCO.yaml
---
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
  annotations:
    argocd.argoproj.io/sync-wave: "10"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    # Prevents MultiClusterHub Observability from owning/managing the
    # spoke clusters' Alertmanager forwarding configuration.
    # ZTP policies are responsible for configuring it.
    # https://issues.redhat.com/browse/CNF-13398
    mco-disable-alerting: "true"
spec:
  # based on the data provided by acm-capacity tool
  # https://github.com/stolostron/capacity-planning/blob/main/calculation/ObsSizingTemplate-Rev1.ipynb
  # for a scenario with:
  # 3500 SNOs, 125 pods and 4 namespaces (apart from OpenShift NS)
  # storage retention 15 days
  # downsampling disabled
  # default MCO Addon configuration samples_per_hour, pv_retention_hrs.
  # More on how to estimate: https://access.redhat.com/articles/7103886
  advanced:
    retentionConfig:
      blockDuration: 2h
      deleteDelay: 48h
      retentionInLocal: 24h
      retentionResolutionRaw: 15d
  enableDownsampling: false
  observabilityAddonSpec:
    enableMetrics: true
    interval: 300
  storageConfig:
    storageClass: # your-fs-storageclass-here
    alertmanagerStorageSize: 10Gi
    compactStorageSize: 100Gi
    metricObjectStorage:
      key: thanos.yaml
      name: thanos-object-storage
    receiveStorageSize: 10Gi
    ruleStorageSize: 30Gi
    storeStorageSize: 100Gi
    # In addition to these storage settings, `metricObjectStorage`
    # points to an object storage bucket. Under the reference configuration
    # scale and retention, the estimated object storage need is about 101Gi
observabilityNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: open-cluster-management-observability
observabilityOBC.yaml
---
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: observability-obc
  annotations:
    argocd.argoproj.io/sync-wave: "8"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
  namespace: open-cluster-management-observability
spec:
  generateBucketName: observability-object-bucket
  storageClassName: openshift-storage.noobaa.io
observabilitySecret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "9"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
  labels:
    cluster.open-cluster-management.io/backup: ""
  name: multiclusterhub-operator-pull-secret
  namespace: open-cluster-management-observability
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ''  # Value provided by user or by pull-secret-openshift-config-copy policy
thanosSecret.yaml
# This content creates a policy which copies the necessary data from
# the generated Object Bucket Claim into the necessary secret for
# observability to connect to thanos.
---
apiVersion: v1
kind: Namespace
metadata:
  name: hub-policies
---
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  annotations:
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
    policy.open-cluster-management.io/description: ""
    policy.open-cluster-management.io/standards: NIST SP 800-53
    argocd.argoproj.io/sync-wave: "9"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
  name: obs-thanos-secret
  namespace: hub-policies
spec:
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: thanos-secret-cp
      spec:
        remediationAction: enforce
        severity: high
        object-templates-raw: |
          {{- /* read the bucket data and noobaa endpoint access data */ -}}
          {{- $objBucket := (lookup "v1" "ConfigMap" "open-cluster-management-observability" "observability-obc") }}
          {{- $awsAccess := (lookup "v1" "Secret" "open-cluster-management-observability" "observability-obc") }}
          {{- /* create the thanos config file as a template */ -}}
          {{- $thanosConfig := `
          type: s3
          config:
            bucket: %[1]s
            endpoint: %[2]s
            insecure: true
            access_key: %[3]s
            secret_key: %[4]s
          `
          }}
          {{- /* create the secret using the thanos configuration template created above. */ -}}
          - complianceType: mustonlyhave
            objectDefinition:
              apiVersion: v1
              kind: Secret
              metadata:
                name: thanos-object-storage
                namespace: open-cluster-management-observability
              type: Opaque
              data:
                thanos.yaml: {{ (printf $thanosConfig $objBucket.data.BUCKET_NAME
                                                      $objBucket.data.BUCKET_HOST
                                                      ($awsAccess.data.AWS_ACCESS_KEY_ID | base64dec)
                                                      ($awsAccess.data.AWS_SECRET_ACCESS_KEY | base64dec)
                                ) | base64enc }}
---
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: obs-thanos-pl
  namespace: hub-policies
  annotations:
    argocd.argoproj.io/sync-wave: "9"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - key: name
          operator: In
          values:
          - local-cluster
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: obs-thanos-binding
  namespace: hub-policies
  annotations:
    argocd.argoproj.io/sync-wave: "9"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
placementRef:
  name: obs-thanos-pl
  apiGroup: cluster.open-cluster-management.io
  kind: Placement
subjects:
  - name: obs-thanos-secret
    apiGroup: policy.open-cluster-management.io
    kind: Policy
---
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: default
  namespace: hub-policies
  annotations:
    argocd.argoproj.io/sync-wave: "8"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  clusterSet: default

# For reference this is the secret which is being generated (with
# appropriate values in the fields):
# ---
# apiVersion: v1
# kind: Secret
# metadata:
#   name: thanos-object-storage
#   namespace: open-cluster-management-observability
# type: Opaque
# stringData:
#   thanos.yaml: |
#     type: s3
#     config:
#       bucket:  "<BUCKET_NAME>"
#       endpoint: "<BUCKET_HOST>"
#       insecure: true
#       access_key: "<AWS_ACCESS_KEY_ID>"
#       secret_key: "<AWS_SECRET_ACCESS_KEY>"
talmSubscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-topology-aware-lifecycle-manager-subscription
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: topology-aware-lifecycle-manager
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace

Storage reference YAML

lsoLocalVolume.yaml
---
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage"
  annotations:
    argocd.argoproj.io/sync-wave: "2"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: cluster.ocs.openshift.io/openshift-storage
          operator: In
          values:
          - ""
  storageClassDevices:
    - storageClassName: "local-sc"
      forceWipeDevicesAndDestroyAllData: true
      volumeMode: Block
      devicePaths:
        - /dev/disk/by-path/pci-xxx
lsoNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
  labels:
    openshift.io/cluster-monitoring: "true"
lsoOperatorgroup.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
lsoSubscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: stable
  installPlanApproval: Automatic
  name: local-storage-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
odfNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  annotations:
    workload.openshift.io/allowed: management
  labels:
    openshift.io/cluster-monitoring: "true"
odfOperatorGroup.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage-operatorgroup
  namespace: openshift-storage
spec:
  targetNamespaces:
    - openshift-storage
odfSubscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: "stable-4.18"
  name: odf-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
storageCluster.yaml
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
  annotations:
    argocd.argoproj.io/sync-wave: "3"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  manageNodes: false
  resources:
    mds:
      limits:
        cpu: "3"
        memory: "8Gi"
      requests:
        cpu: "3"
        memory: "8Gi"
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - count: 1  # <-- Set the count as required. Increment the count by 1 for each additional set of 3 disks.
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: "600Gi"  # <-- Set to the size of the underlying disks. Minimum 100 GiB, maximum 4 TiB.
        storageClassName: "local-sc"  # must match the storage class created in the LSO step
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: false
    replica: 3
    resources:
      limits:
        cpu: "2"
        memory: "5Gi"
      requests:
        cpu: "2"
        memory: "5Gi"
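To illustrate the count and storage guidance in the comments above, the following fragment is a minimal sketch, assuming three storage nodes with two local disks each (six disks in total) that are exposed through the local-sc storage class created in the LSO step; the 1 TiB disk size is hypothetical:
  storageDeviceSets:
  - count: 2  # 6 disks in sets of 3: increment count by 1 for each additional set of 3 disks
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: "1Ti"  # hypothetical; set to the size of the underlying disks
        storageClassName: "local-sc"
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: false
    replica: 3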

GitOps Operator and GitOps ZTP reference YAML

argocd-ssh-known-hosts-cm.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-ssh-known-hosts-cm
  namespace: openshift-gitops
data:
  ssh_known_hosts: |
    #############################################################
    # The known hosts list is empty by default because hub      #
    # clusters typically run in disconnected environments.      #
    #                                                           #
    # Manually add the SSH known hosts that you need:           #
    #   example: $> ssh-keyscan my-github.com                   #
    #   Copy the output here.                                   #
    #############################################################
    # my-github.com ssh-rsa AAAAB3NzaC1y...J4i36KV/aCl4Ixz
    # my-github.com ecdsa-sha2-nistp256...GGtLKqmwLLeKhe6xgc=
    # my-github.com ssh-ed25519 AAAAC3N...lNrvWjBQ2u
gitopsNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-gitops-operator
  labels:
    openshift.io/cluster-monitoring: "true"
gitopsOperatorGroup.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-gitops-operator
  namespace: openshift-gitops-operator
spec:
  upgradeStrategy: Default
gitopsSubscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-gitops-operator
  namespace: openshift-gitops-operator
spec:
  channel: gitops-1.15
  installPlanApproval: Automatic
  name: openshift-gitops-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
ztp-repo.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: ztp-repo
  namespace: openshift-gitops
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  # use the following for SSH repo access
  url: git@gitlab.example.com:namespace/repo.git
  insecure: "false"
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    INSERT PRIVATE KEY
    -----END OPENSSH PRIVATE KEY-----
  # uncomment and use the following for HTTPS repo access
  # url: https://gitlab.example.com/namespace/repo
  # insecure: "false"
  # password: password
  # username: username
  # forceHttpBasicAuth: "true"
  # more examples: https://argo-cd.readthedocs.io/en/stable/operator-manual/argocd-repositories-yaml/
app-project.yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: ztp-app-project
  namespace: openshift-gitops
  annotations:
    argocd.argoproj.io/sync-wave: "100"
spec:
  clusterResourceWhitelist:
  - group: 'hive.openshift.io'
    kind: ClusterImageSet
  - group: 'cluster.open-cluster-management.io'
    kind: ManagedCluster
  - group: ''
    kind: Namespace
  destinations:
  - namespace: '*'
    server: '*'
  namespaceResourceWhitelist:
  - group: ''
    kind: ConfigMap
  - group: ''
    kind: Namespace
  - group: ''
    kind: Secret
  - group: 'agent-install.openshift.io'
    kind: InfraEnv
  - group: 'agent-install.openshift.io'
    kind: NMStateConfig
  - group: 'extensions.hive.openshift.io'
    kind: AgentClusterInstall
  - group: 'extensions.hive.openshift.io'
    kind: ImageClusterInstall
  - group: 'hive.openshift.io'
    kind: ClusterDeployment
  - group: 'metal3.io'
    kind: BareMetalHost
  - group: 'metal3.io'
    kind: HostFirmwareSettings
  - group: 'metal3.io'
    kind: DataImage
  - group: 'agent.open-cluster-management.io'
    kind: KlusterletAddonConfig
  - group: 'cluster.open-cluster-management.io'
    kind: ManagedCluster
  - group: 'ran.openshift.io'
    kind: SiteConfig
  - group: 'siteconfig.open-cluster-management.io'
    kind: ClusterInstance
  sourceRepos:
  - '*'
argocd-openshift-gitops-patch.json
{
  "spec": {
    "controller": {
      "resources": {
        "limits": {
          "cpu": "16",
          "memory": "32Gi"
        },
        "requests": {
          "cpu": "1",
          "memory": "2Gi"
        }
      }
    },
    "kustomizeBuildOptions": "--enable-alpha-plugins",
    "repo": {
      "volumes": [
        {
          "name": "kustomize",
          "emptyDir": {}
        }
      ],
      "initContainers": [
        {
          "resources": {
          },
          "terminationMessagePath": "/dev/termination-log",
          "name": "kustomize-plugin",
          "command": [
            "/exportkustomize.sh"
          ],
          "args": [
            "/.config"
          ],
          "imagePullPolicy": "Always",
          "volumeMounts": [
            {
              "name": "kustomize",
              "mountPath": "/.config"
            }
          ],
          "terminationMessagePolicy": "File",
          "image": "registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17.0"
        },
        {
          "args": [
            "-c",
            "mkdir -p /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator && cp /policy-generator/PolicyGenerator-not-fips-compliant /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator/PolicyGenerator"
          ],
          "command": [
            "/bin/bash"
          ],
          "image": "registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel9:v2.11",
          "name": "policy-generator-install",
          "imagePullPolicy": "Always",
          "volumeMounts": [
            {
              "mountPath": "/.config",
              "name": "kustomize"
            }
          ]
        }
      ],
      "volumeMounts": [
        {
          "name": "kustomize",
          "mountPath": "/.config"
        }
      ],
      "env": [
        {
          "name": "ARGOCD_EXEC_TIMEOUT",
          "value": "360s"
        },
        {
          "name": "KUSTOMIZE_PLUGIN_HOME",
          "value": "/.config/kustomize/plugin"
        }
      ],
      "resources": {
        "limits": {
          "cpu": "8",
          "memory": "16Gi"
        },
        "requests": {
          "cpu": "1",
          "memory": "2Gi"
        }
      }
    }
  }
}
clusters-app.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: clusters-sub
  annotations:
    argocd.argoproj.io/sync-wave: "100"
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clusters
  namespace: openshift-gitops
  annotations:
    argocd.argoproj.io/sync-wave: "100"
spec:
  destination:
    server: https://kubernetes.default.svc
    namespace: clusters-sub
  project: ztp-app-project
  source:
    path: ztp/gitops-subscriptions/argocd/example/siteconfig
    repoURL: https://github.com/openshift-kni/cnf-features-deploy
    targetRevision: master
    # Uncomment the plugin below if you add the plugin binaries to the same repository directory as
    # the siteconfig.yaml AND use the ../../hack/patch-argocd-dev.sh script to re-patch the repo-server deployment.
#    plugin:
#      name: kustomize-with-local-plugins
  ignoreDifferences: # recommended way to allow the ACM controller to manage its fields; an alternative approach is documented below (1)
    - group: cluster.open-cluster-management.io
      kind: ManagedCluster
      managedFieldsManagers:
        - controller
# (1) Alternatively, you can ignore specific paths by replacing managedFieldsManagers with jsonPointers:
#      jsonPointers:
#        - /metadata/labels/cloud
#        - /metadata/labels/vendor
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=background
      - RespectIgnoreDifferences=true
gitops-cluster-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitops-cluster
  annotations:
    argocd.argoproj.io/sync-wave: "100"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: openshift-gitops-argocd-application-controller
  namespace: openshift-gitops
gitops-policy-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitops-policy
  annotations:
    argocd.argoproj.io/sync-wave: "100"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: open-cluster-management:cluster-manager-admin
subjects:
- kind: ServiceAccount
  name: openshift-gitops-argocd-application-controller
  namespace: openshift-gitops
kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - app-project.yaml
  - policies-app-project.yaml
  - gitops-policy-rolebinding.yaml
  - gitops-cluster-rolebinding.yaml
  - clusters-app.yaml
  - policies-app.yaml
  - AddPluginsPolicy.yaml
policies-app-project.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: policy-app-project
  namespace: openshift-gitops
  annotations:
    argocd.argoproj.io/sync-wave: "100"
spec:
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  destinations:
  - namespace: 'ztp*'
    server: '*'
  - namespace: 'policies-sub'
    server: '*'
  namespaceResourceWhitelist:
  - group: ''
    kind: ConfigMap
  - group: ''
    kind: Namespace
  - group: 'apps.open-cluster-management.io'
    kind: PlacementRule
  - group: 'policy.open-cluster-management.io'
    kind: Policy
  - group: 'policy.open-cluster-management.io'
    kind: PlacementBinding
  - group: 'ran.openshift.io'
    kind: PolicyGenTemplate
  - group: cluster.open-cluster-management.io
    kind: Placement
  - group: policy.open-cluster-management.io
    kind: PolicyGenerator
  - group: policy.open-cluster-management.io
    kind: PolicySet
  - group: cluster.open-cluster-management.io
    kind: ManagedClusterSetBinding
  sourceRepos:
  - '*'
policies-app.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: policies-sub
  annotations:
    argocd.argoproj.io/sync-wave: "100"
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: policies
  namespace: openshift-gitops
  annotations:
    argocd.argoproj.io/sync-wave: "100"
spec:
  destination:
    server: https://kubernetes.default.svc
    namespace: policies-sub
  project: policy-app-project
  source:
    path: ztp/gitops-subscriptions/argocd/example/policygentemplates
    repoURL: https://github.com/openshift-kni/cnf-features-deploy
    targetRevision: master
    # Uncomment the plugin below if you add the plugin binaries to the same repository directory as
    # the policyGenTemplate.yaml AND use the ../../hack/patch-argocd-dev.sh script to re-patch the repo-server deployment.
    #  plugin:
    #    name: kustomize-with-local-plugins
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Logging reference YAML

clusterLogNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    workload.openshift.io/allowed: management
clusterLogOperGroup.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging
clusterLogSubscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable-6.2"
  name: cluster-logging
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic

Installation reference YAML

agent-config.yaml
---
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: hub  # must match the cluster name in install-config.yaml
rendezvousIP: 192.168.125.20  # the IP address of one of the control plane (master) nodes
# Replace the fields below with your network details
hosts:
  - hostname: hub-ctl-0
    role: master
    interfaces:
      - name: ens3
        macAddress: aa:aa:aa:aa:01:01
    networkConfig:
      interfaces:
        - name: ens3
          mac-address: aa:aa:aa:aa:01:01
          ipv4:
            enabled: true
            dhcp: true
          ipv6:
            enabled: true
            dhcp: false
            address:
              - ip: fd01::20
                prefix-length: 64
      routes:
        config:
          - destination: ::/0
            next-hop-address: fd01::1
            next-hop-interface: ens3
            table-id: 254
    rootDeviceHints:
      deviceName: "/dev/disk/by-path/pci-0000:00:07.0"
  - hostname: hub-ctl-1
    role: master
    interfaces:
      - name: ens3
        macAddress: aa:aa:aa:aa:01:02
    networkConfig:
      interfaces:
        - name: ens3
          mac-address: aa:aa:aa:aa:01:02
          ipv4:
            enabled: true
            dhcp: true
          ipv6:
            enabled: true
            dhcp: false
            address:
              - ip: fd01::21
                prefix-length: 64
      routes:
        config:
          - destination: ::/0
            next-hop-address: fd01::1
            next-hop-interface: ens3
            table-id: 254
    rootDeviceHints:
      deviceName: "/dev/disk/by-path/pci-0000:00:07.0"
  - hostname: hub-ctl-2
    role: master
    interfaces:
      - name: ens3
        macAddress: aa:aa:aa:aa:01:03
    networkConfig:
      interfaces:
        - name: ens3
          mac-address: aa:aa:aa:aa:01:03
          ipv4:
            enabled: true
            dhcp: true
          ipv6:
            enabled: true
            dhcp: false
            address:
              - ip: fd01::22
                prefix-length: 64
      routes:
        config:
          - destination: ::/0
            next-hop-address: fd01::1
            next-hop-interface: ens3
            table-id: 254
    rootDeviceHints:
      deviceName: "/dev/disk/by-path/pci-0000:00:07.0"
install-config.yaml
---
apiVersion: v1
metadata:
  name: hub  # replace with your hub name
baseDomain: example.com  # replace with your domain name
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd02::/48
    hostPrefix: 64
  machineNetwork:
  - cidr: 192.168.125.0/24  # replace with your machine network CIDR
  - cidr: fd01::/64
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd03::/112
# Replace the fields below with your network details
platform:
  baremetal:
    provisioningNetwork: "Disabled"
    apiVIPs:
    - 192.168.125.10
    - fd01::10
    ingressVIPs:
    - 192.168.125.11
    - fd01::11
# Replace <registry.example.com:8443> with the mirror registry's address.
imageDigestSources:
- mirrors:
  - <registry.example.com:8443>/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - <registry.example.com:8443>/openshift-release-dev/ocp-v4.0-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
# Add the mirror registry SSL certificate chain up to the CA itself.
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  MIID7jCCAtagAwXXX...
  -----END CERTIFICATE-----
  -----BEGIN CERTIFICATE-----
  MIIDvTCCAqWgAwIBAgIUcXQpXXX...
  -----END CERTIFICATE-----
# Add the mirror registry credentials to the pull secret.
pullSecret: '{"auths":{"<registry.example.com:8443>":{"auth": "aW5pdDo0R1XXXXXjdCbUoweUNuMWI1OTZBMmhkcEhjMw==","email": "user@redhat.com"},...}}}'
# Add the SSH public key used to connect to the cluster nodes.
sshKey: |
  ssh-rsa AAAAB3NzaC1yc2EA...

Telco hub reference configuration software specifications

The telco hub 4.19 solution has been validated using the following Red Hat software products for OKD clusters.

Table 6. Telco hub cluster validated software components
Component                                             Software version
OKD                                                   4.19
Local Storage Operator                                4.19
Red Hat OpenShift Data Foundation (ODF)               4.18
Red Hat Advanced Cluster Management (RHACM)           2.13
Red Hat OpenShift GitOps                              1.16
GitOps Zero Touch Provisioning (ZTP) plugins          4.19
multicluster engine Operator PolicyGenerator plugin   2.12
Topology Aware Lifecycle Manager (TALM)               4.19
Cluster Logging Operator                              6.2
OpenShift API for Data Protection (OADP)              The version aligned with the RHACM release