Deploying hosted control planes on IBM Power - Deploying hosted control planes | Hosted control planes

Prerequisites to configure hosted control planes on IBM Power
IBM Power infrastructure requirements
DNS configuration for hosted control planes on IBM Power
Creating a hosted cluster on bare metal
Creating heterogeneous node pools on agent hosted clusters

You can deploy hosted control planes by configuring a cluster to function as a hosting cluster. The hosting cluster is an OKD cluster where the control planes are hosted. The hosting cluster is also known as the management cluster.

The management cluster is not the managed cluster. A managed cluster is a cluster that the hub cluster manages.

The multicluster engine Operator supports only the default local-cluster, which is a hub cluster that is managed, and the hub cluster as the hosting cluster.

To provision hosted control planes on bare metal, you can use the Agent platform. The Agent platform uses the central infrastructure management service to add worker nodes to a hosted cluster. For more information, see "Enabling the central infrastructure management service".

Each IBM Power host must be started with a Discovery Image that the central infrastructure management provides. After each host starts, it runs an Agent process to discover the details of the host and completes the installation. An Agent custom resource represents each host.

When you create a hosted cluster with the Agent platform, HyperShift installs the Agent Cluster API provider in the hosted control plane namespace.

Prerequisites to configure hosted control planes on IBM Power

The multicluster engine for Kubernetes Operator version 2.7 and later installed on an OKD cluster. The multicluster engine Operator is automatically installed when you install Red Hat Advanced Cluster Management (RHACM). You can also install the multicluster engine Operator without RHACM as an Operator from the OKD OperatorHub.
The multicluster engine Operator must have at least one managed OKD cluster. The local-cluster managed hub cluster is automatically imported in the multicluster engine Operator version 2.7 and later. For more information about local-cluster, see Advanced configuration in the RHACM documentation. You can check the status of your hub cluster by running the following command:
```
$ oc get managedclusters local-cluster
```
You need a hosting cluster with at least 3 worker nodes to run the HyperShift Operator.
You need to enable the central infrastructure management service. For more information, see "Enabling the central infrastructure management service".
You need to install the hosted control plane command-line interface. For more information, see "Installing the hosted control plane command-line interface".

The hosted control planes feature is enabled by default. If you disabled the feature and want to manually enable it, see "Manually enabling the hosted control planes feature". If you need to disable the feature, see "Disabling the hosted control planes feature".

Additional resources

IBM Power infrastructure requirements

The Agent platform does not create any infrastructure, but requires the following resources for infrastructure:

Agents: An Agent represents a host that is booted with a discovery image and is ready to be provisioned as an OKD node.
DNS: The API and Ingress endpoints must be routable.

DNS configuration for hosted control planes on IBM Power

The API server for the hosted cluster is exposed. A DNS entry must exist for the api.<hosted_cluster_name>.<basedomain> entry that points to the destination where the API server is reachable.

The DNS entry can be as simple as a record that points to one of the nodes in the managed cluster that is running the hosted control plane.

The entry can also point to a load balancer that is deployed to redirect incoming traffic to the ingress pods.

See the following example of a DNS configuration:

$ cat /var/named/<example.krnl.es.zone>

Example output

$ TTL 900
@ IN  SOA bastion.example.krnl.es.com. hostmaster.example.krnl.es.com. (
      2019062002
      1D 1H 1W 3H )
  IN NS bastion.example.krnl.es.com.
;
;
api                   IN A 1xx.2x.2xx.1xx (1)
api-int               IN A 1xx.2x.2xx.1xx
;
;
*.apps.<hosted-cluster-name>.<basedomain>           IN A 1xx.2x.2xx.1xx
;
;EOF

1	The record refers to the IP address of the API load balancer that handles ingress and egress traffic for hosted control planes.

For IBM Power, add IP addresses that correspond to the IP address of the agent.

Example configuration

compute-0              IN A 1xx.2x.2xx.1yy
compute-1              IN A 1xx.2x.2xx.1yy

Creating a hosted cluster on bare metal

When you create a hosted cluster with the Agent platform, HyperShift installs the Agent Cluster API provider in the hosted control plane namespace. You can create a hosted cluster on bare metal or import one.

As you create a hosted cluster, keep the following guidelines in mind:

Each hosted cluster must have a cluster-wide unique name. A hosted cluster name cannot be the same as any existing managed cluster in order for multicluster engine Operator to manage it.
Do not use clusters as a hosted cluster name.
A hosted cluster cannot be created in the namespace of a multicluster engine Operator managed cluster.
The most common service publishing strategy is to expose services through a load balancer. That strategy is the preferred method for exposing the Kubernetes API server. If you create a hosted cluster by using the web console or by using Red Hat Advanced Cluster Management, to set a publishing strategy for a service besides the Kubernetes API server, you must manually specify the servicePublishingStrategy information in the HostedCluster custom resource.

Procedure

Create the hosted control plane namespace by entering the following command:
```
$ oc create ns <hosted_cluster_namespace>-<hosted_cluster_name>
```
Replace <hosted_cluster_namespace> with your hosted cluster namespace name, for example, clusters. Replace <hosted_cluster_name> with your hosted cluster name.

Verify that you have a default storage class configured for your cluster. Otherwise, you might see pending PVCs. Run the following command:

$ hcp create cluster agent \
    --name=<hosted_cluster_name> \(1)
    --pull-secret=<path_to_pull_secret> \(2)
    --agent-namespace=<hosted_control_plane_namespace> \(3)
    --base-domain=<basedomain> \(4)
    --api-server-address=api.<hosted_cluster_name>.<basedomain> \(5)
    --etcd-storage-class=<etcd_storage_class> \(6)
    --ssh-key  <path_to_ssh_public_key> \(7)
    --namespace <hosted_cluster_namespace> \(8)
    --control-plane-availability-policy HighlyAvailable \(9)
    --release-image=quay.io/openshift-release-dev/ocp-release:<ocp_release_image> \(10)
    --node-pool-replicas <node_pool_replica_count> (11)

1	Specify the name of your hosted cluster, for instance, `example`.
2	Specify the path to your pull secret, for example, `/user/name/pullsecret`.
3	Specify your hosted control plane namespace, for example, `clusters-example`. Ensure that agents are available in this namespace by using the `oc get agent -n <hosted_control_plane_namespace>` command.
4	Specify your base domain, for example, `krnl.es`.
5	The `--api-server-address` flag defines the IP address that is used for the Kubernetes API communication in the hosted cluster. If you do not set the `--api-server-address` flag, you must log in to connect to the management cluster.
6	Specify the etcd storage class name, for example, `lvm-storageclass`.
7	Specify the path to your SSH public key. The default file path is `~/.ssh/id_rsa.pub`.
8	Specify your hosted cluster namespace.
9	Specify the availability policy for the hosted control plane components. Supported options are `SingleReplica` and `HighlyAvailable`. The default value is `HighlyAvailable`.
10	Specify the supported OKD version that you want to use, for example, `4.19.0-multi`. If you are using a disconnected environment, replace `<ocp_release_image>` with the digest image. To extract the OKD release image digest, see Extracting the OKD release image digest.
11	Specify the node pool replica count, for example, `3`. You must specify the replica count as `0` or greater to create the same number of replicas. Otherwise, no node pools are created.

After a few moments, verify that your hosted control plane pods are up and running by entering the following command:

$ oc -n <hosted_cluster_namespace>-<hosted_cluster_name> get pods

Example output

NAME                                             READY   STATUS    RESTARTS   AGE
capi-provider-7dcf5fc4c4-nr9sq                   1/1     Running   0          4m32s
catalog-operator-6cd867cc7-phb2q                 2/2     Running   0          2m50s
certified-operators-catalog-884c756c4-zdt64      1/1     Running   0          2m51s
cluster-api-f75d86f8c-56wfz                      1/1     Running   0          4m32s

Creating heterogeneous node pools on agent hosted clusters

On the agent platform, you can create heterogeneous node pools so that your clusters can run diverse machine types, such as x86_64 or ppc64le, within a single hosted cluster.

About creating heterogeneous node pools on agent hosted clusters

A node pool is a group of nodes within a cluster that share the same configuration. Heterogeneous node pools are pools that have different configurations, allowing you to create pools optimized for various workloads.

You can create heterogeneous node pools on the agent platform. It enables clusters to run diverse machine types, for example, x86_64 or ppc64le, within a single hosted cluster.

To create a heterogeneous node pool, perform the following general steps, as described in the following sections:

Create an AgentServiceConfig custom resource (CR) that informs the Operator how much storage is needed for components such as the database and filesystem. The CR also defines which OKD versions to maintain.
Create an agent cluster.
Create the heterogeneous node pool.
Configure DNS for hosted control planes
Create an InfraEnv custom resource (CR) for each architecture.
Add agents to the heterogeneous cluster.

Creating the AgentServiceConfig custom resource

To create heterogeneous node pools on an agent hosted cluster, you first need to create the AgentServiceConfig CR with two heterogeneous architecture operating system (OS) images.

Procedure

Run the following command:

$ envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
 name: agent
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: <db_volume_name> (1)
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: <fs_volume_name> (2)
  osImages:
    - openshiftVersion: <ocp_version> (3)
      version: <ocp_release_version_x86> (4)
      url: <iso_url_x86> (5)
      rootFSUrl: <root_fs_url_x8> (6)
      cpuArchitecture: <arch_x86> (7)
    - openshiftVersion: <ocp_version> (8)
      version: <ocp_release_version_ppc64le> (9)
      url: <iso_url_ppc64le> (10)
      rootFSUrl: <root_fs_url_ppc64le> (11)
      cpuArchitecture: <arch_ppc64le> (12)
EOF

1	Specify the multicluster engine for Kubernetes Operator `agentserviceconfig` config, database volume name.
2	Specify the multicluster engine Operator `agentserviceconfig` config, filesystem volume name.
3	Specify the current version of OKD.
4	Specify the current OKD release version for x86.
5	Specify the ISO URL for x86.
6	Specify the root filesystem URL for X86.
7	Specify the CPU architecture for x86.
8	Specify the current OKD version.
9	Specify the OKD release version for `ppc64le`.
10	Specify the ISO URL for `ppc64le`.
11	Specify the root filesystem URL for `ppc64le`.
12	Specify the CPU architecture for `ppc64le`.

Create an agent cluster

An agent cluster is a cluster that is managed and provisioned using an agent-based approach. An agent cluster can utilize heterogeneous node pools, allowing for different types of worker nodes to be used within the same cluster.

Prerequisites

You used a multi-architecture release image to enable support for heterogeneous node pools when creating a hosted cluster. Find the latest multi-architecture images on the Multi-arch release images page.

Procedure

Create an environment variable for the cluster namespace by running the following command:
```
$ export CLUSTERS_NAMESPACE=<hosted_cluster_namespace>
```
Create an environment variable for the machine classless inter-domain routing (CIDR) notation by running the following command:
```
$ export MACHINE_CIDR=192.168.122.0/24
```
Create the hosted control namespace by running the following command:
```
$ oc create ns <hosted_control_plane_namespace>
```

Create the cluster by running the following command:

$ hcp create cluster agent \
    --name=<hosted_cluster_name> \(1)
    --pull-secret=<pull_secret_file> \(2)
    --agent-namespace=<hosted_control_plane_namespace> \(3)
    --base-domain=<basedomain> \(4)
    --api-server-address=api.<hosted_cluster_name>.<basedomain> \
    --release-image=quay.io/openshift-release-dev/ocp-release:<ocp_release> (5)

1	Specify the hosted cluster name.
2	Specify the pull secret file path.
3	Specify the namespace for the hosted control plane.
4	Specify the base domain for the hosted cluster.
5	Specify the current OKD release version.

Creating heterogeneous node pools

You create heterogeneous node pools by using the NodePool custom resource (CR).

Procedure

To define a NodePool CR, create a YAML file similar to the following example:

envsubst <<"EOF" | oc apply -f -
apiVersion:apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <hosted_cluster_name>
  namespace: <clusters_namespace>
spec:
  arch: <arch_ppc64le>
  clusterName: <hosted_cluster_name>
  management:
    autoRepair: false
    upgradeType: InPlace
  nodeDrainTimeout: 0s
  nodeVolumeDetachTimeout: 0s
  platform:
    agent:
      agentLabelSelector:
        matchLabels:
          inventory.agent-install.openshift.io/cpu-architecture: <arch_ppc64le> (1)
    type: Agent
  release:
    image: quay.io/openshift-release-dev/ocp-release:<ocp_release>
  replicas: 0
EOF

1	This selector block is selects the agents that match the specified label. To create a node pool of architecture `ppc64le` with zero replicas, specify `ppc64le`. This ensures that it selects only agents from `ppc64le` architecture when it scales.

DNS configuration for hosted control planes

Domain Name Service (DNS) configuration for hosted control planes enables external clients to reach ingress controllers so that they can route traffic to internal components. Configuring this setting ensures that traffic is routed to either ppc64le or x86_64 compute node.

You can point an *.apps.<cluster_name> record to either of the compute nodes where the ingress application is hosted. Or, if you are able to set up a load balancer on top of the compute nodes, point this record to this load balancer. When you are creating a heterogeneous node pool, make sure the compute nodes can reach each other or keep them in the same network.

Creating infrastructure environment resources

For heterogeneous node pools, you must create an infraEnv custom resource (CR) for each architecture. For example, for node pools with x86_64 and ppc64le architectures, create an InfraEnv CR for x86_64 and ppc64le.

Before proceeding, make sure that the operating system (OS) images for both x86_64 and ppc64le architectures are added to the AgentServiceConfig resource. After this, you can use the InfraEnv resources to get the minimal ISO image.

Procedure

Create the InfraEnv resource with x86_64 architecture for heterogeneous node pools by running the following command:

$ envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: <hosted_cluster_name>-<arch_x86>  (1) (2)
  namespace: <hosted_control_plane_namespace> (3)
spec:
  cpuArchitecture: <arch_x86>
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: <ssh_pub_key> (4)
EOF

1	The hosted cluster name.
2	The `x86_64` architecture.
3	The hosted control plane namespace.
4	The ssh public key.

Create the InfraEnv resource with ppc64le architecture for heterogeneous node pools by running the following command:

envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: <hosted_cluster_name>-<arch_ppc64le>  (1) (2)
  namespace: <hosted_control_plane_namespace> (3)
spec:
  cpuArchitecture: <arch_ppc64le>
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: <ssh_pub_key> (4)
EOF

1	The hosted cluster name.
2	The `ppc64le` architecture.
3	The hosted control plane namespace.
4	The ssh public key.

To verify that the InfraEnv resources are created successfully run the following commands:
- Verify that the x86_64 InfraEnv resource is created successfully:
  $ oc describe InfraEnv <hosted_cluster_name>-<arch_x86>
- Verify that the ppc64le InfraEnv resource is created successfully:
  $ oc describe InfraEnv <hosted_cluster_name>-<arch_ppc64le>

Generate a live ISO that allows either a virtual machine or a bare-metal machine to join as agents by running the following commands:

Generate a live ISO for x86_64:

$ oc -n <hosted_control_plane_namespace> get InfraEnv <hosted_cluster_name>-<arch_x86> -ojsonpath="{.status.isoDownloadURL}"

Generate a live ISO for ppc64le:

$ oc -n <hosted_control_plane_namespace> get InfraEnv <hosted_cluster_name>-<arch_ppc64le> -ojsonpath="{.status.isoDownloadURL}"

Adding agents to the heterogeneous cluster

You add agents by manually configuring the machine to boot with a live ISO. You can download the live ISO and use it to boot a bare-metal node or a virtual machine. On boot, the node communicates with the assisted-service and registers as an agent in the same namespace as the InfraEnv resource. When each agent is created, you can optionally set its installation_disk_id and hostname parameters in the specifications. When you are done, approve it to indicate that the agent is ready for use.

Procedure

Obtain a list of agents by running the following command:

oc -n <hosted_control_plane_namespace> get agents

Example output

NAME                                   CLUSTER   APPROVED   ROLE          STAGE
86f7ac75-4fc4-4b36-8130-40fa12602218                        auto-assign
e57a637f-745b-496e-971d-1abbf03341ba                        auto-assign

Patch an agent by running the following command:

oc -n <hosted_control_plane_namespace> patch agent 86f7ac75-4fc4-4b36-8130-40fa12602218 -p '{"spec":{"installation_disk_id":"/dev/sda","approved":true,"hostname":"worker-0.example.krnl.es"}}' --type merge

Patch the second agent by running the following command:

oc -n <hosted_control_plane_namespace> patch agent 23d0c614-2caa-43f5-b7d3-0b3564688baa -p '{"spec":{"installation_disk_id":"/dev/sda","approved":true,"hostname":"worker-1.example.krnl.es"}}' --type merge

Check the agent approval status by running the following command:

oc -n <hosted_control_plane_namespace> get agents

Example output

NAME                                   CLUSTER   APPROVED   ROLE          STAGE
86f7ac75-4fc4-4b36-8130-40fa12602218             true       auto-assign
e57a637f-745b-496e-971d-1abbf03341ba             true       auto-assign

Scaling the node pool

After your agents are approved, you can scale the node pools. The agentLabelSelector value that is configured in the node pool ensures that only matching agents are added to the cluster. This also helps scale down the node pool. To remove specific architecture nodes from the cluster, scale down the corresponding node pool.

Procedure

Scale the node pool by running the following command:

$ oc -n <clusters_namespace> scale nodepool <nodepool_name> --replicas 2

The Cluster API agent provider picks two agents randomly to assign to the hosted cluster. These agents pass through different states and then join the hosted cluster as OKD nodes. The various agent states are binding, discovering, insufficient, installing, installing-in-progress, and added-to-existing-cluster.

Verification

List the agents by running the following command:

$ oc -n <hosted_control_plane_namespace> get agent

Example output

NAME                                   CLUSTER         APPROVED   ROLE          STAGE
4dac1ab2-7dd5-4894-a220-6a3473b67ee6   hypercluster1   true       auto-assign
d9198891-39f4-4930-a679-65fb142b108b                   true       auto-assign
da503cf1-a347-44f2-875c-4960ddb04091   hypercluster1   true       auto-assign

Check the status of a specific scaled agent by running the following command:

$ oc -n <hosted_control_plane_namespace> get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}'

Example output

BMH: ocp-worker-2 Agent: 4dac1ab2-7dd5-4894-a220-6a3473b67ee6 State: binding
BMH: ocp-worker-0 Agent: d9198891-39f4-4930-a679-65fb142b108b State: known-unbound
BMH: ocp-worker-1 Agent: da503cf1-a347-44f2-875c-4960ddb04091 State: insufficient

Once the agents reach the added-to-existing-cluster state, verify that the OKD nodes are ready by running the following command:

$ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes

Example output

NAME           STATUS   ROLES    AGE     VERSION
ocp-worker-1   Ready    worker   5m41s   v1.24.0+3882f8f
ocp-worker-2   Ready    worker   6m3s    v1.24.0+3882f8f

Adding workloads to the nodes can reconcile some cluster operators. The following command displays that two machines are created after scaling up the node pool:

$ oc -n <hosted_control_plane_namespace> get machines

Example output

NAME                            CLUSTER               NODENAME       PROVIDERID                                     PHASE     AGE   VERSION
hypercluster1-c96b6f675-m5vch   hypercluster1-b2qhl   ocp-worker-1   agent://da503cf1-a347-44f2-875c-4960ddb04091   Running   15m   4.11.5
hypercluster1-c96b6f675-tl42p   hypercluster1-b2qhl   ocp-worker-2   agent://4dac1ab2-7dd5-4894-a220-6a3473b67ee6   Running   15m   4.11.5