This is a cache of https://docs.openshift.com/rosa/rosa_architecture/rosa_policy_service_definition/rosa-sre-access.html. It is a snapshot of the page at 2024-11-30T03:01:42.013+0000.
SRE and service account access - Policies and service definition | Introduction to ROSA | Red Hat OpenShift Service on AWS

This document outlines how Red Hat site reliability engineering (SRE) teams access Red Hat OpenShift Service on AWS (ROSA) clusters and how that access is controlled through identity and access management.

Identity and access management

Most access by Red Hat SRE teams is done by using cluster Operators through automated configuration management.

Subprocessors

For a list of the available subprocessors, see the Red Hat Subprocessor List on the Red Hat Customer Portal.

SRE cluster access

SRE access to Red Hat OpenShift Service on AWS (ROSA) clusters is controlled through several layers of required authentication, all of which are managed by strict company policy. All authentication attempts to access a cluster and changes made within a cluster are recorded within audit logs, along with the specific account identity of the SRE responsible for those actions. These audit logs help ensure that all changes made by SREs to a customer’s cluster adhere to the strict policies and procedures that make up Red Hat’s managed services guidelines.

The information presented below is an overview of the process an SRE must perform to access a customer’s cluster.

  • The SRE requests a refreshed ID token from the Red Hat SSO (Cloud Services). This request is authenticated. The token is valid for 15 minutes. After the token expires, the SRE can refresh it and receive a new token. Tokens can be refreshed indefinitely; however, the ability to refresh is revoked after 30 days of inactivity.

  • The SRE connects to the Red Hat VPN. Authentication to the VPN is completed by the Red Hat Corporate Identity and Access Management system (RH IAM). With RH IAM, SREs are authenticated with multiple factors and can be managed internally per organization by groups and by existing onboarding and offboarding processes. After an SRE is authenticated and connected, the SRE can access the cloud services fleet management plane. Changes to the cloud services fleet management plane require many layers of approval and are maintained by strict company policy.

  • After authorization is complete, the SRE logs into the fleet management plane and receives a service account token that the fleet management plane created. The token is valid for 15 minutes. After the token is no longer valid, it is deleted.

  • With access granted to the fleet management plane, SRE uses various methods to access clusters, depending on network configuration.

    • Accessing a private or public cluster: Request is sent through a specific Network Load Balancer (NLB) by using an encrypted HTTP connection on port 6443.

    • Accessing a PrivateLink cluster: Request is sent to the Red Hat Transit Gateway, which then connects to a Red Hat VPC per region. The VPC that receives the request depends on the target private cluster’s region. Within the VPC, there is a private subnet that contains the PrivateLink endpoint to the customer’s PrivateLink cluster.
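The token lifetime rules in the first step (15-minute validity, refresh revoked after 30 days of inactivity) can be sketched as a small model. This is only an illustration, not Red Hat's implementation: the `RefreshState` class is hypothetical, and it assumes that "inactivity" is measured from the last time a token was issued.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(minutes=15)   # stated validity of the ID token
INACTIVITY_LIMIT = timedelta(days=30)    # refresh is revoked after this much inactivity


@dataclass
class RefreshState:
    """Hypothetical record of when a token was last issued to an SRE."""

    last_issued: datetime

    def token_valid(self, now: datetime) -> bool:
        # A token is usable only within 15 minutes of being issued.
        return now - self.last_issued < TOKEN_LIFETIME

    def can_refresh(self, now: datetime) -> bool:
        # Assumption: inactivity is counted from the last issued token.
        return now - self.last_issued < INACTIVITY_LIMIT

    def refresh(self, now: datetime) -> None:
        # Issue a new token, unless the refresh ability has been revoked.
        if not self.can_refresh(now):
            raise PermissionError("refresh revoked after 30 days of inactivity")
        self.last_issued = now
```

In this model an SRE who refreshes at least once every 30 days can keep obtaining tokens indefinitely, while each individual token still expires after 15 minutes.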

SREs access ROSA clusters through the web console or command line interface (CLI) tools. Authentication requires multi-factor authentication (MFA) with industry-standard requirements for password complexity and account lockouts. SREs must authenticate as individuals to ensure auditability. All authentication attempts are logged to a Security Information and Event Management (SIEM) system.

SREs access private clusters using an encrypted HTTP connection. Connections are permitted only from a secured Red Hat network using either an IP allowlist or a private cloud provider link.

Figure 1. SRE access to ROSA clusters

Privileged access controls in ROSA

SRE adheres to the principle of least privilege when accessing ROSA and AWS components. There are four basic categories of manual SRE access:

  • SRE admin access through the Red Hat Portal with normal two-factor authentication and no privileged elevation.

  • SRE admin access through the Red Hat corporate SSO with normal two-factor authentication and no privileged elevation.

  • OpenShift elevation, which is a manual elevation using Red Hat SSO. Access is limited to 2 hours, is fully audited, and requires management approval.

  • AWS access or elevation, which is a manual elevation for AWS console or CLI access. Access is limited to 60 minutes and is fully audited.

Each of these access types has a different level of access to components:

| Component | Typical SRE admin access (Red Hat Portal) | Typical SRE admin access (Red Hat SSO) | OpenShift elevation | Cloud provider access or elevation |
| --- | --- | --- | --- | --- |
| OpenShift Cluster Manager | R/W | No access | No access | No access |
| OpenShift console | No access | R/W | R/W | No access |
| Node operating system | No access | A specific list of elevated OS and network permissions. | A specific list of elevated OS and network permissions. | No access |
| AWS Console | No access | No access, but this is the account used to request cloud provider access. | No access | All cloud provider permissions using the SRE identity. |
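The access table above can also be read as a lookup structure. The following is a minimal sketch that transcribes it into a Python dictionary; the shortened access-type labels, the constant names, and the `access_level` helper are illustrative choices rather than anything defined by ROSA. Anything not listed is treated as "No access", in line with the least-privilege principle.

```python
from datetime import timedelta

NO_ACCESS = "No access"
ELEVATED_OS = "Specific list of elevated OS and network permissions"

# Component -> access type -> level, transcribed from the table.
# Combinations that are absent here are "No access" in the table.
ACCESS_MATRIX = {
    "OpenShift Cluster Manager": {
        "SRE admin (Red Hat Portal)": "R/W",
    },
    "OpenShift console": {
        "SRE admin (Red Hat SSO)": "R/W",
        "OpenShift elevation": "R/W",
    },
    "Node operating system": {
        "SRE admin (Red Hat SSO)": ELEVATED_OS,
        "OpenShift elevation": ELEVATED_OS,
    },
    "AWS Console": {
        "Cloud provider access or elevation": "All cloud provider permissions",
    },
}

# Time limits stated for the two manual elevation types.
MAX_ELEVATION = {
    "OpenShift elevation": timedelta(hours=2),
    "Cloud provider access or elevation": timedelta(minutes=60),
}


def access_level(component: str, access_type: str) -> str:
    """Return the documented access level, defaulting to no access."""
    return ACCESS_MATRIX.get(component, {}).get(access_type, NO_ACCESS)
```

For example, `access_level("OpenShift console", "OpenShift elevation")` returns `"R/W"`, while the same component under cloud provider elevation returns `"No access"`.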

SRE access to AWS accounts

Red Hat personnel do not access AWS accounts in the course of routine Red Hat OpenShift Service on AWS operations. For emergency troubleshooting purposes, the SREs have well-defined and auditable procedures to access cloud infrastructure accounts.

In the isolated backplane flow, SREs request access to a customer’s support role. This request is processed just-in-time (JIT) by the backplane API, which dynamically updates the organization role’s permissions to grant access to a specific SRE’s account. This SRE’s account is given access to a specific Red Hat customer’s environment. SRE access to a Red Hat customer’s environment is temporary and short-lived, and is established only at the time of the access request.

Access to the STS token is audit-logged and traceable back to individual users. Both STS and non-STS clusters use the AWS STS service for SRE access. Access control uses the unified backplane flow when the ManagedOpenShift-Technical-Support-Role has the ManagedOpenShift-Support-Access policy attached, and this role is used for administration. Access control uses the isolated backplane flow when the ManagedOpenShift-Support-Role has the ManagedOpenShift-Technical-Support-<org_id> policy attached. See the KCS article Updating Trust Policies for ROSA clusters for more information.
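The rule for which backplane flow applies can be written as a small decision function. This is a sketch of the rule exactly as stated in the paragraph above, not of the backplane API itself; the `backplane_flow` function and its parameters are hypothetical, and `org_id` stands in for the `<org_id>` placeholder in the policy name.

```python
from typing import Optional, Set


def backplane_flow(role_name: str, attached_policies: Set[str], org_id: str) -> Optional[str]:
    """Return "unified", "isolated", or None if neither documented rule matches."""
    if (
        role_name == "ManagedOpenShift-Technical-Support-Role"
        and "ManagedOpenShift-Support-Access" in attached_policies
    ):
        return "unified"
    if (
        role_name == "ManagedOpenShift-Support-Role"
        and f"ManagedOpenShift-Technical-Support-{org_id}" in attached_policies
    ):
        return "isolated"
    # Neither role/policy pairing from the text applies.
    return None
```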

SRE STS view of AWS accounts

When SREs are connected to the VPN using two-factor authentication, they and Red Hat Support can assume the ManagedOpenShift-Support-Role in your AWS account. The ManagedOpenShift-Support-Role has all the permissions necessary for SREs to directly troubleshoot and manage AWS resources. Upon assumption of the ManagedOpenShift-Support-Role, SREs use the AWS Security Token Service (STS) to generate a unique, time-expiring URL to the customer’s AWS web UI for their account. SREs can then perform multiple troubleshooting actions, which include:

  • Viewing CloudTrail logs

  • Shutting down a faulty EC2 Instance

All activities performed by SREs arrive from Red Hat IP addresses and are logged to CloudTrail to allow you to audit and review all activity. This role is only used in cases where access to AWS services is required to assist you. The majority of permissions are read-only. However, a select few permissions have more access, including the ability to reboot an instance or spin up a new instance. SRE access is limited to the policy permissions attached to the ManagedOpenShift-Support-Role.
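Because this activity is logged to CloudTrail, a customer can review it by filtering CloudTrail records for sessions of the support role. The following is a minimal sketch under some assumptions: the records are already loaded as Python dictionaries in the standard CloudTrail record layout (`userIdentity.type` and `userIdentity.arn`), the role carries exactly the name used in this document with no path, and the `support_role_events` helper is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List

SUPPORT_ROLE = "ManagedOpenShift-Support-Role"


def support_role_events(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the CloudTrail records produced by sessions of the support role."""
    matches = []
    for record in records:
        identity = record.get("userIdentity", {})
        if identity.get("type") != "AssumedRole":
            continue
        # Assumed-role ARNs have the form
        # arn:aws:sts::<account-id>:assumed-role/<role-name>/<session-name>
        if f":assumed-role/{SUPPORT_ROLE}/" in identity.get("arn", ""):
            matches.append(record)
    return matches
```

The event names of the matching records (for example, `StopInstances`) then show which actions were taken during a support session.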

For a full list of permissions, see sts_support_permission_policy.json in the About IAM resources for ROSA clusters that use STS user guide.

The PrivateLink VPC endpoint service is created as part of ROSA cluster creation.

When you have a PrivateLink ROSA cluster, its Kubernetes API Server is exposed through a load balancer that can only be accessed from within the VPC by default. Red Hat site reliability engineering (SRE) can connect to this load balancer through a VPC Endpoint Service that has an associated VPC Endpoint in a Red Hat-owned AWS account. This endpoint service contains the name of the cluster, which is also in the ARN.

Under the Allow principals tab, a Red Hat-owned AWS account is listed. Restricting the list to this specific principal ensures that other entities cannot create VPC Endpoint connections to the PrivateLink cluster’s Kubernetes API Server.

When Red Hat SREs access the API, the fleet management plane can connect to the internal API through the VPC endpoint service.

Red Hat support access

Members of the Red Hat Customer Experience and Engagement (CEE) team typically have read-only access to parts of the cluster. Specifically, CEE has limited access to the core and product namespaces and does not have access to the customer namespaces.

| Role | Core namespace | Layered product namespace | Customer namespace | AWS account* |
| --- | --- | --- | --- | --- |
| OpenShift SRE | Read: All; Write: Very limited [1] | Read: All; Write: None | Read: None [2]; Write: None | Read: All [3]; Write: All [3] |
| CEE | Read: All; Write: None | Read: All; Write: None | Read: None [2]; Write: None | Read: None; Write: None |
| Customer administrator | Read: None; Write: None | Read: None; Write: None | Read: All; Write: All | Read: All; Write: All |
| Customer user | Read: None; Write: None | Read: None; Write: None | Read: Limited [4]; Write: Limited [4] | Read: None; Write: None |
| Everybody else | Read: None; Write: None | Read: None; Write: None | Read: None; Write: None | Read: None; Write: None |

  1. Limited to addressing common use cases such as failing deployments, upgrading a cluster, and replacing bad worker nodes.

  2. Red Hat associates have no access to customer data by default.

  3. SRE access to the AWS account is an emergency procedure for exceptional troubleshooting during a documented incident.

  4. Limited to what is granted through RBAC by the Customer Administrator and namespaces created by the user.

Customer access

Customer access is limited to namespaces created by the customer and permissions that are granted using RBAC by the Customer Administrator role. Access to the underlying infrastructure or product namespaces is generally not permitted without cluster-admin access. For more information about customer access and authentication, see the "Understanding Authentication" section of the documentation.

Access approval and review

New SRE user access requires management approval. Separated or transferred SRE accounts are removed as authorized users through an automated process. Additionally, the SRE team performs periodic access reviews, including management sign-off of authorized user lists.

The access and identity authorization table includes responsibilities for managing authorized access to clusters, applications, and infrastructure resources. This includes tasks such as providing access control mechanisms, authentication, authorization, and managing access to resources.

Resource: Logging

Service responsibilities (Red Hat):

  • Adhere to an industry standards-based tiered internal access process for platform audit logs.

  • Provide native OpenShift RBAC capabilities.

Customer responsibilities:

  • Configure OpenShift RBAC to control access to projects and by extension a project’s application logs.

  • For third-party or custom application logging solutions, the customer is responsible for access management.

Resource: Application networking

Service responsibilities (Red Hat):

  • Provide native OpenShift RBAC and dedicated-admin capabilities.

Customer responsibilities:

  • Configure OpenShift dedicated-admin and RBAC to control access to route configuration as required.

  • Manage organization administrators for Red Hat to grant access to OpenShift Cluster Manager. The cluster manager is used to configure router options and provide service load balancer quota.

Resource: Cluster networking

Service responsibilities (Red Hat):

  • Provide customer access controls through OpenShift Cluster Manager.

  • Provide native OpenShift RBAC and dedicated-admin capabilities.

Customer responsibilities:

  • Manage Red Hat organization membership of Red Hat accounts.

  • Manage organization administrators for Red Hat to grant access to OpenShift Cluster Manager.

  • Configure OpenShift dedicated-admin and RBAC to control access to route configuration as required.

Resource: Virtual networking management

Service responsibilities (Red Hat):

  • Provide customer access controls through OpenShift Cluster Manager.

Customer responsibilities:

  • Manage optional user access to AWS components through OpenShift Cluster Manager.

Resource: Virtual storage management

Service responsibilities (Red Hat):

  • Provide customer access controls through Red Hat OpenShift Cluster Manager.

Customer responsibilities:

  • Manage optional user access to AWS components through OpenShift Cluster Manager.

  • Create AWS IAM roles and attached policies necessary to enable ROSA service access.

Resource: Virtual compute management

Service responsibilities (Red Hat):

  • Provide customer access controls through Red Hat OpenShift Cluster Manager.

Customer responsibilities:

  • Manage optional user access to AWS components through OpenShift Cluster Manager.

  • Create AWS IAM roles and attached policies necessary to enable ROSA service access.

Resource: AWS software (public AWS services)

Service responsibilities (AWS):

  • Compute: Provide the Amazon EC2 service, used for ROSA control plane, infrastructure, and worker nodes.

  • Storage: Provide Amazon EBS, used to allow ROSA to provision local node storage and persistent volume storage for the cluster.

  • Storage: Provide Amazon S3, used for the service’s built-in image registry.

  • Networking: Provide AWS Identity and Access Management (IAM), used by customers to control access to ROSA resources running on customer accounts.

Customer responsibilities:

  • Create AWS IAM roles and attached policies necessary to enable ROSA service access.

  • Use IAM tools to apply the appropriate permissions to AWS resources in the customer account.

  • To enable ROSA across your AWS organization, the customer is responsible for managing AWS Organizations administrators.

  • To enable ROSA across your AWS organization, the customer is responsible for distributing the ROSA entitlement grant using AWS License Manager.

Resource: Hardware and AWS global infrastructure

Service responsibilities (AWS):

  • For information about physical access controls for AWS data centers, see Our Controls on the AWS Cloud Security page.

Customer responsibilities:

  • Customer is not responsible for AWS global infrastructure.

How service accounts assume AWS IAM roles in SRE owned projects

When you install a Red Hat OpenShift Service on AWS cluster that uses the AWS Security Token Service (STS), cluster-specific Operator AWS Identity and Access Management (IAM) roles are created. These IAM roles permit the Red Hat OpenShift Service on AWS cluster Operators to run core OpenShift functionality.

Cluster Operators use service accounts to assume IAM roles. When a service account assumes an IAM role, temporary STS credentials are provided for the service account to use in the cluster Operator’s pod. If the assumed role has the necessary AWS privileges, the service account can run AWS SDK operations in the pod.
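To make the inputs to role assumption concrete: the AWS SDK needs to know which role to assume and where the service account token is mounted. The following is a minimal sketch, assuming the credential configuration is an AWS shared-config style file that uses the standard SDK keys `role_arn` and `web_identity_token_file`. The actual secret contents are not shown in this document, and the account ID, role name, and `read_web_identity_config` helper are hypothetical.

```python
import configparser
from typing import Dict

# Hypothetical example of a credential configuration in AWS shared-config format.
SAMPLE_CREDENTIALS = """\
[default]
role_arn = arn:aws:iam::123456789012:role/example-operator-role
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
"""


def read_web_identity_config(text: str, profile: str = "default") -> Dict[str, str]:
    """Extract the role ARN and token path that a web identity login needs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]
    return {
        "role_arn": section["role_arn"],
        "web_identity_token_file": section["web_identity_token_file"],
    }
```

An SDK configured this way reads the token from the given path and presents it, together with the role ARN, when it requests temporary credentials.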

Workflow for assuming AWS IAM roles in SRE owned projects

The following diagram illustrates the workflow for assuming AWS IAM roles in SRE owned projects:

Figure 2. Workflow for assuming AWS IAM roles in SRE owned projects

The workflow has the following stages:

  1. Within each project that a cluster Operator runs, the Operator’s deployment spec has a volume mount for the projected service account token, and a secret containing AWS credential configuration for the pod. The token is audience-bound and time-bound. Every hour, Red Hat OpenShift Service on AWS generates a new token, and the AWS SDK reads the mounted secret containing the AWS credential configuration. This configuration has a path to the mounted token and the AWS IAM Role ARN. The secret’s credential configuration includes the following:

    • An $AWS_ARN_ROLE variable that has the ARN for the IAM role that has the permissions required to run AWS SDK operations.

    • An $AWS_WEB_IDENTITY_TOKEN_FILE variable that has the full path in the pod to the OpenID Connect (OIDC) token for the service account. The full path is /var/run/secrets/openshift/serviceaccount/token.

  2. When a cluster Operator needs to assume an AWS IAM role to access an AWS service (such as EC2), the AWS SDK client code running on the Operator invokes the AssumeRoleWithWebIdentity API call.

  3. The OIDC token is passed from the pod to the OIDC provider. The provider authenticates the service account identity if the following requirements are met:

    • The identity signature is valid and signed by the private key.

    • The sts.amazonaws.com audience is listed in the OIDC token and matches the audience configured in the OIDC provider.

      In Red Hat OpenShift Service on AWS with STS clusters, the OIDC provider is created during install and set as the service account issuer by default. The sts.amazonaws.com audience is set by default in the OIDC provider.

    • The OIDC token has not expired.

    • The issuer value in the token has the URL for the OIDC provider.

  4. If the project and service account are in the scope of the trust policy for the IAM role that is being assumed, then authorization succeeds.

  5. After successful authentication and authorization, temporary AWS STS credentials in the form of an AWS access key, secret key, and session token are passed to the pod for use by the service account. By using the credentials, the service account is temporarily granted the AWS permissions enabled in the IAM role.

  6. When the cluster Operator runs, the Operator that is using the AWS SDK in the pod consumes the secret that has the path to the projected service account and AWS IAM Role ARN to authenticate against the OIDC provider. The OIDC provider returns temporary STS credentials for authentication against the AWS API.
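The token checks in stage 3 and the trust policy scope check in stage 4 can be illustrated with a short sketch. It decodes the claims of a JWT-formatted service account token and evaluates the audience, expiry, and issuer requirements, plus whether the subject matches a given project and service account. It deliberately does not verify the signature, which a real OIDC provider always does, and the function names are hypothetical. The `system:serviceaccount:<namespace>:<name>` subject format is the standard Kubernetes one; it is an assumption that the trust policy scopes on that subject.

```python
import base64
import json
import time
from typing import Dict, Optional


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding, so restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def claim_checks(claims: dict, issuer_url: str,
                 audience: str = "sts.amazonaws.com",
                 now: Optional[float] = None) -> Dict[str, bool]:
    """Evaluate the audience, expiry, and issuer requirements from stage 3."""
    now = time.time() if now is None else now
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return {
        "audience": audience in audiences,
        "not_expired": claims.get("exp", 0) > now,
        "issuer": claims.get("iss") == issuer_url,
    }


def subject_in_scope(claims: dict, namespace: str, service_account: str) -> bool:
    """Stage 4: check the subject against an expected project and service account."""
    return claims.get("sub") == f"system:serviceaccount:{namespace}:{service_account}"
```

Authentication in this model succeeds only if every value returned by `claim_checks` is true and the signature, which is not covered here, is valid.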
