required pods per cluster / pods per node = total number of nodes needed
This document describes how to plan your Red Hat OpenShift service on AWS environment based on the tested cluster maximums.
Oversubscribing the physical resources on a node affects resource guarantees the Kubernetes scheduler makes during pod placement. Learn what measures you can take to avoid memory swapping.
Some of the tested maximums are stretched only in a single dimension. They will vary when many objects are running on the cluster.
The numbers noted in this documentation are based on Red Hat testing methodology, setup, configuration, and tunings. These numbers can vary based on your own individual setup and environments.
While planning your environment, determine how many pods are expected to fit per node using the following formula:
required pods per cluster / pods per node = total number of nodes needed
The current maximum number of pods per node is 250. However, the number of pods that fit on a node is dependent on the application itself. Consider the application’s memory, CPU, and storage requirements, as described in Planning your environment based on application requirements.
If you want to scope your cluster for 2200 pods per cluster, you would need at least nine nodes, assuming that there are 250 maximum pods per node:
2200 / 250 = 8.8
If you increase the number of nodes to 20, then the pod distribution changes to 110 pods per node:
2200 / 20 = 110
Where:
required pods per cluster / total number of nodes = expected pods per node
This document describes how to plan your Red Hat OpenShift service on AWS environment based on your application requirements.
Consider an example application environment:
Pod type | Pod quantity | Max memory | CPU cores | Persistent storage |
---|---|---|---|---|
apache |
100 |
500 MB |
0.5 |
1 GB |
node.js |
200 |
1 GB |
1 |
1 GB |
postgresql |
100 |
1 GB |
2 |
10 GB |
JBoss EAP |
100 |
1 GB |
1 |
1 GB |
Extrapolated requirements: 550 CPU cores, 450 GB RAM, and 1.4 TB storage.
Instance size for nodes can be modulated up or down, depending on your preference. Nodes are often resource overcommitted. In this deployment scenario, you can choose to run additional smaller nodes or fewer larger nodes to provide the same amount of resources. Factors such as operational agility and cost-per-instance should be considered.
Node type | Quantity | CPUs | RAM (GB) |
---|---|---|---|
Nodes (option 1) |
100 |
4 |
16 |
Nodes (option 2) |
50 |
8 |
32 |
Nodes (option 3) |
25 |
16 |
64 |
Some applications lend themselves well to overcommitted environments, and some do not. Most Java applications and applications that use huge pages are examples of applications that would not allow for overcommitment. That memory can not be used for other applications. In the example above, the environment would be roughly 30 percent overcommitted, a common ratio.
The application pods can access a service either by using environment variables or DNS. If using environment variables, for each active service the variables are injected by the kubelet when a pod is run on a node. A cluster-aware DNS server watches the Kubernetes API for new services and creates a set of DNS records for each one. If DNS is enabled throughout your cluster, then all pods should automatically be able to resolve services by their DNS name. service discovery using DNS can be used in case you must go beyond 5000 services. When using environment variables for service discovery, if the argument list exceeds the allowed length after 5000 services in a namespace, then the pods and deployments will start failing.
Disable the service links in the deployment’s service specification file to overcome this:
Kind: Template
apiVersion: template.openshift.io/v1
metadata:
name: deploymentConfigTemplate
creationTimestamp:
annotations:
description: This template will create a deploymentConfig with 1 replica, 4 env vars and a service.
tags: ''
objects:
- kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
name: deploymentconfig${IDENTIFIER}
spec:
template:
metadata:
labels:
name: replicationcontroller${IDENTIFIER}
spec:
enableserviceLinks: false
containers:
- name: pause${IDENTIFIER}
image: "${IMAGE}"
ports:
- containerPort: 8080
protocol: TCP
env:
- name: ENVVAR1_${IDENTIFIER}
value: "${ENV_VALUE}"
- name: ENVVAR2_${IDENTIFIER}
value: "${ENV_VALUE}"
- name: ENVVAR3_${IDENTIFIER}
value: "${ENV_VALUE}"
- name: ENVVAR4_${IDENTIFIER}
value: "${ENV_VALUE}"
resources: {}
imagePullPolicy: IfNotPresent
capabilities: {}
securityContext:
capabilities: {}
privileged: false
restartPolicy: Always
serviceAccount: ''
replicas: 1
selector:
name: replicationcontroller${IDENTIFIER}
triggers:
- type: ConfigChange
strategy:
type: Rolling
- kind: service
apiVersion: v1
metadata:
name: service${IDENTIFIER}
spec:
selector:
name: replicationcontroller${IDENTIFIER}
ports:
- name: serviceport${IDENTIFIER}
protocol: TCP
port: 80
targetPort: 8080
portalIP: ''
type: ClusterIP
sessionAffinity: None
status:
loadBalancer: {}
parameters:
- name: IDENTIFIER
description: Number to append to the name of resources
value: '1'
required: true
- name: IMAGE
description: Image to use for deploymentConfig
value: gcr.io/google-containers/pause-amd64:3.0
required: false
- name: ENV_VALUE
description: Value to use for environment variables
generate: expression
from: "[A-Za-z0-9]{255}"
required: false
labels:
template: deploymentConfigTemplate
The number of application pods that can run in a namespace is dependent on the number of services and the length of the service name when the environment variables are used for service discovery. ARG_MAX
on the system defines the maximum argument length for a new process and it is set to 2097152 bytes (2 MiB) by default. The kubelet injects environment variables in to each pod scheduled to run in the namespace including:
<service_NAME>_service_HOST=<IP>
<service_NAME>_service_PORT=<PORT>
<service_NAME>_PORT=tcp://<IP>:<PORT>
<service_NAME>_PORT_<PORT>_TCP=tcp://<IP>:<PORT>
<service_NAME>_PORT_<PORT>_TCP_PROTO=tcp
<service_NAME>_PORT_<PORT>_TCP_PORT=<PORT>
<service_NAME>_PORT_<PORT>_TCP_ADDR=<ADDR>
The pods in the namespace start to fail if the argument length exceeds the allowed value and if the number of characters in a service name impacts it.