Managing Pods | Cluster Administration

Overview

This topic describes how to manage pods, including how to limit the duration of run-once pods and how much bandwidth pods can use.

Viewing Pods

You can display usage statistics about pods, which provide the runtime environments for containers. These usage statistics include CPU, memory, and storage consumption.

To view the usage statistics:

$ oc adm top pods
NAME                         CPU(cores)   MEMORY(bytes)
hawkular-cassandra-1-pqx6l   219m         1240Mi
hawkular-metrics-rddnv       20m          1765Mi
heapster-n94r4               3m           37Mi

To view the usage statistics for pods with labels:

$ oc adm top pod --selector=''

Replace the empty selector with the label query to filter on. The selector supports the =, ==, and != operators.

You must have cluster-reader permission to view the usage statistics.

Metrics must be installed to view the usage statistics.
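
The selector operators mentioned above can be illustrated with a small Python sketch. It is not how oc evaluates selectors internally; the function names parse_selector and matches are hypothetical, and only the equality-based operators =, ==, and != are handled. It assumes the usual Kubernetes behavior that != also matches objects that lack the key.

```python
def parse_selector(selector):
    """Split a query such as 'a=b,c!=d' into (key, operator, value) tuples."""
    requirements = []
    for part in filter(None, (p.strip() for p in selector.split(","))):
        # Try the two-character operators first so "=" does not shadow them.
        for op in ("!=", "==", "="):
            if op in part:
                key, value = part.split(op, 1)
                requirements.append((key.strip(), op, value.strip()))
                break
        else:
            raise ValueError(f"unsupported requirement: {part!r}")
    return requirements


def matches(labels, selector):
    """Return True when every requirement in the selector holds for the labels."""
    for key, op, value in parse_selector(selector):
        if op == "!=":
            # Assumption: a missing key satisfies "!=".
            if labels.get(key) == value:
                return False
        elif labels.get(key) != value:
            return False
    return True
```

For example, matches({"name": "egress-1"}, "name=egress-1") is true, and an empty selector matches every pod.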

Limiting Run-once Pod Duration

OKD relies on run-once pods to perform tasks such as deploying a pod or performing a build. Run-once pods are pods that have a RestartPolicy of Never or OnFailure.

The cluster administrator can use the RunOnceDuration admission control plug-in to force a limit on the time that those run-once pods can be active. Once the time limit expires, the cluster will try to actively terminate those pods. The main reason to have such a limit is to prevent tasks such as builds from running for an excessive amount of time.

Configuring the RunOnceDuration Plug-in

The plug-in configuration should include the default active deadline for run-once pods. This deadline is enforced globally, but can be superseded on a per-project basis.

admissionConfig:
  pluginConfig:
    RunOnceDuration:
      configuration:
        apiVersion: v1
        kind: RunOnceDurationConfig
        activeDeadlineSecondsOverride: 3600 (1)

1	Specify the global default for run-once pods in seconds.

Specifying a Custom Duration per Project

In addition to specifying a global maximum duration for run-once pods, an administrator can add an annotation (openshift.io/active-deadline-seconds-override) to a specific project to override the global default.

For a new project, define the annotation in the project specification .yaml file.
```
apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/active-deadline-seconds-override: "1000" (1)
  name: myproject
```
1 Overrides the default active deadline seconds for run-once pods to 1000 seconds. Note that the value of the override must be specified in string form.

For an existing project, use one of the following methods:

Run oc edit and add the openshift.io/active-deadline-seconds-override: "1000" annotation in the editor. As with a new project, the value must be a quoted string.
```
$ oc edit namespace <project_name>
```
Or

Use the oc patch command:

$ oc patch namespace <project_name> -p '{"metadata":{"annotations":{"openshift.io/active-deadline-seconds-override":"1000"}}}'
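
The precedence between the global default and the per-project annotation can be sketched in Python as follows. This is an illustration of the rule described in this section, not the plug-in's code; the function name effective_deadline is hypothetical, and the source does not say how the plug-in treats a malformed annotation, so this sketch simply raises ValueError.

```python
from typing import Mapping, Optional

OVERRIDE_ANNOTATION = "openshift.io/active-deadline-seconds-override"


def effective_deadline(global_default: Optional[int],
                       project_annotations: Mapping[str, str]) -> Optional[int]:
    """Return the active deadline, in seconds, that applies to run-once pods in a project."""
    raw = project_annotations.get(OVERRIDE_ANNOTATION)
    if raw is None:
        # No project override: the global default from the plug-in configuration applies.
        return global_default
    # The annotation is stored as a string, so convert it before use.
    # Assumption: an invalid value is an error (int() raises ValueError).
    return int(raw)
```

With a global default of 3600, a project annotated with "1000" gets a 1000-second deadline, and a project without the annotation gets 3600 seconds.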

Deploying an Egress Router Pod

Example 1. Example Pod Definition for an Egress Router

apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  containers:
  - name: egress-router
    image: openshift/origin-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE (1)
      value: 192.168.12.99
    - name: EGRESS_GATEWAY (2)
      value: 192.168.12.1
    - name: EGRESS_DESTINATION (3)
      value: 203.0.113.25
  nodeSelector:
    site: springfield-1 (4)

1	IP address on the node subnet reserved by the cluster administrator for use by this pod.
2	Same value as the default gateway used by the node itself.
3	Connections to the pod are redirected to 203.0.113.25, with a source IP address of 192.168.12.99.
4	The pod will only be deployed to nodes with the label site: springfield-1.

The pod.network.openshift.io/assign-macvlan annotation creates a Macvlan network interface on the primary network interface, and then moves it into the pod’s network namespace before starting the egress-router container.

Preserve the quotation marks around "true". Omitting them will result in errors.

The pod contains a single container, using the openshift/origin-egress-router image, and that container is run privileged so that it can configure the Macvlan interface and set up iptables rules.

The environment variables tell the egress-router image what addresses to use; it will configure the Macvlan interface to use EGRESS_SOURCE as its IP address, with EGRESS_GATEWAY as its gateway.

NAT rules are set up so that connections to any TCP or UDP port on the pod’s cluster IP address are redirected to the same port on EGRESS_DESTINATION.

If only some of the nodes in your cluster are capable of claiming the specified source IP address and using the specified gateway, you can specify a nodeName or nodeSelector indicating which nodes are acceptable.

Deploying an Egress Router Service

Though not strictly necessary, you normally want to create a service pointing to the egress router:

apiVersion: v1
kind: Service
metadata:
  name: egress-1
spec:
  ports:
  - name: http
    port: 80
  - name: https
    port: 443
  type: ClusterIP
  selector:
    name: egress-1

Your pods can now connect to this service. Their connections are redirected to the corresponding ports on the external server, using the reserved egress IP address.

Limiting Pod Access with Egress Firewall

As an OKD cluster administrator, you can use egress policy to limit the external addresses that some or all pods can access from within the cluster, so that:

A pod can only talk to internal hosts, and cannot initiate connections to the public Internet.

Or,
A pod can only talk to the public Internet, and cannot initiate connections to internal hosts (outside the cluster).

Or,
A pod cannot reach specified internal subnets/hosts that it should have no reason to contact.

For example, you can configure projects with different egress policies, allowing <project A> access to a specified IP range, but denying the same access to <project B>.

You must have the ovs-multitenant plug-in enabled in order to limit pod access via egress policy.

Project administrators can neither create EgressNetworkPolicy objects, nor edit the ones you create in their project. There are also several other restrictions on where EgressNetworkPolicy can be created:

The default project (and any other project that has been made global via oc adm pod-network make-projects-global) cannot have egress policy.
If you merge two projects together (via oc adm pod-network join-projects), then you cannot use egress policy in any of the joined projects.
No project may have more than one egress policy object.

Violating any of these restrictions will result in broken egress policy for the project, and may cause all external network traffic to be dropped.

Configuring Pod Access Limits

To configure pod access limits, you must use the oc command or the REST API. You can use oc [create|replace|delete] to manipulate EgressNetworkPolicy objects. The api/swagger-spec/oapi-v1.json file has API-level details on how the objects actually work.

To configure pod access limits:

Navigate to the project you want to affect.
Create a JSON file with the policy details. For example:
```
{
    "kind": "EgressNetworkPolicy",
    "apiVersion": "v1",
    "metadata": {
        "name": "default"
    },
    "spec": {
        "egress": [
            {
                "type": "Allow",
                "to": {
                    "cidrSelector": "1.2.3.0/24"
                }
            },
            {
                "type": "Allow",
                "to": {
                    "dnsName": "www.foo.com"
                }
            },
            {
                "type": "Deny",
                "to": {
                    "cidrSelector": "0.0.0.0/0"
                }
            }
        ]
    }
}
```
Create the policy object from the JSON file:
```
# oc create -f <policy>.json
```
When the example above is added in a project, it allows traffic to IP range 1.2.3.0/24 and domain name www.foo.com, but denies access to all other external IP addresses. (Traffic to other pods is not affected because the policy only applies to external traffic.)

The rules in an EgressNetworkPolicy are checked in order, and the first one that matches takes effect. If the three rules in the above example were reversed, then traffic would not be allowed to 1.2.3.0/24 and www.foo.com because the 0.0.0.0/0 rule would be checked first, and it would match and deny all traffic.
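
The first-match behavior described above can be sketched in Python. This is only a model of the ordering rule, not the SDN implementation. The function name first_match and the dns_cache argument (a hypothetical mapping from a DNS name to the addresses it currently resolves to) are assumptions, as is the choice to allow traffic that matches no rule, which is why the example policy ends with a catch-all Deny.

```python
import ipaddress

# The "egress" list from the example policy above.
EXAMPLE_RULES = [
    {"type": "Allow", "to": {"cidrSelector": "1.2.3.0/24"}},
    {"type": "Allow", "to": {"dnsName": "www.foo.com"}},
    {"type": "Deny", "to": {"cidrSelector": "0.0.0.0/0"}},
]


def first_match(egress_rules, dest_ip, dns_cache=None):
    """Return "Allow" or "Deny" for dest_ip using the first rule that matches."""
    dns_cache = dns_cache or {}
    ip = ipaddress.ip_address(dest_ip)
    for rule in egress_rules:
        target = rule["to"]
        if "cidrSelector" in target:
            hit = ip in ipaddress.ip_network(target["cidrSelector"])
        else:
            # Hypothetical stand-in for the DNS resolution that OKD performs.
            hit = dest_ip in dns_cache.get(target["dnsName"], ())
        if hit:
            return rule["type"]
    # Assumption: traffic that matches no rule is allowed.
    return "Allow"
```

Evaluating the example rules in order allows 1.2.3.4, while evaluating list(reversed(EXAMPLE_RULES)) denies it, because the 0.0.0.0/0 rule is then checked first.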

Domain name updates are reflected within 30 minutes. In the above example, suppose www.foo.com resolved to 10.11.12.13, but later it was changed to 20.21.22.23. Then, OKD will take up to 30 minutes to adapt to these DNS updates.

Limiting the Bandwidth Available to Pods

You can apply quality-of-service traffic shaping to a pod and effectively limit its available bandwidth. Egress traffic (from the pod) is handled by policing, which simply drops packets in excess of the configured rate. Ingress traffic (to the pod) is handled by shaping, which queues packets so that they are delivered at the configured rate. The limits you place on a pod do not affect the bandwidth of other pods.

To limit the bandwidth on a pod:

Write an object definition JSON file, and specify the data traffic speed using the kubernetes.io/ingress-bandwidth and kubernetes.io/egress-bandwidth annotations. For example, to limit both pod egress and ingress bandwidth to 10 Mbit/s:

Limited Pod Object Definition

{
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "image": "openshift/hello-openshift",
                "name": "hello-openshift"
            }
        ]
    },
    "apiVersion": "v1",
    "metadata": {
        "name": "iperf-slow",
        "annotations": {
            "kubernetes.io/ingress-bandwidth": "10M",
            "kubernetes.io/egress-bandwidth": "10M"
        }
    }
}

Create the pod using the object definition:
```
$ oc create -f <file_or_dir_path>
```
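
To make the idea of policing more concrete, the following Python sketch models a generic token-bucket policer that drops any packet exceeding the configured rate. It is not the mechanism OKD itself uses to enforce the annotations; the class name Policer, the burst size, and the explicit timestamps are all assumptions made for illustration.

```python
class Policer:
    """Generic token-bucket policer: packets that exceed the rate are dropped, not queued."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps      # sustained rate in bits per second
        self.burst_bits = burst_bits  # bucket capacity in bits
        self.tokens = burst_bits      # the bucket starts full
        self.last = 0.0               # timestamp of the previous packet, in seconds

    def admit(self, packet_bits, now):
        """Return True to forward the packet, False to drop it."""
        # Refill the bucket for the time elapsed, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False
```

With Policer(10_000_000, 12_000), which corresponds to 10 Mbit/s and a burst of one 1500-byte packet, two back-to-back 1500-byte packets result in the second one being dropped. A shaper would instead hold the second packet in a queue until enough tokens had accumulated.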

Setting Pod Disruption Budgets

A pod disruption budget is part of the Kubernetes API and can be managed with oc commands like other object types. Pod disruption budgets let you specify safety constraints on pods during operations such as draining a node for maintenance.

PodDisruptionBudget is an API object that specifies the minimum number or percentage of replicas that must be up at a time. Setting budgets in projects can be helpful during node maintenance (such as scaling a cluster down or upgrading a cluster). A budget is honored only on voluntary evictions, not on node failures.

A PodDisruptionBudget object’s configuration consists of the following key parts:

A label selector, which is a label query over a set of pods.
An availability level, which specifies the minimum number of pods that must be available simultaneously.

The following is an example of a PodDisruptionBudget resource:

apiVersion: policy/v1beta1 (1)
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  selector:  (2)
    matchLabels:
      foo: bar
  minAvailable: 2  (3)

1	`PodDisruptionBudget` is part of the `policy/v1beta1` API group.
2	A label query over a set of resources. The results of `matchLabels` and `matchExpressions` are logically conjoined.
3	The minimum number of pods that must be available simultaneously. This can be either an integer or a string specifying a percentage (for example, `20%`).

If you created a YAML file with the above object definition, you could add it to a project with the following command:

$ oc create -f </path/to/file> -n <project_name>

You can check for pod disruption budgets across all projects with the following:

$ oc get poddisruptionbudget --all-namespaces

NAMESPACE         NAME          MIN-AVAILABLE   SELECTOR
another-project   another-pdb   4               bar=foo
test-project      my-pdb        2               foo=bar

The PodDisruptionBudget is considered healthy when there are at least minAvailable pods running in the system. Every pod above that limit can be evicted.
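
The health rule above amounts to simple arithmetic, sketched here in Python. The function names are hypothetical, and rounding a percentage up to the next whole pod is an assumption about how the percentage form of minAvailable is resolved.

```python
def required_available(min_available, expected_pods):
    """Resolve minAvailable (an integer or a string such as "20%") to a pod count."""
    if isinstance(min_available, str):
        percent = int(min_available.rstrip("%"))
        # Integer ceiling division. Assumption: percentages round up.
        return (expected_pods * percent + 99) // 100
    return min_available


def disruptions_allowed(healthy_pods, min_available, expected_pods):
    """Return how many pods could be evicted voluntarily while keeping the budget healthy."""
    needed = required_available(min_available, expected_pods)
    return max(0, healthy_pods - needed)
```

For the my-pdb example with minAvailable: 2, three healthy pods leave room for one eviction, and two healthy pods leave room for none.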

Injecting Information into Pods Using Pod Presets

A pod preset is an object that injects user-specified information into pods as they are created.

As of OKD 3.7, pod presets are no longer supported.

Using pod preset objects you can inject:

secret objects
ConfigMap objects
storage volumes
container volume mounts
environment variables

Developers need only make sure that the pod labels match the label selector on the PodPreset in order to add all that information to the pod. The labels on a pod associate the pod with one or more pod preset objects that have matching label selectors.

Using pod presets, a developer can provision pods without needing to know the details about the services the pod will consume. An administrator can keep configuration items of a service invisible from a developer without preventing the developer from deploying pods. For example, an administrator can create a pod preset that provides the name, user name, and password for a database through a secret and the database port through environment variables. The pod developer only needs to know the label to use to include all the information in pods. A developer can also create pod presets and perform all the same tasks. For example, the developer can create a preset that injects environment variables automatically into multiple pods.

The Pod Preset feature is available only if the Service Catalog has been installed.

You can exclude specific pods from being injected using the podpreset.admission.kubernetes.io/exclude: "true" parameter in the pod specification. See the example pod specification.
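
The selection logic described in this section, label matching plus the exclude setting, can be sketched in Python. This is an illustration only: the function name presets_for_pod is hypothetical, presets are modeled as a plain mapping from preset name to its matchLabels, and the exclude setting is assumed to be stored in the pod's annotations.

```python
EXCLUDE_ANNOTATION = "podpreset.admission.kubernetes.io/exclude"


def presets_for_pod(pod_labels, pod_annotations, presets):
    """Return the names of the presets whose matchLabels are all present on the pod."""
    # A pod that opts out receives no presets at all.
    if pod_annotations.get(EXCLUDE_ANNOTATION) == "true":
        return []
    return [
        name
        for name, match_labels in presets.items()
        if all(pod_labels.get(key) == value for key, value in match_labels.items())
    ]
```

A pod labeled role: frontend would pick up every preset whose selector is role: frontend, unless the pod carries the exclude setting with the value "true".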

For more information, see Injecting Information into Pods Using Pod Presets.