Managing Pods | Cluster Administration | OpenShift Container Platform 3.4

Overview
Managing Pod Networks
- Joining Project Networks
Isolating Project Networks
- Making Project Networks Global
Limiting Run-once Pod Duration
- Configuring the RunOnceDuration Plug-in
- Specifying a Custom Duration per Project
Controlling egress Traffic
- Using an egress Firewall to Limit Access to External Resources
- Using an egress Router to Allow External Resources to Recognize Pod Traffic
Limiting the Bandwidth Available to Pods
Setting Pod Disruption Budgets

Overview

This topic describes the management of pods, including managing their networks, limiting their run-once duration, and limiting what they can access, and how much bandwidth they can use.

Managing Pod Networks

When your cluster is configured to use the ovs-multitenant SDN plug-in, you can manage the separate pod overlay networks for projects using the administrator CLI. See the Configuring the SDN section for plug-in configuration steps, if necessary.

Joining Project Networks

To join projects to an existing project network:

$ oadm pod-network join-projects --to=<project1> <project2> <project3>

In the above example, all the pods and services in <project2> and <project3> can now access any pods and services in <project1> and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Isolating Project Networks

To isolate the project network in the cluster and vice versa, run:

$ oadm pod-network isolate-projects <project1> <project2>

In the above example, all of the pods and services in <project1> and <project2> can not access any pods and services from other non-global projects in the cluster and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Making Project Networks Global

To allow projects to access all pods and services in the cluster and vice versa:

$ oadm pod-network make-projects-global <project1> <project2>

In the above example, all the pods and services in <project1> and <project2> can now access any pods and services in the cluster and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Limiting Run-once Pod Duration

OpenShift Container Platform relies on run-once pods to perform tasks such as deploying a pod or performing a build. Run-once pods are pods that have a RestartPolicy of Never or OnFailure.

The cluster administrator can use the RunOnceDuration admission control plug-in to force a limit on the time that those run-once pods can be active. Once the time limit expires, the cluster will try to actively terminate those pods. The main reason to have such a limit is to prevent tasks such as builds to run for an excessive amount of time.

Configuring the RunOnceDuration Plug-in

The plug-in configuration should include the default active deadline for run-once pods. This deadline is enforced globally, but can be superseded on a per-project basis.

kubernetesMasterConfig:
  admissionConfig:
    pluginConfig:
      RunOnceDuration:
        configuration:
          apiVersion: v1
          kind: RunOnceDurationConfig
          activeDeadlineSecondsOverride: 3600 (1)

1	Specify the global default for run-once pods in seconds.

Specifying a Custom Duration per Project

In addition to specifying a global maximum duration for run-once pods, an administrator can add an annotation (openshift.io/active-deadline-seconds-override) to a specific project to override the global default.

apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/active-deadline-seconds-override: "1000" (1)

1	Overrides the default active deadline seconds for run-once pods to 1000 seconds. Note that the value of the override must be specified in string form.

Controlling egress Traffic

As an OpenShift Container Platform cluster administrator, you can control egress traffic in two ways:

Firewall: Using an egress firewall allows you to enforce the acceptable outbound traffic policies, so that specific endpoints or IP ranges (subnets) are the only acceptable targets for the dynamic endpoints (pods within OpenShift Container Platform) to talk to.
Router: Using an egress router allows you to create identifiable services to send traffic to a specific destination, ensuring an external destination treats traffic as though it were coming from a known source. This helps with security, because it allows you to secure an external database so that only specific pods in a namespace can talk to a service (the egress router), which proxies the traffic to your database.

Using an egress Firewall to Limit Access to External Resources

As an OpenShift Container Platform cluster administrator, you can use egress firewall policy to limit the external addresses that some or all pods can access from within the cluster, so that:

A pod can only talk to internal hosts, and cannot initiate connections to the public Internet.

Or,
A pod can only talk to the public Internet, and cannot initiate connections to internal hosts (outside the cluster).

Or,
A pod cannot reach specified internal subnets/hosts that it should have no reason to contact.

You can configure projects to have different egress policies. For example, allowing <project A> access to a specified IP range, but denying the same access to <project B>. Or restrict application developers from updating from (Python) pip mirrors, and forcing updates to only come from desired sources.

You must have the ovs-multitenant plug-in enabled in order to limit pod access via egress policy.

Project administrators can neither create egressNetworkPolicy objects, nor edit the ones you create in their project. There are also several other restrictions on where egressNetworkPolicy can be created:

The default project (and any other project that has been made global via oadm pod-network make-projects-global) cannot have egress policy.
If you merge two projects together (via oadm pod-network join-projects), then you cannot use egress policy in any of the joined projects.
No project may have more than one egress policy object.

Violating any of these restrictions results in broken egress policy for the project, and may cause all external network traffic to be dropped.

Use the oc command or the REST API to configure egress policy. You can use oc [create|replace|delete] to manipulate egressNetworkPolicy objects. The api/swagger-spec/oapi-v1.json file has API-level details on how the objects actually work.

To configure egress policy:

Navigate to the project you want to affect.
Create a JSON file with the desired policy details. For example:
```
{
    "kind": "egressNetworkPolicy",
    "apiVersion": "v1",
    "metadata": {
        "name": "default"
    },
    "spec": {
        "egress": [
            {
                "type": "Allow",
                "to": {
                    "cidrSelector": "1.2.3.0/24"
                }
            },
            {
                "type": "Deny",
                "to": {
                    "cidrSelector": "0.0.0.0/0"
                }
            }
        ]
    }
}
```
When the example above is added in a project, it allows traffic to 1.2.3.0/24, but denies access to all other external IP addresses. (Traffic to other pods is not affected because the policy only applies to external traffic.)

The rules in an egressNetworkPolicy are checked in order, and the first one that matches takes effect. If the two rules in the above example were swapped, then traffic would not be allowed to 1.2.3.0/24 because the 0.0.0.0/0 rule would be checked first, and it would match and deny all traffic.
Use the JSON file to create an egressNetworkPolicy object:
```
# oc create -f <policy>.json
```

Using an egress Router to Allow External Resources to Recognize Pod Traffic

The OpenShift Container Platform egress router runs a service that redirects traffic to a specified remote server, using a private source IP address that is not used for anything else. The service allows pods to talk to servers that are set up to only allow access from whitelisted IP addresses.

The egress router is not intended for every outgoing connection. Creating large numbers of egress routers can push the limits of your network hardware. For example, creating an egress router for every project or application could exceed the number of local MAC addresses that the network interface can handle before falling back to filtering MAC addresses in software.

Important Deployment Considerations

The egress router adds a second IP address and MAC address to the node’s primary network interface. If you are not running OpenShift Container Platform on bare metal, you may need to configure your hypervisor or cloud provider to allow the additional address.

Red Hat OpenStack Platform

If you are deploying OpenShift Container Platform on Red Hat OpenStack Platform, you need to whitelist the IP and MAC addresses on your Openstack environment, otherwise communication will fail:

neutron port-update $neutron_port_uuid \
  --allowed_address_pairs list=true \
  type=dict mac_address=<mac_address>,ip_address=<ip_address>

Red Hat Enterprise Virtualization

If you are using Red Hat Enterprise Virtualization, you should set EnableMACAntiSpoofingFilterRules to false.

VMware vSphere

If you are using VMware vSphere, follow VMware’s Securing Virtual Switch Ports and Forged Transmissions guidance.

Deploying an egress Router Pod

Create a pod configuration using the following:

Example 1. Example Pod Definition for an egress Router

apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" (1)
spec:
  containers:
  - name: egress-router
    image: registry.access.redhat.com/openshift3/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: egress_SOURCE (2)
      value: 192.168.12.99
    - name: egress_GATEWAY (3)
      value: 192.168.12.1
    - name: egress_DESTINATION (4)
      value: 203.0.113.25
  nodeSelector:
    site: springfield-1 (5)

1	The `pod.network.openshift.io/assign-macvlan annotation` creates a Macvlan network interface on the primary network interface, and then moves it into the pod’s network name space before starting the egress-router container. Preserve the the quotation marks around `"true"`. Omitting them will result in errors.
2	An IP address from the physical network that the node itself is on and is reserved by the cluster administrator for use by this pod.
3	Same value as the default gateway used by the node itself.
4	The external server that to direct traffic to. Using this example, connections to the pod are redirected to 203.0.113.25, with a source IP address of 192.168.12.99.
5	The pod will only be deployed to nodes with the label `site=springfield-1`.

Create the pod using the above definition:
```
$ oc create -f <pod_name>.json
```
To check to see if the pod has been created:
```
oc get pod <pod_name>
```
Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: http port: 80 - name: https port: 443 type: ClusterIP selector: name: egress-1
Your pods can now connect to this service. Their connections are redirected to the corresponding ports on the external server, using the reserved egress IP address.

The pod contains a single container, using the openshift3/ose-egress-router image, and that container is run privileged so that it can configure the Macvlan interface and set up iptables rules.

The environment variables tell the egress-router image what addresses to use; it will configure the Macvlan interface to use egress_SOURCE as its IP address, with egress_GATEWAY as its gateway.

NAT rules are set up so that connections to any TCP or UDP port on the pod’s cluster IP address are redirected to the same port on egress_DESTINATION.

If only some of the nodes in your cluster are capable of claiming the specified source IP address and using the specified gateway, you can specify a nodeName or nodeSelector indicating which nodes are acceptable.

Enabling Failover for egress Router Pods

Using a replication controller, you can ensure that there is always one copy of the egress router pod in order to prevent downtime.

Create a replication controller configuration file using the following:

apiVersion: v1
kind: ReplicationController
metadata:
  name: egress-demo-controller
spec:
  replicas: 1 (1)
  selector:
    name: egress-demo
  template:
    metadata:
      name: egress-demo
      labels:
        name: egress-demo
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      containers:
      - name: egress-demo-container
        image: openshift/origin-egress-router
        env:
        - name: egress_SOURCE
          value: 192.168.12.99
        - name: egress_GATEWAY
          value: 192.168.12.1
        - name: egress_DESTINATION
          value: 203.0.113.25
        securityContext:
          privileged: true
      nodeSelector:
        site: springfield-1

1	Ensure `replicas` is set to `1`, because only one pod can be using a given `egress_SOURCE` value at any time. This means that only a single copy of the router will be running, on a node with the label `site=springfield-1`.

Create the pod using the definition:

$ oc create -f <replication_controller>.json

To verify, check to see if the replication controller pod has been created:
```
oc describe rc <replication_controller>
```

Limiting the Bandwidth Available to Pods

You can apply quality-of-service traffic shaping to a pod and effectively limit its available bandwidth. egress traffic (from the pod) is handled by policing, which simply drops packets in excess of the configured rate. Ingress traffic (to the pod) is handled by shaping queued packets to effectively handle data. The limits you place on a pod do not affect the bandwidth of other pods.

To limit the bandwidth on a pod:

Write an object definition JSON file, and specify the data traffic speed using kubernetes.io/ingress-bandwidth and kubernetes.io/egress-bandwidth annotations. For example, to limit both pod egress and ingress bandwidth to 10M/s:

Example 2. Limited Pod Object Definition

{
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "image": "openshift/hello-openshift",
                "name": "hello-openshift"
            }
        ]
    },
    "apiVersion": "v1",
    "metadata": {
        "name": "iperf-slow",
        "annotations": {
            "kubernetes.io/ingress-bandwidth": "10M",
            "kubernetes.io/egress-bandwidth": "10M"
        }
    }
}

Create the pod using the object definition:
```
oc create -f <file_or_dir_path>
```

Setting Pod Disruption Budgets

A pod disruption budget is part of the Kubernetes API, which can be managed with oc commands like other object types. They allow the specification of safety constraints on pods during operations, such as draining a node for maintenance.

Starting in OpenShift Container Platform 3.4, pod disruption budgets is a feature in Technology Preview, available only for users with cluster-admin privileges.

PodDisruptionBudget is an API object that specifies the minimum number or percentage of replicas that must be up at a time. Setting these in projects can be helpful during node maintenance (such as scaling a cluster down or a cluster upgrade) and is only honored on voluntary evictions (not on node failures).

A PodDisruptionBudget object’s configuration consists of the following key parts:

A label selector, which is a label query over a set of pods.
An availability level, which specifies the minimum number of pods that must be available simultaneously.

The following is an example of a PodDisruptionBudget resource:

apiVersion: policy/v1alpha1 (1)
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  selector:  (2)
    matchLabels:
      foo: bar
  minAvailable: 2  (3)

1	`PodDisruptionBudget` is part of the `policy/v1alpha` API group.
2	A label query over a set of resources. The result of `matchLabels` and `matchExpressions` are logically conjoined.
3	The minimum number of pods that must be available simultaneously. This can be either an integer or a string specifying a percentage (for example, `20%`).

If you created a YAML file with the above object definition, you could add it to project with the following:

$ oc create -f </path/to/file> -n <project_name>

You can check for pod disruption budgets across all projects with the following:

$ oc get poddisruptionbudget --all-namespaces

NAMESPACE         NAME          MIN-AVAILABLE   SELECTOR
another-project   another-pdb   4               bar=foo
test-project      my-pdb        2               foo=bar

The PodDisruptionBudget is considered healthy when there are at least minAvailable pods running in the system. Every pod above that limit can be evicted.