Managing Networking | Cluster Administration

Important

Azure Red Hat OpenShift 3.11 will be retired 30 June 2022. Support for creation of new Azure Red Hat OpenShift 3.11 clusters continues through 30 November 2020. Following retirement, remaining Azure Red Hat OpenShift 3.11 clusters will be shut down to prevent security vulnerabilities.

Follow this guide to create an Azure Red Hat OpenShift 4 cluster. If you have specific questions, please contact us

Overview
Managing Pod Networks
- Joining Project Networks
Isolating Project Networks
- Making Project Networks Global
Using an Egress Firewall to Limit Access to External Resources
Enabling NetworkPolicy
- Using NetworkPolicy Efficiently
- Setting a Default NetworkPolicy for New Projects
Enabling HTTP Strict Transport Security
Troubleshooting Throughput Issues

Overview

This topic describes the management of the overall cluster network, including project isolation and outbound traffic control.

Managing Pod Networks

When your cluster is configured to use you can manage the separate pod overlay networks for projects using the administrator CLI.

Joining Project Networks

To join projects to an existing project network:

$ oc adm pod-network join-projects --to=<project1> <project2> <project3>

In the above example, all the pods and services in <project2> and <project3> can now access any pods and services in <project1> and vice versa. Services can be accessed either by IP or fully qualified DNS name (<service>.<pod_namespace>.svc.cluster.local). For example, to access a service named db in a project myproject, use db.myproject.svc.cluster.local.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

To verify the networks you have joined together:

$ oc get netnamespaces

Then look at the NETID column. Projects in the same pod-network will have the same NetID.

Isolating Project Networks

To isolate the project network in the cluster and vice versa, run:

$ oc adm pod-network isolate-projects <project1> <project2>

In the above example, all of the pods and services in <project1> and <project2> can not access any pods and services from other non-global projects in the cluster and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Making Project Networks Global

To allow projects to access all pods and services in the cluster and vice versa:

$ oc adm pod-network make-projects-global <project1> <project2>

In the above example, all the pods and services in <project1> and <project2> can now access any pods and services in the cluster and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Using an Egress Firewall to Limit Access to External Resources

As an Azure Red Hat OpenShift cluster administrator, you can use egress firewall policy to limit the external IP addresses that some or all pods can access from within the cluster. Egress firewall policy supports the following scenarios:

A pod can only connect to internal hosts, and cannot initiate connections to the public Internet.
A pod can only connect to the public Internet, and cannot initiate connections to internal hosts that are outside the Azure Red Hat OpenShift cluster.
A pod cannot reach specified internal subnets or hosts that should be unreachable.

Egress policies can be set by specifying an IP address range in CIDR format or by specifying a DNS name. For example, you can allow <project_A> access to a specified IP range but deny the same access to <project_B>. Alternatively, you can restrict application developers from updating from (Python) pip mirrors, and force updates to only come from approved sources.

Project administrators can neither create EgressNetworkPolicy objects, nor edit the ones you create in their project. There are also several other restrictions on where EgressNetworkPolicy can be created:

The default project (and any other project that has been made global via oc adm pod-network make-projects-global) cannot have egress policy.
If you merge two projects together (via oc adm pod-network join-projects), then you cannot use egress policy in any of the joined projects.
No project may have more than one egress policy object.

Violating any of these restrictions results in broken egress policy for the project, and may cause all external network traffic to be dropped.

Use the oc command or the REST API to configure egress policy. You can use oc [create|replace|delete] to manipulate EgressNetworkPolicy objects. The api/swagger-spec/oapi-v1.json file has API-level details on how the objects actually work.

To configure egress policy:

Navigate to the project you want to affect.

Create a JSON file with the policy configuration you want to use, as in the following example:

{
    "kind": "EgressNetworkPolicy",
    "apiVersion": "v1",
    "metadata": {
        "name": "default"
    },
    "spec": {
        "egress": [
            {
                "type": "Allow",
                "to": {
                    "cidrSelector": "1.2.3.0/24"
                }
            },
            {
                "type": "Allow",
                "to": {
                    "dnsName": "www.foo.com"
                }
            },
            {
                "type": "Deny",
                "to": {
                    "cidrSelector": "0.0.0.0/0"
                }
            }
        ]
    }
}

When the example above is added to a project, it allows traffic to IP range 1.2.3.0/24 and domain name www.foo.com, but denies access to all other external IP addresses. Traffic to other pods is not affected because the policy only applies to external traffic.

The rules in an EgressNetworkPolicy are checked in order, and the first one that matches takes effect. If the three rules in the above example were reversed, then traffic would not be allowed to 1.2.3.0/24 and www.foo.com because the 0.0.0.0/0 rule would be checked first, and it would match and deny all traffic.

Domain name updates are polled based on the TTL (time to live) value of the domain returned by the local non-authoritative servers. The pod should also resolve the domain from the same local nameservers when necessary, otherwise the IP addresses for the domain perceived by the egress network policy controller and the pod will be different, and the egress network policy may not be enforced as expected. Since egress network policy controller and pod are asynchronously polling the same local nameserver, there could be a race condition where pod may get the updated IP before the egress controller. Due to this current limitation, domain name usage in EgressNetworkPolicy is only recommended for domains with infrequent IP address changes.

The egress firewall always allows pods access to the external interface of the node the pod is on for DNS resolution. If your DNS resolution is not handled by something on the local node, then you will need to add egress firewall rules allowing access to the DNS server’s IP addresses if you are using domain names in your pods.

Use the JSON file to create an EgressNetworkPolicy object:
```
$ oc create -f <policy>.json
```

Exposing services by creating routes will ignore EgressNetworkPolicy. Egress network policy service endpoint filtering is done at the node kubeproxy. When the router is involved, kubeproxy is bypassed and egress network policy enforcement is not applied. Administrators can prevent this bypass by limiting access to create routes.

Enabling NetworkPolicy

The ovs-subnet and ovs-multitenant plug-ins have their own legacy models of network isolation and do not support Kubernetes NetworkPolicy. However, NetworkPolicy support is available by using the ovs-networkpolicy plug-in.

The Egress policy type, the ipBlock parameter, and the ability to combine the podSelector and namespaceSelector parameters are not available in Azure Red Hat OpenShift.

Do not apply NetworkPolicy features on default Azure Red Hat OpenShift projects, because they can disrupt communication with the cluster.

NetworkPolicy rules do not apply to the host network namespace. Pods with host networking enabled are unaffected by NetworkPolicy rules.

In a cluster network isolation is controlled entirely by NetworkPolicy objects. By default, all pods in a project are accessible from other pods and network endpoints. To isolate one or more pods in a project, you can create NetworkPolicy objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy objects within their own project.

Pods that do not have NetworkPolicy objects pointing to them are fully accessible, whereas, pods that have one or more NetworkPolicy objects pointing to them are isolated. These isolated pods only accept connections that are accepted by at least one of their NetworkPolicy objects.

Following are a few sample NetworkPolicy object definitions supporting different scenarios:

Deny All Traffic

To make a project "deny by default" add a NetworkPolicy object that matches all pods but accepts no traffic.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-by-default
spec:
  podSelector:
  ingress: []

Only Accept connections from pods within project

To make pods accept connections from other pods in the same project, but reject all other connections from pods in other projects:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
spec:
  podSelector:
  ingress:
  - from:
    - podSelector: {}

Only allow HTTP and HTTPS traffic based on pod labels

To enable only HTTP and HTTPS access to the pods with a specific label (role=frontend in following example), add a NetworkPolicy object similar to:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-http-and-https
spec:
  podSelector:
    matchLabels:
      role: frontend
  ingress:
  - ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 443

NetworkPolicy objects are additive, which means you can combine multiple NetworkPolicy objects together to satisfy complex network requirements.

For example, for the NetworkPolicy objects defined in previous samples, you can define both allow-same-namespace and allow-http-and-https policies within the same project. Thus allowing the pods with the label role=frontend, to accept any connection allowed by each policy. That is, connections on any port from pods in the same namespace, and connections on ports 80 and 443 from pods in any namespace.

Using NetworkPolicy Efficiently

NetworkPolicy objects allow you to isolate pods that are differentiated from one another by labels, within a namespace.

It is inefficient to apply NetworkPolicy objects to large numbers of individual pods in a single namespace. Pod labels do not exist at the IP level, so NetworkPolicy objects generate a separate OVS flow rule for every single possible link between every pod selected with podSelector.

For example, if the spec podSelector and the ingress podSelector within a NetworkPolicy object each match 200 pods, then 40000 (200*200) OVS flow rules are generated. This might slow down the machine.

To reduce the amount of OVS flow rules, use namespaces to contain groups of pods that need to be isolated.

NetworkPolicy objects that select a whole namespace, by using namespaceSelectors or empty podSelectors, only generate a single OVS flow rule that matches the VXLAN VNID of the namespace.

Keep the pods that do not need to be isolated in their original namespace, and move the pods that require isolation into one or more different namespaces.

Create additional targeted cross-namespace policies to allow the specific traffic that you do want to allow from the isolated pods.

Setting a Default NetworkPolicy for New Projects

The cluster administrators can modify the default project template to enable automatic creation of default NetworkPolicy objects (one or more), whenever a new project is created. To do this:

Create a custom project template and configure the master to use it.

To include NetworkPolicy objects into existing template, use the oc edit command. Currently, it is not possible to use oc patch to add objects to a Template resource.

Add each default policy as an element in the objects array:

objects:
...
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-from-same-namespace
  spec:
    podSelector:
    ingress:
    - from:
      - podSelector: {}
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-from-default-namespace
  spec:
    podSelector:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            name: default
...

Enabling HTTP Strict Transport Security

HTTP Strict Transport Security (HSTS) policy is a security enhancement, which ensures that only HTTPS traffic is allowed on the host. Any HTTP requests are dropped by default. This is useful for ensuring secure interactions with websites, or to offer a secure application for the user’s benefit.

When HSTS is enabled, HSTS adds a Strict Transport Security header to HTTPS responses from the site. You can use the insecureEdgeTerminationPolicy value in a route to redirect to send HTTP to HTTPS. However, when HSTS is enabled, the client changes all requests from the HTTP URL to HTTPS before the request is sent, eliminating the need for a redirect. This is not required to be supported by the client, and can be disabled by setting max-age=0.

HSTS works only with secure routes (either edge terminated or re-encrypt). The configuration is ineffective on HTTP or passthrough routes.

To enable HSTS to a route, add the haproxy.router.openshift.io/hsts_header value to the edge terminated or re-encrypt route:

apiVersion: v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains;preload

Ensure there are no spaces and no other values in the parameters in the haproxy.router.openshift.io/hsts_header value. Only max-age is required.

The required max-age parameter indicates the length of time, in seconds, the HSTS policy is in effect for. The client updates max-age whenever a response with a HSTS header is received from the host. When max-age times out, the client discards the policy.

The optional includeSubDomains parameter tells the client that all subdomains of the host are to be treated the same as the host.

If max-age is greater than 0, the optional preload parameter allows external services to include this site in their HSTS preload lists. For example, sites such as Google can construct a list of sites that have preload set. Browsers can then use these lists to determine which sites to only talk to over HTTPS, even before they have interacted with the site. Without preload set, they need to have talked to the site over HTTPS to get the header.

Troubleshooting Throughput Issues

Sometimes applications deployed through Azure Red Hat OpenShift can cause network throughput issues such as unusually high latency between specific services.

Use the following methods to analyze performance issues if pod logs do not reveal any cause of the problem:

Use a packet analyzer, such as ping or tcpdump to analyze traffic between a pod and its node.

For example, run the tcpdump tool on each pod while reproducing the behavior that led to the issue. Review the captures on both sides to compare send and receive timestamps to analyze the latency of traffic to/from a pod. Latency can occur in Azure Red Hat OpenShift if a node interface is overloaded with traffic from other pods, storage devices, or the data plane.
```
$ tcpdump -s 0 -i any -w /tmp/dump.pcap host <podip 1> && host <podip 2> (1)
```
1 podip is the IP address for the pod. Run the following command to get the IP address of the pods:
```
# oc get pod <podname> -o wide
```
tcpdump generates a file at /tmp/dump.pcap containing all traffic between these two pods. Ideally, run the analyzer shortly before the issue is reproduced and stop the analyzer shortly after the issue is finished reproducing to minimize the size of the file. You can also run a packet analyzer between the nodes (eliminating the SDN from the equation) with:
```
# tcpdump -s 0 -i any -w /tmp/dump.pcap port 4789
```
Use a bandwidth measuring tool, such as iperf, to measure streaming throughput and UDP throughput. Run the tool from the pods first, then from the nodes to attempt to locate any bottlenecks. The iperf3 tool is included as part of RHEL 7.