Routing from Edge Load Balancers | Configuring Clusters

Overview
Including the Load Balancer in the SDN
Establishing a Tunnel Using a Ramp Node
- Configuring a Highly-Available Ramp Node

Overview

Pods inside of an OpenShift Container Platform cluster are only reachable via their IP addresses on the cluster network. An edge load balancer can be used to accept traffic from outside networks and proxy the traffic to pods inside the OpenShift Container Platform cluster. In cases where the load balancer is not part of the cluster network, routing becomes a hurdle as the internal cluster network is not accessible to the edge load balancer.

To solve this problem where the OpenShift Container Platform cluster is using OpenShift Container Platform SDN as the cluster networking solution, there are two ways to achieve network access to the pods.

Including the Load Balancer in the SDN

If possible, run an OpenShift Container Platform node instance on the load balancer itself that uses OpenShift SDN as the networking plug-in. This way, the edge machine gets its own Open vSwitch bridge that the SDN automatically configures to provide access to the pods and nodes that reside in the cluster. The routing table is dynamically configured by the SDN as pods are created and deleted, and thus the routing software is able to reach the pods.

Mark the load balancer machine as an unschedulable node so that no pods end up on the load balancer itself:

$ oc adm manage-node <load_balancer_hostname> --schedulable=false

If the load balancer comes packaged as a container, then it is even easier to integrate with OpenShift Container Platform: Simply run the load balancer as a pod with the host port exposed. The pre-packaged HAProxy router in OpenShift Container Platform runs in precisely this fashion.

Establishing a Tunnel Using a Ramp Node

In some cases, the previous solution is not possible. For example, an F5 BIG-IP® host cannot run an OpenShift Container Platform node instance or the OpenShift Container Platform SDN because F5® uses a custom, incompatible Linux kernel and distribution.

Instead, to enable F5 BIG-IP® to reach pods, you can choose an existing node within the cluster network as a ramp node and establish a tunnel between the F5 BIG-IP® host and the designated ramp node. Because it is otherwise an ordinary OpenShift Container Platform node, the ramp node has the necessary configuration to route traffic to any pod on any node in the cluster network. The ramp node thus assumes the role of a gateway through which the F5 BIG-IP® host has access to the entire cluster network.

Following is an example of establishing an ipip tunnel between an F5 BIG-IP® host and a designated ramp node.

On the F5 BIG-IP® host:

Set the following variables:

# F5_IP=10.3.89.66 (1)
# RAMP_IP=10.3.89.89 (1)
# TUNNEL_IP1=10.3.91.216 (2)
# CLUSTER_NETWORK=10.128.0.0/14 (3)

1	The `F5_IP` and `RAMP_IP` variables refer to the F5 BIG-IP® host’s and the ramp node’s IP addresses, respectively, on a shared, internal network.
2	An arbitrary, non-conflicting IP address for the F5® host’s end of the ipip tunnel.
3	The overlay network CIDR range that the OpenShift SDN uses to assign addresses to pods.

Delete any old route, self, tunnel and SNAT pool:

# tmsh delete net route $CLUSTER_NETWORK || true
# tmsh delete net self SDN || true
# tmsh delete net tunnels tunnel SDN || true
# tmsh delete ltm snatpool SDN_snatpool || true

Create the new tunnel, self, route and SNAT pool and use the SNAT pool in the virtual servers:

# tmsh create net tunnels tunnel SDN \
    \{ description "OpenShift SDN" local-address \
    $F5_IP profile ipip remote-address $RAMP_IP \}
# tmsh create net self SDN \{ address \
    ${TUNNEL_IP1}/24 allow-service all vlan SDN \}
# tmsh create net route $CLUSTER_NETWORK interface SDN
# tmsh create ltm snatpool SDN_snatpool members add { $TUNNEL_IP1 }
# tmsh modify ltm virtual  ose-vserver source-address-translation { type snat pool SDN_snatpool }
# tmsh modify ltm virtual  https-ose-vserver source-address-translation { type snat pool SDN_snatpool }

On the ramp node:

The following creates a configuration that is not persistent, meaning that when the ramp node or the openvswitch service is restarted, the settings disappear.

Set the following variables:
```
# F5_IP=10.3.89.66
# TUNNEL_IP1=10.3.91.216
# TUNNEL_IP2=10.3.91.217 (1)
# CLUSTER_NETWORK=10.128.0.0/14 (2)
```
1 A second, arbitrary IP address for the ramp node’s end of the ipip tunnel.

2 The overlay network CIDR range that the OpenShift SDN uses to assign addresses to pods.
Delete any old tunnel:
```
# ip tunnel del tun1 || true
```

Create the ipip tunnel on the ramp node, using a suitable L2-connected interface (e.g., eth0):

# ip tunnel add tun1 mode ipip \
    remote $F5_IP dev eth0
# ip addr add $TUNNEL_IP2 dev tun1
# ip link set tun1 up
# ip route add $TUNNEL_IP1 dev tun1
# ping -c 5 $TUNNEL_IP1

SNAT the tunnel IP with an unused IP from the SDN subnet:

# source /run/openshift-sdn/config.env
# tap1=$(ip -o -4 addr list tun0 | awk '{print $4}' | cut -d/ -f1 | head -n 1)
# subaddr=$(echo ${OPENSHIFT_SDN_TAP1_ADDR:-"$tap1"} | cut -d "." -f 1,2,3)
# export RAMP_SDN_IP=${subaddr}.254

Assign this RAMP_SDN_IP as an additional address to tun0 (the local SDN’s gateway):
```
# ip addr add ${RAMP_SDN_IP} dev tun0
```

Modify the OVS rules for SNAT:

# ipflowopts="cookie=0x999,ip"
# arpflowopts="cookie=0x999, table=0, arp"
#
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "${ipflowopts},nw_src=${TUNNEL_IP1},actions=mod_nw_src:${RAMP_SDN_IP},resubmit(,0)"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "${ipflowopts},nw_dst=${RAMP_SDN_IP},actions=mod_nw_dst:${TUNNEL_IP1},resubmit(,0)"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "${arpflowopts}, arp_tpa=${RAMP_SDN_IP}, actions=output:2"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "${arpflowopts}, priority=200, in_port=2, arp_spa=${RAMP_SDN_IP}, arp_tpa=${CLUSTER_NETWORK}, actions=goto_table:30"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "arp, table=5, priority=300, arp_tpa=${RAMP_SDN_IP}, actions=output:2"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "ip,table=5,priority=300,nw_dst=${RAMP_SDN_IP},actions=output:2"
# ovs-ofctl -O OpenFlow13 add-flow br0 "${ipflowopts},nw_dst=${TUNNEL_IP1},actions=output:2"

Optionally, if you do not plan on configuring the ramp node to be highly available, mark the ramp node as unschedulable. Skip this step if you do plan to follow the next section and plan on creating a highly available ramp node.
```
$ oc adm manage-node <ramp_node_hostname> --schedulable=false
```

The F5 router plug-in integrates with F5 BIG-IP®.

Configuring a Highly-Available Ramp Node

You can use OpenShift Container Platform’s ipfailover feature, which uses keepalived internally, to make the ramp node highly available from F5 BIG-IP®'s point of view. To do so, first bring up two nodes, for example called ramp-node-1 and ramp-node-2, on the same L2 subnet.

Then, choose some unassigned IP address from within the same subnet to use for your virtual IP, or VIP. This will be set as the RAMP_IP variable with which you will configure your tunnel on F5 BIG-IP®.

For example, suppose you are using the 10.20.30.0/24 subnet for your ramp nodes, and you have assigned 10.20.30.2 to ramp-node-1 and 10.20.30.3 to ramp-node-2. For your VIP, choose some unassigned address from the same 10.20.30.0/24 subnet, for example 10.20.30.4. Then, to configure ipfailover, mark both nodes with a label, such as f5rampnode:

$ oc label node ramp-node-1 f5rampnode=true
$ oc label node ramp-node-2 f5rampnode=true

Similar to instructions from the ipfailover documentation, you must now create a service account and add it to the privileged SCC. First, create the f5ipfailover service account:

$ oc create serviceaccount f5ipfailover -n default

Next, you can add the f5ipfailover service to the privileged SCC. To add the f5ipfailover in the default namespace to the privileged SCC, run:

$ oc adm policy add-scc-to-user privileged system:serviceaccount:default:f5ipfailover

Finally, configure ipfailover using your chosen VIP (the RAMP_IP variable) and the f5ipfailover service account, assigning the VIP to your two nodes using the f5rampnode label you set earlier:

# RAMP_IP=10.20.30.4
# IFNAME=eth0 (1)
# oc adm ipfailover <name-tag> \
    --virtual-ips=$RAMP_IP \
    --interface=$IFNAME \
    --watch-port=0 \
    --replicas=2 \
    --service-account=f5ipfailover  \
    --selector='f5rampnode=true'

1	The interface where `RAMP_IP` should be configured.

With the above setup, the VIP (the RAMP_IP variable) is automatically re-assigned when the ramp node host that currently has it assigned fails.

1	A second, arbitrary IP address for the ramp node’s end of the ipip tunnel.
2	The overlay network CIDR range that the OpenShift SDN uses to assign addresses to pods.