This topic describes the management of the overall cluster network, including project isolation and outbound traffic control.
Pod-level networking features, such as per-pod bandwidth limits, are discussed in Managing Pods.
When your cluster is configured to use the ovs-multitenant SDN plug-in, you can manage the separate pod overlay networks for projects using the administrator CLI. See the Configuring the SDN section for plug-in configuration steps, if necessary.
To join projects to an existing project network:
$ oc adm pod-network join-projects --to=<project1> <project2> <project3>
In the above example, all the pods and services in <project2> and <project3> can now access any pods and services in <project1> and vice versa. Services can be accessed either by IP or fully qualified DNS name (<service>.<pod_namespace>.svc.cluster.local). For example, to access a service named db in a project myproject, use db.myproject.svc.cluster.local.
Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.
To verify the networks you have joined together:
$ oc get netnamespaces
Then look at the NETID column. Projects in the same pod network have the same NETID.
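To make that check easier on a cluster with many projects, the output can be grouped by NETID. The following is a minimal sketch, assuming the first two whitespace-separated columns of `oc get netnamespaces` are NAME and NETID; the sample text and its NETID values are hypothetical, and `group_by_netid` is an illustrative helper, not part of `oc`.

```python
from collections import defaultdict


def group_by_netid(oc_output: str) -> dict[int, list[str]]:
    """Group project names by NETID in `oc get netnamespaces` output.

    Assumes the first line is a header and that the first two
    whitespace-separated columns of every row are NAME and NETID.
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for row in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        groups[int(fields[1])].append(fields[0])
    return dict(groups)


# Hypothetical sample output; real NETID values will differ.
sample = """
NAME       NETID     EGRESS IPS
default    0         []
project1   8034101   []
project2   8034101   []
project3   1259411   []
"""

for netid, projects in group_by_netid(sample).items():
    print(netid, projects)
```

Any NETID that lists more than one project identifies a joined pod network.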
To isolate a project network from the rest of the cluster, and the rest of the cluster from it, run:
$ oc adm pod-network isolate-projects <project1> <project2>
In the above example, all of the pods and services in <project1> and <project2> cannot access any pods and services from other non-global projects in the cluster and vice versa.
Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.
To allow projects to access all pods and services in the cluster and vice versa:
$ oc adm pod-network make-projects-global <project1> <project2>
In the above example, all the pods and services in <project1> and <project2> can now access any pods and services in the cluster and vice versa.
Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.
In OKD, host name collision prevention for routes and ingress objects is enabled by default. This means that users without the cluster-admin role can set the host name in a route or ingress object only on creation and cannot change it afterwards. However, you can relax this restriction on routes and ingress objects for some or all users.
Because OKD uses the object creation timestamp to determine the oldest route or ingress object for a given host name, a route or ingress object can hijack a host name of a newer route if the older route changes its host name, or if an ingress object is introduced.
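The hijacking scenario follows directly from the "oldest object wins" rule. The sketch below is a deliberately simplified model of that rule only, assuming each route has a single host name and an integer creation timestamp; the `Route` class and `host_owner` function are hypothetical and do not reflect how the OKD router is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    name: str
    host: str
    created: int  # creation timestamp; a smaller value means an older object


def host_owner(routes: list[Route], host: str) -> Optional[str]:
    """Return the name of the route that holds `host`, or None if unclaimed.

    Simplified model: among all routes requesting the same host name, the
    one with the oldest creation timestamp wins.
    """
    claimants = [r for r in routes if r.host == host]
    if not claimants:
        return None
    return min(claimants, key=lambda r: r.created).name


# Hypothetical routes: "old" was created before "new".
routes = [Route("old", "a.example.com", 1), Route("new", "shop.example.com", 2)]
print(host_owner(routes, "shop.example.com"))  # new

# The older route changes its host name to the newer route's host name...
routes[0].host = "shop.example.com"
print(host_owner(routes, "shop.example.com"))  # old: the host name is hijacked
```

This is why changing a host name after creation is restricted to trusted users by default.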
As an OKD cluster administrator, you can edit the host name in a route even after creation. You can also create a role to allow specific users to do so:
$ oc create clusterrole route-editor --verb=update --resource=routes.route.openshift.io/custom-host
You can then bind the new role to a user:
$ oc adm policy add-cluster-role-to-user route-editor user
You can also disable host name collision prevention for ingress objects. Doing so lets users without the cluster-admin role edit a host name for ingress objects after creation. This is useful for OKD installations that depend on Kubernetes behavior, including allowing the host names in ingress objects to be edited.
Add the following to the master.yaml file:
admissionConfig:
  pluginConfig:
    openshift.io/IngressAdmission:
      configuration:
        apiVersion: v1
        allowHostnameChanges: true
        kind: IngressAdmissionConfig
      location: ""
Restart the master services for the changes to take effect:
$ master-restart api
$ master-restart controllers
As a cluster administrator, you can allocate a number of static IP addresses to a specific node at the host level. If an application developer needs a dedicated IP address for their application service, they can request one during the process they use to ask for firewall access. They can then deploy an egress router from the developer’s project, using a nodeSelector in the deployment configuration to ensure that the pod lands on the host with the pre-allocated static IP address.
The egress pod’s deployment declares one of the source IPs, the destination IP of the protected service, and a gateway IP to reach the destination. After the pod is deployed, you can create a service to access the egress router pod, then add that source IP to the corporate firewall. The developer then has access information to the egress router service that was created in their project, for example, service.project.cluster.domainname.com.
When the developer needs to access the external, firewalled service, they can call out to the egress router pod’s service (service.project.cluster.domainname.com) in their application (for example, the JDBC connection information) rather than the actual protected service URL.
You can also assign static IP addresses to projects, ensuring that all outgoing external connections from the specified project have recognizable origins. This is different from the default egress router, which is used to send traffic to specific destinations.
See the Enabling Fixed IPs for External Project Traffic section for more information.
As an OKD cluster administrator, you can control egress traffic in these ways:
Using an egress firewall allows you to enforce the acceptable outbound traffic policies, so that specific endpoints or IP ranges (subnets) are the only acceptable targets for the dynamic endpoints (pods within OKD) to talk to.
Using an egress router allows you to create identifiable services to send traffic to certain destinations, ensuring those external destinations treat traffic as though it were coming from a known source. This helps with security, because it allows you to secure an external database so that only specific pods in a namespace can talk to a service (the egress router), which proxies the traffic to your database.
In addition to the above OKD-internal solutions, it is also possible to create iptables rules that will be applied to outgoing traffic. These rules allow for more possibilities than the egress firewall, but cannot be limited to particular projects.
As an OKD cluster administrator, you can use egress firewall policy to limit the external IP addresses that some or all pods can access from within the cluster. Egress firewall policy supports the following scenarios:
A pod can only connect to internal hosts, and cannot initiate connections to the public Internet.
A pod can only connect to the public Internet, and cannot initiate connections to internal hosts that are outside the OKD cluster.
A pod cannot reach specified internal subnets or hosts that should be unreachable.
Egress policies can be set by specifying an IP address range in CIDR format or by specifying a DNS name. For example, you can allow <project_A> access to a specified IP range but deny the same access to <project_B>. Alternatively, you can restrict application developers from updating from (Python) pip mirrors, and force updates to only come from approved sources.
You must have the ovs-multitenant or ovs-networkpolicy plug-in enabled in order to limit pod access via egress policy. If you are using the ovs-multitenant plug-in, egress policy is compatible with only one policy per project, and will not work with projects that share a network, such as global projects.
Project administrators can neither create EgressNetworkPolicy objects, nor edit the ones you create in their project. There are also several other restrictions on where EgressNetworkPolicy can be created:
The default project (and any other project that has been made global via oc adm pod-network make-projects-global) cannot have egress policy.
If you merge two projects together (via oc adm pod-network join-projects), then you cannot use egress policy in any of the joined projects.
No project may have more than one egress policy object.
Violating any of these restrictions results in broken egress policy for the project, and may cause all external network traffic to be dropped.
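Because a violation can silently drop all external traffic, it may help to check a project before creating a policy. The sketch below is illustrative only: it assumes the ovs-multitenant plug-in, where global projects (including default) have NETID 0 and joined projects share a NETID, and it takes NETIDs and policy counts as plain dictionaries (for example, transcribed from `oc get netnamespaces`). `egress_policy_problems` is a hypothetical helper.

```python
def egress_policy_problems(netids: dict[str, int],
                           policy_counts: dict[str, int],
                           project: str) -> list[str]:
    """List the egress policy restrictions that `project` violates.

    netids: project name -> NETID.
    policy_counts: project name -> number of EgressNetworkPolicy objects.
    Assumes ovs-multitenant semantics: NETID 0 means global, and a NETID
    shared by several projects means they were joined.
    """
    problems = []
    netid = netids[project]
    if netid == 0:
        problems.append("project is global (NETID 0)")
    elif list(netids.values()).count(netid) > 1:
        problems.append("project is joined to another project (shared NETID)")
    if policy_counts.get(project, 0) > 1:
        problems.append("project has more than one EgressNetworkPolicy object")
    return problems


# Hypothetical cluster state.
netids = {"default": 0, "project1": 11, "project2": 11, "project3": 12}
policies = {"project3": 1}
for name in netids:
    print(name, egress_policy_problems(netids, policies, name) or "OK")
```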
Use the oc command or the REST API to configure egress policy. You can use oc [create|replace|delete] to manipulate EgressNetworkPolicy objects. The api/swagger-spec/oapi-v1.json file has API-level details on how the objects actually work.
To configure egress policy:
Navigate to the project you want to affect.
Create a JSON file with the policy configuration you want to use, as in the following example:
{
  "kind": "EgressNetworkPolicy",
  "apiVersion": "v1",
  "metadata": {
    "name": "default"
  },
  "spec": {
    "egress": [
      {
        "type": "Allow",
        "to": {
          "cidrSelector": "1.2.3.0/24"
        }
      },
      {
        "type": "Allow",
        "to": {
          "dnsName": "www.foo.com"
        }
      },
      {
        "type": "Deny",
        "to": {
          "cidrSelector": "0.0.0.0/0"
        }
      }
    ]
  }
}
When the example above is added to a project, it allows traffic to IP range 1.2.3.0/24 and domain name www.foo.com, but denies access to all other external IP addresses. Traffic to other pods is not affected because the policy only applies to external traffic.
The rules in an EgressNetworkPolicy are checked in order, and the first one that matches takes effect. If the three rules in the above example were reversed, then traffic would not be allowed to 1.2.3.0/24 and www.foo.com because the 0.0.0.0/0 rule would be checked first, and it would match and deny all traffic.
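The first-match ordering can be illustrated with a short evaluator over the rules from the example policy. This is a sketch under stated assumptions, not the SDN's implementation: dnsName rules are matched against a caller-supplied name-to-IP mapping standing in for the controller's DNS polling, the DNS answer for www.foo.com is hypothetical, and allowing traffic when no rule matches is an assumption of the sketch.

```python
import ipaddress
from typing import Optional


def evaluate_egress(rules: list[dict], dest_ip: str,
                    resolved: Optional[dict[str, set[str]]] = None) -> str:
    """Return "Allow" or "Deny" for pod traffic to an external IP address.

    Rules are checked in order and the first match takes effect. dnsName
    rules are compared against `resolved` (name -> IPs), which stands in for
    the controller's own DNS lookups. Allowing traffic when no rule matches
    is an assumption of this sketch.
    """
    ip = ipaddress.ip_address(dest_ip)
    resolved = resolved or {}
    for rule in rules:
        to = rule["to"]
        if "cidrSelector" in to:
            matched = ip in ipaddress.ip_network(to["cidrSelector"])
        else:
            matched = dest_ip in resolved.get(to["dnsName"], set())
        if matched:
            return rule["type"]
    return "Allow"


# The "egress" rules from the example policy above.
egress = [
    {"type": "Allow", "to": {"cidrSelector": "1.2.3.0/24"}},
    {"type": "Allow", "to": {"dnsName": "www.foo.com"}},
    {"type": "Deny", "to": {"cidrSelector": "0.0.0.0/0"}},
]
# Hypothetical DNS answer for www.foo.com.
dns = {"www.foo.com": {"198.51.100.7"}}

print(evaluate_egress(egress, "1.2.3.4", dns))                  # Allow
print(evaluate_egress(egress, "198.51.100.7", dns))             # Allow
print(evaluate_egress(egress, "203.0.113.9", dns))              # Deny
print(evaluate_egress(list(reversed(egress)), "1.2.3.4", dns))  # Deny
```

The last call shows the reversed-order case: the 0.0.0.0/0 rule matches first and denies traffic that the original ordering allowed.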
Domain name updates are polled based on the TTL (time to live) value of the domain returned by the local non-authoritative servers. The pod should also resolve the domain from the same local nameservers when necessary; otherwise, the IP addresses for the domain perceived by the egress network policy controller and the pod will be different, and the egress network policy may not be enforced as expected. Because the egress network policy controller and the pod poll the same local nameserver asynchronously, there could be a race condition where the pod gets the updated IP before the egress controller does. Due to this current limitation, domain name usage in EgressNetworkPolicy is only recommended for domains with infrequent IP address changes.
The egress firewall always allows pods access to the external interface of the node the pod is on for DNS resolution. If your DNS resolution is not handled by something on the local node, then you will need to add egress firewall rules allowing access to the DNS server’s IP addresses if you are using domain names in your pods.
Use the JSON file to create an EgressNetworkPolicy object:
$ oc create -f <policy>.json
Exposing services by creating routes will ignore
The OKD egress router runs a service that redirects traffic to a specified remote server, using a private source IP address that is not used for anything else. The service allows pods to talk to servers that are set up to only allow access from whitelisted IP addresses.
The egress router is not intended for every outgoing connection. Creating large numbers of egress routers can push the limits of your network hardware. For example, creating an egress router for every project or application could exceed the number of local MAC addresses that the network interface can handle before falling back to filtering MAC addresses in software.
Currently, the egress router is not compatible with Amazon AWS, Azure Cloud, or any other cloud platform that does not support layer 2 manipulations due to their incompatibility with macvlan traffic.
Deployment Considerations
The egress router adds a second IP address and MAC address to the node’s primary network interface. If you are not running OKD on bare metal, you may need to configure your hypervisor or cloud provider to allow the additional address.
If you are deploying OKD on Red Hat OpenStack Platform, you need to whitelist the IP and MAC addresses on your OpenStack environment, otherwise communication will fail:
neutron port-update $neutron_port_uuid \
  --allowed_address_pairs list=true \
  type=dict mac_address=<mac_address>,ip_address=<ip_address>
If you are using Red Hat Enterprise Virtualization, you should set EnableMACAntiSpoofingFilterRules to false.
If you are using VMware vSphere, see the VMware documentation for securing vSphere standard switches. View and change VMware vSphere default settings by selecting the host’s virtual switch from the vSphere Web Client.
Specifically, ensure that the following are enabled:
Egress Router Modes
The egress router can run in three different modes: redirect mode, HTTP proxy mode and DNS proxy mode. Redirect mode works for all services except for HTTP and HTTPS. For HTTP and HTTPS services, use HTTP proxy mode. For TCP-based services with IP addresses or domain names, use DNS proxy mode.
In redirect mode, the egress router sets up iptables rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that want to make use of the reserved source IP address must be modified to connect to the egress router rather than connecting directly to the destination IP.
Create a pod configuration using the following:
apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" # (1)
spec:
  initContainers:
  - name: egress-router
    image: openshift/origin-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE # (2)
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY # (3)
      value: 192.168.12.1
    - name: EGRESS_DESTINATION # (4)
      value: 203.0.113.25
    - name: EGRESS_ROUTER_MODE # (5)
      value: init
  containers:
  - name: egress-router-wait
    image: openshift/origin-pod
  nodeSelector:
    site: springfield-1 # (6)
(1) Creates a Macvlan network interface on the primary network interface, and moves it into the pod’s network namespace before starting the egress-router container. Preserve the quotation marks around "true". Omitting them results in errors. To create the Macvlan interface on a network interface other than the primary one, set the annotation value to the name of that interface. For example, eth1.
(2) IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
(3) Same value as the default gateway used by the node.
(4) The external server to direct traffic to. Using this example, connections to the pod are redirected to 203.0.113.25, with a source IP address of 192.168.12.99.
(5) This tells the egress router image that it is being deployed as an "init container". Previous versions of OKD (and the egress router image) did not support this mode and had to be run as an ordinary container.
(6) The pod is only deployed to nodes with the label site=springfield-1.
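The effect of the subnet length on EGRESS_SOURCE can be checked with Python's standard `ipaddress` module. This sketch models only the on-link behavior described in callout (2) and assumes that an address without a suffix behaves like a /32; routing to hosts beyond the local subnet via the gateway is not modeled, and `reachable_on_local_subnet` is a hypothetical helper.

```python
import ipaddress


def reachable_on_local_subnet(egress_source: str, gateway: str, target: str) -> bool:
    """Model callout (2): which on-link hosts the egress router can reach.

    With a subnet length (for example 192.168.12.99/24) a route to the local
    subnet is set up, so any other host on that subnet is reachable. Without
    one, only the EGRESS_GATEWAY host is. Off-subnet traffic is not modeled.
    """
    iface = ipaddress.ip_interface(egress_source)  # no suffix means a /32
    addr = ipaddress.ip_address(target)
    if addr == ipaddress.ip_address(gateway):
        return True
    return addr in iface.network and addr != iface.ip


print(reachable_on_local_subnet("192.168.12.99/24", "192.168.12.1", "192.168.12.50"))
print(reachable_on_local_subnet("192.168.12.99", "192.168.12.1", "192.168.12.50"))
print(reachable_on_local_subnet("192.168.12.99", "192.168.12.1", "192.168.12.1"))
```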
Create the pod using the above definition:
$ oc create -f <pod_name>.json
To check whether the pod has been created:
$ oc get pod <pod_name>
Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:
apiVersion: v1
kind: Service
metadata:
  name: egress-1
spec:
  ports:
  - name: http
    port: 80
  - name: https
    port: 443
  type: ClusterIP
  selector:
    name: egress-1
Your pods can now connect to this service. Their connections are redirected to the corresponding ports on the external server, using the reserved egress IP address.
The egress router setup is performed by an "init container" created from the openshift/origin-egress-router image, and that container is run privileged so that it can configure the Macvlan interface and set up iptables rules. After it finishes setting up the iptables rules, it exits and the openshift/origin-pod container will run (doing nothing) until the pod is killed.
The environment variables tell the egress-router image what addresses to use; it will configure the Macvlan interface to use EGRESS_SOURCE as its IP address, with EGRESS_GATEWAY as its gateway.
NAT rules are set up so that connections to any TCP or UDP port on the pod’s cluster IP address are redirected to the same port on EGRESS_DESTINATION.
If only some of the nodes in your cluster are capable of claiming the specified source IP address and using the specified gateway, you can specify a nodeName or nodeSelector indicating which nodes are acceptable.
In the previous example, connections to the egress pod (or its corresponding service) on any port are redirected to a single destination IP. You can also configure different destination IPs depending on the port:
apiVersion: v1
kind: Pod
metadata:
  name: egress-multi
  labels:
    name: egress-multi
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  initContainers:
  - name: egress-router
    image: openshift/origin-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE # (1)
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY
      value: 192.168.12.1
    - name: EGRESS_DESTINATION # (2)
      value: |
        80 tcp 203.0.113.25
        8080 tcp 203.0.113.26 80
        8443 tcp 203.0.113.26 443
        203.0.113.27
    - name: EGRESS_ROUTER_MODE
      value: init
  containers:
  - name: egress-router-wait
    image: openshift/origin-pod
(1) IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
(2) EGRESS_DESTINATION uses YAML syntax for its values, and can be a multi-line string. See the following for more information.
Each line of EGRESS_DESTINATION can be one of three types:
<port> <protocol> <IP_address> - This says that incoming connections to the given <port> should be redirected to the same port on the given <IP_address>. <protocol> is either tcp or udp. In the example above, the first line redirects traffic from local port 80 to port 80 on 203.0.113.25.
<port> <protocol> <IP_address> <remote_port> - As above, except that the connection is redirected to a different <remote_port> on <IP_address>. In the example above, the second and third lines redirect local ports 8080 and 8443 to remote ports 80 and 443 on 203.0.113.26.
<fallback_IP_address> - If the last line of EGRESS_DESTINATION is a single IP address, then any connections on any other port are redirected to the corresponding port on that IP address (for example, 203.0.113.27 in the example above). If there is no fallback IP address, then connections on other ports are rejected.
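The three line types can be captured in a small parser, which is handy for sanity-checking a destination list before deploying it. This is a sketch of the format as described above, not the egress router image's own parsing code; skipping blank lines and lines starting with `#` follows the ConfigMap file example later in this section, and rejecting a single-IP line that is not last is a strictness choice of this sketch. `parse_egress_destination` and `redirect_target` are hypothetical names.

```python
import ipaddress
from typing import Optional


def parse_egress_destination(text: str):
    """Parse an EGRESS_DESTINATION value into (rules, fallback).

    rules maps (local_port, protocol) -> (ip_address, remote_port).
    Blank lines and lines starting with '#' are skipped.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    rules: dict[tuple[int, str], tuple[str, int]] = {}
    fallback: Optional[str] = None
    for i, line in enumerate(lines):
        fields = line.split()
        if len(fields) == 1:
            # <fallback_IP_address>: only accepted as the last line.
            if i != len(lines) - 1:
                raise ValueError(f"fallback IP must be the last line: {line!r}")
            fallback = str(ipaddress.ip_address(fields[0]))
        elif len(fields) in (3, 4):
            # <port> <protocol> <IP_address> [<remote_port>]
            port, proto = int(fields[0]), fields[1]
            if proto not in ("tcp", "udp"):
                raise ValueError(f"protocol must be tcp or udp: {line!r}")
            remote_port = int(fields[3]) if len(fields) == 4 else port
            rules[(port, proto)] = (str(ipaddress.ip_address(fields[2])), remote_port)
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return rules, fallback


def redirect_target(rules, fallback, port: int, proto: str):
    """Where a connection to `port` is sent, or None if it is rejected."""
    if (port, proto) in rules:
        return rules[(port, proto)]
    if fallback is not None:
        return (fallback, port)  # same port on the fallback address
    return None


example = """
80 tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27
"""
rules, fallback = parse_egress_destination(example)
print(redirect_target(rules, fallback, 8443, "tcp"))  # ('203.0.113.26', 443)
print(redirect_target(rules, fallback, 22, "tcp"))    # ('203.0.113.27', 22)
```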
For a large or frequently-changing set of destination mappings, you can use a ConfigMap to externally maintain the list, and have the egress router pod read it from there. The advantage is that project administrators can edit the ConfigMap, whereas they may not be able to edit the Pod definition directly, because it contains a privileged container.
Create a file containing the EGRESS_DESTINATION data:
$ cat my-egress-destination.txt
# Egress routes for Project "Test", version 3
80 tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
# Fallback
203.0.113.27
Note that you can put blank lines and comments into this file.
Create a ConfigMap object from the file:
$ oc delete configmap egress-routes --ignore-not-found
$ oc create configmap egress-routes \
--from-file=destination=my-egress-destination.txt
Here egress-routes is the name of the ConfigMap object being created and my-egress-destination.txt is the name of the file the data is being read from.
Create an egress router pod definition as above, but specify the ConfigMap for EGRESS_DESTINATION in the environment section:
...
    env:
    - name: EGRESS_SOURCE # (1)
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY
      value: 192.168.12.1
    - name: EGRESS_DESTINATION
      valueFrom:
        configMapKeyRef:
          name: egress-routes
          key: destination
    - name: EGRESS_ROUTER_MODE
      value: init
...
(1) IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
The egress router does not automatically update when the ConfigMap changes. Restart the pod to get updates.
In HTTP proxy mode, the egress router runs as an HTTP proxy on port 8080. This only works for clients talking to HTTP or HTTPS-based services, but usually requires fewer changes to the client pods to get them to work. Programs can be told to use an HTTP proxy by setting an environment variable.
Create the pod using the following as an example:
apiVersion: v1
kind: Pod
metadata:
  name: egress-http-proxy
  labels:
    name: egress-http-proxy
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" # (1)
spec:
  initContainers:
  - name: egress-router-setup
    image: openshift/origin-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE # (2)
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY # (3)
      value: 192.168.12.1
    - name: EGRESS_ROUTER_MODE # (4)
      value: http-proxy
  containers:
  - name: egress-router-proxy
    image: openshift/origin-egress-http-proxy
    env:
    - name: EGRESS_HTTP_PROXY_DESTINATION # (5)
      value: |
        !*.example.com
        !192.168.1.0/24
        *
(1) Creates a Macvlan network interface on the primary network interface, then moves it into the pod’s network namespace before starting the egress-router container. Preserve the quotation marks around "true". Omitting them results in errors.
(2) IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
(3) Same value as the default gateway used by the node itself.
(4) This tells the egress router image that it is being deployed as part of an HTTP proxy, and so it should not set up iptables redirecting rules.
(5) A string or YAML multi-line string specifying how to configure the proxy. Note that this is specified as an environment variable in the HTTP proxy container, not with the other environment variables in the init container.
You can specify any of the following for the EGRESS_HTTP_PROXY_DESTINATION value. You can also use *, meaning "allow connections to all remote destinations". Each line in the configuration specifies one group of connections to allow or deny:
An IP address (for example, 192.168.1.1) allows connections to that IP address.
A CIDR range (for example, 192.168.1.0/24) allows connections to that CIDR range.
A host name (for example, www.example.com) allows proxying to that host.
A domain name preceded by *. (for example, *.example.com) allows proxying to that domain and all of its subdomains.
A ! followed by any of the above denies connections rather than allowing them.
If the last line is *, then anything that has not been denied is allowed. Otherwise, anything that has not been allowed is denied.
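The following sketch shows one way to read these rules when testing a configuration offline. It assumes that lines are evaluated top to bottom with the first matching line winning, which is consistent with the trailing * behavior described above but is an assumption about the proxy's internals; `proxy_allows` and `_line_matches` are hypothetical helpers, and the destinations are examples.

```python
import ipaddress


def _line_matches(pattern: str, dest: str) -> bool:
    """True if one (non-negated) configuration line covers `dest`."""
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        domain = pattern[2:]
        return dest == domain or dest.endswith("." + domain)
    if "/" in pattern:
        try:
            return ipaddress.ip_address(dest) in ipaddress.ip_network(pattern, strict=False)
        except ValueError:
            return False  # dest is a host name, not an IP address
    return dest == pattern  # exact IP address or host name


def proxy_allows(config: str, dest: str) -> bool:
    """Decide whether the proxy forwards a request for `dest`.

    Assumes lines are evaluated top to bottom and the first matching line
    wins; a leading '!' turns an allow into a deny. Anything that matches no
    line is denied.
    """
    for raw in config.splitlines():
        line = raw.strip()
        if not line:
            continue
        deny = line.startswith("!")
        if _line_matches(line[1:] if deny else line, dest):
            return not deny
    return False


# The EGRESS_HTTP_PROXY_DESTINATION value from the pod example above.
config = """
!*.example.com
!192.168.1.0/24
*
"""
for dest in ("www.example.com", "example.com", "192.168.1.20", "docs.example.org"):
    print(dest, proxy_allows(config, dest))
```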
Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:
apiVersion: v1
kind: Service
metadata:
  name: egress-1
spec:
  ports:
  - name: http-proxy
    port: 8080 # (1)
  type: ClusterIP
  selector:
    name: egress-http-proxy
(1) Ensure the http port is always set to 8080.
Configure the client pod (not the egress proxy pod) to use the HTTP proxy by setting the http_proxy or https_proxy variables:
...
env:
- name: http_proxy
  value: http://egress-1:8080/ # (1)
- name: https_proxy
  value: http://egress-1:8080/
...
(1) The service created in step 2.
Using the
You can also specify the EGRESS_HTTP_PROXY_DESTINATION using a ConfigMap, similarly to the redirecting egress router example above.
In DNS proxy mode, the egress router runs as a DNS proxy for TCP-based services from its own IP address to one or more destination IP addresses. Client pods that want to make use of the reserved source IP address must be modified to connect to the egress router rather than connecting directly to the destination IP. This ensures that external destinations treat traffic as though it were coming from a known source.
Create the pod using the following as an example:
apiVersion: v1
kind: Pod
metadata:
  name: egress-dns-proxy
  labels:
    name: egress-dns-proxy
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" # (1)
spec:
  initContainers:
  - name: egress-router-setup
    image: openshift/origin-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE # (2)
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY # (3)
      value: 192.168.12.1
    - name: EGRESS_ROUTER_MODE # (4)
      value: dns-proxy
  containers:
  - name: egress-dns-proxy
    image: openshift/origin-egress-dns-proxy
    env:
    - name: EGRESS_DNS_PROXY_DEBUG # (5)
      value: "1"
    - name: EGRESS_DNS_PROXY_DESTINATION # (6)
      value: |
        # Egress routes for Project "Foo", version 5
        80 203.0.113.25
        100 example.com
        8080 203.0.113.26 80
        8443 foobar.com 443
(1) Using the pod.network.openshift.io/assign-macvlan annotation creates a Macvlan network interface on the primary network interface, then moves it into the pod’s network namespace before starting the egress-router-setup container. Preserve the quotation marks around "true". Omitting them results in errors.
(2) IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
(3) Same value as the default gateway used by the node itself.
(4) This tells the egress router image that it is being deployed as part of a DNS proxy, and so it should not set up iptables redirecting rules.
(5) Optional. Setting this variable displays DNS proxy log output on stdout.
(6) This uses the YAML syntax for a multi-line string. See below for details.
Each line of
Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:
apiVersion: v1
kind: Service
metadata:
  name: egress-dns-svc
spec:
  ports:
  - name: con1
    protocol: TCP
    port: 80
    targetPort: 80
  - name: con2
    protocol: TCP
    port: 100
    targetPort: 100
  - name: con3
    protocol: TCP
    port: 8080
    targetPort: 8080
  - name: con4
    protocol: TCP
    port: 8443
    targetPort: 8443
  type: ClusterIP
  selector:
    name: egress-dns-proxy
Pods can now connect to this service. Their connections are proxied to the corresponding ports on the external server, using the reserved egress IP address.
You can also specify the EGRESS_DNS_PROXY_DESTINATION using a ConfigMap, similarly to the redirecting egress router example above.
Using a replication controller, you can ensure that there is always one copy of the egress router pod in order to prevent downtime.
Create a replication controller configuration file using the following:
apiVersion: v1
kind: ReplicationController
metadata:
  name: egress-demo-controller
spec:
  replicas: 1 # (1)
  selector:
    name: egress-demo
  template:
    metadata:
      name: egress-demo
      labels:
        name: egress-demo
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      initContainers:
      - name: egress-demo-init
        image: openshift/origin-egress-router
        env:
        - name: EGRESS_SOURCE # (2)
          value: 192.168.12.99/24
        - name: EGRESS_GATEWAY
          value: 192.168.12.1
        - name: EGRESS_DESTINATION
          value: 203.0.113.25
        - name: EGRESS_ROUTER_MODE
          value: init
        securityContext:
          privileged: true
      containers:
      - name: egress-demo-wait
        image: openshift/origin-pod
      nodeSelector:
        site: springfield-1
(1) Ensure replicas is set to 1, because only one pod can use a given EGRESS_SOURCE value at any time. This means that only a single copy of the router runs, on a node with the label site=springfield-1.
(2) IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
Create the replication controller using the definition:
$ oc create -f <replication_controller>.json
To verify, check whether the replication controller pod has been created:
$ oc describe rc <replication_controller>
Some cluster administrators may want to perform actions on outgoing traffic that do not fit within the model of EgressNetworkPolicy or the egress router. In some cases, this can be done by creating iptables rules directly.
For example, you could create rules that log traffic to particular destinations, or rules that prevent more than a certain number of outgoing connections per second.
OKD does not provide a way to add custom iptables rules automatically, but it does provide a place where such rules can be added manually by the administrator. On startup, each node creates an empty chain called OPENSHIFT-ADMIN-OUTPUT-RULES in the filter table (assuming that the chain does not already exist). Any rules added to that chain by an administrator are applied to all traffic going from a pod to a destination outside the cluster (and not to any other traffic).
There are a few things to watch out for when using this functionality:
It is up to you to ensure that rules get created on each node; OKD does not provide any way to make that happen automatically.
The rules are not applied to traffic that exits the cluster via an egress router, and they run after EgressNetworkPolicy rules are applied (and so will not see traffic that is denied by an EgressNetworkPolicy).
The handling of connections from pods to nodes or pods to the master is complicated, because nodes have both "external" IP addresses and "internal" SDN IP addresses. Thus, some pod-to-node/master traffic may pass through this chain, but other pod-to-node/master traffic may bypass it.
As a cluster administrator, you can assign specific, static IP addresses to projects, so that traffic is easily recognizable externally. This is different from the default egress router, which is used to send traffic to specific destinations.
Recognizable IP traffic increases cluster security by ensuring the origin is visible. Once enabled, all outgoing external connections from the specified project will share the same, fixed source IP, meaning that any external resources can recognize the traffic.
Unlike the egress router, this is subject to EgressNetworkPolicy firewall rules.
Assigning static IP addresses to projects in your cluster requires the SDN to use either the ovs-networkpolicy or the ovs-multitenant network plug-in.
If you use OpenShift SDN in multitenant mode, you cannot use egress IP addresses with any namespace that is joined to another namespace by the projects that are associated with them.
For example, if
To enable static source IPs:
Update the NetNamespace
with the desired IP:
$ oc patch netnamespace <project_name> -p '{"egressIPs": ["<IP_address>"]}'
For example, to assign the myproject project to an IP address of 192.168.1.100:
$ oc patch netnamespace myproject -p '{"egressIPs": ["192.168.1.100"]}'
The egressIPs field is an array. You can set egressIPs to two or more IP addresses on different nodes to provide high availability. If multiple egress IP addresses are set, pods use the first IP in the list for egress, but if the node hosting that IP address fails, pods switch to using the next IP in the list after a short delay.
Manually assign the egress IPs to the desired node hosts. Set the egressIPs field on the HostSubnet object of each node host, including as many IPs as you want to assign to that node host:
$ oc patch hostsubnet <node_name> -p \
'{"egressIPs": ["<IP_address_1>", "<IP_address_2>"]}'
For example, to specify that node1 should have the egress IPs 192.168.1.100, 192.168.1.101, and 192.168.1.102:
$ oc patch hostsubnet node1 -p \
'{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'
Egress IPs are implemented as additional IP addresses on the primary network interface and must be in the same subnet as the node’s primary IP. Additionally, do not configure these egress IPs in any Linux network configuration files, such as ifcfg-eth0. Allowing additional IP addresses on the primary network interface might require extra configuration when using some cloud or VM solutions.
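Since each egress IP must be in the same subnet as the node's primary IP, it can help to check that before patching the HostSubnet. A minimal bash sketch, assuming you already know the node's subnet in CIDR form (it is not queried from the cluster); ip_to_int and ip_in_cidr are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: verify that a candidate egress IP lies inside a node's primary subnet.
# The subnet CIDR is supplied by you; nothing is read from the cluster.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# 10# forces base-10 so octets such as "08" are not parsed as octal.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# Exit status 0 if the IP is inside the CIDR, for example:
#   ip_in_cidr 192.168.1.100 192.168.1.0/24
ip_in_cidr() {
  local ip="$1" cidr="$2"
  local net="${cidr%/*}" bits="${cidr#*/}"
  # Build the netmask; bash arithmetic is 64-bit, so mask back to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

For example, ip_in_cidr 192.168.1.101 192.168.1.0/24 && oc patch hostsubnet node1 -p '{"egressIPs": ["192.168.1.101"]}' only patches when the address is in range. The check says nothing about whether the address is already in use elsewhere.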
If the above is enabled for a project, all egress traffic from that project is routed to the node hosting that egress IP, then connected (using NAT) to that IP address. If egressIPs is set on a NetNamespace but no node hosts that egress IP, egress traffic from the namespace is dropped.
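Because unhosted egress IPs silently drop traffic, a quick consistency check between what projects request and what nodes host can catch mistakes. This bash sketch assumes you have already collected both lists as space-separated strings; unhosted_egress_ips is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print requested egress IPs that no node currently hosts.
# Traffic from a namespace whose egress IP is unhosted is dropped.
# Arguments: space-separated requested IPs, space-separated hosted IPs.
unhosted_egress_ips() {
  local requested="$1" hosted="$2" ip h found
  for ip in $requested; do
    found=0
    for h in $hosted; do
      if [[ "$ip" == "$h" ]]; then found=1; break; fi
    done
    # Report the IP only if no node hosts it.
    (( found )) || echo "$ip"
  done
}
```

The hosted list could come from something like oc get hostsubnet -o jsonpath='{.items[*].egressIPs[*]}' and the requested list from oc get netnamespace <project_name> -o jsonpath='{.egressIPs[*]}'; verify those queries against your cluster before relying on them.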
Similar to Enabling Static IPs for External Project Traffic, as a cluster administrator you can assign egress IP addresses to namespaces by setting the egressIPs parameter of the NetNamespace resource. In this mode, you can associate only a single IP address with a project.
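If you use OpenShift SDN in multitenant mode, you cannot use egress IP addresses with any namespace that is joined to another namespace through its associated project.
For example, if <project1> and <project2> have been joined with oc adm pod-network join-projects --to=<project1> <project2>, neither project can use an egress IP address.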
With fully automatic egress IPs, you can set the egressCIDRs parameter of each node’s HostSubnet resource to indicate the range of egress IP addresses that the node can host. Namespaces that have requested egress IP addresses are matched with nodes that are able to host those egress IP addresses, and then the egress IP addresses are assigned to those nodes.
High availability is automatic. If a node hosting egress IP addresses goes down and other nodes are able to host those egress IP addresses, based on the egressCIDRs values of their HostSubnet resources, the egress IP addresses move to a new node. When the original node comes back online, the egress IP addresses automatically move to balance them across nodes.
You cannot use manually assigned and automatically assigned egress IP addresses on the same nodes or with the same IP address ranges.
Update the NetNamespace with the egress IP address:
$ oc patch netnamespace <project_name> -p '{"egressIPs": ["<IP_address>"]}'
You can specify only a single IP address for the egressIPs parameter. Using multiple IP addresses is not supported.
For example, to assign project1 to the IP address 192.168.1.100 and project2 to the IP address 192.168.1.101:
$ oc patch netnamespace project1 -p '{"egressIPs": ["192.168.1.100"]}'
$ oc patch netnamespace project2 -p '{"egressIPs": ["192.168.1.101"]}'
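When several projects need automatic egress IPs, the patches above can be applied in one loop that also enforces the single-IP rule. A bash sketch under stated assumptions: the project=IP argument format and the function name are hypothetical, and OC=echo is a dry-run override of this sketch, not an oc feature.

```shell
#!/usr/bin/env bash
# Sketch: apply several automatic-mode egress IP assignments in one pass.
# Each argument is a hypothetical "project=IP" pair; exactly one IP per project.
# Set OC=echo to print the oc commands instead of running them.
assign_auto_egress_ips() {
  local pair project ip
  for pair in "$@"; do
    project="${pair%%=*}"
    ip="${pair#*=}"
    # Reject malformed pairs and anything that looks like more than one IP.
    if [[ "$pair" != *=* || -z "$project" || -z "$ip" || "$ip" == *,* || "$ip" == *[[:space:]]* ]]; then
      echo "invalid assignment: $pair (expected project=IP)" >&2
      return 1
    fi
    "${OC:-oc}" patch netnamespace "$project" -p "{\"egressIPs\": [\"$ip\"]}" || return 1
  done
}
```

For example, assign_auto_egress_ips project1=192.168.1.100 project2=192.168.1.101 issues the same two patches shown above and stops at the first failure.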
Indicate which nodes can host egress IP addresses by setting their egressCIDRs fields:
$ oc patch hostsubnet <node_name> -p \
'{"egressCIDRs": ["<IP_address_range_1>", "<IP_address_range_2>"]}'
For example, to set node1 and node2 to host egress IP addresses in the range 192.168.1.0 to 192.168.1.255:
$ oc patch hostsubnet node1 -p '{"egressCIDRs": ["192.168.1.0/24"]}'
$ oc patch hostsubnet node2 -p '{"egressCIDRs": ["192.168.1.0/24"]}'
OKD automatically assigns specific egress IP addresses to available nodes in a balanced way. In this case, it assigns the egress IP address 192.168.1.100 to node1 and the egress IP address 192.168.1.101 to node2, or vice versa.
At this time, multicast is best used for low-bandwidth coordination or service discovery, not as a high-bandwidth solution.
Multicast traffic between OKD pods is disabled by default. If you are using the ovs-multitenant or ovs-networkpolicy plugin, you can enable multicast on a per-project basis by setting an annotation on the project’s corresponding netnamespace object:
$ oc annotate netnamespace <namespace> \
netnamespace.network.openshift.io/multicast-enabled=true
Disable multicast by removing the annotation:
$ oc annotate netnamespace <namespace> \
netnamespace.network.openshift.io/multicast-enabled-
When using the ovs-multitenant plugin:
In an isolated project, multicast packets sent by a pod will be delivered to all other pods in the project.
If you have joined networks together, you need to enable multicast in each project’s netnamespace for it to take effect in any of the projects. Multicast packets sent by a pod in a joined network are delivered to all pods in all of the joined-together networks.
To enable multicast in the default project, you must also enable it in the kube-service-catalog project and all other projects that have been made global. Global projects are not "global" for purposes of multicast: multicast packets sent by a pod in a global project are delivered only to pods in other global projects, not to all pods in all projects. Likewise, pods in global projects receive multicast packets only from pods in other global projects, not from all pods in all projects.
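Since joined networks (and the default project together with kube-service-catalog and the global projects) need the annotation on every netnamespace involved, a loop avoids missing one. A bash sketch: set_multicast is a hypothetical name, and OC=echo is a dry-run override of this sketch; the annotation key and the trailing "-" removal syntax are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: enable or disable multicast on several netnamespaces at once.
# Set OC=echo to print the oc commands instead of running them.
MULTICAST_KEY=netnamespace.network.openshift.io/multicast-enabled

# Usage: set_multicast on|off <namespace> [<namespace> ...]
set_multicast() {
  local mode="$1" ns; shift
  for ns in "$@"; do
    case "$mode" in
      # --overwrite keeps the call idempotent if the annotation already exists.
      on)  "${OC:-oc}" annotate netnamespace "$ns" "${MULTICAST_KEY}=true" --overwrite || return 1 ;;
      # A trailing "-" removes the annotation.
      off) "${OC:-oc}" annotate netnamespace "$ns" "${MULTICAST_KEY}-" || return 1 ;;
      *)   echo "usage: set_multicast on|off <namespace>..." >&2; return 1 ;;
    esac
  done
}
```

For example, set_multicast on default kube-service-catalog followed by the names of your global projects covers the default-project case described above.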
When using the ovs-networkpolicy plugin:
Multicast packets sent by a pod are delivered to all other pods in the project, regardless of NetworkPolicy objects. (Pods may be able to communicate over multicast even when they cannot communicate over unicast.)
Multicast packets sent by a pod in one project are never delivered to pods in any other project, even if there are NetworkPolicy objects allowing communication between the projects.
The ovs-subnet and ovs-multitenant plug-ins have their own legacy models of network isolation and do not support Kubernetes NetworkPolicy. However, NetworkPolicy support is available by using the ovs-networkpolicy plug-in.
The |
Do not apply |
In a cluster configured to use the ovs-networkpolicy plug-in, network isolation is controlled entirely by NetworkPolicy objects. By default, all pods in a project are accessible from other pods and network endpoints. To isolate one or more pods in a project, you can create NetworkPolicy objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy objects within their own project.
Pods that do not have NetworkPolicy objects pointing to them are fully accessible, whereas pods that have one or more NetworkPolicy objects pointing to them are isolated. These isolated pods only accept connections that are accepted by at least one of their NetworkPolicy objects.
Following are a few sample NetworkPolicy object definitions supporting different scenarios:
Deny All Traffic
To make a project "deny by default", add a NetworkPolicy object that matches all pods but accepts no traffic:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-by-default
spec:
  podSelector: {}
  ingress: []
Only accept connections from pods within the project
To make pods accept connections from other pods in the same project, but reject all other connections from pods in other projects:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}
Only allow HTTP and HTTPS traffic based on pod labels
To enable only HTTP and HTTPS access to the pods with a specific label (role=frontend in the following example), add a NetworkPolicy object similar to:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-http-and-https
spec:
  podSelector:
    matchLabels:
      role: frontend
  ingress:
  - ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 443
NetworkPolicy objects are additive, which means you can combine multiple NetworkPolicy objects to satisfy complex network requirements.
For example, using the NetworkPolicy objects defined in the previous samples, you can define both the allow-same-namespace and allow-http-and-https policies within the same project. The pods with the label role=frontend then accept any connection allowed by either policy: connections on any port from pods in the same namespace, and connections on ports 80 and 443 from pods in any namespace.
NetworkPolicy objects allow you to isolate pods that are differentiated from one another by labels within a namespace.
It is inefficient to apply NetworkPolicy objects to large numbers of individual pods in a single namespace. Pod labels do not exist at the IP level, so NetworkPolicy objects generate a separate OVS flow rule for every possible link between every pod selected with podSelector.
For example, if the spec podSelector and the ingress podSelector within a NetworkPolicy object each match 200 pods, then 40,000 (200*200) OVS flow rules are generated. This might slow down the machine.
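The 40,000 figure is simple multiplication, which makes it easy to estimate before writing a policy. This bash sketch reproduces only that back-of-the-envelope arithmetic from the text; it does not inspect real OVS state, and estimate_pod_pair_flows is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rough estimate, following the text above: when a policy's spec podSelector
# matches N pods and its ingress podSelector matches M pods, about N * M
# pod-to-pod OVS flow rules are generated. A namespace-wide policy
# (namespaceSelector or empty podSelector) instead needs a single rule
# matching the namespace's VXLAN VNID.
estimate_pod_pair_flows() {
  local selected="$1" peers="$2"
  echo $(( selected * peers ))
}
```

estimate_pod_pair_flows 200 200 prints 40000, while narrowing the isolated set to 20 pods (estimate_pod_pair_flows 20 200) prints 4000, which is why moving isolated pods into their own namespace pays off.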
To reduce the amount of OVS flow rules, use namespaces to contain groups of pods that need to be isolated.
NetworkPolicy objects that select a whole namespace, by using namespaceSelectors or empty podSelectors, generate only a single OVS flow rule that matches the VXLAN VNID of the namespace.
Keep the pods that do not need to be isolated in their original namespace, and move the pods that require isolation into one or more different namespaces.
Create additional targeted cross-namespace policies to allow the specific traffic that you do want to allow from the isolated pods.
When using the ovs-multitenant plug-in, traffic from the routers is automatically allowed into all namespaces. This is because the routers are usually in the default namespace, and all namespaces allow connections from pods in that namespace. With the ovs-networkpolicy plug-in, this does not happen automatically. Therefore, if you have a policy that isolates a namespace by default, you need to take additional steps to allow routers to access it.
One option is to create a policy for each service, allowing access from all sources. For example:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-to-database-service
spec:
  podSelector:
    matchLabels:
      role: database
  ingress:
  - ports:
    - protocol: TCP
      port: 5432
This allows routers to access the service, but also allows pods in other users' namespaces to access it. This should not cause any issues, because those pods can normally access the service by using the public router.
Alternatively, you can create a policy allowing full access from the default namespace, as in the ovs-multitenant plug-in:
Add a label to the default namespace.
If you labeled the default project with the |
$ oc label namespace default name=default
Create policies allowing connections from that namespace.
Perform this step for each namespace you want to allow connections into. Users with the Project Administrator role can create policies.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-default-namespace
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: default
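Because this policy has to exist in every namespace that routers must reach, a loop can create it in each one. A bash sketch, assuming the default namespace already carries the name=default label and that you have rights to create policies in the target namespaces; the function name and the OC=echo dry-run override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create allow-from-default-namespace in each namespace given.
# Set OC=echo to print the oc commands instead of running them.
allow_from_default_namespace() {
  local ns
  for ns in "$@"; do
    # The quoted heredoc delimiter keeps the YAML literal (no expansion).
    "${OC:-oc}" apply -n "$ns" -f - <<'EOF' || return 1
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-default-namespace
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: default
EOF
  done
}
```

For example, allow_from_default_namespace project-a project-b applies the same policy to both (hypothetical) namespaces; oc apply makes reruns harmless.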
Cluster administrators can modify the default project template to enable automatic creation of one or more default NetworkPolicy objects whenever a new project is created. To do this:
Create a custom project template and configure the master to use it.
Label the default project with the default label:
If you labeled the default project with the |
$ oc label namespace default name=default
Edit the template to include the desired NetworkPolicy objects:
$ oc edit template project-request -n default
To include |
Add each default policy as an element in the objects array:
objects:
...
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-from-same-namespace
  spec:
    podSelector: {}
    ingress:
    - from:
      - podSelector: {}
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-from-default-namespace
  spec:
    podSelector: {}
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            name: default
...
HTTP Strict Transport Security (HSTS) policy is a security enhancement that ensures only HTTPS traffic is allowed on the host. Any HTTP requests are dropped by default. This is useful for ensuring secure interactions with websites, or for offering a secure application for the user’s benefit.
When HSTS is enabled, HSTS adds a Strict Transport Security header to HTTPS responses from the site. You can use the insecureEdgeTerminationPolicy value in a route to redirect HTTP to HTTPS. However, when HSTS is enabled, the client changes all requests from the HTTP URL to HTTPS before the request is sent, eliminating the need for a redirect. Clients are not required to support this, and it can be disabled by setting max-age=0.
HSTS works only with secure routes (either edge terminated or re-encrypt). The configuration is ineffective on HTTP or passthrough routes.
To enable HSTS on a route, add the haproxy.router.openshift.io/hsts_header value to the edge-terminated or re-encrypt route:
apiVersion: v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains;preload
Ensure there are no spaces and no other values in the parameters in the |
The required max-age parameter indicates the length of time, in seconds, that the HSTS policy is in effect. The client updates max-age whenever a response with an HSTS header is received from the host. When max-age times out, the client discards the policy.
The optional includeSubDomains parameter tells the client that all subdomains of the host are to be treated the same as the host.
If max-age is greater than 0, the optional preload parameter allows external services to include this site in their HSTS preload lists. For example, sites such as Google can construct a list of sites that have preload set. Browsers can then use these lists to determine which sites to talk to only over HTTPS, even before they have interacted with the site. Without preload set, browsers must have talked to the site over HTTPS at least once to get the header.
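The rules above (no spaces, a required numeric max-age, preload only with max-age greater than 0) can be checked before annotating a route. A bash sketch: hsts_header_valid and set_route_hsts are hypothetical names, OC=echo is a dry-run override of this sketch, and the validator matches the directive spelling used above exactly, so it is stricter than RFC 6797, which treats directive names case-insensitively.

```shell
#!/usr/bin/env bash
# Sketch: validate an HSTS annotation value, then set it on a route.
# Set OC=echo to print the oc command instead of running it.
HSTS_KEY=haproxy.router.openshift.io/hsts_header

hsts_header_valid() {
  local value="$1" part max_age="" preload=0
  # No spaces are allowed anywhere in the value.
  [[ "$value" =~ [[:space:]] ]] && return 1
  local IFS=';'
  for part in $value; do
    case "$part" in
      max-age=*)
        max_age="${part#max-age=}"
        [[ "$max_age" =~ ^[0-9]+$ ]] || return 1 ;;
      includeSubDomains) ;;
      preload) preload=1 ;;
      *) return 1 ;;   # unknown directive or empty field
    esac
  done
  # max-age is required; preload is only meaningful when max-age > 0.
  [[ -n "$max_age" ]] || return 1
  (( preload == 0 || 10#$max_age > 0 ))
}

# Usage: set_route_hsts <route_name> <header_value>
set_route_hsts() {
  hsts_header_valid "$2" || { echo "invalid HSTS value: $2" >&2; return 1; }
  "${OC:-oc}" annotate route "$1" "${HSTS_KEY}=$2" --overwrite
}
```

For example, set_route_hsts <route_name> 'max-age=31536000;includeSubDomains;preload' applies the value from the route example above, while 'max-age=31536000; includeSubDomains' is rejected because of the space.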
Sometimes applications deployed through OKD can cause network throughput issues such as unusually high latency between specific services.
Use the following methods to analyze performance issues if pod logs do not reveal any cause of the problem:
Use a packet analyzer, such as tcpdump, or a network tool, such as ping, to analyze traffic between a pod and its node.
For example, run the tcpdump tool on each pod while reproducing the behavior that led to the issue. Review the captures on both sides to compare send and receive timestamps to analyze the latency of traffic to/from a pod. Latency can occur in OKD if a node interface is overloaded with traffic from other pods, storage devices, or the data plane.
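# tcpdump -s 0 -i any -w /tmp/dump.pcap 'host <podip_1> and host <podip_2>'
Here, <podip_1> and <podip_2> are the IP addresses of the two pods. Quote the filter expression so that the shell passes it to tcpdump intact; an unquoted && would end the tcpdump command instead. Run the following command to get the IP address of a pod: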
# oc get pod <podname> -o wide
tcpdump generates a file at /tmp/dump.pcap containing all traffic between these two pods. Ideally, start the analyzer shortly before reproducing the issue and stop it shortly after, to minimize the size of the file. You can also run a packet analyzer between the nodes (eliminating the SDN from the equation) with:
# tcpdump -s 0 -i any -w /tmp/dump.pcap port 4789
Use a bandwidth measuring tool, such as iperf, to measure streaming throughput and UDP throughput. Run the tool from the pods first, then from the nodes to attempt to locate any bottlenecks. The iperf3 tool is included as part of RHEL 7.
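The two measurements described above can be wrapped so the tcpdump filter is always quoted correctly and the iperf3 runs are repeatable. A bash sketch: the function names are hypothetical, TCPDUMP and IPERF are overrides of this sketch (for example, a stub that prints its arguments), the iperf3 server side is assumed to be started separately with iperf3 -s, and the 10-second duration and 100M UDP target rate are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch of the capture and throughput steps described above.

# Capture traffic between two pod IPs. The filter is passed as ONE quoted
# argument; an unquoted && would end the tcpdump command in the shell.
capture_between_pods() {
  local ip1="$1" ip2="$2" out="${3:-/tmp/dump.pcap}"
  "${TCPDUMP:-tcpdump}" -s 0 -i any -w "$out" "host $ip1 and host $ip2"
}

# Measure TCP, then UDP, throughput against a host already running iperf3 -s.
# -u selects UDP and -b sets the UDP target bitrate.
measure_throughput() {
  local server_ip="$1"
  "${IPERF:-iperf3}" -c "$server_ip" -t 10 || return 1
  "${IPERF:-iperf3}" -c "$server_ip" -u -b 100M -t 10
}
```

Run measure_throughput from a pod first and then from its node, as suggested above, and compare the results to see whether the bottleneck is in the SDN or below it.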