Guidelines

Overview
General Container Image Guidelines
OpenShift Dedicated-Specific Guidelines
External References

Overview

When you create container images to run on OpenShift Dedicated, there are a number of best practices to consider as an image author to ensure a good experience for consumers of those images. Because images are intended to be immutable and used as-is, the following guidelines help ensure that your images are highly consumable and easy to use on OpenShift Dedicated.

General Container Image Guidelines

The following guidelines apply when creating a container image in general, and are independent of whether the images are used on OpenShift Dedicated.

Reuse Images

Wherever possible, we recommend that you base your image on an appropriate upstream image using the FROM statement. This ensures your image can easily pick up security fixes from an upstream image when it is updated, rather than you having to update your dependencies directly.

In addition, use tags in the FROM instruction (for example, rhel:rhel7) to make it clear to users exactly which version of an image your image is based on. Using a tag other than latest ensures your image is not subjected to breaking changes that might go into the latest version of an upstream image.

Maintain Compatibility Within Tags

When tagging your own images, we recommend that you try to maintain backwards compatibility within a tag. For example, if you provide an image named foo and it currently includes version 1.0, you might provide a tag of foo:v1. When you update the image, as long as it continues to be compatible with the original image, you can continue to tag the new image foo:v1, and downstream consumers of this tag will be able to get updates without being broken.

If you later release an incompatible update, then you should switch to a new tag, for example foo:v2. This allows downstream consumers to move up to the new version at will, but not be inadvertently broken by the new incompatible image. Any downstream consumer using foo:latest takes on the risk of any incompatible changes being introduced.

Avoid Multiple Processes

We recommend that you do not start multiple services, such as a database and SSHD, inside one container. This is not necessary because containers are lightweight and can be easily linked together for orchestrating multiple processes. OpenShift Dedicated allows you to easily colocate and co-manage related images by grouping them into a single pod.

This colocation ensures the containers share a network namespace and storage for communication. Updates are also less disruptive because each image can be updated independently and less frequently. Signal handling flows are also clearer with a single process because you do not need to manage routing signals to spawned processes.

Use `exec` in Wrapper Scripts

See the "Always exec in Wrapper Scripts" section of the Project Atomic documentation for more information.

Also note that your process runs as PID 1 when running in a container. This means that if your main process terminates, the entire container is stopped, killing any child processes you may have launched from your PID 1 process.

See the "Docker and the PID 1 zombie reaping problem" blog article for additional implications. Also see the "Demystifying the init system (PID 1)" blog article for a deep dive on PID 1 and init systems.
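As a minimal sketch of the pattern named in this section's title, the wrapper below performs its setup and then uses exec, so the application replaces the shell instead of running as its child. The function name run_wrapped and the APP_MODE variable are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: do one-time setup, then hand over to the real command.
run_wrapped() {
    # Illustrative setup step; APP_MODE is an assumed variable name.
    export APP_MODE="${APP_MODE:-production}"

    # exec replaces the current shell process with the given command, so no
    # intermediate shell is left between the container engine and the
    # application, and signals reach the application directly.
    exec "$@"
}
```

In a start script, the last line would typically be run_wrapped "$@". Nothing placed after that call runs, because the shell process no longer exists once exec succeeds.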

Clean Temporary Files

All temporary files you create during the build process should be removed. This also includes any files added with the ADD command. For example, we strongly recommend that you run the yum clean command after performing yum install operations.

You can prevent the yum cache from ending up in an image layer by creating your RUN statement as follows:

RUN yum -y install mypackage && yum -y install myotherpackage && yum clean all -y

Note that if you instead write:

RUN yum -y install mypackage
RUN yum -y install myotherpackage && yum clean all -y

Then the first yum invocation leaves extra files in that layer, and these files cannot be removed when the yum clean operation is run later. The extra files are not visible in the final image, but they are present in the underlying layers.

The current container build process does not allow a command run in a later layer to reclaim the space used by files that were written in an earlier layer, although this may change in the future. As a result, if you run an rm command in a later layer, the files are hidden but the overall size of the image to be downloaded is not reduced. Therefore, as with the yum clean example, it is best to remove files in the same command that created them, where possible, so they are never written to a layer.

In addition, performing multiple commands in a single RUN statement reduces the number of layers in your image, which improves download and extraction time.

Place Instructions in the Proper Order

The container builder reads the Dockerfile and runs the instructions from top to bottom. Every instruction that is successfully executed creates a layer which can be reused the next time this or another image is built. It is very important to place instructions that will rarely change at the top of your Dockerfile. Doing so ensures the next builds of the same image are very fast because the cache is not invalidated by upper layer changes.

For example, if you are working on a Dockerfile that contains an ADD command to install a file you are iterating on, and a RUN command to yum install a package, it is best to put the ADD command last:

FROM foo
RUN yum -y install mypackage && yum clean all -y
ADD myfile /test/myfile

This way each time you edit myfile and rerun podman build or docker build, the system reuses the cached layer for the yum command and only generates the new layer for the ADD operation.

If instead you wrote the Dockerfile as:

FROM foo
ADD myfile /test/myfile
RUN yum -y install mypackage && yum clean all -y

Then each time you changed myfile and reran podman build or docker build, the ADD operation would invalidate the RUN layer cache, so the yum operation would need to be rerun as well.

Mark Important Ports

See the "Always EXPOSE Important Ports" section of the Project Atomic documentation for more information.

Set Environment Variables

It is good practice to set environment variables with the ENV instruction. One example is to set the version of your project. This makes it easy for people to find the version without looking at the Dockerfile. Another example is advertising a path on the system that could be used by another process, such as JAVA_HOME.

Avoid Default Passwords

It is best to avoid setting default passwords. Many people will extend the image and forget to remove or change the default password. This can lead to security issues if a user in production is assigned a well-known password. Passwords should be configurable using an environment variable instead. See the Use Environment Variables for Configuration section for more information.

If you do choose to set a default password, ensure that an appropriate warning message is displayed when the container is started. The message should inform the user of the value of the default password and explain how to change it, such as what environment variable to set.
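The startup warning described above could be sketched as follows. The variable name APP_PASSWORD and the default value changeme are hypothetical; a real image would use its own names.

```shell
#!/bin/sh
# Hypothetical names: APP_PASSWORD and the default value are illustrative only.
DEFAULT_PASSWORD="changeme"

# Print the password to use on standard output. When no password was
# supplied, fall back to the default and warn on standard error.
resolve_password() {
    if [ -z "${APP_PASSWORD:-}" ]; then
        echo "WARNING: APP_PASSWORD is not set; using default password '${DEFAULT_PASSWORD}'." >&2
        echo "WARNING: Set the APP_PASSWORD environment variable to change it." >&2
        APP_PASSWORD="$DEFAULT_PASSWORD"
    fi
    printf '%s\n' "$APP_PASSWORD"
}
```

A start script might capture the result with password=$(resolve_password). Because the warning goes to standard error, it appears in the container log without being mixed into the captured value.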

Avoid SSHD

It is best to avoid running SSHD in your image. You can use the podman exec or docker exec command to access containers that are running on the local host. Alternatively, you can use the oc exec command or the oc rsh command to access containers that are running on the OpenShift Dedicated cluster. Installing and running SSHD in your image opens up additional vectors for attack and requirements for security patching.

Use Volumes for Persistent Data

Images should use a Docker volume for persistent data. This way OpenShift Dedicated mounts the network storage to the node running the container, and if the container moves to a new node the storage is reattached to that node. By using the volume for all persistent storage needs, the content is preserved even if the container is restarted or moved. If your image writes data to arbitrary locations within the container, that content might not be preserved.

All data that needs to be preserved even after the container is destroyed must be written to a volume. Container engines support a read-only flag for containers (--read-only in podman run and docker run), which can be used to strictly enforce good practices about not writing data to ephemeral storage in a container. Designing your image around that capability now will make it easier to take advantage of it later.

Furthermore, explicitly defining volumes in your Dockerfile makes it easy for consumers of the image to understand what volumes they need to define when running your image.

See the Kubernetes documentation for more information on how volumes are used in OpenShift Dedicated.

Even with persistent volumes, each instance of your image has its own volume, and the filesystem is not shared between instances. This means the volume cannot be used to share state in a cluster.

External Guidelines

See the following references for other guidelines:

Docker documentation - Best practices for writing Dockerfiles
Project Atomic documentation - Guidance for Container Image Authors

OpenShift Dedicated-Specific Guidelines

The following are guidelines that apply when creating container images specifically for use on OpenShift Dedicated.

Enable Images for Source-To-Image (S2I)

For images that are intended to run application code provided by a third party, such as a Ruby image designed to run Ruby code provided by a developer, you can enable your image to work with the Source-to-Image (S2I) build tool. S2I is a framework which makes it easy to write images that take application source code as an input and produce a new image that runs the assembled application as output.

For example, this Python image defines S2I scripts for building various versions of Python applications.

For more details about how to write S2I scripts for your image, see the S2I Requirements topic.

Support Arbitrary User IDs

By default, OpenShift Dedicated runs containers using an arbitrarily assigned user ID. This provides additional security against processes escaping the container due to a container engine vulnerability and thereby achieving escalated permissions on the host node.

For an image to support running as an arbitrary user, directories and files that may be written to by processes in the image should be owned by the root group and be readable and writable by that group. Files to be executed should also have group execute permissions.

Adding the following to your Dockerfile sets the directory and file permissions to allow users in the root group to access them in the built image:

RUN chgrp -R 0 /some/directory && \
    chmod -R g+rwX /some/directory

Because the container user is always a member of the root group, the container user can read and write these files.

Take care when altering the directory and file permissions of sensitive areas of a container, just as you would on a normal system.

If applied to sensitive areas, such as /etc/passwd, these permissions can allow unintended users to modify such files, potentially exposing the container or host. CRI-O supports the insertion of random user IDs into the container’s /etc/passwd, so changing its permissions should never be required.

In addition, the processes running in the container must not listen on privileged ports (ports below 1024), since they are not running as a privileged user.

Because the user ID of the container is generated dynamically, it will not have an associated entry in /etc/passwd. This can cause problems for applications that expect to be able to look up their user ID. One way to address this problem is to use nss_wrapper and dynamically create a passwd file with the container’s user ID as part of the image’s start script:

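# Record the arbitrary IDs assigned to the container at runtime.
export USER_ID=$(id -u)
export GROUP_ID=$(id -g)
# Render a passwd file that contains an entry for the current user ID.
envsubst < "${HOME}/passwd.template" > /tmp/passwd
# Preload nss_wrapper so that user lookups consult the generated file.
export LD_PRELOAD=/usr/lib64/libnss_wrapper.so
export NSS_WRAPPER_PASSWD=/tmp/passwd
export NSS_WRAPPER_GROUP=/etc/group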

Where passwd.template contains:

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
postgres:x:${USER_ID}:${GROUP_ID}:PostgreSQL Server:${HOME}:/bin/bash

Additionally, you must install the nss_wrapper and gettext packages in your image for this to work. The latter provides the envsubst command. For example, you can add this line to your Dockerfile for yum-based images:

RUN yum -y install nss_wrapper gettext

Lastly, the final USER declaration in the Dockerfile should specify the user ID (numeric value) and not the user name. This allows OpenShift Dedicated to validate the authority the image is attempting to run with and prevent running images that are trying to run as root, because running containers as a privileged user exposes potential security holes. If the image does not specify a USER, it inherits the USER from the parent image.

Use Services for Inter-image Communication

For cases where your image needs to communicate with a service provided by another image, such as a web front end image that needs to access a database image to store and retrieve data, your image should consume an OpenShift Dedicated service. Services provide a static endpoint for access which does not change as containers are stopped, started, or moved. In addition, services provide load balancing for requests.

Provide Common Libraries

For images that are intended to run application code provided by a third party, ensure that your image contains commonly used libraries for your platform. In particular, provide database drivers for common databases used with your platform. For example, provide JDBC drivers for MySQL and PostgreSQL if you are creating a Java framework image. Doing so prevents the need for common dependencies to be downloaded during application assembly time, speeding up application image builds. It also simplifies the work required by application developers to ensure all of their dependencies are met.

Use Environment Variables for Configuration

Users of your image should be able to configure it without having to create a downstream image based on your image. This means that the runtime configuration should be handled using environment variables. For a simple configuration, the running process can consume the environment variables directly. For a more complicated configuration or for runtimes which do not support this, configure the runtime by defining a template configuration file that is processed during startup. During this processing, values supplied using environment variables can be substituted into the configuration file or used to make decisions about what options to set in the configuration file.
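The template approach described above can be sketched as a start-script function that renders a configuration file from environment variables. The setting names (APP_PORT, APP_LOG_LEVEL, APP_CACHE_ENABLED, APP_CACHE_SIZE_MB) and the key = value output format are hypothetical; a real runtime would use its own configuration syntax.

```shell
#!/bin/sh
# Hypothetical setting names and output format, for illustration only.
render_config() {
    # Substitute values, falling back to defaults when a variable is unset.
    cat <<EOF
port = ${APP_PORT:-8080}
log_level = ${APP_LOG_LEVEL:-info}
EOF
    # Use a variable to decide whether an optional setting is emitted at all.
    if [ "${APP_CACHE_ENABLED:-false}" = "true" ]; then
        echo "cache_size_mb = ${APP_CACHE_SIZE_MB:-64}"
    fi
}
```

A start script could redirect the output of render_config to the location the runtime reads its configuration from before launching the main process.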

It is also possible and recommended to pass secrets such as certificates and keys into the container using environment variables. This ensures that the secret values do not end up committed in an image and leaked into a container image registry.

Providing environment variables allows consumers of your image to customize behavior, such as database settings, passwords, and performance tuning, without having to introduce a new layer on top of your image. Instead, they can simply define environment variable values when defining a pod and change those settings without rebuilding the image.

For extremely complex scenarios, configuration can also be supplied using volumes that would be mounted into the container at runtime. However, if you elect to do it this way you must ensure that your image provides clear error messages on startup when the necessary volume or configuration is not present.

This topic is related to the Use Services for Inter-image Communication section in that configuration such as data sources should be defined in terms of environment variables that provide the service endpoint information. This allows an application to dynamically consume a data source service that is defined in the OpenShift Dedicated environment without modifying the application image.
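As an illustration, Kubernetes-based platforms inject NAME_SERVICE_HOST and NAME_SERVICE_PORT variables into a pod for each service that exists when the pod starts, where NAME is the service name in upper case. The sketch below assumes a service named database; that name, the database name appdb, and the postgresql:// URL scheme are hypothetical.

```shell
#!/bin/sh
# Build a data source URL from the variables injected for a service.
# The service name "database", the database name "appdb", and the URL
# scheme are assumptions made for this example.
datasource_url() {
    # Fail with a clear message if the service variables are missing.
    host="${DATABASE_SERVICE_HOST:?DATABASE_SERVICE_HOST is not set}"
    port="${DATABASE_SERVICE_PORT:-5432}"
    printf 'postgresql://%s:%s/%s\n' "$host" "$port" "${DATABASE_NAME:-appdb}"
}
```

Because the endpoint is read at startup, the same image can be pointed at a different database service without being rebuilt.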

In addition, tuning should be done by inspecting the cgroups settings for the container. This allows the image to tune itself to the available memory, CPU, and other resources. For example, Java-based images should tune their heap based on the cgroup maximum memory parameter to ensure they do not exceed the limits and get an out-of-memory error.
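A minimal sketch of this kind of self-tuning, assuming the start script reads the memory limit from a cgroup file and uses a fixed percentage of it for the heap. The function name and the 50 percent default are hypothetical choices, not a recommendation.

```shell
#!/bin/sh
# Print a heap size in MiB derived from a cgroup memory limit file.
# $1: path to the limit file; $2: percentage of the limit to use (default 50).
# Prints nothing when the file is unreadable or holds no numeric limit.
heap_mb_from_cgroup() {
    limit_file="$1"
    percent="${2:-50}"
    [ -r "$limit_file" ] || return 0
    limit=$(cat "$limit_file")
    # cgroup v2 reports the word "max" when no limit is set.
    case "$limit" in
        ''|*[!0-9]*) return 0 ;;
    esac
    echo $(( limit / 1024 / 1024 * percent / 100 ))
}
```

The limit is typically found in /sys/fs/cgroup/memory.max on cgroup v2 and in /sys/fs/cgroup/memory/memory.limit_in_bytes on cgroup v1. A Java start script could pass the result to the JVM as -Xmx followed by the value and m. Note that cgroup v1 reports a very large number when no limit is set, which this sketch does not cap.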

See the following references for more on how to manage cgroup quotas in containers:

Blog article - Resource management in Docker
Docker documentation - Runtime Metrics
Blog article - Memory inside Linux containers

Set Image Metadata

Defining image metadata helps OpenShift Dedicated better consume your container images, allowing OpenShift Dedicated to create a better experience for developers using your image. For example, you can add metadata to provide helpful descriptions of your image, or offer suggestions on other images that may also be needed.

See the Image Metadata topic for more information on supported metadata and how to define them.

Clustering

You must fully understand what it means to run multiple instances of your image. In the simplest case, the load balancing function of a service handles routing traffic to all instances of your image. However, many frameworks need instances to share information with each other, for example to perform leader election or to replicate session state for failover.

Consider how your instances accomplish this communication when running in OpenShift Dedicated. Although pods can communicate directly with each other, their IP addresses change any time a pod starts, stops, or is moved. Therefore, it is important for your clustering scheme to be dynamic.

Logging

It is best to send all logging to standard output. OpenShift Dedicated collects standard output from containers and sends it to the centralized logging service where it can be viewed. If you need to separate log content, prefix the output with an appropriate keyword, which makes it possible to filter the messages.
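The keyword prefix can be as simple as the following helper. The function name log_msg and the bracketed prefix format are assumptions; any consistent keyword works.

```shell
#!/bin/sh
# Write one log line to standard output with a keyword prefix.
# $1: keyword used for filtering; remaining arguments: the message.
log_msg() {
    level="$1"
    shift
    printf '%s %s\n' "[$level]" "$*"
}
```

For example, log_msg AUDIT "user login ok" writes a line beginning with [AUDIT], which can then be filtered on in the logging service.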

If your image logs to a file, users must use manual operations to enter the running container and retrieve or view the log file.

Liveness and Readiness Probes

Document example liveness and readiness probes that can be used with your image. These probes will allow users to deploy your image with confidence that traffic will not be routed to the container until it is prepared to handle it, and that the container will be restarted if the process gets into an unhealthy state.
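One way to document a probe is to ship a small check command with the image that an exec-style probe can run; a probe command that exits with status 0 is treated as a success. The sketch below assumes, hypothetically, that the application creates a marker file at /tmp/app-ready once it has finished starting.

```shell
#!/bin/sh
# Hypothetical readiness check: succeed only when the marker file that
# the application is assumed to create after startup exists.
# $1: optional path to the marker file (default /tmp/app-ready).
is_ready() {
    marker="${1:-/tmp/app-ready}"
    [ -f "$marker" ]
}
```

A real check would test whatever best reflects the health of your process, such as a successful request to a status endpoint.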

Templates

Consider providing an example template with your image. A template will give users an easy way to quickly get your image deployed with a working configuration. Your template should include the liveness and readiness probes you documented with the image, for completeness.