Backup and Restore

Overview

In OKD, you can back up (saving state to separate storage) and restore (recreating state from separate storage) at the cluster level. There is also some preliminary support for per-project backup. The full state of a cluster installation includes:

  • etcd data (v2 and v3) on each master

  • API objects

  • registry storage

  • volume storage

This topic does not cover how to back up and restore persistent storage, as those topics are left to the underlying storage provider. However, an example of how to perform a generic backup of application data is provided.

This topic only provides a generic way of backing up applications and the OKD cluster. It cannot take into account custom requirements. Therefore, you should create your own full backup and restore procedure tailored to your environment, and take the necessary precautions to prevent data loss.

Backup and restore procedures are not fully supported in OKD 3.6 due to dependencies on cluster state.

Note that the etcd backup still has all the references to the storage volumes. When you restore etcd, OKD starts launching the previous pods on nodes and reattaching the same storage. This is no different from removing a node from the cluster and adding a new one back in its place: anything attached to that node is reattached to the pods on whatever nodes they are rescheduled to.

Backup and restore is not guaranteed. You are responsible for backing up your own data.

Prerequisites

  1. Because the restore procedure involves a complete reinstallation, save all the files used in the initial installation, such as the installer configuration file, the Ansible playbooks and inventory files, and any custom yum repository files (see the example after this list).

  2. Back up the procedures for post-installation steps. Some installations may involve steps that are not included in the installer, such as changes to services outside the control of OKD or the installation of extra services like monitoring agents. Additional configuration that is not yet supported by the advanced installer might also be affected, for example when using multiple authentication providers.

  3. Install packages that provide various utility commands:

    # yum install etcd
  4. If using a container-based installation, pull the etcd image instead:

    # docker pull rhel7/etcd
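
A minimal sketch of saving the installation files from step 1, assuming an advanced (Ansible-based) installation with the inventory at /etc/ansible/hosts; adjust the paths to match your environment:

    $ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
    $ sudo mkdir -p ${MYBACKUPDIR}
    $ sudo cp -a /etc/ansible/hosts ${MYBACKUPDIR}/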

Note the location of the etcd data directory (or $ETCD_DATA_DIR in the following sections), which depends on how etcd is deployed.

Deployment Type | Description | Data Directory
separate etcd | etcd runs as a separate service, either co-located on master nodes or on separate nodes. | /var/lib/etcd
embedded etcd | etcd runs as part of the master service. | /var/lib/origin/openshift.local.etcd
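
For example, a minimal sketch of setting $ETCD_DATA_DIR for use with the commands in the following sections, assuming a separate etcd deployment (use /var/lib/origin/openshift.local.etcd for embedded etcd):

    # export ETCD_DATA_DIR=/var/lib/etcd
    # ls $ETCD_DATA_DIR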

Cluster Backup

Master Backup

You must perform the following step on each master node.

  1. Create a backup of the master host configuration files:

    $ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
    $ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
    $ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
    $ sudo cp -aR /etc/sysconfig/atomic-* ${MYBACKUPDIR}/etc/sysconfig/

Etcd Backup

  1. If etcd is running on more than one host, stop it on each host:

    # sudo systemctl stop etcd

    Although this step is not strictly necessary, doing so ensures that the etcd data is fully synchronized.

  2. Capture etcd v2 data by creating a backup:

    # etcdctl backup \
        --data-dir $ETCD_DATA_DIR \
        --backup-dir $ETCD_DATA_DIR.bak

    If etcd is running on more than one host, the various instances regularly synchronize their data, so creating a backup for one of them is sufficient.

    For a container-based installation, you must use docker exec to run etcdctl inside the container (see the example after this procedure).

  3. Capture etcd v3 data by copying the db file over to the backup you created:

    # cp "$ETCD_DATA_DIR"/member/snap/db "$ETCD_DATA_DIR.bak"/member/snap/db
  4. Back up the /etc/etcd/ca/ directory, which contains the etcd certificates, on the master host:

    $ sudo cp -aR /etc/etcd/ca/ ${MYBACKUPDIR}/etc
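
For reference, a minimal sketch of the container-based backup mentioned in step 2, assuming the etcd container is named etcd (as in the etcd_container service unit shown later in this topic) and that the backup directory is placed under the mounted /var/lib/etcd path so that it persists on the host:

    # docker exec etcd etcdctl backup \
        --data-dir /var/lib/etcd \
        --backup-dir /var/lib/etcd/backup.etcd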

Registry Certificates Backup

  1. Save all the registry certificates on every master and node host:

    # cd /etc/docker/certs.d/
    # tar cf /tmp/docker-registry-certs-$(hostname).tar *

    When working with one or more external secured registries, any host that needs to pull or push images must trust the registry certificates in order to run pods.
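
    A minimal sketch of restoring these certificates on a rebuilt host from the archive created above (adjust the archive path if you stored it elsewhere):

    # cd /etc/docker/certs.d/
    # tar xf /tmp/docker-registry-certs-$(hostname).tar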

Cluster Restore for Single-member etcd Clusters

To restore the cluster:

  1. Reinstall OKD.

    This should be done in the same way that OKD was previously installed.

  2. Run all necessary post-installation steps.

  3. Restore the certificates and keys, on each master:

    # cd /etc/origin/master
    # tar xvf /tmp/certs-and-keys-$(hostname).tar
  4. Restore from the etcd backup:

    # mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig
    # cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR
    # chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR
    # chown -R etcd:etcd $ETCD_DATA_DIR
  5. Create the new single node cluster using etcd’s --force-new-cluster option. You can do this using the values from /etc/etcd/etcd.conf, or you can temporarily modify the systemd unit file and start the service normally.

    To do so, edit the /usr/lib/systemd/system/etcd.service file, and add --force-new-cluster:

    # sed -i '/ExecStart/s/"$/  --force-new-cluster"/' /usr/lib/systemd/system/etcd.service
    # systemctl show etcd.service --property ExecStart --no-pager
    
    ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --force-new-cluster"

    Then, restart the etcd service:

    # systemctl daemon-reload
    # systemctl start etcd
  6. Verify the etcd service started correctly, then re-edit the /usr/lib/systemd/system/etcd.service file and remove the --force-new-cluster option:

    # sed -i '/ExecStart/s/ --force-new-cluster//' /usr/lib/systemd/system/etcd.service
    # systemctl show etcd.service --property ExecStart --no-pager
    
    ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
  7. Restart the etcd service, then verify the etcd cluster is running correctly and displays OKD’s configuration:

    # systemctl daemon-reload
    # systemctl restart etcd

Cluster Restore for Multiple-member etcd Clusters

When using a separate etcd cluster, you must first restore the etcd backup by creating a new, single node etcd cluster. If you run etcd as a stand-alone service on your master nodes, you can create the single node etcd cluster on a master node. If you use separate etcd with multiple members, you must then also add any additional etcd members to the etcd cluster one by one.

However, the details of the restoration process differ between embedded and external etcd. See the relevant section below and follow the steps before Bringing OKD services Back Online.

Embedded etcd

Restore your etcd backup and configuration:

  1. Run the following on the master with the embedded etcd:

    # ETCD_DIR=/var/lib/origin/openshift.local.etcd
    # mv $ETCD_DIR /var/lib/etcd.orig
    # cp -Rp /var/lib/origin/etcd-backup-<timestamp>/ $ETCD_DIR
    # chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
    # chown -R etcd:etcd $ETCD_DIR

    The $ETCD_DIR location differs between external and embedded etcd.

  2. Create the new, single node etcd cluster:

    # etcd -data-dir=/var/lib/origin/openshift.local.etcd \
        -force-new-cluster

    Verify etcd has started successfully by checking the output from the above command, which should look similar to the following near the end:

    [...]
    2016-06-24 12:14:45.644073 I | etcdserver: starting server... [version: 2.2.5, cluster version: 2.2]
    [...]
    2016-06-24 12:14:46.834394 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379 http://localhost:4001]} to cluster 5580663a6e0002
  3. Shut down the process by running the following from a separate terminal:

    # pkill etcd
  4. Continue to Bringing OKD services Back Online.

Separate etcd

Choose a system to be the initial etcd member, and restore its etcd backup and configuration:

  1. Run the following on the etcd host:

    # ETCD_DIR=/var/lib/etcd/
    # mv $ETCD_DIR /var/lib/etcd.orig
    # cp -Rp /var/lib/origin/etcd-backup-<timestamp>/ $ETCD_DIR
    # chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
    # chown -R etcd:etcd $ETCD_DIR

    The $ETCD_DIR location differs between external and embedded etcd.

  2. Restore your /etc/etcd/etcd.conf file from backup or from the .rpmsave copy (see the example after this procedure).

  3. Depending on your environment, follow the instructions for Containerized etcd Deployments or Non-Containerized etcd Deployments.
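
A minimal sketch of restoring the configuration in step 2, assuming the package manager preserved the previous file as /etc/etcd/etcd.conf.rpmsave (use your own backup copy if you have one):

    # cp /etc/etcd/etcd.conf.rpmsave /etc/etcd/etcd.conf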

Containerized etcd Deployments

  1. Create the new single node cluster using etcd’s --force-new-cluster option. You can do this with a long, complex command using the values from /etc/etcd/etcd.conf, or you can temporarily modify the systemd unit file and start the service normally.

    To do so, edit the /etc/systemd/system/etcd_container.service file, and add --force-new-cluster:

    # sed -i '/ExecStart=/s/$/  --force-new-cluster/' /etc/systemd/system/etcd_container.service
    
    ExecStart=/usr/bin/docker run --name etcd --rm -v \
    /var/lib/etcd:/var/lib/etcd:z -v /etc/etcd:/etc/etcd:ro --env-file=/etc/etcd/etcd.conf \
    --net=host --entrypoint=/usr/bin/etcd rhel7/etcd:3.1.9  --force-new-cluster

    Then, restart the etcd service:

    # systemctl daemon-reload
    # systemctl start etcd_container
  2. Verify the etcd service started correctly, then re-edit the /etc/systemd/system/etcd_container.service file and remove the --force-new-cluster option:

    # sed  -i '/ExecStart=/s/ --force-new-cluster//' /etc/systemd/system/etcd_container.service
    
    ExecStart=/usr/bin/docker run --name etcd --rm -v /var/lib/etcd:/var/lib/etcd:z -v \
    /etc/etcd:/etc/etcd:ro --env-file=/etc/etcd/etcd.conf --net=host \
    --entrypoint=/usr/bin/etcd rhel7/etcd:3.1.9
  3. Restart the etcd service, then verify the etcd cluster is running correctly and displays OKD’s configuration:

    # systemctl daemon-reload
    # systemctl restart etcd_container
    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \ (1)
        ls /
    1 Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
  4. If you have additional etcd members to add to your cluster, continue to Adding Additional etcd Members. Otherwise, if you only want a single node standalone etcd, continue to Bringing OKD services Back Online.

Non-Containerized etcd Deployments

  1. Create the new single node cluster using etcd’s --force-new-cluster option. You can do this with a long, complex command using the values from /etc/etcd/etcd.conf, or you can temporarily modify the systemd unit file and start the service normally.

    To do so, edit the /usr/lib/systemd/system/etcd.service file, and add --force-new-cluster:

    # sed -i '/ExecStart/s/"$/  --force-new-cluster"/' /usr/lib/systemd/system/etcd.service
    # systemctl show etcd.service --property ExecStart --no-pager
    
    ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --force-new-cluster"

    Then restart the etcd service:

    # systemctl daemon-reload
    # systemctl start etcd
  2. Verify the etcd service started correctly, then re-edit the /usr/lib/systemd/system/etcd.service file and remove the --force-new-cluster option:

    # sed -i '/ExecStart/s/ --force-new-cluster//' /usr/lib/systemd/system/etcd.service
    # systemctl show etcd.service --property ExecStart --no-pager
    
    ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
  3. Restart the etcd service, then verify the etcd cluster is running correctly and displays OKD’s configuration:

    # systemctl daemon-reload
    # systemctl restart etcd
    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \ (1)
        ls /
    1 Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
  4. If you have additional etcd members to add to your cluster, continue to Adding Additional etcd Members. Otherwise, if you only want a single node separate etcd cluster, continue to Bringing OKD services Back Online.

Adding Additional etcd Members

To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member:

  1. Get the member ID for the first member using the member list command:

    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \ (1)
        member list
    1 Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
  2. Obtain the IP address where etcd listens for cluster peers:

    $ ss -l4n | grep 2380
  3. Update the value of peerURLs using the etcdctl member update command by passing the member ID and IP address obtained from the previous steps:

    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
        member update 511b7fb6cc0001 https://172.18.1.18:2380

    Alternatively, you can use curl:

    # curl --cacert /etc/etcd/ca.crt \
        --cert /etc/etcd/peer.crt \
        --key /etc/etcd/peer.key \
        https://172.18.1.18:2379/v2/members/511b7fb6cc0001 \
        -XPUT -H "Content-Type: application/json" \
        -d '{"peerURLs":["https://172.18.1.18:2380"]}'
  4. Re-run the member list command and ensure the peer URLs no longer include localhost.

  5. Now, add each additional member to the cluster one at a time.

    Each member must be fully added and brought online one at a time. When adding each additional member to the cluster, the peerURLs list must be correct for that point in time, so it will grow by one for each member added. The etcdctl member add command will output the values that need to be set in the etcd.conf file as you add each member, as described in the following instructions.

    1. For each member, add it to the cluster using the values that can be found in that system’s etcd.conf file:

      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
          member add 10.3.9.222 https://172.16.4.27:2380 (1)
      
      Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster
      
      ETCD_NAME="10.3.9.222"
      ETCD_INITIAL_CLUSTER="10.3.9.221=https://172.16.4.18:2380,10.3.9.222=https://172.16.4.27:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"
      1 In this line, 10.3.9.222 is a label for the etcd member. You can specify the host name, IP address, or a simple name.
    2. Using the environment variables provided in the output of the above etcdctl member add command, edit the /etc/etcd/etcd.conf file on the member system itself and ensure these settings match (see the configuration sketch after this procedure). If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.

    3. Now start etcd on the new member:

      # rm -rf /var/lib/etcd/member
      # systemctl enable etcd
      # systemctl start etcd
    4. Ensure the service starts correctly and the etcd cluster is now healthy:

      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
          member list
      
      51251b34b80001: name=10.3.9.221 peerURLs=https://172.16.4.18:2380 clientURLs=https://172.16.4.18:2379
      d266df286a41a8a4: name=10.3.9.222 peerURLs=https://172.16.4.27:2380 clientURLs=https://172.16.4.27:2379
      
      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
          cluster-health
      
      cluster is healthy
      member 51251b34b80001 is healthy
      member d266df286a41a8a4 is healthy
    5. Now repeat this process for the next member to add to the cluster.

  6. After all additional etcd members have been added, continue to Bringing OKD services Back Online.
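
As a reference for editing /etc/etcd/etcd.conf on a new member (the configuration sketch referenced above), the following is a minimal sketch that reuses the example names and IP addresses from the etcdctl member add output. Your values will differ, and the listen and advertise URLs shown here are assumptions based on the standard etcd configuration keys referenced later in this topic:

    ETCD_NAME="10.3.9.222"
    ETCD_LISTEN_PEER_URLS="https://172.16.4.27:2380"
    ETCD_LISTEN_CLIENT_URLS="https://172.16.4.27:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.4.27:2380"
    ETCD_ADVERTISE_CLIENT_URLS="https://172.16.4.27:2379"
    ETCD_INITIAL_CLUSTER="10.3.9.221=https://172.16.4.18:2380,10.3.9.222=https://172.16.4.27:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"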

Adding New etcd Hosts

In cases where etcd members have failed and you still have a quorum of etcd cluster members running, you can use the surviving members to add additional etcd members without downtime.

Suggested Cluster Size

An odd number of etcd hosts provides fault tolerance. Having an odd number of etcd hosts does not change the number needed for quorum, but increases the tolerance for failure. For example, with a cluster of three members, quorum is two, which leaves a failure tolerance of one. This ensures the cluster continues to operate as long as two of the members are healthy.

An in-production cluster of three etcd hosts is recommended.
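
For reference, the standard etcd quorum arithmetic (quorum = floor(n/2) + 1):

Cluster Size | Quorum | Fault Tolerance
1 | 1 | 0
2 | 2 | 0
3 | 2 | 1
4 | 3 | 1
5 | 3 | 2

An even-sized cluster raises the quorum requirement without improving fault tolerance over the next smaller odd size.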

The following presumes you have a backup of the /etc/etcd configuration for the etcd hosts.

  1. If the new etcd members will also be OKD nodes, first add the desired number of hosts to the cluster. The rest of this procedure presumes you have added just one host, but if adding multiple, perform all steps on each host.

  2. Upgrade etcd and iptables on the surviving nodes:

    # yum update etcd iptables-services

    Ensure version etcd-2.3.7-4.el7.x86_64 or greater is installed, and that the same version is installed on each host.

  3. Install etcd and iptables on the new host:

    # yum install etcd iptables-services

    Ensure version etcd-2.3.7-4.el7.x86_64 or greater is installed, and that the same version is installed on the new host.

  4. Back up the etcd data store on surviving hosts before making any cluster configuration changes.

  5. If replacing a failed etcd member, check the cluster health to identify the failed member, then remove it before adding the new member:

    # etcdctl -C https://<surviving host IP>:2379 \
      --ca-file=/etc/etcd/ca.crt     \
      --cert-file=/etc/etcd/peer.crt     \
      --key-file=/etc/etcd/peer.key cluster-health
    
    # etcdctl -C https://<surviving host IP>:2379 \
      --ca-file=/etc/etcd/ca.crt     \
      --cert-file=/etc/etcd/peer.crt     \
      --key-file=/etc/etcd/peer.key member remove <failed member identifier>

    Stop the etcd service on the failed etcd member:

    # systemctl stop etcd
  6. On the new host, add the appropriate iptables rules:

    # systemctl enable iptables.service --now
    # iptables -N OS_FIREWALL_ALLOW
    # iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
    # iptables -A OS_FIREWALL_ALLOW -p tcp -m state \
      --state NEW -m tcp --dport 2379 -j ACCEPT
    # iptables -A OS_FIREWALL_ALLOW -p tcp -m state \
      --state NEW -m tcp --dport 2380 -j ACCEPT
    # iptables-save > /etc/sysconfig/iptables
  7. Generate the required certificates for the new host. On a surviving etcd host:

    1. Make a backup of the /etc/etcd/ca/ directory. Ensure that the directory contains the etcd certificates.

    2. Set the variables and working directory for the certificates. The PREFIX directory is created in the next step if it does not already exist:

      # cd /etc/etcd
      # export NEW_ETCD="<NEW_HOST_NAME>"
      
      # export CN=$NEW_ETCD
      # export SAN="IP:<NEW_HOST_IP>"
      # export PREFIX="./generated_certs/etcd-$CN/"
    3. Create the $PREFIX directory:

      $ mkdir -p $PREFIX
    4. Create the server.csr and server.crt certificates:

      # openssl req -new -keyout ${PREFIX}server.key \
        -config ca/openssl.cnf \
        -out ${PREFIX}server.csr \
        -reqexts etcd_v3_req -batch -nodes \
        -subj /CN=$CN
      
      # openssl ca -name etcd_ca -config ca/openssl.cnf \
        -out ${PREFIX}server.crt \
        -in ${PREFIX}server.csr \
        -extensions etcd_v3_ca_server -batch
    5. Create the peer.csr and peer.crt certificates:

      # openssl req -new -keyout ${PREFIX}peer.key \
        -config ca/openssl.cnf \
        -out ${PREFIX}peer.csr \
        -reqexts etcd_v3_req -batch -nodes \
        -subj /CN=$CN
      
      # openssl ca -name etcd_ca -config ca/openssl.cnf \
        -out ${PREFIX}peer.crt \
        -in ${PREFIX}peer.csr \
        -extensions etcd_v3_ca_peer -batch
    6. Copy the etcd.conf and ca.crt files, and archive the contents of the directory:

      # cp etcd.conf ${PREFIX}
      # cp ca.crt ${PREFIX}
      # tar -czvf ${PREFIX}${CN}.tgz -C ${PREFIX} .
    7. Transfer the files to the new etcd hosts:

      # scp ${PREFIX}${CN}.tgz  $CN:/etc/etcd/
  8. While still on the surviving etcd host, add the new host to the cluster:

    1. Add the new host to the cluster:

      # export ETCD_CA_HOST="<SURVIVING_ETCD_HOSTNAME>"
      # export NEW_ETCD="<NEW_ETCD_HOSTNAME>"
      # export NEW_ETCD_IP="<NEW_HOST_IP>"
      
      # etcdctl -C https://${ETCD_CA_HOST}:2379 \
        --ca-file=/etc/etcd/ca.crt     \
        --cert-file=/etc/etcd/peer.crt     \
        --key-file=/etc/etcd/peer.key member add ${NEW_ETCD} https://${NEW_ETCD_IP}:2380
      
      ETCD_NAME="<NEW_ETCD_HOSTNAME>"
      ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<SURVIVING_ETCD_HOST>=https://<SURVIVING_HOST_IP>:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"

      Copy the three environment variables in the etcdctl member add output. They will be used later.

    2. On the new host, extract the copied configuration data and set the permissions:

      # tar -xf /etc/etcd/<NEW_ETCD_HOSTNAME>.tgz -C /etc/etcd/ --overwrite
      # chown -R etcd:etcd /etc/etcd/*
    3. On the new host, remove any etcd data:

      # rm -rf /var/lib/etcd/member
      # chown -R etcd:etcd /var/lib/etcd
  9. On the new etcd host, update the etcd.conf file:

    1. Replace the following with the values generated in the previous step:

      • ETCD_NAME

      • ETCD_INITIAL_CLUSTER

      • ETCD_INITIAL_CLUSTER_STATE

    2. Replace the IP address with the value of NEW_ETCD_IP in the following:

      • ETCD_LISTEN_PEER_URLS

      • ETCD_LISTEN_CLIENT_URLS

      • ETCD_INITIAL_ADVERTISE_PEER_URLS

      • ETCD_ADVERTISE_CLIENT_URLS

    3. For replacing failed members, replace the failed hosts with the new hosts.

  10. To ensure the etcd configuration does not use the failed host when the etcd service is restarted, modify the etcd.conf file on all remaining etcd hosts and remove the failed host in the value for the ETCD_INITIAL_CLUSTER variable.

  11. On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.

  12. Start etcd on the new host:

    # systemctl enable etcd --now
  13. To verify that the new member has been added successfully:

    # etcdctl -C https://${ETCD_CA_HOST}:2379 --ca-file=/etc/etcd/ca.crt \
      --cert-file=/etc/etcd/peer.crt     \
      --key-file=/etc/etcd/peer.key cluster-health
  14. Update the master configuration on all masters to point to the new etcd host (see the etcdClientInfo sketch after this procedure):

    1. On every master in the cluster, edit /etc/origin/master/master-config.yaml.

    2. Find the etcdClientInfo section.

    3. Add the new etcd host to the urls list.

    4. If a failed etcd host was replaced, remove it from the list.

    5. Restart the master API service.

      On each master:

      # systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers
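
A minimal sketch of the etcdClientInfo section after the edit, using hypothetical host names; keep the certificate file names already present in your master-config.yaml, as they vary by installation:

    etcdClientInfo:
      ca: master.etcd-ca.crt
      certFile: master.etcd-client.crt
      keyFile: master.etcd-client.key
      urls:
        - https://master-0.example.com:2379
        - https://etcd-new.example.com:2379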

The procedure to add an etcd member is complete.

Bringing OKD services Back Online

On each OKD master, restore your master and node configuration from backup and enable and restart all relevant services.

On the master in a single master cluster:

# cp ${MYBACKUPDIR}/etc/sysconfig/atomic-openshift-master /etc/sysconfig/atomic-openshift-master
# cp ${MYBACKUPDIR}/etc/origin/master/master-config.yaml.<timestamp> /etc/origin/master/master-config.yaml
# cp ${MYBACKUPDIR}/etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-master
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-master
# systemctl start atomic-openshift-node

On each master in a multi-master cluster:

# cp ${MYBACKUPDIR}/etc/sysconfig/atomic-openshift-master-api /etc/sysconfig/atomic-openshift-master-api
# cp ${MYBACKUPDIR}/etc/sysconfig/atomic-openshift-master-controllers /etc/sysconfig/atomic-openshift-master-controllers
# cp ${MYBACKUPDIR}/etc/origin/master/master-config.yaml.<timestamp> /etc/origin/master/master-config.yaml
# cp ${MYBACKUPDIR}/etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-master-api
# systemctl enable atomic-openshift-master-controllers
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-master-api
# systemctl start atomic-openshift-master-controllers
# systemctl start atomic-openshift-node

On each OKD node, restore your node-config.yaml file from backup and enable and restart the atomic-openshift-node service:

# cp /etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-node

Your OKD cluster should now be back online.
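
As a quick sanity check, a minimal sketch of verifying the cluster from a master host (assuming cluster-admin credentials are available there):

# oc get nodes
# oc get pods --all-namespaces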

Project Backup

A future release of OKD will feature specific support for per-project back up and restore.

For now, to back up API objects at the project level, use oc export for each object to be saved. For example, to save the deployment configuration frontend in YAML format:

$ oc export dc frontend -o yaml > dc-frontend.yaml

To back up all of the project (with the exception of cluster objects like namespaces and projects):

$ oc export all -o yaml > project.yaml

Role Bindings

Sometimes custom policy role bindings are used in a project. For example, a project administrator can give another user a certain role in the project and grant that user project access.

These role bindings can be exported:

$ oc get rolebindings -o yaml --export=true > rolebindings.yaml

Service Accounts

If custom service accounts are created in a project, these need to be exported:

$ oc get serviceaccount -o yaml --export=true > serviceaccount.yaml

Secrets

Custom secrets, such as source control management secrets (SSH public keys, user names and passwords), should be exported if they are used:

$ oc get secret -o yaml --export=true > secret.yaml

Persistent Volume Claims

If the application within a project uses a persistent volume through a persistent volume claim (PVC), these should be backed up:

$ oc get pvc -o yaml --export=true > pvc.yaml

Project Restore

To restore a project, recreate the project and recreate all of the objects that were exported during the backup:

$ oc new-project myproject
$ oc create -f project.yaml
$ oc create -f secret.yaml
$ oc create -f serviceaccount.yaml
$ oc create -f pvc.yaml
$ oc create -f rolebindings.yaml

Some resources can fail to be created (for example, pods and default service accounts).

Application Data Backup

In many cases, application data can be backed up using the oc rsync command, assuming rsync is installed within the container image. The Red Hat rhel7 base image does contain rsync. Therefore, all images that are based on rhel7 contain it as well. See Troubleshooting and Debugging CLI Operations - rsync.

This is a generic backup of application data and does not take into account application-specific backup procedures, for example, special export/import procedures for database systems.

Other means of backup may exist depending on the type of the persistent volume (for example, Cinder, NFS, Gluster, or others).

The paths to back up are also application specific. You can determine what path to back up by looking at the mountPath for volumes in the deploymentconfig.

Example of Backing up a Jenkins Deployment’s Application Data
  1. Get the application data mountPath from the deploymentconfig:

    $ oc get dc/jenkins -o jsonpath='{ .spec.template.spec.containers[?(@.name=="jenkins")].volumeMounts[?(@.name=="jenkins-data")].mountPath }'
    /var/lib/jenkins
  2. Get the name of the pod that is currently running:

    $ oc get pod --selector=deploymentconfig=jenkins -o jsonpath='{ .metadata.name }'
    jenkins-1-37nux
  3. Use the oc rsync command to copy application data:

    $ oc rsync jenkins-1-37nux:/var/lib/jenkins /tmp/

This type of application data backup can only be performed while an application pod is currently running.

Application Data Restore

The process for restoring application data is similar to the application backup procedure using the oc rsync tool. The same restrictions apply and the process of restoring application data requires a persistent volume.

Example of Restoring a Jenkins Deployment’s Application Data
  1. Verify the backup:

    $ ls -la /tmp/jenkins-backup/
    total 8
    drwxrwxr-x.  3 user     user   20 Sep  6 11:14 .
    drwxrwxrwt. 17 root     root 4096 Sep  6 11:16 ..
    drwxrwsrwx. 12 user     user 4096 Sep  6 11:14 jenkins
  2. Use the oc rsync tool to copy the data into the running pod:

    $ oc rsync /tmp/jenkins-backup/jenkins jenkins-1-37nux:/var/lib

    Depending on the application, you may be required to restart the application.

  3. Restart the application with new data (optional):

    $ oc delete pod jenkins-1-37nux

    Alternatively, you can scale down the deployment to 0, and then up again:

    $ oc scale --replicas=0 dc/jenkins
    $ oc scale --replicas=1 dc/jenkins