About disaster recovery - Disaster recovery | Backup and restore

This is a cache of https://docs.openshift.com/container-platform/4.5/backup_and_restore/disaster_recovery/about-disaster-recovery.html. It is a snapshot of the page at 2024-11-26T00:51:46.803+0000.

The disaster recovery documentation explains how administrators can recover from several disaster situations that might occur with an OpenShift Container Platform cluster. As an administrator, you might need to follow one or more of the following procedures to return your cluster to a working state.

Disaster recovery requires you to have at least one healthy master host.

Restoring to a previous cluster state

This solution handles situations where you want to restore your cluster to a previous state, for example, if an administrator deletes something critical. This also includes situations where you have lost the majority of your master hosts, leading to etcd quorum loss and the cluster going offline. As long as you have taken an etcd backup, you can follow this procedure to restore your cluster to a previous state.

If applicable, you might also need to recover from expired control plane certificates.

If a majority of your master hosts are still available and etcd has quorum, follow the procedure to replace a single unhealthy etcd member instead.
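
The choice between these procedures comes down to whether etcd still has quorum. The following Python sketch illustrates that decision only; it is not part of any OpenShift tooling. It assumes that each master host runs exactly one etcd member and that quorum is a strict majority of members. The function names `etcd_quorum` and `choose_procedure` are hypothetical.

```python
def etcd_quorum(total_masters: int) -> int:
    """Return the smallest number of members that forms a strict majority."""
    return total_masters // 2 + 1


def choose_procedure(total_masters: int, healthy_masters: int) -> str:
    """Suggest a recovery procedure based on how many master hosts are healthy.

    Assumption: one etcd member per master host.
    """
    if not 0 <= healthy_masters <= total_masters:
        raise ValueError("healthy_masters must be between 0 and total_masters")
    if healthy_masters == 0:
        # Disaster recovery requires at least one healthy master host.
        return "not recoverable with these procedures"
    if healthy_masters == total_masters:
        return "no host recovery needed"
    if healthy_masters >= etcd_quorum(total_masters):
        # A majority is still available, so etcd has quorum.
        return "replace the unhealthy etcd member"
    # The majority is lost, so etcd has lost quorum.
    return "restore to a previous cluster state"


if __name__ == "__main__":
    for healthy in range(4):
        print(healthy, "of 3 healthy:", choose_procedure(3, healthy))
```

With three master hosts, quorum is two members, so losing one host leaves quorum intact while losing two does not. Restoring to a previous cluster state still requires an etcd backup taken beforehand.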

Recovering from expired control plane certificates

This solution handles situations where your control plane certificates have expired. For example, if you shut down your cluster before the first certificate rotation, which occurs 24 hours after installation, your certificates will not be rotated and will expire. You can follow this procedure to recover from expired control plane certificates.
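
As a rough illustration of the timing described above, the following Python sketch checks whether a cluster was shut down before the first certificate rotation. It assumes only the 24-hour figure stated in this section; the function name `missed_first_rotation` and the example timestamps are hypothetical, and the sketch does not inspect any real certificates.

```python
from datetime import datetime, timedelta

# The first certificate rotation occurs 24 hours after installation.
FIRST_ROTATION_AFTER = timedelta(hours=24)


def missed_first_rotation(installed_at: datetime, shut_down_at: datetime) -> bool:
    """Return True if the cluster was shut down before the first rotation."""
    return shut_down_at < installed_at + FIRST_ROTATION_AFTER


if __name__ == "__main__":
    installed = datetime(2020, 7, 1, 12, 0)  # hypothetical installation time
    print(missed_first_rotation(installed, installed + timedelta(hours=6)))
    print(missed_first_rotation(installed, installed + timedelta(hours=30)))
```

A cluster for which this check returns `True` was shut down before its certificates were rotated, so it is a candidate for the expired control plane certificate recovery procedure once those certificates expire.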