OADP 1.4 release notes - OADP Application backup and restore | Backup and restore

OADP 1.4.1 release notes
OADP 1.4.0 release notes

The release notes for OpenShift API for Data Protection (OADP) describe new features and enhancements, deprecated features, product recommendations, known issues, and resolved issues.

OADP 1.4.1 release notes

The OpenShift API for Data Protection (OADP) 1.4.1 release notes lists new features, resolved issues and bugs, and known issues.

New features

New DPA fields to update client qps and burst

You can now change Velero Server Kubernetes API queries per second and burst values by using the new Data Protection Application (DPA) fields. The new DPA fields are spec.configuration.velero.client-qps and spec.configuration.velero.client-burst, which both default to 100. OADP-4076

Enabling non-default algorithms with Kopia

With this update, you can now configure the hash, encryption, and splitter algorithms in Kopia to select non-default options to optimize performance for different backup workloads.

To configure these algorithms, set the env variable of a velero pod in the podConfig section of the DataProtectionApplication (DPA) configuration. If this variable is not set, or an unsupported algorithm is chosen, Kopia will default to its standard algorithms. OADP-4640

Resolved issues

Restoring a backup without pods is now successful

Previously, restoring a backup without pods and having StorageClass VolumeBindingMode set as WaitForFirstConsumer, resulted in the PartiallyFailed status with an error: fail to patch dynamic PV, err: context deadline exceeded. With this update, patching dynamic PV is skipped and restoring a backup is successful without any PartiallyFailed status. OADP-4231

PodVolumeBackup CR now displays correct message

Previously, the PodVolumeBackup custom resource (CR) generated an incorrect message, which was: get a podvolumebackup with status "InProgress" during the server starting, mark it as "Failed". With this update, the message produced is now:

found a podvolumebackup with status "InProgress" during the server starting,
mark it as "Failed".

OADP-4224

Overriding imagePullPolicy is now possible with DPA

Previously, OADP set the imagePullPolicy parameter to Always for all images. With this update, OADP checks if each image contains sha256 or sha512 digest, then it sets imagePullPolicy to IfNotPresent; otherwise imagePullPolicy is set to Always. You can now override this policy by using the new spec.containerImagePullPolicy DPA field. OADP-4172

OADP Velero can now retry updating the restore status if initial update fails

Previously, OADP Velero failed to update the restored CR status. This left the status at InProgress indefinitely. Components which relied on the backup and restore CR status to determine the completion would fail. With this update, the restore CR status for a restore correctly proceeds to the Completed or Failed status. OADP-3227

Restoring BuildConfig Build from a different cluster is successful without any errors

Previously, when performing a restore of the BuildConfig Build resource from a different cluster, the application generated an error on TLS verification to the internal image registry. The resulting error was failed to verify certificate: x509: certificate signed by unknown authority error. With this update, the restore of the BuildConfig build resources to a different cluster can proceed successfully without generating the failed to verify certificate error. OADP-4692

Restoring an empty PVC is successful

Previously, downloading data failed while restoring an empty persistent volume claim (PVC). It failed with the following error:

data path restore failed: Failed to run kopia restore: Unable to load
    snapshot : snapshot not found

With this update, the downloading of data proceeds to correct conclusion when restoring an empty PVC and the error message is not generated. OADP-3106

There is no Velero memory leak in CSI and DataMover plugins

Previously, a Velero memory leak was caused by using the CSI and DataMover plugins. When the backup ended, the Velero plugin instance was not deleted and the memory leak consumed memory until an Out of Memory (OOM) condition was generated in the Velero pod. With this update, there is no resulting Velero memory leak when using the CSI and DataMover plugins. OADP-4448

Post-hook operation does not start before the related PVs are released

Previously, due to the asynchronous nature of the Data Mover operation, a post-hook might be attempted before the Data Mover persistent volume claim (PVC) releases the persistent volumes (PVs) of the related pods. This problem would cause the backup to fail with a PartiallyFailed status. With this update, the post-hook operation is not started until the related PVs are released by the Data Mover PVC, eliminating the PartiallyFailed backup status. OADP-3140

Deploying a DPA works as expected in namespaces with more than 37 characters

When you install the OADP Operator in a namespace with more than 37 characters to create a new DPA, labeling the "cloud-credentials" Secret fails and the DPA reports the following error:

The generated label name is too long.

With this update, creating a DPA does not fail in namespaces with more than 37 characters in the name. OADP-3960

Restore is successfully completed by overriding the timeout error

Previously, in a large scale environment, the restore operation would result in a Partiallyfailed status with the error: fail to patch dynamic PV, err: context deadline exceeded. With this update, the resourceTimeout Velero server argument is used to override this timeout error resulting in a successful restore. OADP-4344

For a complete list of all issues resolved in this release, see the list of OADP 1.4.1 resolved issues in Jira.

Known issues

Cassandra application pods enter into the CrashLoopBackoff status after restoring OADP

After OADP restores, the Cassandra application pods might enter CrashLoopBackoff status. To work around this problem, delete the StatefulSet pods that are returning the error CrashLoopBackoff state after restoring OADP. The StatefulSet controller then recreates these pods and it runs normally. OADP-4407

deployment referencing ImageStream is not restored properly leading to corrupted pod and volume contents

During a File System Backup (FSB) restore operation, a deployment resource referencing an ImageStream is not restored properly. The restored pod that runs the FSB, and the postHook is terminated prematurely.

During the restore operation, the OpenShift Container Platform controller updates the spec.template.spec.containers[0].image field in the deployment resource with an updated ImageStreamTag hash. The update triggers the rollout of a new pod, terminating the pod on which velero runs the FSB along with the post-hook. For more information about image stream trigger, see Triggering updates on image stream changes.

The workaround for this behavior is a two-step restore process:

Perform a restore excluding the deployment resources, for example:

$ velero restore create <RESTORE_NAME> \
  --from-backup <BACKUP_NAME> \
  --exclude-resources=deployment.apps

Once the first restore is successful, perform a second restore by including these resources, for example:

$ velero restore create <RESTORE_NAME> \
  --from-backup <BACKUP_NAME> \
  --include-resources=deployment.apps

OADP-3954

OADP 1.4.0 release notes

The OpenShift API for Data Protection (OADP) 1.4.0 release notes lists resolved issues and known issues.

Resolved issues

Restore works correctly in OKD 4.16

Previously, while restoring the deleted application namespace, the restore operation partially failed with the resource name may not be empty error in OKD 4.16. With this update, restore works as expected in OKD 4.16. OADP-4075

Data Mover backups work properly in the OKD 4.16 cluster

Previously, Velero was using the earlier version of SDK where the Spec.SourceVolumeMode field did not exist. As a consequence, Data Mover backups failed in the OKD 4.16 cluster on the external snapshotter with version 4.2. With this update, external snapshotter is upgraded to version 7.0 and later. As a result, backups do not fail in the OKD 4.16 cluster. OADP-3922

For a complete list of all issues resolved in this release, see the list of OADP 1.4.0 resolved issues in Jira.

Known issues

Backup fails when checksumAlgorithm is not set for MCG

While performing a backup of any application with Noobaa as the backup location, if the checksumAlgorithm configuration parameter is not set, backup fails. To fix this problem, if you do not provide a value for checksumAlgorithm in the Backup Storage Location (BSL) configuration, an empty value is added. The empty value is only added for BSLs that are created using Data Protection Application (DPA) custom resource (CR), and this value is not added if BSLs are created using any other method. OADP-4274

For a complete list of all known issues in this release, see the list of OADP 1.4.0 known issues in Jira.

Upgrade notes

Always upgrade to the next minor version. Do not skip versions. To update to a later version, upgrade only one channel at a time. For example, to upgrade from OpenShift API for Data Protection (OADP) 1.1 to 1.3, upgrade first to 1.2, and then to 1.3.

Changes from OADP 1.3 to 1.4

The Velero server has been updated from version 1.12 to 1.14. Note that there are no changes in the Data Protection Application (DPA).

This changes the following:

The velero-plugin-for-csi code is now available in the Velero code, which means an init container is no longer required for the plugin.
Velero changed client Burst and QPS defaults from 30 and 20 to 100 and 100, respectively.
The velero-plugin-for-aws plugin updated default value of the spec.config.checksumAlgorithm field in BackupStorageLocation objects (BSLs) from "" (no checksum calculation) to the CRC32 algorithm. For more information, see Velero plugins for AWS Backup Storage Location. The checksum algorithm types are known to work only with AWS. Several S3 providers require the md5sum to be disabled by setting the checksum algorithm to "". Confirm md5sum algorithm support and configuration with your storage provider.

In OADP 1.4, the default value for BSLs created within DPA for this configuration is "". This default value means that the md5sum is not checked, which is consistent with OADP 1.3. For BSLs created within DPA, update it by using the spec.backupLocations[].velero.config.checksumAlgorithm field in the DPA. If your BSLs are created outside DPA, you can update this configuration by using spec.config.checksumAlgorithm in the BSLs.

Backing up the DPA configuration

You must back up your current DataProtectionApplication (DPA) configuration.

Procedure

Save your current DPA configuration by running the following command:
Example command
```
$ oc get dpa -n openshift-adp -o yaml > dpa.orig.backup
```

Upgrading the OADP Operator

Use the following procedure when upgrading the OpenShift API for Data Protection (OADP) Operator.

Procedure

Change your subscription channel for the OADP Operator from stable-1.3 to stable-1.4.
Wait for the Operator and containers to update and restart.

Additional resources

Updating installed Operators

Converting DPA to the new version

To upgrade from OADP 1.3 to 1.4, no Data Protection Application (DPA) changes are required.

Verifying the upgrade

Use the following procedure to verify the upgrade.

Procedure

Verify the installation by viewing the OpenShift API for Data Protection (OADP) resources by running the following command:

$ oc get all -n openshift-adp

Example output

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/oadp-operator-controller-manager-67d9494d47-6l8z8    2/2     Running   0          2m8s
pod/restic-9cq4q                                         1/1     Running   0          94s
pod/restic-m4lts                                         1/1     Running   0          94s
pod/restic-pv4kr                                         1/1     Running   0          95s
pod/velero-588db7f655-n842v                              1/1     Running   0          95s

NAME                                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/oadp-operator-controller-manager-metrics-service   ClusterIP   172.30.70.140    <none>        8443/TCP   2m8s

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/restic   3         3         3       3            3           <none>          96s

NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oadp-operator-controller-manager    1/1     1            1           2m9s
deployment.apps/velero                              1/1     1            1           96s

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/oadp-operator-controller-manager-67d9494d47    1         1         1       2m9s
replicaset.apps/velero-588db7f655                              1         1         1       96s

Verify that the DataProtectionApplication (DPA) is reconciled by running the following command:

$ oc get dpa dpa-sample -n openshift-adp -o jsonpath='{.status}'

Example output

{"conditions":[{"lastTransitionTime":"2023-10-27T01:23:57Z","message":"Reconcile complete","reason":"Complete","status":"True","type":"Reconciled"}]}

Verify the type is set to Reconciled.

Verify the backup storage location and confirm that the PHASE is Available by running the following command:

$ oc get backupStorageLocation -n openshift-adp

Example output

NAME           PHASE       LAST VALIDATED   AGE     DEFAULT
dpa-sample-1   Available   1s               3d16h   true