Configure OADP timeout parameters for Restic, Velero, Data Mover, CSI snapshots, and item operations to allow complex or resource-intensive processes to complete successfully. This helps you reduce errors, retries, and failures caused by premature termination of backup and restore operations.
Balance timeout extensions carefully, because excessively long timeouts can hide underlying issues in the process. Choose a timeout value that meets the needs of the process and the overall system performance, and monitor how that value behaves over time.
Review the following OADP timeout instructions:
Configure the Restic timeout parameter to prevent backup failures for large persistent volumes or long-running backup operations. This helps you avoid timeout errors when backing up more than 500 GB of data or when backups exceed the default one-hour limit.
Use the spec.configuration.nodeAgent.timeout parameter to set the Restic timeout. The default value is 1h.
Use the Restic timeout parameter in the nodeAgent section for the following scenarios:
For Restic backups with total PV data usage greater than 500 GB.
If backups are timing out with the following error:
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete"
Edit the values in the spec.configuration.nodeAgent.timeout block of the DataProtectionApplication custom resource (CR) manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
      timeout: 1h
# ...
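As a minimal sketch of the 500 GB scenario described above, the following fragment raises the Restic timeout above the 1h default. The 3h value is an illustrative assumption, not a recommendation. Choose a value based on the backup durations you actually observe.

```yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
      # Assumed example value: raised from the 1h default for a
      # workload with more than 500 GB of PV data.
      timeout: 3h
```

If backups still fail with the PodVolumeBackups timeout error after you raise the value, investigate the cause of the slow backups rather than extending the timeout further.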
Configure the resourceTimeout parameter in the DataProtectionApplication custom resource (CR) to define how long Velero waits for resource availability. Adjusting this timeout helps you prevent errors during large backups, repository readiness checks, and restore operations.
Use the resourceTimeout for the following scenarios:
For backups with total PV data usage that is greater than 1 TB. Velero uses this parameter as the timeout value when it tries to clean up or delete the Container Storage Interface (CSI) snapshots before marking the backup as complete.
A sub-task of this cleanup tries to patch the VolumeSnapshotContent (VSC) objects, and this timeout also applies to that task.
To create a backup repository, or to ensure that one is ready, for file-system-based backups with Restic or Kopia.
To check whether the Velero custom resource definition (CRD) is available in the cluster before restoring the custom resource (CR) or resource from the backup.
Edit the values in the spec.configuration.velero.resourceTimeout block of the DataProtectionApplication CR manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    velero:
      resourceTimeout: 10m
# ...
Configure the defaultItemOperationTimeout parameter in the DataProtectionApplication custom resource (CR) to define how long Velero waits for backup and restore operations to finish. Adjusting this timeout helps you prevent errors during Container Storage Interface (CSI) Data Mover tasks.
The default value is 1h.
Use the defaultItemOperationTimeout for the following scenarios:
Only with Data Mover 1.2.x.
When you define defaultItemOperationTimeout in the Data Protection Application (DPA), it applies to both backup and restore operations. To set the timeout for only the backup or only the restore of those CRs, use itemOperationTimeout, as described in the "Item operation timeout - restore" and "Item operation timeout - backup" sections.
Edit the values in the spec.configuration.velero.defaultItemOperationTimeout block of the DataProtectionApplication CR manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    velero:
      defaultItemOperationTimeout: 1h
# ...
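The relationship between the DPA-wide default and a per-CR value can be sketched as two manifests side by side. This is an illustrative sketch: the 2h value is an assumption chosen only to differ from the default, and it assumes that a value set on the Backup CR is used for that backup in place of the DPA default.

```yaml
# DPA: default that applies to both backup and restore item operations.
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  configuration:
    velero:
      defaultItemOperationTimeout: 1h
---
# Backup: value for this backup only.
# The 2h value is an assumed example, not a recommendation.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
spec:
  itemOperationTimeout: 2h
```

Setting the value on an individual Backup or Restore CR keeps the extension scoped to the operation that needs it, so other operations continue to use the shorter default.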
Configure the Data Mover timeout parameter in the DataProtectionApplication custom resource (CR) to define how long Data Mover backup and restore operations are allowed to run. Adjusting this value helps prevent timeouts in large environments with more than 500 GB of data or when using the VolumeSnapshotMover plugin. The default value is 10m.
Use the Data Mover timeout for the following scenarios:
If the creation of VolumeSnapshotBackups (VSBs) and VolumeSnapshotRestores (VSRs) times out after 10 minutes.
For large-scale environments with total PV data usage that is greater than 500 GB. Set the timeout to 1h.
With the VolumeSnapshotMover (VSM) plugin.
Edit the values in the spec.features.dataMover.timeout block of the DataProtectionApplication CR manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  features:
    dataMover:
      timeout: 10m
# ...
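For the large-scale scenario listed above (more than 500 GB of total PV data), the same fragment with the suggested 1h value might look like the following sketch:

```yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
spec:
  features:
    dataMover:
      # Raised from the 10m default, following the 1h guidance for
      # environments with more than 500 GB of PV data.
      timeout: 1h
```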
Configure the csiSnapshotTimeout parameter in the Backup custom resource (CR) to define how long to wait for a CSI snapshot to become ready. Adjusting this timeout prevents errors when using the CSI plugin to take snapshots of large storage volumes that require more time. The default value is 10m.
Typically, the default value for ...
Edit the values in the spec.csiSnapshotTimeout block of the Backup CR manifest, as shown in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
spec:
  csiSnapshotTimeout: 10m
# ...
Configure the itemOperationTimeout parameter in the Restore custom resource (CR) to define how long restore operations wait to complete. Adjusting this timeout prevents failures when Data Mover needs more time to download large storage volumes. The default value is 1h.
Edit the values in the Restore.spec.itemOperationTimeout block of the Restore CR manifest, as shown in the following example:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore_name>
spec:
  itemOperationTimeout: 1h
# ...
Configure the itemOperationTimeout parameter in the Backup custom resource (CR) to define how long asynchronous BackupItemAction operations wait to complete. Adjusting this timeout prevents failures when Data Mover needs more time to upload large storage volumes. The default value is 1h.
Edit the values in the Backup.spec.itemOperationTimeout block of the Backup CR manifest, as shown in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
spec:
  itemOperationTimeout: 1h
# ...