You can debug Velero custom resources (CRs) by using the OpenShift CLI tool or the Velero CLI tool. The Velero CLI tool provides more detailed logs and information.
You can check installation issues, backup and restore CR issues, and Restic issues.
You can collect logs, CR information, and Prometheus metric data by using the must-gather tool.
You can obtain the Velero CLI tool by:
Downloading the Velero CLI tool
Accessing the Velero binary in the Velero deployment in the cluster
You can download and install the Velero CLI tool by following the instructions on the Velero documentation page.
The page includes instructions for:
macOS by using Homebrew
GitHub
Windows by using Chocolatey
You have access to a Kubernetes cluster, v1.16 or later, with DNS and container networking enabled.
You have installed kubectl locally.
Open a browser and navigate to "Install the CLI" on the Velero website.
Follow the appropriate procedure for macOS, GitHub, or Windows.
Download the Velero version appropriate for your version of OADP and OKD according to the table that follows:
| OADP version | Velero version | OKD version |
|---|---|---|
| 1.0.0 | | 4.6 and later |
| 1.0.1 | | 4.6 and later |
| 1.0.2 | | 4.6 and later |
| 1.0.3 | | 4.6 and later |
| 1.1.0 | | 4.9 and later |
| 1.1.1 | | 4.9 and later |
| 1.1.2 | | 4.9 and later |
You can use a shell command to access the Velero binary in the Velero deployment in the cluster.
Your DataProtectionApplication custom resource has a status of Reconcile complete.
Enter the following command to set the needed alias:
$ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
You can debug a failed backup or restore by checking Velero custom resources (CRs) and the Velero pod log with the OpenShift CLI tool.
Use the oc describe command to retrieve a summary of warnings and errors associated with a Backup or Restore CR:
$ oc describe <velero_cr> <cr_name>
Use the oc logs command to retrieve the Velero pod logs:
$ oc logs pod/<velero>
You can specify the Velero log level in the DataProtectionApplication resource as shown in the following example.
Note: This option is available starting from OADP 1.0.3.
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: velero-sample
spec:
  configuration:
    velero:
      logLevel: warning
The following logLevel values are available:
trace
debug
info
warning
error
fatal
panic
It is recommended to use debug for most logs.
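Before you edit the DataProtectionApplication resource, it can help to confirm that the value you plan to set is one of the levels listed above. The following is a minimal shell sketch of such a check. The function name is_valid_velero_log_level is hypothetical and is not part of Velero or OADP.

```shell
# Hypothetical helper (not part of Velero or OADP): succeeds only if $1 is
# one of the logLevel values listed in this section.
is_valid_velero_log_level() {
  case "$1" in
    trace|debug|info|warning|error|fatal|panic) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_valid_velero_log_level debug && echo "level accepted"` prints the message, while a misspelled value such as `verbose` makes the function return a non-zero status.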
You can debug Backup and Restore custom resources (CRs) and retrieve logs with the Velero CLI tool.
The Velero CLI tool provides more detailed information than the OpenShift CLI tool.
Use the oc exec command to run a Velero CLI command:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> <command> <cr_name>
Example:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
Use the velero --help option to list all Velero CLI commands:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  --help
Use the velero describe command to retrieve a summary of warnings and errors associated with a Backup or Restore CR:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> describe <cr_name>
Example:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
Use the velero logs command to retrieve the logs of a Backup or Restore CR:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> logs <cr_name>
Example:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload.
Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
For example, creating or restoring a top-level object such as service.serving.knative.dev typically creates child resources automatically. If you do this first, you will not need to use Velero to create and restore these resources. This avoids the problem of child resources being blocked by an admission webhook that Velero might use.
This section describes the additional steps required to restore resources for several types of Velero backups that use admission webhooks.
You might encounter problems using Velero to back up Knative resources that use admission webhooks.
You can avoid such problems by restoring the top-level Service resource first whenever you back up and restore Knative resources that use admission webhooks.
Restore the top-level service.serving.knative.dev Service resource:
$ velero restore create <restore_name> \
  --from-backup=<backup_name> --include-resources \
  service.serving.knative.dev
If you experience issues when you use Velero to restore an IBM AppConnect resource that has an admission webhook, you can run the checks in this procedure.
Check if you have any mutating admission plugins of kind: MutatingWebhookConfiguration in the cluster:
$ oc get mutatingwebhookconfigurations
Examine the YAML file of each kind: MutatingWebhookConfiguration to ensure that none of its rules block creation of the objects that are experiencing issues. For more information, see the official Kubernetes documentation.
Check that any spec.version in type: Configuration.appconnect.ibm.com/v1beta1 used at backup time is supported by the installed Operator.
You might encounter issues caused by using invalid directories or incorrect credentials when you install the Data Protection Application.
The Velero pod log displays the error message, Backup storage contains invalid top-level directories.
The object storage contains top-level directories that are not Velero directories.
If the object storage is not dedicated to Velero, you must specify a prefix for the bucket by setting the spec.backupLocations.velero.objectStorage.prefix parameter in the DataProtectionApplication manifest.
The oadp-aws-registry pod log displays the error message, InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.
The Velero pod log displays the error message, NoCredentialProviders: no valid providers in chain.
The credentials-velero file used to create the Secret object is incorrectly formatted.
Ensure that the credentials-velero file is correctly formatted, as in the following example:
Example credentials-velero file
[default] (1)
aws_access_key_id=AKIAIOSFODNN7EXAMPLE (2)
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
| 1 | AWS default profile. |
| 2 | Do not enclose the values with quotation marks (", '). |
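The formatting rules above can be checked before you create the Secret object. The following shell sketch is one possible way to do that. The function name check_credentials_velero is hypothetical, and the sketch assumes the file uses the [default] profile and the two AWS keys shown in the example. It does not verify that the credentials themselves are valid.

```shell
# Hypothetical helper (not part of Velero or OADP): reports common formatting
# problems in a credentials-velero file. Prints one line per problem found;
# an exit status of 0 means that no problem was found.
check_credentials_velero() {
  file="$1"
  status=0
  # The example in this section uses the AWS default profile header.
  if ! grep -q '^\[default]' "$file"; then
    echo "missing [default] profile header"
    status=1
  fi
  # Both keys from the example must be present.
  for key in aws_access_key_id aws_secret_access_key; do
    if ! grep -q "^${key}[[:space:]]*=" "$file"; then
      echo "missing ${key}"
      status=1
    fi
  done
  # Values must not be enclosed in single or double quotation marks.
  if grep -q "^aws_[a-z_]*[[:space:]]*=[[:space:]]*[\"']" "$file"; then
    echo "quoted value found"
    status=1
  fi
  return "$status"
}
```

Running `check_credentials_velero credentials-velero` against the example file above prints nothing and returns 0. A file whose values are wrapped in quotation marks produces the line `quoted value found` and a non-zero status.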
You might encounter these common issues with Backup and Restore custom resources (CRs).
The Backup CR displays the error message, InvalidVolume.NotFound: The volume ‘vol-xxxx’ does not exist.
The persistent volume (PV) and the snapshot locations are in different regions.
Edit the value of the spec.snapshotLocations.velero.config.region key in the DataProtectionApplication manifest so that the snapshot location is in the same region as the PV.
Create a new Backup CR.
The status of a Backup CR remains in the InProgress phase and does not complete.
If a backup is interrupted, it cannot be resumed.
Retrieve the details of the Backup CR:
$ oc -n openshift-adp exec deployment/velero -c velero -- ./velero \
  backup describe <backup>
Delete the Backup CR:
$ oc delete backup <backup> -n openshift-adp
You do not need to clean up the backup location because a Backup CR in progress has not uploaded files to object storage.
Create a new Backup CR.
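To find Backup CRs that might be stuck, you can list each backup with its phase and filter on InProgress. The following shell sketch shows one way to do this. The function name backups_in_phase is hypothetical, and the sketch assumes that OADP is installed in the openshift-adp namespace. A backup that started recently is legitimately InProgress, so treat the result as a list of candidates to inspect, not as proof of a failure.

```shell
# Hypothetical helper (not part of Velero or OADP): reads "NAME PHASE" lines
# on standard input and prints the names whose phase equals $1.
backups_in_phase() {
  awk -v phase="$1" '$2 == phase { print $1 }'
}

# Example use against a cluster (requires oc and access to the namespace):
#   oc get backups.velero.io -n openshift-adp --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
#     | backups_in_phase InProgress
```

The filter is kept separate from the oc command so that it can also be applied to saved output, for example from a must-gather archive.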
The status of a Backup CR without Restic in use remains in the PartiallyFailed phase and does not complete. A snapshot of the affiliated PVC is not created.
If the backup is created based on the CSI snapshot class but the label is missing, the CSI snapshot plugin fails to create a snapshot. As a result, the Velero pod logs an error similar to the following:
time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=openshift-adp/user1-backup-check5 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=busy1, name=pvc1-user1): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass ocs-storagecluster-ceph-rbd: failed to get volumesnapshotclass for provisioner openshift-storage.rbd.csi.ceph.com, ensure that the desired volumesnapshot class has the velero.io/csi-volumesnapshot-class label" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=busybox-79799557b5-vprq
Delete the Backup CR:
$ oc delete backup <backup> -n openshift-adp
If required, clean up the stored data on the BackupStorageLocation to free up space.
Apply the label velero.io/csi-volumesnapshot-class=true to the VolumeSnapshotClass object:
$ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
Create a new Backup CR.
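When several storage classes are in use, it can be useful to pull the affected storage class names out of the Velero pod log before you decide which VolumeSnapshotClass objects to label. The following shell sketch does this for log lines shaped like the error shown in this section. The function name storageclasses_missing_snapshot_label is hypothetical, and the sketch assumes the error text keeps the wording "failed to get volumesnapshotclass for storageclass <name>:".

```shell
# Hypothetical helper (not part of Velero or OADP): reads Velero pod log lines
# on standard input and prints each storage class named in a
# "failed to get volumesnapshotclass for storageclass <name>:" error,
# once per storage class.
storageclasses_missing_snapshot_label() {
  sed -n 's/.*failed to get volumesnapshotclass for storageclass \([^: ]*\):.*/\1/p' \
    | sort -u
}

# Example use against a cluster (requires oc and access to the namespace):
#   oc logs deployment/velero -c velero -n openshift-adp \
#     | storageclasses_missing_snapshot_label
```

To see which VolumeSnapshotClass objects already carry the label, you can run `oc get volumesnapshotclass -l velero.io/csi-volumesnapshot-class=true`.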
You might encounter these issues when you back up applications with Restic.
The Restic pod log displays the error message: controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied".
If your NFS data volumes have root_squash enabled, Restic maps to nfsnobody and does not have permission to create backups.
You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the DataProtectionApplication manifest:
Create a supplemental group for Restic on the NFS data volume.
Set the setgid bit on the NFS directories so that group ownership is inherited.
Add the spec.configuration.restic.supplementalGroups parameter and the group ID to the DataProtectionApplication manifest, as in the following example:
spec:
  configuration:
    restic:
      enable: true
      supplementalGroups:
      - <group_id> (1)
| 1 | Specify the supplemental group ID. |
Wait for the Restic pods to restart so that the changes are applied.
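The group ownership and setgid steps of this procedure might look like the following when run on the NFS server. This is a sketch under stated assumptions, not a definitive procedure: the function name prepare_restic_group_dir is hypothetical, the group is assumed to exist already, the caller is assumed to have permission to change ownership of the directory, and granting the group read, write, and execute access is an assumption that the source does not specify.

```shell
# Hypothetical helper (not part of Velero or OADP): assigns a directory to a
# group and sets the setgid bit so that new files and subdirectories inherit
# that group.
#   $1 = directory on the NFS export
#   $2 = group name or numeric group ID (must already exist)
prepare_restic_group_dir() {
  dir="$1"
  group="$2"
  chgrp "$group" "$dir" || return 1
  # g+s sets the setgid bit. g+rwx is an assumption made for this sketch so
  # that members of the group can read and traverse the directory.
  chmod g+rwxs "$dir" || return 1
}
```

For example, `prepare_restic_group_dir /exports/data <group_id>` prepares a single directory, where /exports/data is a placeholder path. Existing subdirectories are not changed by this sketch and would need the same treatment.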
If you create a Restic Backup CR for a namespace, empty the object storage bucket, and then recreate the Backup CR for the same namespace, the recreated Backup CR fails.
The Velero pod log displays the following error message: stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?.
Velero does not recreate or update the Restic repository from the ResticRepository manifest if the Restic directories are deleted from object storage. See Velero issue 4421 for more information.
Remove the related Restic repository from the namespace by running the following command:
$ oc delete resticrepository -n openshift-adp <name_of_the_restic_repository>
In the following error log, mysql-persistent is the problematic Restic repository.
 time="2021-12-29T18:29:14Z" level=info msg="1 errors
 encountered backup up item" backup=velero/backup65
 logSource="pkg/backup/backup.go:431" name=mysql-7d99fc949-qbkds
 time="2021-12-29T18:29:14Z" level=error msg="Error backing up item"
 backup=velero/backup65 error="pod volume backup failed: error running
 restic backup, stderr=Fatal: unable to open config file: Stat: The
 specified key does not exist.\nIs there a repository at the following
 location?\ns3:http://minio-minio.apps.mayap-oadp-
 veleo-1234.qe.devcluster.openshift.com/mayapvelerooadp2/velero1/
 restic/mysql-persistent\n: exit status 1" error.file="/remote-source/
 src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184"
 error.function="github.com/vmware-tanzu/velero/
 pkg/restic.(*backupper).BackupPodVolumes"
 logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds
You can collect logs, metrics, and information about OADP custom resources by using the must-gather tool.
The must-gather data must be attached to all customer cases.
You must be logged in to the OKD cluster as a user with the cluster-admin role.
You must have the OpenShift CLI (oc) installed.
Navigate to the directory where you want to store the must-gather data.
Run the oc adm must-gather command for one of the following data collection options:
$ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel8:v1.1
The data is saved as must-gather/must-gather.tar.gz. You can upload this file to a support case on the Red Hat Customer Portal.
$ oc adm must-gather --image=registry.redhat.io/oadp/oadp-mustgather-rhel8:v1.1 \
  -- /usr/bin/gather_metrics_dump
This operation can take a long time. The data is saved as must-gather/metrics/prom_data.tar.gz.
You can view the metrics data with the Prometheus console.
Decompress the prom_data.tar.gz file:
$ tar -xvzf must-gather/metrics/prom_data.tar.gz
Create a local Prometheus instance:
$ make prometheus-run
The command outputs the Prometheus URL.
Started Prometheus on http://localhost:9090
Launch a web browser and navigate to the URL to view the data by using the Prometheus web console.
After you have viewed the data, delete the Prometheus instance and data:
$ make prometheus-cleanup