Storing and recording data

Store and record your metrics and alerting data, configure logs to specify which activities are recorded, control how long Prometheus retains stored data, and set the maximum amount of disk space for the data. These actions help you protect your data and use it for troubleshooting.

Configuring persistent storage

Run cluster monitoring with persistent storage to gain the following benefits:

  • Protect your metrics and alerting data from data loss by storing them in a persistent volume (PV). As a result, they can survive pods being restarted or recreated.

  • Avoid getting duplicate notifications and losing silences for alerts when the Alertmanager pods are restarted.

For production environments, it is highly recommended to configure persistent storage.

In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.

Persistent storage prerequisites

  • Dedicate sufficient persistent storage to ensure that the disk does not become full.

  • Use Filesystem as the storage type value for the volumeMode parameter when you configure the persistent volume, as shown in the example after this list.

    • Do not use a raw block volume, which is described with volumeMode: Block in the PersistentVolume resource. Prometheus cannot use raw block volumes.

    • Prometheus does not support file systems that are not POSIX compliant. For example, some NFS file system implementations are not POSIX compliant. If you want to use an NFS file system for storage, verify with the vendor that their NFS implementation is fully POSIX compliant.
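
The following is a minimal sketch of a PersistentVolume resource that meets these prerequisites. The storage class name, capacity, and NFS server details are placeholders for illustration, and the example assumes that you have verified that the NFS implementation is fully POSIX compliant:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: monitoring-pv
    spec:
      capacity:
        storage: 40Gi
      volumeMode: Filesystem    # Prometheus requires a file system volume, not a raw block volume
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: my-storage-class
      nfs:
        path: /exports/monitoring
        server: nfs.example.com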

Configuring a persistent volume claim

To use a persistent volume (PV) for monitoring components, you must configure a persistent volume claim (PVC).

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Edit the cluster-monitoring-config config map in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add your PVC configuration for the component under data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        <component>: (1)
          volumeClaimTemplate:
            spec:
              storageClassName: <storage_class> (2)
              resources:
                requests:
                  storage: <amount_of_storage> (3)
    1 Specify the monitoring component for which you want to configure the PVC.
    2 Specify an existing storage class. If a storage class is not specified, the default storage class is used.
    3 Specify the amount of required storage.

    The following example configures a PVC that claims persistent storage for Prometheus:

    Example PVC configuration
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          volumeClaimTemplate:
            spec:
              storageClassName: my-storage-class
              resources:
                requests:
                  storage: 40Gi
  3. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed and the new storage configuration is applied.

    When you update the config map with a PVC configuration, the affected StatefulSet object is recreated, resulting in a temporary service outage.
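
    You can verify that the claims were created and bound by listing the PVCs in the openshift-monitoring project:

    $ oc -n openshift-monitoring get pvc

    For the prometheusK8s component, the resulting claims are typically named prometheus-k8s-db-prometheus-k8s-0 and prometheus-k8s-db-prometheus-k8s-1. The exact names and sizes depend on your configuration.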

Resizing a persistent volume

You can resize a persistent volume (PV) for monitoring components, such as Prometheus or Alertmanager. You need to manually expand a persistent volume claim (PVC), and then update the config map in which the component is configured.

You can only expand the size of the PVC. Shrinking the storage size is not possible.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

  • You have configured at least one PVC for core OKD monitoring components.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Manually expand a PVC with the updated storage request. For more information, see "Expanding persistent volume claims (PVCs) with a file system" in Expanding persistent volumes.
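
    For example, the following command patches the storage request of one monitoring PVC to 100Gi. Replace <monitoring_pvc_name> with the name of the claim you are expanding; for the prometheusK8s component, the claims are typically named prometheus-k8s-db-prometheus-k8s-0 and prometheus-k8s-db-prometheus-k8s-1. Expansion must also be supported by the underlying storage class:

    $ oc -n openshift-monitoring patch pvc <monitoring_pvc_name> -p '{"spec": {"resources": {"requests": {"storage": "100Gi"}}}}'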

  2. Edit the cluster-monitoring-config config map in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  3. Add a new storage size for the PVC configuration for the component under data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        <component>: (1)
          volumeClaimTemplate:
            spec:
              resources:
                requests:
                  storage: <amount_of_storage> (2)
    1 The component for which you want to change the storage size.
    2 Specify the new size for the storage volume. It must be greater than the previous value.

    The following example sets the new PVC request to 100 gigabytes for the Prometheus instance:

    Example storage configuration for prometheusK8s
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          volumeClaimTemplate:
            spec:
              resources:
                requests:
                  storage: 100Gi
  4. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

    When you update the config map with a new storage size, the affected StatefulSet object is recreated, resulting in a temporary service outage.

Modifying retention time and size for Prometheus metrics data

By default, Prometheus retains metrics data for 15 days for core platform monitoring. You can modify the retention time for the Prometheus instance to change when the data is deleted. You can also set the maximum amount of disk space the retained metrics data uses.

Data compaction occurs every two hours. Therefore, a persistent volume (PV) might fill up before compaction, potentially exceeding the retentionSize limit. In such cases, the KubePersistentVolumeFillingUp alert fires until the space on a PV is lower than the retentionSize limit.
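
To see how close a monitoring PV is to filling up, you can query the kubelet volume statistics through the monitoring UI or API. The following PromQL expression is a sketch that uses standard kubelet metrics; adjust the label selectors for your environment:

    kubelet_volume_stats_available_bytes{namespace="openshift-monitoring"} / kubelet_volume_stats_capacity_bytes{namespace="openshift-monitoring"}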

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Edit the cluster-monitoring-config config map in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add the retention time and size configuration under data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          retention: <time_specification> (1)
          retentionSize: <size_specification> (2)
    1 The retention time: a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years). You can also combine time values for specific times, such as 1h30m15s.
    2 The retention size: a number directly followed by B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), or EB (exabytes).

    The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance:

    Example of setting retention time for Prometheus
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          retention: 24h
          retentionSize: 10GB
  3. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

Configuring a dedicated service monitor

You can configure OKD core platform monitoring to use dedicated service monitors to collect metrics for the resource metrics pipeline.

When enabled, a dedicated service monitor exposes two additional metrics from the kubelet endpoint and sets the value of the honorTimestamps field to true.

By enabling a dedicated service monitor, you can improve the consistency of Prometheus Adapter-based CPU usage measurements used by, for example, the oc adm top pod command or the Horizontal Pod Autoscaler.
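
For example, the following command returns pod CPU and memory usage figures that are derived from these Prometheus Adapter measurements:

    $ oc adm top pod -n openshift-monitoring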

Enabling a dedicated service monitor

You can configure core platform monitoring to use a dedicated service monitor by configuring the dedicatedServiceMonitors key in the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

Procedure
  1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add an enabled: true key-value pair as shown in the following sample:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        k8sPrometheusAdapter:
          dedicatedServiceMonitors:
            enabled: true (1)
    1 Set the value of the enabled field to true to deploy a dedicated service monitor that exposes the kubelet /metrics/resource endpoint.
  3. Save the file to apply the changes automatically.

    When you save changes to a cluster-monitoring-config config map, the pods and other resources in the openshift-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
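
    You can check whether the dedicated service monitor was deployed by listing the ServiceMonitor resources in the openshift-monitoring namespace and looking for a new kubelet-related entry. The exact resource name is not documented here and might differ between releases:

    $ oc -n openshift-monitoring get servicemonitors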

Setting audit log levels for the Prometheus Adapter

In default platform monitoring, you can configure the audit log level for the Prometheus Adapter.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

Procedure

You can set an audit log level for the Prometheus Adapter in the default openshift-monitoring project:

  1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add profile: in the k8sPrometheusAdapter/audit section under data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        k8sPrometheusAdapter:
          audit:
            profile: <audit_log_level> (1)
    1 The audit log level to apply to the Prometheus Adapter.
  3. Set the audit log level by using one of the following values for the profile: parameter:

    • None: Do not log events.

    • Metadata: Log only the metadata for the request, such as user, timestamp, and so forth. Do not log the request text and the response text. Metadata is the default audit log level.

    • Request: Log only the metadata and the request text but not the response text. This option does not apply for non-resource requests.

    • RequestResponse: Log event metadata, request text, and response text. This option does not apply for non-resource requests.
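
    For example, the following configuration sets the audit log level to Request:

    Example of setting the audit log level to Request
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        k8sPrometheusAdapter:
          audit:
            profile: Request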

  4. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

Verification
  1. In the config map, under k8sPrometheusAdapter/audit/profile, set the log level to Request and save the file.

  2. Confirm that the pods for the Prometheus Adapter are running. The following example lists the status of pods in the openshift-monitoring project:

    $ oc -n openshift-monitoring get pods
  3. Confirm that the audit log level and audit log file path are correctly configured:

    $ oc -n openshift-monitoring get deploy prometheus-adapter -o yaml
    Example output
    ...
      - --audit-policy-file=/etc/audit/request-profile.yaml
      - --audit-log-path=/var/log/adapter/audit.log
  4. Confirm that the correct log level has been applied in the prometheus-adapter deployment in the openshift-monitoring project:

    $ oc -n openshift-monitoring exec deploy/prometheus-adapter -c prometheus-adapter -- cat /etc/audit/request-profile.yaml
    Example output
    "apiVersion": "audit.k8s.io/v1"
    "kind": "Policy"
    "metadata":
      "name": "Request"
    "omitStages":
    - "RequestReceived"
    "rules":
    - "level": "Request"

    If you enter an unrecognized profile value for the Prometheus Adapter in the ConfigMap object, no changes are made to the Prometheus Adapter, and an error is logged by the Cluster Monitoring Operator.

  5. Review the audit log for the Prometheus Adapter:

    $ oc -n openshift-monitoring exec <prometheus_adapter_pod_name> -c prometheus-adapter -- cat /var/log/adapter/audit.log

Setting log levels for monitoring components

You can configure the log level for Alertmanager, Prometheus Operator, Prometheus, and Thanos Querier.

The following log levels can be applied to the relevant component in the cluster-monitoring-config ConfigMap object:

  • debug. Log debug, informational, warning, and error messages.

  • info. Log informational, warning, and error messages.

  • warn. Log warning and error messages only.

  • error. Log error messages only.

The default log level is info.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Edit the cluster-monitoring-config config map in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add logLevel: <log_level> for a component under data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        <component>: (1)
          logLevel: <log_level> (2)
    1 The monitoring stack component for which you are setting a log level. Available component values are prometheusK8s, alertmanagerMain, prometheusOperator, and thanosQuerier.
    2 The log level to set for the component. The available values are error, warn, info, and debug. The default value is info.
  3. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

  4. Confirm that the log level has been applied by reviewing the deployment or pod configuration in the related project. The following example checks the log level for the prometheus-operator deployment:

    $ oc -n openshift-monitoring get deploy prometheus-operator -o yaml | grep "log-level"
    Example output
            - --log-level=debug
  5. Check that the pods for the component are running. The following example lists the status of pods:

    $ oc -n openshift-monitoring get pods

    If an unrecognized logLevel value is included in the ConfigMap object, the pods for the component might not restart successfully.

Enabling the query log file for Prometheus

You can configure Prometheus to write all queries that have been run by the engine to a log file.

Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the ConfigMap object to enable the feature.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Edit the cluster-monitoring-config config map in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add the queryLogFile parameter for Prometheus under data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          queryLogFile: <path> (1)
    1 Add the full path to the file in which queries will be logged.
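
    The following example writes queries to /tmp/promql-queries.log. The path is an example only; it must point to a location that the Prometheus container can write to:

    Example of setting a query log file for Prometheus
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          queryLogFile: /tmp/promql-queries.log # example path; must be writable by the Prometheus container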
  3. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

  4. Verify that the pods for the component are running. The following sample command lists the status of pods:

    $ oc -n openshift-monitoring get pods
    Example output
    ...
    prometheus-operator-567c9bc75c-96wkj   2/2     Running   0          62m
    prometheus-k8s-0                       6/6     Running   1          57m
    prometheus-k8s-1                       6/6     Running   1          57m
    thanos-querier-56c76d7df4-2xkpc        6/6     Running   0          57m
    thanos-querier-56c76d7df4-j5p29        6/6     Running   0          57m
    ...
  5. Read the query log:

    $ oc -n openshift-monitoring exec prometheus-k8s-0 -- cat <path>

    Revert the setting in the config map after you have examined the logged query information.

Enabling query logging for Thanos Querier

For default platform monitoring in the openshift-monitoring project, you can enable the Cluster Monitoring Operator (CMO) to log all queries run by Thanos Querier.

Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the ConfigMap object to enable the feature.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have created the cluster-monitoring-config ConfigMap object.

Procedure

You can enable query logging for Thanos Querier in the openshift-monitoring project:

  1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add a thanosQuerier section under data/config.yaml and add values as shown in the following example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        thanosQuerier:
          enableRequestLogging: <value> (1)
          logLevel: <value> (2)
    1 Set the value to true to enable logging and false to disable logging. The default value is false.
    2 Set the value to debug, info, warn, or error. If no value exists for logLevel, the log level defaults to error.
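
    The following example enables request logging at the debug level:

    Example of enabling query logging for Thanos Querier
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        thanosQuerier:
          enableRequestLogging: true
          logLevel: debug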
  3. Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

Verification
  1. Verify that the Thanos Querier pods are running. The following sample command lists the status of pods in the openshift-monitoring project:

    $ oc -n openshift-monitoring get pods
  2. Run a test query using the following sample commands as a model:

    $ token=`oc create token prometheus-k8s -n openshift-monitoring`
    $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=cluster_version'
  3. Run the following command to read the query log:

    $ oc -n openshift-monitoring logs <thanos_querier_pod_name> -c thanos-query

    Because the thanos-querier pods are highly available (HA) pods, you might be able to see logs in only one pod.

  4. After you examine the logged query information, disable query logging by changing the enableRequestLogging value to false in the config map.