$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Store and record your metrics and alerting data, configure logs to specify which activities are recorded, control how long Prometheus retains stored data, and set the maximum amount of disk space for the data. These actions help you protect your data and use it for troubleshooting.
Run cluster monitoring with persistent storage to gain the following benefits:
Protect your metrics and alerting data from data loss by storing them in a persistent volume (PV). As a result, they can survive pods being restarted or recreated.
Avoid getting duplicate notifications and losing silences for alerts when the Alertmanager pods are restarted.
For production environments, it is highly recommended to configure persistent storage.
In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
Dedicate sufficient persistent storage to ensure that the disk does not become full.
Use Filesystem as the storage type value for the volumeMode parameter when you configure the persistent volume.
To use a persistent volume (PV) for monitoring components, you must configure a persistent volume claim (PVC).
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add your PVC configuration for the component under data/config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>: # (1)
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class> # (2)
          resources:
            requests:
              storage: <amount_of_storage> # (3)
1 | Specify the monitoring component for which you want to configure the PVC. |
2 | Specify an existing storage class. If a storage class is not specified, the default storage class is used. |
3 | Specify the amount of required storage. |
The following example configures a PVC that claims persistent storage for Prometheus:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: my-storage-class
          resources:
            requests:
              storage: 40Gi
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed and the new storage configuration is applied.
When you update the config map with a PVC configuration, the affected |
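After the pods are redeployed, one way to confirm that the claims were provisioned is to list the PVCs in the openshift-monitoring project and look for any that are not in the Bound state. The following is a minimal sketch and not part of the documented procedure: the unbound_pvcs helper name is hypothetical, and it assumes the default column layout of oc get pvc --no-headers, in which the second column is the claim status.

```shell
# Hypothetical helper: print the names of PVCs whose STATUS column is not "Bound".
# Reads the output of `oc get pvc --no-headers` on standard input.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 }'
}

# Example usage against a live cluster (requires a logged-in oc session):
#   oc -n openshift-monitoring get pvc --no-headers | unbound_pvcs
```

If the helper prints nothing, every claim in the project is bound.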
PersistentVolumeClaims (Kubernetes documentation)
You can resize a persistent volume (PV) for monitoring components, such as Prometheus or Alertmanager. You need to manually expand a persistent volume claim (PVC), and then update the config map in which the component is configured.
You can only expand the size of the PVC. Shrinking the storage size is not possible.
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have configured at least one PVC for core OKD monitoring components.
You have installed the OpenShift CLI (oc).
Manually expand a PVC with the updated storage request. For more information, see "Expanding persistent volume claims (PVCs) with a file system" in Expanding persistent volumes.
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add a new storage size for the PVC configuration for the component under data/config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>: # (1)
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: <amount_of_storage> # (2)
1 | The component for which you want to change the storage size. |
2 | Specify the new size for the storage volume. It must be greater than the previous value. |
The following example sets the new PVC request to 100 gibibytes (100Gi) for the Prometheus instance (prometheusK8s):
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 100Gi
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
When you update the config map with a new storage size, the affected |
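The manual expansion in the first step of this procedure is typically done by raising spec.resources.requests.storage on the existing claim. The sketch below shows one way to do that with oc patch. It is an illustration under stated assumptions rather than the documented method: both helper names are hypothetical, the claim name is a placeholder that you would look up with oc -n openshift-monitoring get pvc, and the storage class is assumed to permit volume expansion.

```shell
# Hypothetical helper: build a JSON merge patch that sets a new storage request.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Hypothetical wrapper: apply the patch to one claim in openshift-monitoring.
# Arguments: <pvc_name> <new_size>
expand_monitoring_pvc() {
  oc -n openshift-monitoring patch pvc "$1" --type merge -p "$(pvc_resize_patch "$2")"
}

# Example usage against a live cluster (requires a logged-in oc session):
#   expand_monitoring_pvc <pvc_name> 100Gi
```

Each replica of a component has its own claim, so the patch would be repeated for every claim before the config map is updated to the same size.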
By default, Prometheus retains metrics data for 15 days for core platform monitoring. You can modify the retention time for the Prometheus instance to change when the data is deleted. You can also set the maximum amount of disk space the retained metrics data uses.
Data compaction occurs every two hours. Therefore, a persistent volume (PV) might fill up before compaction, potentially exceeding the |
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add the retention time and size configuration under data/config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: <time_specification> # (1)
      retentionSize: <size_specification> # (2)
1 | The retention time: a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years). You can also combine time values for specific times, such as 1h30m15s. |
2 | The retention size: a number directly followed by B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), or EB (exabytes). |
The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h
      retentionSize: 10GB
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
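Because a malformed retention value is easy to introduce by hand, it can help to check the strings before you save the config map. The following sketch validates the two formats described in the callouts above. The function names are hypothetical, and the time pattern assumes that combined values list their units from largest to smallest (y, w, d, h, m, s, ms), as in 1h30m15s.

```shell
# Hypothetical helper: succeed if the argument is a retention time such as 24h or 1h30m15s.
# Assumption: combined values list units from largest to smallest.
is_valid_retention() {
  [ -n "$1" ] && printf '%s\n' "$1" |
    grep -Eq '^([0-9]+y)?([0-9]+w)?([0-9]+d)?([0-9]+h)?([0-9]+m)?([0-9]+s)?([0-9]+ms)?$'
}

# Hypothetical helper: succeed if the argument is a retention size such as 10GB,
# using only the unit suffixes listed in the callout above.
is_valid_retention_size() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(B|KB|MB|GB|TB|PB|EB)$'
}

# Example usage:
#   is_valid_retention 24h && is_valid_retention_size 10GB && echo "values look well formed"
```

These checks cover syntax only. They do not confirm that the chosen values fit the available persistent volume.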
You can configure audit logs for Metrics Server to help you troubleshoot issues with the server. Audit logs record the sequence of actions in a cluster. They can record user, application, or control plane activities.
You can set audit log rules, which determine what events are recorded and what data they should include. This can be achieved with the following audit profiles:
Metadata (default): This profile enables the logging of event metadata including user, timestamps, resource, and verb. It does not record request and response bodies.
Request: This enables the logging of event metadata and request body, but it does not record response body. This configuration does not apply for non-resource requests.
RequestResponse: This enables the logging of event metadata, and request and response bodies. This configuration does not apply for non-resource requests.
None: None of the previously described events are recorded.
You can configure the audit profiles by modifying the cluster-monitoring-config config map.
The following example sets the profile to Request, allowing the logging of event metadata and request body for Metrics Server:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    metricsServer:
      audit:
        profile: Request
You can configure the log level for Alertmanager, Prometheus Operator, Prometheus, and Thanos Querier.
The following log levels can be applied to the relevant component in the cluster-monitoring-config ConfigMap object:
debug. Log debug, informational, warning, and error messages.
info. Log informational, warning, and error messages.
warn. Log warning and error messages only.
error. Log error messages only.
The default log level is info.
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add logLevel: <log_level> for a component under data/config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>: # (1)
      logLevel: <log_level> # (2)
1 | The monitoring stack component for which you are setting a log level. Available component values are prometheusK8s, alertmanagerMain, prometheusOperator, and thanosQuerier. |
2 | The log level to set for the component. The available values are error, warn, info, and debug. The default value is info. |
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Confirm that the log level has been applied by reviewing the deployment or pod configuration in the related project.
The following example checks the log level for the prometheus-operator deployment:
$ oc -n openshift-monitoring get deploy prometheus-operator -o yaml | grep "log-level"
- --log-level=debug
Check that the pods for the component are running. The following example lists the status of pods:
$ oc -n openshift-monitoring get pods
If an unrecognized |
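The grep check in the verification step prints the whole argument line. If you want only the configured value, for example to compare it with the level you intended to set, a small filter can extract it. This is a minimal sketch: the configured_log_levels name is hypothetical, and it assumes that the component passes its level as a --log-level=<value> argument, as the prometheus-operator deployment does in the example above.

```shell
# Hypothetical helper: print the value of every --log-level argument found on standard input.
configured_log_levels() {
  sed -n 's/.*--log-level=\([a-z]*\).*/\1/p'
}

# Example usage against a live cluster (requires a logged-in oc session):
#   oc -n openshift-monitoring get deploy prometheus-operator -o yaml | configured_log_levels
```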
You can configure Prometheus to write all queries that have been run by the engine to a log file.
Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the |
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add the queryLogFile parameter for Prometheus under data/config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      queryLogFile: <path> # (1)
1 | Add the full path to the file in which queries will be logged. |
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Verify that the pods for the component are running. The following sample command lists the status of pods:
$ oc -n openshift-monitoring get pods
...
prometheus-operator-567c9bc75c-96wkj 2/2 Running 0 62m
prometheus-k8s-0 6/6 Running 1 57m
prometheus-k8s-1 6/6 Running 1 57m
thanos-querier-56c76d7df4-2xkpc 6/6 Running 0 57m
thanos-querier-56c76d7df4-j5p29 6/6 Running 0 57m
...
Read the query log:
$ oc -n openshift-monitoring exec prometheus-k8s-0 -- cat <path>
Revert the setting in the config map after you have examined the logged query information.
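The pod status check in the verification steps can also be scripted so that only problem pods are reported. The following sketch is one possible approach: the not_ready_pods name is hypothetical, and it assumes the default column layout of oc get pods --no-headers shown in the sample output above (name, ready count, status).

```shell
# Hypothetical helper: print pods that are not Running or whose ready count is incomplete.
# Reads the output of `oc get pods --no-headers` on standard input.
not_ready_pods() {
  awk '{ split($2, ready, "/"); if ($3 != "Running" || ready[1] != ready[2]) print $1 }'
}

# Example usage against a live cluster (requires a logged-in oc session):
#   oc -n openshift-monitoring get pods --no-headers | not_ready_pods
```

Pods that ran to completion are also listed, because their status is not Running.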
For default platform monitoring in the openshift-monitoring project, you can enable the Cluster Monitoring Operator (CMO) to log all queries run by Thanos Querier.
Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the |
You have installed the OpenShift CLI (oc).
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You can enable query logging for Thanos Querier in the openshift-monitoring project:
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add a thanosQuerier section under data/config.yaml and add values as shown in the following example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    thanosQuerier:
      enableRequestLogging: <value> # (1)
      logLevel: <value> # (2)
1 | Set the value to true to enable logging and false to disable logging. The default value is false. |
2 | Set the value to debug, info, warn, or error. If no value exists for logLevel, the log level defaults to error. |
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Verify that the Thanos Querier pods are running. The following sample command lists the status of pods in the openshift-monitoring project:
$ oc -n openshift-monitoring get pods
Run a test query using the following sample commands as a model:
$ token=$(oc create token prometheus-k8s -n openshift-monitoring)
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=cluster_version'
Run the following command to read the query log:
$ oc -n openshift-monitoring logs <thanos_querier_pod_name> -c thanos-query
Because the |
After you examine the logged query information, disable query logging by changing the enableRequestLogging value to false in the config map.