You can correlate observability metrics for clusters that run on Red Hat OpenStack Services on OpenShift (RHOSO). By collecting metrics from both environments, you can monitor and troubleshoot issues across the infrastructure and application layers.
There are two supported methods for metric correlation for clusters that run on RHOSO:
Remote writing to an external Prometheus instance.
Collecting data from the OKD federation endpoint to the RHOSO observability stack.
Use remote write with both Red Hat OpenStack Services on OpenShift (RHOSO) and OKD to push their metrics to an external Prometheus instance.
You have access to an external Prometheus instance.
You have administrative access to RHOSO and your cluster.
You have certificates for secure communication with mTLS.
Your Prometheus instance is configured for client TLS certificates and has been set up as a remote write receiver. A sketch of such a configuration follows this list.
The Cluster Observability Operator is installed on your RHOSO cluster.
The monitoring stack for your RHOSO cluster is configured to collect the metrics that you are interested in.
Telemetry is enabled in the RHOSO environment.
To verify that the telemetry service is operating normally, enter the following command:
$ oc -n openstack get monitoringstacks metric-storage -o yaml
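The external Prometheus instance is managed outside of RHOSO. As a minimal sketch, assuming a standalone Prometheus server, accepting remote write over mTLS usually means starting Prometheus with the remote write receiver enabled and a TLS web configuration; the file names and paths shown here are placeholders:
$ prometheus --web.enable-remote-write-receiver --web.config.file=web-config.yml
# web-config.yml (illustrative paths; replace with your certificate locations)
tls_server_config:
  cert_file: /etc/prometheus/tls/server.crt
  key_file: /etc/prometheus/tls/server.key
  client_auth_type: RequireAndVerifyClientCert
  client_ca_file: /etc/prometheus/tls/ca.crt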
Configure your RHOSO management cluster to send metrics to Prometheus:
Create a secret that is named mtls-bundle in the openstack namespace that contains HTTPS client certificates for authentication to Prometheus by entering the following command:
$ oc --namespace openstack \
create secret generic mtls-bundle \
--from-file=./ca.crt \
--from-file=osp-client.crt \
--from-file=osp-client.key
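Optionally, you can confirm that the secret contains the expected keys (ca.crt, osp-client.crt, and osp-client.key) before you reference them in the control plane configuration:
$ oc -n openstack describe secret mtls-bundle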
Open the controlplane configuration for editing by running the following command:
$ oc -n openstack edit openstackcontrolplane/controlplane
With the configuration open, replace the .spec.telemetry.template.metricStorage section so that RHOSO sends metrics to Prometheus. As an example:
metricStorage:
  customMonitoringStack:
    alertmanagerConfig:
      disabled: false
    logLevel: info
    prometheusConfig:
      scrapeInterval: 30s
      remoteWrite:
      - url: https://external-prometheus.example.com/api/v1/write (1)
        tlsConfig:
          ca:
            secret:
              name: mtls-bundle
              key: ca.crt
          cert:
            secret:
              name: mtls-bundle
              key: osp-client.crt
          keySecret:
            name: mtls-bundle
            key: osp-client.key
      replicas: 2
    resourceSelector:
      matchLabels:
        service: metricStorage
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 256Mi
    retention: 1d (2)
  dashboardsEnabled: false
  dataplaneNetwork: ctlplane
  enabled: true
  prometheusTls: {}
1 | Replace this URL with the URL of your Prometheus instance. |
2 | Set a retention period. Optionally, you can reduce retention for local metrics because of external collection. |
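After you save the configuration, the Cluster Observability Operator reconciles the monitoring stack. As a rough check that the remote write settings were applied, you can inspect the monitoring stack resource; the resource name metric-storage matches the verification command in the prerequisites:
$ oc -n openstack get monitoringstacks metric-storage -o yaml | grep -A 2 remoteWrite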
Configure the tenant cluster on which your workloads run to send metrics to Prometheus:
Create a cluster monitoring config map as a YAML file. The map must include a remote write configuration and cluster identifiers. As an example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 1d (1)
      remoteWrite:
      - url: "https://external-prometheus.example.com/api/v1/write"
        writeRelabelConfigs:
        - sourceLabels:
          - __tmp_openshift_cluster_id__
          targetLabel: cluster_id
          action: replace
        tlsConfig:
          ca:
            secret:
              name: mtls-bundle
              key: ca.crt
          cert:
            secret:
              name: mtls-bundle
              key: ocp-client.crt
          keySecret:
            name: mtls-bundle
            key: ocp-client.key
1 | Set a retention period. Optionally, you can reduce retention for local metrics because of external collection. |
Save the config map as a file called cluster-monitoring-config.yaml.
Create a secret that is named mtls-bundle in the openshift-monitoring namespace that contains HTTPS client certificates for authentication to Prometheus by entering the following command:
$ oc --namespace openshift-monitoring \
create secret generic mtls-bundle \
--from-file=./ca.crt \
--from-file=ocp-client.crt \
--from-file=ocp-client.key
Apply the cluster monitoring configuration by running the following command:
$ oc apply -f cluster-monitoring-config.yaml
After the changes propagate, you can see aggregated metrics in your external Prometheus instance.
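As a quick sanity check, you can run a query such as the following against the external Prometheus instance. It assumes that the tenant cluster forwards kube_node_info and that RHOSO forwards ceilometer_cpu; the tenant series carry the cluster_id label added by the relabel configuration, while the RHOSO series do not:
count by (cluster_id) ({__name__=~"kube_node_info|ceilometer_cpu"})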
You can use the federation endpoint of your OKD cluster to make metrics available to a Red Hat OpenStack Services on OpenShift (RHOSO) cluster for pull-based monitoring.
You have administrative access to RHOSO and the tenant cluster that is running on it.
Telemetry is enabled in the RHOSO environment.
The Cluster Observability Operator is installed on your cluster.
The monitoring stack for your cluster is configured.
Your cluster has its federation endpoint exposed.
Connect to your cluster by using a username and password so that you can retrieve a session token in the next step; do not log in by using a kubeconfig file that was generated by the installation program.
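For example, a password-based login might look like the following command, where the user name and API server URL are placeholders for your environment:
$ oc login --username=<username> --password=<password> https://api.<cluster_domain>:6443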
To retrieve a token from the OKD cluster, run the following command on it:
$ oc whoami -t
Make the token available as a secret in the openstack namespace in the RHOSO management cluster by running the following command:
$ oc -n openstack create secret generic ocp-federated --from-literal=token=<the_token_fetched_previously>
To get the Prometheus federation route URL from your OKD cluster, run the following command:
$ oc -n openshift-monitoring get route prometheus-k8s-federate -ojsonpath={'.status.ingress[].host'}
Write a manifest for a scrape configuration and save it as a file called cluster-scrape-config.yaml. As an example:
apiVersion: monitoring.rhobs/v1alpha1
kind: ScrapeConfig
metadata:
  labels:
    service: metricStorage
  name: sos1-federated
  namespace: openstack
spec:
  params:
    'match[]':
    - '{__name__=~"kube_node_info|kube_persistentvolume_info|cluster:master_nodes"}' (1)
  metricsPath: '/federate'
  authorization:
    type: Bearer
    credentials:
      name: ocp-federated (2)
      key: token
  scheme: HTTPS # or HTTP
  scrapeInterval: 30s (3)
  staticConfigs:
  - targets:
    - prometheus-k8s-federate-openshift-monitoring.apps.openshift.example (4)
1 | Add metrics here. In this example, only the metrics kube_node_info, kube_persistentvolume_info, and cluster:master_nodes are requested. |
2 | Insert the previously generated secret name here. |
3 | Limit scraping to fewer than 1000 samples for each request with a maximum frequency of once every 30 seconds. |
4 | Insert the URL you fetched previously here. If the endpoint is HTTPS and uses a custom certificate authority, add a tlsConfig section after it. |
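If you do add a tlsConfig section for a custom certificate authority, a minimal sketch might look like the following. It assumes that the CA certificate is available in a config map named ocp-federated-ca in the openstack namespace; the name is a placeholder:
  tlsConfig:
    ca:
      configMap:
        name: ocp-federated-ca
        key: ca.crt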
While connected to the RHOSO management cluster, apply the manifest by running the following command:
$ oc apply -f cluster-scrape-config.yaml
After the configuration propagates, the cluster metrics are accessible for querying in the OKD UI of the RHOSO management cluster.
To query metrics and identify resources across the stack, you can use helper metrics that establish a correlation between Red Hat OpenStack Services on OpenShift (RHOSO) infrastructure resources and their representations in the tenant OKD cluster.
To map nodes with RHOSO compute instances, in the metric kube_node_info:
node is the Kubernetes node name.
provider_id contains the identifier of the corresponding compute service instance.
To map persistent volumes with RHOSO block storage or shared filesystems shares, in the metric kube_persistentvolume_info:
persistentvolume is the volume name.
csi_volume_handle is the block storage volume or share identifier.
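For example, the following query, which assumes that kube_persistentvolume_info is federated or remote-written as described earlier, lists persistent volumes together with their backing block storage or share identifiers:
kube_persistentvolume_info{csi_volume_handle!=""}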
By default, the compute machines that back the cluster control plane nodes are created in a server group with a soft anti-affinity policy. As a result, the compute service creates them on separate hypervisors on a best-effort basis. However, if the state of the RHOSO cluster is not appropriate for this distribution, the machines are created anyway.
In combination with the default soft anti-affinity policy, you can configure an alert that fires when a hypervisor hosts more than one control plane node of a given cluster, to highlight the reduced level of high availability. A sketch of such an alerting rule follows the example query below.
As an example, this PromQL query returns the number of OKD master nodes per OpenStack host:
sum by (vm_instance) (
  group by (vm_instance, resource) (ceilometer_cpu)
  / on (resource) group_right(vm_instance) (
    group by (node, resource) (
      label_replace(kube_node_info, "resource", "$1", "system_uuid", "(.+)")
    )
    / on (node) group_left group by (node) (
      cluster:master_nodes
    )
  )
)
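The following PrometheusRule manifest is a sketch of how such an alert might be expressed for the RHOSO observability stack. The resource name, labels, severity, and duration are placeholders, and the rule assumes that the federated metrics shown earlier are available:
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  name: okd-control-plane-affinity
  namespace: openstack
  labels:
    service: metricStorage
spec:
  groups:
  - name: okd-control-plane-affinity
    rules:
    - alert: ControlPlaneNodesShareHypervisor
      expr: |
        sum by (vm_instance) (
          group by (vm_instance, resource) (ceilometer_cpu)
          / on (resource) group_right(vm_instance) (
            group by (node, resource) (
              label_replace(kube_node_info, "resource", "$1", "system_uuid", "(.+)")
            )
            / on (node) group_left group by (node) (
              cluster:master_nodes
            )
          )
        ) > 1
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: More than one OKD control plane node is running on the same RHOSO hypervisor.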