This is a cache of https://docs.openshift.com/container-platform/4.16/observability/distr_tracing/distr_tracing_tempo/distr-tracing-tempo-configuring.html. It is a snapshot of the page at 2024-11-05T10:07:45.638+0000.
Configuring - Distributed tracing | Observability | OpenShift Container Platform 4.16
×

Configuring back-end storage

For information about configuring the back-end storage, see Understanding persistent storage and the relevant configuration section for your chosen storage option.

Introduction to TempoStack configuration parameters

The TempoStack custom resource (CR) defines the architecture and settings for creating the distributed tracing platform (Tempo) resources. You can modify these parameters to customize your implementation to your business needs.

Example TempoStack CR
apiVersion: tempo.grafana.com/v1alpha1 (1)
kind: TempoStack (2)
metadata: (3)
  name: <name> (4)
spec: (5)
  storage: {} (6)
  resources: {} (7)
  replicationFactor: 1 (8)
  retention: {} (9)
  template:
      distributor: {} (10)
      ingester: {} (11)
      compactor: {} (12)
      querier: {} (13)
      queryFrontend: {} (14)
      gateway: {} (15)
  limits: (16)
    global:
      ingestion: {} (17)
      query: {} (18)
  observability: (19)
    grafana: {}
    metrics: {}
    tracing: {}
  search: {} (20)
managementState: managed (21)
1 API version to use when creating the object.
2 Defines the kind of Kubernetes object to create.
3 Data that uniquely identifies the object, including a name string, UID, and optional namespace. OpenShift Container Platform automatically generates the UID and completes the namespace with the name of the project where the object is created.
4 Name of the TempoStack instance.
5 Contains all of the configuration parameters of the TempoStack instance. When a common definition for all Tempo components is required, define it in the spec section. When the definition relates to an individual component, place it in the spec.template.<component> section.
6 Storage is specified at instance deployment. See the installation page for information about storage options for the instance.
7 Defines the compute resources for the Tempo container.
8 Configuration for the replication factor.
9 Configuration options for retention of traces.
10 Configuration options for the Tempo distributor component.
11 Configuration options for the Tempo ingester component.
12 Configuration options for the Tempo compactor component.
13 Configuration options for the Tempo querier component.
14 Configuration options for the Tempo query-frontend component.
15 Configuration options for the Tempo gateway component.
16 Limits ingestion and query rates.
17 Defines ingestion rate limits.
18 Defines query rate limits.
19 Configures operands to handle telemetry data.
20 Configures search capabilities.
21 Defines whether or not this CR is managed by the Operator. The default value is managed.

Query configuration options

Two components of the distributed tracing platform (Tempo), the querier and query frontend, manage queries. You can configure both of these components.

The querier component finds the requested trace ID in the ingesters or back-end storage. Depending on the set parameters, the querier component can query both the ingesters and pull bloom or indexes from the back end to search blocks in object storage. The querier component exposes an HTTP endpoint at GET /querier/api/traces/<trace_id>, but it is not expected to be used directly. Queries must be sent to the query frontend.

Table 1. Configuration parameters for the querier component
Parameter Description Values

nodeSelector

The simple form of the node-selection constraint.

type: object

replicas

The number of replicas to be created for the component.

type: integer; format: int32

tolerations

Component-specific pod tolerations.

type: array

The query frontend component is responsible for sharding the search space for an incoming query. The query frontend exposes traces via a simple HTTP endpoint: GET /api/traces/<trace_id>. Internally, the query frontend component splits the blockID space into a configurable number of shards and then queues these requests. The querier component connects to the query frontend component via a streaming gRPC connection to process these sharded queries.

Table 2. Configuration parameters for the query frontend component
Parameter Description Values

component

Configuration of the query frontend component.

type: object

component.nodeSelector

The simple form of the node selection constraint.

type: object

component.replicas

The number of replicas to be created for the query frontend component.

type: integer; format: int32

component.tolerations

Pod tolerations specific to the query frontend component.

type: array

jaegerQuery

The options specific to the Jaeger Query component.

type: object

jaegerQuery.enabled

When enabled, creates the Jaeger Query component,jaegerQuery.

type: boolean

jaegerQuery.ingress

The options for the Jaeger Query ingress.

type: object

jaegerQuery.ingress.annotations

The annotations of the ingress object.

type: object

jaegerQuery.ingress.host

The hostname of the ingress object.

type: string

jaegerQuery.ingress.ingressClassName

The name of an IngressClass cluster resource. Defines which ingress controller serves this ingress resource.

type: string

jaegerQuery.ingress.route

The options for the OpenShift route.

type: object

jaegerQuery.ingress.route.termination

The termination type. The default is edge.

type: string (enum: insecure, edge, passthrough, reencrypt)

jaegerQuery.ingress.type

The type of ingress for the Jaeger Query UI. The supported types are ingress, route, and none.

type: string (enum: ingress, route)

jaegerQuery.monitorTab

The monitor tab configuration.

type: object

jaegerQuery.monitorTab.enabled

Enables the monitor tab in the Jaeger console. The PrometheusEndpoint must be configured.

type: boolean

jaegerQuery.monitorTab.prometheusEndpoint

The endpoint to the Prometheus instance that contains the span rate, error, and duration (RED) metrics. For example, https://thanos-querier.openshift-monitoring.svc.cluster.local:9091.

type: string

Example configuration of the query frontend component in a TempoStack CR
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  storage:
    secret:
      name: minio
      type: s3
  storageSize: 200M
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          route:
            termination: edge
          type: route

Configuration of the monitor tab in Jaeger UI

Trace data contains rich information, and the data is normalized across instrumented languages and frameworks. Therefore, request rate, error, and duration (RED) metrics can be extracted from traces. The metrics can be visualized in Jaeger console in the Monitor tab.

The metrics are derived from spans in the OpenTelemetry Collector that are scraped from the Collector by the Prometheus deployed in the user-workload monitoring stack. The Jaeger UI queries these metrics from the Prometheus endpoint and visualizes them.

OpenTelemetry Collector configuration

The OpenTelemetry Collector requires configuration of the spanmetrics connector that derives metrics from traces and exports the metrics in the Prometheus format.

OpenTelemetry Collector custom resource for span RED
kind: OpenTelemetryCollector
apiVersion: opentelemetry.io/v1alpha1
metadata:
  name: otel
spec:
  mode: deployment
  observability:
    metrics:
      enableMetrics: true (1)
  config: |
    connectors:
      spanmetrics: (2)
        metrics_flush_interval: 15s

    receivers:
      otlp: (3)
        protocols:
          grpc:
          http:

    exporters:
      prometheus: (4)
        endpoint: 0.0.0.0:8889
        add_metric_suffixes: false
        resource_to_telemetry_conversion:
          enabled: true # by default resource attributes are dropped

      otlp:
        endpoint: "tempo-simplest-distributor:4317"
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp, spanmetrics] (5)
        metrics:
          receivers: [spanmetrics] (6)
          exporters: [prometheus]
1 Creates the ServiceMonitor custom resource to enable scraping of the Prometheus exporter.
2 The Spanmetrics connector receives traces and exports metrics.
3 The OTLP receiver to receive spans in the OpenTelemetry protocol.
4 The Prometheus exporter is used to export metrics in the Prometheus format.
5 The Spanmetrics connector is configured as exporter in traces pipeline.
6 The Spanmetrics connector is configured as receiver in metrics pipeline.

Tempo configuration

The TempoStack custom resource must specify the following: the Monitor tab is enabled, and the Prometheus endpoint is set to the Thanos querier service to query the data from the user-defined monitoring stack.

TempoStack custom resource with the enabled Monitor tab
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: redmetrics
spec:
  storage:
    secret:
      name: minio-test
      type: s3
  storageSize: 1Gi
  template:
    gateway:
      enabled: false
    queryFrontend:
      jaegerQuery:
        enabled: true
        monitorTab:
          enabled: true (1)
          prometheusEndpoint: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091 (2)
        ingress:
          type: route
1 Enables the monitoring tab in the Jaeger console.
2 The service name for Thanos Querier from user-workload monitoring.

Span RED metrics and alerting rules

The metrics generated by the spanmetrics connector are usable with alerting rules. For example, for alerts about a slow service or to define service level objectives (SLOs), the connector creates a duration_bucket histogram and the calls counter metric. These metrics have labels that identify the service, API name, operation type, and other attributes.

Table 3. Labels of the metrics created in the spanmetrics connector
Label Description Values

service_name

Service name set by the otel_service_name environment variable.

frontend

span_name

Name of the operation.

  • /

  • /customer

span_kind

Identifies the server, client, messaging, or internal operation.

  • SPAN_KIND_SERVER

  • SPAN_KIND_CLIENT

  • SPAN_KIND_PRODUCER

  • SPAN_KIND_CONSUMER

  • SPAN_KIND_INTERNAL

Example PrometheusRule CR that defines an alerting rule for SLO when not serving 95% of requests within 2000ms on the front-end service
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: span-red
spec:
  groups:
  - name: server-side-latency
    rules:
    - alert: SpanREDFrontendAPIRequestLatency
      expr: histogram_quantile(0.95, sum(rate(duration_bucket{service_name="frontend", span_kind="SPAN_KIND_SERVER"}[5m])) by (le, service_name, span_name)) > 2000 (1)
      labels:
        severity: Warning
      annotations:
        summary: "High request latency on {{$labels.service_name}} and {{$labels.span_name}}"
        description: "{{$labels.instance}} has 95th request latency above 2s (current value: {{$value}}s)"
1 The expression for checking if 95% of the front-end server response time values are below 2000 ms. The time range ([5m]) must be at least four times the scrape interval and long enough to accommodate a change in the metric.

Configuring the receiver TLS

The custom resource of your TempoStack or TempoMonolithic instance supports configuring the TLS for receivers by using user-provided certificates or OpenShift’s service serving certificates.

Receiver TLS configuration for a TempoStack instance

You can provide a TLS certificate in a secret or use the service serving certificates that are generated by OpenShift Container Platform.

  • To provide a TLS certificate in a secret, configure it in the TempoStack custom resource.

    This feature is not supported with the enabled Tempo Gateway.

    TLS for receivers and using a user-provided certificate in a secret
    apiVersion: tempo.grafana.com/v1alpha1
    kind:  TempoStack
    # ...
    spec:
    # ...
      template:
        distributor:
          tls:
            enabled: true (1)
            certName: <tls_secret> (2)
            caName: <ca_name> (3)
    # ...
    1 TLS enabled at the Tempo Distributor.
    2 Secret containing a tls.key key and tls.crt certificate that you apply in advance.
    3 Optional: CA in a config map to enable mutual TLS authentication (mTLS).
  • Alternatively, you can use the service serving certificates that are generated by OpenShift Container Platform.

    Mutual TLS authentication (mTLS) is not supported with this feature.

    TLS for receivers and using the service serving certificates that are generated by OpenShift Container Platform
    apiVersion: tempo.grafana.com/v1alpha1
    kind:  TempoStack
    # ...
    spec:
    # ...
      template:
        distributor:
          tls:
            enabled: true (1)
    # ...
    1 Sufficient configuration for the TLS at the Tempo Distributor.

Receiver TLS configuration for a TempoMonolithic instance

You can provide a TLS certificate in a secret or use the service serving certificates that are generated by OpenShift Container Platform.

  • To provide a TLS certificate in a secret, configure it in the TempoMonolithic custom resource.

    This feature is not supported with the enabled Tempo Gateway.

    TLS for receivers and using a user-provided certificate in a secret
    apiVersion: tempo.grafana.com/v1alpha1
    kind:  TempoMonolithic
    # ...
      spec:
    # ...
      ingestion:
        otlp:
          grpc:
            tls:
              enabled: true (1)
              certName: <tls_secret> (2)
              caName: <ca_name> (3)
    # ...
    1 TLS enabled at the Tempo Distributor.
    2 Secret containing a tls.key key and tls.crt certificate that you apply in advance.
    3 Optional: CA in a config map to enable mutual TLS authentication (mTLS).
  • Alternatively, you can use the service serving certificates that are generated by OpenShift Container Platform.

    Mutual TLS authentication (mTLS) is not supported with this feature.

    TLS for receivers and using the service serving certificates that are generated by OpenShift Container Platform
    apiVersion: tempo.grafana.com/v1alpha1
    kind:  TempoMonolithic
    # ...
      spec:
    # ...
      ingestion:
        otlp:
          grpc:
            tls:
              enabled: true
          http:
            tls:
              enabled: true (1)
    # ...
    1 Minimal configuration for the TLS at the Tempo Distributor.

Multitenancy

Multitenancy with authentication and authorization is provided in the Tempo Gateway service. The authentication uses OpenShift OAuth and the Kubernetes TokenReview API. The authorization uses the Kubernetes SubjectAccessReview API.

The Tempo Gateway service supports ingestion of traces only via the OTLP/gRPC. The OTLP/HTTP is not supported.

Sample Tempo CR with two tenants, dev and prod
apiVersion: tempo.grafana.com/v1alpha1
kind:  TempoStack
metadata:
  name: simplest
  namespace: chainsaw-multitenancy
spec:
  storage:
    secret:
      name: minio
      type: s3
  storageSize: 1Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  tenants:
    mode: openshift (1)
    authentication: (2)
      - tenantName: dev (3)
        tenantId: "1610b0c3-c509-4592-a256-a1871353dbfa" (4)
      - tenantName: prod
        tenantId: "1610b0c3-c509-4592-a256-a1871353dbfb"
  template:
    gateway:
      enabled: true (5)
    queryFrontend:
      jaegerQuery:
        enabled: true
1 Must be set to openshift.
2 The list of tenants.
3 The tenant name. Must be provided in the X-Scope-OrgId header when ingesting the data.
4 A unique tenant ID.
5 Enables a gateway that performs authentication and authorization. The Jaeger UI is exposed at http://<gateway-ingress>/api/traces/v1/<tenant-name>/search.

The authorization configuration uses the ClusterRole and ClusterRoleBinding of the Kubernetes Role-Based Access Control (RBAC). By default, no users have read or write permissions.

Sample of the read RBAC configuration that allows authenticated users to read the trace data of the dev and prod tenants
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tempostack-traces-reader
rules:
  - apiGroups:
      - 'tempo.grafana.com'
    resources: (1)
      - dev
      - prod
    resourceNames:
      - traces
    verbs:
      - 'get' (2)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tempostack-traces-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tempostack-traces-reader
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated (3)
1 Lists the tenants.
2 The get value enables the read operation.
3 Grants all authenticated users the read permissions for trace data.
Sample of the write RBAC configuration that allows the otel-collector service account to write the trace data for the dev tenant
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector (1)
  namespace: otel
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tempostack-traces-write
rules:
  - apiGroups:
      - 'tempo.grafana.com'
    resources: (2)
      - dev
    resourceNames:
      - traces
    verbs:
      - 'create' (3)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tempostack-traces
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tempostack-traces-write
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: otel
1 The service account name for the client to use when exporting trace data. The client must send the service account token, /var/run/secrets/kubernetes.io/serviceaccount/token, as the bearer token header.
2 Lists the tenants.
3 The create value enables the write operation.

Trace data can be sent to the Tempo instance from the OpenTelemetry Collector that uses the service account with RBAC for writing the data.

Sample OpenTelemetry CR configuration
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: cluster-collector
  namespace: tracing-system
spec:
  mode: deployment
  serviceAccount: otel-collector
  config: |
      extensions:
        bearertokenauth:
          filename: "/var/run/secrets/kubernetes.io/serviceaccount/token"
      exporters:
        otlp/dev: (1)
          endpoint: tempo-simplest-gateway.tempo.svc.cluster.local:8090
          tls:
            insecure: false
            ca_file: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
          auth:
            authenticator: bearertokenauth
          headers:
            X-Scope-OrgID: "dev"
        otlphttp/dev: (2)
          endpoint: https://tempo-simplest-gateway.chainsaw-multitenancy.svc.cluster.local:8080/api/traces/v1/dev
          tls:
            insecure: false
            ca_file: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
          auth:
            authenticator: bearertokenauth
          headers:
            X-Scope-OrgID: "dev"
      service:
        extensions: [bearertokenauth]
        pipelines:
          traces:
            exporters: [otlp/dev] (3)
1 OTLP gRPC Exporter.
2 OTLP HTTP Exporter.
3 You can specify otlp/dev for the OTLP gRPC Exporter or otlphttp/dev for the OTLP HTTP Exporter.

Using taints and tolerations

Configuring monitoring and alerts

The Tempo Operator supports monitoring and alerts about each TempoStack component such as distributor, ingester, and so on, and exposes upgrade and operational metrics about the Operator itself.

Configuring the TempoStack metrics and alerts

You can enable metrics and alerts of TempoStack instances.

Prerequisites
  • Monitoring for user-defined projects is enabled in the cluster.

Procedure
  1. To enable metrics of a TempoStack instance, set the spec.observability.metrics.createServiceMonitors field to true:

    apiVersion: tempo.grafana.com/v1alpha1
    kind: TempoStack
    metadata:
      name: <name>
    spec:
      observability:
        metrics:
          createServiceMonitors: true
  2. To enable alerts for a TempoStack instance, set the spec.observability.metrics.createPrometheusRules field to true:

    apiVersion: tempo.grafana.com/v1alpha1
    kind: TempoStack
    metadata:
      name: <name>
    spec:
      observability:
        metrics:
          createPrometheusRules: true
Verification

You can use the Administrator view of the web console to verify successful configuration:

  1. Go to ObserveTargets, filter for Source: User, and check that ServiceMonitors in the format tempo-<instance_name>-<component> have the Up status.

  2. To verify that alerts are set up correctly, go to ObserveAlertingAlerting rules, filter for Source: User, and check that the Alert rules for the TempoStack instance components are available.

Configuring the Tempo Operator metrics and alerts

When installing the Tempo Operator from the web console, you can select the Enable Operator recommended cluster monitoring on this Namespace checkbox, which enables creating metrics and alerts of the Tempo Operator.

If the checkbox was not selected during installation, you can manually enable metrics and alerts even after installing the Tempo Operator.

Procedure
  • Add the openshift.io/cluster-monitoring: "true" label in the project where the Tempo Operator is installed, which is openshift-tempo-operator by default.

Verification

You can use the Administrator view of the web console to verify successful configuration:

  1. Go to ObserveTargets, filter for Source: Platform, and search for tempo-operator, which must have the Up status.

  2. To verify that alerts are set up correctly, go to ObserveAlertingAlerting rules, filter for Source: Platform, and locate the Alert rules for the Tempo Operator.