This is a cache of https://docs.openshift.com/container-platform/4.17/service_mesh/v2x/ossm-observability.html. It is a snapshot of the page at 2024-11-26T08:32:51.396+0000.
Metrics, logs, and traces - <strong>service</strong> Mesh 2.x | <strong>service</strong> Mesh | OpenShift Container Platform 4.17
×

Discovering console addresses

Red Hat OpenShift service Mesh provides the following consoles to view your service mesh data:

  • Kiali console - Kiali is the management console for Red Hat OpenShift service Mesh.

  • Jaeger console - Jaeger is the management console for Red Hat OpenShift distributed tracing platform.

  • Grafana console - Grafana provides mesh administrators with advanced query and metrics analysis and dashboards for Istio data. Optionally, Grafana can be used to analyze service mesh metrics.

  • Prometheus console - Red Hat OpenShift service Mesh uses Prometheus to store telemetry information from services.

When you install the service Mesh control plane, it automatically generates routes for each of the installed components. Once you have the route address, you can access the Kiali, Jaeger, Prometheus, or Grafana console to view and manage your service mesh data.

Prerequisite
  • The component must be enabled and installed. For example, if you did not install distributed tracing, you will not be able to access the Jaeger console.

Procedure from OpenShift console
  1. Log in to the OpenShift Container Platform web console as a user with cluster-admin rights. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.

  2. Navigate to NetworkingRoutes.

  3. On the Routes page, select the service Mesh control plane project, for example istio-system, from the Namespace menu.

    The Location column displays the linked address for each route.

  4. If necessary, use the filter to find the component console whose route you want to access. Click the route Location to launch the console.

  5. Click Log In With OpenShift.

Procedure from the CLI
  1. Log in to the OpenShift Container Platform CLI as a user with the cluster-admin role. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.

    $ oc login --username=<NAMEOFUSER> https://<HOSTNAME>:6443
  2. Switch to the service Mesh control plane project. In this example, istio-system is the service Mesh control plane project. Run the following command:

    $ oc project istio-system
  3. To get the routes for the various Red Hat OpenShift service Mesh consoles, run the folowing command:

    $ oc get routes

    This command returns the URLs for the Kiali, Jaeger, Prometheus, and Grafana web consoles, and any other routes in your service mesh. You should see output similar to the following:

    NAME                    HOST/PORT                         serviceS              PORT    TERMINATION
    bookinfo-gateway        bookinfo-gateway-yourcompany.com  istio-ingressgateway          http2
    grafana                 grafana-yourcompany.com           grafana               <all>   reencrypt/Redirect
    istio-ingressgateway    istio-ingress-yourcompany.com     istio-ingressgateway  8080
    jaeger                  jaeger-yourcompany.com            jaeger-query          <all>   reencrypt
    kiali                   kiali-yourcompany.com             kiali                 20001   reencrypt/Redirect
    prometheus              prometheus-yourcompany.com        prometheus            <all>   reencrypt/Redirect
  4. Copy the URL for the console you want to access from the HOST/PORT column into a browser to open the console.

  5. Click Log In With OpenShift.

Accessing the Kiali console

You can view your application’s topology, health, and metrics in the Kiali console. If your service is experiencing problems, the Kiali console lets you view the data flow through your service. You can view insights about the mesh components at different levels, including abstract applications, services, and workloads. Kiali also provides an interactive graph view of your namespace in real time.

To access the Kiali console you must have Red Hat OpenShift service Mesh installed, Kiali installed and configured.

The installation process creates a route to access the Kiali console.

If you know the URL for the Kiali console, you can access it directly. If you do not know the URL, use the following directions.

Procedure for administrators
  1. Log in to the OpenShift Container Platform web console with an administrator role.

  2. Click HomeProjects.

  3. On the Projects page, if necessary, use the filter to find the name of your project.

  4. Click the name of your project, for example, bookinfo.

  5. On the Project details page, in the Launcher section, click the Kiali link.

  6. Log in to the Kiali console with the same user name and password that you use to access the OpenShift Container Platform console.

    When you first log in to the Kiali Console, you see the Overview page which displays all the namespaces in your service mesh that you have permission to view.

    If you are validating the console installation and namespaces have not yet been added to the mesh, there might not be any data to display other than istio-system.

Procedure for developers
  1. Log in to the OpenShift Container Platform web console with a developer role.

  2. Click Project.

  3. On the Project Details page, if necessary, use the filter to find the name of your project.

  4. Click the name of your project, for example, bookinfo.

  5. On the Project page, in the Launcher section, click the Kiali link.

  6. Click Log In With OpenShift.

Viewing service mesh data in the Kiali console

The Kiali Graph offers a powerful visualization of your mesh traffic. The topology combines real-time request traffic with your Istio configuration information to present immediate insight into the behavior of your service mesh, letting you quickly pinpoint issues. Multiple Graph Types let you visualize traffic as a high-level service topology, a low-level workload topology, or as an application-level topology.

There are several graphs to choose from:

  • The App graph shows an aggregate workload for all applications that are labeled the same.

  • The service graph shows a node for each service in your mesh but excludes all applications and workloads from the graph. It provides a high level view and aggregates all traffic for defined services.

  • The Versioned App graph shows a node for each version of an application. All versions of an application are grouped together.

  • The Workload graph shows a node for each workload in your service mesh. This graph does not require you to use the application and version labels. If your application does not use version labels, use this the graph.

Graph nodes are decorated with a variety of information, pointing out various route routing options like virtual services and service entries, as well as special configuration like fault-injection and circuit breakers. It can identify mTLS issues, latency issues, error traffic and more. The Graph is highly configurable, can show traffic animation, and has powerful Find and Hide abilities.

Click the Legend button to view information about the shapes, colors, arrows, and badges displayed in the graph.

To view a summary of metrics, select any node or edge in the graph to display its metric details in the summary details panel.

Changing graph layouts in Kiali

The layout for the Kiali graph can render differently depending on your application architecture and the data to display. For example, the number of graph nodes and their interactions can determine how the Kiali graph is rendered. Because it is not possible to create a single layout that renders nicely for every situation, Kiali offers a choice of several different layouts.

Prerequisites
  • If you do not have your own application installed, install the Bookinfo sample application. Then generate traffic for the Bookinfo application by entering the following command several times.

    $ curl "http://$GATEWAY_URL/productpage"

    This command simulates a user visiting the productpage microservice of the application.

Procedure
  1. Launch the Kiali console.

  2. Click Log In With OpenShift.

  3. In Kiali console, click Graph to view a namespace graph.

  4. From the Namespace menu, select your application namespace, for example, bookinfo.

  5. To choose a different graph layout, do either or both of the following:

    • Select different graph data groupings from the menu at the top of the graph.

      • App graph

      • service graph

      • Versioned App graph (default)

      • Workload graph

    • Select a different graph layout from the Legend at the bottom of the graph.

      • Layout default dagre

      • Layout 1 cose-bilkent

      • Layout 2 cola

Viewing logs in the Kiali console

You can view logs for your workloads in the Kiali console. The Workload Detail page includes a Logs tab which displays a unified logs view that displays both application and proxy logs. You can select how often you want the log display in Kiali to be refreshed.

To change the logging level on the logs displayed in Kiali, you change the logging configuration for the workload or the proxy.

Prerequisites
  • service Mesh installed and configured.

  • Kiali installed and configured.

  • The address for the Kiali console.

  • Application or Bookinfo sample application added to the mesh.

Procedure
  1. Launch the Kiali console.

  2. Click Log In With OpenShift.

    The Kiali Overview page displays namespaces that have been added to the mesh that you have permissions to view.

  3. Click Workloads.

  4. On the Workloads page, select the project from the Namespace menu.

  5. If necessary, use the filter to find the workload whose logs you want to view. Click the workload Name. For example, click ratings-v1.

  6. On the Workload Details page, click the Logs tab to view the logs for the workload.

If you do not see any log entries, you may need to adjust either the Time Range or the Refresh interval.

Viewing metrics in the Kiali console

You can view inbound and outbound metrics for your applications, workloads, and services in the Kiali console. The Detail pages include the following tabs:

  • inbound Application metrics

  • outbound Application metrics

  • inbound Workload metrics

  • outbound Workload metrics

  • inbound service metrics

These tabs display predefined metrics dashboards, tailored to the relevant application, workload or service level. The application and workload detail views show request and response metrics such as volume, duration, size, or TCP traffic. The service detail view shows request and response metrics for inbound traffic only.

Kiali lets you customize the charts by choosing the charted dimensions. Kiali can also present metrics reported by either source or destination proxy metrics. And for troubleshooting, Kiali can overlay trace spans on the metrics.

Prerequisites
  • service Mesh installed and configured.

  • Kiali installed and configured.

  • The address for the Kiali console.

  • (Optional) Distributed tracing installed and configured.

Procedure
  1. Launch the Kiali console.

  2. Click Log In With OpenShift.

    The Kiali Overview page displays namespaces that have been added to the mesh that you have permissions to view.

  3. Click either Applications, Workloads, or services.

  4. On the Applications, Workloads, or services page, select the project from the Namespace menu.

  5. If necessary, use the filter to find the application, workload, or service whose logs you want to view. Click the Name.

  6. On the Application Detail, Workload Details, or service Details page, click either the Inbound Metrics or Outbound Metrics tab to view the metrics.

Distributed tracing

Distributed tracing is the process of tracking the performance of individual services in an application by tracing the path of the service calls in the application. Each time a user takes action in an application, a request is executed that might require many services to interact to produce a response. The path of this request is called a distributed transaction.

Red Hat OpenShift service Mesh uses Red Hat OpenShift distributed tracing platform to allow developers to view call flows in a microservice application.

Configuring the Red Hat OpenShift distributed tracing platform (Tempo) and the Red Hat build of OpenTelemetry

You can expose tracing data to the Red Hat OpenShift distributed tracing platform (Tempo) by appending a named element and the opentelemetry provider to the spec.meshConfig.extensionProviders specification in the serviceMeshControlPlane. Then, a telemetry custom resource configures Istio proxies to collect trace spans and send them to the OpenTelemetry Collector endpoint.

You can create a Red Hat build of OpenTelemetry instance in a mesh namespace and configure it to send tracing data to a tracing platform backend service.

Prerequisites
  • You created a TempoStack instance using the Red Hat Tempo Operator in the tracing-system namespace. For more information, see "Installing Red Hat OpenShift distributed tracing platform (Tempo)".

  • You installed the Red Hat build of OpenTelemetry Operator in either the recommended namespace or the openshift-operators namespace. For more information, see "Installing the Red Hat build of OpenTelemetry".

  • If using Red Hat OpenShift service Mesh 2.5 or earlier, set the spec.tracing.type parameter of the serviceMeshControlPlane resource to None so tracing data can be sent to the OpenTelemetry Collector.

Procedure
  1. Create an OpenTelemetry Collector instance in a mesh namespace. This example uses the bookinfo namespace:

    Example OpenTelemetry Collector configuration
    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    metadata:
      name: otel
      namespace: bookinfo  (1)
    spec:
      mode: deployment
      config: |
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: 0.0.0.0:4317
        exporters:
          otlp:
            endpoint: "tempo-sample-distributor.tracing-system.svc.cluster.local:4317" (2)
            tls:
              insecure: true
        service:
          pipelines:
            traces:
              receivers: [otlp]
              processors: []
              exporters: [otlp]
    1 Include the namespace in the serviceMeshMemberRoll member list.
    2 In this example, a TempoStack instance is running in the tracing-system namespace. You do not have to include the TempoStack namespace, such as`tracing-system`, in the serviceMeshMemberRoll member list.
    • Create a single instance of the OpenTelemetry Collector in one of the serviceMeshMemberRoll member namespaces.

    • You can add an otel-collector as a part of the mesh by adding sidecar.istio.io/inject: 'true' to the OpenTelemetryCollector resource.

  2. Check the otel-collector pod log and verify that the pod is running:

    Example otel-collector pod log check
    $ oc logs -n bookinfo  -l app.kubernetes.io/name=otel-collector
  3. Create or update an existing serviceMeshControlPlane custom resource (CR) in the istio-system namespace:

    Example SMCP custom resource
    kind: serviceMeshControlPlane
    apiVersion: maistra.io/v2
    metadata:
      name: basic
      namespace: istio-system
    spec:
      addons:
        grafana:
          enabled: false
        kiali:
          enabled: true
        prometheus:
          enabled: true
      meshConfig:
        extensionProviders:
          - name: otel
            opentelemetry:
              port: 4317
              service: otel-collector.bookinfo.svc.cluster.local
      policy:
        type: Istiod
      telemetry:
        type: Istiod
      version: v2.6

    When upgrading from SMCP 2.5 to 2.6, set the spec.tracing.type parameter to None:

    Example SMCP spec.tracing.type parameter
    spec:
      tracing:
        type: None
  4. Create a Telemetry resource in the istio-system namespace:

    Example Telemetry resource
    apiVersion: telemetry.istio.io/v1alpha1
    kind: Telemetry
    metadata:
      name: mesh-default
      namespace: istio-system
    spec:
      tracing:
      - providers:
        - name: otel
        randomSamplingPercentage: 100
  5. Verify the istiod log.

  6. Configure the Kiali resource specification to enable a Kiali workload traces dashboard. You can use the dashboard to view tracing query results.

    Example Kiali resource
    apiVersion: kiali.io/v1alpha1
    kind: Kiali
    # ...
    spec:
      external_services:
        tracing:
          query_timeout: 30 (1)
          enabled: true
          in_cluster_url: 'http://tempo-sample-query-frontend.tracing-system.svc.cluster.local:16685'
          url: '[Tempo query frontend Route url]'
          use_grpc: true (2)
    1 The default query_timeout integer value is 30 seconds. If you set the value to greater than 30 seconds, you must update .spec.server.write_timeout in the Kiali CR and add the annotation haproxy.router.openshift.io/timeout=50s to the Kiali route. Both .spec.server.write_timeout and haproxy.router.openshift.io/timeout= must be greater than query_timeout.
    2 If you are not using the default HTTP or gRPC port, replace the in_cluster_url: port with your custom port.

    Kiali 1.73 uses the Jaeger Query API, which causes a longer response time depending on Tempo resource limits. If you see a Could not fetch spans error message in the Kiali UI, then check your Tempo configuration or reduce the limit per query in Kiali.

  7. Send requests to your application.

  8. Verify the istiod pod logs and the otel-collector pod logs.

Configuring the OpenTelemetryCollector in a mTLS encrypted service Mesh member namespace

All traffic is TLS encrypted when you enable service Mesh dataPlane mTLS encryption.

To enable the mesh to communicate with the OpenTelemetryCollector service, disable the TLS trafficPolicy by applying a DestinationRule for the OpenTelemetryCollector service:

Example DestinationRule Tempo CR
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: otel-disable-tls
spec:
  host: "otel-collector.bookinfo.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE

Configuring the Red Hat OpenShift distributed tracing platform (Tempo) in a mTLS encrypted service Mesh member namespace

You don’t need this additional DestinationRule configuration if you created a TempoStack instance in a namespace that is not a service Mesh member namespace.

All traffic is TLS encrypted when you enable service Mesh dataPlane mTLS encryption and you create a TempoStack instance in a service Mesh member namespace such as tracing-system-mtls. This encryption is not expected from the Tempo distributed service and returns a TLS error.

To fix the TLS error, disable the TLS trafficPolicy by applying a DestinationRule for Tempo and Kiali:

Example DestinationRule Tempo
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: tempo
  namespace: tracing-system-mtls
spec:
  host: "*.tracing-system-mtls.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
Example DestinationRule Kiali
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: kiali
  namespace: istio-system
spec:
  host: kiali.istio-system.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE

Connecting an existing distributed tracing Jaeger instance

If you already have an existing Red Hat OpenShift distributed tracing platform (Jaeger) instance in OpenShift Container Platform, you can configure your serviceMeshControlPlane resource to use that instance for distributed tracing platform.

Starting with Red Hat OpenShift service Mesh 2.5, Red Hat OpenShift distributed tracing platform (Jaeger) and OpenShift Elasticsearch Operator are deprecated and will be removed in a future release. Red Hat will provide bug fixes and support for these features during the current release lifecycle, but these features will no longer receive enhancements and will be removed. As an alternative to Red Hat OpenShift distributed tracing platform (Jaeger), you can use Red Hat OpenShift distributed tracing platform (Tempo) instead.

Prerequisites
  • Red Hat OpenShift distributed tracing platform instance installed and configured.

Procedure
  1. In the OpenShift Container Platform web console, click OperatorsInstalled Operators.

  2. Click the Project menu and select the project where you installed the service Mesh control plane, for example istio-system.

  3. Click the Red Hat OpenShift service Mesh Operator. In the Istio service Mesh Control Plane column, click the name of your serviceMeshControlPlane resource, for example basic.

  4. Add the name of your distributed tracing platform (Jaeger) instance to the serviceMeshControlPlane.

    1. Click the YAML tab.

    2. Add the name of your distributed tracing platform (Jaeger) instance to spec.addons.jaeger.name in your serviceMeshControlPlane resource. In the following example, distr-tracing-production is the name of the distributed tracing platform (Jaeger) instance.

      Example distributed tracing configuration
      spec:
        addons:
          jaeger:
            name: distr-tracing-production
    3. Click Save.

  5. Click Reload to verify the serviceMeshControlPlane resource was configured correctly.

Adjusting the sampling rate

A trace is an execution path between services in the service mesh. A trace is comprised of one or more spans. A span is a logical unit of work that has a name, start time, and duration. The sampling rate determines how often a trace is persisted.

The Envoy proxy sampling rate is set to sample 100% of traces in your service mesh by default. A high sampling rate consumes cluster resources and performance but is useful when debugging issues. Before you deploy Red Hat OpenShift service Mesh in production, set the value to a smaller proportion of traces. For example, set spec.tracing.sampling to 100 to sample 1% of traces.

Configure the Envoy proxy sampling rate as a scaled integer representing 0.01% increments.

In a basic installation, spec.tracing.sampling is set to 10000, which samples 100% of traces. For example:

  • Setting the value to 10 samples 0.1% of traces.

  • Setting the value to 500 samples 5% of traces.

The Envoy proxy sampling rate applies for applications that are available to a service Mesh, and use the Envoy proxy. This sampling rate determines how much data the Envoy proxy collects and tracks.

The Jaeger remote sampling rate applies to applications that are external to the service Mesh, and do not use the Envoy proxy, such as a database. This sampling rate determines how much data the distributed tracing system collects and stores. For more information, see Distributed tracing configuration options.

Procedure
  1. In the OpenShift Container Platform web console, click OperatorsInstalled Operators.

  2. Click the Project menu and select the project where you installed the control plane, for example istio-system.

  3. Click the Red Hat OpenShift service Mesh Operator. In the Istio service Mesh Control Plane column, click the name of your serviceMeshControlPlane resource, for example basic.

  4. To adjust the sampling rate, set a different value for spec.tracing.sampling.

    1. Click the YAML tab.

    2. Set the value for spec.tracing.sampling in your serviceMeshControlPlane resource. In the following example, set it to 100.

      Jaeger sampling example
      spec:
        tracing:
          sampling: 100
    3. Click Save.

  5. Click Reload to verify the serviceMeshControlPlane resource was configured correctly.

Accessing the Jaeger console

To access the Jaeger console you must have Red Hat OpenShift service Mesh installed, Red Hat OpenShift distributed tracing platform (Jaeger) installed and configured.

The installation process creates a route to access the Jaeger console.

If you know the URL for the Jaeger console, you can access it directly. If you do not know the URL, use the following directions.

Starting with Red Hat OpenShift service Mesh 2.5, Red Hat OpenShift distributed tracing platform (Jaeger) and OpenShift Elasticsearch Operator have been deprecated and will be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to Red Hat OpenShift distributed tracing platform (Jaeger), you can use Red Hat OpenShift distributed tracing platform (Tempo) instead.

Procedure from OpenShift console
  1. Log in to the OpenShift Container Platform web console as a user with cluster-admin rights. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.

  2. Navigate to NetworkingRoutes.

  3. On the Routes page, select the service Mesh control plane project, for example istio-system, from the Namespace menu.

    The Location column displays the linked address for each route.

  4. If necessary, use the filter to find the jaeger route. Click the route Location to launch the console.

  5. Click Log In With OpenShift.

Procedure from Kiali console
  1. Launch the Kiali console.

  2. Click Distributed Tracing in the left navigation pane.

  3. Click Log In With OpenShift.

Procedure from the CLI
  1. Log in to the OpenShift Container Platform CLI as a user with the cluster-admin role. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.

    $ oc login --username=<NAMEOFUSER> https://<HOSTNAME>:6443
  2. To query for details of the route using the command line, enter the following command. In this example, istio-system is the service Mesh control plane namespace.

    $ oc get route -n istio-system jaeger -o jsonpath='{.spec.host}'
  3. Launch a browser and navigate to https://<JAEGER_URL>, where <JAEGER_URL> is the route that you discovered in the previous step.

  4. Log in using the same user name and password that you use to access the OpenShift Container Platform console.

  5. If you have added services to the service mesh and have generated traces, you can use the filters and Find Traces button to search your trace data.

    If you are validating the console installation, there is no trace data to display.

For more information about configuring Jaeger, see the distributed tracing documentation.

Accessing the Grafana console

Grafana is an analytics tool you can use to view, query, and analyze your service mesh metrics. In this example, istio-system is the service Mesh control plane namespace. To access Grafana, do the following:

Procedure
  1. Log in to the OpenShift Container Platform web console.

  2. Click the Project menu and select the project where you installed the service Mesh control plane, for example istio-system.

  3. Click Routes.

  4. Click the link in the Location column for the Grafana row.

  5. Log in to the Grafana console with your OpenShift Container Platform credentials.

Accessing the Prometheus console

Prometheus is a monitoring and alerting tool that you can use to collect multi-dimensional data about your microservices. In this example, istio-system is the service Mesh control plane namespace.

Procedure
  1. Log in to the OpenShift Container Platform web console.

  2. Click the Project menu and select the project where you installed the service Mesh control plane, for example istio-system.

  3. Click Routes.

  4. Click the link in the Location column for the Prometheus row.

  5. Log in to the Prometheus console with your OpenShift Container Platform credentials.

Integrating with user-workload monitoring

By default, Red Hat OpenShift service Mesh (OSSM) installs the service Mesh control plane (SMCP) with a dedicated instance of Prometheus for collecting metrics from a mesh. However, production systems need more advanced monitoring systems, like OpenShift Container Platform monitoring for user-defined projects.

The following steps show how to integrate service Mesh with user-workload monitoring.

Prerequisites
  • User-workload monitoring is enabled.

  • Red Hat OpenShift service Mesh Operator 2.4 is installed.

  • Kiali Operator 1.65 is installed.

Procedure
  1. Grant the cluster-monitoring-view role to the Kiali service Account:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kiali-monitoring-rbac
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-monitoring-view
    subjects:
    - kind: serviceAccount
      name: kiali-service-account
      namespace: istio-system
  2. Configure Kiali for user-workload monitoring:

    apiVersion: kiali.io/v1alpha1
    kind: Kiali
    metadata:
      name: kiali-user-workload-monitoring
      namespace: istio-system
    spec:
      external_services:
        prometheus:
          auth:
            type: bearer
            use_kiali_token: true
          query_scope:
            mesh_id: "basic-istio-system"
          thanos_proxy:
            enabled: true
          url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
    • If you use Istio Operator 2.4, use this configuration to configure Kiali for user-workload monitoring:

      apiVersion: kiali.io/v1alpha1
      kind: Kiali
      metadata:
        name: kiali-user-workload-monitoring
        namespace: istio-system
      spec:
        external_services:
          istio:
            config_map_name: istio-<smcp-name>
            istio_sidecar_injector_config_map_name: istio-sidecar-injector-<smcp-name>
            istiod_deployment_name: istiod-<smcp-name>
            url_service_version: 'http://istiod-<smcp-name>.istio-system:15014/version'
          prometheus:
            auth:
              token: secret:thanos-querier-web-token:token
              type: bearer
              use_kiali_token: false
            query_scope:
              mesh_id: "basic-istio-system"
            thanos_proxy:
              enabled: true
            url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
        version: v1.65
  3. Configure the SMCP for external Prometheus:

    apiVersion: maistra.io/v2
    kind: serviceMeshControlPlane
    metadata:
      name: basic
      namespace: istio-system
    spec:
      addons:
        prometheus:
          enabled: false (1)
        grafana:
          enabled: false (2)
        kiali:
          name: kiali-user-workload-monitoring
      meshConfig:
        extensionProviders:
        - name: prometheus
          prometheus: {}
    1 Disable the default Prometheus instance provided by OSSM.
    2 Disable Grafana. It is not supported with an external Prometheus instance.
  4. Apply a custom network policy to allow ingress traffic from the monitoring namespace:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: user-workload-access
      namespace: istio-system (1)
    spec:
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: monitoring
      podSelector: {}
      policyTypes:
      - Ingress
    1 The custom network policy must be applied to all namespaces.
  5. Apply a Telemetry object to enable traffic metrics in Istio proxies:

    apiVersion: telemetry.istio.io/v1alpha1
    kind: Telemetry
    metadata:
      name: enable-prometheus-metrics
      namespace: istio-system (1)
    spec:
      selector: (2)
        matchLabels:
          app: bookinfo
      metrics:
      - providers:
        - name: prometheus
    1 A Telemetry object created in the control plane namespace applies to all workloads in a mesh. To apply telemetry to only one namespace, create the object in the target namespace.
    2 Optional: Setting the selector.matchLabels spec applies the Telemetry object to specific workloads in the target namespace.
  6. Apply a serviceMonitor object to monitor the Istio control plane:

    apiVersion: monitoring.coreos.com/v1
    kind: serviceMonitor
    metadata:
      name: istiod-monitor
      namespace: istio-system (1)
    spec:
      targetLabels:
      - app
      selector:
        matchLabels:
          istio: pilot
      endpoints:
      - port: http-monitoring
        interval: 30s
        relabelings:
        - action: replace
          replacement: "basic-istio-system" (2)
          targetLabel: mesh_id
    1 Create this serviceMonitor object in the Istio control plane namespace because it monitors the Istiod service. In this example, the namespace is istio-system.
    2 The string "basic-istio-system" is a combination of the SMCP name and its namespace, but any label can be used as long as it is unique for every mesh using user workload monitoring in the cluster. The spec.prometheus.query_scope of the Kiali resource configured in Step 2 needs to match this value.

    If there is only one mesh using user-workload monitoring, then both the mesh_id relabeling and the spec.prometheus.query_scope field in the Kiali resource are optional (but the query_scope field given here should be removed if the mesh_id label is removed).

    If multiple mesh instances on the cluster might use user-workload monitoring, then both the mesh_id relabelings and the spec.prometheus.query_scope field in the Kiali resource are required. This ensures that Kiali only sees metrics from its associated mesh.

    If you are not deploying Kiali, you can still apply mesh_id relabeling so that metrics from different meshes can be distinguished from one another.

  7. Apply a PodMonitor object to collect metrics from Istio proxies:

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: istio-proxies-monitor
      namespace: istio-system (1)
    spec:
      selector:
        matchExpressions:
        - key: istio-prometheus-ignore
          operator: DoesNotExist
      podMetricsEndpoints:
      - path: /stats/prometheus
        interval: 30s
        relabelings:
        - action: keep
          sourceLabels: [__meta_kubernetes_pod_container_name]
          regex: "istio-proxy"
        - action: keep
          sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
        - action: replace
          regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
          replacement: '[$2]:$1'
          sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port,
          __meta_kubernetes_pod_ip]
          targetLabel: __address__
        - action: replace
          regex: (\d+);((([0-9]+?)(\.|$)){4})
          replacement: $2:$1
          sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_port,
          __meta_kubernetes_pod_ip]
          targetLabel: __address__
        - action: labeldrop
          regex: "__meta_kubernetes_pod_label_(.+)"
        - sourceLabels: [__meta_kubernetes_namespace]
          action: replace
          targetLabel: namespace
        - sourceLabels: [__meta_kubernetes_pod_name]
          action: replace
          targetLabel: pod_name
        - action: replace
          replacement: "basic-istio-system" (2)
          targetLabel: mesh_id
    1 Since OpenShift Container Platform monitoring ignores the namespaceSelector spec in serviceMonitor and PodMonitor objects, you must apply the PodMonitor object in all mesh namespaces, including the control plane namespace.
    2 The string "basic-istio-system" is a combination of the SMCP name and its namespace, but any label can be used as long as it is unique for every mesh using user workload monitoring in the cluster. The spec.prometheus.query_scope of the Kiali resource configured in Step 2 needs to match this value.

    If there is only one mesh using user-workload monitoring, then both the mesh_id relabeling and the spec.prometheus.query_scope field in the Kiali resource are optional (but the query_scope field given here should be removed if the mesh_id label is removed).

    If multiple mesh instances on the cluster might use user-workload monitoring, then both the mesh_id relabelings and the spec.prometheus.query_scope field in the Kiali resource are required. This ensures that Kiali only sees metrics from its associated mesh.

    If you are not deploying Kiali, you can still apply mesh_id relabeling so that metrics from different meshes can be distinguished from one another.

  8. Open the OpenShift Container Platform web console, and check that metrics are visible.