This is a cache of https://docs.okd.io/4.13/operators/operator_sdk/osdk-monitoring-prometheus.html. It is a snapshot of the page at 2024-11-27T00:32:01.676+0000.
Configuring built-in monitoring with Prometheus - Developing Operators | Operators | OKD 4.13
×

This guide describes the built-in monitoring support provided by the Operator SDK using the Prometheus Operator and details usage for authors of Go-based and Ansible-based Operators.

Prometheus Operator support

Prometheus is an open-source systems monitoring and alerting toolkit. The Prometheus Operator creates, configures, and manages Prometheus clusters running on Kubernetes-based clusters, such as OKD.

Helper functions exist in the Operator SDK by default to automatically set up metrics in any generated Go-based Operator for use on clusters where the Prometheus Operator is deployed.

Exposing custom metrics for Go-based Operators

As an Operator author, you can publish custom metrics by using the global Prometheus registry from the controller-runtime/pkg/metrics library.

Prerequisites
  • Go-based Operator generated using the Operator SDK

  • Prometheus Operator, which is deployed by default on OKD clusters

Procedure
  1. In your Operator SDK project, uncomment the following line in the config/default/kustomization.yaml file:

    ../prometheus
  2. Create a custom controller class to publish additional metrics from the Operator. The following example declares the widgets and widgetFailures collectors as global variables, and then registers them with the init() function in the controller’s package:

    controllers/memcached_controller_test_metrics.go file
    package controllers
    
    import (
    	"github.com/prometheus/client_golang/prometheus"
    	"sigs.k8s.io/controller-runtime/pkg/metrics"
    )
    
    
    var (
        widgets = prometheus.NewCounter(
            prometheus.CounterOpts{
                Name: "widgets_total",
                Help: "Number of widgets processed",
            },
        )
        widgetFailures = prometheus.NewCounter(
            prometheus.CounterOpts{
                Name: "widget_failures_total",
                Help: "Number of failed widgets",
            },
        )
    )
    
    func init() {
        // Register custom metrics with the global prometheus registry
        metrics.Registry.MustRegister(widgets, widgetFailures)
    }
  3. Record to these collectors from any part of the reconcile loop in the main controller class, which determines the business logic for the metric:

    controllers/memcached_controller.go file
    func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	...
    	...
    	// Add metrics
    	widgets.Inc()
    	widgetFailures.Inc()
    
    	return ctrl.Result{}, nil
    }
  4. Build and push the Operator:

    $ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
  5. Deploy the Operator:

    $ make deploy IMG=<registry>/<user>/<image_name>:<tag>
  6. Create role and role binding definitions to allow the service monitor of the Operator to be scraped by the Prometheus instance of the OKD cluster.

    Roles must be assigned so that service accounts have the permissions to scrape the metrics of the namespace:

    config/prometheus/role.yaml role
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-k8s-role
      namespace: memcached-operator-system
    rules:
      - apiGroups:
          - ""
        resources:
          - endpoints
          - pods
          - services
          - nodes
          - secrets
        verbs:
          - get
          - list
          - watch
    config/prometheus/rolebinding.yaml role binding
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus-k8s-rolebinding
      namespace: memcached-operator-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-k8s-role
    subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring
  7. Apply the roles and role bindings for the deployed Operator:

    $ oc apply -f config/prometheus/role.yaml
    $ oc apply -f config/prometheus/rolebinding.yaml
  8. Set the labels for the namespace that you want to scrape, which enables OpenShift cluster monitoring for that namespace:

    $ oc label namespace <operator_namespace> openshift.io/cluster-monitoring="true"
Verification
  • Query and view the metrics in the OKD web console. You can use the names that were set in the custom controller class, for example widgets_total and widget_failures_total.

Exposing custom metrics for Ansible-based Operators

As an Operator author creating Ansible-based Operators, you can use the Operator SDK’s osdk_metrics module to expose custom Operator and Operand metrics, emit events, and support logging.

Prerequisites
  • Ansible-based Operator generated using the Operator SDK

  • Prometheus Operator, which is deployed by default on OKD clusters

Procedure
  1. Generate an Ansible-based Operator. This example uses a testmetrics.com domain:

    $ operator-sdk init \
        --plugins=ansible \
        --domain=testmetrics.com
  2. Create a metrics API. This example uses a kind named Testmetrics:

    $ operator-sdk create api \
        --group metrics \
        --version v1 \
        --kind Testmetrics \
        --generate-role
  3. Edit the roles/testmetrics/tasks/main.yml file and use the osdk_metrics module to create custom metrics for your Operator project:

    Example roles/testmetrics/tasks/main.yml file
    ---
    # tasks file for Memcached
    - name: start k8sstatus
      k8s:
        definition:
          kind: Deployment
          apiVersion: apps/v1
          metadata:
            name: '{{ ansible_operator_meta.name }}-memcached'
            namespace: '{{ ansible_operator_meta.namespace }}'
          spec:
            replicas: "{{size}}"
            selector:
              matchLabels:
                app: memcached
            template:
              metadata:
                labels:
                  app: memcached
              spec:
                containers:
                - name: memcached
                  command:
                  - memcached
                  - -m=64
                  - -o
                  - modern
                  - -v
                  image: "docker.io/memcached:1.4.36-alpine"
                  ports:
                    - containerPort: 11211
    
    - osdk_metric:
        name: my_thing_counter
        description: This metric counts things
        counter: {}
    
    - osdk_metric:
        name: my_counter_metric
        description: Add 3.14 to the counter
        counter:
          increment: yes
    
    - osdk_metric:
        name: my_gauge_metric
        description: Create my gauge and set it to 2.
        gauge:
          set: 2
    
    - osdk_metric:
        name: my_histogram_metric
        description: Observe my histogram
        histogram:
          observe: 2
    
    - osdk_metric:
        name: my_summary_metric
        description: Observe my summary
        summary:
          observe: 2
Verification
  1. Run your Operator on a cluster. For example, to use the "run as a deployment" method:

    1. Build the Operator image and push it to a registry:

      $ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
    2. Install the Operator on a cluster:

      $ make install
    3. Deploy the Operator:

      $ make deploy IMG=<registry>/<user>/<image_name>:<tag>
  2. Create a Testmetrics custom resource (CR):

    1. Define the CR spec:

      Example config/samples/metrics_v1_testmetrics.yaml file
      apiVersion: metrics.testmetrics.com/v1
      kind: Testmetrics
      metadata:
        name: testmetrics-sample
      spec:
        size: 1
    2. Create the object:

      $ oc create -f config/samples/metrics_v1_testmetrics.yaml
  3. Get the pod details:

    $ oc get pods
    Example output
    NAME                                    READY   STATUS    RESTARTS   AGE
    ansiblemetrics-controller-manager-<id>  2/2     Running   0          149m
    testmetrics-sample-memcached-<id>       1/1     Running   0          147m
  4. Get the endpoint details:

    $ oc get ep
    Example output
    NAME                                                ENDPOINTS          AGE
    ansiblemetrics-controller-manager-metrics-service   10.129.2.70:8443   150m
  5. Request a custom metrics token:

    $ token=`oc create token prometheus-k8s -n openshift-monitoring`
  6. Check the metrics values:

    1. Check the my_counter_metric value:

      $ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authoriza
      tion: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep  my_counter
      Example output
      HELP my_counter_metric Add 3.14 to the counter
      TYPE my_counter_metric counter
      my_counter_metric 2
    2. Check the my_gauge_metric value:

      $ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authoriza
      tion: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep  gauge
      Example output
      HELP my_gauge_metric Create my gauge and set it to 2.
    3. Check the my_histogram_metric and my_summary_metric values:

      $ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authoriza
      tion: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep  Observe
      Example output
      HELP my_histogram_metric Observe my histogram
      HELP my_summary_metric Observe my summary