Serverless administrator metrics

Metrics enable cluster administrators to monitor how OpenShift Serverless cluster components and workloads are performing.

You can view different metrics for OpenShift Serverless by navigating to Dashboards in the OpenShift Container Platform web console Administrator perspective.

Prerequisites

See the OpenShift Container Platform documentation on Managing metrics for information about enabling metrics for your cluster.
To view metrics for Knative components on OpenShift Container Platform, you need cluster administrator permissions and access to the web console Administrator perspective.

If Service Mesh is enabled with mTLS, metrics for Knative Serving are disabled by default because Service Mesh prevents Prometheus from scraping metrics.

For information about resolving this issue, see Enabling Knative Serving metrics when using Service Mesh with mTLS.

Scraping the metrics does not affect autoscaling of a Knative service, because scraping requests do not go through the activator. Consequently, no scraping takes place if no pods are running.

Controller metrics

The following metrics are emitted by any component that implements controller logic. These metrics show details about reconciliation operations and about the behavior of the work queue to which reconciliation requests are added.

Metric name	Description	Type	Tags	Unit
`work_queue_depth`	The depth of the work queue.	Gauge	`reconciler`	Integer (no units)
`reconcile_count`	The number of reconcile operations.	Counter	`reconciler`, `success`	Integer (no units)
`reconcile_latency`	The latency of reconcile operations.	Histogram	`reconciler`, `success`	Milliseconds
`workqueue_adds_total`	The total number of add actions handled by the work queue.	Counter	`name`	Integer (no units)
`workqueue_queue_latency_seconds`	The length of time an item stays in the work queue before being requested.	Histogram	`name`	Seconds
`workqueue_retries_total`	The total number of retries that have been handled by the work queue.	Counter	`name`	Integer (no units)
`workqueue_work_duration_seconds`	The length of time it takes to process an item from the work queue.	Histogram	`name`	Seconds
`workqueue_unfinished_work_seconds`	The length of time that outstanding work queue items have been in progress.	Histogram	`name`	Seconds
`workqueue_longest_running_processor_seconds`	The length of time that the longest outstanding work queue item has been in progress.	Histogram	`name`	Seconds
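
As an illustration of how the `reconciler` and `success` tags on `reconcile_count` can be combined, the following Python sketch computes a failure ratio for each reconciler. The sample layout, the reconciler name, and the assumption that the `success` tag carries the string `true` or `false` are hypothetical and are not taken from this documentation.

```python
from collections import defaultdict


def reconcile_failure_ratio(samples):
    """Return {reconciler: failed / total} from reconcile_count samples.

    Each sample is a (tags, value) pair, where tags is a dict holding the
    `reconciler` and `success` tags listed in the table.
    Assumption: the `success` tag is the string "true" or "false".
    """
    total = defaultdict(float)
    failed = defaultdict(float)
    for tags, value in samples:
        name = tags["reconciler"]
        total[name] += value
        if tags["success"] != "true":
            failed[name] += value
    # Skip reconcilers that have not reported any operations yet.
    return {name: failed[name] / total[name] for name in total if total[name] > 0}


# Hypothetical sample data for a single reconciler.
samples = [
    ({"reconciler": "example-controller", "success": "true"}, 90),
    ({"reconciler": "example-controller", "success": "false"}, 10),
]
print(reconcile_failure_ratio(samples))
```

A ratio that stays high for one reconciler, together with a growing `work_queue_depth`, might suggest that the controller is repeatedly failing on the same resources.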

Webhook metrics

Webhook metrics report useful information about operations. For example, if a large number of operations fail, this might indicate an issue with a user-created resource.

Metric name	Description	Type	Tags	Unit
`request_count`	The number of requests that are routed to the webhook.	Counter	`admission_allowed`, `kind_group`, `kind_kind`, `kind_version`, `request_operation`, `resource_group`, `resource_namespace`, `resource_resource`, `resource_version`	Integer (no units)
`request_latencies`	The response time for a webhook request.	Histogram	`admission_allowed`, `kind_group`, `kind_kind`, `kind_version`, `request_operation`, `resource_group`, `resource_namespace`, `resource_resource`, `resource_version`	Milliseconds

Knative Eventing metrics

Cluster administrators can view the following metrics for Knative Eventing components.

By aggregating the metrics by HTTP response code, events can be separated into two categories: successful events (2xx) and failed events (5xx).
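
The aggregation by HTTP response code can be sketched in Python as follows. This is a minimal illustration, assuming the per-code event counts have already been collected into a dictionary. The function names and the sample counts are hypothetical.

```python
from collections import Counter


def response_code_class(code):
    """Map an HTTP status code to its class label, for example 202 -> "2xx"."""
    return f"{code // 100}xx"


def split_events(counts_by_code):
    """Aggregate per-code event counts into successful (2xx) and failed (5xx)."""
    by_class = Counter()
    for code, count in counts_by_code.items():
        by_class[response_code_class(code)] += count
    # A Counter returns 0 for classes that were never observed.
    return {"successful": by_class["2xx"], "failed": by_class["5xx"]}


# Hypothetical counts keyed by response code.
print(split_events({200: 5, 202: 3, 500: 1, 503: 2}))
```

Codes outside the 2xx and 5xx classes are counted in neither category in this sketch.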

Broker ingress metrics

You can use the following metrics to debug the broker ingress, see how it is performing, and see which events are being dispatched by the ingress component.

Metric name	Description	Type	Tags	Unit
`event_count`	Number of events received by a broker.	Counter	`broker_name`, `event_type`, `namespace_name`, `response_code`, `response_code_class`, `unique_name`	Integer (no units)
`event_dispatch_latencies`	The time taken to dispatch an event to a channel.	Histogram	`broker_name`, `event_type`, `namespace_name`, `response_code`, `response_code_class`, `unique_name`	Milliseconds

Broker filter metrics

You can use the following metrics to debug broker filters, see how they are performing, and see which events are being dispatched by the filters. You can also measure the latency of the filtering action on an event.

Metric name	Description	Type	Tags	Unit
`event_count`	Number of events received by a broker.	Counter	`broker_name`, `container_name`, `filter_type`, `namespace_name`, `response_code`, `response_code_class`, `trigger_name`, `unique_name`	Integer (no units)
`event_dispatch_latencies`	The time taken to dispatch an event to a channel.	Histogram	`broker_name`, `container_name`, `filter_type`, `namespace_name`, `response_code`, `response_code_class`, `trigger_name`, `unique_name`	Milliseconds
`event_processing_latencies`	The time it takes to process an event before it is dispatched to a trigger subscriber.	Histogram	`broker_name`, `container_name`, `filter_type`, `namespace_name`, `trigger_name`, `unique_name`	Milliseconds

InMemoryChannel dispatcher metrics

You can use the following metrics to debug InMemoryChannel channels, see how they are performing, and see which events are being dispatched by the channels.

Metric name	Description	Type	Tags	Unit
`event_count`	Number of events dispatched by `InMemoryChannel` channels.	Counter	`broker_name`, `container_name`, `filter_type`, `namespace_name`, `response_code`, `response_code_class`, `trigger_name`, `unique_name`	Integer (no units)
`event_dispatch_latencies`	The time taken to dispatch an event from an `InMemoryChannel` channel.	Histogram	`broker_name`, `container_name`, `filter_type`, `namespace_name`, `response_code`, `response_code_class`, `trigger_name`, `unique_name`	Milliseconds

Event source metrics

You can use the following metrics to verify that events have been delivered from the event source to the connected event sink.

Metric name	Description	Type	Tags	Unit
`event_count`	Number of events sent by the event source.	Counter	`broker_name`, `container_name`, `filter_type`, `namespace_name`, `response_code`, `response_code_class`, `trigger_name`, `unique_name`	Integer (no units)
`retry_event_count`	Number of retried events sent by the event source after initially failing to be delivered.	Counter	`event_source`, `event_type`, `name`, `namespace_name`, `resource_group`, `response_code`, `response_code_class`, `response_error`, `response_timeout`	Integer (no units)

Knative Serving metrics

Cluster administrators can view the following metrics for Knative Serving components.

Activator metrics

You can use the following metrics to understand how applications respond when traffic passes through the activator.

Metric name	Description	Type	Tags	Unit
`request_concurrency`	The number of concurrent requests that are routed to the activator, or average concurrency over a reporting period.	Gauge	`configuration_name`, `container_name`, `namespace_name`, `pod_name`, `revision_name`, `service_name`	Integer (no units)
`request_count`	The number of requests that are routed to the activator. These are requests that have been fulfilled from the activator handler.	Counter	`configuration_name`, `container_name`, `namespace_name`, `pod_name`, `response_code`, `response_code_class`, `revision_name`, `service_name`	Integer (no units)
`request_latencies`	The response time in milliseconds for a fulfilled, routed request.	Histogram	`configuration_name`, `container_name`, `namespace_name`, `pod_name`, `response_code`, `response_code_class`, `revision_name`, `service_name`	Milliseconds
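
Histogram metrics such as `request_latencies` are typically read as percentiles rather than raw bucket counts. The following Python sketch estimates a quantile by linear interpolation inside the matching bucket, which is similar in spirit to how Prometheus-style histograms are usually queried. It assumes cumulative buckets given as `(upper_bound_ms, cumulative_count)` pairs ending with an infinite bound. The bucket boundaries and counts shown are hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound_ms, cumulative_count) pairs sorted by
    bound, where the last bound is math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations recorded yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The open-ended bucket cannot be interpolated.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets, in milliseconds.
buckets = [(10, 50), (50, 90), (100, 100), (math.inf, 100)]
print(estimate_quantile(0.5, buckets))
print(estimate_quantile(0.95, buckets))
```

The estimate is only as precise as the bucket layout allows, so treat the result as an approximation of the true latency percentile.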

Autoscaler metrics

The autoscaler component exposes a number of metrics related to autoscaler behavior for each revision. For example, at any given time, you can monitor the targeted number of pods the autoscaler tries to allocate for a service, the average number of requests per second during the stable window, or whether the autoscaler is in panic mode if you are using the Knative pod autoscaler (KPA).

Metric name	Description	Type	Tags	Unit
`desired_pods`	The number of pods the autoscaler tries to allocate for a service.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`excess_burst_capacity`	The excess burst capacity served over the stable window.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`stable_request_concurrency`	The average number of requests for each observed pod over the stable window.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`panic_request_concurrency`	The average number of requests for each observed pod over the panic window.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`target_concurrency_per_pod`	The number of concurrent requests that the autoscaler tries to send to each pod.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`stable_requests_per_second`	The average number of requests-per-second for each observed pod over the stable window.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`panic_requests_per_second`	The average number of requests-per-second for each observed pod over the panic window.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`target_requests_per_second`	The number of requests-per-second that the autoscaler targets for each pod.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`panic_mode`	This value is `1` if the autoscaler is in panic mode, or `0` if the autoscaler is not in panic mode.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`requested_pods`	The number of pods that the autoscaler has requested from the Kubernetes cluster.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`actual_pods`	The number of pods that are allocated and currently have a ready state.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`not_ready_pods`	The number of pods that have a not ready state.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`pending_pods`	The number of pods that are currently pending.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
`terminating_pods`	The number of pods that are currently terminating.	Gauge	`configuration_name`, `namespace_name`, `revision_name`, `service_name`	Integer (no units)
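
To illustrate how several of these gauges can be read together for one revision, the following Python sketch turns a snapshot of gauge values into a short list of findings. The function name, the snapshot layout, and the sample values are hypothetical. Only the metric names come from the table.

```python
def summarize_revision(gauges):
    """Return a list of findings for one revision's autoscaler gauges.

    gauges maps metric names from the table to their current values.
    Missing metrics are treated as 0.
    """
    findings = []
    if gauges.get("panic_mode", 0) == 1:
        findings.append("autoscaler is in panic mode")
    shortfall = gauges.get("desired_pods", 0) - gauges.get("actual_pods", 0)
    if shortfall > 0:
        findings.append(f"{shortfall} desired pod(s) not yet ready")
    pending = gauges.get("pending_pods", 0)
    if pending > 0:
        findings.append(f"{pending} pod(s) pending")
    return findings


# Hypothetical snapshot for a single revision.
snapshot = {"panic_mode": 1, "desired_pods": 5, "actual_pods": 3, "pending_pods": 2}
for finding in summarize_revision(snapshot):
    print(finding)
```

A persistent gap between `desired_pods` and `actual_pods` while `pending_pods` is nonzero might point to a scheduling or capacity problem rather than an autoscaler misconfiguration.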
