Performance and scalability

The default ServiceMeshControlPlane settings are not intended for production use; they are designed to install successfully on a default OpenShift Container Platform installation, which is a resource-limited environment. After you have verified a successful SMCP installation, you should modify the settings defined within the SMCP to suit your environment.

By default, spec.proxy has the settings cpu: 10m and memory: 128M. If you are using Pilot, spec.runtime.components.pilot has the same default values.
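
For reference, the following is a minimal sketch of where these defaults appear in the resource, using the same field structure as the version 2.6 example later in this procedure. The values shown are the defaults stated above; confirm them against your installed SMCP before changing them.

  Example default proxy and Pilot resource requests (sketch)
  apiVersion: maistra.io/v2
  kind: ServiceMeshControlPlane
  metadata:
    name: basic
    namespace: istio-system
  spec:
    proxy:
      runtime:
        container:
          resources:
            requests:
              cpu: 10m      # default sidecar proxy CPU request
              memory: 128M  # default sidecar proxy memory request
    runtime:
      components:
        pilot:
          container:
            resources:
              requests:
                cpu: 10m      # Pilot uses the same default values
                memory: 128M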

The settings in the following example are based on 1,000 services and 1,000 requests per second. You can change the values for cpu and memory in the ServiceMeshControlPlane.

Procedure
  1. In the OpenShift Container Platform web console, click Operators → Installed Operators.

  2. Click the Project menu and select the project where you installed the Service Mesh control plane, for example istio-system.

  3. Click the Red Hat OpenShift Service Mesh Operator. In the Istio Service Mesh Control Plane column, click the name of your ServiceMeshControlPlane, for example basic.

  4. Modify the ServiceMeshControlPlane resource to set the tuned values for your environment.

    1. Click the YAML tab.

    2. Set the values for spec.proxy.runtime.container.resources.requests.cpu, spec.proxy.runtime.container.resources.requests.memory, spec.runtime.components.kiali.container, and spec.runtime.components.global.oauthproxy in your ServiceMeshControlPlane resource.

      Example version 2.6 ServiceMeshControlPlane
      apiVersion: maistra.io/v2
      kind: ServiceMeshControlPlane
      metadata:
        name: basic
        namespace: istio-system
      spec:
        version: v2.6
        proxy:
          runtime:
            container:
              resources:
                requests:
                  cpu: 600m
                  memory: 50Mi
                limits: {}
        runtime:
          components:
            pilot:
              container:
                resources:
                  requests:
                    cpu: 1000m
                    memory: 1.6Gi
                  limits: {}
            kiali:
              container:
                resources:
                  limits:
                    cpu: "90m"
                    memory: "245Mi"
                  requests:
                    cpu: "30m"
                    memory: "108Mi"
            global.oauthproxy:
              container:
                resources:
                  requests:
                    cpu: "101m"
                    memory: "256Mi"
                  limits:
                    cpu: "201m"
                    memory: "512Mi"
    3. To set values for Red Hat OpenShift distributed tracing platform (Jaeger), see "Configuring and deploying the distributed tracing platform Jaeger".

    4. Click Save.

Verification
  • Click Reload to verify that the ServiceMeshControlPlane resource was configured correctly. You can also check the applied values from the CLI, as shown below.
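
If you prefer the command line, you can inspect the applied values directly. This sketch assumes the smcp short name for the ServiceMeshControlPlane resource is registered in your cluster and that the control plane is installed in istio-system:

  # List control planes and their readiness in the control plane project
  oc get smcp -n istio-system

  # Print the full resource, including the proxy and runtime settings you applied
  oc get smcp basic -n istio-system -o yaml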

Load test results

The upstream Istio community load test mesh consists of 1,000 services and 2,000 sidecars, with 70,000 mesh-wide requests per second. Running the tests with Istio 1.12.3 generated the following results:

  • The Envoy proxy uses 0.35 vCPU and 40 MB memory per 1000 requests per second going through the proxy.

  • Istiod uses 1 vCPU and 1.5 GB of memory.

  • The Envoy proxy adds 2.65 ms to the 90th percentile latency.

  • The legacy istio-telemetry service (disabled by default in Service Mesh 2.0) uses 0.6 vCPU per 1000 mesh-wide requests per second for deployments that use Mixer.

The data plane components, the Envoy proxies, handle data flowing through the system. The Service Mesh control plane component, Istiod, configures the data plane. The data plane and the control plane have distinct performance concerns.

Service Mesh control plane performance

Istiod configures sidecar proxies based on user-authored configuration files and the current state of the system. In a Kubernetes environment, Custom Resource Definitions (CRDs) and deployments constitute the configuration and state of the system. Istio configuration objects, such as gateways and virtual services, provide the user-authored configuration. To produce the configuration for the proxies, Istiod processes the combined configuration and system state from the Kubernetes environment and the user-authored configuration.

The Service Mesh control plane supports thousands of services, spread across thousands of pods, with a similar number of user-authored virtual services and other configuration objects. Istiod’s CPU and memory requirements scale with the number of configurations and possible system states. CPU consumption scales with the following factors:

  • The rate of deployment changes.

  • The rate of configuration changes.

  • The number of proxies connecting to Istiod.

However, this part is inherently horizontally scalable.
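
One way to exploit that horizontal scalability is to run more than one Istiod (Pilot) replica from the ServiceMeshControlPlane. The sketch below assumes that the deployment.replicas and deployment.autoScaling fields are available under spec.runtime.components.pilot in SMCP v2; check the ServiceMeshControlPlane reference for your version before applying it.

  Example Istiod scale-out (sketch)
  apiVersion: maistra.io/v2
  kind: ServiceMeshControlPlane
  metadata:
    name: basic
    namespace: istio-system
  spec:
    version: v2.6
    runtime:
      components:
        pilot:
          deployment:
            replicas: 2                # run two Istiod instances (assumed field)
            autoScaling:
              enabled: true            # add replicas under CPU pressure (assumed field)
              minReplicas: 2
              maxReplicas: 5
              targetCPUUtilizationPercentage: 85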

Data plane performance

Data plane performance depends on many factors, for example:

  • Number of client connections

  • Target request rate

  • Request size and response size

  • Number of proxy worker threads

  • Protocol

  • CPU cores

  • Number and types of proxy filters, specifically telemetry v2-related filters

The latency, throughput, and the proxies' CPU and memory consumption are measured as a function of these factors.
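
Several of these factors can be tuned per workload. For example, upstream Istio lets you override the number of proxy worker threads with the proxy.istio.io/config pod annotation, which carries ProxyConfig overrides such as concurrency. This is an upstream mechanism and the Deployment below is hypothetical, so verify that your Service Mesh version honors the annotation before relying on it.

  Example worker-thread override (sketch)
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-app          # hypothetical workload
    namespace: bookinfo        # hypothetical data plane namespace
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: example-app
    template:
      metadata:
        labels:
          app: example-app
        annotations:
          sidecar.istio.io/inject: "true"   # opt this pod in to sidecar injection
          # Pin the sidecar to two Envoy worker threads instead of deriving the
          # count from the CPU limit (upstream Istio ProxyConfig override).
          proxy.istio.io/config: |
            concurrency: 2
      spec:
        containers:
        - name: example-app
          image: quay.io/example/example-app:latest   # hypothetical image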

CPU and memory consumption

Since the sidecar proxy performs additional work on the data path, it consumes CPU and memory. As of Istio 1.12.3, a proxy consumes about 0.5 vCPU per 1000 requests per second.

The memory consumption of the proxy depends on the total configuration state the proxy holds. A large number of listeners, clusters, and routes can increase memory usage.

Since the proxy normally does not buffer the data passing through, request rate does not affect the memory consumption.
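
As a rough sizing aid, you can give the sidecar of a high-traffic pod more resources with the upstream sidecar.istio.io resource annotations. The sketch below simply applies the roughly 0.5 vCPU per 1,000 requests per second figure to a pod expected to serve about 1,000 requests per second; the annotations are an upstream Istio mechanism and the workload names are hypothetical, so verify the behavior in your Service Mesh version.

  Example per-pod proxy sizing (sketch)
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-frontend     # hypothetical workload
    namespace: bookinfo        # hypothetical data plane namespace
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: example-frontend
    template:
      metadata:
        labels:
          app: example-frontend
        annotations:
          sidecar.istio.io/inject: "true"     # opt this pod in to sidecar injection
          # ~0.5 vCPU per 1,000 requests per second through the proxy, plus
          # memory headroom that grows with the size of the proxy configuration.
          sidecar.istio.io/proxyCPU: "500m"
          sidecar.istio.io/proxyMemory: "256Mi"
      spec:
        containers:
        - name: example-frontend
          image: quay.io/example/example-frontend:latest   # hypothetical image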

Additional latency

Since Istio injects a sidecar proxy on the data path, latency is an important consideration. Istio adds an authentication filter, a telemetry filter, and a metadata exchange filter to the proxy. Every additional filter adds to the path length inside the proxy and affects latency.

The Envoy proxy collects raw telemetry data after a response is sent to the client. The time spent collecting raw telemetry for a request does not contribute to the total time taken to complete that request. However, because the worker is busy handling the request, the worker does not start handling the next request immediately. This process adds to the queue wait time of the next request and affects average and tail latencies. The actual tail latency depends on the traffic pattern.

Inside the mesh, a request traverses the client-side proxy and then the server-side proxy. In the default configuration of Istio 1.12.3 (that is, Istio with telemetry v2), the two proxies add about 1.7 ms and 2.7 ms to the 90th and 99th percentile latency, respectively, over the baseline data plane latency.