OpenShift Serverless provides capabilities for automatic pod scaling, including scaling inactive pods to zero, by enabling the Knative Serving autoscaling system in an OpenShift Container Platform cluster.
To enable autoscaling for Knative Serving, you must configure concurrency and scale bounds in the revision template.
Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the target annotation to 50 will configure the autoscaler to scale the application so that each instance of it will handle 50 requests at a time.
You can specify the number of concurrent requests that should be handled by each instance of an application (revision container) by adding the target annotation or the containerConcurrency spec in the revision template.
target annotation used in a revision template
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
      - image: myimage
containerConcurrency spec used in a revision template
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
    spec:
      containerConcurrency: 100
      containers:
      - image: myimage
If you set both target and containerConcurrency, the autoscaler aims for the target number of concurrent requests and enforces the containerConcurrency value as a hard limit.
For example, if the target value is 50 and the containerConcurrency value is 100, the targeted number of requests will be 50, but the hard limit will be 100.
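As a rough illustration of the case just described, the two settings might be combined in a single revision template as sketched here. The sketch reuses the myapp and myimage placeholders from the examples above, and the annotation value is quoted because Kubernetes annotation values must be strings.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp            # placeholder name, as in the examples above
spec:
  template:
    metadata:
      annotations:
        # Soft target: the autoscaler aims for 50 concurrent requests per instance
        autoscaling.knative.dev/target: "50"
    spec:
      # Hard limit: no instance is sent more than 100 concurrent requests
      containerConcurrency: 100
      containers:
      - image: myimage   # placeholder image
```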
If the containerConcurrency value is less than the target value, the target value will be tuned down, since there is no need to target more requests than the number that can actually be handled.
The default target for the number of concurrent requests is 100, but you can override this value by adding or modifying the autoscaling.knative.dev/target annotation value in the revision template.
Here is an example of how this annotation is used in the revision template to set the target to 50.
autoscaling.knative.dev/target: "50"
containerConcurrency sets a hard limit on the number of concurrent requests handled.
containerConcurrency: 0 | 1 | 2-N
0: allows unlimited concurrent requests.
1: guarantees that only one request is handled at a time by a given instance of the revision container.
2 or more: limits request concurrency to that value.
If there is no |
The minScale and maxScale annotations can be used to configure the minimum and maximum number of pods that can serve applications.
These annotations can be used to prevent cold starts or to help control computing costs.
If the minScale annotation is not set, pods will scale to zero (or to 1 if enable-scale-to-zero is false per the configmap).
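The text does not name the configmap that holds enable-scale-to-zero. Assuming it is the config-autoscaler ConfigMap in the knative-serving namespace, as in upstream Knative Serving, disabling scale-to-zero might look roughly like the sketch below. On operator-managed installations this setting is typically applied through the KnativeServing custom resource instead, so treat the manifest as illustrative only.

```yaml
# Assumption: the configmap in question is upstream Knative's config-autoscaler.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # With this set to "false", revisions without minScale keep at least 1 pod
  enable-scale-to-zero: "false"
```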
If the maxScale annotation is not set, there will be no upper limit for the number of pods created.
The minScale and maxScale annotations can be configured as follows in the revision template:
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "10"
Using these annotations in the revision template will propagate this configuration to PodAutoscaler objects.
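For context, a complete Service manifest that carries the scale bounds together with a concurrency target could look something like the following sketch. The myapp and myimage names are the placeholders used in the earlier examples, and the specific values are illustrative rather than recommended.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp            # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Keep at least 2 pods running to avoid cold starts
        autoscaling.knative.dev/minScale: "2"
        # Never create more than 10 pods, to help control computing costs
        autoscaling.knative.dev/maxScale: "10"
        # Aim for 50 concurrent requests per pod between those bounds
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
      - image: myimage   # placeholder image
```

All three annotation values are quoted because annotation values must be strings.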
These annotations apply for the full lifetime of a revision. Even when a revision is not referenced by any route, the minimal pod count specified by the |