openshift-docs/modules/serverless-target-utilization.adoc

// Module included in the following assemblies:
//
// * /serverless/autoscaling/serverless-autoscaling-concurrency.adoc

[id="serverless-target-utilization_{context}"]
= Concurrency target utilization

This value specifies the percentage of the concurrency limit that is actually targeted by the autoscaler. This is also known as specifying the _hotness_ at which a replica runs, which enables the autoscaler to scale up before the defined hard limit is reached.

For example, if the `containerConcurrency` annotation value is set to 10, and the `targetUtilizationPercentage` value is set to 70 percent, the autoscaler creates a new replica when the average number of concurrent requests across all existing replicas reaches 7. Requests numbered 7 to 10 are still sent to the existing replicas, but additional replicas are started in anticipation of being required after the `containerConcurrency` annotation limit is reached.

.Example service configured using the targetUtilizationPercentage annotation
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/targetUtilizationPercentage: "70"
...
----