openshift-docs/serverless/develop/serverless-autoscaling-developer.adoc

:_content-type: ASSEMBLY
[id="serverless-autoscaling-developer"]
= Autoscaling
include::modules/common-attributes.adoc[]
include::modules/serverless-document-attributes.adoc[]
:context: serverless-autoscaling-developer

toc::[]

Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale-to-zero is enabled, Knative Serving scales the application down to zero replicas. If scale-to-zero is disabled, the application is scaled down to the xref:../../serverless/develop/serverless-autoscaling-developer.adoc#serverless-autoscaling-developer-minscale[minimum number of replicas specified for applications on the cluster]. Replicas can also be scaled up to meet demand if traffic to the application increases.

If Knative autoscaling is enabled for your cluster, you can configure concurrency and scale bounds for your application.

[NOTE]
====
Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the `target` annotation to `50` configures the autoscaler to scale the application so that each revision handles 50 requests at a time.
====

[id="serverless-autoscaling-developer-scale-bounds"]
== Scale bounds

Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time.

You can set scale bounds for an application to help prevent cold starts or control computing costs.

[id="serverless-autoscaling-developer-minscale"]
=== Minimum scale bounds

The minimum number of replicas that can serve an application is determined by the `minScale` annotation.

The `minScale` value defaults to `0` replicas if the following conditions are met:

* The `minScale` annotation is not set
* Scaling to zero is enabled
* The class `KPA` is used

If scale to zero is not enabled, the `minScale` value defaults to `1`.

// TODO: Document KPA if supported, link to docs about setting class

// TO DO:
// Add info / links about enabling and disabling autoscaling (admin docs)
// if `enable-scale-to-zero` is set to `false` in the `config-autoscaler` config map.

.Example service spec with `minScale` spec
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
...
----

include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+3]

[id="serverless-autoscaling-developer-maxscale"]
=== Maximum scale bounds

The maximum number of replicas that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of replicas created.

.Example service spec with `maxScale` spec
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "10"
...
----

include::modules/serverless-autoscaling-maxscale-kn.adoc[leveloffset=+3]

[id="serverless-autoscaling-developer-concurrency"]
== Concurrency

Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.

include::modules/serverless-concurrency-limits.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
include::modules/serverless-target-utilization.adoc[leveloffset=+2]

[id="additional-resources_serverless-autoscaling-developer"]
== Additional resources

* Scale-to-zero can be enabled or disabled for the cluster by cluster administrators. For more information, see xref:../../serverless/admin_guide/serverless-admin-autoscaling.adoc#serverless-enable-scale-to-zero_serverless-admin-autoscaling[Enabling scale-to-zero].