:_content-type: ASSEMBLY
[id="serverless-autoscaling-developer"]
= Autoscaling
include::modules/common-attributes.adoc[]
include::modules/serverless-document-attributes.adoc[]
:context: serverless-autoscaling-developer

toc::[]

Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale-to-zero is enabled, Knative Serving scales the application down to zero replicas. If scale-to-zero is disabled, the application is scaled down to the minimum number of replicas configured for applications on the cluster. Replicas can also be scaled up to meet demand if traffic to the application increases.

ifdef::openshift-enterprise[]
Autoscaling settings for Knative services can be global settings that are configured by cluster administrators, or per-revision settings that are configured for individual services.
endif::[]

ifdef::openshift-dedicated[]
Autoscaling settings for Knative services can be global settings that are configured by cluster or dedicated administrators, or per-revision settings that are configured for individual services.
endif::[]

You can modify per-revision settings for your services by using the {product-title} web console, by modifying the YAML file for your service, or by using the `kn` CLI.

[NOTE]
====
Any limits or targets that you set for a service are measured against a single instance of your application. For example, setting the `target` annotation to `50` configures the autoscaler to scale the application so that each replica handles approximately 50 requests at a time.
====
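
The `target` annotation described in the preceding note is set on the revision template of a service. The following is a minimal sketch, assuming the upstream Knative annotation key `autoscaling.knative.dev/target`. The service name and image location are placeholders, not values taken from this documentation.

.Example service spec with a concurrency target (illustrative)
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Per-instance target: the autoscaler aims for about 50
        # in-flight requests for each running instance.
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
        - image: registry.example.com/example/app:latest # placeholder image
----

Because the annotation is part of the revision template, changing its value creates a new revision of the service.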

[id="serverless-autoscaling-developer-scale-bounds"]
== Scale bounds

Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time.

You can set scale bounds for an application to help prevent cold starts or control computing costs.
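
As an illustration, both bounds can be declared together as annotations on the revision template. This sketch assumes the upstream Knative annotation keys `autoscaling.knative.dev/min-scale` and `autoscaling.knative.dev/max-scale`. The service name, image location, and the values `1` and `10` are placeholders chosen for the example.

.Example service spec with scale bounds (illustrative)
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Keep at least one replica running to avoid cold starts.
        autoscaling.knative.dev/min-scale: "1"
        # Never run more than ten replicas, to cap computing costs.
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: registry.example.com/example/app:latest # placeholder image
----

A minimum of `1` trades some idle resource usage for faster responses after quiet periods, while the maximum limits how far the service can scale out under load.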

// minscale docs
include::modules/serverless-autoscaling-developer-minscale.adoc[leveloffset=+2]
include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+3]

// maxscale docs
include::modules/serverless-autoscaling-developer-maxscale.adoc[leveloffset=+2]
include::modules/serverless-autoscaling-maxscale-kn.adoc[leveloffset=+3]

[id="serverless-autoscaling-developer-concurrency"]
== Concurrency

Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
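
The following sketch shows where concurrency settings typically live in a service spec. It assumes the upstream Knative `containerConcurrency` field for a hard limit and the `autoscaling.knative.dev/target-utilization-percentage` annotation for target utilization. The service name, image location, and numeric values are placeholders.

.Example service spec with concurrency settings (illustrative)
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Start scaling up when replicas reach about 70% of the limit.
        autoscaling.knative.dev/target-utilization-percentage: "70"
    spec:
      # Hard limit: a replica is never sent more than 50 requests at once.
      containerConcurrency: 50
      containers:
        - image: registry.example.com/example/app:latest # placeholder image
----

In this sketch the hard limit is enforced for each replica, and the utilization percentage controls how close to that limit replicas can get before the autoscaler adds more.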

include::modules/serverless-concurrency-limits.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
include::modules/serverless-target-utilization.adoc[leveloffset=+2]

[id="additional-resources_serverless-autoscaling-developer"]
[role="_additional-resources"]
== Additional resources

ifdef::openshift-enterprise[]
* Scale-to-zero can be enabled or disabled for the cluster by cluster administrators. For more information, see xref:../../serverless/admin_guide/serverless-admin-autoscaling.adoc#serverless-enable-scale-to-zero_serverless-admin-autoscaling[Enabling scale-to-zero].
endif::[]

ifdef::openshift-dedicated[]
* Scale-to-zero can be enabled or disabled for the cluster by cluster or dedicated administrators. For more information, see xref:../../serverless/admin_guide/serverless-admin-autoscaling.adoc#serverless-enable-scale-to-zero_serverless-admin-autoscaling[Enabling scale-to-zero].
endif::[]