openshift-docs/modules/nodes-cluster-worker-latency-profiles-about.adoc

// Module included in the following assemblies:
//
// scalability_and_performance/scaling-worker-latency-profiles.adoc

:_mod-docs-content-type: CONCEPT
[id="nodes-cluster-worker-latency-profiles-about_{context}"]
= Understanding worker latency profiles

[role="_abstract"]
Review the following information to learn about worker latency profiles, which allow you to control the reaction of the cluster to latency issues without needing to determine the best values by using manual methods.

Worker latency profiles are four different categories of carefully-tuned parameters. The four parameters which implement these values are `node-status-update-frequency`, `node-monitor-grace-period`, `default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds`.

[IMPORTANT]
====
Setting these parameters manually is not supported. Incorrect parameter settings adversely affect cluster stability.
====

All worker latency profiles configure the following parameters:

node-status-update-frequency:: Specifies how often the kubelet posts node status to the API server.
node-monitor-grace-period::  Specifies the amount of time in seconds that the Kubernetes Controller Manager waits for an update from a kubelet before marking the node unhealthy and adding the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint to the node.
default-not-ready-toleration-seconds:: Specifies the amount of time in seconds after marking a node unhealthy that the Kube API Server Operator waits before evicting pods from that node.
default-unreachable-toleration-seconds:: Specifies the amount of time in seconds after marking a node unreachable that the Kube API Server Operator waits before evicting pods from that node.

The following Operators monitor the changes to the worker latency profiles and respond accordingly:

* The Machine Config Operator (MCO) updates the `node-status-update-frequency` parameter on the worker nodes.
* The Kubernetes Controller Manager updates the `node-monitor-grace-period` parameter on the control plane nodes.
* The Kubernetes API Server Operator updates the `default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds` parameters on the control plane nodes.

ifndef::openshift-rosa,openshift-dedicated[]
Although the default configuration works in most cases, {product-title} offers two other worker latency profiles for situations where the network is experiencing higher latency than usual. The three worker latency profiles are described in the following sections:
endif::openshift-rosa,openshift-dedicated[]
ifdef::openshift-rosa,openshift-dedicated[]
Although the default configuration works in most cases, {product-title} offers a second worker latency profile for situations where the network is experiencing higher latency than usual. The two worker latency profiles are described in the following sections:
endif::openshift-rosa,openshift-dedicated[]

Default worker latency profile:: With the `Default` profile, each `Kubelet` updates its status every 10 seconds (`node-status-update-frequency`). The `Kube Controller Manager` checks the statuses of `Kubelet` every 5 seconds.
+
The Kubernetes Controller Manager waits 40 seconds (`node-monitor-grace-period`) for a status update from `Kubelet` before considering the `Kubelet` unhealthy. If no status is made available to the Kubernetes Controller Manager, it then marks the node with the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint and evicts the pods on that node.
+
If a pod is on a node that has the `NoExecute` taint, the pod runs according to `tolerationSeconds`. If the node has no taint, it will be evicted in 300 seconds (`default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds` settings of the `Kube API Server`).
+
[cols="2,1,2,1"]
|===
| Profile | Component | Parameter | Value

.4+| Default
| kubelet
| `node-status-update-frequency`
| 10s

| Kubelet Controller Manager
| `node-monitor-grace-period`
| 40s

| Kubernetes API Server Operator
| `default-not-ready-toleration-seconds`
| 300s

| Kubernetes API Server Operator
| `default-unreachable-toleration-seconds`
| 300s

|===

Medium worker latency profile:: Use the `MediumUpdateAverageReaction` profile if the network latency is slightly higher than usual.
+
The `MediumUpdateAverageReaction` profile reduces the frequency of kubelet updates to 20 seconds and changes the period that the Kubernetes Controller Manager waits for those updates to 2 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the `tolerationSeconds` parameter, the eviction waits for the period specified by that parameter.
+
The Kubernetes Controller Manager waits for 2 minutes to consider a node unhealthy. In another minute, the eviction process starts.
+
[cols="2,1,2,1"]
|===
| Profile | Component | Parameter | Value

.4+| MediumUpdateAverageReaction
| kubelet
| `node-status-update-frequency`
| 20s

| Kubelet Controller Manager
| `node-monitor-grace-period`
| 2m

| Kubernetes API Server Operator
| `default-not-ready-toleration-seconds`
| 60s

| Kubernetes API Server Operator
| `default-unreachable-toleration-seconds`
| 60s

|===

ifndef::openshift-rosa,openshift-dedicated[]

Low worker latency profile:: Use the `LowUpdateSlowReaction` profile if the network latency is extremely high.
+
The `LowUpdateSlowReaction` profile reduces the frequency of kubelet updates to 1 minute and changes the period that the Kubernetes Controller Manager waits for those updates to 5 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the `tolerationSeconds` parameter, the eviction waits for the period specified by that parameter.
+
The Kubernetes Controller Manager waits for 5 minutes to consider a node unhealthy. In another minute, the eviction process starts.
+
[cols="2,1,2,1"]
|===
| Profile | Component | Parameter | Value

.4+| LowUpdateSlowReaction
| kubelet
| `node-status-update-frequency`
| 1m

| Kubelet Controller Manager
| `node-monitor-grace-period`
| 5m

| Kubernetes API Server Operator
| `default-not-ready-toleration-seconds`
| 60s

| Kubernetes API Server Operator
| `default-unreachable-toleration-seconds`
| 60s

|===
endif::openshift-rosa,openshift-dedicated[]

[NOTE]
====
The latency profiles do not support custom machine config pools, only the default worker machine config pools.
====