From 460bf21e706bfa2d55282108ce67cd0525cd5eae Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Wed, 10 Feb 2021 12:53:52 -0500 Subject: [PATCH] fix: remove kube-reserved Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1853249. * Expanded scope: Remove dangling comma in cpu-manager so that rougify doesn't highlight the boo-boo in pink. * Review by Michael. * Remove para that mentions kubelet command-line args. --- modules/nodes-nodes-managing-about.adoc | 7 +- ...des-nodes-resources-configuring-about.adoc | 65 ++++--------------- ...s-nodes-resources-configuring-setting.adoc | 10 +-- modules/setting-up-cpu-manager.adoc | 4 +- .../nodes-nodes-resources-configuring.adoc | 13 +--- 5 files changed, 24 insertions(+), 75 deletions(-) diff --git a/modules/nodes-nodes-managing-about.adoc b/modules/nodes-nodes-managing-about.adoc index a8d81b23da..09704210f0 100644 --- a/modules/nodes-nodes-managing-about.adoc +++ b/modules/nodes-nodes-managing-about.adoc @@ -58,11 +58,8 @@ spec: podsPerCore: 10 maxPods: 250 systemReserved: - cpu: 1000m - memory: 500Mi - kubeReserved: - cpu: 1000m - memory: 500Mi + cpu: 2000m + memory: 1Gi ---- <1> Assign a name to CR. <2> Specify the label to apply the configuration change, this is the label you added to the machine config pool. diff --git a/modules/nodes-nodes-resources-configuring-about.adoc b/modules/nodes-nodes-resources-configuring-about.adoc index fd74ac1363..f77f42ae4a 100644 --- a/modules/nodes-nodes-resources-configuring-about.adoc +++ b/modules/nodes-nodes-resources-configuring-about.adoc @@ -9,14 +9,13 @@ CPU and memory resources reserved for node components in {product-title} are bas [options="header",cols="1,2"] |=== - |Setting |Description |`kube-reserved` -| Resources reserved for node components. Default is none. +| This setting is not used with {product-title}. Add the CPU and memory resources that you planned to reserve to the `system-reserved` setting. |`system-reserved` -| Resources reserved for the remaining system components. Default settings depend on the {product-title} and Machine Config Operator versions. Confirm the default `systemReserved` parameter on the `machine-config-operator` repository. +| This setting identifies the resources to reserve for the node components and system components. The default settings depend on the {product-title} and Machine Config Operator versions. Confirm the default `systemReserved` parameter on the `machine-config-operator` repository. |=== If a flag is not set, the defaults are used. If none of the flags are set, the @@ -29,65 +28,33 @@ introduction of allocatable resources. An allocated amount of a resource is computed based on the following formula: ---- -[Allocatable] = [Node Capacity] - [kube-reserved] - [system-reserved] - [Hard-Eviction-Thresholds] +[Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds] ---- [NOTE] ==== -The withholding of `Hard-Eviction-Thresholds` from allocatable is a change in behavior to improve -system reliability now that allocatable is enforced for end-user pods at the node level. -The `experimental-allocatable-ignore-eviction` setting is available to preserve legacy behavior, -but it will be deprecated in a future release. +The withholding of `Hard-Eviction-Thresholds` from `Allocatable` improves system reliability because the value for `Allocatable` is enforced for pods at the node level. ==== -If `[Allocatable]` is negative, it is set to *0*. +If `Allocatable` is negative, it is set to `0`. 
-Each node reports system resources utilized by the container runtime and kubelet. -To better aid your ability to configure `--system-reserved` and `--kube-reserved`, -you can introspect corresponding node's resource usage using the node summary API, -which is accessible at `/api/v1/nodes//proxy/stats/summary`. +Each node reports the system resources that are used by the container runtime and kubelet. To simplify configuring the `system-reserved` parameter, view the resource use for the node by using the node summary API. The node summary is available at `/api/v1/nodes//proxy/stats/summary`. [id="allocate-node-enforcement_{context}"] == How nodes enforce resource constraints -The node is able to limit the total amount of resources that pods -may consume based on the configured allocatable value. This feature significantly -improves the reliability of the node by preventing pods from starving -system services (for example: container runtime, node agent, etc.) for resources. -It is strongly encouraged that administrators reserve -resources based on the desired node utilization target -in order to improve node reliability. +The node is able to limit the total amount of resources that pods can consume based on the configured allocatable value. This feature significantly improves the reliability of the node by preventing pods from using CPU and memory resources that are needed by system services such as the container runtime and node agent. To improve node reliability, administrators should reserve resources based on a target for resource use. -The node enforces resource constraints using a new *cgroup* hierarchy -that enforces quality of service. All pods are launched in a -dedicated cgroup hierarchy separate from system daemons. +The node enforces resource constraints by using a new cgroup hierarchy that enforces quality of service. All pods are launched in a dedicated cgroup hierarchy that is separate from system daemons. -Optionally, the node can be made to enforce kube-reserved and system-reserved by -specifying those tokens in the enforce-node-allocatable flag. If specified, the -corresponding `--kube-reserved-cgroup` or `--system-reserved-cgroup` needs to be provided. -In future releases, the node and container runtime will be packaged in a common cgroup -separate from `system.slice`. Until that time, we do not recommend users -change the default value of enforce-node-allocatable flag. +Administrators should treat system daemons similar to pods that have a guaranteed quality of service. System daemons can burst within their bounding control groups and this behavior must be managed as part of cluster deployments. Reserve CPU and memory resources for system daemons by specifying the amount of CPU and memory resources in `system-reserved`. -Administrators should treat system daemons similar to Guaranteed pods. System daemons -can burst within their bounding control groups and this behavior needs to be managed -as part of cluster deployments. Enforcing system-reserved limits -can lead to critical system services being CPU starved or OOM killed on the node. The -recommendation is to enforce system-reserved only if operators have profiled their nodes -exhaustively to determine precise estimates and are confident in their ability to -recover if any process in that group is OOM killed. - -As a result, we strongly recommended that users only enforce node allocatable for -`pods` by default, and set aside appropriate reservations for system daemons to maintain -overall node reliability. 
+Enforcing `system-reserved` limits can prevent critical system services from receiving CPU and memory resources. As a result, a critical system service can be ended by the out-of-memory killer. The recommendation is to enforce `system-reserved` only if you have profiled the nodes exhaustively to determine precise estimates and you are confident that critical system services can recover if any process in that group is ended by the out-of-memory killer. [id="allocate-eviction-thresholds_{context}"] == Understanding Eviction Thresholds -If a node is under memory pressure, it can impact the entire node and all pods running on -it. If a system daemon is using more than its reserved amount of memory, an OOM -event may occur that can impact the entire node and all pods running on it. To avoid -(or reduce the probability of) system OOMs the node provides out-of-resource handling. +If a node is under memory pressure, it can impact the entire node and all pods running on the node. For example, a system daemon that uses more than its reserved amount of memory can trigger an out-of-memory event. To avoid or reduce the probability of system out-of-memory events, the node provides out-of-resource handling. You can reserve some memory using the `--eviction-hard` flag. The node attempts to evict pods whenever memory availability on the node drops below the absolute value or percentage. @@ -98,16 +65,12 @@ before reaching out of memory conditions are not available for pods. The following is an example to illustrate the impact of node allocatable for memory: * Node capacity is `32Gi` -* --kube-reserved is `2Gi` -* --system-reserved is `1Gi` +* --system-reserved is `3Gi` * --eviction-hard is set to `100Mi`. -For this node, the effective node allocatable value is `28.9Gi`. If the node -and system components use up all their reservation, the memory available for pods is `28.9Gi`, -and kubelet will evict pods when it exceeds this usage. +For this node, the effective node allocatable value is `28.9Gi`. If the node and system components use all their reservation, the memory available for pods is `28.9Gi`, and kubelet evicts pods when it exceeds this threshold. -If you enforce node allocatable (`28.9Gi`) via top level cgroups, then pods can never exceed `28.9Gi`. -Evictions would not be performed unless system daemons are consuming more than `3.1Gi` of memory. +If you enforce node allocatable, `28.9Gi`, with top-level cgroups, then pods can never exceed `28.9Gi`. Evictions are not performed unless system daemons consume more than `3.1Gi` of memory. If system daemons do not use up all their reservation, with the above example, pods would face memcg OOM kills from their bounding cgroup before node evictions kick in. diff --git a/modules/nodes-nodes-resources-configuring-setting.adoc b/modules/nodes-nodes-resources-configuring-setting.adoc index 35b957c9ae..d4ab1fc551 100644 --- a/modules/nodes-nodes-resources-configuring-setting.adoc +++ b/modules/nodes-nodes-resources-configuring-setting.adoc @@ -12,8 +12,7 @@ As an administrator, you can set these using a custom resource (CR) through a se .Prerequisites -. To help you determine setting for `--system-reserved` and `--kube-reserved` you can introspect the corresponding node's resource usage -using the node summary API, which is accessible at `/api/v1/nodes//proxy/stats/summary`. Enter the following command for your node: +. To help you determine values for the `system-reserved` setting, you can introspect the resource use for a node by using the node summary API. 
Enter the following command for your node: + [source,terminal] ---- @@ -117,11 +116,8 @@ spec: custom-kubelet: small-pods <2> kubeletConfig: systemReserved: - cpu: 500m - memory: 512Mi - kubeReserved: - cpu: 500m - memory: 512Mi + cpu: 1000m + memory: 1Gi ---- <1> Assign a name to CR. <2> Specify the label from the Machine Config Pool. diff --git a/modules/setting-up-cpu-manager.adoc b/modules/setting-up-cpu-manager.adoc index bcba88b491..9ee0a41c48 100644 --- a/modules/setting-up-cpu-manager.adoc +++ b/modules/setting-up-cpu-manager.adoc @@ -80,7 +80,7 @@ This adds the CPU Manager feature to the kubelet config and, if needed, the Mach "name": "cpumanager-enabled", "uid": "7ed5616d-6b72-11e9-aae1-021e1ce18878" } - ], + ] ---- . Check the worker for the updated `kubelet.conf`: @@ -241,7 +241,7 @@ Allocated resources: cpu 1440m (96%) 1 (66%) ---- + -This VM has two CPU cores. You set `kube-reserved` to 500 millicores, meaning half of one core is subtracted from the total capacity of the node to arrive at the `Node Allocatable` amount. You can see that `Allocatable CPU` is 1500 millicores. This means you can run one of the CPU Manager pods since each will take one whole core. A whole core is equivalent to 1000 millicores. If you try to schedule a second pod, the system will accept the pod, but it will never be scheduled: +This VM has two CPU cores. The `system-reserved` setting reserves 500 millicores, meaning that half of one core is subtracted from the total capacity of the node to arrive at the `Node Allocatable` amount. You can see that `Allocatable CPU` is 1500 millicores. This means you can run one of the CPU Manager pods since each will take one whole core. A whole core is equivalent to 1000 millicores. If you try to schedule a second pod, the system will accept the pod, but it will never be scheduled: + [source, terminal] ---- diff --git a/nodes/nodes/nodes-nodes-resources-configuring.adoc b/nodes/nodes/nodes-nodes-resources-configuring.adoc index 382cef8663..72b950a659 100644 --- a/nodes/nodes/nodes-nodes-resources-configuring.adoc +++ b/nodes/nodes/nodes-nodes-resources-configuring.adoc @@ -1,18 +1,11 @@ - -:context: nodes-nodes-resources-configuring [id="nodes-nodes-resources-configuring"] = Allocating resources for nodes in an {product-title} cluster include::modules/common-attributes.adoc[] +:context: nodes-nodes-resources-configuring toc::[] - -To provide more reliable scheduling and minimize node resource overcommitment, -each node can reserve a portion of its resources for use by all underlying node -components (such as kubelet, kube-proxy) and the remaining system -components (such as *sshd*, *NetworkManager*) on the host. Once specified, the -scheduler has more information about the resources (e.g., memory, CPU) a node -has allocated for pods. +To provide more reliable scheduling and minimize node resource overcommitment, reserve a portion of the CPU and memory resources for use by the underlying node components, such as `kubelet` and `kube-proxy`, and the remaining system components, such as `sshd` and `NetworkManager`. By specifying the resources to reserve, you provide the scheduler with more information about the remaining CPU and memory resources that a node has available for use by pods. // The following include statements pull in the module files that comprise // the assembly. 
Include any combination of concept, procedure, or reference
@@ -27,7 +20,7 @@ include::modules/nodes-nodes-resources-configuring-setting.adoc[leveloffset=+1]
== Additional resources

The ephemeral storage management feature is disabled by default. To enable this
-feature,
+feature, see /install_config/configuring_ephemeral.adoc#install-config-configuring-ephemeral-storage[configuring for ephemeral storage].
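
The memory example in `modules/nodes-nodes-resources-configuring-about.adoc` still reconciles after the old `kube-reserved` value is folded into `system-reserved`. Restating the arithmetic with the values used in the patch (node capacity `32Gi`, `system-reserved` `3Gi`, `--eviction-hard` `100Mi`):

----
[Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds]
              = 32Gi - 3Gi - 100Mi
              ≈ 28.9Gi
----

This matches the `28.9Gi` effective node allocatable value stated in the updated text.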
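
The setting module points readers at the node summary API, but the hunk context elides the command itself. As an illustration only, one way to view the summary with the OpenShift CLI is a raw API request; `<node>` is a placeholder for the node name, and the exact command in the module may differ:

[source,terminal]
----
$ oc get --raw /api/v1/nodes/<node>/proxy/stats/summary
----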
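
For the consolidated reservation, the hunks show only the changed `systemReserved` lines. A minimal sketch of a complete `KubeletConfig` CR that matches the new values, assuming the usual `apiVersion`, `kind`, `metadata`, and `machineConfigPoolSelector` wrapper that the diff context omits (the CR name is a placeholder):

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-allocatable    # placeholder CR name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: small-pods    # label on the target machine config pool, as in the hunk
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 1Gi
----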
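
After the kubelet restarts with the new reservation, the effect is visible in the node's `Allocatable` block, the same output the cpu-manager module quotes for `Allocatable CPU`. A quick check, again with `<node>` as a placeholder:

[source,terminal]
----
$ oc describe node <node> | grep -A5 '^Allocatable'
----

With 500 millicores reserved on the two-core VM in the cpu-manager example, the `Allocatable` cpu value reads `1500m`, which is the figure that module relies on when it schedules the CPU Manager pod.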