mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 21:46:22 +01:00

CNV#34781: removing TP from wasp-agent doc

Pan Ousley
2024-10-14 13:42:47 -04:00
parent aafe7e8105
commit 8cbafccd96
3 changed files with 260 additions and 191 deletions

View File

@@ -4,10 +4,9 @@
:_mod-docs-content-type: PROCEDURE
[id="virt-using-wasp-agent-to-configure-higher-vm-workload-density_{context}"]
= Using wasp-agent to increase VM workload density
The `wasp-agent` component facilitates memory overcommitment by assigning swap resources to worker nodes. It also manages pod evictions when nodes are at risk due to high swap I/O traffic or high utilization.
[IMPORTANT]
====
@@ -18,14 +17,99 @@ For descriptions of QoS classes, see link:https://kubernetes.io/docs/tasks/confi
.Prerequisites
* You have installed the OpenShift CLI (`oc`).
* You are logged into the cluster with the `cluster-admin` role.
* A memory overcommit ratio is defined.
* The node belongs to a worker pool.
[NOTE]
====
The `wasp-agent` component deploys an Open Container Initiative (OCI) hook to enable swap usage for containers on the node level. The low-level nature requires the `DaemonSet` object to be privileged.
====
.Procedure
. Configure the `kubelet` service to permit swap usage:
.. Create or edit a `KubeletConfig` file with the parameters shown in the following example:
+
.Example of a `KubeletConfig` file
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: '' # MCP
      #machine.openshift.io/cluster-api-machine-role: worker # machine
      #node-role.kubernetes.io/worker: '' # node
  kubeletConfig:
    failSwapOn: false
----
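+
If you saved the configuration to a new file instead of editing an existing `KubeletConfig` object, apply it to the cluster. For example, assuming a hypothetical file name of `kubelet-config.yaml`:
+
[source,terminal]
----
$ oc apply -f kubelet-config.yaml
----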
.. Wait for the worker nodes to sync with the new configuration by running the following command:
+
[source,terminal]
----
$ oc wait mcp worker --for condition=Updated=True --timeout=-1s
----
. Provision swap by creating a `MachineConfig` object. For example:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 90-worker-swap
spec:
  config:
    ignition:
      version: 3.4.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Provision and enable swap
          ConditionFirstBoot=no

          [Service]
          Type=oneshot
          Environment=SWAP_SIZE_MB=5000
          ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
          sudo chmod 600 /var/tmp/swapfile && \
          sudo mkswap /var/tmp/swapfile && \
          sudo swapon /var/tmp/swapfile && \
          free -h && \
          sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""

          [Install]
          RequiredBy=kubelet-dependencies.target
        enabled: true
        name: swap-provision.service
----
+
To have enough swap space for the worst-case scenario, provision at least as much swap space as the amount of overcommitted RAM. Calculate the amount of swap space to provision on a node by using the following formula:
+
[source,terminal]
----
NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)
----
+
.Example
[source,terminal]
----
NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
= 16 GB * (1.5 - 1)
= 16 GB * (0.5)
= 8 GB
----
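+
As a quick sanity check, the same calculation can be done in a shell, for example when choosing the `SWAP_SIZE_MB` value for the `MachineConfig` object. The following one-liner uses hypothetical values of 16384 MB of node RAM and a 150% overcommit ratio:
+
[source,terminal]
----
$ NODE_RAM_MB=16384 MEMORY_OVER_COMMIT_PERCENT=150; echo "$(( NODE_RAM_MB * (MEMORY_OVER_COMMIT_PERCENT - 100) / 100 )) MB"
8192 MB
----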
. Create a privileged service account by running the following commands:
+
[source,terminal]
----
@@ -46,13 +130,27 @@ $ oc create clusterrolebinding wasp --clusterrole=cluster-admin --serviceaccount
----
$ oc adm policy add-scc-to-user -n wasp privileged -z wasp
----
. Wait for the worker nodes to sync with the new configuration by running the following command:
+
[source,terminal]
----
$ oc wait mcp worker --for condition=Updated=True --timeout=-1s
----
. Determine the pull URL for the `wasp-agent` image by running the following commands:
+
[source,terminal]
----
$ OCP_VERSION=$(oc get clusterversion | awk 'NR==2' | cut -d' ' -f4 | cut -d'-' -f1)
----
+
[source,terminal]
----
$ oc get csv kubevirt-hyperconverged-operator.v${OCP_VERSION} -n openshift-cnv -o json | jq '.spec.relatedImages[] | select(.name|test(".*wasp-agent.*")) | .image'
----
. Deploy `wasp-agent` by creating a `DaemonSet` object as shown in the following example:
+
[source,yaml]
----
@@ -74,20 +172,30 @@ spec:
        description: >-
          Configures swap for workloads
      labels:
        name: wasp
    spec:
      containers:
      - env:
        - name: SWAP_UTILIZATION_THRESHOLD_FACTOR
          value: "0.8"
        - name: MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND
          value: "1000"
        - name: MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND
          value: "1000"
        - name: AVERAGE_WINDOW_SIZE_SECONDS
          value: "30"
        - name: VERBOSITY
          value: "1"
        - name: FSROOT
          value: /host
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: >-
          quay.io/openshift-virtualization/wasp-agent:v4.17 <1>
        imagePullPolicy: Always
        name: wasp-agent
        resources:
          requests:
            cpu: 100m
@@ -95,175 +203,87 @@ spec:
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /host
          name: host
        - mountPath: /rootfs
          name: rootfs
      hostPID: true
      hostUsers: true
      priorityClassName: system-node-critical
      serviceAccountName: wasp
      terminationGracePeriodSeconds: 5
      volumes:
      - hostPath:
          path: /
        name: host
      - hostPath:
          path: /
        name: rootfs
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
      maxSurge: 0
status: {}
----
<1> Replace the `image` value with the image URL from the previous step.
. Deploy alerting rules by creating a `PrometheusRule` object. For example:
+
[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    tier: node
    wasp.io: ""
  name: wasp-rules
  namespace: wasp
spec:
  groups:
  - name: alerts.rules
    rules:
    - alert: NodeHighSwapActivity
      annotations:
        description: High swap activity detected at {{ $labels.instance }}. The rate
          of swap out and swap in exceeds 200 in both operations in the last minute.
          This could indicate memory pressure and may affect system performance.
        runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/docs/runbooks/NodeHighSwapActivity.md
        summary: High swap activity detected at {{ $labels.instance }}.
      expr: rate(node_vmstat_pswpout[1m]) > 200 and rate(node_vmstat_pswpin[1m]) > 200
      for: 1m
      labels:
        kubernetes_operator_component: kubevirt
        kubernetes_operator_part_of: kubevirt
        operator_health_impact: warning
        severity: warning
----
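+
Optionally, you can confirm that the rule object was created; the `PrometheusRule` custom resource definition is provided by the cluster monitoring stack:
+
[source,terminal]
----
$ oc get prometheusrule wasp-rules -n wasp
----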
. Add the `cluster-monitoring` label to the `wasp` namespace by running the following command:
+
[source,terminal]
----
$ oc label namespace wasp openshift.io/cluster-monitoring="true"
----
. Enable memory overcommitment in {VirtProductName} by using the web console or the CLI.
+
--
.Web console
.. In the {product-title} web console, go to *Virtualization* -> *Overview* -> *Settings* -> *General settings* -> *Memory density*.
.. Set *Enable memory density* to on.
.CLI
* Run the following command:
+
[source,terminal]
----
$ oc patch --type=merge \
  -f <../manifests/openshift/hco-set-memory-overcommit.yaml> \
  --patch-file <../manifests/openshift/hco-set-memory-overcommit.yaml>
----
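+
The referenced patch file sets the memory overcommit percentage in the `HyperConverged` custom resource. For example, for a 150% overcommit ratio, the patch file might contain the following:
+
[source,yaml]
----
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  higherWorkloadDensity:
    memoryOvercommitPercentage: 150
----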
+
[NOTE]
====
After applying all configurations, the swap feature is fully available only after all `MachineConfigPool` rollouts are complete.
====
--
.Verification
@@ -271,32 +291,35 @@ After applying all configurations, the swap feature is fully available only afte
+
[source,terminal]
----
$ oc rollout status ds wasp-agent -n wasp
----
+
If the deployment is successful, the following message is displayed:
+
.Example output
[source,terminal]
----
daemon set "wasp-agent" successfully rolled out
----
. To verify that swap is correctly provisioned, complete the following steps:
.. View a list of worker nodes by running the following command:
+
[source,terminal]
----
$ oc get nodes -l node-role.kubernetes.io/worker
----
.. Select a node from the list and display its memory usage by running the following command:
+
[source,terminal]
----
$ oc debug node/<selected_node> -- free -m <1>
----
<1> Replace `<selected_node>` with the node name.
+
If swap is provisioned, an amount greater than zero is displayed in the `Swap:` row.
+
.Example output
[cols="1,1,1,1,1,1,1"]
|===
| |total |used |free |shared |buff/cache |available
@@ -309,10 +332,12 @@ If swap is provisioned correctly, an amount greater than zero is displayed, simi
[source,terminal]
----
$ oc get -n openshift-cnv HyperConverged kubevirt-hyperconverged -o jsonpath="{.spec.higherWorkloadDensity.memoryOvercommitPercentage}"
----
+
.Example output
[source,terminal]
----
150
----
+
The returned value must match the value you had previously configured.

View File

@@ -0,0 +1,51 @@
// Module included in the following assemblies:
//
// * virt/post_installation_configuration/virt-configuring-higher-vm-workload-density.adoc
:_mod-docs-content-type: CONCEPT
[id="virt-wasp-agent-pod-eviction_{context}"]
= Pod eviction conditions used by wasp-agent
The `wasp-agent` component manages pod eviction when the system is heavily loaded and nodes are at risk. Eviction is triggered if one of the following conditions is met:
High swap I/O traffic::
This condition is met when swap-related I/O traffic is excessively high.
+
.Condition
[source,text]
----
averageSwapInPerSecond > maxAverageSwapInPagesPerSecond
&&
averageSwapOutPerSecond > maxAverageSwapOutPagesPerSecond
----
+
By default, `maxAverageSwapInPagesPerSecond` and `maxAverageSwapOutPagesPerSecond` are set to 1000 pages per second. The default time interval for calculating the average is 30 seconds.
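+
For example, with the default settings, a node that averages more than 1000 pages per second swapped in and, at the same time, more than 1000 pages per second swapped out over the last 30 seconds meets this condition.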
High swap utilization::
This condition is met when swap utilization is excessively high, causing the current virtual memory usage to exceed the factored threshold. The `NODE_SWAP_SPACE` setting in your `MachineConfig` object can impact this condition.
+
.Condition
[source,text]
----
nodeWorkingSet + nodeSwapUsage > totalNodeMemory + totalSwapMemory × thresholdFactor
----
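+
For example, on a node with 16 GiB of RAM, 8 GiB of swap, and the default `thresholdFactor` of 0.8 (hypothetical values), the factored threshold is 16 GiB + 8 GiB × 0.8 = 22.4 GiB, so this condition is met when the combined working set and swap usage of the node exceeds 22.4 GiB.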
[id="environment-variables_{context}"]
== Environment variables
You can use the following environment variables to adjust the values used to calculate eviction conditions:
[cols="1,1"]
|===
|*Environment variable* |*Function*
|`MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND`
|Sets the value of `maxAverageSwapInPagesPerSecond`.
|`MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND`
|Sets the value of `maxAverageSwapOutPagesPerSecond`.
|`SWAP_UTILIZATION_THRESHOLD_FACTOR`
|Sets the `thresholdFactor` value used to calculate high swap utilization.
|`AVERAGE_WINDOW_SIZE_SECONDS`
|Sets the time interval for calculating the average swap usage.
|===
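For example, to trigger eviction at lower swap traffic and utilization levels, you can override these variables in the `env` section of the `wasp-agent` `DaemonSet` object. The following values are hypothetical and shown only to illustrate the mechanism:

[source,yaml]
----
env:
- name: MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND
  value: "500"
- name: MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND
  value: "500"
- name: SWAP_UTILIZATION_THRESHOLD_FACTOR
  value: "0.7"
- name: AVERAGE_WINDOW_SIZE_SECONDS
  value: "60"
----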

View File

@@ -6,23 +6,16 @@ include::_attributes/common-attributes.adoc[]
toc::[]
You can increase the number of virtual machines (VMs) on nodes by overcommitting memory (RAM). Increasing VM workload density can be useful in the following situations:
* You have many similar workloads.
* You have underused workloads.
[NOTE]
====
Memory overcommitment can lower workload performance on a highly utilized system.
====
include::modules/virt-using-wasp-agent-to-configure-higher-vm-workload-density.adoc[leveloffset=+1]
include::modules/virt-wasp-agent-pod-eviction.adoc[leveloffset=+1]