CNV#34781: removing TP from wasp-agent doc
@@ -4,10 +4,9 @@
:_mod-docs-content-type: PROCEDURE
[id="virt-using-wasp-agent-to-configure-higher-vm-workload-density_{context}"]
= Using `wasp-agent` to configure higher VM workload density
= Using wasp-agent to increase VM workload density

The `wasp-agent` component enables an {product-title} cluster to assign swap resources to virtual machine (VM) workloads.
Swap usage is only supported on worker nodes.
The `wasp-agent` component facilitates memory overcommitment by assigning swap resources to worker nodes. It also manages pod evictions when nodes are at risk due to high swap I/O traffic or high utilization.

[IMPORTANT]
====
@@ -18,14 +17,99 @@ For descriptions of QoS classes, see link:https://kubernetes.io/docs/tasks/confi

.Prerequisites

* The `oc` tool is available.
* You are logged into the cluster with the cluster-admin role.
* A memory over-commit ratio is defined.
* You have installed the OpenShift CLI (`oc`).
* You are logged into the cluster with the `cluster-admin` role.
* A memory overcommit ratio is defined.
* The node belongs to a worker pool.

[NOTE]
====
The `wasp-agent` component deploys an Open Container Initiative (OCI) hook to enable swap usage for containers on the node level. The low-level nature requires the `DaemonSet` object to be privileged.
====

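You can optionally confirm the CLI prerequisites before you start. The following is a minimal sketch; these checks are not part of the documented procedure:

[source,terminal]
----
$ oc version
$ oc auth can-i '*' '*' --all-namespaces
----
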
.Procedure

. Create a privileged service account by entering the following commands:
. Configure the `kubelet` service to permit swap usage:
.. Create or edit a `KubeletConfig` file with the parameters shown in the following example:
+
.Example of a `KubeletConfig` file
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: '' # MCP
      #machine.openshift.io/cluster-api-machine-role: worker # machine
      #node-role.kubernetes.io/worker: '' # node
  kubeletConfig:
    failSwapOn: false
----
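+
After you save the example to a file, you can apply it with a standard `oc` command, for example (the file name here is illustrative):
+
[source,terminal]
----
$ oc apply -f kubelet-config-swap.yaml
----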

.. Wait for the worker nodes to sync with the new configuration by running the following command:
+
[source,terminal]
----
$ oc wait mcp worker --for condition=Updated=True --timeout=-1s
----

. Provision swap by creating a `MachineConfig` object. For example:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 90-worker-swap
spec:
  config:
    ignition:
      version: 3.4.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Provision and enable swap
          ConditionFirstBoot=no

          [Service]
          Type=oneshot
          Environment=SWAP_SIZE_MB=5000
          ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
          sudo chmod 600 /var/tmp/swapfile && \
          sudo mkswap /var/tmp/swapfile && \
          sudo swapon /var/tmp/swapfile && \
          free -h && \
          sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""

          [Install]
          RequiredBy=kubelet-dependencies.target
        enabled: true
        name: swap-provision.service
----
+
To have enough swap space for the worst-case scenario, make sure to have at least as much swap space provisioned as overcommitted RAM. Calculate the amount of swap space to be provisioned on a node by using the following formula:
+
[source,terminal]
----
NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)
----
+
.Example
[source,terminal]
----
NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                = 16 GB * (1.5 - 1)
                = 16 GB * (0.5)
                = 8 GB
----
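+
As a cross-check, you can evaluate the same formula in a shell. This sketch uses illustrative variable names and integer arithmetic:
+
[source,terminal]
----
$ NODE_RAM_GB=16
$ MEMORY_OVER_COMMIT_PERCENT=150
$ echo "$(( NODE_RAM_GB * (MEMORY_OVER_COMMIT_PERCENT - 100) / 100 )) GB swap required"
----
+
With the values above, this prints `8 GB swap required`, matching the worked example.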

. Create a privileged service account by running the following commands:
+
[source,terminal]
----
@@ -46,13 +130,27 @@ $ oc create clusterrolebinding wasp --clusterrole=cluster-admin --serviceaccount
----
$ oc adm policy add-scc-to-user -n wasp privileged -z wasp
----
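+
You can confirm that the account and the cluster role binding exist with commands such as the following (a verification sketch, not part of the original steps):
+
[source,terminal]
----
$ oc get serviceaccount wasp -n wasp
$ oc get clusterrolebinding wasp
----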

. Wait for the worker nodes to sync with the new configuration by running the following command:
+
[NOTE]
====
The `wasp-agent` component deploys an OCI hook to enable swap usage for containers on the node level. The low-level nature requires the `DaemonSet` object to be privileged.
====
[source,terminal]
----
$ oc wait mcp worker --for condition=Updated=True --timeout=-1s
----

. Determine the pull URL for the wasp agent image by running the following commands:
+
. Deploy `wasp-agent` by creating a `DaemonSet` object as follows:
[source,terminal]
----
$ OCP_VERSION=$(oc get clusterversion | awk 'NR==2' | cut -d' ' -f4 | cut -d'-' -f1)
----
+
[source,terminal]
----
$ oc get csv kubevirt-hyperconverged-operator.v${OCP_VERSION} -n openshift-cnv -o json | jq '.spec.relatedImages[] | select(.name|test(".*wasp-agent.*")) | .image'
----
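+
For convenience, you can capture the returned URL in a shell variable and reuse it in the next step. This is a sketch; the variable name is illustrative:
+
[source,terminal]
----
$ WASP_IMAGE=$(oc get csv kubevirt-hyperconverged-operator.v${OCP_VERSION} -n openshift-cnv -o json \
    | jq -r '.spec.relatedImages[] | select(.name|test(".*wasp-agent.*")) | .image')
$ echo "${WASP_IMAGE}"
----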

. Deploy `wasp-agent` by creating a `DaemonSet` object as shown in the following example:
+
[source,yaml]
----
@@ -74,20 +172,30 @@ spec:
        description: >-
          Configures swap for workloads
      labels:
        name: wasp
      name: wasp
    spec:
      serviceAccountName: wasp
      hostPID: true
      hostUsers: true
      terminationGracePeriodSeconds: 5
      containers:
      - name: wasp-agent
      - env:
        - name: SWAP_UTILIZATION_THRESHOLD_FACTOR
          value: "0.8"
        - name: MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND
          value: "1000"
        - name: MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND
          value: "1000"
        - name: AVERAGE_WINDOW_SIZE_SECONDS
          value: "30"
        - name: VERBOSITY
          value: "1"
        - name: FSROOT
          value: /host
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: >-
          registry.redhat.io/container-native-virtualization/wasp-agent-rhel9:v4.17
          quay.io/openshift-virtualization/wasp-agent:v4.17 <1>
        imagePullPolicy: Always
        env:
        - name: "FSROOT"
          value: "/host"
        name: wasp-agent
        resources:
          requests:
            cpu: 100m
@@ -95,175 +203,87 @@ spec:
        securityContext:
          privileged: true
        volumeMounts:
        - name: host
          mountPath: "/host"
      volumes:
      - name: host
        hostPath:
          path: "/"
        - mountPath: /host
          name: host
        - mountPath: /rootfs
          name: rootfs
      hostPID: true
      hostUsers: true
      priorityClassName: system-node-critical
      serviceAccountName: wasp
      terminationGracePeriodSeconds: 5
      volumes:
      - hostPath:
          path: /
        name: host
      - hostPath:
          path: /
        name: rootfs
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
      maxSurge: 0
status: {}
----
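+
If you save the example to a file, you can create the object and check that its pods are scheduled, for example (the file name is illustrative):
+
[source,terminal]
----
$ oc apply -f wasp-daemonset.yaml
$ oc get pods -n wasp
----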
. Configure the `kubelet` service to permit swap:
.. Create a `KubeletConfiguration` file as shown in the example:
+
.Example of a `KubeletConfiguration` file
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: '' # MCP
      #machine.openshift.io/cluster-api-machine-role: worker # machine
      #node-role.kubernetes.io/worker: '' # node
  kubeletConfig:
    failSwapOn: false
    evictionSoft:
      memory.available: "1Gi"
    evictionSoftGracePeriod:
      memory.available: "10s"
----
+
If the cluster is already using an existing `KubeletConfiguration` file, add the following to the `spec` section:
<1> Replace the `image` value with the image URL from the previous step.

. Deploy alerting rules by creating a `PrometheusRule` object. For example:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
# ...
spec:
# ...
  kubeletConfig:
    evictionSoft:
      memory.available: 1Gi
    evictionSoftGracePeriod:
      memory.available: 1m30s
    failSwapOn: false
----
.. Run the following command:
+
[source,terminal]
----
$ oc wait mcp worker --for condition=Updated=True
----
. Create a `MachineConfig` object to provision swap as follows:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 90-worker-swap
spec:
  config:
    ignition:
      version: 3.4.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Provision and enable swap
          ConditionFirstBoot=no

          [Service]
          Type=oneshot
          Environment=SWAP_SIZE_MB=5000
          ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
          sudo chmod 600 /var/tmp/swapfile && \
          sudo mkswap /var/tmp/swapfile && \
          sudo swapon /var/tmp/swapfile && \
          free -h && \
          sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""

          [Install]
          RequiredBy=kubelet-dependencies.target
        enabled: true
        name: swap-provision.service
----
+
To have enough swap space for the worst-case scenario, make sure to have at least as much swap space provisioned as overcommitted RAM. Calculate the amount of swap space to be provisioned on a node using the following formula:
+
[source,terminal]
----
NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)
----
+
Example:
+
[source,terminal]
----
NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                = 16 GB * (1.5 - 1)
                = 16 GB * (0.5)
                = 8 GB
----
+
. Deploy alerting rules as follows:
+
[source,yaml]
----
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: wasp-alerts
  namespace: openshift-monitoring
    tier: node
    wasp.io: ""
  name: wasp-rules
  namespace: wasp
spec:
  groups:
  - name: wasp.rules
    rules:
    - alert: NodeSwapping
      annotations:
        description: Node {{ $labels.instance }} is swapping at a rate of {{ printf "%.2f" $value }} MB/s
        runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/runbooks/alerts/NodeSwapping.md
        summary: A node is swapping memory pages
      expr: |
        # In MB/s
        irate(node_memory_SwapFree_bytes{job="node-exporter"}[5m]) / 1024^2 > 0
      for: 1m
      labels:
        severity: critical
  - name: alerts.rules
    rules:
    - alert: NodeHighSwapActivity
      annotations:
        description: High swap activity detected at {{ $labels.instance }}. The rate
          of swap out and swap in exceeds 200 in both operations in the last minute.
          This could indicate memory pressure and may affect system performance.
        runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/docs/runbooks/NodeHighSwapActivity.md
        summary: High swap activity detected at {{ $labels.instance }}.
      expr: rate(node_vmstat_pswpout[1m]) > 200 and rate(node_vmstat_pswpin[1m]) >
        200
      for: 1m
      labels:
        kubernetes_operator_component: kubevirt
        kubernetes_operator_part_of: kubevirt
        operator_health_impact: warning
        severity: warning
----
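+
A rule definition like this can be created from a file with a standard command, for example (the file name is illustrative):
+
[source,terminal]
----
$ oc apply -f wasp-alerts.yaml
----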
. Configure {VirtProductName} to use memory overcommit either by using the {product-title} web console or by editing the HyperConverged custom resource (CR) file as shown in the following example.

. Add the `cluster-monitoring` label to the `wasp` namespace by running the following command:
+
Example:
[source,terminal]
----
$ oc label namespace wasp openshift.io/cluster-monitoring="true"
----
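+
You can confirm that the label is present with a command such as the following (a verification sketch):
+
[source,terminal]
----
$ oc get namespace wasp --show-labels
----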

. Enable memory overcommitment in {VirtProductName} by using the web console or the CLI.
+
[source,yaml]
----
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  higherWorkloadDensity:
    memoryOvercommitPercentage: 150
----
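+
If you prefer to edit the CR in place instead of using a patch file, a command such as the following opens it for editing (a sketch, not part of the original steps):
+
[source,terminal]
----
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
----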
. Apply all the configurations to compute nodes in your cluster by entering the following command:
--
.Web console
.. In the {product-title} web console, go to *Virtualization* -> *Overview* -> *Settings* -> *General settings* -> *Memory density*.
.. Set *Enable memory density* to on.

.CLI
* Run the following command:
+
[source,terminal]
----
$ oc patch --type=merge \
  -f <../manifests/hco-set-memory-overcommit.yaml> \
  --patch-file <../manifests/hco-set-memory-overcommit.yaml>
  -f <../manifests/openshift/hco-set-memory-overcommit.yaml> \
  --patch-file <../manifests/openshift/hco-set-memory-overcommit.yaml>
----
+
[NOTE]
====
After applying all configurations, the swap feature is fully available only after all `MachineConfigPool` rollouts are complete.
====
--

.Verification

@@ -271,32 +291,35 @@ After applying all configurations, the swap feature is fully available only afte
+
[source,terminal]
----
$ oc rollout status ds wasp-agent -n wasp
----
+
If the deployment is successful, the following message is displayed:
+
.Example output
[source,terminal]
----
daemon set "wasp-agent" successfully rolled out
----

. To verify that swap is correctly provisioned, do the following:
.. Run the following command:
. To verify that swap is correctly provisioned, complete the following steps:
.. View a list of worker nodes by running the following command:
+
[source,terminal]
----
$ oc get nodes -l node-role.kubernetes.io/worker
----
.. Select a node from the provided list and run the following command:
.. Select a node from the list and display its memory usage by running the following command:
+
[source,terminal]
----
$ oc debug node/<selected-node> -- free -m
$ oc debug node/<selected_node> -- free -m <1>
----
<1> Replace `<selected_node>` with the node name.
+
If swap is provisioned correctly, an amount greater than zero is displayed, similar to the following:
If swap is provisioned, an amount greater than zero is displayed in the `Swap:` row.
+
.Example output
[cols="1,1,1,1,1,1,1"]
|===
| |total |used |free |shared |buff/cache |available
@@ -309,10 +332,12 @@ If swap is provisioned correctly, an amount greater than zero is displayed, simi
[source,terminal]
----
$ oc get -n openshift-cnv HyperConverged kubevirt-hyperconverged -o jsonpath="{.spec.higherWorkloadDensity.memoryOvercommitPercentage}"
----
+
.Example output
[source,terminal]
----
150
----
+
The returned value, for example `150`, must match the value you had previously configured.

The returned value must match the value you had previously configured.

modules/virt-wasp-agent-pod-eviction.adoc (new file, 51 lines)
@@ -0,0 +1,51 @@
// Module included in the following assemblies:
//
// * virt/post_installation_configuration/virt-configuring-higher-vm-workload-density.adoc

:_mod-docs-content-type: CONCEPT
[id="virt-wasp-agent-pod-eviction_{context}"]
= Pod eviction conditions used by wasp-agent

The wasp agent manages pod eviction when the system is heavily loaded and nodes are at risk. Eviction is triggered if one of the following conditions is met:

High swap I/O traffic::

This condition is met when swap-related I/O traffic is excessively high.
+
.Condition
[source,text]
----
averageSwapInPerSecond > maxAverageSwapInPagesPerSecond
&&
averageSwapOutPerSecond > maxAverageSwapOutPagesPerSecond
----
+
By default, `maxAverageSwapInPagesPerSecond` and `maxAverageSwapOutPagesPerSecond` are set to 1000 pages. The default time interval for calculating the average is 30 seconds.

High swap utilization::

This condition is met when swap utilization is excessively high, causing the current virtual memory usage to exceed the factored threshold. The `NODE_SWAP_SPACE` setting in your `MachineConfig` object can impact this condition.
+
.Condition
[source,text]
----
nodeWorkingSet + nodeSwapUsage < totalNodeMemory + totalSwapMemory × thresholdFactor
----

[id="environment-variables_{context}"]
== Environment variables

You can use the following environment variables to adjust the values used to calculate eviction conditions:

[cols="1,1"]
|===
|*Environment variable* |*Function*

|`MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND`
|Sets the value of `maxAverageSwapInPagesPerSecond`.

|`MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND`
|Sets the value of `maxAverageSwapOutPagesPerSecond`.

|`SWAP_UTILIZATION_THRESHOLD_FACTOR`
|Sets the `thresholdFactor` value used to calculate high swap utilization.

|`AVERAGE_WINDOW_SIZE_SECONDS`
|Sets the time interval for calculating the average swap usage.
|===
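
For example, you can adjust these values on the running `wasp-agent` DaemonSet with `oc set env`; this is a sketch, and the specific values shown are illustrative:

[source,terminal]
----
$ oc set env daemonset/wasp-agent -n wasp \
    SWAP_UTILIZATION_THRESHOLD_FACTOR=0.9 \
    AVERAGE_WINDOW_SIZE_SECONDS=60
----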

@@ -6,23 +6,16 @@ include::_attributes/common-attributes.adoc[]

toc::[]

To increase the number of virtual machines (VMs), you can configure a higher VM workload density in your cluster by overcommitting the amount of memory (RAM).
You can increase the number of virtual machines (VMs) on nodes by overcommitting memory (RAM). Increasing VM workload density can be useful in the following situations:

:FeatureName: Configuring higher workload density
include::snippets/technology-preview.adoc[]

The following workloads are especially suited for higher workload density:

* Many similar workloads
* Underused workloads
* You have many similar workloads.
* You have underused workloads.

[NOTE]
====
While overcommitted memory can lead to a higher workload density, it can also lower workload performance of a highly utilized system.
Memory overcommitment can lower workload performance on a highly utilized system.
====

include::modules/virt-using-wasp-agent-to-configure-higher-vm-workload-density.adoc[leveloffset=+1]

include::modules/virt-wasp-agent-pod-eviction.adoc[leveloffset=+1]