
changes for CNF-802 updates

fixed dupe content error

updates for CNF-802

CNF-802 Tweaks

latest changes for CNF-802

updated id

updated id

updates for @dshchedr

fixes for globallyDisableIrqLoadBalancing typo and sundry formatting fixes

updates for marcel jan 13

updates

changes 'enables' to 'disables' for nabling_interrupt_processing_for_individual_pods_cnf

typo

typo

API formatting typos

updated install procedure for CLI and GUI. removed spec from pao yaml, and set install to apply to all namespaces

added 4.6 > 4.7 upgrade info, removed obsolete 4.6 PAO developer preview warnings

added 4.6 > 4.7 upgrade info, removed obsolete 4.6 PAO developer preview warnings

updates for marcel

updates for marcel - upgrade from 4.6

typo in cmd

changed title

changed title

PAO upgrade tweak

typo

typo

put back the object name

typo

typo
This commit is contained in:
Aidan Reilly
2020-12-11 16:30:16 +00:00
committed by openshift-cherrypick-robot
parent ac8ac938e7
commit bfb1856dde
10 changed files with 304 additions and 26 deletions

View File

@@ -0,0 +1,174 @@
// Module included in the following assemblies:
//
// scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
[id="configuring_for_irq_dynamic_load_balancing_{context}"]
= Configuring a node for IRQ dynamic load balancing
To configure a cluster node to handle IRQ dynamic load balancing, do the following:
. Log in to the {product-title} cluster as a user with cluster-admin privileges.
. Set the performance profile `apiVersion` to use `performance.openshift.io/v2`.
. Remove the `globallyDisableIrqLoadBalancing` field or set it to `false`.
. Set the appropriate isolated and reserved CPUs. The following snippet illustrates a profile that reserves 2 CPUs. IRQ load-balancing is enabled for pods running on the `isolated` CPU set:
+
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: dynamic-irq-profile
spec:
  cpu:
    isolated: 2-5
    reserved: 0-1
...
----
. Create the pod that uses exclusive CPUs, and set `irq-load-balancing.crio.io` and `cpu-quota.crio.io` annotations to `disable`. For example:
+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: dynamic-irq-pod
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  containers:
  - name: dynamic-irq-pod
    image: "quay.io/openshift-kni/cnf-tests:4.6"
    command: ["sleep", "10h"]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  runtimeClassName: performance-dynamic-irq-profile
...
----
. Set the pod `runtimeClassName` in the form `performance-<profile_name>`, where `<profile_name>` is the `name` from the `PerformanceProfile` YAML. In this example, that is `performance-dynamic-irq-profile`.
. Set the node selector to target a cnf-worker node.
. Ensure that the pod is running correctly. The status should be `Running`, and the correct cnf-worker node should be set:
+
[source,terminal]
----
$ oc get pod -o wide
----
+
.Expected output
+
[source,terminal]
----
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dynamic-irq-pod 1/1 Running 0 5h33m <ip-address> <node-name> <none> <none>
----
. Get the CPUs that the pod configured for IRQ dynamic load balancing runs on:
+
[source,terminal]
----
$ oc exec -it dynamic-irq-pod -- /bin/bash -c "grep Cpus_allowed_list /proc/self/status | awk '{print $2}'"
----
+
.Expected output
+
[source,terminal]
----
Cpus_allowed_list: 2-3
----
. Ensure that the node configuration is applied correctly. Connect to the node to verify the configuration:
+
[source,terminal]
----
$ oc debug node/<node-name>
----
+
.Expected output
+
[source,terminal]
----
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
Pod IP: <ip-address>
If you don't see a command prompt, try pressing enter.
sh-4.4#
----
. Verify that you can use the node file system:
+
[source,terminal]
----
sh-4.4# chroot /host
----
+
.Expected output
+
[source,terminal]
----
sh-4.4#
----
. Ensure the default system CPU affinity mask does not include the `dynamic-irq-pod` CPUs, for example, CPUs 2 and 3:
+
[source,terminal]
----
$ cat /proc/irq/default_smp_affinity
----
+
.Example output
+
[source,terminal]
----
33
----
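+
In this example, the default affinity mask `33` is hexadecimal `0x33`, which is binary `110011`. This corresponds to CPUs 0, 1, 4, and 5, so the pod CPUs 2 and 3 are excluded from default interrupt handling.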
. Ensure the system IRQs are not configured to run on the `dynamic-irq-pod` CPUs:
+
[source,terminal]
----
$ find /proc/irq/ -name smp_affinity_list -exec sh -c 'i="$1"; mask=$(cat $i); file=$(echo $i); echo $file: $mask' _ {} \;
----
+
.Example output
+
[source,terminal]
----
/proc/irq/0/smp_affinity_list: 0-5
/proc/irq/1/smp_affinity_list: 5
/proc/irq/2/smp_affinity_list: 0-5
/proc/irq/3/smp_affinity_list: 0-5
/proc/irq/4/smp_affinity_list: 0
/proc/irq/5/smp_affinity_list: 0-5
/proc/irq/6/smp_affinity_list: 0-5
/proc/irq/7/smp_affinity_list: 0-5
/proc/irq/8/smp_affinity_list: 4
/proc/irq/9/smp_affinity_list: 4
/proc/irq/10/smp_affinity_list: 0-5
/proc/irq/11/smp_affinity_list: 0
/proc/irq/12/smp_affinity_list: 1
/proc/irq/13/smp_affinity_list: 0-5
/proc/irq/14/smp_affinity_list: 1
/proc/irq/15/smp_affinity_list: 0
/proc/irq/24/smp_affinity_list: 1
/proc/irq/25/smp_affinity_list: 1
/proc/irq/26/smp_affinity_list: 1
/proc/irq/27/smp_affinity_list: 5
/proc/irq/28/smp_affinity_list: 1
/proc/irq/29/smp_affinity_list: 0
/proc/irq/30/smp_affinity_list: 0-5
----
Some IRQ controllers do not support IRQ re-balancing and always expose all online CPUs as the IRQ mask. These IRQ controllers effectively run on CPU 0. To check the effective affinity for a specific interrupt on the host, log in to the host and run the following command, replacing `<irq-num>` with the IRQ number that you want to query:
[source,terminal]
----
$ cat /proc/irq/<irq-num>/effective_affinity
----
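In this file, the affinity is expressed as a hexadecimal CPU bitmask, so a value with only the lowest bit set indicates that the interrupt is effectively delivered to CPU 0.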

View File

@@ -27,7 +27,7 @@ will be reserved for housekeeping, and CPUs that will be used for running the wo
This is a typical performance profile:
+
----
apiversion: performance.openshift.io/v1alpha1
apiversion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: <unique-name>

View File

@@ -55,9 +55,6 @@ kind: OperatorGroup
metadata:
name: openshift-performance-addon-operator
namespace: openshift-performance-addon-operator
spec:
targetNamespaces:
- openshift-performance-addon-operator
----
.. Create the `OperatorGroup` CR by running the following command:
@@ -132,7 +129,7 @@ You must create the `Namespace` CR and `OperatorGroup` CR as mentioned in the pr
.. Choose *Performance Addon Operator* from the list of available Operators, and then click *Install*.
.. On the *Install Operator* page, under *A specific namespace on the cluster* select *openshift-performance-addon-operator*. Then, click *Install*.
.. On the *Install Operator* page, select *All namespaces on the cluster*. Then, click *Install*.
. Optional: Verify that the Performance Addon Operator installed successfully:

View File

@@ -0,0 +1,56 @@
// CNF-802 Infrastructure-provided interrupt processing for guaranteed pod CPUs
// Module included in the following assemblies:
//
// *cnf-performance-addon-operator-for-low-latency-nodes.adoc
[id="managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus_{context}"]
= Managing device interrupt processing for guaranteed pod isolated CPUs
The Performance Addon Operator manages host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, and isolated CPUs for workloads. CPUs that are used for low latency workloads are set as isolated.
Device interrupts are load balanced between all isolated and reserved CPUs in order to avoid CPUs being overloaded, with the exception of CPUs where there is a guaranteed pod running. Guaranteed pod CPUs are prevented from processing device interrupts when the relevant annotations are set for the pod.
In the performance profile, `globallyDisableIrqLoadBalancing` is used to manage whether device interrupts are processed on the isolated CPU set. For certain workloads, the reserved CPUs are not always sufficient for dealing with device interrupts, and for this reason, device interrupts are not globally disabled on the isolated CPUs. By default, Performance Addon Operator does not disable device interrupts on isolated CPUs.
To achieve low latency for workloads, some (but not all) pods require the CPUs they are running on to not process device interrupts. A pod annotation, `irq-load-balancing.crio.io`, is used to define whether device interrupts are processed or not. When configured, CRI-O disables device interrupts only as long as the pod is running.
[id="configuring-global-device-interrupts-handling-for-isolated-cpus_{context}"]
== Disabling global device interrupts handling in Performance Addon Operator
To configure Performance Addon Operator to disable global device interrupts for the isolated CPU set, set the `globallyDisableIrqLoadBalancing` field in the performance profile to `true`. When `true`, conflicting pod annotations are ignored. When `false`, IRQ loads are balanced across all CPUs.
A performance profile snippet illustrates this setting:
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  globallyDisableIrqLoadBalancing: true
...
----
[id="disabling_interrupt_processing_for_individual_pods_{context}"]
== Disabling interrupt processing for individual pods
To disable interrupt processing for individual pods, ensure that `globallyDisableIrqLoadBalancing` is set to `false` in the performance profile. Then, in the pod specification, set the `irq-load-balancing.crio.io` and `cpu-quota.crio.io` pod annotations to `disable`. The following pod specification snippet illustrates this configuration:
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  runtimeClassName: performance-<profile_name>
...
----

View File

@@ -444,7 +444,7 @@ To do this, a profile like the following one can be mounted inside the container
[source,yaml]
----
apiVersion: performance.openshift.io/v1
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: performance
@@ -552,7 +552,7 @@ To do this, use a profile like the following one that can be mounted inside the
[source,yaml]
----
apiVersion: performance.openshift.io/v1
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: performance

View File

@@ -48,7 +48,7 @@ You must decide which nodes will be configured with real-time workloads. It coul
+
[source,yaml]
----
apiVersion: performance.openshift.io/v1
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: example-performanceprofile
@@ -176,7 +176,7 @@ Functionality to disable or enable CPU load balancing is implemented on the CRI-
+
[source,yaml]
----
apiVersion: performance.openshift.io/v1
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
...
status:

View File

@@ -3,12 +3,6 @@
// Epic CNF-422 (4.5)
// scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
[IMPORTANT]
====
The feature described in this document is for *Developer Preview* purposes and is *not supported* by Red Hat at this time.
This feature could cause nodes to reboot and not be available.
====
[id="cnf-tuning-nodes-for-low-latency-via-performanceprofile_{context}"]
= Tuning nodes for low latency with the performance profile
@@ -32,7 +26,7 @@ This is a typical performance profile:
+
[source,yaml]
----
apiVersion: performance.openshift.io/v1
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: performance

View File

@@ -1,24 +1,16 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
//
[IMPORTANT]
====
The feature described in this document is for *Developer Preview* purposes and is *not supported* by Red Hat at this time.
This feature could cause nodes to reboot and not be available.
====
[id="upgrading-performance-addon-operator_{context}"]
= Upgrading Performance Addon Operator
You can manually upgrade to the next minor version of Performance Addon Operator and monitor the status of an update
by using the web console.
You can manually upgrade to the next minor version of Performance Addon Operator and monitor the status of an update by using the web console.
[id="about-upgrading-performance-addon-operator_{context}"]
== About upgrading Performance Addon Operator
* You can upgrade to the next minor version of Performance Addon Operator by using the OpenShift web console to change the channel of your Operator subscription.
* You can upgrade to the next minor version of Performance Addon Operator by using the {product-title} web console to change the channel of your Operator subscription.
* You can enable automatic z-stream updates during Performance Addon Operator installation.
@@ -63,6 +55,36 @@ You can manually upgrade Performance Addon Operator to the next minor version by
$ oc get csv -n openshift-performance-addon-operator
----
[id="upgrading-performance-addon-operator-configured-for-a-specific-namespace_{context}"]
=== Upgrading Performance Addon Operator when previously installed to a specific namespace
If you previously installed the Performance Addon Operator to a specific namespace on the cluster, for example `openshift-performance-addon-operator`, modify the `OperatorGroup` object to remove the `targetNamespaces` entry before upgrading.
.Prerequisites
* Install the {product-title} CLI (`oc`).
* Log in to the {product-title} cluster as a user with cluster-admin privileges.
.Procedure
. Edit the Performance Addon Operator `OperatorGroup` CR and remove the `spec` element that contains the `targetNamespaces` entry by running the following command:
+
[source,terminal]
----
$ oc patch operatorgroup -n openshift-performance-addon-operator openshift-performance-addon-operator --type json -p '[{ "op": "remove", "path": "/spec" }]'
----
. Wait until the Operator Lifecycle Manager (OLM) processes the change.
. Verify that the `OperatorGroup` CR change was successfully applied by checking that the `spec` element has been removed:
+
[source,terminal]
----
$ oc describe -n openshift-performance-addon-operator og openshift-performance-addon-operator
----
. Proceed with the Performance Addon Operator upgrade.
//. Proceed with the xref:../scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes#upgrading-performance-addon-operator_{context}[Performance Addon Operator upgrade].
[id="performance-addon-operator-monitoring-upgrade-status_{context}"]
== Monitoring upgrade status
The best way to monitor Performance Addon Operator upgrade status is to watch the `ClusterServiceVersion` (CSV) `PHASE`.

View File

@@ -0,0 +1,29 @@
// CNF-802 Infrastructure-provided interrupt processing for guaranteed pod CPUs
// Module included in the following assemblies:
//
// *cnf-performance-addon-operator-for-low-latency-nodes.adoc
[id="use-device-interrupt-processing-for-isolated-cpus_{context}"]
= Upgrading the performance profile to use device interrupt processing
When you upgrade the Performance Addon Operator performance profile custom resource definition (CRD) from v1 or v1alpha1 to v2, `globallyDisableIrqLoadBalancing` is set to `true` on existing profiles.
[NOTE]
====
When `globallyDisableIrqLoadBalancing` is set to `true`, IRQ load balancing is disabled for the isolated CPU set and device interrupts are processed on the reserved CPUs only. This preserves the behavior of existing v1 and v1alpha1 profiles.
====
[id="pao_supported_api_versions_{context}"]
== Supported API versions
The Performance Addon Operator supports `v2`, `v1`, and `v1alpha1` for the performance profile `apiVersion` field. The v1 and v1alpha1 APIs are identical. The v2 API includes an optional boolean field `globallyDisableIrqLoadBalancing` with a default value of `false`.
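For example, a v2 performance profile that sets this field explicitly looks similar to the following snippet. The profile name and CPU sets shown here are illustrative:
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-profile
spec:
  cpu:
    isolated: 2-5
    reserved: 0-1
  globallyDisableIrqLoadBalancing: false
----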
[id="upgrading_pao_api_from_v1alpha1_to_v1_{context}"]
=== Upgrading Performance Addon Operator API from v1alpha1 to v1
When upgrading the Performance Addon Operator API version from v1alpha1 to v1, the v1alpha1 performance profiles are converted on the fly using a "None" conversion strategy and served to the Performance Addon Operator with API version v1.
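For reference, the "None" conversion strategy is declared on the `CustomResourceDefinition` itself. The following is a minimal sketch, assuming the CRD is named `performanceprofiles.performance.openshift.io`:
[source,yaml]
----
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: performanceprofiles.performance.openshift.io
spec:
  group: performance.openshift.io
  conversion:
    # With the None strategy, the API server changes only the apiVersion field
    # and does not modify the rest of the object.
    strategy: None
...
----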
[id="upgrading_pao_api_from_v1alpha1_to_v1_or_v2_{context}"]
=== Upgrading Performance Addon Operator API from v1alpha1 or v1 to v2
When upgrading from an older Performance Addon Operator API version, the existing v1 and v1alpha1 performance profiles are converted using a conversion webhook that injects the `globallyDisableIrqLoadBalancing` field with a value of `true`.

View File

@@ -14,6 +14,12 @@ include::modules/cnf-upgrading-performance-addon-operator.adoc[leveloffset=+1]
include::modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc[leveloffset=+1]
include::modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc[leveloffset=+2]
include::modules/cnf-use-device-interrupt-processing-for-isolated-cpus.adoc[leveloffset=+2]
include::modules/cnf-configure_for_irq_dynamic_load_balancing.adoc[leveloffset=+2]
include::modules/cnf-configuring-huge-pages.adoc[leveloffset=+1]
include::modules/cnf-allocating-multiple-huge-page-sizes.adoc[leveloffset=+1]