From bfb1856dde7aeb62d1d0007bcb8fd910faf73bbc Mon Sep 17 00:00:00 2001 From: Aidan Reilly Date: Fri, 11 Dec 2020 16:30:16 +0000 Subject: [PATCH] changes for CNF-802 updates fixed dupe content error updates for CNF-802 CNF-802 Tweaks latest changes for CNF-802 updated id updated id updates for @dshchedr fixes for globallyDisableIrqLoadBalancing typo and sundry formatting fixes updates for marcel jan 13 yupdates changes 'enables' to 'disables' for nabling_interrupt_processing_for_individual_pods_cnf typo typo API formatting typos updated install procedure for CLI and GUI. removed spec from pao yaml, and set install to apply to all namespaces added 4.6 > 4.7 upgrade info, removed obsolete 4.6 PAO developer preview warnings added 4.6 > 4.7 upgrade info, removed obsolete 4.6 PAO developer preview warnings updates for marcel updates for marcel - upgrade from 4.6 typo in cmd changed title changed title PAO upgrade tweak typo typo put back the object name typo typo --- ...figure_for_irq_dynamic_load_balancing.adoc | 174 ++++++++++++++++++ ...eating-the-performance-profile-object.adoc | 2 +- ...alling-the-performance-addon-operator.adoc | 5 +- ...sing-for-guaranteed-pod-isolated-cpus.adoc | 56 ++++++ ...o-end-tests-for-platform-verification.adoc | 4 +- ...g-real-time-and-low-latency-workloads.adoc | 4 +- ...or-low-latency-via-performanceprofile.adoc | 8 +- ...-upgrading-performance-addon-operator.adoc | 42 ++++- ...nterrupt-processing-for-isolated-cpus.adoc | 29 +++ ...-addon-operator-for-low-latency-nodes.adoc | 6 + 10 files changed, 304 insertions(+), 26 deletions(-) create mode 100644 modules/cnf-configure_for_irq_dynamic_load_balancing.adoc create mode 100644 modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc create mode 100644 modules/cnf-use-device-interrupt-processing-for-isolated-cpus.adoc diff --git a/modules/cnf-configure_for_irq_dynamic_load_balancing.adoc b/modules/cnf-configure_for_irq_dynamic_load_balancing.adoc new file mode 100644 index 0000000000..c7bf57c911 --- /dev/null +++ b/modules/cnf-configure_for_irq_dynamic_load_balancing.adoc @@ -0,0 +1,174 @@ +// Module included in the following assemblies: +// +// scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc + +[id="configuring_for_irq_dynamic_load_balancing_{context}"] += Configuring a node for IRQ dynamic load balancing + +To configure a cluster node to handle IRQ dynamic load balancing, do the following: + +. Log in to the {product-title} cluster as a user with cluster-admin privileges. +. Set the performance profile `apiVersion` to use `performance.openshift.io/v2`. +. Remove the `globallyDisableIrqLoadBalancing` field or set it to `false`. +. Set the appropriate isolated and reserved CPUs. The following snippet illustrates a profile that reserves 2 CPUs. IRQ load-balancing is enabled for pods running on the `isolated` CPU set: ++ +[source,yaml] +---- +apiVersion: performance.openshift.io/v2 +kind: PerformanceProfile +metadata: + name: dynamic-irq-profile +spec: + cpu: + isolated: 2-5 + reserved: 0-1 +... +---- + +. Create the pod that uses exclusive CPUs, and set `irq-load-balancing.crio.io` and `cpu-quota.crio.io` annotations to `disable`. 
For example:
++
+[source,yaml]
+----
+apiVersion: v1
+kind: Pod
+metadata:
+  name: dynamic-irq-pod
+  annotations:
+    irq-load-balancing.crio.io: "disable"
+    cpu-quota.crio.io: "disable"
+spec:
+  containers:
+  - name: dynamic-irq-pod
+    image: "quay.io/openshift-kni/cnf-tests:4.6"
+    command: ["sleep", "10h"]
+    resources:
+      requests:
+        cpu: 2
+        memory: "200M"
+      limits:
+        cpu: 2
+        memory: "200M"
+  nodeSelector:
+    node-role.kubernetes.io/worker-cnf: ""
+  runtimeClassName: performance-dynamic-irq-profile
+...
+----
+
+. Enter the pod `runtimeClassName` in the form performance-<profile_name>, where <profile_name> is the `name` from the `PerformanceProfile` YAML, in this example, `performance-dynamic-irq-profile`.
+. Set the node selector to target a cnf-worker node.
+. Ensure the pod is running correctly. The status should be `Running`, and the correct cnf-worker node should be set:
++
+[source,terminal]
+----
+$ oc get pod -o wide
+----
++
+.Expected output
+[source,terminal]
+----
+NAME              READY   STATUS    RESTARTS   AGE     IP             NODE          NOMINATED NODE   READINESS GATES
+dynamic-irq-pod   1/1     Running   0          5h33m   <ip-address>   <node-name>   <none>           <none>
+----
+. Get the CPUs that the pod configured for IRQ dynamic load balancing runs on:
++
+[source,terminal]
+----
+$ oc exec -it dynamic-irq-pod -- /bin/bash -c "grep Cpus_allowed_list /proc/self/status | awk '{print $2}'"
+----
++
+.Expected output
+[source,terminal]
+----
+Cpus_allowed_list: 2-3
+----
+. Ensure the node configuration is applied correctly. Log in to the node to verify the configuration:
++
+[source,terminal]
+----
+$ oc debug node/<node-name>
+----
++
+.Expected output
+[source,terminal]
+----
+Starting pod/<node-name>-debug ...
+To use host binaries, run `chroot /host`
+
+Pod IP: <ip-address>
+If you don't see a command prompt, try pressing enter.
+
+sh-4.4#
+----
+
+. Verify that you can use the node file system:
++
+[source,terminal]
+----
+sh-4.4# chroot /host
+----
++
+.Expected output
+[source,terminal]
+----
+sh-4.4#
+----
+
+. Ensure the default system CPU affinity mask does not include the `dynamic-irq-pod` CPUs, for example, CPUs 2 and 3 (see the mask decoding sketch after this procedure):
++
+[source,terminal]
+----
+$ cat /proc/irq/default_smp_affinity
+----
++
+.Example output
+[source,terminal]
+----
+33
+----
+. Ensure the system IRQs are not configured to run on the `dynamic-irq-pod` CPUs:
++
+[source,terminal]
+----
+$ find /proc/irq/ -name smp_affinity_list -exec sh -c 'i="$1"; mask=$(cat $i); file=$(echo $i); echo $file: $mask' _ {} \;
+----
++
+.Example output
+[source,terminal]
+----
+/proc/irq/0/smp_affinity_list: 0-5
+/proc/irq/1/smp_affinity_list: 5
+/proc/irq/2/smp_affinity_list: 0-5
+/proc/irq/3/smp_affinity_list: 0-5
+/proc/irq/4/smp_affinity_list: 0
+/proc/irq/5/smp_affinity_list: 0-5
+/proc/irq/6/smp_affinity_list: 0-5
+/proc/irq/7/smp_affinity_list: 0-5
+/proc/irq/8/smp_affinity_list: 4
+/proc/irq/9/smp_affinity_list: 4
+/proc/irq/10/smp_affinity_list: 0-5
+/proc/irq/11/smp_affinity_list: 0
+/proc/irq/12/smp_affinity_list: 1
+/proc/irq/13/smp_affinity_list: 0-5
+/proc/irq/14/smp_affinity_list: 1
+/proc/irq/15/smp_affinity_list: 0
+/proc/irq/24/smp_affinity_list: 1
+/proc/irq/25/smp_affinity_list: 1
+/proc/irq/26/smp_affinity_list: 1
+/proc/irq/27/smp_affinity_list: 5
+/proc/irq/28/smp_affinity_list: 1
+/proc/irq/29/smp_affinity_list: 0
+/proc/irq/30/smp_affinity_list: 0-5
+----
+
+Some IRQ controllers do not support IRQ re-balancing and always expose all online CPUs as the IRQ mask. These IRQ controllers effectively run on CPU 0.
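+
+When checking `/proc/irq/default_smp_affinity` in the earlier step, note that the value is a hexadecimal CPU bit mask, so it can be easier to decode it than to read it by eye. The following is a minimal sketch, run from the node debug shell, that checks whether CPUs 2 and 3 (the `dynamic-irq-pod` CPUs in this example) are set in the default mask. It assumes the mask is small enough to contain no comma separators; a value of `0` confirms that the CPU is excluded:
+
+[source,terminal]
+----
+sh-4.4# mask=0x$(cat /proc/irq/default_smp_affinity)
+sh-4.4# for cpu in 2 3; do echo "CPU ${cpu} in default mask: $(( (mask >> cpu) & 1 ))"; done
+CPU 2 in default mask: 0
+CPU 3 in default mask: 0
+----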
For more information on the host configuration, log in to the host and run the following command, replacing `<irq_num>` with the IRQ number that you want to query:
+
+[source,terminal]
+----
+$ cat /proc/irq/<irq_num>/effective_affinity
+----
diff --git a/modules/cnf-creating-the-performance-profile-object.adoc b/modules/cnf-creating-the-performance-profile-object.adoc
index 2d68540586..585032156b 100644
--- a/modules/cnf-creating-the-performance-profile-object.adoc
+++ b/modules/cnf-creating-the-performance-profile-object.adoc
@@ -27,7 +27,7 @@ will be reserved for housekeeping, and CPUs that will be used for running the wo
 This is a typical performance profile:
 +
 ----
-apiversion: performance.openshift.io/v1alpha1
+apiVersion: performance.openshift.io/v2
 kind: PerformanceProfile
 metadata:
   name:
diff --git a/modules/cnf-installing-the-performance-addon-operator.adoc b/modules/cnf-installing-the-performance-addon-operator.adoc
index 647fc5aef4..6999370704 100644
--- a/modules/cnf-installing-the-performance-addon-operator.adoc
+++ b/modules/cnf-installing-the-performance-addon-operator.adoc
@@ -55,9 +55,6 @@ kind: OperatorGroup
 metadata:
   name: openshift-performance-addon-operator
   namespace: openshift-performance-addon-operator
-spec:
-  targetNamespaces:
-  - openshift-performance-addon-operator
 ----
 
 .. Create the `OperatorGroup` CR by running the following command:
@@ -132,7 +129,7 @@
 
 .. Choose *Performance Addon Operator* from the list of available Operators, and then click *Install*.
 
-.. On the *Install Operator* page, under *A specific namespace on the cluster* select *openshift-performance-addon-operator*. Then, click *Install*.
+.. On the *Install Operator* page, select *All namespaces on the cluster*. Then, click *Install*.
 
 . Optional: Verify that the performance-addon-operator installed successfully:
diff --git a/modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc b/modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc
new file mode 100644
index 0000000000..805706dc41
--- /dev/null
+++ b/modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc
@@ -0,0 +1,56 @@
+// CNF-802 Infrastructure-provided interrupt processing for guaranteed pod CPUs
+// Module included in the following assemblies:
+//
+// *cnf-performance-addon-operator-for-low-latency-nodes.adoc
+
+[id="managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus_{context}"]
+= Managing device interrupt processing for guaranteed pod isolated CPUs
+
+The Performance Addon Operator manages host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, and isolated CPUs for workloads. CPUs that are used for low latency workloads are set as isolated.
+
+Device interrupts are load balanced between all isolated and reserved CPUs to avoid CPUs being overloaded, with the exception of CPUs where there is a guaranteed pod running. Guaranteed pod CPUs are prevented from processing device interrupts when the relevant annotations are set for the pod.
+
+In the performance profile, `globallyDisableIrqLoadBalancing` is used to manage whether device interrupts are processed or not. For certain workloads, the reserved CPUs are not always sufficient for dealing with device interrupts, and for this reason, device interrupts are not globally disabled on the isolated CPUs.
By default, Performance Addon Operator does not disable device interrupts on isolated CPUs.
+
+To achieve low latency for workloads, some (but not all) pods require the CPUs they are running on to not process device interrupts. A pod annotation, `irq-load-balancing.crio.io`, is used to define whether device interrupts are processed or not. When configured, CRI-O disables device interrupts only for as long as the pod is running.
+
+[id="configuring-global-device-interrupts-handling-for-isolated-cpus_{context}"]
+== Disabling global device interrupts handling in Performance Addon Operator
+
+To configure Performance Addon Operator to disable global device interrupts for the isolated CPU set, set the `globallyDisableIrqLoadBalancing` field in the performance profile to `true`. When `true`, conflicting pod annotations are ignored. When `false`, IRQ loads are balanced across all CPUs.
+
+The following performance profile snippet illustrates this setting:
+
+[source,yaml]
+----
+apiVersion: performance.openshift.io/v2
+kind: PerformanceProfile
+metadata:
+  name: manual
+spec:
+  globallyDisableIrqLoadBalancing: true
+...
+----
+
+[id="disabling_interrupt_processing_for_individual_pods_{context}"]
+== Disabling interrupt processing for individual pods
+
+To disable interrupt processing for individual pods, ensure that `globallyDisableIrqLoadBalancing` is set to `false` in the performance profile. Then, in the pod specification, set the `irq-load-balancing.crio.io` and `cpu-quota.crio.io` pod annotations to `disable`. The following pod specification snippet illustrates this, where <profile_name> is the `name` from the `PerformanceProfile` YAML:
+
+[source,yaml]
+----
+apiVersion: v1
+kind: Pod
+metadata:
+  annotations:
+    irq-load-balancing.crio.io: "disable"
+    cpu-quota.crio.io: "disable"
+spec:
+  runtimeClassName: performance-<profile_name>
+...
+----
+
diff --git a/modules/cnf-performing-end-to-end-tests-for-platform-verification.adoc b/modules/cnf-performing-end-to-end-tests-for-platform-verification.adoc
index 650b6ee79b..aa0aba18f5 100644
--- a/modules/cnf-performing-end-to-end-tests-for-platform-verification.adoc
+++ b/modules/cnf-performing-end-to-end-tests-for-platform-verification.adoc
@@ -444,7 +444,7 @@ To do this, a profile like the following one can be mounted inside the container
 
 [source,yaml]
 ----
-apiVersion: performance.openshift.io/v1
+apiVersion: performance.openshift.io/v2
 kind: PerformanceProfile
 metadata:
   name: performance
@@ -552,7 +552,7 @@ To do this, use a profile like the following one that can be mounted inside the
 
 [source,yaml]
 ----
-apiVersion: performance.openshift.io/v1
+apiVersion: performance.openshift.io/v2
 kind: PerformanceProfile
 metadata:
   name: performance
diff --git a/modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc b/modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc
index cab9d0179f..9423cd7353 100644
--- a/modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc
+++ b/modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc
@@ -48,7 +48,7 @@ You must decide which nodes will be configured with real-time workloads. It coul
 +
 [source,yaml]
 ----
-apiVersion: performance.openshift.io/v1
+apiVersion: performance.openshift.io/v2
 kind: PerformanceProfile
 metadata:
   name: example-performanceprofile
@@ -176,7 +176,7 @@ Functionality to disable or enable CPU load balancing is implemented on the CRI-
 +
 [source,yaml]
 ----
-apiVersion: performance.openshift.io/v1
+apiVersion: performance.openshift.io/v2
 kind: PerformanceProfile
 ...
status: diff --git a/modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc b/modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc index 088e445543..4bf3a1075b 100644 --- a/modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc +++ b/modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc @@ -3,12 +3,6 @@ // Epic CNF-422 (4.5) // scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc -[IMPORTANT] -==== -The feature described in this document is for *Developer Preview* purposes and is *not supported* by Red Hat at this time. -This feature could cause nodes to reboot and not be available. -==== - [id="cnf-tuning-nodes-for-low-latency-via-performanceprofile_{context}"] = Tuning nodes for low latency with the performance profile @@ -32,7 +26,7 @@ This is a typical performance profile: + [source,yaml] ---- -apiVersion: performance.openshift.io/v1 +apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: performance diff --git a/modules/cnf-upgrading-performance-addon-operator.adoc b/modules/cnf-upgrading-performance-addon-operator.adoc index a766edebd2..b2b2569825 100644 --- a/modules/cnf-upgrading-performance-addon-operator.adoc +++ b/modules/cnf-upgrading-performance-addon-operator.adoc @@ -1,24 +1,16 @@ // Module included in the following assemblies: // // * scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc -// - -[IMPORTANT] -==== -The feature described in this document is for *Developer Preview* purposes and is *not supported* by Red Hat at this time. -This feature could cause nodes to reboot and not be available. -==== [id="upgrading-performance-addon-operator_{context}"] = Upgrading Performance Addon Operator -You can manually upgrade to the next minor version of Performance Addon Operator and monitor the status of an update -by using the web console. +You can manually upgrade to the next minor version of Performance Addon Operator and monitor the status of an update by using the web console. [id="about-upgrading-performance-addon-operator_{context}"] == About upgrading Performance Addon Operator -* You can upgrade to the next minor version of Performance Addon Operator by using the OpenShift web console to change the channel of your Operator subscription. +* You can upgrade to the next minor version of Performance Addon Operator by using the {product-title} web console to change the channel of your Operator subscription. * You can enable automatic z-stream updates during Performance Addon Operator installation. @@ -63,6 +55,36 @@ You can manually upgrade Performance Addon Operator to the next minor version by $ oc get csv -n openshift-performance-addon-operator ---- +[id="upgrading-performance-addon-operator-configured-for-a-specific-namespace_{context}"] +=== Upgrading Performance Addon Operator when previously installed to a specific namespace + +If you previously installed the Performance Addon Operator to a specific namespace on the cluster, for example `openshift-performance-addon-operator`, modify the `OperatorGroup` object to remove the `targetNamespaces` entry before upgrading. + +.Prerequisites + +* Install the {product-title} CLI (oc). +* Log in to the OpenShift cluster as a user with cluster-admin privileges. + +.Procedure + +. 
Edit the Performance Addon Operator `OperatorGroup` CR and remove the `spec` element that contains the `targetNamespaces` entry by running the following command:
++
+[source,terminal]
+----
+$ oc patch operatorgroup -n openshift-performance-addon-operator openshift-performance-addon-operator --type json -p '[{ "op": "remove", "path": "/spec" }]'
+----
+
+. Wait until the Operator Lifecycle Manager (OLM) processes the change.
+. Verify that the `OperatorGroup` CR change has been successfully applied. Check that the `OperatorGroup` CR `spec` element has been removed:
++
+[source,terminal]
+----
+$ oc describe -n openshift-performance-addon-operator og openshift-performance-addon-operator
+----
+
+. Proceed with the Performance Addon Operator upgrade.
+//. Proceed with the xref:../scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes#upgrading-performance-addon-operator_{context}[Performance Addon Operator upgrade].
+
 [id="performance-addon-operator-monitoring-upgrade-status_{context}"]
 == Monitoring upgrade status
 
 The best way to monitor Performance Addon Operator upgrade status is to watch the `ClusterServiceVersion` (CSV) `PHASE`.
diff --git a/modules/cnf-use-device-interrupt-processing-for-isolated-cpus.adoc b/modules/cnf-use-device-interrupt-processing-for-isolated-cpus.adoc
new file mode 100644
index 0000000000..902d5ca028
--- /dev/null
+++ b/modules/cnf-use-device-interrupt-processing-for-isolated-cpus.adoc
@@ -0,0 +1,29 @@
+// CNF-802 Infrastructure-provided interrupt processing for guaranteed pod CPUs
+// Module included in the following assemblies:
+//
+// *cnf-performance-addon-operator-for-low-latency-nodes.adoc
+
+[id="use-device-interrupt-processing-for-isolated-cpus_{context}"]
+= Upgrading the performance profile to use device interrupt processing
+
+When you upgrade the Performance Addon Operator performance profile custom resource definition (CRD) from v1 or v1alpha1 to v2, `globallyDisableIrqLoadBalancing` is set to `true` on existing profiles.
+
+[NOTE]
+====
+When `globallyDisableIrqLoadBalancing` is set to `true`, device interrupts are not processed on the isolated CPU set and any conflicting pod annotations are ignored, which preserves the behavior of the v1 and v1alpha1 profiles. When it is set to `false`, device interrupts are load balanced across all CPUs, except the CPUs of guaranteed pods that disable interrupt processing with pod annotations.
+====
+
+[id="pao_supported_api_versions_{context}"]
+== Supported API versions
+
+The Performance Addon Operator supports `v2`, `v1`, and `v1alpha1` for the performance profile `apiVersion` field. The v1 and v1alpha1 APIs are identical. The v2 API includes an optional boolean field `globallyDisableIrqLoadBalancing` with a default value of `false`.
+
+[id="upgrading_pao_api_from_v1alpha1_to_v1_{context}"]
+=== Upgrading the Performance Addon Operator API from v1alpha1 to v1
+
+When you upgrade the Performance Addon Operator API version from v1alpha1 to v1, the v1alpha1 performance profiles are converted on the fly by using a "None" conversion strategy and served to the Performance Addon Operator with API version v1.
+
+[id="upgrading_pao_api_from_v1alpha1_to_v1_or_v2_{context}"]
+=== Upgrading the Performance Addon Operator API from v1alpha1 or v1 to v2
+
+When you upgrade from an older Performance Addon Operator API version, the existing v1 and v1alpha1 performance profiles are converted by using a conversion webhook that injects the `globallyDisableIrqLoadBalancing` field with a value of `true`.
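+
+After the upgrade, it can be useful to confirm how existing profiles were converted. The following is a minimal sketch that requests the `v2` API explicitly and prints the injected `globallyDisableIrqLoadBalancing` value for each profile; the profile name `manual` is only an example:
+
+[source,terminal]
+----
+$ oc get performanceprofiles.v2.performance.openshift.io -o custom-columns=NAME:.metadata.name,GLOBALLY_DISABLE_IRQ_LB:.spec.globallyDisableIrqLoadBalancing
+----
+
+.Example output
+[source,terminal]
+----
+NAME     GLOBALLY_DISABLE_IRQ_LB
+manual   true
+----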
diff --git a/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc b/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc index eae8b8d05e..7dc9a3102c 100644 --- a/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc +++ b/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc @@ -14,6 +14,12 @@ include::modules/cnf-upgrading-performance-addon-operator.adoc[leveloffset=+1] include::modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc[leveloffset=+1] +include::modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc[leveloffset=+2] + +include::modules/cnf-use-device-interrupt-processing-for-isolated-cpus.adoc[leveloffset=+2] + +include::modules/cnf-configure_for_irq_dynamic_load_balancing.adoc[leveloffset=+2] + include::modules/cnf-configuring-huge-pages.adoc[leveloffset=+1] include::modules/cnf-allocating-multiple-huge-page-sizes.adoc[leveloffset=+1]