// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-cpu-partitioning-and-performance-tuning_{context}"]
= CPU partitioning and performance tuning

New in this release::
* Optional support for the `acpi_idle` CPUIdle driver.
* The `systemReserved` field replaces the `autoSizingReserved` field to specify 11Gi of reserved memory for worker nodes and 30Gi for control plane nodes.
* Enable triggering a kernel panic through a non-maskable interrupt for system recovery and diagnostic purposes when `x86_64` architecture nodes become unresponsive.

Description::
CPU partitioning improves performance and reduces latency by separating sensitive workloads from general-purpose tasks, interrupts, and driver work queues.
The CPUs allocated to those auxiliary processes are referred to as *reserved* in the following sections.
In a system with Hyper-Threading enabled, a CPU is one hyper-thread.

Limits and requirements::
* The operating system needs a certain amount of CPU to perform all the support tasks, including kernel networking.
** A system with just user plane networking applications (DPDK) needs at least one core (2 hyper-threads when enabled) reserved for the operating system and the infrastructure components.
* In a system with Hyper-Threading enabled, core sibling threads must always be in the same pool of CPUs.
* The set of reserved and isolated cores must include all CPU cores.
* Core 0 of each NUMA node must be included in the reserved CPU set.
* Low latency workloads require special configuration to avoid being affected by interrupts, the kernel scheduler, or other parts of the platform.
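+
The following `PerformanceProfile` snippet is a minimal sketch of a reserved and isolated CPU layout that satisfies the preceding requirements. It assumes a hypothetical single-NUMA node with 32 cores (64 hyper-threads) where CPU `n` and CPU `n+32` are sibling threads; the profile name, CPU ranges, and node selector are illustrative only. Confirm the actual sibling and NUMA layout, for example with `lscpu -e`, before setting these values.
+
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  cpu:
    # Reserved CPUs run the operating system, housekeeping tasks, and
    # infrastructure components. Core 0 (hyper-threads 0 and 32) is
    # included, and both sibling threads of each core stay in one pool.
    reserved: "0-3,32-35"
    # Isolated CPUs are dedicated to latency-sensitive workloads.
    # Together with the reserved set, they cover all 64 hyper-threads.
    isolated: "4-31,36-63"
  nodeSelector:
    node-role.kubernetes.io/worker: ""
----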

For more information, see "Creating a performance profile".

Engineering considerations::
* As of OpenShift 4.19, `cgroup v1` is no longer supported and has been removed.
All workloads must now be compatible with `cgroup v2`.
For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads].
* The minimum required reserved capacity (`systemReserved`) can be determined by following the guidance in link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OCP 4 nodes?].
** The specific values must be custom-tuned for each cluster based on its size and application workload.
** The minimum recommended `systemReserved` memory is 11Gi for worker nodes and 30Gi for control plane nodes.
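+
One way to apply this value is through a `KubeletConfig` resource, as in the following sketch. The resource name is illustrative, the selector assumes the default worker machine config pool label, and control plane nodes would need an equivalent resource that reserves 30Gi.
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: system-reserved-worker
spec:
  machineConfigPoolSelector:
    matchLabels:
      # Default label on the worker machine config pool.
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    systemReserved:
      # Minimum recommended reserved memory for worker nodes.
      memory: 11Gi
----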
* For schedulable control planes, the minimum recommended reserved capacity is at least 16 CPUs.
* The actual required reserved CPU capacity depends on the cluster configuration and workload attributes.
* The reserved CPU value must be rounded up to a full core (2 hyper-threads) alignment.
* Changes to CPU partitioning cause the nodes contained in the relevant machine config pool to be drained and rebooted.
* Reserved CPUs reduce pod density, because they are removed from the allocatable capacity of the {product-title} node.
* The real-time workload hint should be enabled for real-time capable workloads.
** Applying the real-time `workloadHint` setting results in the `nohz_full` kernel command-line parameter being applied to improve the performance of high-performance applications.
When you apply the `workloadHint` setting, any isolated or burstable pods that do not have the `cpu-quota.crio.io: "disable"` annotation and a proper `runtimeClassName` value are subject to CRI-O rate limiting.
When you set the `workloadHint` parameter, be aware of the tradeoff between increased performance and the potential impact of CRI-O rate limiting.
Ensure that required pods are correctly annotated, as shown in the following example.
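+
The following pod sketch shows the annotation and runtime class in context. The pod name, image, and resource values are illustrative, and the runtime class name assumes a performance profile named `openshift-node-performance-profile`, because the Node Tuning Operator creates a runtime class named `performance-<profile_name>`.
+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: low-latency-workload
  annotations:
    # Exempt this pod from CRI-O CPU quota rate limiting.
    cpu-quota.crio.io: "disable"
spec:
  # Runtime class created for the performance profile.
  runtimeClassName: performance-openshift-node-performance-profile
  containers:
  - name: app
    image: registry.example.com/low-latency-app:latest  # hypothetical image
    resources:
      # Equal requests and limits give the pod guaranteed CPU QoS.
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
----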
* Hardware that does not support IRQ affinity affects the isolated CPUs.
All server hardware must support IRQ affinity to ensure that pods with guaranteed CPU QoS can fully use the allocated CPUs.
* OVS dynamically manages its `cpuset` entry to adapt to network traffic needs.
You do not need to reserve an additional CPU for handling high network throughput on the primary CNI.
* If workloads running on the cluster use kernel-level networking, the RX/TX queue count for the participating NICs should be set to 16 or 32 queues if the hardware permits it.
Be aware of the default queue count.
With no configuration, the default queue count is one RX/TX queue per online CPU, which can result in too many interrupts being allocated.
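+
One possible way to pin the queue count, shown here only as a sketch and not as the reference method, is a `MachineConfig` resource that runs `ethtool` at boot through a systemd unit. The NIC name `ens1f0`, the unit name, and the `MachineConfig` name are hypothetical; some NICs expose queue counts through driver parameters instead.
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-nic-queue-count
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: set-nic-queue-count.service
        enabled: true
        contents: |
          [Unit]
          Description=Set the combined RX/TX queue count on ens1f0
          After=network-online.target
          Wants=network-online.target

          [Service]
          Type=oneshot
          ExecStart=/usr/sbin/ethtool -L ens1f0 combined 32

          [Install]
          WantedBy=multi-user.target
----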
* The irdma kernel module might result in the allocation of too many interrupt vectors on systems with high core counts.
To prevent this condition, the reference configuration excludes this kernel module from loading through a kernel command-line argument in the `PerformanceProfile` resource.
Typically, core workloads do not require this kernel module.
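+
The following excerpt is a sketch of how such an exclusion can be expressed in the `PerformanceProfile` spec. The `module_blacklist` kernel argument is one way to prevent a module from loading; other profile fields are omitted here.
+
[source,yaml]
----
spec:
  additionalKernelArgs:
  # Prevent the irdma kernel module from loading at boot.
  - "module_blacklist=irdma"
----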
* To enable the `acpi_idle` CPUIdle driver, for example for Intel FlexRAN workloads, add `intel_idle.max_cstate=0` to the `additionalKernelArgs` list in the `PerformanceProfile` resource.
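+
For example, the following excerpt of the `PerformanceProfile` spec adds the argument; other profile fields are omitted here.
+
[source,yaml]
----
spec:
  additionalKernelArgs:
  # Disable the intel_idle driver so that the kernel falls back to
  # the acpi_idle CPUIdle driver.
  - "intel_idle.max_cstate=0"
----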
* The `TunedPerformancePatch.yaml` file in the reference configuration sets the `kernel.panic_on_unrecovered_nmi` sysctl parameter to enable triggering a kernel panic through a BMC non-maskable interrupt (NMI) on `x86_64` architectures.
This provides a mechanism to force a kernel panic for system recovery and diagnostic purposes when nodes become unresponsive.
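+
The following `Tuned` resource is a sketch of how the sysctl can be carried in such a patch profile. The resource name, the `include` line, and the `recommend` section are assumptions that must match your performance profile and machine config pool; only the `kernel.panic_on_unrecovered_nmi` setting is taken from the reference description.
+
[source,yaml]
----
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: performance-patch
    data: |
      [main]
      summary=Patch applied on top of the generated performance profile
      # Hypothetical parent profile; generated profiles are named
      # openshift-node-performance-<performance-profile-name>.
      include=openshift-node-performance-openshift-node-performance-profile

      [sysctl]
      # Panic on an unrecovered NMI, for example one triggered from the BMC.
      kernel.panic_on_unrecovered_nmi=1
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: worker
    priority: 19
    profile: performance-patch
----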
+
[NOTE]
====
Some drivers do not deallocate the interrupts even after reducing the queue count.
====