// Module included in the following assemblies:
//
// * scalability_and_performance/using-cpu-manager.adoc
// * post_installation_configuration/node-tasks.adoc

:_mod-docs-content-type: PROCEDURE
[id="setting_up_cpu_manager_{context}"]
= Setting up CPU Manager

To configure CPU Manager, create a `KubeletConfig` custom resource (CR) and apply it to the desired set of nodes.

.Procedure

. Label a node by running the following command:
+
[source,terminal]
----
# oc label node perf-node.example.com cpumanager=true
----
. To enable CPU Manager for all compute nodes, edit the CR by running the following command:
+
[source,terminal]
----
# oc edit machineconfigpool worker
----

. Add the `custom-kubelet: cpumanager-enabled` label to the `metadata.labels` section:
+
[source,yaml]
----
metadata:
  creationTimestamp: 2020-xx-xxx
  generation: 3
  labels:
    custom-kubelet: cpumanager-enabled
----
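+
As an alternative to editing the machine config pool manually, you can apply the same label with a single command. This is a minimal sketch that assumes the default `worker` pool:
+
[source,terminal]
----
# oc label machineconfigpool worker custom-kubelet=cpumanager-enabled
----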
. Create a `KubeletConfig` custom resource (CR) in a file named `cpumanager-kubeletconfig.yaml`. Refer to the label created in the previous step so that the correct nodes are updated with the new kubelet config. See the `machineConfigPoolSelector` section:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static <1>
    cpuManagerReconcilePeriod: 5s <2>
----
<1> Specify a policy:
* `none`. This policy explicitly enables the existing default CPU affinity scheme, providing no affinity beyond what the scheduler does automatically. This is the default policy.
* `static`. This policy allows containers in guaranteed pods with integer CPU requests exclusive access to CPUs on the node. You must enter `static` in all lowercase.
<2> Optional. Specify the CPU Manager reconcile frequency. The default is `5s`.

. Create the dynamic kubelet config by running the following command:
+
[source,terminal]
----
# oc create -f cpumanager-kubeletconfig.yaml
----
+
This adds the CPU Manager feature to the kubelet config and, if needed, the Machine Config Operator (MCO) reboots the node. A reboot is not needed to enable CPU Manager.
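+
If you want to follow the rollout, you can watch the machine config pool until it reports that the updated configuration has been applied. This is a hedged example that assumes the default `worker` pool:
+
[source,terminal]
----
# oc get machineconfigpool worker --watch
----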
. Check for the merged kubelet config by running the following command:
+
[source,terminal]
----
# oc get machineconfig 99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet -o json | grep ownerReference -A7
----
+
.Example output
[source,json]
----
"ownerReferences": [
    {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "name": "cpumanager-enabled",
        "uid": "7ed5616d-6b72-11e9-aae1-021e1ce18878"
    }
]
----
. Check the compute node for the updated `kubelet.conf` file by running the following command:
+
[source,terminal]
----
# oc debug node/perf-node.example.com
sh-4.2# cat /host/etc/kubernetes/kubelet.conf | grep cpuManager
----
+
.Example output
[source,terminal]
----
cpuManagerPolicy: static <1>
cpuManagerReconcilePeriod: 5s <2>
----
<1> `cpuManagerPolicy` is defined when you create the `KubeletConfig` CR.
<2> `cpuManagerReconcilePeriod` is defined when you create the `KubeletConfig` CR.

. Create a project by running the following command:
+
[source,terminal]
----
$ oc new-project <project_name>
----
. Create a pod that requests a core or multiple cores. Both limits and requests must have their CPU value set to a whole integer. That is the number of cores that will be dedicated to this pod:
+
[source,terminal]
----
# cat cpumanager-pod.yaml
----
+
.Example output
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause:3.2
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  nodeSelector:
    cpumanager: "true"
----
. Create the pod:
+
[source,terminal]
----
# oc create -f cpumanager-pod.yaml
----

.Verification
. Verify that the pod is scheduled to the node that you labeled by running the following command:
+
[source,terminal]
----
# oc describe pod cpumanager
----
+
.Example output
[source,terminal]
----
Name:               cpumanager-6cqz7
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               perf-node.example.com/xxx.xx.xx.xxx
...
    Limits:
      cpu:     1
      memory:  1G
    Requests:
      cpu:     1
      memory:  1G
...
QoS Class:       Guaranteed
Node-Selectors:  cpumanager=true
----
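+
Because the requests and limits are equal whole numbers, the pod is assigned the `Guaranteed` QoS class. To confirm the class directly, you can query the pod status. This is a hedged sketch; the generated pod name will differ in your environment:
+
[source,terminal]
----
# oc get pod cpumanager-6cqz7 -o jsonpath='{.status.qosClass}'
----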
. Verify that a CPU has been exclusively assigned to the pod by running the following command:
+
[source,terminal]
----
# oc describe node --selector='cpumanager=true' | grep -i cpumanager- -B2
----
+
.Example output
[source,terminal]
----
NAMESPACE    NAME                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
cpuman       cpumanager-mlrrz    1 (28%)       1 (28%)     1G (13%)         1G (13%)       27m
----
. Verify that the `cgroups` are set up correctly. Get the process ID (PID) of the `pause` process by running the following commands:
+
[source,terminal]
----
# oc debug node/perf-node.example.com
----
+
[source,terminal]
----
sh-4.2# systemctl status | grep -B5 pause
----
+
[NOTE]
====
If the output returns multiple pause process entries, you must identify the correct pause process.
====
+
.Example output
[source,terminal]
----
# ├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 17
└─kubepods.slice
  ├─kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice
  │ ├─crio-b5437308f1a574c542bdf08563b865c0345c8f8c0b0a655612c.scope
  │ └─32706 /pause
----
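+
If you prefer not to search the `systemd` output, the CRI-O command-line tools on the node can usually locate the pod sandbox as well. This is a hedged sketch; the exact output depends on the CRI-O version:
+
[source,terminal]
----
sh-4.2# crictl pods --name cpumanager
----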
. Verify that pods of quality of service (QoS) tier `Guaranteed` are placed within the `kubepods.slice` subdirectory by running the following commands:
+
[source,terminal]
----
# cd /sys/fs/cgroup/kubepods.slice/kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice/crio-b5437308f1ad1a7db0574c542bdf08563b865c0345c86e9585f8c0b0a655612c.scope
----
+
[source,terminal]
----
# for i in `ls cpuset.cpus cgroup.procs` ; do echo -n "$i "; cat $i ; done
----
+
[NOTE]
====
Pods of other QoS tiers end up in child `cgroups` of the parent `kubepods`.
====
+
.Example output
[source,terminal]
----
cpuset.cpus 1
cgroup.procs 32706
----
. Check the allowed CPU list for the task by running the following command:
+
[source,terminal]
----
# grep ^Cpus_allowed_list /proc/32706/status
----
+
.Example output
[source,terminal]
----
Cpus_allowed_list:    1
----
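+
As an alternative to reading `/proc`, the `taskset` utility reports the same affinity information. This is a hedged example that assumes `taskset` is available in the debug shell; it uses the PID identified earlier:
+
[source,terminal]
----
# taskset -cp 32706
----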
. Verify that another pod on the system cannot run on the core allocated for the `Guaranteed` pod. For example, to verify the pod in the `besteffort` QoS tier, run the following commands:
+
[source,terminal]
----
# cat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc494a073_6b77_11e9_98c0_06bba5c387ea.slice/crio-c56982f57b75a2420947f0afc6cafe7534c5734efc34157525fa9abbf99e3849.scope/cpuset.cpus
----
+
[source,terminal]
----
# oc describe node perf-node.example.com
----
+
.Example output
[source,terminal]
----
...
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 ephemeral-storage:           124768236Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162900Ki
 pods:                        250
Allocatable:
 attachable-volumes-aws-ebs:  39
 cpu:                         1500m
 ephemeral-storage:           124768236Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      7548500Ki
 pods:                        250
-------                               ----                           ------------  ----------  ---------------  -------------  ---
default                               cpumanager-6cqz7               1 (66%)       1 (66%)     1G (12%)         1G (12%)       29m

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests     Limits
  --------  --------     ------
  cpu       1440m (96%)  1 (66%)
----
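+
If you only need the allocatable CPU value, you can query that field directly instead of reading the full node description. This is a minimal sketch that uses the example node name:
+
[source,terminal]
----
# oc get node perf-node.example.com -o jsonpath='{.status.allocatable.cpu}'
----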
+
This VM has two CPU cores. The `system-reserved` setting reserves 500 millicores, meaning that half of one core is subtracted from the total capacity of the node to arrive at the `Node Allocatable` amount. You can see that `Allocatable CPU` is 1500 millicores. This means you can run only one of the CPU Manager pods, because each pod takes one whole core and a whole core is equivalent to 1000 millicores. If you try to schedule a second pod, the system accepts the pod, but it is never scheduled:
+
[source,terminal]
----
NAME                    READY   STATUS    RESTARTS   AGE
cpumanager-6cqz7        1/1     Running   0          33m
cpumanager-7qc2t        0/1     Pending   0          11s
----
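+
To confirm why the second pod stays in `Pending`, you can inspect its events; the scheduler typically reports that no node has enough allocatable CPU. This is a hedged sketch that uses the pod name from the previous output:
+
[source,terminal]
----
# oc describe pod cpumanager-7qc2t
----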