mirror of https://github.com/openshift/openshift-docs.git (synced 2026-02-05 12:46:18 +01:00)

TELCODOCS-419 first commit for setting interface sysctls

Committed by: openshift-cherrypick-robot
Parent: 1c754b6660
Commit: 3121b759aa

@@ -982,6 +982,8 @@ Topics:
    File: configuring-node-port-service-range
  - Name: Configuring IP failover
    File: configuring-ipfailover
  - Name: Configuring interface-level network sysctls
    File: setting-interface-level-network-sysctls
  - Name: Using SCTP
    File: using-sctp
    Distros: openshift-enterprise,openshift-origin

BIN
images/264_OpenShift_CNI_plugin_chain_0622.png
Normal file
Binary file not shown.
Size: 43 KiB

BIN
images/264_OpenShift_CNI_plugin_chain_0722.png
Normal file
Binary file not shown.
Size: 44 KiB

114
modules/nodes-containers-start-pod-safe-sysctls.adoc
Normal file
@@ -0,0 +1,114 @@
// Module included in the following assemblies:
//
// * nodes/containers/nodes-containers-sysctls.adoc

:_content-type: PROCEDURE
[id="nodes-starting-pod-safe-sysctls_{context}"]
= Starting a pod with safe sysctls

You can set sysctls on pods by using the pod's `securityContext`. The `securityContext` applies to all containers in the same pod.

Safe sysctls are allowed by default.

This example uses the pod `securityContext` to set the following safe sysctls:

* `kernel.shm_rmid_forced`
* `net.ipv4.ip_local_port_range`
* `net.ipv4.tcp_syncookies`
* `net.ipv4.ping_group_range`

[WARNING]
====
To avoid destabilizing your operating system, modify sysctl parameters only after you understand their effects.
====

Use this procedure to start a pod with the configured sysctl settings.

[NOTE]
====
In most cases, you modify an existing pod definition and add the `securityContext` spec.
====

.Procedure

. Create a YAML file `sysctl_pod.yaml` that defines an example pod and add the `securityContext` spec, as shown in the following example:
+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
  namespace: default
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000 <1>
      runAsGroup: 3000 <2>
      allowPrivilegeEscalation: false <3>
      capabilities: <4>
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true <5>
    seccompProfile: <6>
      type: RuntimeDefault
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "1"
    - name: net.ipv4.ip_local_port_range
      value: "32770 60666"
    - name: net.ipv4.tcp_syncookies
      value: "0"
    - name: net.ipv4.ping_group_range
      value: "0 200000000"
----
<1> `runAsUser` controls which user ID the container runs as.
<2> `runAsGroup` controls which primary group ID the container runs as.
<3> `allowPrivilegeEscalation` determines whether a pod can request to allow privilege escalation. If unspecified, it defaults to `true`. This boolean directly controls whether the `no_new_privs` flag is set on the container process.
<4> `capabilities` permit privileged actions without giving full root access. This policy ensures that all capabilities are dropped from the container.
<5> `runAsNonRoot: true` requires that the container runs as a user with a UID other than 0.
<6> `RuntimeDefault` enables the default seccomp profile for a pod or container workload.

. Create the pod by running the following command:
+
[source,terminal]
----
$ oc apply -f sysctl_pod.yaml
----

. Verify that the pod is created by running the following command:
+
[source,terminal]
----
$ oc get pod
----
+
.Example output
[source,terminal]
----
NAME             READY   STATUS    RESTARTS   AGE
sysctl-example   1/1     Running   0          14s
----

. Log in to the pod by running the following command:
+
[source,terminal]
----
$ oc rsh sysctl-example
----

. Verify the values of the configured sysctl flags. For example, find the value of `kernel.shm_rmid_forced` by running the following command:
+
[source,terminal]
----
sh-4.4$ sysctl kernel.shm_rmid_forced
----
+
.Expected output
[source,terminal]
----
kernel.shm_rmid_forced = 1
----
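
If you script this verification, you can compare the value that `sysctl` prints instead of reading it by eye. The following minimal Bash sketch shows one way to do that. `sysctl_value_matches` is a hypothetical helper, not part of `oc` or {product-title}. The sketch assumes the `name = value` output format shown in the expected output. It squeezes whitespace because `sysctl` prints multi-value settings, such as `net.ipv4.ip_local_port_range`, with a tab between the values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check one line of `sysctl <name>` output.
# Usage: sysctl_value_matches "<sysctl output line>" <name> "<expected value>"
# Returns 0 when the line reports <name> with <expected value>, 1 otherwise.
sysctl_value_matches() {
  local line=$1 name=$2 expected=$3
  local got_name got_value

  # Split "name = value" on the " = " separator.
  got_name=${line%% = *}
  got_value=${line#* = }

  # Multi-value sysctls are printed with tabs, so squeeze every whitespace
  # run to a single space on both sides before comparing.
  got_value=$(printf '%s' "$got_value" | tr -s '[:space:]' ' ')
  expected=$(printf '%s' "$expected" | tr -s '[:space:]' ' ')

  [ "$got_name" = "$name" ] && [ "$got_value" = "$expected" ]
}
```

For example, `sysctl_value_matches "$(oc exec sysctl-example -- sysctl kernel.shm_rmid_forced)" kernel.shm_rmid_forced 1` succeeds when the pod reports the value set in the pod spec.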
@@ -6,94 +6,17 @@
[id="nodes-containers-sysctls-about_{context}"]
= About sysctls

In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the *_/proc/sys/_* virtual
process file system. The parameters cover various subsystems, such as:
In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. Parameters are available from the `_/proc/sys/_` virtual process file system. The parameters cover various subsystems, such as:

- kernel (common prefix: *_kernel._*)
- networking (common prefix: *_net._*)
- virtual memory (common prefix: *_vm._*)
- MDADM (common prefix: *_dev._*)
- kernel (common prefix: `_kernel._`)
- networking (common prefix: `_net._`)
- virtual memory (common prefix: `_vm._`)
- MDADM (common prefix: `_dev._`)

More subsystems are described in
link:https://www.kernel.org/doc/Documentation/sysctl/README[Kernel documentation].
More subsystems are described in link:https://www.kernel.org/doc/Documentation/sysctl/README[Kernel documentation].
To get a list of all parameters, run:

[source,terminal]
----
$ sudo sysctl -a
----
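
Each sysctl name maps to a file under `/proc/sys/`, with the dots in the name becoming directory separators. The following minimal Bash sketch shows that mapping. The function name `sysctl_to_proc_path` is hypothetical. The sketch assumes that no component of the name contains a literal dot. Interface names such as VLAN devices (for example, `eth0.100`) can contain dots, and this helper does not handle them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the /proc/sys path that backs a sysctl name.
# Assumes no component of the name contains a literal dot.
sysctl_to_proc_path() {
  local name=$1
  # Translate every "." into "/" and prefix the procfs directory.
  printf '/proc/sys/%s\n' "$(printf '%s' "$name" | tr . /)"
}
```

For example, `sysctl_to_proc_path kernel.shm_rmid_forced` prints `/proc/sys/kernel/shm_rmid_forced`. Reading that file with `cat` shows the same value that `sysctl kernel.shm_rmid_forced` reports.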

[[namespaced-vs-node-level-sysctls]]
== Namespaced versus node-level sysctls

A number of sysctls are _namespaced_ in the Linux kernels. This means that
you can set them independently for each pod on a node. Being namespaced is a
requirement for sysctls to be accessible in a pod context within Kubernetes.

The following sysctls are known to be namespaced:

- *_kernel.shm*_*
- *_kernel.msg*_*
- *_kernel.sem_*
- *_fs.mqueue.*_*

Additionally, most of the sysctls in the *net.** group are known
to be namespaced. Their namespace adoption differs based on the kernel
version and distributor.

Sysctls that are not namespaced are called _node-level_ and must be set
manually by the cluster administrator, either by means of the underlying Linux
distribution of the nodes, such as by modifying the *_/etc/sysctl.conf_* file,
or by using a daemon set with privileged containers. You can use
the Node Tuning Operator to set _node-level_ sysctls.

[NOTE]
====
Consider marking nodes with special sysctls as tainted. Only schedule pods onto
them that need those sysctl settings. Use the taints and toleration feature to mark the nodes.
====

[[safe-vs-unsafe-sysclts]]
== Safe versus unsafe sysctls

Sysctls are grouped into _safe_ and _unsafe_ sysctls.

For a sysctl to be considered safe, it must use proper
namespacing and must be properly isolated between pods on the same
node. This means that if you set a sysctl for one pod it must not:

- Influence any other pod on the node
- Harm the node's health
- Gain CPU or memory resources outside of the resource limits of a pod

{product-title} supports, or whitelists, the following sysctls
in the safe set:

- *_kernel.shm_rmid_forced_*
- *_net.ipv4.ip_local_port_range_*
- *_net.ipv4.tcp_syncookies_*
- *_net.ipv4.ping_group_range_*

All safe sysctls are enabled by default. You can use a sysctl in a pod by modifying
the `Pod` spec.

Any sysctl not whitelisted by {product-title} is considered unsafe for {product-title}.
Note that being namespaced alone is not sufficient for the sysctl to be considered safe.

All unsafe sysctls are disabled by default, and the cluster administrator must
manually enable them on a per-node basis. Pods with disabled unsafe sysctls
are scheduled but do not launch.

[source,terminal]
----
$ oc get pod
----

.Example output
[source,terminal]
----
NAME        READY   STATUS            RESTARTS   AGE
hello-pod   0/1     SysctlForbidden   0          14s
----
----
@@ -3,43 +3,45 @@
// * nodes/containers/nodes-containers-sysctls.adoc

:_content-type: PROCEDURE
[id="nodes-containers-sysctls-setting_{context}"]
= Setting sysctls for a pod
[id="nodes-containers-starting-pod-with-unsafe-sysctls_{context}"]
= Starting a pod with unsafe sysctls

You can set sysctls on pods using the pod's `securityContext`. The `securityContext`
applies to all containers in the same pod.
A pod with unsafe sysctls fails to launch on any node unless the cluster administrator explicitly enables unsafe sysctls for that node. As with node-level sysctls, use the taints and toleration feature or labels on nodes to schedule those pods onto the right nodes.

Safe sysctls are allowed by default. A pod with unsafe sysctls fails
to launch on any node unless the cluster administrator explicitly enables unsafe sysctls for
that node. As with node-level sysctls, use the taints and toleration feature
or labels on nodes to schedule those pods onto the right nodes.

The following example uses the pod `securityContext` to set a safe sysctl
`kernel.shm_rmid_forced` and two unsafe sysctls, `net.core.somaxconn` and
`kernel.msgmax`. There is no distinction between _safe_ and _unsafe_ sysctls in
the specification.
The following example uses the pod `securityContext` to set a safe sysctl `kernel.shm_rmid_forced` and two unsafe sysctls, `net.core.somaxconn` and `kernel.msgmax`. There is no distinction between _safe_ and _unsafe_ sysctls in the specification.

[WARNING]
====
To avoid destabilizing your operating system, modify sysctl parameters only
after you understand their effects.
To avoid destabilizing your operating system, modify sysctl parameters only after you understand their effects.
====

The following example illustrates what happens when you add safe and unsafe sysctls to a pod specification:

.Procedure

To use safe and unsafe sysctls:

. Modify the YAML file that defines the pod and add the `securityContext` spec, as
shown in the following example:
. Create a YAML file `sysctl-example-unsafe.yaml` that defines an example pod and add the `securityContext` specification, as shown in the following example:
+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
  name: sysctl-example-unsafe
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
@@ -47,19 +49,17 @@ spec:
      value: "1024"
    - name: kernel.msgmax
      value: "65536"
...
----

. Create the pod:
. Create the pod by running the following command:
+
[source,terminal]
----
$ oc apply -f <file-name>.yaml
$ oc apply -f sysctl-example-unsafe.yaml
----
+
If the unsafe sysctls are not allowed for the node, the pod is scheduled,
but does not deploy:
+

. Verify that the pod is scheduled but does not deploy because unsafe sysctls are not allowed for the node by running the following command:
+
[source,terminal]
----
$ oc get pod
@@ -68,6 +68,6 @@ $ oc get pod
.Example output
[source,terminal]
----
NAME        READY   STATUS            RESTARTS   AGE
hello-pod   0/1     SysctlForbidden   0          14s
NAME                    READY   STATUS            RESTARTS   AGE
sysctl-example-unsafe   0/1     SysctlForbidden   0          14s
----
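
When you script this check, you can read the `STATUS` column for a single pod instead of scanning the whole table. The following minimal Bash sketch assumes the default `oc get pod` layout shown in the example output: a header row, with `STATUS` as the third column. The function names `pod_status` and `pod_is_sysctl_forbidden` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get pod` table output on stdin and print the
# STATUS column (third field) for the pod named in $1.
# Assumes a header row and the default order NAME READY STATUS RESTARTS AGE.
# Exits non-zero if no row matches the pod name.
pod_status() {
  awk -v pod="$1" 'NR > 1 && $1 == pod { print $3; found = 1 } END { exit !found }'
}

# Hypothetical wrapper: succeed only if the pod was rejected for unsafe sysctls.
pod_is_sysctl_forbidden() {
  [ "$(pod_status "$1")" = "SysctlForbidden" ]
}
```

For example, `oc get pod sysctl-example-unsafe | pod_is_sysctl_forbidden sysctl-example-unsafe` succeeds while the node does not allow the unsafe sysctls in the pod spec.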

@@ -26,27 +26,28 @@ containers, resource shortage, or breaking a node.

.Procedure

. Add a label to the machine config pool where containers
with the unsafe sysctls will run:
. List the existing machine config pools in your {product-title} cluster to decide which pool to label by running the following command:
+
[source,terminal]
----
$ oc edit machineconfigpool worker
$ oc get machineconfigpool
----
+
[source,yaml]
.Example output
[source,terminal]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: 2019-02-08T14:52:39Z
  generation: 1
  labels:
    custom-kubelet: sysctl <1>
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-bfb92f0cd1684e54d8e234ab7423cc96   True      False      False      3              3                   3                     0                      42m
worker   rendered-worker-21b6cb9a0f8919c88caf39db80ac1fce   True      False      False      3              3                   3                     0                      42m
----
<1> Add a `key: value` label.

. Create a `KubeletConfig` custom resource (CR):
. Add a label to the machine config pool for the nodes where the containers with the unsafe sysctls will run. Run the following command:
+
[source,terminal]
----
$ oc label machineconfigpool worker custom-kubelet=sysctl
----
. Create a YAML file `set-sysctl-worker.yaml` that defines a `KubeletConfig` custom resource (CR):
+
[source,yaml]
----
@@ -66,64 +67,108 @@ spec:
<1> Specify the label from the machine config pool.
<2> List the unsafe sysctls you want to allow.

. Create the object:
. Create the object by running the following command:
+
[source,terminal]
----
$ oc apply -f set-sysctl-worker.yaml
----
+
A new `MachineConfig` object named in the `99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet` format is created.

. Wait for the cluster to reboot using the `machineconfigpool` object `status` fields:
+
For example:
+
[source,yaml]
----
status:
  conditions:
    - lastTransitionTime: '2019-08-11T15:32:00Z'
      message: >-
        All nodes are updating to
        rendered-worker-ccbfb5d2838d65013ab36300b7b3dc13
      reason: ''
      status: 'True'
      type: Updating
----
+
A message similar to the following appears when the cluster is ready:
+
[source,yaml]
----
    - lastTransitionTime: '2019-08-11T16:00:00Z'
      message: >-
        All nodes are updated with
        rendered-worker-ccbfb5d2838d65013ab36300b7b3dc13
      reason: ''
      status: 'True'
      type: Updated
----

. When the cluster is ready, check for the merged `KubeletConfig` object in the new `MachineConfig` object:
. Wait for the Machine Config Operator to generate the new rendered configuration and apply it to the machines by running the following command:
+
[source,terminal]
----
$ oc get machineconfig 99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet -o json | grep ownerReference -A7
$ oc get machineconfigpool worker -w
----
+
[source,json]
After a few minutes, the `UPDATING` status changes from `True` to `False`:
+
[source,terminal]
----
"ownerReferences": [
  {
    "apiVersion": "machineconfiguration.openshift.io/v1",
    "blockOwnerDeletion": true,
    "controller": true,
    "kind": "KubeletConfig",
    "name": "custom-kubelet",
    "uid": "3f64a766-bae8-11e9-abe8-0a1a2a4813f2"
  }
]
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-f1704a00fc6f30d3a7de9a15fd68a800   False     True       False      3              2                   2                     0                      71m
worker   rendered-worker-f1704a00fc6f30d3a7de9a15fd68a800   False     True       False      3              2                   3                     0                      72m
worker   rendered-worker-0188658afe1f3a183ec8c4f14186f4d5   True      False      False      3              3                   3                     0                      72m
----
. Create a YAML file `sysctl-example-safe-unsafe.yaml` that defines an example pod and add the `securityContext` spec, as shown in the following example:
+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example-safe-unsafe
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.core.somaxconn
      value: "1024"
    - name: kernel.msgmax
      value: "65536"
----

. Create the pod by running the following command:
+
[source,terminal]
----
$ oc apply -f sysctl-example-safe-unsafe.yaml
----
+
You can now add unsafe sysctls to pods as needed.
.Expected output
[source,terminal]
----
Warning: would violate PodSecurity "restricted:latest": forbidden sysctls (net.core.somaxconn, kernel.msgmax)
pod/sysctl-example-safe-unsafe created
----

. Verify that the pod is created by running the following command:
+
[source,terminal]
----
$ oc get pod
----
+
.Example output
[source,terminal]
----
NAME                         READY   STATUS    RESTARTS   AGE
sysctl-example-safe-unsafe   1/1     Running   0          19s
----

. Log in to the pod by running the following command:
+
[source,terminal]
----
$ oc rsh sysctl-example-safe-unsafe
----

. Verify the values of the configured sysctl flags. For example, find the value of `net.core.somaxconn` by running the following command:
+
[source,terminal]
----
sh-4.4$ sysctl net.core.somaxconn
----
+
.Expected output
[source,terminal]
----
net.core.somaxconn = 1024
----

The unsafe sysctl is now allowed, and its value is set as defined in the `securityContext` spec of the updated pod specification.
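
Instead of watching the `oc get machineconfigpool worker -w` output by eye, a script can poll single snapshots of the pool. The following is a minimal Bash sketch with the hypothetical function name `mcp_is_updated`. It assumes a header row and the column order shown in the example output, with `UPDATED`, `UPDATING`, and `DEGRADED` as the third, fourth, and fifth columns. It treats a pool as finished when `UPDATED` is `True` and both `UPDATING` and `DEGRADED` are `False`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get machineconfigpool` output on stdin and
# succeed if the named pool has finished rolling out its rendered config.
# Assumes a header row and the column order
# NAME CONFIG UPDATED UPDATING DEGRADED ...
mcp_is_updated() {
  awk -v pool="$1" '
    NR > 1 && $1 == pool {
      found = 1
      # Keep only the state of the last matching row.
      ok = ($3 == "True" && $4 == "False" && $5 == "False")
    }
    END { exit !(found && ok) }
  '
}
```

For example, `until oc get machineconfigpool worker | mcp_is_updated worker; do sleep 30; done` waits until the worker pool reports the updated configuration. The 30-second interval is an arbitrary choice.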
31
modules/nodes-namespaced-nodelevel-sysctls.adoc
Normal file
@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * nodes/containers/nodes-containers-sysctls.adoc

:_content-type: CONCEPT

[id="namespaced-and-node-level-sysctls"]
= Namespaced and node-level sysctls

A number of sysctls are _namespaced_ in the Linux kernel. This means that you can set them independently for each pod on a node. Being namespaced is a requirement for sysctls to be accessible in a pod context within Kubernetes.

The following sysctls are known to be namespaced:

- `_kernel.shm*_`
- `_kernel.msg*_`
- `_kernel.sem_`
- `_fs.mqueue.*_`

Additionally, most of the sysctls in the `net.*` group are known to be namespaced. Their namespace adoption differs based on the kernel version and distributor.
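
The namespacing rules in this list can be expressed as shell glob patterns. The following minimal Bash sketch uses the hypothetical function name `sysctl_namespace_class` and encodes only the list in this section:

* It reports `namespaced` for the listed `kernel` and `fs.mqueue` patterns.
* It reports `likely-namespaced` for `net.*`, because namespacing there depends on the kernel version and distributor.
* It reports `node-level-or-unknown` for everything else.

Being namespaced alone does not make a sysctl safe to set from a pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a sysctl name against the namespaced list.
# Prints one of: namespaced, likely-namespaced, node-level-or-unknown.
sysctl_namespace_class() {
  # Most, but not all, net.* sysctls are namespaced, depending on the kernel
  # version and distributor, so they get a weaker answer.
  case "$1" in
    kernel.shm* | kernel.msg* | kernel.sem | fs.mqueue.*)
      echo namespaced ;;
    net.*)
      echo likely-namespaced ;;
    *)
      echo node-level-or-unknown ;;
  esac
}
```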

Sysctls that are not namespaced are called _node-level_ and must be set
manually by the cluster administrator, either by means of the underlying Linux
distribution of the nodes, such as by modifying the `_/etc/sysctl.conf_` file,
or by using a daemon set with privileged containers. You can use the Node Tuning Operator to set _node-level_ sysctls.

[NOTE]
====
Consider marking nodes with special sysctls as tainted. Schedule only the pods that need those sysctl settings onto them. Use the taints and toleration feature to mark the nodes.
====
118
modules/nodes-safe-sysctls-list.adoc
Normal file
@@ -0,0 +1,118 @@
// Module included in the following assemblies:
//
// * nodes/containers/nodes-containers-sysctls.adoc

:_content-type: REFERENCE
[id="safe_and_unsafe_sysctls_{context}"]
= Safe and unsafe sysctls

Sysctls are grouped into _safe_ and _unsafe_ sysctls.

For system-wide sysctls to be considered safe, they must be namespaced. A namespaced sysctl ensures isolation between namespaces and therefore between pods. Setting a sysctl for one pod must not do any of the following:

- Influence any other pod on the node
- Harm the node health
- Gain CPU or memory resources outside of the resource limits of a pod

[NOTE]
====
Being namespaced alone is not sufficient for the sysctl to be considered safe.
====

Any sysctl that is not added to the allowed list on {product-title} is considered unsafe for {product-title}.

Unsafe sysctls are not allowed by default. For system-wide sysctls, the cluster administrator must manually enable them on a per-node basis. Pods with disabled unsafe sysctls are scheduled but do not launch.

[NOTE]
====
You cannot manually enable interface-specific unsafe sysctls.
====

{product-title} adds the following system-wide and interface-specific safe sysctls to an allowed safe list:

.System-wide safe sysctls
[cols="30%,70%",options="header"]
|===
| sysctl | Description

| `kernel.shm_rmid_forced`
a|When set to `1`, all shared memory objects in the current IPC namespace are automatically forced to use IPC_RMID. For more information, see link:https://docs.kernel.org/admin-guide/sysctl/kernel.html?highlight=shm_rmid_forced#shm-rmid-forced[shm_rmid_forced].

| `net.ipv4.ip_local_port_range`
a| Defines the local port range that TCP and UDP use to choose the local port. The first number is the first local port number, and the second number is the last local port number. If possible, these numbers should have different parity (one even and one odd value). They must be greater than or equal to `ip_unprivileged_port_start`. The default values are `32768` and `60999` respectively. For more information, see link:https://docs.kernel.org/networking/ip-sysctl.html?highlight=ip_local_port_range#ip-variables[ip_local_port_range].

| `net.ipv4.tcp_syncookies`
|When `net.ipv4.tcp_syncookies` is set, the kernel handles TCP SYN packets normally until the half-open connection queue is full, at which time the SYN cookie functionality kicks in. This functionality allows the system to keep accepting valid connections, even if under a denial-of-service attack. For more information, see link:https://docs.kernel.org/networking/ip-sysctl.html?highlight=tcp_syncookies#tcp-variables[tcp_syncookies].

| `net.ipv4.ping_group_range`
a| This restricts `ICMP_PROTO` datagram sockets to users in the group range. The default is `1 0`, meaning that nobody, not even root, can create ping sockets. For more information, see link:https://docs.kernel.org/networking/ip-sysctl.html?highlight=ping_group_range#ip-variables[ping_group_range].

| `net.ipv4.ip_unprivileged_port_start`
| This defines the first unprivileged port in the network namespace. To disable all privileged ports, set this to `0`. Privileged ports must not overlap with the `ip_local_port_range`. For more information, see link:https://docs.kernel.org/networking/ip-sysctl.html?highlight=ip_unprivileged_port_start#ip-variables[ip_unprivileged_port_start].
|===
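
You can check the constraints described for `net.ipv4.ip_local_port_range` and `net.ipv4.ping_group_range` before you put a value in a pod spec. The following minimal Bash sketch uses hypothetical function names. It makes these assumptions:

* `ip_unprivileged_port_start` has its kernel default of `1024`, unless you pass another value.
* Port numbers lie in the range 1 to 65535, and the first port is not greater than the last.

The sketch only prints a note about parity, because the description above states that preference as advice ("if possible"), not as a requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 is a non-empty string of decimal digits.
is_uint() {
  case "$1" in '' | *[!0-9]*) return 1 ;; *) return 0 ;; esac
}

# Hypothetical helper: validate a value for net.ipv4.ip_local_port_range.
# Usage: check_local_port_range "<first> <last>" [ip_unprivileged_port_start]
# The second argument defaults to 1024, the kernel default.
check_local_port_range() {
  local first last extra unpriv_start=${2:-1024}
  read -r first last extra <<<"$1"
  if ! is_uint "$first" || ! is_uint "$last" || [ -n "$extra" ]; then
    echo "error: expected exactly two port numbers" >&2
    return 1
  fi
  if [ "$first" -lt 1 ] || [ "$last" -gt 65535 ] || [ "$first" -gt "$last" ]; then
    echo "error: range must satisfy 1 <= first <= last <= 65535" >&2
    return 1
  fi
  if [ "$first" -lt "$unpriv_start" ]; then
    echo "error: first port must be >= ip_unprivileged_port_start ($unpriv_start)" >&2
    return 1
  fi
  # Advisory only: different parity is preferred, not required.
  # 10# forces base 10 so values with leading zeros are not read as octal.
  if [ $(( (10#$first + 10#$last) % 2 )) -eq 0 ]; then
    echo "note: $first and $last have the same parity" >&2
  fi
  return 0
}

# Hypothetical helper: succeed if GID $1 falls inside a
# net.ipv4.ping_group_range value such as "0 200000000".
gid_in_ping_group_range() {
  local gid=$1 low high
  read -r low high <<<"$2"
  is_uint "$gid" && is_uint "$low" && is_uint "$high" || return 1
  # The default "1 0" has low > high, so no GID matches.
  [ "$gid" -ge "$low" ] && [ "$gid" -le "$high" ]
}
```

For example, `check_local_port_range "32770 60666"` succeeds but prints a parity note, because both values in the safe-sysctls example pod are even. The `gid_in_ping_group_range 3000 "0 200000000"` check succeeds, which is consistent with the `runAsGroup: 3000` setting in that pod.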

.Interface-specific safe sysctls
[cols="30%,70%",options="header"]
|===
| sysctl | Description

| `net.ipv4.conf.IFNAME.accept_ra`
a|Accept IPv4 router advertisements; autoconfigure using them. It also determines whether or not to transmit router solicitations. Router solicitations are transmitted only if the functional setting is to accept router advertisements.

| `net.ipv4.conf.IFNAME.accept_redirects`
a| Accept IPv4 ICMP redirect messages.

| `net.ipv4.conf.IFNAME.accept_source_route`
|Accept IPv4 packets with strict source route (SRR) option.

| `net.ipv4.conf.IFNAME.arp_accept`
a| Define behavior for gratuitous ARP frames with an IPv4 address that is not already present in the ARP table:

* `0` - Do not create new entries in the ARP table.

* `1` - Create new entries in the ARP table.

| `net.ipv4.conf.IFNAME.arp_notify`
| Define mode for notification of IPv4 address and device changes.

| `net.ipv4.conf.IFNAME.disable_policy`
a| Disable IPSEC policy (SPD) for this IPv4 interface.

| `net.ipv4.conf.IFNAME.secure_redirects`
a| Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list.

| `net.ipv4.conf.IFNAME.send_redirects`
| Send ICMP redirect messages. Enable this only if the node acts as a router. Routers use ICMP redirects to notify a host about a better route to a particular destination, and a host should not send them.

| `net.ipv6.conf.IFNAME.accept_ra`
a| Accept IPv6 router advertisements; autoconfigure using them. It also determines whether or not to transmit router solicitations. Router solicitations are transmitted only if the functional setting is to accept router advertisements.

| `net.ipv6.conf.IFNAME.accept_redirects`
a| Accept IPv6 ICMP redirect messages.

| `net.ipv6.conf.IFNAME.accept_source_route`
a| Accept IPv6 packets with SRR option.

| `net.ipv6.conf.IFNAME.arp_accept`
a| Define behavior for gratuitous ARP frames with an IPv6 address that is not already present in the ARP table:

* `0` - Do not create new entries in the ARP table.

* `1` - Create new entries in the ARP table.

| `net.ipv6.conf.IFNAME.arp_notify`
| Define mode for notification of IPv6 address and device changes.

| `net.ipv6.neigh.IFNAME.base_reachable_time_ms`
| This parameter controls the hardware address to IP mapping lifetime in the neighbor table for IPv6.

| `net.ipv6.neigh.IFNAME.retrans_time_ms`
| Set the retransmit timer for neighbor discovery messages.

|===

[NOTE]
====
The `IFNAME` token represents the interface name and is replaced with the actual name of the interface at runtime.
====
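
The two tables in this module amount to an allow list, which you can check before you add a sysctl to a pod spec. The following minimal Bash sketch uses the hypothetical function name `sysctl_safety` and encodes only the entries listed in this module. Interface-specific entries match any interface name in place of the `IFNAME` token. As an assumption, the sketch treats the special `all` and `default` entries as not covered, because they are not interface names. Anything else is reported as `unsafe`, which mirrors the rule that any sysctl not on the allowed list is considered unsafe. The list in a given {product-title} release can differ, so treat this sketch as an illustration rather than a source of truth.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a sysctl name against the safe lists above.
# Prints system-wide-safe, interface-safe, or unsafe.
# In the interface patterns, "*" stands in for the IFNAME token.
# Assumption: "all" and "default" are not interface names, so they are
# matched first and reported as unsafe.
sysctl_safety() {
  case "$1" in
    kernel.shm_rmid_forced | net.ipv4.ip_local_port_range | \
    net.ipv4.tcp_syncookies | net.ipv4.ping_group_range | \
    net.ipv4.ip_unprivileged_port_start)
      echo system-wide-safe ;;
    net.ipv4.conf.all.* | net.ipv4.conf.default.* | \
    net.ipv6.conf.all.* | net.ipv6.conf.default.*)
      echo unsafe ;;
    net.ipv4.conf.*.accept_ra | net.ipv4.conf.*.accept_redirects | \
    net.ipv4.conf.*.accept_source_route | net.ipv4.conf.*.arp_accept | \
    net.ipv4.conf.*.arp_notify | net.ipv4.conf.*.disable_policy | \
    net.ipv4.conf.*.secure_redirects | net.ipv4.conf.*.send_redirects | \
    net.ipv6.conf.*.accept_ra | net.ipv6.conf.*.accept_redirects | \
    net.ipv6.conf.*.accept_source_route | net.ipv6.conf.*.arp_accept | \
    net.ipv6.conf.*.arp_notify | \
    net.ipv6.neigh.*.base_reachable_time_ms | net.ipv6.neigh.*.retrans_time_ms)
      echo interface-safe ;;
    *)
      echo unsafe ;;
  esac
}
```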
158
modules/nw-cfg-tuning-interface-cni.adoc
Normal file
@@ -0,0 +1,158 @@
// Module included in the following assemblies:
//
// * networking/setting-interface-level-network-sysctls.adoc
:_content-type: PROCEDURE
[id="nw-configuring-tuning-cni_{context}"]
= Configuring the tuning CNI

The following procedure configures the tuning CNI to change the interface-level network `net.ipv4.conf.IFNAME.accept_redirects` sysctl. This example enables accepting ICMP redirect messages.
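
The `IFNAME` token in the sysctl name is a placeholder for the interface name inside the pod, and the tuning CNI plugin performs the real substitution at runtime. To illustrate that substitution, the following minimal Bash sketch replaces the token with a given interface name. The helper name `expand_ifname` is hypothetical. The sketch is useful for working out the concrete name to pass to `sysctl` when you verify the setting. In this procedure, that interface is `net1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the IFNAME token in an interface-level sysctl
# name with a concrete interface name.
# Usage: expand_ifname <sysctl-with-IFNAME> <interface>
expand_ifname() {
  local template=$1 ifname=$2
  # Replace only the ".IFNAME." path component.
  printf '%s\n' "${template/.IFNAME./.${ifname}.}"
}
```

For example, `expand_ifname net.ipv4.conf.IFNAME.accept_redirects net1` prints `net.ipv4.conf.net1.accept_redirects`.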

.Procedure

. Create a network attachment definition, such as `tuning-example.yaml`, with the following content:
+
[source,yaml]
----
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: <name> <1>
  namespace: default <2>
spec:
  config: '{
    "cniVersion": "0.4.0", <3>
    "name": "<name>", <4>
    "plugins": [{
      "type": "<main_CNI_plug-in>" <5>
    },
    {
      "type": "tuning", <6>
      "sysctl": {
        "net.ipv4.conf.IFNAME.accept_redirects": "1" <7>
      }
    }
    ]
  }'
----
<1> Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
<2> Specifies the namespace that the object is associated with.
<3> Specifies the CNI specification version.
<4> Specifies the name for the configuration. Match the configuration name to the `name` value of the network attachment definition.
<5> Specifies the name of the main CNI plug-in to configure.
<6> Specifies the name of the CNI meta plug-in.
<7> Specifies the sysctl to set.
+
An example YAML file is shown here:
+
[source,yaml]
----
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: tuningnad
  namespace: default
spec:
  config: '{
    "cniVersion": "0.4.0",
    "name": "tuningnad",
    "plugins": [{
      "type": "bridge"
    },
    {
      "type": "tuning",
      "sysctl": {
        "net.ipv4.conf.IFNAME.accept_redirects": "1"
      }
    }
    ]
  }'
----

. Apply the YAML file by running the following command:
+
[source,terminal]
----
$ oc apply -f tuning-example.yaml
----
+
.Example output
[source,terminal]
----
networkattachmentdefinition.k8s.cni.cncf.io/tuningnad created
----
|
||||
|
||||
. Create a pod such as `examplepod.yaml` with the network attachment definition similar to the following:
|
||||
+
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: tunepod
|
||||
namespace: default
|
||||
annotations:
|
||||
k8s.v1.cni.cncf.io/networks: tuningnad <1>
|
||||
spec:
|
||||
containers:
|
||||
- name: podexample
|
||||
image: centos
|
||||
command: ["/bin/bash", "-c", "sleep INF"]
|
||||
securityContext:
|
||||
runAsUser: 2000 <2>
|
||||
runAsGroup: 3000 <3>
|
||||
allowPrivilegeEscalation: false <4>
|
||||
capabilities: <5>
|
||||
drop: ["ALL"]
|
||||
securityContext:
|
||||
runAsNonRoot: true <6>
|
||||
seccompProfile: <7>
|
||||
type: RuntimeDefault
|
||||
----
<1> Specifies the name of the configured `NetworkAttachmentDefinition` object.
<2> `runAsUser` controls which user ID the container runs as.
<3> `runAsGroup` controls which primary group ID the container runs as.
<4> `allowPrivilegeEscalation` determines whether a process in the container can gain more privileges than its parent process. If unspecified, it defaults to `true`. This boolean directly controls whether the `no_new_privs` flag is set on the container process.
<5> `capabilities` permit privileged actions without giving full root access. This policy ensures that all capabilities are dropped from the container.
<6> `runAsNonRoot: true` requires that the container runs as a user with a UID other than 0.
<7> `RuntimeDefault` enables the default seccomp profile for a pod or container workload.
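+
The settings in callouts 4 to 7 are split between the container-level and pod-level `securityContext` fields. As an illustration, the following Python sketch walks the manifest, written as a dict so that no YAML library is needed, and lists any of those settings that are missing. The `missing_security_settings` helper is hypothetical and checks only the fields used in this example; it is not an admission check that {product-title} runs.
+
[source,python]
----
from typing import Any, Dict, List


# Hypothetical helper (not an OpenShift API): reports which of the settings
# from callouts 4 to 7 are missing from a pod manifest loaded as a dict.
def missing_security_settings(pod: Dict[str, Any]) -> List[str]:
    problems = []
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    if pod_sc.get("runAsNonRoot") is not True:
        problems.append("spec.securityContext.runAsNonRoot is not true")
    if pod_sc.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        problems.append("spec.securityContext.seccompProfile.type is not RuntimeDefault")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        # allowPrivilegeEscalation defaults to true, so only an explicit false passes.
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation is not false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop does not include ALL")
    return problems


# The example pod from this step, written as a Python dict.
POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "tunepod",
        "namespace": "default",
        "annotations": {"k8s.v1.cni.cncf.io/networks": "tuningnad"},
    },
    "spec": {
        "containers": [{
            "name": "podexample",
            "image": "centos",
            "command": ["/bin/bash", "-c", "sleep INF"],
            "securityContext": {
                "runAsUser": 2000,
                "runAsGroup": 3000,
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
            },
        }],
        "securityContext": {
            "runAsNonRoot": True,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
    },
}

print(missing_security_settings(POD))
----
+
For the example pod, the sketch prints an empty list.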

. Apply the YAML file by running the following command:
+
[source,terminal]
----
$ oc apply -f examplepod.yaml
----

. Verify that the pod is created by running the following command:
+
[source,terminal]
----
$ oc get pod
----
+
.Example output
[source,terminal]
----
NAME      READY   STATUS    RESTARTS   AGE
tunepod   1/1     Running   0          47s
----

. Log in to the pod by running the following command:
+
[source,terminal]
----
$ oc rsh tunepod
----

. Verify the values of the configured sysctl flags. For example, check the value of `net.ipv4.conf.net1.accept_redirects` by running the following command:
+
[source,terminal]
----
sh-4.4# sysctl net.ipv4.conf.net1.accept_redirects
----
+
.Expected output
[source,terminal]
----
net.ipv4.conf.net1.accept_redirects = 1
----
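+
The `sysctl` command reads this value from a file under `/proc/sys`. If you prefer to script the check, the following minimal Python sketch reads the same file. It assumes that a Python 3 interpreter is available in the container image, which the `centos` image might not provide, and `read_sysctl` is a hypothetical helper.
+
[source,python]
----
from pathlib import Path


# Hypothetical helper: reads a sysctl value from the file that backs it under
# /proc/sys, which is where the `sysctl` command reads values from.
def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    # Each dot in the name becomes a directory level, for example
    # net.ipv4.conf.net1.accept_redirects ->
    # /proc/sys/net/ipv4/conf/net1/accept_redirects.
    # Assumption: the interface name contains no dots; a VLAN interface such
    # as eth0.100 would need different handling.
    return Path(root, *name.split(".")).read_text().strip()
----
+
Inside `tunepod`, `read_sysctl("net.ipv4.conf.net1.accept_redirects")` is expected to return `"1"` when the tuning configuration has been applied. The `root` parameter exists only so that the sketch can be exercised against a directory other than `/proc/sys`.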
26
networking/setting-interface-level-network-sysctls.adoc
Normal file
@@ -0,0 +1,26 @@
:_content-type: ASSEMBLY
:context: set-networkinterface-sysctls
[id="nodes-setting-interface-level-network-sysctls"]
= Configuring interface-level network sysctls
include::_attributes/common-attributes.adoc[]

toc::[]

In Linux, sysctl allows an administrator to modify kernel parameters at runtime. You can modify interface-level network sysctls by using the tuning Container Network Interface (CNI) meta plug-in. The tuning CNI meta plug-in operates in a chain with a main CNI plug-in, as shown in the following diagram.

image::264_OpenShift_CNI_plugin_chain_0722.png[CNI plug-in]

The main CNI plug-in assigns the interface and passes it to the tuning CNI meta plug-in at runtime. By using the tuning CNI meta plug-in, you can change some sysctls and several interface attributes (promiscuous mode, all-multicast mode, MTU, and MAC address) in the network namespace. In the tuning CNI meta plug-in configuration, the interface name is represented by the `IFNAME` token, which is replaced with the actual name of the interface at runtime.
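
For example, if the main CNI plug-in creates an interface named `net1`, a configured key such as `net.ipv4.conf.IFNAME.accept_redirects` refers to `net.ipv4.conf.net1.accept_redirects`. The following Python sketch illustrates that substitution. It is not the tuning CNI meta plug-in's implementation: `resolve_sysctls` is a hypothetical function, and replacing only whole dot-separated components that equal `IFNAME` is a simplification chosen for this sketch.

[source,python]
----
from typing import Dict

IFNAME_TOKEN = "IFNAME"


# Hypothetical sketch of the IFNAME substitution; not the plug-in's own code.
def resolve_sysctls(sysctls: Dict[str, str], ifname: str) -> Dict[str, str]:
    resolved = {}
    for key, value in sysctls.items():
        # Replace only dot-separated components that equal the token exactly.
        parts = [ifname if part == IFNAME_TOKEN else part for part in key.split(".")]
        resolved[".".join(parts)] = value
    return resolved


print(resolve_sysctls({"net.ipv4.conf.IFNAME.accept_redirects": "1"}, "net1"))
----

Running the sketch prints `{'net.ipv4.conf.net1.accept_redirects': '1'}`.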

[NOTE]
====
In {product-title}, the tuning CNI meta plug-in only supports changing interface-level network sysctls.
====

include::modules/nw-cfg-tuning-interface-cni.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_nodes-setting-interface-level-network-sysctls"]
== Additional resources

* xref:../nodes/containers/nodes-containers-sysctls.adoc#nodes-containers-sysctls[Using sysctls in containers]
@@ -6,18 +6,21 @@ include::_attributes/common-attributes.adoc[]

toc::[]

Sysctl settings are exposed through Kubernetes, allowing users to modify certain kernel parameters at runtime. Only sysctls that are namespaced can be set independently on pods. A sysctl that is not namespaced is called _node-level_, and you must use another method to set it, such as the Node Tuning Operator.

Network sysctls are a special category of sysctl. Network sysctls include:

* System-wide sysctls, for example `net.ipv4.ip_local_port_range`, that are valid for all networking. You can set these independently for each pod on a node.
* Interface-specific sysctls, for example `net.ipv4.conf.eth0.accept_local`, that only apply to a specific interface. You cannot set these independently for each pod on a node. You set these by using a tuning CNI meta plug-in configuration after the network interfaces are created.
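
To make the distinction concrete, the following Python sketch reports which interface, if any, a network sysctl name refers to. It is a hypothetical heuristic rather than an OpenShift or kernel API, and it assumes that interface-specific sysctls live under the `net.ipv4.conf`, `net.ipv6.conf`, `net.ipv4.neigh`, and `net.ipv6.neigh` trees, where `all` and `default` are special entries instead of interface names.

[source,python]
----
from typing import Optional

# Assumption: these are the sysctl trees that hold one entry per interface.
PER_INTERFACE_TREES = {
    ("net", "ipv4", "conf"),
    ("net", "ipv6", "conf"),
    ("net", "ipv4", "neigh"),
    ("net", "ipv6", "neigh"),
}
# `all` and `default` are special entries in those trees, not real interfaces.
SPECIAL_ENTRIES = {"all", "default"}


def interface_of(sysctl: str) -> Optional[str]:
    """Return the interface a network sysctl applies to, or None if it is
    not tied to a single interface."""
    parts = sysctl.split(".")
    if len(parts) >= 5 and tuple(parts[:3]) in PER_INTERFACE_TREES:
        return None if parts[3] in SPECIAL_ENTRIES else parts[3]
    return None


for name in ("net.ipv4.ip_local_port_range", "net.ipv4.conf.eth0.accept_local"):
    print(name, "->", interface_of(name))
----

With the two examples from the list above, the sketch prints `net.ipv4.ip_local_port_range -> None` and `net.ipv4.conf.eth0.accept_local -> eth0`.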

Sysctl settings are exposed via Kubernetes, allowing users to modify certain
kernel parameters at runtime for namespaces within a container. Only sysctls
that are namespaced can be set independently on pods. If a sysctl is not
namespaced, called _node-level_, you must use another method of setting the sysctl, such as the
xref:../../scalability_and_performance/using-node-tuning-operator.adoc#using-node-tuning-operator[Node Tuning Operator].
Moreover, only those sysctls considered _safe_ are whitelisted by default; you
can manually enable other _unsafe_ sysctls on the node to be available to the
user.

[role="_additional-resources"]
.Additional resources

* xref:../../scalability_and_performance/using-node-tuning-operator.adoc#using-node-tuning-operator[Node Tuning Operator]

// The following include statements pull in the module files that comprise
// the assembly. Include any combination of concept, procedure, or reference
@@ -26,6 +29,22 @@ user.

include::modules/nodes-containers-sysctls-about.adoc[leveloffset=+1]

include::modules/nodes-namespaced-nodelevel-sysctls.adoc[leveloffset=+1]

include::modules/nodes-safe-sysctls-list.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* link:https://docs.kernel.org/networking/ip-sysctl.html[Linux networking documentation]

include::modules/nodes-containers-start-pod-safe-sysctls.adoc[leveloffset=+1]

include::modules/nodes-containers-sysctls-setting.adoc[leveloffset=+1]

include::modules/nodes-containers-sysctls-unsafe.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_nodes-containers-sysctls"]
== Additional resources

* xref:../../networking/setting-interface-level-network-sysctls.adoc#nodes-setting-interface-level-network-sysctls[Configuring interface-level network sysctls]