GA User Name Space

2026-02-05 12:46:18 +01:00 · 2024-11-14 09:46:16 -05:00
parent 0b4900ed4b
commit 76c82091a7
3 changed files with 103 additions and 87 deletions
--- a/modules/nodes-pods-user-namespaces-configuring.adoc
+++ b/modules/nodes-pods-user-namespaces-configuring.adoc
@@ -6,52 +6,15 @@
 [id="nodes-pods-user-namespaces-configuring_{context}"]
 = Configuring Linux user namespace support

+You can configure a pod to run in a Linux user namespace by setting the `hostUsers` parameter to `false` in the pod spec and by making the related configuration changes shown in the following procedure.

-.Prerequisites
+Running workloads in user namespaces makes it safe to set Security Context Constraint (SCC) fields, such as `fsGroup`, `runAsGroup`, `runAsUser`, and `supplementalGroups`, to `RunAsAny`. These fields express the UID or GID inside the container, and that UID or GID is mapped to a different UID or GID outside of the container.

-* You enabled the required Technology Preview features for your cluster by editing the `FeatureGate` CR named `cluster`:
-+
-[source,terminal]
----
-$ oc edit featuregate cluster
----
-+
-.Example `FeatureGate` CR
-[source,yaml]
----
-apiVersion: config.openshift.io/v1
-kind: FeatureGate
-metadata:
-  name: cluster
-spec:
-  featureSet: TechPreviewNoUpgrade <1>
----
-<1> Enables the required `UserNamespacesSupport` and `ProcMountType` features.
-+
-[WARNING]
-====
-Enabling the `TechPreviewNoUpgrade` feature set on your cluster cannot be undone and prevents minor version updates. This feature set allows you to enable these Technology Preview features on test clusters, where you can fully test them. Do not enable this feature set on production clusters.
-====
-+
-After you save the changes, new machine configs are created, the machine config pools are updated, and scheduling on each node is disabled while the change is being applied.
+For extra security, you can use the `restricted-v3` or `nested-container` SCC. These SCCs are designed specifically for workloads in Linux user namespaces: their `userNamespaceLevel: RequirePodLevel` field requires that workloads run in a user namespace. For more information about SCCs, see "Managing security context constraints".
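+
+As a sketch, assuming that your cluster provides the `restricted-v3` and `nested-container` SCCs and that you are logged in as a user who can view SCCs, you can check the `userNamespaceLevel` value of an SCC by running a command similar to the following, where `<scc_name>` is `restricted-v3` or `nested-container`:
+
+[source,terminal]
+----
+$ oc get scc <scc_name> -o jsonpath='{.userNamespaceLevel}{"\n"}'
+----
+
+For both SCCs, the expected output is `RequirePodLevel`.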

-* The crun container runtime is present on the worker nodes. crun is currently the only OCI runtime packaged with {product-title} that supports user namespaces. crun is active by default.
-+
-[source,yaml]
----
-apiVersion: machineconfiguration.openshift.io/v1
-kind: ContainerRuntimeConfig
-metadata:
- name: enable-crun-worker
-spec:
- machineConfigPoolSelector:
-   matchLabels:
-     pools.operator.machineconfiguration.openshift.io/worker: "" <1>
- containerRuntimeConfig:
-   defaultRuntime: crun <2>
----
-<1> Specifies the machine config pool label.
-<2> Specifies the container runtime to deploy.
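+To allow a user or group to use a specific SCC, add the SCC to that user or group by using the `oc adm policy add-scc-to-user` or `oc adm policy add-scc-to-group` command. To require that a workload uses a specific SCC, set the `openshift.io/required-scc` annotation on the workload, as shown in the following procedure. For more information, see the "OpenShift CLI administrator command reference".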
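+
+For example, the following commands are a sketch of granting the `restricted-v3` SCC to a service account and to a group. The service account, namespace, and group names are placeholders that you replace with your own values:
+
+[source,terminal]
+----
+$ oc adm policy add-scc-to-user restricted-v3 -z <service_account_name> -n <namespace>
+----
+
+[source,terminal]
+----
+$ oc adm policy add-scc-to-group restricted-v3 <group_name>
+----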
+
+Optionally, you can use the `procMount` parameter in a pod specification to mount the `/proc` file system in the pod as `Unmasked`. Setting `procMount` to `Unmasked` bypasses the default masking behavior of the container runtime. Although this is generally considered safe, use it only with an SCC that requires pods to run in a user namespace (`hostUsers: false`).

 .Procedure

@@ -76,18 +39,18 @@ metadata:
    openshift.io/sa.scc.supplemental-groups: 1000/10000 <1>
    openshift.io/sa.scc.uid-range: 1000/10000 <2>
 # ...
-name: userns
+  name: userns
 # ...
 ----
-<1> Edit the default GID to match the value you specified in the pod spec. The range for a Linux user namespace must be lower than 65,535. The default is `1000000000/10000`.
-<2> Edit the default UID to match the value you specified in the pod spec. The range for a Linux user namespace must be lower than 65,535. The default is `1000000000/10000`.
+<1> Specifies the range of group IDs (GIDs) for the namespace. The GID that you set in the pod spec must fall within this range. For a Linux user namespace, the range must be `65535` or lower. The default is `1000000000/10000`.
+<2> Specifies the range of user IDs (UIDs) for the namespace. The UID that you set in the pod spec must fall within this range. For a Linux user namespace, the range must be `65535` or lower. The default is `1000000000/10000`.
 +
 [NOTE]
 ====
 The range 1000/10000 means 10,000 values starting with ID 1000, so it specifies the range of IDs from 1000 to 10,999.
 ====

-. Enable the use of Linux user namespaces by creating a pod configured to run with a `restricted` profile and with the `hostUsers` parameter set to `false`.
+. Enable the use of Linux user namespaces by creating a pod that requires an appropriate SCC, such as `restricted-v3`, and that sets the `hostUsers` parameter to `false`.

 .. Create a YAML file similar to the following:
 +
@@ -97,35 +60,42 @@ The range 1000/10000 means 10,000 values starting with ID 1000, so it specifies
 apiVersion: v1
 kind: Pod
 metadata:
+  namespace: userns
   name: userns-pod
+  annotations:
+    openshift.io/required-scc: "restricted-v3" <1>
-
 # ...
-
 spec:
+  hostUsers: false <2>
   containers:
   - name: userns-container
     image: registry.access.redhat.com/ubi9
     command: ["sleep", "1000"]
     securityContext:
-      capabilities:
+      capabilities: <3>
         drop: ["ALL"]
-      allowPrivilegeEscalation: false <1>
-      runAsNonRoot: true <2>
-      seccompProfile:
-        type: RuntimeDefault
-      runAsUser: 1000 <3>
-      runAsGroup: 1000 <4>
-  hostUsers: false <5>
-
+      allowPrivilegeEscalation: false
+      runAsNonRoot: true <4>
+      procMount: Unmasked <5>
+      runAsUser: 1000 <6>
+      runAsGroup: 1000 <7>
 # ...
 ----
-<1> Specifies that a pod cannot request privilege escalation. This is required for the `restricted-v2` security context constraints (SCC).
-<2> Specifies that the container will run with a user with any UID other than 0.
-<3> Specifies the UID the container is run with.
-<4> Specifies which primary GID the containers is run with.
-<5> Requests that the pod is to be run in a user namespace. If `true`, the pod runs in the host user namespace. If `false`, the pod runs in a new user namespace that is created for the pod. The default is `true`.
+<1> Specifies the SCC to use with this workload.
+<2> Specifies whether the pod is to be run in a user namespace. If `false`, the pod runs in a new user namespace that is created for the pod. If `true`, the pod runs in the host user namespace. The default is `true`.
+<3> Specifies the Linux capabilities of the container. This example drops all capabilities. Capabilities permit privileged actions without granting full root access. Capabilities set inside a user namespace are safer than capabilities set outside of one, because their scope is limited to the user namespace, and they can generally be considered safe. However, granting capabilities such as `CAP_SYS_ADMIN` to an untrusted workload can increase the kernel surface area that a containerized process can reach and potentially exploit. For this reason, capabilities inside a user namespace are allowed at the `baseline` level of pod security admission.
+<4> Specifies that processes inside the container must run with a UID other than `0`.
+<5> Optional: Specifies the type of proc mount to use for the container. The `Unmasked` value bypasses the default masking behavior of the container runtime, so that the `/proc` file system of the container is mounted read/write without masked paths. The default is `Default`, which uses the container runtime defaults for read-only and masked paths.
+<6> Specifies the user ID for processes that run inside the container. This value must fall within the UID range that you set in the `Namespace` object.
+<7> Specifies the group ID for processes that run inside the container. This value must fall within the GID range that you set in the `Namespace` object.

-.. Create the pod by running the following command:
+.. Create the pod by running the following command:
 +
 ----
 $ oc create -f <file_name>.yaml
@@ -133,7 +103,7 @@ $ oc create -f <file_name>.yaml

 .Verification

-. Check the pod user and group IDs being used in the pod container you created. The pod is inside the Linux user namespace.
+. Check the user and group IDs that the container in your pod uses. Because the pod runs in a Linux user namespace, these are the IDs inside the user namespace.

 .. Start a shell session with the container in your pod:
 +
@@ -158,8 +128,9 @@ sh-5.1$ id
 .Example output
 [source,terminal]
 ----
-uid=1000(1000) gid=1000(1000) groups=1000(1000)
+uid=1000(1000) gid=1000(1000) groups=1000(1000) <1>
 ----
+<1> The UID and GID of the container match the values that you set in the pod specification.

 .. Display the user ID being used in the container user namespace:
 +
@@ -174,9 +145,9 @@ sh-5.1$ lsns -t user
        NS TYPE  NPROCS PID USER COMMAND
 4026532447 user       3   1 1000 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000 <1>
 ----
-<1> The UID for the process is `1000`, the same as you set in the pod spec.
+<1> The UID for the process should be the same as you set in the pod spec.
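+
+.. Optional: Display the UID mapping for the container user namespace. This step is a sketch that assumes you are still in the shell session in the container. Each line of the `/proc/self/uid_map` file lists the first UID inside the user namespace, the host UID that it maps to, and the number of UIDs in the mapped range. The host UID in the second column is therefore expected to differ from the UIDs used inside the container:
++
+[source,terminal]
+----
+sh-5.1$ cat /proc/self/uid_map
+----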

-. Check the pod user ID being used on the node where the pod was created. The node is outside of the Linux user namespace. This user ID should be different from the UID being used in the container.
+. Check the UID that the container process uses on the node where the pod is running. The node is outside of the Linux user namespace, so this UID is different from the UID used inside the container.

 .. Start a debug session for that node:
 +
@@ -198,7 +169,7 @@ $ oc debug node/ci-ln-z5vppzb-72292-8zp2b-worker-c-q8sh9
 sh-5.1# chroot /host
 ----

-.. Display the user ID being used in the node user namespace:
+.. Display the user namespaces on the node and the UIDs of the processes that run in them:
 +
 [source,terminal]
 ----
@@ -212,4 +183,29 @@ sh-5.1#  lsns -t user
 4026531837 user     233     1 root       /usr/lib/systemd/systemd --switched-root --system --deserialize 28
 4026532447 user       1  4767 2908816384 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000 <1>
 ----
-<1> The UID for the process is `2908816384`, which is different from what you set in the pod spec.
+<1> On the node, the UID of the container process is different from the UID that you set in the pod specification.
+
+.. Exit the `chroot` environment and then the debug session by running the `exit` command twice:
++
+[source,terminal]
+----
+sh-5.1# exit
+sh-5.1# exit
+----
+
+. Check that the `/proc` file system is mounted into the container as `Unmasked`. When `/proc` is unmasked, the output of the following command shows a single read/write (`rw`) `/proc` mount, without the additional read-only or masking mounts that the container runtime adds under `/proc` by default:
++
+[source,terminal]
+----
+$ oc exec <pod_name> -n <namespace> -- mount | grep /proc
+----
++
+.Example output
+[source,terminal]
+----
+proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
+----
--- a/modules/security-context-constraints-about.adoc
+++ b/modules/security-context-constraints-about.adoc
@@ -36,14 +36,14 @@ The cluster contains several default security context constraints (SCCs) as desc
 [IMPORTANT]
 ====
 Do not modify the default SCCs. Customizing the default SCCs can lead to issues when some of the platform pods deploy or
-ifndef::openshift-rosa,openshift-rosa-hcp[]
+ifndef::openshift-rosa[]
 {product-title}
 endif::[]
-ifdef::openshift-rosa,openshift-rosa-hcp[]
+ifdef::openshift-rosa[]
 ROSA
-endif::openshift-rosa,openshift-rosa-hcp[]
+endif::openshift-rosa[]
 is upgraded. Additionally, the default SCC values are reset to the defaults during some cluster upgrades, which discards all customizations to those SCCs.
-ifdef::openshift-origin,openshift-enterprise,openshift-webscale,openshift-dedicated,openshift-rosa,openshift-rosa-hcp[]
+ifdef::openshift-origin,openshift-enterprise,openshift-webscale,openshift-dedicated,openshift-rosa[]

 Instead of modifying the default SCCs, create and modify your own SCCs as needed. For detailed steps, see _Creating security context constraints_.
 endif::[]
@@ -89,7 +89,19 @@ If additional workloads are run on control plane hosts, use caution when providi
 * The `NET_BIND_SERVICE` capability can be added explicitly.
 * `seccompProfile` is set to `runtime/default` by default.
 * `allowPrivilegeEscalation` must be unset or set to `false` in security contexts.
+endif::[]

+|`nested-container`
+| Like the `restricted-v2` SCC, but with the following differences:
+
+* `seLinuxContext` is set to `MustRunAs` and `seLinuxOptions.type` is `container_engine_t`.
+* `runAsUser` is set to `MustRunAsRange`.
+* `requiredDropCapabilities` is set to `null`.
+* `userNamespaceLevel` is set to `RequirePodLevel`, which forces pods to be in a Linux user namespace (`hostUsers: false`).
+
+This SCC allows a user to run a container engine inside of an {product-title} pod.
+
+ifndef::openshift-dedicated[]
 |`node-exporter`
 |Used for the Prometheus node exporter.

@@ -160,11 +172,18 @@ In clusters that were upgraded from {product-title} 4.10 or earlier, this SCC is
 * `seccompProfile` is set to `runtime/default` by default.
 * `allowPrivilegeEscalation` must be unset or set to `false` in security contexts.

+This SCC is used by default for authenticated users.
+
+|`restricted-v3`
+| Like the `restricted-v2` SCC, but with the following differences:
+
+* `userNamespaceLevel` is set to `RequirePodLevel`, which forces pods to be in a Linux user namespace (`hostUsers: false`).
+
-This is the most restrictive SCC provided by a new installation and will be used by default for authenticated users.
+This is the most restrictive SCC provided by a new installation.

 [NOTE]
 ====
-The `restricted-v2` SCC is the most restrictive of the SCCs that is included by default with the system. However, you can create a custom SCC that is even more restrictive. For example, you can create an SCC that restricts `readOnlyRootFilesystem` to `true`.
+The `restricted-v3` SCC is the most restrictive of the SCCs that are included by default with the system. However, you can create a custom SCC that is even more restrictive. For example, you can create an SCC that restricts `readOnlyRootFilesystem` to `true`.
 ====

 |===
--- a/nodes/pods/nodes-pods-user-namespaces.adoc
+++ b/nodes/pods/nodes-pods-user-namespaces.adoc
@@ -8,15 +8,10 @@ toc::[]

 Linux user namespaces allow administrators to isolate the container user and group identifiers (UIDs and GIDs) so that a container can have a different set of permissions in the user namespace than on the host system where it is running. This allows containers to run processes with full privileges inside the user namespace, but the processes can be unprivileged for operations on the host machine.

-By default, a container runs in the host system's root user namespace. Running a container in the host user namespace can be useful when the container needs a feature that is available only in that user namespace. However, it introduces security concerns, such as the possibility of container breakouts, in which a process inside a container breaks out onto the host where the process can access or modify files on the host or in other containers. 
+By default, a container runs in the host user namespace. Running a container in the host user namespace can be useful when the container needs a feature that is available only in the host user namespace. However, running pods in the host user namespace introduces security concerns, such as the possibility of container breakouts, in which a process inside a container breaks out onto the host, where the process can access or modify files on the host or in other containers.
 
 Running containers in individual user namespaces can mitigate container breakouts and several other vulnerabilities that a compromised container can pose to other pods and the node itself. 

-You can configure Linux user namespace use by setting the `hostUsers` parameter to `false` in the pod spec, as shown in the following procedure.
-
-:FeatureName: Support for Linux user namespaces
-include::snippets/technology-preview.adoc[]
-
 // The following include statements pull in the module files that comprise
 // the assembly. Include any combination of concept, procedure, or reference
 // modules required to cover the user story. You can also include other
@@ -24,3 +19,9 @@ include::snippets/technology-preview.adoc[]

 include::modules/nodes-pods-user-namespaces-configuring.adoc[leveloffset=+1]

+[role="_additional-resources"]
+.Additional resources
+
+* xref:../../authentication/managing-security-context-constraints.adoc[Managing security context constraints]
+* xref:../../cli_reference/openshift_cli/administrator-cli-commands.adoc#cli-administrator-commands[OpenShift CLI administrator command reference]
+