1
0
mirror of https://github.com/opencontainers/runc.git synced 2026-02-05 09:46:08 +01:00
Files
runc/libcontainer
Aleksa Sarai b370bafce9 merge private security patches into ghsa-release-1.3.3
Aleksa Sarai (22):
  rootfs: re-allow dangling symlinks in mount targets
  openat2: improve resilience on busy systems
  selinux: use safe procfs API for labels
  rootfs: switch to fd-based handling of mountpoint targets
  libct/system: use securejoin for /proc/$pid/stat
  init: use securejoin for /proc/self/setgroups
  init: write sysctls using safe procfs API
  utils: remove unneeded EnsureProcHandle
  utils: use safe procfs for /proc/self/fd loop code
  apparmor: use safe procfs API for labels
  ci: add lint to forbid the usage of os.Create
  rootfs: avoid using os.Create for new device inodes
  internal: add wrappers for securejoin.Proc*
  go.mod: update to github.com/cyphar/filepath-securejoin@v0.5.0
  console: verify /dev/pts/ptmx before use
  console: avoid trivial symlink attacks for /dev/console
  console: add fallback for pre-TIOCGPTPEER kernels
  console: use TIOCGPTPEER when allocating peer PTY
  *: switch to safer securejoin.Reopen
  internal: move utils.MkdirAllInRoot to internal/pathrs
  internal/sys: add VerifyInode helper
  internal: linux: add package doc-comment

Li Fubang (1):
  libct: align param type for mountCgroupV1/V2 functions

Kir Kolyshkin (3):
  libct: maskPaths: don't rely on ENOTDIR for mount
  libct: maskPaths: only ignore ENOENT on mount dest
  libct: add/use isDevNull, verifyDevNull

Fixes: CVE-2025-31133 GHSA-9493-h29p-rfm2
Fixes: CVE-2025-52565 GHSA-qw9x-cqr3-wc7r
Fixes: CVE-2025-52881 GHSA-cgrx-mc8f-2prm
Reported-by: Lei Wang <ssst0n3@gmail.com>
Reported-by: Li Fubang <lifubang@acmcoder.com>
Reported-by: Tõnis Tiigi <tonistiigi@gmail.com>
Reported-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-11-05 20:05:20 +11:00
..
2025-02-28 15:20:33 -08:00
2024-01-24 00:20:59 +11:00
2023-09-19 10:22:29 +02:00
2024-09-23 23:27:35 +00:00
2021-10-14 13:46:02 -07:00
2021-10-14 13:46:02 -07:00
2021-10-14 13:46:02 -07:00
2025-03-02 19:17:41 -08:00
2025-02-28 15:20:33 -08:00

libcontainer

Go Reference

Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It allows you to manage the lifecycle of the container performing additional operations after the container is created.

Container

A container is a self contained execution environment that shares the kernel of the host system and which is (optionally) isolated from other containers in the system.

Using libcontainer

Container init

Because containers are spawned in a two step process you will need a binary that will be executed as the init process for the container. In libcontainer, we use the current binary (/proc/self/exe) to be executed as the init process, and use arg "init", we call the first step process "bootstrap", so you always need a "init" function as the entry of "bootstrap".

In addition to the go init function the early stage bootstrap is handled by importing nsenter.

For details on how runc implements such "init", see init.go and libcontainer/init_linux.go.

Device management

If you want containers that have access to some devices, you need to import this package into your code:

    import (
        _ "github.com/opencontainers/cgroups/devices"
    )

Without doing this, libcontainer cgroup manager won't be able to set up device access rules, and will fail if devices are specified in the container configuration.

Container creation

To create a container you first have to create a configuration struct describing how the container is to be created. A sample would look similar to this:

defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
var devices []*devices.Rule
for _, device := range specconv.AllowedDevices {
	devices = append(devices, &device.Rule)
}
config := &configs.Config{
	Rootfs: "/your/path/to/rootfs",
	Capabilities: &configs.Capabilities{
		Bounding: []string{
			"CAP_KILL",
			"CAP_AUDIT_WRITE",
		},
		Effective: []string{
			"CAP_KILL",
			"CAP_AUDIT_WRITE",
		},
		Permitted: []string{
			"CAP_KILL",
			"CAP_AUDIT_WRITE",
		},
	},
	Namespaces: configs.Namespaces([]configs.Namespace{
		{Type: configs.NEWNS},
		{Type: configs.NEWUTS},
		{Type: configs.NEWIPC},
		{Type: configs.NEWPID},
		{Type: configs.NEWUSER},
		{Type: configs.NEWNET},
		{Type: configs.NEWCGROUP},
	}),
	Cgroups: &configs.Cgroup{
		Name:   "test-container",
		Parent: "system",
		Resources: &configs.Resources{
			MemorySwappiness: nil,
			Devices:          devices,
		},
	},
	MaskPaths: []string{
		"/proc/kcore",
		"/sys/firmware",
	},
	ReadonlyPaths: []string{
		"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
	},
	Devices:  specconv.AllowedDevices,
	Hostname: "testing",
	Mounts: []*configs.Mount{
		{
			Source:      "proc",
			Destination: "/proc",
			Device:      "proc",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "tmpfs",
			Destination: "/dev",
			Device:      "tmpfs",
			Flags:       unix.MS_NOSUID | unix.MS_STRICTATIME,
			Data:        "mode=755",
		},
		{
			Source:      "devpts",
			Destination: "/dev/pts",
			Device:      "devpts",
			Flags:       unix.MS_NOSUID | unix.MS_NOEXEC,
			Data:        "newinstance,ptmxmode=0666,mode=0620,gid=5",
		},
		{
			Device:      "tmpfs",
			Source:      "shm",
			Destination: "/dev/shm",
			Data:        "mode=1777,size=65536k",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "mqueue",
			Destination: "/dev/mqueue",
			Device:      "mqueue",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "sysfs",
			Destination: "/sys",
			Device:      "sysfs",
			Flags:       defaultMountFlags | unix.MS_RDONLY,
		},
	},
	UIDMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	GIDMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	Networks: []*configs.Network{
		{
			Type:    "loopback",
			Address: "127.0.0.1/0",
			Gateway: "localhost",
		},
	},
	Rlimits: []configs.Rlimit{
		{
			Type: unix.RLIMIT_NOFILE,
			Hard: uint64(1025),
			Soft: uint64(1025),
		},
	},
}

Once you have the configuration populated you can create a container with a specified ID under a specified state directory:

container, err := libcontainer.Create("/run/containers", "container-id", config)
if err != nil {
	logrus.Fatal(err)
	return
}

To spawn bash as the initial process inside the container and have the processes pid returned in order to wait, signal, or kill the process:

process := &libcontainer.Process{
	Args:   []string{"/bin/bash"},
	Env:    []string{"PATH=/bin"},
	User:   "daemon",
	Stdin:  os.Stdin,
	Stdout: os.Stdout,
	Stderr: os.Stderr,
	Init:   true,
}

err := container.Run(process)
if err != nil {
	container.Destroy()
	logrus.Fatal(err)
	return
}

// wait for the process to finish.
_, err := process.Wait()
if err != nil {
	logrus.Fatal(err)
}

// destroy the container.
container.Destroy()

Additional ways to interact with a running container are:

// return all the pids for all processes running inside the container.
processes, err := container.Processes()

// get detailed cpu, memory, io, and network statistics for the container and
// it's processes.
stats, err := container.Stats()

// pause all processes inside the container.
container.Pause()

// resume all paused processes.
container.Resume()

// send signal to container's init process.
container.Signal(signal)

// update container resource constraints.
container.Set(config)

// get current status of the container.
status, err := container.Status()

// get current container's state information.
state, err := container.State()

Checkpoint & Restore

libcontainer now integrates CRIU for checkpointing and restoring containers. This lets you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.

criu version 1.5.2 or higher is required to use checkpoint and restore. If you don't already have criu installed, you can build it from source, following the online instructions. criu is also installed in the docker image generated when building libcontainer with docker.

Code and documentation copyright 2014 Docker, inc. The code and documentation are released under the Apache 2.0 license. The documentation is also released under Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.