mirror of https://github.com/opencontainers/runc.git synced 2026-02-06 12:45:09 +01:00

Files

lfbzhm e669926691 fix an error caused by fd reuse race when starting runc init

There is a race situation when we are opening a file, if there is a
small fd was closed at that time, maybe it will be reused by safeExe.
Because of Go stdlib fds shuffling bug, if the fd of safeExe is too
small, go stdlib will dup3 it to another fd, or dup3 a other fd to this
fd, then it will cause the fd type cmd.Path refers to a random path,
and it can lead to an error "permission denied" when starting the process.
Please see #4294 and <https://github.com/golang/go/issues/61751>.
So we should not use the original fd of safeExe, but use the fd after
shuffled by Go stdlib. Because Go stdlib will guarantee this fd refers to
the correct file.

Signed-off-by: lfbzhm <lifubang@acmcoder.com>

2024-10-21 06:53:44 +00:00

apparmor

remove pre-go1.17 build-tags

2024-06-29 15:45:25 +02:00

capabilities

libct/cap: no need to load capabilities

2024-10-09 12:08:44 -07:00

cgroups

libcontainer/userns: migrate to github.com/moby/sys/userns

2024-10-09 22:20:25 +08:00

configs

libct: use Namespaces.IsPrivate more

2024-09-17 22:49:29 -07:00

devices

remove pre-go1.17 build-tags

2024-06-29 15:45:25 +02:00

dmz

dmz: use overlayfs to write-protect /proc/self/exe if possible

2024-10-20 21:35:09 +11:00

integration

runc spec, libct/int: do not add ambient capabilities

2024-09-25 21:48:29 -07:00

intelrdt

libct/intelrdt: check if available iff configured

2022-07-28 12:06:03 -07:00

internal/userns

libct/userns: split userns detection from internal userns code

2024-06-30 20:06:30 +02:00

keys

libct/*: remove linux build tag from some pkgs

2021-08-30 20:52:07 -07:00

logs

init: don't special-case logrus fds

2024-01-24 00:20:59 +11:00

nsenter

libct: rm initWaiter

2024-10-17 08:05:42 -07:00

seccomp

libct/seccomp/patchbpf: use binary.NativeEndian

2024-09-11 22:06:58 -07:00

specconv

runc spec, libct/int: do not add ambient capabilities

2024-09-25 21:48:29 -07:00

system

utils: switch to securejoin.MkdirAllHandle

2024-09-03 23:06:47 +10:00

user

deprecate libcontainer/user

2023-09-19 10:22:29 +02:00

userns

libcontainer/userns: migrate to github.com/moby/sys/userns

2024-10-09 22:20:25 +08:00

utils

dmz: use overlayfs to write-protect /proc/self/exe if possible

2024-10-20 21:35:09 +11:00

console_linux.go

libct: use chmod instead of umask

2023-09-27 16:46:53 -07:00

container_linux_test.go

Revert "Set temporary single CPU affinity..."

2024-06-10 06:31:03 +08:00

container_linux.go

fix an error caused by fd reuse race when starting runc init

2024-10-21 06:53:44 +00:00

container.go

libct: rm BaseContainer and Container interfaces

2022-03-23 11:04:12 -07:00

criu_linux.go

rootfs: consolidate mountpoint creation logic

2024-07-25 14:16:05 +10:00

criu_opts_linux.go

deps: bump github.com/checkpoint-restore/go-criu to 6.3.0

2022-11-01 10:08:14 +00:00

error.go

add ErrCgroupNotExist

2024-09-23 23:27:35 +00:00

factory_linux_test.go

libct: rm BaseContainer and Container interfaces

2022-03-23 11:04:12 -07:00

factory_linux.go

libct: decouple libct/cg/devices

2024-04-17 15:05:38 -07:00

init_linux.go

Merge pull request #4405 from amghazanfari/main

2024-10-04 14:01:23 -07:00

message_linux.go

libcontainer: remove all mount logic from nsexec

2023-12-14 11:36:40 +11:00

mount_linux.go

libct/userns: split userns detection from internal userns code

2024-06-30 20:06:30 +02:00

network_linux.go

Remove io/ioutil use

2021-10-14 13:46:02 -07:00

notify_linux_test.go

Remove io/ioutil use

2021-10-14 13:46:02 -07:00

notify_linux.go

Remove io/ioutil use

2021-10-14 13:46:02 -07:00

notify_v2_linux.go

*: rm redundant linux build tag

2021-08-30 20:15:00 -07:00

process_linux.go

libct: rm initWaiter

2024-10-17 08:05:42 -07:00

process.go

Add I/O priority

2024-03-30 22:31:54 +09:00

README.md

libct/README: simplify example, rm inheritable caps

2024-09-25 21:48:29 -07:00

restored_process.go

libcontainer: remove LinuxFactory

2022-03-22 23:44:31 -07:00

rootfs_linux_test.go

libcontainer: force apps to think fips is enabled/disabled for testing

2024-04-10 18:58:34 -04:00

rootfs_linux.go

libcontainer/userns: migrate to github.com/moby/sys/userns

2024-10-09 22:20:25 +08:00

setns_init_linux.go

libct: rm eaccess

2024-06-07 10:18:59 -07:00

SPEC.md

ci/gha: add space-at-eol check, fix existing issues

2023-06-07 11:27:27 -07:00

standard_init_linux.go

libct: rm eaccess

2024-06-07 10:18:59 -07:00

state_linux_test.go

libct: rm BaseContainer and Container interfaces

2022-03-23 11:04:12 -07:00

state_linux.go

libct: Signal: honor RootlessCgroups

2024-09-11 03:54:52 +09:00

stats_linux.go

…

sync_unix.go

never send procError after the socket closed

2024-01-25 04:52:11 +00:00

sync.go

libcontainer: remove all mount logic from nsexec

2023-12-14 11:36:40 +11:00

README.md

libcontainer

Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It lets you manage the container's lifecycle and perform additional operations after the container is created.

Container

A container is a self-contained execution environment that shares the kernel of the host system and is (optionally) isolated from other containers in the system.

Using libcontainer

Container init

Because containers are spawned in a two-step process, you need a binary that will be executed as the init process for the container. libcontainer uses the current binary (/proc/self/exe) as the init process and passes it the argument "init". The first step is called "bootstrap", so you always need an "init" function as the entry point of "bootstrap".

In addition to the Go init function, the early-stage bootstrap is handled by importing nsenter.

For details on how runc implements such "init", see init.go and libcontainer/init_linux.go.
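
The re-exec pattern described above can be sketched with the Go standard library alone. This is a minimal illustration under stated assumptions, not libcontainer's implementation: the helpers isInitInvocation and newInitCommand are hypothetical names, and the real bootstrap additionally relies on the nsenter import.

```go
package main

import (
	"os"
	"os/exec"
)

// selfExe is the path a parent re-executes. On Linux it resolves to the
// binary of the calling process.
const selfExe = "/proc/self/exe"

// isInitInvocation reports whether args (in os.Args form) ask this binary
// to act as the container init process, i.e. the first argument is "init".
// Hypothetical helper, for illustration only.
func isInitInvocation(args []string) bool {
	return len(args) > 1 && args[1] == "init"
}

// newInitCommand builds, but does not start, the command a parent could use
// to re-execute the current binary with the "init" argument.
// Hypothetical helper, for illustration only.
func newInitCommand() *exec.Cmd {
	cmd := exec.Command(selfExe, "init")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}
```

A program following this pattern would typically check isInitInvocation(os.Args) from a Go init function and, when it returns true, hand control to the container init code instead of running its normal main logic.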

Device management

If you want containers that have access to some devices, you need to import this package into your code:

    import (
        _ "github.com/opencontainers/runc/libcontainer/cgroups/devices"
    )

Without this import, the libcontainer cgroup manager won't be able to set up device access rules, and will fail if devices are specified in the container configuration.

Container creation

To create a container you first have to create a configuration struct describing how the container is to be created. A sample would look similar to this:

defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
var devices []*devices.Rule
for _, device := range specconv.AllowedDevices {
	devices = append(devices, &device.Rule)
}
config := &configs.Config{
	Rootfs: "/your/path/to/rootfs",
	Capabilities: &configs.Capabilities{
		Bounding: []string{
			"CAP_KILL",
			"CAP_AUDIT_WRITE",
		},
		Effective: []string{
			"CAP_KILL",
			"CAP_AUDIT_WRITE",
		},
		Permitted: []string{
			"CAP_KILL",
			"CAP_AUDIT_WRITE",
		},
	},
	Namespaces: configs.Namespaces([]configs.Namespace{
		{Type: configs.NEWNS},
		{Type: configs.NEWUTS},
		{Type: configs.NEWIPC},
		{Type: configs.NEWPID},
		{Type: configs.NEWUSER},
		{Type: configs.NEWNET},
		{Type: configs.NEWCGROUP},
	}),
	Cgroups: &configs.Cgroup{
		Name:   "test-container",
		Parent: "system",
		Resources: &configs.Resources{
			MemorySwappiness: nil,
			Devices:          devices,
		},
	},
	MaskPaths: []string{
		"/proc/kcore",
		"/sys/firmware",
	},
	ReadonlyPaths: []string{
		"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
	},
	Devices:  specconv.AllowedDevices,
	Hostname: "testing",
	Mounts: []*configs.Mount{
		{
			Source:      "proc",
			Destination: "/proc",
			Device:      "proc",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "tmpfs",
			Destination: "/dev",
			Device:      "tmpfs",
			Flags:       unix.MS_NOSUID | unix.MS_STRICTATIME,
			Data:        "mode=755",
		},
		{
			Source:      "devpts",
			Destination: "/dev/pts",
			Device:      "devpts",
			Flags:       unix.MS_NOSUID | unix.MS_NOEXEC,
			Data:        "newinstance,ptmxmode=0666,mode=0620,gid=5",
		},
		{
			Device:      "tmpfs",
			Source:      "shm",
			Destination: "/dev/shm",
			Data:        "mode=1777,size=65536k",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "mqueue",
			Destination: "/dev/mqueue",
			Device:      "mqueue",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "sysfs",
			Destination: "/sys",
			Device:      "sysfs",
			Flags:       defaultMountFlags | unix.MS_RDONLY,
		},
	},
	UIDMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	GIDMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	Networks: []*configs.Network{
		{
			Type:    "loopback",
			Address: "127.0.0.1/0",
			Gateway: "localhost",
		},
	},
	Rlimits: []configs.Rlimit{
		{
			Type: unix.RLIMIT_NOFILE,
			Hard: uint64(1025),
			Soft: uint64(1025),
		},
	},
}
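
To make the UIDMappings and GIDMappings entries above more concrete, the following sketch shows how a single mapping range translates an ID inside the container to an ID on the host. The IDMap type here is a local stand-in that reuses the field names from the sample, and toHostID is a hypothetical helper written for illustration; neither is part of libcontainer's API.

```go
package main

// IDMap is a local stand-in with the same field names as the entries used
// in UIDMappings and GIDMappings in the sample configuration.
type IDMap struct {
	ContainerID int64 // first ID of the range inside the container
	HostID      int64 // first ID of the range on the host
	Size        int64 // number of consecutive IDs covered by the range
}

// toHostID translates an ID as seen inside the container into the matching
// host ID. The second result is false when no range covers the ID.
func toHostID(containerID int64, mappings []IDMap) (int64, bool) {
	for _, m := range mappings {
		if containerID >= m.ContainerID && containerID < m.ContainerID+m.Size {
			return m.HostID + (containerID - m.ContainerID), true
		}
	}
	return 0, false
}
```

With the sample mapping (ContainerID 0, HostID 1000, Size 65536), container ID 0 corresponds to host ID 1000, and container ID 65536 falls outside the range and is unmapped.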

Once you have the configuration populated you can create a container with a specified ID under a specified state directory:

container, err := libcontainer.Create("/run/containers", "container-id", config)
if err != nil {
	logrus.Fatal(err)
}

To spawn bash as the initial process inside the container, and to have the process's pid returned so that you can wait on, signal, or kill it:

process := &libcontainer.Process{
	Args:   []string{"/bin/bash"},
	Env:    []string{"PATH=/bin"},
	User:   "daemon",
	Stdin:  os.Stdin,
	Stdout: os.Stdout,
	Stderr: os.Stderr,
	Init:   true,
}

err = container.Run(process)
if err != nil {
	container.Destroy()
	logrus.Fatal(err)
}

// wait for the process to finish.
_, err = process.Wait()
if err != nil {
	logrus.Fatal(err)
}

// destroy the container.
container.Destroy()

Additional ways to interact with a running container are:

// return all the pids for all processes running inside the container.
processes, err := container.Processes()

// get detailed cpu, memory, io, and network statistics for the container and
// its processes.
stats, err := container.Stats()

// pause all processes inside the container.
container.Pause()

// resume all paused processes.
container.Resume()

// send signal to container's init process.
container.Signal(signal)

// update container resource constraints.
container.Set(config)

// get current status of the container.
status, err := container.Status()

// get current container's state information.
state, err := container.State()

Checkpoint & Restore

libcontainer now integrates CRIU for checkpointing and restoring containers. This lets you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.

CRIU version 1.5.2 or higher is required to use checkpoint and restore. If you don't already have CRIU installed, you can build it from source by following the online instructions. CRIU is also installed in the Docker image generated when building libcontainer with Docker.

Copyright and license

Code and documentation copyright 2014 Docker, Inc. The code and documentation are released under the Apache 2.0 license. The documentation is also released under the Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.