opencontainers/runc - runc - Linuxmonk: Open Source Repository Mirror

mirror of https://github.com/opencontainers/runc.git synced 2026-02-05 18:45:28 +01:00

Author	SHA1	Message	Date
Kir Kolyshkin	652269729d	libc/int: use strings.Builder Generated by modernize@latest (v0.21.0). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-16 15:04:04 -08:00
Curd Becker	536e183451	Replace os.Is* error checking functions with their errors.Is counterpart Signed-off-by: Curd Becker <me@curd-becker.de>	2025-12-11 03:16:02 +01:00
Akihiro Suda	64c3c8eea6	Merge pull request #4994 from kolyshkin/gofumpt-extra Enable gofumpt extra rules	2025-11-28 09:30:57 +09:00
Kir Kolyshkin	5fbc3bb019	libct/int: TestFdLeaks: deflake Since the recent CVE fixes, TestFdLeaksSystemd sometimes fails: === RUN TestFdLeaksSystemd exec_test.go:1750: extra fd 9 -> /12224/task/13831/fd exec_test.go:1753: found 1 extra fds after container.Run --- FAIL: TestFdLeaksSystemd (0.10s) It might have been caused by the change to the test code in commit `ff6fe13` ("utils: use safe procfs for /proc/self/fd loop code") -- we are now opening a file descriptor during the logic to get a list of file descriptors. If the file descriptor happens to be allocated to a different number, you'll get an error. Let's try to filter out the fd used to read a directory. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-20 15:36:14 +08:00
Aleksa Sarai	3b75374cc7	runtime-spec: update pids.limit handling to match new guidance The main update is actually in github.com/opencontainers/cgroups, but we need to also update runtime-spec to a newer pre-release version to get the updates from there as well. In short, the behaviour change is now that "0" is treated as a valid value to set in "pids.max", "-1" means "max" and unset/nil means "do nothing". As described in the opencontainers/cgroups PR, this change is actually backwards compatible because our internal state.json stores PidsLimit, and that entry is marked as "omitempty". So, an old runc would omit PidsLimit=0 in state.json, and this will be parsed by a new runc as being "nil" -- and both would treat this case as "do not set anything". Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-11 15:15:27 +11:00
Kir Kolyshkin	67840cce4b	Enable gofumpt extra rules Commit `b2f8a74d` "clothed" the naked return as inflicted by gofumpt v0.9.0. Since gofumpt v0.9.2 this rule was moved to "extra" category, not enabled by default. The only other "extra" rule is to group adjacent parameters with the same type, which also makes sense. Enable gofumpt "extra" rules, and reformat the code accordingly. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-11-10 13:18:45 -08:00
Aleksa Sarai	ff6fe13246	utils: use safe procfs for /proc/self/fd loop code From a safety perspective this might not be strictly required, but it paves the way for us to remove utils.ProcThreadSelf. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-01 21:24:04 +11:00
Kir Kolyshkin	89e59902c4	Modernize code for Go 1.24 Brought to you by modernize -fix -test ./... Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-08-27 19:11:02 -07:00
Kir Kolyshkin	a638f1330b	.golangci.yml: add nolintlint, fix found issues The errorlint linter can finally ignore errors from Close, and it also ignores direct comparisons of errors from x/sys/unix. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-24 11:59:54 -07:00
Kir Kolyshkin	65e0f2b719	libct/int: use destroyContainer Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-24 10:02:47 -07:00
Kir Kolyshkin	1aebfa3eab	libct/int: don't use _ = runContainerOk There is no need to explicitly ignore returned value. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-24 10:02:47 -07:00
Kir Kolyshkin	5ac77ed6d9	libct/int: add/use needUserNS helper Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-03-13 08:40:27 -07:00
Kir Kolyshkin	a75076b4a4	Switch to opencontainers/cgroups This removes libcontainer/cgroups packages and starts using those from github.com/opencontainers/cgroups repo. Mostly generated by: git rm -f libcontainer/cgroups find . -type f -name "*.go" -exec sed -i \ 's\|github.com/opencontainers/runc/libcontainer/cgroups\|github.com/opencontainers/cgroups\|g' \ {} + go get github.com/opencontainers/cgroups@v0.0.1 make vendor gofumpt -w . Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-28 15:20:33 -08:00
Kir Kolyshkin	7dc2486889	libct: switch to numeric UID/GID/groups This addresses the following TODO in the code (added back in 2015 by commit `845fc65e5`): > // TODO: fix libcontainer's API to better support uid/gid in a typesafe way. Historically, libcontainer internally uses strings for user, group, and additional (aka supplementary) groups. Yet, runc receives those credentials as part of runtime-spec's process, which uses integers for all of them (see [1], [2]). What happens next is: 1. runc start/run/exec converts those credentials to strings (a User string containing "UID:GID", and a []string for additional GIDs) and passes those onto runc init. 2. runc init converts them back to int, in the most complicated way possible (parsing container's /etc/passwd and /etc/group). All this conversion and, especially, parsing is totally unnecessary, but is performed on every container exec (and start). The only benefit of all this is, a libcontainer user could use user and group names instead of numeric IDs (but runc itself is not using this feature, and we don't know if there are any other users of this). Let's remove this back and forth translation, hopefully increasing runc exec performance. The only remaining need to parse /etc/passwd is to set HOME environment variable for a specified UID, in case $HOME is not explicitly set in process.Env. This can now be done right in prepareEnv, which simplifies the code flow a lot. Alas, we can not use standard os/user.LookupId, as it could cache host's /etc/passwd or the current user (even with the osusergo tag). PS Note that the structures being changed (initConfig and Process) are never saved to disk as JSON by runc, so there is no compatibility issue for runc users. Still, this is a breaking change in libcontainer, but we never promised that libcontainer API will be stable (and there's a special package that can handle it -- github.com/moby/sys/user). Reflect this in CHANGELOG. For 3998. 
[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#posix-platform-user [2]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/specs-go/config.go#L86 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-06 17:49:17 -08:00
Kir Kolyshkin	6171da6005	libct/configs: add HookList.SetDefaultEnv 1. Make CommandHook.Command a pointer, which reduces the amount of data being copied when using hooks, and allows to modify command hooks. 2. Add SetDefaultEnv, which is to be used by the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-01-09 18:22:53 +08:00
Kir Kolyshkin	a56f85f87b	libct/*: switch from configs to cgroups Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-12-11 19:08:40 -08:00
Sebastiaan van Stijn	30b530ca94	libct/userns: split userns detection from internal userns code Commit `4316df8b53` isolated RunningInUserNS to a separate package to make it easier to consume without bringing in additional dependencies, and with the potential to move it separate in a similar fashion as libcontainer/user was moved to a separate module in commit `ca32014adb`. While RunningInUserNS is fairly trivial to implement, it (or variants of this utility) is used in many codebases, and moving to a separate module could consolidate those implementations, as well as making it easier to consume without large dependency trees (when being a package as part of a larger code base). Commit `1912d5988b` and follow-ups introduced cgo code into the userns package, and code introduced in those commits are not intended for external use, therefore complicating the potential of moving the userns package separate. This commit moves the new code to a separate package; some of this code was included in v1.1.11 and up, but I could not find external consumers of `GetUserNamespaceMappings` and `IsSameMapping`. The `Mapping` and `Handles` types (added in `ba0b5e2698`) only exist in main and in non-stable releases (v1.2.0-rc.x), so don't need an alias / deprecation. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-06-30 20:06:30 +02:00
lifubang	4ea0bf88fd	update/add some tests for rlimit issues: https://github.com/opencontainers/runc/issues/4195 https://github.com/opencontainers/runc/pull/4265#discussion_r1588599809 Signed-off-by: lifubang <lifubang@acmcoder.com>	2024-05-08 10:57:10 +00:00
Aleksa Sarai	8e8b136c49	tree-wide: use /proc/thread-self for thread-local state With the idmap work, we will have a tainted Go thread in our thread-group that has a different mount namespace to the other threads. It seems that (due to some bad luck) the Go scheduler tends to make this thread the thread-group leader in our tests, which results in very baffling failures where /proc/self/mountinfo produces gibberish results. In order to avoid this, switch to using /proc/thread-self for everything that is thread-local. This primarily includes switching all file descriptor paths (CLONE_FS), all of the places that check the current cgroup (technically we never will run a single runc thread in a separate cgroup, but better to be safe than sorry), and the aforementioned mountinfo code. We don't need to do anything for the following because the results we need aren't thread-local: * Checks that certain namespaces are supported by stat(2)ing /proc/self/ns/... * /proc/self/exe and /proc/self/cmdline are not thread-local. * While threads can be in different cgroups, we do not do this for the runc binary (or libcontainer) and thus we do not need to switch to the thread-local version of /proc/self/cgroups. * All of the CLONE_NEWUSER files are not thread-local because you cannot set the usernamespace of a single thread (setns(CLONE_NEWUSER) is blocked for multi-threaded programs). Note that we have to use runtime.LockOSThread when we have an open handle to a tid-specific procfs file that we are operating on multiple times. Go can reschedule us such that we are running on a different thread and then kill the original thread (causing -ENOENT or similarly confusing errors). This is not strictly necessary for most usages of /proc/thread-self (such as using /proc/thread-self/fd/$n directly) since only operating on the actual inodes associated with the tid requires this locking, but because of the pre-3.17 fallback for CentOS, we have to do this in most cases. 
In addition, CentOS's kernel is too old for /proc/thread-self, which requires us to emulate it -- however in rootfs_linux.go, we are in the container pid namespace but /proc is the host's procfs. This leads to the incredibly frustrating situation where there is no way (on pre-4.1 Linux) to figure out which /proc/self/task/... entry refers to the current tid. We can just use /proc/self in this case. Yes this is all pretty ugly. I also wish it wasn't necessary. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-14 11:36:41 +11:00
Aleksa Sarai	09822c3da8	configs: disallow ambiguous userns and timens configurations For userns and timens, the mappings (and offsets, respectively) cannot be changed after the namespace is first configured. Thus, configuring a container with a namespace path to join means that you cannot also provide configuration for said namespace. Previously we would silently ignore the configuration (and just join the provided path), but we really should be returning an error (especially when you consider that the configuration userns mappings are used quite a bit in runc with the assumption that they are the correct mapping for the userns -- but in this case they are not). In the case of userns, the mappings are also required if you _do not_ specify a path, while in the case of the time namespace you can have a container with a timens but no mappings specified. It should be noted that the case checking that the user has not specified a userns path and a userns mapping needs to be handled in specconv (as opposed to the configuration validator) because with this patchset we now cache the mappings of path-based userns configurations and thus the validator can't be sure whether the mapping is a cached mapping or a user-specified one. So we do the validation in specconv, and thus the test for this needs to be an integration test. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-05 17:46:09 +11:00
Francis Laniel	c47f58c4e9	Capitalize [UG]idMappings as [UG]IDMappings Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>	2023-07-21 13:55:34 +02:00
Kir Kolyshkin	f8ad20f500	runc kill: drop -a option As of previous commit, this is implied in a particular scenario. In fact, this is the one and only scenario that justifies the use of -a. Drop the option from the documentation. For backward compatibility, do recognize it, and retain the feature of ignoring the "container is stopped" error when set. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-06-08 09:30:40 -07:00
Kir Kolyshkin	9583b3d1c2	libct: move killing logic to container.Signal By default, the container has its own PID namespace, and killing (with SIGKILL) its init process from the parent PID namespace also kills all the other processes. Obviously, it does not work that way when the container is sharing its PID namespace with the host or another container, since init is no longer special (it's not PID 1). In this case, killing container's init will result in a bunch of other processes left running (and thus the inability to remove the cgroup). The solution to the above problem is killing all the container processes, not just init. The problem with the current implementation is, the killing logic is implemented in libcontainer's initProcess.wait, and thus only available to libcontainer users, but not the runc kill command (which uses nonChildProcess.kill and does not use wait at all). So, some workarounds exist: - func destroy(c *Container) calls signalAllProcesses; - runc kill implements -a flag. This code became very tangled over time. Let's simplify things by moving the killing of all processes from initProcess.wait to container.Signal, and document the new behavior. In essence, this also makes `runc kill` automatically kill all container processes when the container does not have its own PID namespace. Document that as well. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-06-08 09:29:25 -07:00
Kir Kolyshkin	2a7dcbbb40	libct: fix shared pidns detection When someone is using libcontainer to start and kill containers from a long lived process (i.e. the same process creates and removes the container), initProcess.wait method is used, which has a kludge to work around killing containers that do not have their own PID namespace. The code that checks for own PID namespace is not entirely correct. To be exact, it does not set sharePidns flag when the host/caller PID namespace is implicitly used. As a result, the above mentioned kludge does not work. Fix the issue, add a test case (which fails without the fix). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-06-08 09:23:29 -07:00
Kir Kolyshkin	5b8f8712a4	libct: signalAllProcesses: remove child reaping There are two very distinct usage scenarios for signalAllProcesses: * when used from the runc binary ("runc kill" command), the processes that it kills are not the children of "runc kill", and so calling wait(2) on each process is totally useless, as it will return ECHILD; * when used from a program that has created the container (such as libcontainer/integration test suite), that program can and should call wait(2), not the signalling code. So, the child reaping code is totally useless in the first case, and should be implemented by the program using libcontainer in the second case. I was not able to track down how this code was added, my best guess is it happened when this code was part of dockerd, which did not have a proper child reaper implemented at that time. Remove it, and add a proper documentation piece. Change the integration test accordingly. PS the first attempt to disable the child reaping code in signalAllProcesses was made in commit `bb912eb00c`, which used a questionable heuristic to figure out whether wait(2) should be called. This heuristic worked for a particular use case, but is not correct in general. While at it: - simplify signalAllProcesses to use unix.Kill; - document (container).Signal. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-06-08 09:23:29 -07:00
Kir Kolyshkin	f2e71b085d	libct/int: make TestFdLeaks more robust The purpose of this test is to check that there are no extra file descriptors left open after repeated calls to runContainer. In fact, the first call to runContainer leaves a few file descriptors opened, and this is by design. Previously, this test relied on two things: 1. some other tests were run before it (and thus all such opened-once file descriptors are already opened); 2. explicitly excluding fd opened to /sys/fs/cgroup. Now, if we run this test separately, it will fail (because of 1 above). The same may happen if the tests are run in a random order. To fix this, add a container run before collecting the initial fd list, so those fds that are opened once are included and won't be reported. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-02-22 02:58:47 -08:00
Kir Kolyshkin	be7e03940f	libct/int: wording nits Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-02-22 02:58:47 -08:00
Kir Kolyshkin	7c75e84e22	libc/int: add/use runContainerOk wrapper This is to de-duplicate the code that checks that err is nil and that the exit code is zero. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2023-02-22 02:58:47 -08:00
Kir Kolyshkin	98fe566c52	runc: do not set inheritable capabilities Do not set inheritable capabilities in runc spec, runc exec --cap, and in libcontainer integration tests. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-05-12 08:14:50 +10:00
Kir Kolyshkin	0fec1c2d8c	libct: Mount: rm {Pre,Post}mountCmds Those were added by commit `59c5c3ac0` back in Apr 2015, but AFAICS were never used and are obsoleted by more generic container hooks (initially added by commit `05567f2c94` in Sep 2015). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-01-26 15:51:55 -08:00
Kir Kolyshkin	953e56c56f	libct/int: runContainer: drop console arg It is not and was never ever used. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-11-29 20:10:22 -08:00
Kir Kolyshkin	972aea3af0	libct/configs/validate: allow / in sysctl names Runtime spec says: > sysctl (object, OPTIONAL) allows kernel parameters to be modified at > runtime for the container. For more information, see the sysctl(8) > man page. and sysctl(8) says: > variable > The name of a key to read from. An example is > kernel.ostype. The '/' separator is also accepted in place of a '.'. Apparently, runc config validator does not support sysctls with / as a separator. Fortunately this is a one-line fix. Add some more test data where / is used as a separator. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-10-29 09:45:55 -07:00
Akihiro Suda	95f8ecdd53	fix `libcontainer/integration/exec_test.go:1859:8: undefined: ioutil` Fix `4d17654479` Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2021-10-28 14:56:03 +09:00
Akihiro Suda	4d17654479	Merge pull request #2576 from kinvolk/alban/userns-2484-take2 Open bind mount sources from the host userns	2021-10-28 14:50:33 +09:00
Mauricio Vásquez	8542322dfe	libcontainer: Add unit tests with userns and mounts Add a unit test to check that bind mounts that have a part of their path not accessible by others still work when using user namespaces. To do this, we also modify newRoot() to return rootfs directories that can be traversed by others, so the rootfs created works for all tests (either running in a userns or not). Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io> Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>	2021-10-16 17:29:33 +02:00
Kir Kolyshkin	5516294172	Remove io/ioutil use See https://golang.org/doc/go1.16#ioutil Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-10-14 13:46:02 -07:00
Kir Kolyshkin	3bc606e9d3	libct/int: adapt to Go 1.15 1. Use t.TempDir instead of ioutil.TempDir. This means no need for an explicit cleanup, which removes some code, including newTestBundle and newTestRoot. 2. Move newRootfs invocation down to newTemplateConfig, removing a need for explicit rootfs creation. Also, remove rootfs from tParam as it is no longer needed (there was a single test case in which two containers shared the same rootfs, but it does not look like it's required for the test). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-07-27 01:41:47 -07:00
Kir Kolyshkin	5dc3260431	libct/int/TestFreeze: test freeze/thaw via Set In addition to freezing and thawing a container via Pause/Resume, there is a way to also do so via Set. This way was broken though and is being fixed by a few preceding commits. The test is added to make sure this is fixed and won't regress. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-07-14 23:42:35 -07:00
Kir Kolyshkin	e969d42156	libct/int/testPids: logging nits Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-02 17:31:03 -07:00
Kir Kolyshkin	e6048715e4	Use gofumpt to format code gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules. Brought to you by git ls-files \*.go \| grep -v ^vendor/ \| xargs gofumpt -s -w Looking at the diff, all these changes make sense. Also, replace gofmt with gofumpt in golangci.yml. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-01 12:17:27 -07:00
Aleksa Sarai	ed4781029f	merge branch 'pr-2781' Sebastiaan van Stijn (7): errcheck: utils errcheck: signals errcheck: tty errcheck: libcontainer errcheck: libcontainer/nsenter errcheck: libcontainer/configs errcheck: libcontainer/integration LGTM: AkihiroSuda cyphar Closes #2781	2021-05-25 12:31:52 +10:00
Aleksa Sarai	54904516e6	libcontainer: fix integration failure in "make test" When running inside a Docker container, systemd is not available. The new TestFdLeaksSystemd forgot to include the relevant t.Skip section. Fixes: `a7feb42395` ("libct/int: add TestFdLeaksSystemd") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2021-05-23 17:55:09 +10:00
Aleksa Sarai	c7c70ce810	*: clean t.Skip messages Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2021-05-23 17:53:01 +10:00
Sebastiaan van Stijn	a899505377	errcheck: libcontainer/integration Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-05-20 14:17:40 +02:00
Kir Kolyshkin	a7feb42395	libct/int: add TestFdLeaksSystemd Add a test to check that container.Run does not leak file descriptors. Before the previous commit, it fails like this: exec_test.go:2030: extra fd 8 -> socket:[659703] exec_test.go:2030: extra fd 11 -> socket:[658715] exec_test.go:2033: found 2 extra fds after container.Run Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-05-06 12:37:55 -07:00
Kir Kolyshkin	6faed0e486	libct/int: use ok(t, err) ... in all the places it makes sense to use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-15 13:03:17 -07:00
Kir Kolyshkin	af3c5699a5	libct/int: remove unused code Since commit `88e8350de2` the error message is different, so the check is not working. In addition, for the cgroup v2 case, and it seems that PID controller is always available these days. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-15 12:46:17 -07:00
Kir Kolyshkin	7b802a7da4	libct/int: better test container names 1. Do not create the same container named "test" over and over. 2. Fix randomization issues when generating container and cgroup names. The issues were: * math/rand used without seeding * complex rand/md5/hexencode sequence In both cases, replace with nanosecond time encoded with digits and lowercase letters. 3. Add test name to container and cgroup names. For example, this is how systemd log has changed: Before: Started libcontainer container test16ddfwutxgjte. After: Started libcontainer container TestPidsSystemd-4oaqvr. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-15 12:37:59 -07:00
Qiang Huang	2d38476c96	Merge pull request #2840 from kolyshkin/ignore-kmem Ignore kernel memory settings	2021-04-13 09:44:14 +08:00
Kir Kolyshkin	52390d6804	Ignore kernel memory settings This is a somewhat radical approach to deal with kernel memory. Per-cgroup kernel memory limiting was always problematic. A few examples: - older kernels had bugs and were even oopsing sometimes (best example is RHEL7 kernel); - kernel is unable to reclaim the kernel memory so once the limit is hit a cgroup is toasted; - some kernel memory allocations don't allow failing. In addition to that, - users don't have a clue about how to set kernel memory limits (as the concept is much more complicated than e.g. [user] memory); - different kernels might have different kernel memory usage, which is sort of unexpected; - cgroup v2 does not have a [dedicated] kmem limit knob, and thus runc silently ignores kernel memory limits for v2; - kernel v5.4 made cgroup v1 kmem.limit obsolete (see https://github.com/torvalds/linux/commit/0158115f702b). In view of all this, and as the runtime-spec lists memory.kernel and memory.kernelTCP as OPTIONAL, let's ignore kernel memory limits (for cgroup v1, same as we're already doing for v2). This should result in fewer bugs and a better user experience. The only bad side effect from it might be that stat can show kernel memory usage as 0 (since the accounting is not enabled). [v2: add a warning in specconv that limits are ignored] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-12 12:18:11 -07:00


125 Commits