On MIPS arches, Rdev is uint32 so we have to convert it.
Fixes issue 4962.
Fixes: 8476df83 ("libct: add/use isDevNull, verifyDevNull")
Fixes: de87203e ("console: verify /dev/pts/ptmx before use")
Fixes: 398955bc ("console: add fallback for pre-TIOCGPTPEER kernels")
Reported-by: Tianon Gravi <admwiggin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is primarily done out of an abudance of caution against runc exec
being attacked by a container where /dev/pts/ptmx has been replaced with
some other bad inode (a disconnected NFS handle, a symlink that goes
through a leaked runc file descriptor to reference a host ptmx, etc).
Unfortunately, we cannot trivially verify that /dev/pts/ptmx is actually
the /dev/pts from the container without storing stuff like the fsid in
the runc state.json, which is probably not worth the extra effort. This
should at least avoid the most concerning cases.
Reported-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
An attacker could make /dev/console a symlink. This presents two
possible issues:
1. os.Create will happily truncate targets, which could have resulted
in a worse version of CVE-2024-4531. Luckily, this all happens after
pivot_root(2) so the scope of that particular attack is fairly
limited (you are unlikely to be able to easily access host rootfs
files -- though it might be possible to take advantage of leaks such
as in CVE-2024-21626). However, O_CREAT|O_NOFOLLOW is what we should
be doing for all file creations.
2. Because we passed /dev/console as the only mount path (as opposed to
using a /proc/self/fd/$n path), an attacker could swap the symlink
to point to any other path and thus cause us to mount over some
other path. This is not as big of a problem because all the mounts
are in the container namespace after pivot_root(2), and users
usually can create arbitrary mount targets inside the container.
These issues don't seem particularly exploitable, but they deserve to be
hardened regardless.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The pty driver has very consistent allocation rules for the major:minor
numbers of /dev/pts/$n inodes, so it is possible to somewhat safely open
/dev/pts/* paths if we validate that the inode is the one we expect.
It is possible for an attacker to have over-mounted a pts peer from a
different devpts instance, but to fix this would require more tracking
of devpts instances than runc currently can do.
This means runc should continue to work on very old kernels.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426 ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Umask is problematic for Go programs as it affects other goroutines
(see [1] for more details).
Instead of using it, let's just prop up with Chmod.
Note this patch misses the MkdirAll call in createDeviceNode. Since the
runtime spec does not say anything about creating intermediary
directories for device nodes, let's assume that doing it via mkdir with
the current umask set is sufficient (if not, we have to reimplement
MkdirAll from scratch, with added call to os.Chmod).
[1] https://github.com/opencontainers/runc/pull/3563#discussion_r990293788
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Simplify mount call by removing the procfd argument, and use the new
mount() where procfd is not used. Now, the mount() arguments are the
same as for unix.Mount.
2. Introduce a new mountViaFDs function, which is similar to the old
mount(), except it can take procfd for both source and target.
The new arguments are called srcFD and dstFD.
3. Modify the mount error to show both srcFD and dstFD so it's clear
which one is used for which purpose. This fixes the issue of having
a somewhat cryptic errors like this:
> mount /proc/self/fd/11:/sys/fs/cgroup/systemd (via /proc/self/fd/12), flags: 0x20502f: operation not permitted
(in which fd 11 is actually the source, and fd 12 is the target).
After this change, it looks like
> mount src=/proc/self/fd/11, dst=/sys/fs/cgroup/systemd, dstFD=/proc/self/fd/12, flags=0x20502f: operation not permitted
so it's clear that 12 is a destination fd.
4. Fix the mountViaFDs callers to use dstFD (rather than procfd) for the
variable name.
5. Use srcFD where mountFd is set.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Errors returned by unix are bare. In some cases it's impossible to find
out what went wrong because there's is not enough context.
Add a mountError type (mostly copy-pasted from github.com/moby/sys/mount),
and mount/unmount helpers. Use these where appropriate, and convert error
checks to use errors.Is.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.
Brought to you by
git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w
Looking at the diff, all these changes make sense.
Also, replace gofmt with gofumpt in golangci.yml.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This moves all console code to use github.com/containerd/console library to
handle console I/O. Also move to use EpollConsole by default when user requests
a terminal so we can still cope when the other side temporarily goes away.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
Use IoctlGetInt and IoctlGetTermios/IoctlSetTermios instead of manually
reimplementing them.
Because of unlockpt, the ioctl wrapper is still needed as it needs to
pass a pointer to a value, which is not supported by any ioctl function
in x/sys/unix yet.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
And use it only in local tooling that is forwarding the pseudoterminal
master. That way runC no longer has an opinion on the onlcr setting
for folks who are creating a terminal and detaching. They'll use
--console-socket and can setup the pseudoterminal however they like
without runC having an opinion. With this commit, the only cases
where runC still has applies SaneTerminal is when *it* is the process
consuming the master descriptor.
Signed-off-by: W. Trevor King <wking@tremily.us>
This allows for higher-level orchestrators to be able to have access to
the master pty file descriptor without keeping the runC process running.
This is key to having (detach && createTTY) with a _real_ pty created
inside the container, which is then sent to a higher level orchestrator
over an AF_UNIX socket.
This patch is part of the console rewrite patchset.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Since the gid=X and mode=Y flags can be set inside config.json as mount
options, don't override them with our own defaults. This avoids
/dev/pts/* not being owned by tty in a regular container, as well as all
of the issues with us implementing grantpt(3) manually. This is the
least opinionated approach to take.
This patch is part of the console rewrite patchset.
Reported-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.
In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.
We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.
This patch is part of the console rewrite patchset.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The default terminal setting for a new pty on Linux (unix98) has +ONLCR,
resulting in '\n' writes by a container process to be converted to
'\r\n' reads by the managing process. This is quite unexpected, and
causes multiple issues with things like bats testing. To fix it, make
the terminal sane after opening it by setting -ONLCR.
This patch might need to be rewritten after the console rewrite patchset
is merged.
Signed-off-by: Aleksa Sarai <asarai@suse.de>