2015-02-09 14:07:18 -08:00
|
|
|
package libcontainer
|
|
|
|
|
|
|
|
|
|
import (
|
2025-08-01 19:33:12 +10:00
|
|
|
"errors"
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
"fmt"
|
2015-02-09 14:07:18 -08:00
|
|
|
"os"
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
"runtime"
|
2017-03-27 16:46:57 -05:00
|
|
|
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
"github.com/containerd/console"
|
2017-03-27 16:46:57 -05:00
|
|
|
"golang.org/x/sys/unix"
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
|
|
|
|
|
"github.com/opencontainers/runc/internal/linux"
|
2025-08-01 19:33:12 +10:00
|
|
|
"github.com/opencontainers/runc/internal/pathrs"
|
|
|
|
|
"github.com/opencontainers/runc/internal/sys"
|
2025-05-15 17:32:01 +10:00
|
|
|
"github.com/opencontainers/runc/libcontainer/utils"
|
2015-02-09 14:07:18 -08:00
|
|
|
)
|
|
|
|
|
|
2025-05-15 23:56:45 +10:00
|
|
|
// checkPtmxHandle checks that the given file handle points to a real
|
|
|
|
|
// /dev/pts/ptmx device inode on a real devpts mount. We cannot (trivially)
|
|
|
|
|
// check that it is *the* /dev/pts for the container itself, but this is good
|
|
|
|
|
// enough.
|
|
|
|
|
func checkPtmxHandle(ptmx *os.File) error {
|
|
|
|
|
//nolint:revive,staticcheck,nolintlint // ignore "don't use ALL_CAPS" warning // nolintlint is needed to work around the different lint configs
|
|
|
|
|
const (
|
|
|
|
|
PTMX_MAJOR = 5 // from TTYAUX_MAJOR in <linux/major.h>
|
|
|
|
|
PTMX_MINOR = 2 // from mknod_ptmx in fs/devpts/inode.c
|
|
|
|
|
PTMX_INO = 2 // from mknod_ptmx in fs/devpts/inode.c
|
|
|
|
|
)
|
|
|
|
|
return sys.VerifyInode(ptmx, func(stat *unix.Stat_t, statfs *unix.Statfs_t) error {
|
|
|
|
|
if statfs.Type != unix.DEVPTS_SUPER_MAGIC {
|
|
|
|
|
return fmt.Errorf("ptmx handle is not on a real devpts mount: super magic is %#x", statfs.Type)
|
|
|
|
|
}
|
|
|
|
|
if stat.Ino != PTMX_INO {
|
|
|
|
|
return fmt.Errorf("ptmx handle has wrong inode number: %v", stat.Ino)
|
|
|
|
|
}
|
2025-11-05 17:52:47 -08:00
|
|
|
rdev := uint64(stat.Rdev) //nolint:unconvert // Rdev is uint32 on MIPS.
|
|
|
|
|
if stat.Mode&unix.S_IFMT != unix.S_IFCHR || rdev != unix.Mkdev(PTMX_MAJOR, PTMX_MINOR) {
|
2025-05-15 23:56:45 +10:00
|
|
|
return fmt.Errorf("ptmx handle is not a real char ptmx device: ftype %#x %d:%d",
|
2025-11-05 17:52:47 -08:00
|
|
|
stat.Mode&unix.S_IFMT, unix.Major(rdev), unix.Minor(rdev))
|
2025-05-15 23:56:45 +10:00
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
})
|
|
|
|
|
}
|
|
|
|
|
|
2025-08-01 19:33:12 +10:00
|
|
|
func isPtyNoIoctlError(err error) bool {
|
|
|
|
|
// The kernel converts -ENOIOCTLCMD to -ENOTTY automatically, but handle
|
|
|
|
|
// -EINVAL just in case (which some drivers do, include pty).
|
|
|
|
|
return errors.Is(err, unix.EINVAL) || errors.Is(err, unix.ENOTTY)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func getPtyPeer(pty console.Console, unsafePeerPath string, flags int) (*os.File, error) {
|
|
|
|
|
peer, err := linux.GetPtyPeer(pty.Fd(), unsafePeerPath, flags)
|
|
|
|
|
if err == nil || !isPtyNoIoctlError(err) {
|
|
|
|
|
return peer, err
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// On pre-TIOCGPTPEER kernels (Linux < 4.13), we need to fallback to using
|
|
|
|
|
// the /dev/pts/$n path generated using TIOCGPTN. We can do some validation
|
|
|
|
|
// that the inode is correct because the Unix-98 pty has a consistent
|
|
|
|
|
// numbering scheme for the device number of the peer.
|
|
|
|
|
|
|
|
|
|
peerNum, err := unix.IoctlGetUint32(int(pty.Fd()), unix.TIOCGPTN)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, fmt.Errorf("get peer number of pty: %w", err)
|
|
|
|
|
}
|
|
|
|
|
//nolint:revive,staticcheck,nolintlint // ignore "don't use ALL_CAPS" warning // nolintlint is needed to work around the different lint configs
|
|
|
|
|
const (
|
|
|
|
|
UNIX98_PTY_SLAVE_MAJOR = 136 // from <linux/major.h>
|
|
|
|
|
)
|
|
|
|
|
wantPeerDev := unix.Mkdev(UNIX98_PTY_SLAVE_MAJOR, peerNum)
|
|
|
|
|
|
|
|
|
|
// Use O_PATH to avoid opening a bad inode before we validate it.
|
|
|
|
|
peerHandle, err := os.OpenFile(unsafePeerPath, unix.O_PATH|unix.O_CLOEXEC, 0)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
defer peerHandle.Close()
|
|
|
|
|
|
|
|
|
|
if err := sys.VerifyInode(peerHandle, func(stat *unix.Stat_t, statfs *unix.Statfs_t) error {
|
|
|
|
|
if statfs.Type != unix.DEVPTS_SUPER_MAGIC {
|
|
|
|
|
return fmt.Errorf("pty peer handle is not on a real devpts mount: super magic is %#x", statfs.Type)
|
|
|
|
|
}
|
2025-11-05 17:52:47 -08:00
|
|
|
rdev := uint64(stat.Rdev) //nolint:unconvert // Rdev is uint32 on MIPS.
|
|
|
|
|
if stat.Mode&unix.S_IFMT != unix.S_IFCHR || rdev != wantPeerDev {
|
2025-08-01 19:33:12 +10:00
|
|
|
return fmt.Errorf("pty peer handle is not the real char device for pty %d: ftype %#x %d:%d",
|
2025-11-05 17:52:47 -08:00
|
|
|
peerNum, stat.Mode&unix.S_IFMT, unix.Major(rdev), unix.Minor(rdev))
|
2025-08-01 19:33:12 +10:00
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
}); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return pathrs.Reopen(peerHandle, flags)
|
|
|
|
|
}
|
|
|
|
|
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
// safeAllocPty returns a new (ptmx, peer pty) allocation for use inside a
|
|
|
|
|
// container.
|
|
|
|
|
func safeAllocPty() (pty console.Console, peer *os.File, Err error) {
|
2025-05-15 23:56:45 +10:00
|
|
|
// TODO: Use openat2(RESOLVE_NO_SYMLINKS|RESOLVE_NO_XDEV).
|
|
|
|
|
ptmxHandle, err := os.OpenFile("/dev/pts/ptmx", unix.O_PATH|unix.O_NOFOLLOW|unix.O_CLOEXEC, 0)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, nil, err
|
|
|
|
|
}
|
|
|
|
|
defer ptmxHandle.Close()
|
|
|
|
|
|
|
|
|
|
if err := checkPtmxHandle(ptmxHandle); err != nil {
|
|
|
|
|
return nil, nil, fmt.Errorf("verify ptmx handle: %w", err)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ptyFile, err := pathrs.Reopen(ptmxHandle, unix.O_RDWR|unix.O_NOCTTY)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, nil, fmt.Errorf("reopen ptmx to get new pty pair: %w", err)
|
|
|
|
|
}
|
|
|
|
|
// On success, the ownership is transferred to pty.
|
|
|
|
|
defer func() {
|
|
|
|
|
if Err != nil {
|
|
|
|
|
_ = ptyFile.Close()
|
|
|
|
|
}
|
|
|
|
|
}()
|
|
|
|
|
|
|
|
|
|
pty, unsafePeerPath, err := console.NewPtyFromFile(ptyFile)
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
if err != nil {
|
|
|
|
|
return nil, nil, err
|
|
|
|
|
}
|
|
|
|
|
defer func() {
|
|
|
|
|
if Err != nil {
|
|
|
|
|
_ = pty.Close()
|
|
|
|
|
}
|
|
|
|
|
}()
|
|
|
|
|
|
2025-08-01 19:33:12 +10:00
|
|
|
peer, err = getPtyPeer(pty, unsafePeerPath, unix.O_RDWR|unix.O_NOCTTY)
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
if err != nil {
|
|
|
|
|
return nil, nil, fmt.Errorf("failed to get peer end of newly-allocated console: %w", err)
|
|
|
|
|
}
|
|
|
|
|
return pty, peer, nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// mountConsole bind-mounts the provided pty on top of /dev/console so programs
|
|
|
|
|
// that operate on /dev/console operate on the correct container pty.
|
|
|
|
|
func mountConsole(peerPty *os.File) error {
|
2025-05-15 17:32:01 +10:00
|
|
|
console, err := os.OpenFile("/dev/console", unix.O_NOFOLLOW|unix.O_CREAT|unix.O_CLOEXEC, 0o666)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return fmt.Errorf("create /dev/console mount target: %w", err)
|
2015-02-09 14:07:18 -08:00
|
|
|
}
|
2025-05-15 17:32:01 +10:00
|
|
|
defer console.Close()
|
|
|
|
|
|
|
|
|
|
dstFd, closer := utils.ProcThreadSelfFd(console.Fd())
|
|
|
|
|
defer closer()
|
|
|
|
|
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
mntSrc := &mountSource{
|
|
|
|
|
Type: mountSourcePlain,
|
|
|
|
|
file: peerPty,
|
|
|
|
|
}
|
2025-05-15 17:32:01 +10:00
|
|
|
return mountViaFds(peerPty.Name(), mntSrc, "/dev/console", dstFd, "bind", unix.MS_BIND, "")
|
2015-02-09 14:07:18 -08:00
|
|
|
}
|
|
|
|
|
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
// dupStdio replaces stdio with the given peerPty.
|
|
|
|
|
func dupStdio(peerPty *os.File) error {
|
2015-02-09 14:07:18 -08:00
|
|
|
for _, i := range []int{0, 1, 2} {
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
if err := linux.Dup3(int(peerPty.Fd()), i, 0); err != nil {
|
2015-02-09 14:07:18 -08:00
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
}
|
console: use TIOCGPTPEER when allocating peer PTY
When opening the peer end of a pty, the old kernel API required us to
open /dev/pts/$num inside the container (at least since we fixed console
handling many years ago in commit 244c9fc426ae ("*: console rewrite")).
The problem is that in a hostile container it is possible for
/dev/pts/$num to be an attacker-controlled symlink that runc can be
tricked into resolving when doing bind-mounts. This allows the attacker
to (among other things) persist /proc/... entries that are later masked
by runc, allowing an attacker to escape through the kernel.core_pattern
sysctl (/proc/sys/kernel/core_pattern). This is the original issue
reported by Lei Wang and Li Fu Bang in CVE-2025-52565.
However, it should be noted that this is not entirely a newly-discovered
problem. Way back in Linux 4.13 (2017), I added the TIOCGPTPEER ioctl,
which allows us to get a pty peer without touching the /dev/pts inside
the container. The original threat model was around an attacker
replacing /dev/pts/$n or /dev/pts/ptmx with some malicious inode (a DoS
inode, or possibly a PTY they wanted a confused deputy to operate on).
Unfortunately, there was no practical way for runc to cache a safe
O_PATH handle to /dev/pts/ptmx (unlike other runtimes like LXC, which
switched to TIOCGPTPEER way back in 2017). Since it wasn't clear how we
could protect against the main attack TIOCGPTPEER was meant to protect
against, we never switched to it (even though I implemented it
specifically to harden container runtimes).
Unfortunately, It turns out that mount *sources* are a threat we didn't
fully consider. Since TIOCGPTPEER already solves this problem entirely
for us in a race free way, we should just use that. In a later patch, we
will add some hardening for /dev/pts/$num opening to maintain support
for very old kernels (Linux 4.13 is very old at this point, but RHEL 7
is still kicking and is stuck on Linux 3.10).
Fixes: GHSA-qw9x-cqr3-wc7r CVE-2025-52565
Reported-by: Lei Wang <ssst0n3@gmail.com> (CVE-2025-52565)
Reported-by: lfbzhm <lifubang@acmcoder.com> (CVE-2025-52565)
Reported-by: Aleksa Sarai <cyphar@cyphar.com> (TIOCGPTPEER)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-05-15 16:12:21 +10:00
|
|
|
runtime.KeepAlive(peerPty)
|
2015-02-09 14:07:18 -08:00
|
|
|
return nil
|
|
|
|
|
}
|