This prepends a severity level such as <3> to each line of diagnostic
output, with numeric severity levels taken from matching syslog(3)
(such as LOG_ERR = 3), so that the diagnostic output can be parsed by
tools like `logger --prio-prefix` and `systemd-cat --level-prefix=1`
that support that encoding.
The facility (LOG_USER, etc.) is not included, since it makes little
sense to vary on a per-message basis. logger(1) supports prefixes
with or without a facility, and systemd-cat(1) only supports prefixes
without a facility, so this is compatible with both.
A future version of Steam's pressure-vessel is likely to use this to
make warnings and fatal errors from bubblewrap more visible.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Fixes containers/bubblewrap#91
Add the ability to overwrite argv[0] when starting a process in a
container. Using --argv0 to be consistent with ld.so --argv0.
Overwriting argv[0] is useful as some tools change their behavior based
on the value of argv[0]. For example, when bash is symlinked to sh it
behaves as sh. Similarly, unxz is a symlink to xz and changes the
default from compressing to decompressing. An extreme example is on many
systems, date, df, cat and so on are all symlinks to the coreutils
binary.
Example usage: bwrap --bind / / --argv0 sh bash
Signed-off-by: Jonathan Wright <quaggy@gmail.com>
Mention how to format capabilities for --add-cap, e.g.
CAP_DAC_READ_SEARCH instead of DAC_READ_SEARCH.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
We can't combine --disable-userns with entering an existing user
namespace via --userns if the existing user namespace was created with
--disable-userns, because its ability to create nested user namespaces
has already been disabled. However, the next best thing is to verify
that we are already in the desired state.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Some use-cases of bubblewrap want to ensure that the subprocess can't
further re-arrange the filesystem namespace, or do other more complex
namespace modification. For example, Flatpak wants to prevent sandboxed
processes from altering their /proc/$pid/root/.flatpak-info, so that
/.flatpak-info can safely be used as an indicator that a process is part
of a Flatpak app.
This approach was suggested by lukts30 on containers/bubblewrap#452.
The sysctl-controlled maximum numbers of namespaces are themselves
namespaced, so we can disable nested user namespaces by setting the
limit to 1 and then entering a new, nested user namespace. The resulting
process loses its privileges in the namespace where the limit was set
to 1, so it is unable to move the limit back up.
Co-authored-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
This allows a sandbox to share a pid namespace with another sandbox.
For this to work the namespace passed in must be owned by the user
namespace that bwrap is using, which implies either that you pass in
--userns pointing, or run under that user namespace already. In the
former case you'd typically take the userns from a running bwrap
--unshare-user instance, whereas the second case happens when using
bwrap in the setuid mode without user namespaces.
If both --unshare-pid and --pidns are specified then we first
switch to the pid namespace, and then unshare from there. This is
useful if you want a pid-isolated sandob that is visible to another
sandbox.
The implementation is a bit tricky, as it needs to fork() in order
to activate the setns():ed pid namespaces, which means we have to
pass through the final pid via a socket to make the kernel translate
the pid to the initial pid namespace for us to waitpid() on it.
This allows you to reuse an existing user namespace to set up all the
other namespaces, entering that instead of creating a new one. The
reason you want to do this is that you can then also reuse other
namespaces that are owned by the user namespace. Typically you use
this to partially re-enter a previoulsy created bubblewrap sandbox.
This also adds --userns2 which is similar to --userns, but this is
switched into at the end instead of the start. Bubblewrap sometimes
creates nested such user namespaces[1], and to be able to reuse such a
setup we need to similarly reuse both namespaces via --userns2.
Technically using setns() is probably safe even in the privileged
case, because we got passed in a file descriptor to the namespace, and
that can only be gotten if you have ptrace permissions against the
target, and then you could do whatever to the namespace
anyway. However, for practical reasons this isn't useable for bwrap,
because (as described in a comment in acquire_privs()) setuid mode
causes root to own the namespaces that it creates. So as you will not
be able to access these namespaces for reuse anyway, its best to
disable it (in case of unexpected security issues).
[1] This is to work around an issue with mounting devpts without uid 0
mapped in the user namespace, where the outer namespace owns all the
other namespaces but the inner one has the right mappings.
These ignore source files not existing which allows bwrap using
applications to avoid repeatedly checking if files exist.
Closes: #283
Approved by: alexlarsson
When using namespaces, permit to leave some capabilities in the
sandbox. This can be helpful to run a system instance of systemd.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #101
Approved by: alexlarsson
In scenarios such as running bwrap in test frameworks (`bwrap make check`),
one wants all of the processes to go away if the parent process
dies, or if the bwrap process is directly killed.
This ensures that in all cases (both with `--unshare-pid` and without), we use
`prctl(PR_SET_PDEATHSIG)` on both our outer and inner init procesesses if
`--die-with-parent` is specified.
Tests-by: Colin Walters <walters@verbum.org>
Closes: #165
Approved by: emdej
This means we stay compatible with apps using the old bwrap, yet
still makes it easy to avoid CVE-2017-5226 in apps using bwrap.
Also, recommend that applications not using --new-session should
use a seccomp filter for the TIOCSTI ioctl to avoid the input
injection issue.
Closes: #154
Approved by: cgwalters
This makes `--unshare-uts` actually useful by allowing the user to
specify a custom hostname for the newly created UTS namespace.
Implements #93.
Closes: #94
Approved by: alexlarsson
Add an interface for retrieving information about the child process.
For now the only information exported is the child pid, it is needed to
manage prestart OCI hooks, as the container pid must be provided to the
hook process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it is useful to manage OCI prestart hooks, as the container process is
blocked on block_fd until the hooks are processed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
This flag will unshare cgroups only if supported else will skip it.
Closes: #62
Approved by: alexlarsson
This requires linux kernel version 4.6 or higher.
We check for the presence of /proc/self/ns/cgroup
to determine if it is supported or not.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Closes: #62
Approved by: alexlarsson
This is very useful if you want to cover some area of the filesystem,
or if you want to make some part of a read-only tree writable.
Closes: #42
Approved by: cgwalters