Poststart hooks exist in runc since 2015 [1], and since that time until
today, if a hook returned an error, runc kills the container.
In 2020, commit c1662686cf (PR #1008) added the following text
(which became part of runtime-spec release v1.0.2):
> 9. The `poststart` MUST be invoked by the runtime. If any
> `poststart` hook fails, the runtime MUST log a warning, but the
> remaining hooks and lifecycle continue as if the hook had succeeded.
Now, this text conflicted with the pre-existing runtime (runc) behavior,
and it still conflicts with the current runc behavior.
At this point, we can either fix runtimes or the spec.
To my mind, fixing the spec is a better approach, because:
- initial implementation predates the spec wording by a few years;
- the wording in the spec was never implemented (in runc);
- returning an error (and stopping the container) seems like a more
versatile approach, since a hook can usually choose whether to
return an error or not.
[1]: https://github.com/opencontainers/runc/pull/392
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Add `features.md` and `features-linux.md`, to formalize the `runc features` JSON that was introduced in runc v1.1.0.
A runtime caller MAY use this JSON to detect the features implemented by the runtime.
The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0/types/features/features.go
(opencontainers/runc PR 3296, opencontainers/runc PR 3310)
Differences since runc v1.1.0:
- Add `.linux.intelRdt.enabled` field
- Add `.linux.cgroup.rdma` field
- Add `.linux.seccomp.knownFlags` and `.linux.seccomp.supportedFlags` fields (Implemented in runc PR 3588)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
In PR #1008 3 new steps in the lifecycle were added and the referencing of the states to the corresponding lifecycle step was not adjusted. Corrected based on Tag v1.0.0-rc6.
Signed-off-by: Pascal Theml <pascal@familie-theml.de>
The latter is much more common:
$ git --no-pager grep -ic "process in the container" origin/master -- *.md
origin/master:runtime.md:1
$ git --no-pager grep -ic "container process" origin/master -- *.md
origin/master:config-linux.md:4
origin/master:config.md:2
origin/master:glossary.md:1
origin/master:runtime.md:5
Signed-off-by: W. Trevor King <wking@tremily.us>
The old wording was ambiguous. For example, if the configuration had
ociVersion set to 1.0.0 and the container was created with a 1.1.0
runtime, which version should show up in the state?
With this commit, we use the version which matches the state schema,
because the config/runtime versions used for creation don't seem
particularly important once the container has been created, while the
state schema version is important for state consumers. For example,
if new properties were added to the state spec between 1.0.0 and
1.1.0, a consumer would want to see 1.1.0 in the state's ociVersion so
it could decide whether it could rely on those new properties.
Signed-off-by: W. Trevor King <wking@tremily.us>
Step 3 of the lifecycle from before this commit had two sentences
which both landed in be594153 (Split create and start, 2016-04-01,
#384). I pushed back a bit on the entry then [1,2], but we seem to be
pretty comfortable with the current "keep all lifecyle entries in a
one-layer enumerated list" approach, so I'm leaving that alone in this
commit. Step 3 isn't really a lifecycle step though, it's more about
clarifying that you can jump around in the lifecycle instead of
hitting all the steps in consecutive order. I'd floated a new
paragraph addressing that jumping, but was unable to form a consensus
around wording, and the jumping is already somewhat covered by the
current list entries (e.g. "The container process exits."). This
commit just drops the old step 3, and Michael will follow up with
wording about jumping [3].
The other sentence from the old step 3 doesn't need replacing, because
the limits are already covered in more detail in the operation
sections themselves. For example, the 'delete' operation has:
Attempting to delete a container that does not exist MUST generate
an error. Attempting to delete a container whose process is still
running MUST generate an error.
I don't see the need to call generic attention to that idea, and
especially do not think that an entry in the lifecycle list is the
right place for such a generic call-out.
[1]: https://github.com/opencontainers/runtime-spec/pull/384#r60939710
[2]: https://github.com/opencontainers/runtime-spec/pull/384#issuecomment-214418730
[3]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-05-10-21.03.log.html#l-79
Signed-off-by: W. Trevor King <wking@tremily.us>
Based on IRC discussion today (times in PST) [1]:
11:36 < crosbymichael> just take a step back and think about it.
you have a process object in the spec. its a single object
defining what to run. How do you run a process? you exec its
args. From the spec pov its an atomic operation. in between
create and start its not running the users code and is left up to
the runtime. you either have a process defined by the spec and
its created as an operation in the container on start or your
dont.
With the previous wording, it was unclear how large a hole we were
poking with "the user-specified program MUST NOT be run at this time".
This commit removes that ambiguous wording and replaces it with an
explicit reference to 'process.args'. It makes it clear that
everything outside of 'process' MUST happen at create-time. And it
leaves all of 'process' except for 'process.args' up to the
implementation.
This means that the caller has no reliable way to set the
user/cwd/capabilities/… of the runtime's container process between
'create' and 'start'. You could avoid that limitation by requiring
all process properties *except* process.args be applied at
create-time, but my attempt to make process.args optional (which would
have allowed that interpretation without burdening callers who never
intended to call 'start') was rejected in favor of this all-or-nothing
approach to 'process' handling [2].
[1]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2017-02-27.log.html#t2017-02-27T19:35:35
[2]: https://github.com/opencontainers/runtime-spec/pull/620#issuecomment-282820279
Signed-off-by: W. Trevor King <wking@tremily.us>
These states are already defined in the "State" section. There's no
need to redefine them in the operation sections.
Operation-level redefinitions are dicy anyway, because they imply
something testable about the immediately-after-this-operation time,
and it's not possible to run race-free tests of that time (e.g. the
process could die for other reasons between the successful 'create'
call and the 'state' call you made to look for a 'created' status).
Signed-off-by: W. Trevor King <wking@tremily.us>
All of these operations should be supported on all of our
compliance-tested platforms. If there are cases where a given OS
cannot support one of these operations, it should be discussed in that
operation's section, and the "Unless otherwise stated" qualifier makes
space for that. But leaving the reader to decide on whether a host OS
supports a given operation seems too unstable for a specification.
I've also dropped the "OCI compliant" qualifier, because the (not)
compliant language in spec.md already ties OCI compliance to the RFC
2119 language. There's no point in mentioning "OCI compliant" outside
of that section, because the "MUST" already brings in all of the
compliance associations we need.
Signed-off-by: W. Trevor King <wking@tremily.us>
This wording landed without comment as part of 7117ede7 (Expand on the
definition of our ops, 2015-10-13, #225). However, I'm not entirely
clear on the exception it's making. It may be trying to say something
like:
Just because you were authorized to manage that container when you
created it doesn't mean you're still authorized to perform operation
X on it now. Maybe you've lost privileges in the meantime.
But as far as compliance testing is concerned, the same test harness
will be calling 'create' and the subsequent operations. That harness
will be reporting MUST violations if the runtime refuses a subsequent
operation, and removing the access-control loophole makes it more
obvious that the runtime's refusal is non-compliant.
Signed-off-by: W. Trevor King <wking@tremily.us>
I expect the lifecycle information was removed accidentally in
be594153 (Split create and start, 2016-04-01, #384), because for a
time it seemed like that PR would also be removing hooks. Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:
* Put the pre-start hooks after the 'start' call, but before the meat
of the start call (the container-process exec trigger). Folks who
want a post-create hook can add one with that name. I'd like to
have renamed poststop to post-delete to avoid confusion like [1].
But the motivation for keeping hooks was backwards compatibility [2]
so I've left the name alone.
* Put each "...command is invoked..." lifecycle entry in its own list
entry, to match the 'create' list entry.
* Move the rules about what happens on hook failure into the
lifecycle. This matches pre-split entries like:
If any prestart hook fails, then the container MUST be stopped and
the lifecycle continues at step 7.
and avoids respecifying that information in a second location
(config.md).
* I added the warning section to try and follow post-split's generic
"generates an error" approach while respecting the pre-split desire
to see what failed (we had "then an error including the exit code
and the stderr is returned to the caller" and "then an error is
logged").
* I left the state 'id' context out, since Michael didn't want it [3].
* Make runtime.md references to "generate an error" and "log a
warning" links, so readers have an easier time finding more detail
on that wording.
Where I reference a section, I'm still using the auto-generated anchor
for that header and not the anchors which were added in 41839d7 (Merge
pull request #707 from mrunalp/anchor_tags, 2017-03-03) and similar.
Mrunal suggested that the manually-added anchors were mainly intended
for the validation tooling [4].
[1]: https://github.com/opencontainers/runtime-spec/pull/395
Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: https://github.com/opencontainers/runtime-spec/pull/483#issuecomment-240568422
Subject: *: Remove hooks
[3]: https://github.com/opencontainers/runtime-spec/pull/532#discussion_r99232480
Subject: Restore hook language removed by create/start split
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2017-03-03.log.html#t2017-03-03T18:02:12
Signed-off-by: W. Trevor King <wking@tremily.us>
To distinguish between "we're still setting this container up" and
"we're finished setting up; you can call 'start' if you like".
Also reference the lifecycle steps, because you can't be too explicit
Signed-off-by: W. Trevor King <wking@tremily.us>
Because during creation (before 'created') we may not have a container
process yet (e.g. if we're still reading the configuration or setting
up cgroups), and in the 'stopped' phase the PID is no longer
meaningful.
Also add OPTIONAL/REQUIRED and remove colons for consistency with the
config.md.
Signed-off-by: W. Trevor King <wking@tremily.us>
proc(5) describes the following state entries in proc/[pid]/stat [1]
(for modern kernels):
* R Running
* S Sleeping in an interruptible wait
* D Waiting in uninterruptible disk sleep
* Z Zombie
* T Stopped (on a signal)
* t Tracing stop
* X Dead
and ps(1) has a bit more context [2] (for modern kernels):
* D uninterruptible sleep (usually IO)
* R running or runnable (on run queue)
* S interruptible sleep (waiting for an event to complete)
* T stopped by job control signal
* t stopped by debugger during the tracing
* X dead (should never be seen)
* Z defunct ("zombie") process, terminated but not reaped by its
parent
So I expect "stopped" to mean "process still exists but is paused,
e.g. by SIGSTOP". And I expect "exited" to mean "process has finished
and is either a zombie or dead".
After this commit, 'git grep -i stop' only turns up the "stopped"
state (which I've left alone for backwards compat), some poststop-hook
stuff, a reference in principles.md, a "stoppage" in LICENSE, and some
ChangeLog entries.
Also replace "container's process" with "container process" to match
usage in the rest of the repository. After this commit:
$ git grep -i "container process" | wc -l
20
$ git grep -i "container's process" | wc -l
1
Also reword status entries to avoid "running", which is less precise
in our spec (e.g. it also includes "sleeping", "waiting", ...).
Also removes a "them" leftover from a partial plural -> singular
reroll of be594153 (Split create and start, 2016-04-01, #384).
[1]: http://man7.org/linux/man-pages/man5/proc.5.html
[2]: http://man7.org/linux/man-pages/man1/ps.1.html
Signed-off-by: W. Trevor King <wking@tremily.us>
From 766abd6f (runtime.md: Require 'create' to fail if config.json
asks for the impossible, 2016-09-08, #559).
Signed-off-by: W. Trevor King <wking@tremily.us>
All of these sections are about configuration, and we don't usually
use "{Whatever} configuration" in the headers.
Signed-off-by: W. Trevor King <wking@tremily.us>
We don't want to silently ignore settings that we understand but
cannot implement [1] (we *do* want to ignore settings that we don't
understand [2], but that's a separate issue).
This raises a slightly sticky certification issue. If a runtime
*always* exits 'create' with an error:
func create() err {
return fmt.Errorf("nope, I cannot create that container either.")
}
it would be neither complaint nor non-compliant. It would not fail
any MUSTs, but availing itself of the "cannot create the maintainer"
option specified in this commit would mean the test suite could not
test the deeper requirements around the config properties themselves.
So with this change, making Microsoft certifiable will still need an
explicit weakening around root.path. The easiest way to do that might
be to have separate annotations for whether a setting is optional for
config authors and whether it's optional for runtime authors
(supported):
* **`readonly`** (bool, config:optional, support:optional) ...
But I'll leave hashing that out to a later commit. Regardless of the
certification impact, we want to be clear that silently ignoring known
parameters is wrong.
[1]: 9b8e21826c (r65400731)
Subject: [ Config | Root Config ] Clarify readonly
[2]: https://github.com/opencontainers/runtime-spec/pull/510
Subject: Add text about extensions
Signed-off-by: W. Trevor King <wking@tremily.us>
There's an outside change that these are intentional, since I pointed
one of these out earlier [1] and it wasn't fixed. But I haven't seen
" : " used intentionally outside of this project, and don't think we
want to break ground in that direction ;).
[1]: https://github.com/opencontainers/runtime-spec/pull/510#discussion_r77291554
Signed-off-by: W. Trevor King <wking@tremily.us>
We use both forms, but the latter was more popular. Before this
branch:
$ git grep -i 'container process' origin/master | wc -l
13
$ git grep -i 'main process' origin/master | wc -l
4
Also fix "process id" -> "process ID" in one of the lines I touched,
to match fork(2) [1].
[1]: http://man7.org/linux/man-pages/man2/fork.2.html
Signed-off-by: W. Trevor King <wking@tremily.us>
The indentation of the sub bullets for the status entries under State were not
indented by the 4 spaces required by markdown. They did not render well in
the pdf, at least with my native build with pandoc v1.13.2. Change them to 4
spaces, which now renders well.
Signed-off-by: Graham Whaley <graham.whaley@linux.intel.com>
Make it clear that if a runtime cannot set up an environment that
*precisely* matches the config.json provided, it must generate an error.
This is important because not doing this can cause security issues.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The shorter-than-normal (for the rest of this list) indent landed with
the line in be594153 (Split create and start, 2016-04-01, #384).
Signed-off-by: W. Trevor King <wking@tremily.us>