* Add minimum supported Go version to CI
On top of automatically testing against the two most recent releases (what Go upstream supports), also test explicitly against our lower bound.
As noted in the previous change, we don't have a `go.mod` to source this information from, so it's simply hard-coded in this file instead.
(I chose 1.21 as that was the lowest version we were testing against previously, but it's possible that it could go lower, or that it reasonably needs to go higher.)
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
* Add explicit `GOTOOLCHAIN=local` in CI
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
---------
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
The upside is, we don't have to bump the versions here every 6 months.
The downside is, CI might break once the new Go is out.
Overall I think it is net positive.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The current spec allows decimal 512 as a maximum value for FileMode,
which is octal 1000, meaning the sticky bit is set and there are no rwx
permissions for anyone (aka ---------T as shown by ls -l).
This does not make sense; the maximum value should be 511 (which is
octal 777, aka -rwxrwxrwx).
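The arithmetic can be checked with Go's octal literals. `validFileMode` below is a hypothetical sketch of the corrected bound, not code from specs-go.

```go
package main

import "fmt"

// maxFileMode is the largest permission-only value: 0o777 == 511 (rwxrwxrwx).
const maxFileMode uint32 = 0o777

// validFileMode is a hypothetical check for the corrected bound. Any value
// above 0o777 sets a bit outside the rwx permission bits
// (e.g. 0o1000 is the sticky bit on its own).
func validFileMode(m uint32) bool {
	return m <= maxFileMode
}

func main() {
	fmt.Println(0o777, 0o1000)      // 511 512
	fmt.Println(validFileMode(511)) // true
	fmt.Println(validFileMode(512)) // false
}
```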
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Originally, the file mode was indeed written in octal (see e.g.
commit 5273b3d), but it was later found that JSON does not allow
octal values, so the examples were changed to decimal in commit
ccf3a24. However, the "typically an octal value" wording (added by
commit cdcabde) remains.
Change it to emphasize the fact that this is in decimal.
Also, add a note to config-linux.md saying the same thing.
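One way to see why the examples must be decimal is to feed both forms to Go's `encoding/json`. The decimal value 420 decodes to `0o644`, while an octal-looking `0644` is rejected because JSON numbers may not have leading zeros. The `fileDevice` struct is a hypothetical stand-in, not the spec's Go type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileDevice is a hypothetical stand-in with a FileMode-like field.
type fileDevice struct {
	FileMode uint32 `json:"fileMode"`
}

// decodeMode unmarshals a JSON document and returns its fileMode value.
func decodeMode(doc string) (uint32, error) {
	var d fileDevice
	if err := json.Unmarshal([]byte(doc), &d); err != nil {
		return 0, err
	}
	return d.FileMode, nil
}

func main() {
	m, err := decodeMode(`{"fileMode": 420}`)
	fmt.Printf("%o %v\n", m, err) // 644 <nil>

	_, err = decodeMode(`{"fileMode": 0644}`)
	fmt.Println(err != nil) // true: leading zeros are not valid JSON
}
```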
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The history of this is a little complicated, but in short there is an
argument to be made that several misunderstandings resulted in the spec
sometimes implying (and runtimes interpreting) a pids.limit value of 0
to be equivalent to "max" or otherwise having unfortunate handling of
the value.
The slightly longer background is the following:
1. When commit 834fb5db52 ("spec: linux: add support for the PIDs
cgroup") added support, we did not yet have textual documentation of
cgroup configuration values. In addition, we had not yet started
using pointers to indicate optional fields and detect unset fields.
However, the initial commit did imply that pids.limit=0 should be
treated as a real value.
2. Commit 2ce2c866ff ("runtime: config: linux: add cgroups
information") labeled "pids.limit" as being a REQUIRED field. This
may seem trivial, but consider this foreshadowing for point 5.
3. Later, commit 9b19cd2fab ("config: linux: update description of
PidsLimit") was added to explicitly make pids.limit=0 equivalent to
max (at the time there was a kernel patch proposed to make setting
pids.max to 0 illegal, though it was never merged).
This is often pointed to as the reason runtimes interpret the value
this way; however...
4. Soon after, 488f174af9 ("Make optional Cgroup related config params
pointers along with `omitempty` json tag.") converted it to a pointer
and changed the code comment to state that the "default value" means
"no limit" -- and the default value was now a pointer so the default
value is nil not 0. At this stage, using 0 to mean "no limit" would
arguably no longer be correct.
5. However, because the field was marked as REQUIRED in point 2, a while
later commit ef9ce84cf9 ("specs-go/config: fix required items
type") changed the value back to a non-pointer but didn't modify the
code comment -- and so ended up codifying the "0 means no limit"
behaviour.
I would argue this commit is the reason why runtimes have interpreted
the behaviour this way (though runc likely did it because of point 3
since I authored both patches, and other runtimes probably looked at
runc to see how they should interpret this confusing history -- my
bad!).
So, let's finally have some clarity and add wording to conclusively
state that the correct representation of max is -1 (like every other
cgroup configuration value) and that users should not treat 0 as a
special value of any kind. A nil value means "do not touch it" (just
like every other cgroup configuration value too).
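These semantics map directly onto what a runtime writes to the cgroup `pids.max` file. The sketch below assumes the limit is an optional `*int64` (the pointer shape this change restores). `pidsMaxValue` is a hypothetical helper, not a runc or spec API.

```go
package main

import (
	"fmt"
	"strconv"
)

// pidsMaxValue converts an optional pids limit into the string to write to
// pids.max. The boolean reports whether anything should be written at all.
//   - nil: leave pids.max untouched
//   - -1:  "max" (no limit), like other cgroup values
//   - 0:   a literal limit of zero, not a special value
//
// This sketch does not validate other negative values.
func pidsMaxValue(limit *int64) (string, bool) {
	if limit == nil {
		return "", false
	}
	if *limit == -1 {
		return "max", true
	}
	return strconv.FormatInt(*limit, 10), true
}

func ptr(v int64) *int64 { return &v }

func main() {
	for _, l := range []*int64{nil, ptr(-1), ptr(0), ptr(100)} {
		v, ok := pidsMaxValue(l)
		fmt.Printf("%q %v\n", v, ok)
	}
}
```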
Note that a pids.max value of 0 is actually different to 1 now that
CLONE_INTO_CGROUP exists (at the time pids was added to the kernel and
the spec, this feature didn't exist and so it may have seemed redundant
to have two equivalent values -- hence my attempt to make 0 an illegal
value for the kernel implementation).
For the Go API, this is effectively a partial revert of commit
ef9ce84cf9 ("specs-go/config: fix required items type") which turned
the limit value into a bare int64.
Fixes: 2ce2c866ff ("runtime: config: linux: add cgroups information")
Fixes: 9b19cd2fab ("config: linux: update description of PidsLimit")
Fixes: ef9ce84cf9 ("specs-go/config: fix required items type")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The thinking is that the runtimes should not do the filtering of values,
but instead just apply the values in order. This way any MB lines in
l3CacheSchema will be overwritten by memBwSchema values (if the domains
overlap).
Note that we can't just concatenate the values, because the kernel will
return an error if the same domain is set more than once within a single
write() call.
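A sketch of the "apply in order, one write per value" approach follows. It assumes the runtime writes each non-empty schema string to the resctrl `schemata` file as a separate write. `applySchemas` and `recordingWriter` are hypothetical. In a real runtime the writer would be the opened schemata file, where each `Write` on an `*os.File` typically maps to a single write(2).

```go
package main

import (
	"fmt"
	"io"
)

// applySchemas writes each non-empty schema with its own Write call, in
// order. Later values (e.g. memBwSchema) then override overlapping MB
// domains from earlier ones (e.g. l3CacheSchema), instead of being
// concatenated into one write that the kernel rejects when a domain repeats.
func applySchemas(w io.Writer, schemas ...string) error {
	for _, s := range schemas {
		if s == "" {
			continue
		}
		if _, err := io.WriteString(w, s); err != nil {
			return err
		}
	}
	return nil
}

// recordingWriter captures each Write call separately, for demonstration.
type recordingWriter struct{ writes []string }

func (r *recordingWriter) Write(p []byte) (int, error) {
	r.writes = append(r.writes, string(p))
	return len(p), nil
}

func main() {
	l3CacheSchema := "L3:0=ff;1=ff\nMB:0=50;1=50\n"
	memBwSchema := "MB:0=70\n"
	rw := &recordingWriter{}
	if err := applySchemas(rw, l3CacheSchema, memBwSchema); err != nil {
		panic(err)
	}
	fmt.Println(len(rw.writes)) // 2
}
```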
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Nodes is required only in some memory policy modes, while other
modes require that no nodes be given.
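The per-mode rule can be sketched as a small validation table. The mode names are the Linux `MPOL_*` constants from set_mempolicy(2). The rules follow the kernel's behaviour: MPOL_DEFAULT and MPOL_LOCAL take an empty nodemask, an empty nodemask with MPOL_PREFERRED means local allocation, and all other modes need a non-empty one. Representing the mode and nodes as strings, and the `validateNodes` helper itself, are assumptions for illustration.

```go
package main

import "fmt"

type nodesRule int

const (
	nodesForbidden nodesRule = iota // nodemask must be empty
	nodesOptional                   // an empty nodemask has its own meaning
	nodesRequired                   // nodemask must be non-empty
)

// rules maps each memory policy mode to its nodemask requirement.
var rules = map[string]nodesRule{
	"MPOL_DEFAULT":             nodesForbidden,
	"MPOL_LOCAL":               nodesForbidden,
	"MPOL_PREFERRED":           nodesOptional, // empty means local allocation
	"MPOL_BIND":                nodesRequired,
	"MPOL_INTERLEAVE":          nodesRequired,
	"MPOL_PREFERRED_MANY":      nodesRequired,
	"MPOL_WEIGHTED_INTERLEAVE": nodesRequired,
}

// validateNodes checks whether nodes (e.g. "0-3") is allowed for mode.
func validateNodes(mode, nodes string) error {
	rule, ok := rules[mode]
	if !ok {
		return fmt.Errorf("unknown memory policy mode %q", mode)
	}
	switch {
	case rule == nodesForbidden && nodes != "":
		return fmt.Errorf("%s does not accept nodes", mode)
	case rule == nodesRequired && nodes == "":
		return fmt.Errorf("%s requires nodes", mode)
	}
	return nil
}

func main() {
	fmt.Println(validateNodes("MPOL_BIND", ""))      // MPOL_BIND requires nodes
	fmt.Println(validateNodes("MPOL_DEFAULT", "0"))  // MPOL_DEFAULT does not accept nodes
	fmt.Println(validateNodes("MPOL_PREFERRED", "")) // <nil>
}
```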
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Commit 34a39b9070 introduced the
"linux.intelRdt.enableMonitoring" field. This patch supplements it by
adding "linux.intelRdt.monitoring" field in the features.json to check
if the runtime implementation supports the new field of the spec.
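A caller could consult this before setting the new config field. The sketch below decodes only the `linux.intelRdt.monitoring` path. It assumes that field is a boolean and treats an absent field as "not supported". The struct and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// features models only the part of features.json needed here.
type features struct {
	Linux *struct {
		IntelRdt *struct {
			Monitoring *bool `json:"monitoring,omitempty"`
		} `json:"intelRdt,omitempty"`
	} `json:"linux,omitempty"`
}

// supportsRdtMonitoring reports whether linux.intelRdt.monitoring is true.
// A missing field at any level is treated as unsupported.
func supportsRdtMonitoring(doc []byte) (bool, error) {
	var f features
	if err := json.Unmarshal(doc, &f); err != nil {
		return false, err
	}
	if f.Linux == nil || f.Linux.IntelRdt == nil || f.Linux.IntelRdt.Monitoring == nil {
		return false, nil
	}
	return *f.Linux.IntelRdt.Monitoring, nil
}

func main() {
	ok, err := supportsRdtMonitoring([]byte(`{"linux":{"intelRdt":{"monitoring":true}}}`))
	fmt.Println(ok, err) // true <nil>
}
```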
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Specify "/" as an explicit value for linux.intelRdt.closID to assign a
container to the default CLOS, corresponding to the root of the resctrl
filesystem.
This addition is important after the recently introduced
intelRdt.enableMonitoring field: without it, there is no way to express
"enable monitoring but keep the container in the default CLOS". Users
would otherwise have to rely on pre-created CLOSes or may quickly exhaust
the available CLOS entries; in some configurations the number of
available CLOSes (on top of the default) may be as low as three.
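The sketch below shows how `"/"` affects the directories a runtime would use. It assumes the conventional resctrl mount point `/sys/fs/resctrl` and the kernel's `mon_groups` layout. `ctrlGroupDir` and `monGroupDir` are hypothetical helpers, using the container ID as the MON group name is an assumption for illustration, and handling of an unset closID is out of scope.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const resctrlRoot = "/sys/fs/resctrl" // conventional mount point (assumption)

// ctrlGroupDir returns the CTRL_MON group directory for closID.
// "/" selects the default CLOS, i.e. the root of the resctrl filesystem.
func ctrlGroupDir(closID string) string {
	if closID == "/" {
		return resctrlRoot
	}
	return filepath.Join(resctrlRoot, closID)
}

// monGroupDir returns a per-container MON group directory inside the
// chosen CTRL_MON group, as used when monitoring is enabled.
func monGroupDir(closID, containerID string) string {
	return filepath.Join(ctrlGroupDir(closID), "mon_groups", containerID)
}

func main() {
	fmt.Println(ctrlGroupDir("/"))           // /sys/fs/resctrl
	fmt.Println(monGroupDir("/", "ctr1"))    // /sys/fs/resctrl/mon_groups/ctr1
	fmt.Println(monGroupDir("gold", "ctr1")) // /sys/fs/resctrl/gold/mon_groups/ctr1
}
```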
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Accidentally left out of commit d2f4f9097a,
which added the "linux.intelRdt.schemata" field to the config.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Add a parameter for enabling per-container resctrl monitoring.
This supersedes and replaces the previous "enableCMT" and "enableMBM"
settings, whose functionality was only vaguely specified. A separate
parameter for every monitoring metric does not seem to make much sense, in
particular because in the resctrl filesystem it is not possible to
selectively enable a subset of the monitoring features. You always get
all the metrics that the system provides. Also, with separate settings
(and corresponding check if the specific metric is available) the user
cannot specify "enable whatever is available" - setting everything to
"true" might fail because one of the metrics is not available on the
platform. In addition, separate parameters are not future-proof: they
make support for new monitoring metrics unnecessarily cumbersome to add.
New metrics are certain to be added in new hardware generations, e.g.
perf/energy monitoring in the near future
(https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the
runtime-spec for each one of them feels like overkill without much
benefit. It is easier to have one switch for "enable container-specific
metrics" and let the user read whatever metrics the platform provides.
Moreover, it is not even possible to turn off monitoring (from the
resctrl filesystem). For example, you always get the metrics for all
CTRL_MON groups (closIDs). However, that is not always very useful, as
there are likely many applications packed into the same group. The new
intelRdt.enableMonitoring parameter enables the creation of a MON group
specific to a single container, allowing resctrl metrics to be monitored
at per-container granularity.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
* config-linux: add schemata field to IntelRdt
Add a new "schemata" field to the Linux IntelRdt configuration. This
addresses the complexity of separate schema fields and resolves the
issue of supporting currently uncovered RDT features like L2 cache
allocation and CDP (Code and Data Prioritization).
The new field specifies the complete schemata (all schemas) to be
written to the schemata file in the Linux resctrl filesystem. The aim is
simple usage and a simple runtime implementation (no parsing or
filtering of data, and no need to re-implement parsing or validation of
the Linux resctrl interface), and also support for all RDT features now
and in the future (e.g. schemas like L2, L2CODE, L2DATA, L3CODE and
L3DATA, and possibly L4 or something else in the future).
The behavior of existing fields is not changed, but the new schemata
field is required to be applied last.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
* Add linux.intelRdt.schemata to features.md
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
---------
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Enable setting a NUMA memory policy for the container. The new
linux.memoryPolicy object contains the inputs to the set_mempolicy(2)
syscall.
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
The proposed "netdevices" field provides a declarative way to
specify which host network devices should be moved into a container's
network namespace.
This approach is similar to the existing "devices" field used for block
devices, but uses a dictionary keyed by the interface name instead.
The proposed scheme is based on the kernel's existing representation of
network devices, `struct net_device`
(https://docs.kernel.org/networking/netdevices.html).
This proposal focuses solely on moving existing network devices into
the container namespace. It does not cover the complexities of
network configuration or network interface creation, emphasizing the
separation of device management and network configuration.
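A runtime consuming such a dictionary might first turn it into a deterministic list of moves. In the sketch below, the value type and its optional `name` field (for renaming the device inside the container) are assumptions, as are the helper names. The actual move into the network namespace (e.g. via netlink) is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// netDevice is a hypothetical value type; the optional "name" field for
// renaming the device inside the container is an assumption.
type netDevice struct {
	Name string `json:"name,omitempty"`
}

// movePlan returns "host -> container" pairs sorted by host interface name,
// defaulting the container-side name to the host-side name.
func movePlan(doc []byte) ([]string, error) {
	var devs map[string]netDevice
	if err := json.Unmarshal(doc, &devs); err != nil {
		return nil, err
	}
	hosts := make([]string, 0, len(devs))
	for h := range devs {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	plan := make([]string, 0, len(hosts))
	for _, h := range hosts {
		target := devs[h].Name
		if target == "" {
			target = h
		}
		plan = append(plan, h+" -> "+target)
	}
	return plan, nil
}

func main() {
	plan, err := movePlan([]byte(`{"eth1": {}, "eth2": {"name": "ctr0"}}`))
	fmt.Println(plan, err) // [eth1 -> eth1 eth2 -> ctr0] <nil>
}
```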
Signed-off-by: Antonio Ojea <aojea@google.com>