This implements a way to run a debug container with a provided image on
the node.
The container runs with a privileged profile, which allows issuing debugging
commands (e.g. using some advanced network tools) to troubleshoot a
machine.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add DisableAccessTime and Secure mount options for existing volumes.
DisableAccessTime adds noatime parameter to disable access time updates.
Secure adds nosuid and nodev parameters for security (defaults to true).
Add integration tests for both options.
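A minimal sketch of how the two options could map to mount parameters. The `MountOptions` struct and its `Params` method are hypothetical names used for illustration only and are not the actual Talos types; the sketch also leaves the "defaults to true" handling of Secure to the caller.

```go
package main

import (
	"fmt"
	"strings"
)

// MountOptions is a hypothetical stand-in for the volume mount settings;
// the real Talos configuration types may look different.
type MountOptions struct {
	DisableAccessTime bool
	Secure            bool
}

// Params returns the mount parameters implied by the options:
// noatime for DisableAccessTime, nosuid and nodev for Secure.
func (o MountOptions) Params() []string {
	var params []string

	if o.DisableAccessTime {
		params = append(params, "noatime")
	}

	if o.Secure {
		params = append(params, "nosuid", "nodev")
	}

	return params
}

func main() {
	opts := MountOptions{DisableAccessTime: true, Secure: true}

	fmt.Println(strings.Join(opts.Params(), ","))
}
```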
Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Unify a list of all APIs in Talos to a single place, and use them in
associated tests:
* the test for one2many specifics
* the test for deprecated methods
* the test for missing RBAC rules
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
These new APIs only support one2one proxying, so they don't have any
hacks, and look like regular gRPC APIs.
Old APIs are deprecated, but still supported.
Implement client-side multiplexing in `talosctl`, and provide a fallback to
the old APIs for legacy Talos versions.
New APIs include removing an image and importing an image.
Extracted from #12392
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Instead of defaulting to one2many, explicitly list the APIs which support
one2many.
The idea is that any new API will only be a "normal" gRPC API, so we can
flip the switch, and consider one2many APIs as "legacy".
Extracted from #12392
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #12649
The cryptic error was coming from our code, as it never worked if the
decoded node is not a mapping node.
Also annotate errors with line numbers (or document kinds) to make
the problem easier to understand, specifically for multi-doc and long
configs.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
It seems that etcd might derive them incorrectly on an IPv6-only system.
This change is confusing, as it sets the `--initial-` prefixed flag even
after join, but it seems that on the etcd side, the configuration value is
always used despite the flag name.
Fixes #12646 (see the issue for more details)
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Updated `ReadFromVolume` to open the filesystem it's attempting to
read from as read-only. This allows `vfat` cloud init volumes to be
successfully read by Talos Linux. This change was made here and not in
`pkg/xfs/fsopen/fsopen_linux.go` so that it only applies to volumes that
are being read for cloud init configuration, not all volumes.
Fixes https://github.com/siderolabs/talos/issues/12647.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
We unconditionally create the `BIOS` partition, even on arm64, so the
prefix should be the same on all arches.
We don't use `BIOS` on arm64, but this would still be easier to support
in the future.
Co-authored-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This commit changes the way KubeSpan gets the podCIDR to advertise when
`advertiseKubernetesNetworks` is enabled. Before, it used the interface
address, but some CNIs (such as Cilium in NativeRouting) only assign a
single /32 IP to a single interface (`cilium_host` in Cilium's case).
This adds the `v1.Node`'s `.spec.podCIDRs` array to the `k8s.NodeStatus`
object and uses this to advertise the Kubernetes network.
Signed-off-by: Florian Ströger <stroeger@youniqx.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Same as for any other resource - layering per source, and proper merge
across layers, so we can see where it comes from.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This commit introduces ProbeConfig, a new network configuration document type
that allows users to configure TCP connectivity probes to monitor network
endpoints.
Features:
- ProbeConfig document type with TCP probe support
- ProbeSpec and ProbeStatus resources for probe management
- ProbeConfigController to translate ProbeConfig into ProbeSpec
- ProbeController to execute probes and update ProbeStatus
- Configurable probe interval, timeout, and failure threshold
- Integration tests for API functionality
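The probing behavior listed above can be sketched roughly as follows. This is an illustration only, not the ProbeController implementation: `probeTCP` and `failureTracker` are hypothetical names, and the assumption that a target counts as down only after a number of consecutive failures equal to the threshold is mine.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probeTCP reports whether a TCP connection to addr can be established
// within the given timeout.
func probeTCP(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}

	conn.Close()

	return true
}

// failureTracker (hypothetical) treats the target as unhealthy only after
// Threshold consecutive failed probes.
type failureTracker struct {
	Threshold int
	failures  int
}

// Observe records one probe result and returns true while the target
// is still considered healthy.
func (t *failureTracker) Observe(success bool) bool {
	if success {
		t.failures = 0

		return true
	}

	t.failures++

	return t.failures < t.Threshold
}

func main() {
	// Listen on an ephemeral local port so the example needs no external host.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)

		return
	}
	defer ln.Close()

	tracker := &failureTracker{Threshold: 3}

	for i := 0; i < 3; i++ {
		ok := probeTCP(ln.Addr().String(), time.Second)
		fmt.Println("probe ok:", ok, "healthy:", tracker.Observe(ok))
		time.Sleep(100 * time.Millisecond) // probe interval
	}
}
```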
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
If SMBIOS does not report memory information, fall back to
/proc/meminfo and expose a dummy memory module as a best-effort
approximation.
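A rough sketch of the fallback parsing step, assuming the total is taken from the `MemTotal` line of /proc/meminfo (the commit does not say which field is used). `memTotalBytes` is a hypothetical helper, and the sample text stands in for the real file.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memTotalBytes extracts the MemTotal value from /proc/meminfo-formatted
// text and converts it from KiB (reported as "kB") to bytes.
func memTotalBytes(meminfo string) (uint64, error) {
	scanner := bufio.NewScanner(strings.NewReader(meminfo))

	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())

		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kib, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}

			return kib * 1024, nil
		}
	}

	return 0, fmt.Errorf("MemTotal not found")
}

func main() {
	// Sample content; a real caller would read /proc/meminfo instead.
	sample := "MemTotal:       16384 kB\nMemFree:         8192 kB\n"

	total, err := memTotalBytes(sample)
	fmt.Println(total, err)
}
```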
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Simplify the flow a bit by using live partition info,
avoid doing some calculations which are already done in the
partition code.
Remove some steps I believe we don't need to do.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add support for negative max size values in volume configuration.
Negative max size represents the amount of space to be left free on the device, rather than the size the volume should consume.
For example, a max size of "-10GiB" means the volume can grow to the device size minus 10GiB.
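The arithmetic can be sketched as below. `effectiveMaxSize` is a hypothetical helper, and two details are assumptions rather than documented behavior: zero is treated as "no explicit limit", and results are clamped to the range between zero and the device size.

```go
package main

import "fmt"

const GiB = 1 << 30

// effectiveMaxSize returns the largest size the volume may grow to.
// A positive maxSize is an absolute cap, a negative maxSize is the amount
// of space to leave free on the device, and zero is assumed to mean
// "no explicit limit".
func effectiveMaxSize(deviceSize uint64, maxSize int64) uint64 {
	switch {
	case maxSize == 0:
		return deviceSize
	case maxSize > 0:
		if uint64(maxSize) > deviceSize {
			return deviceSize
		}

		return uint64(maxSize)
	default:
		reserve := uint64(-maxSize)
		if reserve >= deviceSize {
			return 0
		}

		return deviceSize - reserve
	}
}

func main() {
	// A 100GiB device with max size "-10GiB" can grow to 90GiB.
	fmt.Println(effectiveMaxSize(100*GiB, -10*GiB) / GiB) // prints 90
}
```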
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
For some historical reason (I guess treating an empty string as
'none'), this value doesn't use the standard enumer methods.
So we shipped it in Talos 1.12 without proper encoding/decoding
in YAML config documents (it was actually converted to an int).
Fix the encoding, but keep backwards compatibility for integer values
just in case someone already started relying on it.
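One way to keep such backwards compatibility is to accept both the textual name and the legacy integer when decoding. The sketch below uses a made-up `Level` enum with made-up names; it does not reflect the actual Talos type or its values.

```go
package main

import (
	"fmt"
	"strconv"
)

// Level is a hypothetical enum used only to illustrate the decoding rule.
type Level int

const (
	LevelNone Level = iota
	LevelLow
	LevelHigh
)

var levelNames = []string{"none", "low", "high"}

// String returns the textual name, which is what gets encoded.
func (l Level) String() string {
	if l < 0 || int(l) >= len(levelNames) {
		return "Level(" + strconv.Itoa(int(l)) + ")"
	}

	return levelNames[l]
}

// parseLevel accepts the textual name first and falls back to the legacy
// integer representation for configs written before the fix.
func parseLevel(s string) (Level, error) {
	for i, name := range levelNames {
		if s == name {
			return Level(i), nil
		}
	}

	if n, err := strconv.Atoi(s); err == nil && n >= 0 && n < len(levelNames) {
		return Level(n), nil
	}

	return 0, fmt.Errorf("invalid level %q", s)
}

func main() {
	for _, in := range []string{"high", "2", "bogus"} {
		l, err := parseLevel(in)
		fmt.Println(in, l, err)
	}
}
```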
Fixes #12625
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
I got a failure where the dual-boot image refuses to format the EPHEMERAL
partition where the `EFI` partition used to be (VFAT).
So until we have a resolution, do this workaround.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This should reduce false triggers due to high IO activity and similar
events increasing global memory PSI despite free memory being available.
Also add more details for trigger condition and debugging.
Fixes: #12526
Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Open the blockdevice in `O_EXCL` mode when wiping to ensure that we
don't wipe a mounted device.
This issue was discovered via #12620, where we wiped a blockdevice which
was still mounted, ending up in a wrong state.
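As an illustration of the idea (not the actual Talos wipe code): on Linux, opening a block device with `O_EXCL` and without `O_CREAT` fails with `EBUSY` if the device is in use, for example mounted. `openExclusive` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
)

// openExclusive opens a block device for writing with O_EXCL. On Linux the
// open fails with EBUSY if the device is in use by the system (e.g. mounted).
func openExclusive(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_WRONLY|os.O_EXCL, 0)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass a block device path as the first argument")

		return
	}

	f, err := openExclusive(os.Args[1])
	if err != nil {
		fmt.Println("refusing to wipe:", err)

		return
	}
	defer f.Close()

	fmt.Println("device opened exclusively, wiping could proceed")
}
```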
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This reports image pull progress in the console for images pulled by
Talos:
* etcd
* kubelet
* installer
This work was mostly done by @laurazard, I just wrapped it for the
console with Laura's help. (see #12932)
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add Talos details to the Hetzner Cloud client user-agent.
Helps us identify and troubleshoot issues with users running Talos on Hetzner Cloud.
Signed-off-by: Jonas Lammler <jonas.lammler@hetzner-cloud.de>
Signed-off-by: Noel Georgi <git@frezbo.dev>
Changing `.cluster.controlPlane.endpoint=$NEW` will cause old tokens to be no longer valid.
We want to ensure that new tokens are issued using the `.cluster.controlPlane.endpoint=$NEW` value,
but all the existing tokens (issued using `.cluster.controlPlane.endpoint=$OLD`) are still accepted.
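kube-apiserver accepts `--service-account-issuer` multiple times: the first value is used to issue tokens and all values are accepted for validation. A minimal sketch of building such an argument list follows; `issuerArgs` and the example URLs are hypothetical and this is not the code from this change.

```go
package main

import "fmt"

// issuerArgs puts the current issuer first (used for newly issued tokens)
// and appends previous issuers so tokens issued with them are still accepted.
func issuerArgs(current string, previous ...string) []string {
	args := []string{"--service-account-issuer=" + current}

	for _, p := range previous {
		if p != current {
			args = append(args, "--service-account-issuer="+p)
		}
	}

	return args
}

func main() {
	// Example endpoints standing in for $NEW and $OLD.
	for _, arg := range issuerArgs("https://new.example.org:6443", "https://old.example.org:6443") {
		fmt.Println(arg)
	}
}
```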
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
In the API server, passing extra args with `service-account-issuer` will add them to the default value.
Fixes #11694
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
BREAKING: internal resources for the components use a different
representation of ExtraArgs, resulting in modified types in protocol
buffers.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek.98@gmail.com>
This adds support for VLAN interfaces in OpenStack network_data.json.
VLANs are configured with type "vlan" and reference a parent link via
the "vlan_link" field. The VLAN ID is specified in the "vlan_id" field.
Example network_data.json entry:
{
    "type": "vlan",
    "vlan_link": "tap7819ff08-20",
    "vlan_id": 100
}
This enables Talos to automatically configure VLAN interfaces when
booting on OpenStack/Ironic bare metal with VLAN-based network topology.
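For illustration, a link entry like the one above could be decoded as follows. The `link` struct and `parseLink` helper are hypothetical and cover only the three fields mentioned here, not the full network_data.json schema or the Talos platform code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// link holds only the VLAN-related fields of a network_data.json link entry.
type link struct {
	Type     string `json:"type"`
	VLANLink string `json:"vlan_link"`
	VLANID   int    `json:"vlan_id"`
}

// parseLink decodes a single link entry from JSON text.
func parseLink(data string) (link, error) {
	var l link

	err := json.Unmarshal([]byte(data), &l)

	return l, err
}

func main() {
	l, err := parseLink(`{"type": "vlan", "vlan_link": "tap7819ff08-20", "vlan_id": 100}`)
	if err != nil {
		fmt.Println("decode failed:", err)

		return
	}

	if l.Type == "vlan" {
		fmt.Printf("VLAN %d on parent link %s\n", l.VLANID, l.VLANLink)
	}
}
```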
Signed-off-by: Max Makarov <maxpain@linux.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #12491
In (almost) all places we previously used `FastWipe`, use instead a
helper which will try to discover filesystem/partition signatures, and
wipe them.
This fixes the issue where, after a partition is re-created in the same
place, the "old" filesystem might still be discovered there.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Migrate KubeSpan configuration to support multi-document format.
Add version-aware support for talosctl cluster create and gen config.
Uses multi-doc format for Talos 1.13+, legacy format for 1.12 and earlier.
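The version switch can be sketched as below, assuming versions are given as strings like `v1.13.0`. `useMultiDoc` is a hypothetical helper, not the function used by `talosctl`.

```go
package main

import "fmt"

// useMultiDoc reports whether the multi-document config format should be
// generated: Talos 1.13+ uses multi-doc, 1.12 and earlier use the legacy format.
func useMultiDoc(version string) (bool, error) {
	var major, minor int

	// Only the major and minor components matter; any suffix is ignored.
	if _, err := fmt.Sscanf(version, "v%d.%d", &major, &minor); err != nil {
		return false, err
	}

	return major > 1 || (major == 1 && minor >= 13), nil
}

func main() {
	for _, v := range []string{"v1.12.4", "v1.13.0-alpha.0"} {
		multi, err := useMultiDoc(v)
		fmt.Println(v, multi, err)
	}
}
```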
Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
It's not recommended to use a DNS name for the Wireguard endpoint, as
the in-kernel Wireguard endpoint relies only on IP addresses, so either way
DNS resolution will happen outside of any Wireguard networking operations.
Previously, the resolution would happen at the moment the Wireguard config is
applied to the Linux kernel, but a SideroLink reconnect would not trigger
Wireguard reconfiguration as there is no change to the spec if the
hostname is used (even if it resolves to a different IP now).
With this change, on each SideroLink reconnect attempt
the name will be resolved to an IP address, so the Wireguard config
would actually trigger a change/reconfiguration if the DNS name
resolves to a new IP now.
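A rough sketch of the resolve-before-apply idea, under the assumption that the endpoint is a `host:port` string. `resolveEndpoint` and `lookupFunc` are hypothetical names; the lookup is injected so it can be swapped for `net.LookupHost` or a stub.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc resolves a hostname to a list of IP address strings.
type lookupFunc func(host string) ([]string, error)

// resolveEndpoint replaces a DNS name in host:port with a resolved IP, so
// that a change in DNS shows up as a change in the resulting endpoint.
func resolveEndpoint(endpoint string, lookup lookupFunc) (string, error) {
	host, port, err := net.SplitHostPort(endpoint)
	if err != nil {
		return "", err
	}

	// Literal IP addresses need no resolution.
	if net.ParseIP(host) != nil {
		return endpoint, nil
	}

	addrs, err := lookup(host)
	if err != nil {
		return "", err
	}

	if len(addrs) == 0 {
		return "", fmt.Errorf("no addresses for %q", host)
	}

	return net.JoinHostPort(addrs[0], port), nil
}

func main() {
	// net.LookupHost has the lookupFunc signature and performs a real lookup.
	ep, err := resolveEndpoint("localhost:51821", net.LookupHost)
	fmt.Println(ep, err)
}
```

Calling this on every reconnect attempt and comparing the result with the previously applied endpoint is one way to detect that the name now points at a different IP.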
Co-authored-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add the following flags to the `upgrade-k8s` command:
* `--force-conflicts` overwrite the fields when applying even if the field manager differs
* `--inventory-policy` string kubernetes SSA inventory policy (one of 'MustMatch', 'AdoptIfNoInventory' or 'AdoptAll') (default "AdoptIfNoInventory")
* `--no-prune` whether pruning of previously applied objects should happen after apply
* `--prune-timeout` int how long to wait for resources to be pruned in seconds (set to zero to disable waiting for resources to be fully deleted) (default 180)
* `--reconcile-timeout` int how long to wait for resources to be fully reconciled in seconds (set to zero to disable waiting for resources to be fully reconciled) (default 180)
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
Call `FillDefaults` on platform-acquired bond config.
As the platform config controller might have a cached (saved) representation
of the bond config, we need to ensure we adapt it to the latest bond
configuration.
In particular, new fields introduced in v1.12 require some values to be
set, which `.FillDefaults()` does for us.
Otherwise, Talos enters a loop trying to reconfigure the bond.
Prove with a unit test (it fails if `.FillDefaults()` is removed).
Fixes #12561
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use a literal IP address instead of `localhost` to make `kube-apiserver`
connect to the etcd member instead of relying on IPv4/IPv6 resolving of
`localhost`.
Simplify configuration for listening on 127.0.0.1 only, generate cert
SANs unconditionally for etcd loopback IPs.
Fixes #12542
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
If system services including kubelet/CRI start using swap, it might lead
to extreme performance degradation.
Disable swap for all system services except for dashboard (which is not
critical).
```
NAME SwapCurrent SwapPeak SwapHigh SwapMax ZswapCurrent ZswapMax ZswapWriteback
. unset unset unset unset unset unset 1
├──init 0 B 0 B max 0 B 0 B max 1
├──podruntime 0 B 0 B max max 0 B max 1
│ ├──etcd 0 B 0 B max 0 B 0 B max 1
│ ├──kubelet 0 B 0 B max 0 B 0 B max 1
│ └──runtime 0 B 0 B max 0 B 0 B max 1
└──system 0 B 0 B max max 0 B max 1
├──apid 0 B 0 B max 0 B 0 B max 1
├──dashboard 0 B 0 B max max 0 B max 1
├──runtime 0 B 0 B max 0 B 0 B max 1
├──trustd 0 B 0 B max 0 B 0 B max 1
```
Refactor the etcd cgroup to use the same common pattern while keeping the same
settings (but limit swap).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Overlay installers assume the `/boot/EFI` path, so we generate assets into `/boot/EFI`, then move that directory to the mountPrefix+/EFI.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Since the bootloader interface got refactored to support rootless, `ExtraInstallStep` needs to be handled in `GenerateAssets`.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Boards were deprecated in favor of overlays starting with Talos 1.7.
Now completely remove all board-specific code.
Part of: #12492
Signed-off-by: Noel Georgi <git@frezbo.dev>
* add SSA via the new go-kubernetes library implementation to the talosctl `upgrade-k8s` command
* add SSA via a direct ResourceInterface call into Talos (machined) with a manual inventory update
* add an integration test for SSA functionality
Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When building for the release, as the release hasn't been finalized yet,
the test might fail.
```
run.go:146: Running "/home/runner/_work/talos/talos/_out/talosctl-linux-amd64 --talosconfig /tmp/e2e/docker/talosconfig image talos-bundle v1.13.0-alpha.0"
run.go:210:
Error Trace: /src/internal/integration/base/run.go:210
/src/internal/integration/base/cli.go:107
/src/internal/integration/cli/image.go:142
/go/src/runtime/asm_amd64.s:1693
Error: Received unexpected error:
exit status 1
Test: TestIntegration/cli.ImageSuite/TestSourceBundle
Messages: command failed, stdout: "", stderr: "error fetching official extensions for v1.13.0-alpha.0: HEAD https://ghcr.io/v2/ghcr.io/siderolabs/extensions/manifests/v1.13.0-alpha.0: unexpected status code 404 Not Found (HEAD responses have no body, use GET for details)\n"
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>