mirror of https://github.com/siderolabs/talos.git synced 2026-02-05 15:45:37 +01:00
2695 Commits

Author SHA1 Message Date
renovate[bot]
d85a260cfd chore: update dependencies
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-02-05 10:12:52 +00:00
Laura Brehm
d43a01ccbd feat: implement talosctl debug
This implements a way to run a debug container with a provided image on
the node.

The container runs with a privileged profile, which allows issuing
debugging commands (e.g. using some advanced network tools) to
troubleshoot a machine.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-04 21:26:09 +04:00
Pranav Patil
34a31c9797 feat: add mount options support for existing volumes
Add DisableAccessTime and Secure mount options for existing volumes.
DisableAccessTime adds noatime parameter to disable access time updates.
Secure adds nosuid and nodev parameters for security (defaults to true).
Add integration tests for both options.

Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
2026-02-04 09:13:05 +01:00
Fritz Schaal
1bf95eed18 feat: improve dashboard uptime display
* display dashboard uptime in days when >= 24h

Signed-off-by: Fritz Schaal <fritz.schaal@siderolabs.com>
2026-02-03 21:52:11 +04:00
Andrey Smirnov
9f2dd6312f refactor: api tests
Unify a list of all APIs in Talos to a single place, and use them in
associated tests:

* the test for one2many specifics
* the test for deprecated methods
* the test for missing RBAC rules

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-02 19:17:27 +04:00
Andrey Smirnov
8b245b8f26 feat: implement new image service APIs
These new APIs only support one2one proxying, so they don't have any
hacks, and look as regular gRPC APIs.

Old APIs are deprecated, but still supported.

Implement client-side multiplexing in `talosctl`, provide fallback to
old APIs for legacy Talos versions.

New APIs include removing an image, importing an image.

Extracted from #12392

Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-02 15:55:56 +04:00
Andrey Smirnov
2165280d0e refactor: change the way one2many proxying is picked
Instead of defaulting to one2many, list explicitly one2many supported
APIs.

The idea is that any new API will only be "normal" gRPC API, so we can
flip the switch, and consider one2many APIs as "legacy".

Extracted from #12392

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-29 16:45:02 +04:00
Andrey Smirnov
6aa9b0677e fix: skip empty documents on config decoding
Fixes #12649

The cryptic error was coming from our code, as it never worked if the
decoded node is not a mapping node.

Also annotate errors with line numbers (or document kinds) to make
understanding the problem better, specifically for multi-doc and long
configs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-23 21:11:50 +04:00
Andrey Smirnov
494492489b fix: always set advertised peer URLs
It seems that etcd might derive them incorrectly on IPv6-only system.

This change is confusing, as it sets the `--initial-` prefixed flag even
after join, but it seems that on etcd side, the configuration value is
used always despite the flag name.

Fixes #12646 (see the issue for more details)

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-23 18:21:28 +04:00
Matthew Sanabria
782cc507dc fix: open the filesystem as read-only
Updated `ReadFromVolume` to open the filesystem it's attempting to
read from as read-only. This allows `vfat` cloud init volumes to be
successfully read by Talos Linux. This change was made here and not in
`pkg/xfs/fsopen/fsopen_linux.go` so that it only applies to volumes that
are being read for cloud init configuration, not all volumes.

Fixes https://github.com/siderolabs/talos/issues/12647.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-23 13:42:45 +01:00
Andrey Smirnov
28e61a740a fix: set GRUB prefix correctly on arm64
We always unconditionally create `BIOS` partition, even on arm64, so the
prefix should be same on all arches.

We don't use `BIOS` on arm64, but still this would be easier to support
in the future.

Co-authored-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-23 13:40:07 +04:00
Florian Ströger
562920701e fix: use node podCIDRs for kubespan advertiseKubernetesNetworks
This commit changes the way kubespan gets the podCIDR to advertise when
`advertiseKubernetesNetworks` is enabled. Before, it used the interface
address, but some CNIs (such as Cilium in NativeRouting) only set a
single /32 IP to a single interface (`cilium_host` in cilium's case).
This adds the `v1.Node`'s `.spec.podCIDRs` array to the `k8s.NodeStatus`
object and uses this to advertise the kubernetes network.

Signed-off-by: Florian Ströger <stroeger@youniqx.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-22 19:43:21 +04:00
Andrey Smirnov
39460365c1 feat: implement layering for ProbeSpec
Same as for any other resource - layering per source, and proper merge
across layers, so we can see where it comes from.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-22 17:49:45 +04:00
Mickaël Canévet
b5c760f707 feat: add ProbeConfig for network connectivity probes
This commit introduces ProbeConfig, a new network configuration document type
that allows users to configure TCP connectivity probes to monitor network
endpoints.

Features:
- ProbeConfig document type with TCP probe support
- ProbeSpec and ProbeStatus resources for probe management
- ProbeConfigController to translate ProbeConfig into ProbeSpec
- ProbeController to execute probes and update ProbeStatus
- Configurable probe interval, timeout, and failure threshold
- Integration tests for API functionality

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-21 22:17:38 +04:00
Mateusz Urbanek
4172095125 fix: fallback to /proc/meminfo for memory modules
If SMBIOS does not report memory information, fall back to
/proc/meminfo and expose a dummy memory module as a best-effort
approximation.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-21 15:50:16 +01:00
Andrey Smirnov
ddd6b186eb refactor: generate GRUB images
Simplify the flow a bit by using live partition info,
avoid doing some calculations which are already done in the
partition code.

Remove some steps I believe we don't need to do.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-21 16:37:25 +04:00
Andrey Smirnov
c7aa266ea5 fix: overwrite resolver config with machine config
Fixes #12614

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-21 16:14:36 +04:00
Mateusz Urbanek
8c7b8f5b7d feat: add support for negative max size
Add support for negative max size values in volume configuration.
Negative max size represents the amount of space to be left free on the device, rather than the size the volume should consume.
For example, a max size of "-10GiB" means the volume can grow to the device size minus 10GiB.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-21 12:11:31 +01:00
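
The arithmetic described in 8c7b8f5b7d can be sketched as follows. The function `effectiveMaxSize`, its signature, and the treatment of zero as "no limit" are assumptions for illustration only; the sketch just shows that a negative value is subtracted from the device size.

```go
package main

import (
	"errors"
	"fmt"
)

const GiB = int64(1) << 30

// effectiveMaxSize is a hypothetical helper. A positive maxSize is an
// absolute cap, a negative maxSize is the amount of space to leave free,
// and zero is treated here as "no limit" (an assumption of this sketch).
func effectiveMaxSize(deviceSize, maxSize int64) (int64, error) {
	switch {
	case maxSize == 0:
		return deviceSize, nil
	case maxSize > 0:
		if maxSize > deviceSize {
			return deviceSize, nil
		}

		return maxSize, nil
	case deviceSize+maxSize <= 0:
		return 0, errors.New("reserved free space is not smaller than the device")
	default:
		// maxSize is negative: device size minus the reserved space.
		return deviceSize + maxSize, nil
	}
}

func main() {
	size, err := effectiveMaxSize(100*GiB, -10*GiB)
	if err != nil {
		fmt.Println("error:", err)

		return
	}

	fmt.Printf("volume may grow to %d GiB\n", size/GiB)
}
```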
Andrey Smirnov
77bc3d21fa fix: marshal of FailOverMac property
This value for some historical reason (I guess treating empty string as
'none') doesn't use standard enumer's methods.

So we shipped it in Talos 1.12 without proper encoding/decoding
in YAML config documents (it was actually converted to int).

Fix encoding, but keep backwards compatibility for integer values
just in case someone already started relying on it.

Fixes #12625

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-21 14:55:54 +04:00
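
The compatibility approach from 77bc3d21fa (encode as a name, but still accept the integer form that was shipped earlier) can be sketched as below. The type, constant names, and the `none`/`active`/`follow` spellings are assumptions for this example and are not taken from the Talos source.

```go
package main

import (
	"fmt"
	"strconv"
)

// FailOverMAC is a hypothetical enum used only for this sketch.
type FailOverMAC int

const (
	FailOverMACNone FailOverMAC = iota
	FailOverMACActive
	FailOverMACFollow
)

var failOverMACNames = []string{"none", "active", "follow"}

// MarshalText always emits the symbolic name.
func (f FailOverMAC) MarshalText() ([]byte, error) {
	if f < 0 || int(f) >= len(failOverMACNames) {
		return nil, fmt.Errorf("invalid FailOverMAC value %d", int(f))
	}

	return []byte(failOverMACNames[f]), nil
}

// UnmarshalText accepts the symbolic name, and falls back to the legacy
// integer representation for backwards compatibility.
func (f *FailOverMAC) UnmarshalText(text []byte) error {
	s := string(text)

	for i, name := range failOverMACNames {
		if s == name {
			*f = FailOverMAC(i)

			return nil
		}
	}

	n, err := strconv.Atoi(s)
	if err != nil || n < 0 || n >= len(failOverMACNames) {
		return fmt.Errorf("invalid FailOverMAC value %q", s)
	}

	*f = FailOverMAC(n)

	return nil
}

func main() {
	var mode FailOverMAC

	for _, in := range []string{"active", "2"} {
		if err := mode.UnmarshalText([]byte(in)); err != nil {
			fmt.Println("error:", err)

			continue
		}

		out, _ := mode.MarshalText()
		fmt.Printf("%q -> %s\n", in, out)
	}
}
```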
Andrey Smirnov
3d1301640d fix: wipe the first/last 1MiB in addition to wiping by signatures
I got a failure when dual-boot image refuses to format EPHEMERAL
partition where `EFI` partition used to be (VFAT).

So until we have a resolution, do this workaround.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-20 21:25:54 +04:00
Dmitrii Sharshakov
1aa6528adc fix: make OOM controller more precise by considering separate cgroup PSI
This should reduce false triggers due to high IO activity and similar
events increasing global memory PSI despite free memory being available.

Also add more details for trigger condition and debugging.

Fixes: #12526

Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2026-01-20 16:19:34 +01:00
Andrey Smirnov
f7072c050e fix: check if the device is not mounted when wiping
Open the blockdevice in `O_EXCL` mode when wiping to ensure that we
don't wipe a mounted device.

This issue was discovered via #12620, when we wipe a blockdevice which
is still mounted ending up in a wrong state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-20 17:07:19 +04:00
Andrey Smirnov
743c3b94b9 fix: use correct containerd import path
Use `/v2` import path, otherwise we pull in `containerd` v1.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-20 15:48:54 +04:00
Andrey Smirnov
f2dd08594e feat: report image pull progress in the console
This reports image pull progress in the console for images pulled by
Talos:

* etcd
* kubelet
* installer

This work was mostly done by @laurazard, I just wrapped it for the
console with Laura's help. (see #12932)

Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-19 18:00:13 +04:00
Jonas Lammler
d4ed13d939 fix: add talos version to Hetzner Cloud client user agent
Add Talos details to the Hetzner Cloud client user-agent.

Helps us identify and troubleshoot issues with users running Talos on Hetzner Cloud.

Signed-off-by: Jonas Lammler <jonas.lammler@hetzner-cloud.de>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-01-19 13:04:17 +05:30
Mateusz Urbanek
01a3678913 fix: use append instead of prepend in service-account-issuer
Changing `.cluster.controlPlane.endpoint=$NEW` will cause old tokens to be no longer valid.
We want to ensure that new tokens are issued using the `.cluster.controlPlane.endpoint=$NEW` value,
but all the existing tokens (issued using `.cluster.controlPlane.endpoint=$OLD`) are still accepted.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-16 12:17:23 +01:00
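
The ordering matters in 01a3678913 because kube-apiserver uses the first `--service-account-issuer` value to generate tokens and accepts tokens from every listed issuer. A minimal sketch of the append behaviour follows; `issuerArgs` and the example URLs are hypothetical and only illustrate the ordering.

```go
package main

import "fmt"

// issuerArgs is a hypothetical helper: the issuer derived from the current
// control plane endpoint comes first (used for signing new tokens), and
// any user-supplied issuers are appended after it (still accepted).
func issuerArgs(endpointIssuer string, extra []string) []string {
	issuers := append([]string{endpointIssuer}, extra...)
	args := make([]string, 0, len(issuers))

	for _, issuer := range issuers {
		args = append(args, "--service-account-issuer="+issuer)
	}

	return args
}

func main() {
	args := issuerArgs(
		"https://new.example.com:6443",
		[]string{"https://old.example.com:6443"},
	)

	for _, arg := range args {
		fmt.Println(arg)
	}
}
```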
Mateusz Urbanek
d1954278a1 feat: add extraArgs from service-account-issuer
In API Server, passing extra args with `service-account-issuer` will add them to default value.

Fixes #11694

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-16 11:21:00 +01:00
Mateusz Urbanek
91b88f7f99 feat: support multiple values for extraArgs
BREAKING: internal resources for the components use different
representation of ExtraArgs, resulting in modified types in protocol
buffers.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek.98@gmail.com>
2026-01-16 11:20:59 +01:00
Andrey Smirnov
96e604874b fix: add hostname to endpoints
Populate endpoint coming from the Kubernetes controlplane endpoint with
the hostname (if the endpoint is a hostname).

This should improve cases when hostname is used for the endpoint in
terms of SNI, proper resolving of DNS if it's dynamic.

See https://github.com/siderolabs/talos/pull/12556#issuecomment-3755862314

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-15 22:56:46 +04:00
Max Makarov
34f09a3004 feat: add VLAN support to OpenStack platform
This adds support for VLAN interfaces in OpenStack network_data.json.
VLANs are configured with type "vlan" and reference a parent link via
"vlan_link" field. The VLAN ID is specified in "vlan_id" field.

Example network_data.json entry:
{
    "type": "vlan",
    "vlan_link": "tap7819ff08-20",
    "vlan_id": 100
}

This enables Talos to automatically configure VLAN interfaces when
booting on OpenStack/Ironic bare metal with VLAN-based network topology.

Signed-off-by: Max Makarov <maxpain@linux.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-15 14:48:52 +04:00
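
To show how the `network_data.json` entry from 34f09a3004 might be decoded, here is a minimal sketch. The `link` struct and `parseLink` helper are assumptions for illustration; only the three JSON field names come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// link is a hypothetical struct covering only the fields named in the
// commit message; real network_data.json links carry more fields.
type link struct {
	Type     string `json:"type"`
	VlanLink string `json:"vlan_link"`
	VlanID   int    `json:"vlan_id"`
}

// parseLink decodes a single link entry and validates VLAN entries.
func parseLink(data []byte) (link, error) {
	var l link

	if err := json.Unmarshal(data, &l); err != nil {
		return link{}, err
	}

	if l.Type == "vlan" {
		if l.VlanLink == "" {
			return link{}, fmt.Errorf("vlan entry has no parent link")
		}

		// 802.1Q VLAN IDs 0 and 4095 are reserved.
		if l.VlanID < 1 || l.VlanID > 4094 {
			return link{}, fmt.Errorf("invalid VLAN ID %d", l.VlanID)
		}
	}

	return l, nil
}

func main() {
	entry := []byte(`{"type": "vlan", "vlan_link": "tap7819ff08-20", "vlan_id": 100}`)

	l, err := parseLink(entry)
	if err != nil {
		fmt.Println("error:", err)

		return
	}

	fmt.Printf("VLAN %d on parent %s\n", l.VlanID, l.VlanLink)
}
```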
Andrey Smirnov
5127ef7c28 fix: wipe disk by signatures
Fixes #12491

In (almost) all places we previously used `FastWipe`, use instead a
helper which will try to discover filesystem/partition signatures, and
wipe them.

This fixes the issue when a partition re-created in the same place might
already hit a scenario when the "old" filesystem is discovered in the
same place.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-14 19:15:37 +04:00
Pranav Patil
8184927316 feat: implement KubeSpan multi-document configuration
Migrate KubeSpan configuration to support multi-document format.
Add version-aware support for talosctl cluster create and gen config.
Uses multi-doc format for Talos 1.13+, legacy format for 1.12 and earlier.

Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-13 16:08:11 +04:00
Andrey Smirnov
308c750907 fix: resolve SideroLink Wireguard endpoint on reconnect
It's not recommended to use DNS name for the Wireguard endpoint, as
in-kernel Wireguard endpoint relies only on IP addresses, so either way
DNS resolve will happen outside of any Wireguard networking operations.

Previously, the resolving would happen at the moment Wireguard config is
applied to the Linux kernel, but SideroLink reconnect would not trigger
Wireguard reconfiguration as there is no change to the spec if the
hostname is used (even if it resolves to a different IP now).

With this change, on each SideroLink reconnect attempt
the name will be resolved to an IP address, so the Wireguard config
would actually trigger a change/reconfiguration if the DNS name
resolves to a new IP now.

Co-authored-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-12 21:09:52 +04:00
Mateusz Urbanek
c3176adcf9 feat: add EnvironmentConfig document
Add new EnvironmentConfig document for configuring the Env vars.
Deprecate .Machine.Env

Closes #12439

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-12 15:10:20 +01:00
Orzelius
c839b38809 feat: expose more SSA options in the upgrade-k8s command
add the following flags to the upgrade-k8s command:
* `--force-conflicts`            overwrite the fields when applying even if the field manager differs
* `--inventory-policy` string    kubernetes SSA inventory policy (one of 'MustMatch', 'AdoptIfNoInventory' or 'AdoptAll') (default "AdoptIfNoInventory")
* `--no-prune`                   whether pruning of previously applied objects should happen after apply
* `--prune-timeout` int          how long to wait for resources to be pruned in seconds (set to zero to disable waiting for resources to be fully deleted) (default 180)
* `--reconcile-timeout` int      how long to wait for resources to be fully reconciled in seconds (set to zero to disable waiting for resources to be fully reconciled) (default 180)

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
2026-01-12 21:17:43 +09:00
Andrey Smirnov
b8ff9677e4 fix: handle correctly incomplete RegistryTLSConfig
Add some missing unit-test coverage.

Fixes #12571

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-09 21:43:28 +04:00
Andrey Smirnov
99f2ddada8 fix: bond config via platform
Call `FillDefaults` on platform-acquired bond config.

As platform config controller might have cached (saved) representation
of bond config, we need to ensure we adapt it to the latest bond
configuration.

In particular, new fields introduced in v1.12 require some values to be
set which `.FillDefaults()` does for us.

Otherwise, Talos enters a loop trying to reconfigure the bond.

Prove with a unit-test (it fails if `.FillDefaults()` is removed).

Fixes #12561

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-09 21:12:29 +04:00
Andrey Smirnov
2449ffea45 fix: allow HostnameConfig to be used with incomplete machine config
The controller incorrectly was waiting for `cfg.Machine()` which
basically blocks `HostnameConfig` with partial machine config.

Fix that, add missing unit-tests.

Fixes #12564

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-09 19:57:28 +04:00
Andrey Smirnov
35fc520872 fix: lock down etcd listen address to IPv4 localhost
Use literal IP address instead of `localhost` to make `kube-apiserver`
connect to etcd member instead of relying on IPv4/IPv6 resolving of
`localhost`.

Simplify configuration for listening on 127.0.0.1 only, generate cert
SANs unconditionally for etcd loopback IPs.

Fixes #12542

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-09 18:49:43 +04:00
Mateusz Urbanek
080efcbda2 feat: add k8s-version parameter to k8s-bundle
Allow overriding K8s version in the  command.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-07 10:26:39 +01:00
Noel Georgi
b764f5f724 fix: skip sync test when kube-proxy is disabled
Skip manifests sync test when kube-proxy is disabled.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-01-05 21:13:45 +05:30
Noel Georgi
70e67787d6 feat: imager: populate filesystems with root owned files
Populate filesystems from source directories with root owned files.
This completes running imager fully rootless.

Fixes: #12498

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-01-05 21:13:42 +05:30
Noel Georgi
dc2009e477 chore: use context when creating filesystems
Pass in context when creating filesystems with `mkfs.*` commands.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-01-05 15:29:35 +05:30
Andrey Smirnov
154952175a fix: disable swap for system services
If system services including kubelet/CRI start using swap, it might lead
to extreme performance degradation.

Disable swap for all system services except for dashboard (which is not
critical).

```
NAME                                                                          SwapCurrent   SwapPeak   SwapHigh   SwapMax    ZswapCurrent   ZswapMax   ZswapWriteback
.                                                                                unset         unset      unset      unset      unset          unset   1
├──init                                                                            0 B           0 B        max        0 B        0 B            max   1
├──podruntime                                                                      0 B           0 B        max        max        0 B            max   1
│   ├──etcd                                                                        0 B           0 B        max        0 B        0 B            max   1
│   ├──kubelet                                                                     0 B           0 B        max        0 B        0 B            max   1
│   └──runtime                                                                     0 B           0 B        max        0 B        0 B            max   1
└──system                                                                          0 B           0 B        max        max        0 B            max   1
    ├──apid                                                                        0 B           0 B        max        0 B        0 B            max   1
    ├──dashboard                                                                   0 B           0 B        max        max        0 B            max   1
    ├──runtime                                                                     0 B           0 B        max        0 B        0 B            max   1
    ├──trustd                                                                      0 B           0 B        max        0 B        0 B            max   1
```

Refactor etcd cgroup to use same common pattern while keeping same
settings (but limit swap).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-12-26 18:25:25 +04:00
Noel Georgi
d98b415afe fix: drop more non-overlay SBC stuff
Drop more options used in SBC board code.

Part-of: #12492

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-12-26 16:39:49 +05:30
Noel Georgi
53f5bf8d2c fix: overlay installers
Overlay installers assume the `/boot/EFI` path, so we generate assets into `/boot/EFI` then move that directory to the mountPrefix+/EFI.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-12-26 15:03:11 +05:30
Noel Georgi
10d0cfd93a fix: overlay install in image mode
Since the bootloader interface got refactored to support rootless, `ExtraInstallStep` needs to be handled in `GenerateAssets`.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-12-26 15:03:07 +05:30
Noel Georgi
4d5657b1a3 fix: drop SBC board code
Boards were deprecated in favor of overlays from Talos 1.7.

Now completely remove all board specific code.

Part of: #12492

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-12-26 14:45:17 +05:30
Orzelius
c4f3f6d3e5 feat: implement kubernetes server-side apply
* add SSA via the new go-kubernetes library implementation to talosctl `upgrade-k8s` command
* add SSA via direct ResourceInterface call into talos (machined) with a manual inventory update
* add an integration test for ssa functionality

Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-12-26 12:08:16 +04:00
Andrey Smirnov
f0d8a68517 test: skip the source bundle on exact tag
When building for the release, as the release hasn't been finalized yet,
the test might fail.

```
    run.go:146: Running "/home/runner/_work/talos/talos/_out/talosctl-linux-amd64 --talosconfig /tmp/e2e/docker/talosconfig image talos-bundle v1.13.0-alpha.0"
    run.go:210:
        	Error Trace:	/src/internal/integration/base/run.go:210
        	            				/src/internal/integration/base/cli.go:107
        	            				/src/internal/integration/cli/image.go:142
        	            				/go/src/runtime/asm_amd64.s:1693
        	Error:      	Received unexpected error:
        	            	exit status 1
        	Test:       	TestIntegration/cli.ImageSuite/TestSourceBundle
        	Messages:   	command failed, stdout: "", stderr: "error fetching official extensions for v1.13.0-alpha.0: HEAD https://ghcr.io/v2/ghcr.io/siderolabs/extensions/manifests/v1.13.0-alpha.0: unexpected status code 404 Not Found (HEAD responses have no body, use GET for details)\n"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-12-25 15:16:15 +04:00