Why I run Talos Linux

My Kubernetes journey started in 2018 at my company and shortly thereafter at home on a small 5-node Raspberry Pi cluster. By day, my team and I develop an air-gapped, on-prem, mission-critical Kubernetes platform delivered to and used on hundreds, if not thousands, of servers. Updates are conservative, even boring, because boring software is the best software: it’s software that works. At home it’s the opposite. My homelab is where I get to be opinionated, spontaneous, break things, experiment, and run the setup I actually want to run (even if it’s a bad idea). That is why, and where, I tried Talos Linux, and it’s the reason Talos is now how I run Kubernetes on bare metal.

Talos is a Linux distribution that does exactly one thing: run Kubernetes. It’s an immutable OS, so there is no SSH, no shell, no package manager, and no hand-editing config files on the box. The whole machine is described by a YAML file and managed through an API with the talosctl CLI. If that sounds restrictive, it is, and that restriction turns out to be the entire point (a glorious point at that).

The cluster

There are eight nodes: three control-plane and five workers, almost all of them recycled small-form-factor desktops, plus a couple of deliberate misfits for targeted purposes.

Node	Role	Model	CPU	RAM	Storage
hv1	control-plane	Dell OptiPlex 5060	i5-8600T (6c/6t)	32 GiB	2 TB NVMe + 1 TB SATA SSD
hv2	control-plane	Dell OptiPlex 5060	i5-8600T (6c/6t)	32 GiB	2 TB NVMe + 1 TB SATA SSD
hv3	control-plane	Dell OptiPlex 5060	i5-8500T (6c/6t)	32 GiB	2 TB NVMe + 1 TB SATA SSD
hv4	worker	Dell OptiPlex 7050	i7-7700 (4c/8t)	64 GiB	2 TB NVMe + 1 TB SATA SSD + 12 TB HDD
hv5	worker	Dell OptiPlex 7050	i7-7700 (4c/8t)	64 GiB	2 TB NVMe + 1 TB SATA SSD + 12 TB HDD
hv6	worker	Dell OptiPlex 7050	i7-7700 (4c/8t)	64 GiB	1 TB + 2 TB NVMe + 2× 12 TB HDD
hv7	worker	Raspberry Pi 4 (8 GB)	Cortex-A72 (4c/4t)	8 GiB	1 TB SSD (SATA enclosure)
hv8	worker	Apple MacBook Air (2018)	i5-8210Y (2c/4t)	8 GiB	128 GB NVMe

As you can see, the control plane is three matched Dell OptiPlex 5060 micros with low-power T-series chips. That is perfect for running in a closet: quiet, cool, and three of them, so I get a real etcd quorum instead of a single point of failure. The workers are bigger 7050s with full i7-7700s, 64 GiB of RAM each, and Nvidia GPUs, plus two genuine oddballs: a Raspberry Pi 4 (for an ARM build runner and other odds and ends) and an aging MacBook Air (who doesn’t like a built-in UPS?). Kubernetes doesn’t care what the box is, as long as it joins the cluster and is schedulable.

My cluster definitely earns its keep. This is not a demo cluster running a couple of toys I stood up last week; it’s a hard-working cluster I really don’t want to go down. At any given moment it is carrying around 100 workloads across some 40 namespaces, closing in on 200 pods, and it behaves like a real cluster. The plumbing is all there: Flux running everything through GitOps, Kyverno, cert-manager, MetalLB, a Gateway, Longhorn for distributed storage, a CloudNativePG Postgres operator with three replicas, private git hosting with CI runners, and an in-cluster container registry acting as a rudimentary Artifactory. Layered on top are a full observability stack, an IdP server, local LLM inference on the worker GPUs, frequent health checks posting to a custom status API I wrote, and a long list of services I actually depend on: Home Assistant, Plex, Paperless, and my own applications, including the beekeeping platform this post is quietly building toward. CronJobs handle the unglamorous glue: dynamic DNS, volume snapshots, off-site backups, photo sync, and so on. This isn’t theory for me. With this much work on the cluster, immutability and clean, atomic upgrades aren’t a nice-to-have; they’re the reason it stays up.

Storage in three tiers

A quick note on storage that breaks the flow but matters nonetheless. I split storage across the cluster into three tiers by speed:

Fast is NVMe, which every node except the Pi has. Anything latency-sensitive lives here (as described below, the control-plane nodes are workhorses too, not just etcd boxes).
Mid is a trick I really like and am a little nervous about. Talos carves out an EPHEMERAL partition for its own working state, and on a large disk that leaves a lot of room sitting idle. I reclaim that otherwise-wasted space as a mid tier. The honest catch is that it lives on the ephemeral volume, so if I wipe or reset a node without thinking, the data goes with it. Because of that risk, only workloads that are easy to recover live here. It is “free” capacity with an explosive attached, and I treat it that way.
Slow is bulk spinning dead weight: four 12 TB drives spread across the workers, for the things that are big, cold, and replaceable.

To make the control-plane nodes schedulable in the storage cluster, I did the following:

Labeled the control-plane nodes as storage nodes with longhorn-storage: "true".
In the longhorn-default-setting ConfigMap, told Longhorn to tolerate the control-plane taint so it could actually schedule there (taint-toleration: "node-role.kubernetes.io/control-plane:NoSchedule").
In the same ConfigMap, pinned Longhorn’s system-managed components to those storage-labeled nodes (system-managed-components-node-selector: "longhorn-storage:true").
Set a fast default data path (default-data-path: "/var/lib/longhorn-fast") for the NVMe tier.
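
For reference, here is a minimal sketch of those steps as a shell script. It assumes Longhorn reads its defaults from a default-setting.yaml key in the longhorn-default-setting ConfigMap and that the storage-labeled control-plane nodes are hv1 through hv3 from the table above; the function names are hypothetical. By default it only prints the ConfigMap; passing --apply labels the nodes and applies it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label the control-plane nodes for Longhorn and render
# the longhorn-default-setting ConfigMap described above. The ConfigMap layout
# (a default-setting.yaml key) is an assumption; check your Longhorn version.
set -euo pipefail

# Control-plane nodes that should also act as Longhorn storage nodes.
STORAGE_NODES=(hv1 hv2 hv3)

render_longhorn_settings() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-default-setting
  namespace: longhorn-system
data:
  default-setting.yaml: |-
    taint-toleration: "node-role.kubernetes.io/control-plane:NoSchedule"
    system-managed-components-node-selector: "longhorn-storage:true"
    default-data-path: "/var/lib/longhorn-fast"
EOF
}

label_storage_nodes() {
  local n
  for n in "${STORAGE_NODES[@]}"; do
    kubectl label node "$n" longhorn-storage=true --overwrite
  done
}

case "${1:-}" in
  --apply)
    label_storage_nodes
    render_longhorn_settings | kubectl apply -f -
    ;;
  *)
    # Dry run: print the manifest without touching the cluster.
    render_longhorn_settings
    ;;
esac
```

If Longhorn is installed through its Helm chart (or a Flux HelmRelease), the chart generates this ConfigMap from its defaultSettings values, so setting them there is likely the better fit; a hand-applied ConfigMap may be overwritten on the next reconcile.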

These settings gave me the storage placement and performance I wanted. Longhorn pods are now running on the tainted control-plane nodes:

❯ for n in hv1 hv2 hv3; do
    printf "%s longhorn pods: %s\n" "$n" \
      "$(kubectl get pods -n longhorn-system --field-selector spec.nodeName=$n --no-headers | wc -l | tr -d ' ')"
  done
hv1 longhorn pods: 8
hv2 longhorn pods: 9
hv3 longhorn pods: 5

The problem: stuck at 800 MHz from BD-PROCHOT

Back to the cluster. I already knew this hardware had a major issue. I had run these Dell 5060s for years under other Linux distros (Debian, Ubuntu, and Rocky) and hit it every time: the three control-plane nodes refused to clock above 800 MHz, not under load, not ever, even with official Dell power supplies, which some people report as a fix. The culprit was BD-PROCHOT, bidirectional PROCHOT. PROCHOT (“processor hot”) is normally the CPU throttling itself to avoid cooking; the bidirectional variant lets another component on the board assert it too. On these machines it was being held on, telling perfectly cool CPUs to throttle when nothing was actually wrong. Under a mutable OS that’s easy to remedy. But I wanted the immutability of Talos, and I wasn’t about to buy new hardware to get it.

The fix is to reach into the CPU’s MSR_POWER_CTL register (MSR 0x1FC) and clear the BD-PROCHOT bit (bit 0). Previously that was a one-line script, used several times across different iterations of my homelab. On Talos it is not simple, for two reasons: there is no shell to run it from, and the stock kernel does not even expose the interface. CONFIG_X86_MSR, the driver that hands userspace /dev/cpu/*/msr, isn’t built in (it’s left out for obvious security reasons).
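
The register-level part of the fix is small: read, mask off bit 0, write back, once per CPU. Below is a minimal sketch of that, not the DaemonSet’s actual code. It assumes msr-tools (rdmsr and wrmsr) is installed, that it runs as root, and that the kernel provides /dev/cpu/*/msr; the function names are hypothetical. As a worked check, the value 0x2C005C ends in 0x5C (binary 0101 1100), so bit 0 is already clear, which matches a log line reporting BD-PROCHOT as disabled.

```shell
#!/usr/bin/env bash
# Sketch: clear BD-PROCHOT (bit 0 of MSR_POWER_CTL, 0x1FC) on every CPU.
# Assumes root, msr-tools (rdmsr/wrmsr), and a kernel with CONFIG_X86_MSR.
# Function names are hypothetical; hardware is touched only with --apply.
set -euo pipefail

MSR_POWER_CTL=0x1FC
BD_PROCHOT_MASK=0x1   # bit 0: 1 = BD-PROCHOT enabled

# Succeeds (exit 0) if the BD-PROCHOT bit is set in the value, e.g. 0x2C005D.
bd_prochot_enabled() {
  (( ($1 & BD_PROCHOT_MASK) != 0 ))
}

# Print the value with only the BD-PROCHOT bit cleared, as 0x-prefixed hex.
clear_bd_prochot() {
  printf '0x%X\n' $(( $1 & ~BD_PROCHOT_MASK ))
}

disable_on_all_cpus() {
  local dev cpu val new
  for dev in /dev/cpu/[0-9]*/msr; do
    if [[ ! -e "$dev" ]]; then
      echo "no /dev/cpu/*/msr devices; is CONFIG_X86_MSR enabled?" >&2
      return 1
    fi
    cpu="${dev#/dev/cpu/}"; cpu="${cpu%/msr}"
    val="0x$(rdmsr -p "$cpu" "$MSR_POWER_CTL")"   # rdmsr prints bare hex
    if bd_prochot_enabled "$val"; then
      new="$(clear_bd_prochot "$val")"
      wrmsr -p "$cpu" "$MSR_POWER_CTL" "$new"
      echo "CPU $cpu: $val -> $new (BD-PROCHOT cleared)"
    else
      echo "CPU $cpu: $val (BD-PROCHOT already disabled)"
    fi
  done
}

if [[ "${1:-}" == "--apply" ]]; then
  disable_on_all_cpus
fi
```

Without --apply the script only defines the helpers, which keeps the bit arithmetic testable on any machine.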

So the answer on this immutable host was not to “log in and poke a register,” because I couldn’t. It was to “produce an image where the capability exists, then do the poking in a declared, repeatable way in Kubernetes.” So I built a custom Talos image with CONFIG_X86_MSR enabled, then ran a small DaemonSet, pinned to the control-plane nodes with a nodeSelector, that reads MSR 0x1FC on each core and clears the throttle bit. Here’s an example of the logs from the DaemonSet:

Target: MSR 0x1FC (MSR_POWER_CTL)
Node: bd-prochot-disabler-mclf2

Found 6 CPU(s) with MSR access

CPU 0: MSR 0x1FC = 0x00000000002C005C
         BD PROCHOT bit: DISABLED
         -> Already disabled, no action needed
...
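
For the Kubernetes side, a DaemonSet along these lines would do it. This is a hedged sketch rather than my exact manifest: the cpu-tuning namespace, the image name, and the disable-bd-prochot command are hypothetical placeholders, and it assumes that image ships msr-tools plus a script that clears bit 0 of MSR 0x1FC on each CPU. Talos enforces the baseline Pod Security Standard by default, so the sketch labels its namespace privileged.

```shell
#!/usr/bin/env bash
# Hypothetical manifest for the unthrottling DaemonSet. Namespace, image and
# the disable-bd-prochot command are placeholders, not published artifacts.
set -euo pipefail

render_daemonset() {
  cat <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: cpu-tuning
  labels:
    pod-security.kubernetes.io/enforce: privileged
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: bd-prochot-disabler
  namespace: cpu-tuning
spec:
  selector:
    matchLabels:
      app: bd-prochot-disabler
  template:
    metadata:
      labels:
        app: bd-prochot-disabler
    spec:
      # Only the OptiPlex 5060 control-plane nodes need the fix.
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: disabler
          image: registry.example.internal/bd-prochot-disabler:latest
          # Clear the bit once, then stay alive so the pod remains Running.
          command: ["/bin/sh", "-c", "disable-bd-prochot && while true; do sleep 3600; done"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: dev-cpu
              mountPath: /dev/cpu
      volumes:
        - name: dev-cpu
          hostPath:
            path: /dev/cpu
EOF
}

if [[ "${1:-}" == "--apply" ]]; then
  render_daemonset | kubectl apply -f -
else
  render_daemonset
fi
```

In a Flux-managed repository, the rendered YAML would simply be committed alongside everything else rather than applied by hand.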

Once it is deployed, the cores are no longer held at 800 MHz and can scale up to the processor’s limit:

❯ talosctl -n xx.xx.xx.xx read /proc/cpuinfo | grep MHz
cpu MHz         : 2300.002
cpu MHz         : 2299.996
cpu MHz         : 2299.999
cpu MHz         : 2299.997
cpu MHz         : 2299.998
cpu MHz         : 2300.001
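
Eyeballing six lines is fine for one node; for all three, a small filter helps. This sketch (the check_mhz name and the 800 MHz default floor are my own choices, not part of talosctl) reads /proc/cpuinfo text on stdin, prints the core count and the min and max clock, and exits non-zero if any core is at or below the floor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "cpu MHz" lines from /proc/cpuinfo on stdin.
# Exit status: 0 if every core is above the floor, 1 if any core is at or
# below it, 2 if no "cpu MHz" lines were found.
set -euo pipefail

check_mhz() {
  local floor="${1:-800}"   # MHz; 800 is where BD-PROCHOT pinned these CPUs
  awk -F: -v limit="$floor" '
    /^cpu MHz/ {
      mhz = $2 + 0          # $2 is e.g. " 2300.002"; +0 forces a number
      n++
      if (n == 1 || mhz < min) min = mhz
      if (n == 1 || mhz > max) max = mhz
      if (mhz <= limit) slow++
    }
    END {
      if (n == 0) { print "no cpu MHz lines found"; exit 2 }
      printf "cores=%d min=%.0f max=%.0f throttled=%d\n", n, min, max, slow
      exit (slow > 0)
    }'
}
```

Usage, after sourcing the file: talosctl -n xx.xx.xx.xx read /proc/cpuinfo | check_mhz. The cpu MHz field is a momentary sample, and an idle core can legitimately dip low under frequency scaling, so it is worth running the check while the node is under some load.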

Building it and getting it running

Once that image was built and the DaemonSet was applied, the control plane finally ran at the speed I actually expected. The part I care about is what the fix looks like afterward. The kernel capability is baked into an image identified by its schematic, and the unthrottling is a Kubernetes object living in Git alongside everything else. There is no node where someone “fixed the clocks by hand,” and nothing to remember. If I add another Dell 5060 to the cluster it gets the fix for free. To make these nodes easy to spot, I even stamped the reported version as <talos-version>-msr, as you can see here:

❯ kubectl get nodes -o wide
NAME   STATUS     ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE              KERNEL-VERSION   CONTAINER-RUNTIME
hv1    Ready      control-plane   157d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3-msr)   6.18.33-talos    containerd://2.2.4
hv2    Ready      control-plane   157d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3-msr)   6.18.33-talos    containerd://2.2.4
hv3    Ready      control-plane   157d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3-msr)   6.18.33-talos    containerd://2.2.4
hv4    Ready      <none>          141d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3)       6.18.33-talos    containerd://2.2.4
hv5    Ready      <none>          157d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3)       6.18.33-talos    containerd://2.2.4
hv6    Ready      <none>          157d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3)       6.18.33-talos    containerd://2.2.4
hv7    Ready      <none>          157d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3)       6.18.33-talos    containerd://2.2.4
hv8    Ready      <none>          156d   v1.34.1   xx.xx.xx.xx   <none>        Talos (v1.13.3)       6.18.33-talos    containerd://2.2.4

So we have a great pattern for the whole setup: the OS is YAML I keep in Git, applied through an API, the same thing on every node.

The icing on the cake: upgrades and security

As if running Kubernetes in a deterministic manner wasn’t good enough, there is icing on this cake: easy upgrades and built-in security.

Upgrades: For most of my career, one of the hardest problems has been orchestrating upgrades deterministically across disparate environments and customers. It’s a large part of my job. Every long-lived box slowly drifts into a snowflake no one person fully understands, especially when multiple people support it. Talos upgrades are atomic: you point a node at a new image, it reboots into it, and if there’s an issue, it rolls back. I recently took my custom control-plane nodes from 1.12.0 to 1.13.3, and it was uneventful, which is the highest compliment I can pay an upgrade. Boring is the best.
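
A node-by-node upgrade is easy to script. The sketch below is hedged: the function names are hypothetical, the image argument is whatever installer image you use (an Image Factory or custom -msr build), and it assumes talosctl upgrade waits for the node to come back, which recent talosctl versions do by default (confirm with talosctl upgrade --help). It runs talosctl health against a control-plane node between upgrades and only prints the commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Hypothetical rolling-upgrade helper for Talos nodes: upgrade one node at a
# time and check cluster health before moving on. Dry run unless DRY_RUN=0.
set -euo pipefail

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Usage: rolling_upgrade <health-check-node> <installer-image> <node>...
rolling_upgrade() {
  local health_node="$1" image="$2" node
  shift 2
  for node in "$@"; do
    run talosctl upgrade --nodes "$node" --image "$image" || return 1
    # Stop at the first unhealthy result instead of upgrading the next node.
    run talosctl health --nodes "$health_node" || return 1
  done
}
```

Going one control-plane node at a time keeps etcd quorum intact: with three members, two stay up while the third reboots.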

Here’s my honest take: one worker quietly did not like its update, rolled itself back to the previous version, and I didn’t catch it. This wasn’t the custom Talos build I compiled; it was the official upstream release from Sidero Labs. The node just kept working, and nothing screamed for attention. I only noticed later, when I went to upgrade Kubernetes itself from 1.34.1 to 1.35.5 (with plans to continue to 1.36.1) and that one node stood out from the rest. The fix was small: get it back onto the correct schematic ID so it pulled the right image, upgrade it, and carry on with the Kubernetes upgrade. The lesson I took is the opposite of “Talos failed.” The system did the safe thing on its own, rolling back rather than booting something it did not trust, and because everything is declared, the correction was a one-line schematic change instead of a forensic dig and a postmortem.
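
One way to catch that kind of silent rollback sooner is a drift check after every upgrade. Here is a small sketch with hypothetical function names that compares each node’s reported Talos and kubelet versions against what you expect, ignoring the -msr suffix on the custom control-plane image. It reads name, OS image, and kubelet version as tab-separated lines, which kubectl can produce with a jsonpath template.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: flag nodes whose Talos or kubelet version differs
# from the expected one. Input: name<TAB>osImage<TAB>kubeletVersion per line.
set -euo pipefail

# Emit one tab-separated line per node from the live cluster.
node_versions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
}

# Usage: node_versions | find_drift v1.13.3 v1.34.1
find_drift() {
  local want_os="$1" want_kubelet="$2" name os kubelet ver rc=0
  while IFS=$'\t' read -r name os kubelet || [[ -n "$name" ]]; do
    [[ -n "$name" ]] || continue            # skip blank lines
    ver="${os#*\(}"; ver="${ver%\)}"        # "Talos (v1.13.3-msr)" -> "v1.13.3-msr"
    ver="${ver%-msr}"                       # treat the custom image as the same release
    if [[ "$ver" != "$want_os" || "$kubelet" != "$want_kubelet" ]]; then
      echo "$name: $ver, kubelet $kubelet (expected $want_os, kubelet $want_kubelet)"
      rc=1
    fi
  done
  return "$rc"
}
```

Run after each upgrade, a node that rolled back shows up right away instead of at the next upgrade.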

Security: I mostly get it for free. With no shell and no package manager, the attack surface I would normally spend an afternoon hardening is just the default. On top of that I run SELinux, encrypted filesystems, and the rest of the modern hardening checklist, the kind of thing that is a project on a traditional distro and a config field here. I am really fond of using ko and a secure Chainguard base for Go apps I’ve written. But that’s for another day.

None of this is frictionless, and I won’t pretend otherwise. Kubernetes can be difficult to secure and has many knobs to twist. Running a full Kubernetes environment with no shell is a real mindset change. The reflex of “SSH in and fix it” is gone, and you have to get creative in ways that feel strange at first. When I need to move a file from one container into another, the answer is not “scp it over”; it is a sidecar or a small sync container that performs the move as a declared, repeatable Job. Once it clicks it is genuinely better, because every workaround becomes a real artifact instead of something I did once at 2am and immediately forgot.

Where this is headed

My homelab proved the value of the immutable model to me, and the things that make Talos good on a closet full of recycled desktops are exactly the ones I want when real users depend on it: no drift, quick and atomic upgrades, a cluster I can rebuild from Git, and a security posture I get, mostly, for free.

So the next step is to evaluate running Talos in production for my large beekeeping management app. It is the workload I would most want this kind of foundation under, and a real test of whether what seemed clean in the lab holds up when it is not a “toy” cluster.

I’m trying to be realistic about what changes at that point. The caveats I listed above turn into operational requirements. Owning the control plane means owning storage, etcd, backups, and availability myself, and “the cluster can be down for an evening” is no longer an acceptable failure mode. Uptime nines become real. The Kubernetes-only constraint forces a real decision, before deployment, about anything that is not already a container. And the no-shell debugging model has to be understood, practiced, and second nature before an incident. That’s not something I can afford to learn during an outage.