From K3s Alpha to RKE2 Delta

Published 18 May 2026

In my earlier post, Homelab Infrastructure Overview (2026), I described the broader shape of the platform and noted that K3s was still current, with RKE2 as the likely next step.

That migration has now happened.

This is a short engineering update on what changed moving from the old K3s alpha cluster to the new RKE2 delta cluster, and why the main improvement was not just the distribution itself, but the sharper separation between platform infrastructure and application workloads.

Why I Moved from K3s Alpha to RKE2 Delta

K3s did exactly what I needed for a long time. It was lightweight, practical, and got the platform a long way. I do not see that as a failed choice. It was the right tool at an earlier stage.

The problem was not that K3s stopped working. The environment simply outgrew a lighter-weight setup.

As the platform started hosting more real services, I wanted something closer to the sort of Kubernetes environment I would expect to operate in a more enterprise-aligned context. Part of that was career and learning value, but part of it was also operational fit. RKE2 has a stronger security posture out of the box, and it fits better with the way I increasingly want the homelab to behave: deliberate, repeatable, and closer to a real platform.

The delta cluster was built fresh rather than converted in place. I used a more deliberate, infrastructure-as-code build model: a bootstrap project, Proxmox template VMs, Terraform, and Ansible. The point was not just to get one new cluster running, but to make it easy to tear down, rebuild, and eventually reproduce as an equivalent test environment.

The migration itself was live and phased. Applications moved gradually from alpha to delta rather than through one large cutover. Because I only have one public IP and can only forward ports 80 and 443 to one cluster at a time, public traffic initially kept entering through the old cluster, which passed requests on to delta for any workload that had already moved. Once everything had been migrated, I could switch ingress and change the port forwards over to delta. That part was awkward and did cause some outages at the time, but it was still the practical trade-off for a live migration with limited edge flexibility.
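
The routing constraint during the phased move can be modelled in a few lines of Python. This is only an illustrative sketch, not the actual ingress configuration: the EdgeRouter class, its method names, and any hostnames passed to it are hypothetical. It shows why a migrated service takes an extra hop through alpha until the port forwards are switched.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeRouter:
    """Toy model of a single public entry point during a phased migration."""

    # Cluster that currently receives the forwarded ports 80 and 443.
    edge_cluster: str = "alpha"
    # Hostnames whose workloads have already moved to delta (hypothetical).
    migrated: set[str] = field(default_factory=set)

    def migrate(self, host: str) -> None:
        """Record that the workload behind `host` now runs on delta."""
        self.migrated.add(host)

    def route(self, host: str) -> list[str]:
        """Return the chain of clusters a request for `host` passes through."""
        serving = "delta" if host in self.migrated else "alpha"
        if serving == self.edge_cluster:
            return [serving]
        # The edge cluster has to pass the request on to the other cluster.
        return [self.edge_cluster, serving]

    def cut_over(self, all_hosts: set[str]) -> None:
        """Point the port forwards at delta, but only once nothing is left on alpha."""
        pending = all_hosts - self.migrated
        if pending:
            raise RuntimeError(f"still on alpha: {sorted(pending)}")
        self.edge_cluster = "delta"
```

In this model, a migrated host routes as alpha then delta until cut_over succeeds, after which it routes straight to delta.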

This was less a Kubernetes migration in isolation and more a platform cleanup.

How the GitOps Structure Changed

The biggest architectural improvement was cleaning up the boundaries in the GitOps repository.

The repo is now split along a fairly strict operational line: shared platform components on one side, cluster-local application workloads on the other. In practice, infrastructure/ contains the capabilities that make the cluster function as a platform. That includes Cilium, ingress, Cloudflare tunnel integration, external-dns, cert-manager, trust-manager, Vault, Vault Secrets Operator, CrowdSec, monitoring, NFS-backed storage provisioning, Velero, and the NVIDIA GPU operator.

By contrast, clusters/delta/apps/ contains the services the cluster is there to run: namespaces, deployments, services, ingresses, PVCs, and app-local secrets for workloads such as Authentik, Immich, Paperless, Gitea, Harbor, and my Promethix-hosted sites.

Flux reconciles infrastructure first and applications second, which makes the dependency chain explicit. If something exists to make the cluster work, it belongs in infrastructure. If it exists because the cluster is hosting it, it belongs under apps.
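
The ordering and the placement rule can be sketched in Python. This is a model of the idea rather than Flux itself: Flux Kustomizations do support a dependsOn field for this kind of ordering, but whether this repository uses it, and the two layer names in the dependency map, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the layers it must wait for.
DEPENDS_ON = {
    "infrastructure": set(),
    "apps": {"infrastructure"},
}


def reconcile_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return the layers in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def owning_directory(makes_cluster_work: bool) -> str:
    """Apply the placement rule: platform capability or hosted workload."""
    return "infrastructure/" if makes_cluster_work else "clusters/delta/apps/"
```

Calling reconcile_order(DEPENDS_ON) yields infrastructure before apps, and a cyclic map would raise graphlib.CycleError, which is the kind of mistake an explicit dependency chain surfaces early.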

That separation also makes changes safer by reducing the temptation to let shared platform concerns and application-specific concerns bleed into each other.

Security Improvements with CrowdSec and Cilium Policies

Security also became more explicit as part of the move.

In the earlier shape of the environment, a lot of security posture was implied by the wider network design, VLAN separation, and edge controls. That still matters, but as the environment started hosting more real services, security controls and traffic boundaries needed to become more intentional inside the cluster as well.

CrowdSec now plays a meaningful role in that model. At the moment, it is primarily used as an ingress protection layer rather than a general in-cluster service control plane. It ingests logs from the external Traefik ingress tier, analyses them using the Traefik collection, and exposes decisions through the CrowdSec API. Those decisions are then enforced by a Traefik bouncer middleware attached to public-facing routes.
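
The enforcement step at the end of that flow can be illustrated with a small Python model. It is not the CrowdSec API or the bouncer's real code: the Decision type and its fields are simplified stand-ins, and the addresses used with it would be documentation-range examples. It only shows the shape of the check a bouncer makes per request.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Simplified stand-in for a decision fetched from the decision source."""

    value: str   # a single IP ("203.0.113.7") or a CIDR range ("198.51.100.0/24")
    action: str  # only "ban" is enforced in this sketch


def is_blocked(client_ip: str, decisions: list[Decision]) -> bool:
    """Return True if any ban decision covers the client address."""
    addr = ipaddress.ip_address(client_ip)
    for decision in decisions:
        if decision.action != "ban":
            continue
        # A bare IP parses as a single-address network (/32 or /128).
        network = ipaddress.ip_network(decision.value, strict=False)
        if addr in network:
            return True
    return False


def handle_request(client_ip: str, decisions: list[Decision]) -> int:
    """Middleware-style gate: 403 for banned clients, 200 otherwise."""
    return 403 if is_blocked(client_ip, decisions) else 200
```

Because the gate sits in middleware attached only to public-facing routes, internal traffic never passes through this check, which matches the ingress-only role described above.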

On network policy, the cluster is currently in a selective hardening phase rather than a universal default-deny model. I am using Cilium-backed policy where it adds immediate value, particularly around edge-facing services and a few more sensitive internal paths, but I have not pushed the whole cluster into a blanket namespace-by-namespace default-deny posture.
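
The difference between selective hardening and universal default-deny can be shown with a toy evaluator in Python. It assumes the usual Kubernetes and Cilium behaviour in the default enforcement mode: a workload that no policy selects accepts all ingress, while a workload selected by at least one ingress policy accepts only what those policies allow. Selecting by namespace alone is a simplification, and the IngressPolicy type and any namespace names used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressPolicy:
    """Simplified ingress policy that selects a whole namespace."""

    namespace: str                 # namespace whose workloads the policy selects
    allowed_from: frozenset[str]   # source namespaces permitted to connect


def ingress_allowed(src_ns: str, dst_ns: str, policies: list[IngressPolicy]) -> bool:
    """Decide whether traffic from src_ns may reach workloads in dst_ns."""
    selecting = [p for p in policies if p.namespace == dst_ns]
    if not selecting:
        # No policy selects the destination, so it stays default-allow.
        return True
    # Once selected, the destination only accepts explicitly allowed sources.
    return any(src_ns in p.allowed_from for p in selecting)
```

With a single policy on one sensitive namespace, only that namespace becomes restrictive while everything else keeps its existing behaviour, which is what makes the phased approach low-risk compared with a blanket default-deny rollout.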

Even so, workload communication is becoming more intentional, and east-west traffic is starting to be constrained where it matters rather than assumed safe by default.

Why I Replaced MetalLB with the Cilium Load Balancer

Another part of the cleanup was load balancing. The alpha cluster used MetalLB; delta does not. Before application migration started, I set up Cilium's load-balancer capability so that Cilium could take on that role directly in the new cluster.

I do not see that as technology churn for its own sake. Moving load balancing into Cilium removed one more separately operated component. Cilium was already doing important networking and security work in the cluster, so extending that responsibility into service load balancing made the overall stack simpler and more coherent.

The value was not “replace X because Y is newer”. The value was consolidation. Fewer overlapping components means fewer moving parts to manage, fewer boundaries to debug, and a clearer answer to the question of which layer owns what.

In a platform that now hosts live services, simplification matters.

Direction of the Platform

The move from K3s alpha to RKE2 delta says something broader about the direction of the platform.

It is becoming less of a collection of useful tools and more of an integrated platform with clearer responsibility boundaries. Shared services are treated more like platform capabilities. Application workloads are kept local unless there is a real reason to promote them into shared infrastructure. Security controls are more explicit. Networking responsibilities are more consolidated. The whole stack is easier to rebuild, reason about, and evolve.

K3s was valuable and got me a long way. RKE2 is simply a better fit for where the environment is now. The main improvement was not just the distribution itself, but the cleaner operational model around it. That is what made delta feel like a real step forward rather than just a distro swap.