Lucia Smith: Personal Website

Talos\K8S In a Homelab

I'm going to be approaching this from a bit of a different perspective. I know how to do what k8s does in k8s with operators, kubelets, etc., and I understand Linux systems very well, so configuring them is not an issue. That makes this an interesting choice of thing to deploy, because it wasn't for lack of skill. Rather, this was done to see how immutable operating systems geared towards K8S work, their challenges, their strengths...and to save me a small amount of time on the initial setup.

I chose to install Talos on a few VMs in my own home lab because I saw it as an opportunity to move some stateless workloads from VMs\containers to something that was distributed and would have less downtime.

What prompted this

For the curious, you can see what inspired the entire home lab rebuild. The short version is that the nodes we had were very poorly provisioned, and I wanted to deploy more services on the network. The problem is...well, one was TrueNAS Scale, so there was no proper provisioning support, and the general process of finding a machine with free capacity was getting very tedious.

A few general notes on K8s

I personally believe that most deployments of k8s are insanely overengineered and very brittle due to the expectation that workloads be able to be stopped and restarted at will. Sometimes a bare metal install, a VM, or even just the docker container itself is actually the best solution. Effectively, individuals often deploy a solution designed by Google (K8s's predecessor is a system called Borg, developed by Google) without the scale needed to benefit from it.

That said...I have seen beautiful and brilliantly engineered solutions using K8s as part of a distributed system. The tooling for deploying new workloads is very mature and very powerful when it comes to determining the nodes you want to deploy on\how many resources you want to allocate. The problem you run into when you try to convert things to run in k8s is that you have to be extremely mindful of application state, because things can and will restart during upgrades. A node will fail, causing all of its workloads to restart\migrate.

Like everything, it is a tool with a specific place and use case. From my perspective, if you are running a few stateless workloads with no I\O or limited I\O, or if you are willing to deploy the correct infra for more I\O heavy workloads, then you might as well make use of it.

Why I went with a K8s deployment

For my situation...my partner and I have a collection of consumer hardware from old PCs, eBay, and my partner's old machines. So we can't get away with just sticking everything on one machine, and deploying virtual machines for everything across the network does not make sense.

I've been a programmer since I was 14...about 17 years ago. Most of my time has been spent writing applications for the small business and enterprise space, so much of my experience is with stateless software or software that is at most backed by a DB. Also, candidly speaking, when I install a software program I tend to have a very specific want, and I find programming fun, so I will often write exactly what I want. As a result I have an insane degree of control over the software I run on my home lab, and a nontrivial portion of it is my own, built over a decade, so I can just make my software work in k8s.

Additionally...I have had a self-hosted git repo for around 5 years. Sure, this means that sometimes my projects never show in my GitHub commit history, but that seems like a small price to pay to have everything local for speed\reliability. When I deployed this originally I also deployed a single VM to handle builds, which was...not ideal because I often jump across projects, languages, etc. Additionally, any time my build box broke, my ability to do work would slow to a crawl, so I eventually stopped using it. With a K8s cluster I can simply spin off however many commits I want to make and the build jobs themselves will offload to the cluster. Sure... I had to write some code to do this, but I did a full write-up on it here.

Lastly...deploying a new service (or microservice architecture, as I often do) onto k8s is, thanks to the automated resource management, a lot easier than it would be with VMs and a lot less wasteful. As a result I am able to deploy more projects than I otherwise would, because it only takes me minutes to add a new service to the cluster nowadays, as opposed to when I would have to figure out space on VMs.

General challenges with porting home lab services to K8s

I feel a lot of the reason people don't run k8s in their home environments is the knowledge required, but the general home lab ecosystem doesn't exactly help this. Going to write up some interesting notes for a few people here.

Most home lab projects can run in docker (cool), but a few have ports hardcoded at 80, which is not ideal for k8s as binding to a port below 1024 will require a root user (or an added capability). So your option here is to make it work (please don't) in K8s or to modify the image (not great, as you're now maintaining that out of tree or packaging it yourself). What's deeply frustrating is that it's often not that much work to change it to something else. A configurable port is the preferred solution here, or at a minimum pick something above 1024.

Other projects make use of SQLite, which is great for a local system but does not work well with a networked file system (SQLite has a section on how to corrupt the database that calls NFS out in particular here). And yes, Longhorn isn't that much better; this is an issue with networked\distributed file systems in general when multiple writes need to occur to a single asset. While you can specify a local volume, this removes a lot of the strengths of K8s around moving workloads. A number of open source projects do support an external database, which goes a long way towards solving this, but some do not.

The last issue is reliability and resiliency. Sometimes services just do not handle being restarted very well, so they will lose work or lose state. On a physical node we expected this...on a VM, ok, still not great, but you planned for that. On a K8s node though, by design the workload can be restarted at any point for upgrades or migrated to another node as part of server upgrades, so these little design oversights very quickly become large ones.
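As a rough sketch of what a configurable port can look like, here is a minimal Python service that reads its listen port from an environment variable. The variable name PORT, the default of 8080, and the helper name resolve_port are assumptions for illustration, not taken from any particular project.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_PORT = 8080  # assumed default; an unprivileged port needs no root user


def resolve_port(env=None):
    """Return the listen port from the (hypothetical) PORT variable."""
    env = os.environ if env is None else env
    port = int(env.get("PORT", DEFAULT_PORT))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class Hello(BaseHTTPRequestHandler):
    """Trivial handler so the sketch is a complete service."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Listen on all interfaces so traffic routed to the container reaches it.
    HTTPServer(("0.0.0.0", resolve_port()), Hello).serve_forever()
```

Calling main() starts the server. With something like this in place the container can run as a non-root user, and the deployment picks the port through its environment instead of a patched image.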

A few notes on Talos

After running Talos for a while, I can say a few things, both positive and negative.

The documentation exists but has gaps, so it very much expects you to understand k8s very well. It is insanely opinionated, and modifying the installation once you have installed the system is poorly covered. It is effectively a k8s application, so the minute you need to modify it at a low enough level to do things that are not supported, you really can't.

Performance is excellent due to the stripped down installation and rewritten system components. Installing and reinstalling new nodes is a breeze even without things like PXE booting on the network. If all you need is a basic k8s system it is perfect.

The high level deployment today

We are running Talos with a 3 node control plane (VMs) and 3 worker nodes (2 VMs, 1 bare metal). This is enough for us to run upgrades without taking out network services if replicas are set...or at a minimum having low downtime if they are not.
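Part of why three control plane nodes is enough to upgrade one node at a time comes down to etcd quorum: a majority of members has to stay up. A quick sketch of that arithmetic (the function names are mine, for illustration only):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with this many members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still up."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} members: quorum {quorum(n)}, can lose {tolerated_failures(n)}")
```

With 3 members the quorum is 2, so one control plane node can be rebooted for an upgrade while the other two keep the cluster available.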

The storage issue

Talos treats nodes with large disks as an antipattern, and because it does not really give you easy access into the file system, a backup of data on something like Longhorn would be needlessly complicated. While I understand Longhorn is supported and documented, I would rather not go against the design pattern of the system in use.

I really wanted to deploy iSCSI onto the network. I wanted to have block storage for my IO intensive workloads. Configuring this in TrueNAS Scale was very simple and Talos does provide an addition for iSCSI, but it does not really have a clean way to add this to existing nodes (as of this writing). So if you want iSCSI on the network you effectively need to deploy it to start with or redeploy it. Candidly speaking Jellyfin and Kiwix would be infinitely better in a deployment with iSCSI because they are fairly I\O heavy.

Other storage mediums like an SMB drive are just a bad idea for a list of reasons, mainly the lack of file attributes causing issues.

Ultimately the best solution to the problem for my home network was NFS, but this meant I had to be extremely mindful of what I deployed. For the record, I forced NFSv4, which is a much more mature implementation with respect to file locking. From my perspective, so long as the application did not have an SQLite database it was probably going to be fine.

Security concerns

When deploying NFS, what I did was whitelist based upon the IPs of the K8S worker nodes. In theory this is great, but in practice it introduces some concerns: if you somehow managed to compromise a container and escape it, you could make a jump to the NFS share. There are two ways to handle this...either accept it as a known security limitation or address it with more complex configuration. I went with the former.

The K8S cluster is also the home for workloads that are expected to be exposed\forwarded to the internet through a reverse proxy so that cluster is in a lower trust network. Even if you did compromise the cluster you would not be able to get into most of the network as this is fairly isolated.

As a result, extremely sensitive data really can't be deployed onto the cluster on NFS. That is completely fine, as we have Proxmox systems, orchestration for them, and solid tooling\caching for these systems...but it is something that has to be considered.

Migrating workloads never designed for K8S

To likely no one's surprise, most home lab services are not designed to run in K8S. And a lot of my older software was never designed for it.

Some of these were easy to migrate, for example an ancient wallpaper service I wrote years ago for returning a random wallpaper to a client machine. I had a folder of around 1000 images that I just did not want to have locally on every machine...so I wrote a program for it. Also yes, it's in JS...this was written years ago for a single user (me), and while I would like to rewrite it one day, it gets around a lot of performance issues by storing information about the images on the file system. Which also...really complicated things... I could have added NFS mounts for this, but ultimately determined that just storing the computed image information (generated as part of a docker build) and a small quantity of images in the image itself (I have a self-hosted internal docker repo) was easy enough.

A silly service I wrote years ago keeps track of items in a storage system in Minecraft. Yeah, I'm not joking. I wrote a small amount of Lua for the ComputerCraft mod that just uses the http module to send what can be read in game. This service is completely stateless because it just blindly receives whatever inventory is presented by the system at the time. This one was as easy as just giving it a docker container and deploying that.

Migrating Gitea

I have used a self-hosted Gitea repo for years. This was originally deployed back when I was writing algorithms to analyze stock market trends. Let's just say the data was useful. The repo later grew to include my partner's and my code for the network, so things like provisioning and upgrades. My partner, as part of an AI project, had also made an AI agent that was using memfs, so all of a sudden a failure in Gitea meant a failure of the memory file system for an AI agent that we use and that is sitting in our Discord...yeah, ok, that was not ideal.

Something you have to know about Gitea...it can use next to no resources, or it can use an insane amount of resources, depending largely upon the repos deployed. I wanted to start the process of keeping local mirrors for things like QEMU, the Linux kernel, K8S, Mesa...things I have had to check the source code of over the years. This was partly in the wake of GitHub having reliability issues; some claims online at the time put its uptime below 90%. These repos are also massive, so cloning them or viewing them online is suboptimal at best...to fully work with them, you really need local copies.

This is not just theoretical btw. I have had to read through the Linux kernel code to narrow down changes to virtual NIC drivers, USB issues (including one issue where a capture card crashed the USB driver), VR headset support (the unofficial Bigscreen Beyond VR patch was out of tree at the time), and sound system issues.

On QEMU, I've had to dig into why a system deadlocked entirely or why USB pass-through is so suboptimal (advice: don't do it...just pass in a PCIe USB card and call it a day). Or my favorite, when flaky hardware pass-through just fails to work like you expect.

K8s at one point had a bug known as ghost pods, where resources would remain as pending after they had been deleted. The issue was more common in large clusters, and this was something we did see at NCR during my time there. We ended up writing code to work around it, but it did require my colleague and me to dig into the source code.

And I run a custom Mesa build on my desktop because it's more performant. That's just it...no other reasons. I build the package as a modification of the upstream Arch packaging.

Due to NFS we couldn't really use the SQLite database, and because I wanted local mirrors of the above projects I really needed a dedicated MySQL database, which fortunately was dead simple to deploy on NixOS. Of course this application was never designed for K8S, so if you kill it while it is running the error recovery is not graceful (things like corrupt repos when creating new mirrors). As a result you really can't use a readiness or liveness probe here. And because the workloads can spike to require multiple full CPU cores and multiple gigs of memory (ex. a repo the size of the Linux kernel), you really need a maximum resource limit that is many multiples higher than the minimum request, which sits closer to typical usage.
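To make the request\limit gap concrete, here is a small Python sketch that builds the container portion of a pod spec as a plain dict. The field names are the standard Kubernetes ones, but every number, the container name, and the image reference are placeholder assumptions, not my actual settings.

```python
import json


def container_spec(name, image, requests, limits):
    """Build the container section of a pod spec as a plain dict."""
    return {
        "name": name,
        "image": image,
        # Deliberately no livenessProbe or readinessProbe: a probe-triggered
        # restart in the middle of a large mirror is what we want to avoid.
        "resources": {"requests": requests, "limits": limits},
    }


# Placeholder values: a small request for idle use, a much larger limit
# so a big mirror operation has room to spike.
gitea = container_spec(
    name="gitea",
    image="gitea/gitea",
    requests={"cpu": "100m", "memory": "256Mi"},
    limits={"cpu": "4", "memory": "8Gi"},
)

print(json.dumps(gitea, indent=2))
```

The scheduler places the pod based on the requests, so the node it lands on still needs real headroom for the spikes.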

Deploying Nextcloud

My partner and I keep all of our shared documents and things like collaborative editing in Nextcloud. It works insanely well for us for that purpose because it allows local collaborative document editing and allows us to keep financial data understandably private.

While Nextcloud was not designed for this...as long as you do not add readiness and liveness probes, it works completely fine. I ended up using the linuxserver.io project. In the case of Nextcloud this works by populating a storage volume with the contents of a sidecar container. Assuming the storage volume has been fully written, this is perfect, but if the process is interrupted...well, the container has no way to know and the entire installation fails.

This is also why we're using NFS: the attributes of files are properly supported, and there is minimal file locking at play, which would otherwise impact performance.

For the database we're making use of the same MySQL instance across the network for all services (Gitea, Piwigo, Nextcloud).

Fortunately Collabora for collaborative document editing was relatively easy to deploy and very well documented on how to run in K8S. So it was relatively easy to apply their example for Minikube to a full k8s setup.

Upgrade edge case

One small issue with upgrading a K8s cluster...is when the CI\CD that upgrades the cluster lives on that same cluster. Typically, a CI job does not like being stopped, so how do we handle the guarantee that our cluster upgrader will be rebooted out from under us? I write more about this in the CI runner write-up, but the short version is that we need to capture which upgrade segments have run, write that state to persistent storage (or a DB), and resume the run from that state.

Additionally, Talos will always reboot the node if you apply a Talos image, even if it is the same image. Since this upgrade binary is running on the same K8s cluster that it would be upgrading, we must not blindly apply a Talos image, as otherwise we would reboot every cluster node any time the CI runs for the home lab provisioning. To prevent Talos updating unnecessarily, we check the image version currently running on each node, compare it against the version we want, and only deploy the new version if it differs.
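A minimal sketch of that checkpointing idea, assuming the state lives in a JSON file on persistent storage. The function names and the file layout are hypothetical; this is not the actual runner code, and a DB would work just as well.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of segment names already recorded as finished."""
    try:
        with open(path) as fh:
            return set(json.load(fh))
    except FileNotFoundError:
        return set()


def save_done(path, done):
    """Write to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def run_segments(segments, path):
    """Run (name, func) pairs in order, skipping any finished in a prior run."""
    done = load_done(path)
    for name, func in segments:
        if name in done:
            continue
        func()
        done.add(name)
        save_done(path, done)
```

If the runner is killed partway through, the next run calls run_segments with the same list and picks up at the first unfinished segment. A segment can still be interrupted after it did its work but before the checkpoint was written, so each one should be safe to run twice.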
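The version check itself is simple. Here is a sketch assuming you have already collected the running version of each node into a dict; how that is gathered is left out, and the node names and version strings below are made up.

```python
def nodes_needing_upgrade(current_versions, target_version):
    """Return the nodes whose running version differs from the target."""
    target = target_version.strip()
    return sorted(
        node
        for node, version in current_versions.items()
        if version.strip() != target
    )


# Hypothetical cluster state for illustration.
current = {
    "cp-1": "v1.2.3",
    "cp-2": "v1.2.3",
    "worker-1": "v1.2.2",
}

for node in nodes_needing_upgrade(current, "v1.2.3"):
    print(f"would upgrade {node}")
```

Only nodes returned by this check get the new image applied, so a CI run where nothing changed reboots nothing.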

Future plans

I will likely retire Talos one of these days. I find the issues with getting into the OS to be a very significant problem and the value I get with it from being an appliance fairly low. I would like to have Longhorn in place on some nodes as this would move some application workloads away from being a single point of failure as the file system will be distributed.

Granted... I also say I am going to retire my TrueNAS Scale box one of these days, but it is still running. It's largely a...do I have the time\reason to move to something different. Right now what is in place works and the benefits of an appliance seem to be outweighing the downsides of rolling my own.

After proper block storage is on the network I will likely migrate Kiwix to block storage\k8s as well as Jellyfin.

I may also retire some older VMs from Proxmox and move them to kubevirt, but that will have to wait until I have better networking, as much of the network caps out at 2.5Gb. I will likely consider this once I have 10Gb networking.
