Athena
Athena started as a project created by necessity. In short I needed a way to keep track of my games on a local Linux system.
This kinda just happens as a Linux user. You end up with applications running through various compatibility layers, maybe a launcher or two, official repos, 3rd-party packaging, and some random shell scripts holding the whole thing together. Here is the original commit. Dead simple, just list out the programs in a few directories and launch their respective script.
After moving to California in 2024 I learned that if I wanted to continue to use my VR system I would need to move it to the living room (where there was no room for a desk\keyboard). Seeing the flaw in this plan I deployed Sunshine\Moonlight (a game-streaming host and client built around an open-source implementation of NVIDIA's GameStream protocol, so suitable for low-latency user applications) and eventually grew tired of the manual process of launching programs. If you are wondering why I did not just use a KVM solution...while these exist and I do have one as an escape hatch, all the ones I am aware of are single monitor...my workstation currently has 4.
One thing led to another and eventually I was writing a simple ncurses application while on a call with some friends (a quiet evening for me). Dabbling with text rendering in a CLI application on a low level. Written in Python because I needed something quick and performance was not a concern at the time. Of course the fact it was in Python is likely to one day require me to rewrite it, but this wasn't the immediate concern when this was built.
A shift in focus
I was informed I would be down for several months (expected 1.5 months...turned out closer to 3) for a medical procedure (all good nowadays). And I had also just gotten laid off. I would be going under in March and it was November at the time which meant not enough time to pick up a contract\new position, but too long to do nothing. So...I opted to instead build out Athena to be a full VDI application. 5 months away...including holidays and medical prep which made it closer to 4...seemed doable. I figured it would make my life much easier during recovery.
And the Steam Deck had released...which highlighted to me the importance of application streaming.
You know...completely normal things you do when bored. But...4 months to work on a project you are passionate about. Seemed like fun to me, at worst I get a good story. At best I build something cool. I would like to believe I got both here.
The problems with a project that outgrew its design
Athena during its early implementation was very much a hacky mess. A single Python file that would load a single JSON file that in turn executed shell scripts to open programs. There was no recovery...and the scripts were autogenerated based off a random set of scripts written over nearly a year. Granted...this is what happens when your personal project grows exponentially. For context this was the project at this point.
Additionally...some applications still required being added manually to various config files and hand written shell scripts. This meant a lot of manual effort to load that new game in and a lot of headaches if say a component crashed. Ex: because an earlier implementation started a remote VM, it would also shut it down when the stream closed...and one day Moonlight crashed which triggered a shutdown of the remote machine.
And there was no way to control or interact with remote machines outside of ssh (so Windows for example was a locked down box)...in addition to being Linux only due to hacky decisions made early on.
Most of this...was because all I originally designed was a launch system for a single shell script and I did not have the time to implement something more robust. The nature of building something for yourself.
So let me show you where we started. On the local machine we would have something like this:
start-remote-system;
copy-remote-start-script-to-machine;
open-moonlight;
shutdown-remote-system;
Meanwhile on the remote we would get this after the copy:
launch-game
This is fine in theory, but incredibly tedious in practice. What are you going to do, assume every system has ssh? That is not going to work with Windows, and as a Linux user who enjoys playing a few games that Wine\Proton cannot run well, it meant shutting down the machine and selecting the program was always a manual process there.
You can also clearly see why, if Moonlight should crash (which it does sometimes...software breaks sometimes), you immediately get a shutdown. You can bandaid this with a sleep command, but this is a hack.
This also means anytime you want to switch programs on the remote you are restarting the whole system. If you want to delay shutdowns or remember machine states you need to be directly running the startup of the machine and the shutdown because there is not a clean API across operating systems to get the system status from cold boot. Ping the IP? Guess what, pings were disabled. Expect a specific service to be online? Brilliant, now add that to every application when you're supposed to be keeping resources light as you're running games\demanding applications.
And how do you use these scripts with a remote API to spin up the machine\ready the application for something like streaming to a cell phone, tablet, or the Steam Deck? Do you just attempt to dynamically modify\remove the start command? That's insanely brittle. The only real solution is to move the launch of Moonlight to be controlled directly by the application so that you can skip it and just run the corresponding setup and teardown.
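To make that last point concrete, here is a rough sketch of what "the application owns the client launch" could look like. This is not Athena's actual code: run_session, its return strings, and the idea that a crashed client exits with a non-zero code are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_session(
    start_machine: Callable[[], None],
    stop_machine: Callable[[], None],
    client_cmd: Optional[Sequence[str]] = None,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> str:
    """Run setup, optionally a streaming client, then decide on teardown.

    client_cmd=None models a caller (phone, tablet, Steam Deck) that brings
    its own client: only setup runs and teardown is left to a later call.
    """
    start_machine()
    if client_cmd is None:
        return "prepared"  # the remote caller asks for teardown later

    exit_code = run(client_cmd)
    if exit_code != 0:
        # Assumption: a crash shows up as a non-zero exit code. Keep the
        # machine up so the user can reconnect instead of losing the session.
        return "client-crashed"

    stop_machine()
    return "stopped"
```

Because the client is just one optional step, a crash no longer falls straight through to the shutdown the way it did in the chained script.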
The new plan
Effectively take what I had learned from my time using Athena...at one point in time I would use this daily so I knew what worked...and what didn't. Rewrite the autogenerated scripts to be proper library functions. Make things cross platform...remove the need for any manual setup. Add resiliency so a dropped connection doesn't nuke the whole session. Add support for things I wanted, but couldn't justify the manual effort for on every application. Add things like a CLI and REST API to make integration\usage easier.
Removal of autogenerated scripts...overhaul of autogenerated configs
So rather than running a shell script that modifies the config file containing all programs, or even writing shell scripts in general, it makes far more sense to define in the configuration the type of system an application runs on and the parameters to use. For example, let's say the key "remote_client_type" is set to "moonlight" and the start_script\stop_script are set. Now when this entry is loaded I can determine what parameters to use, as well as provide the option for external tooling to add more.
For example here is a configuration entry for a game on a remote system:
"Stronghold 2": {
"script": "",
"asset": "./scripts/dist/remote/40960",
"ip": "lair.friedmicro-lab.org",
"live_check": "192.168.1.14",
"remote_client_type": "moonlight",
"os": "linux",
"skip_assets": false,
"athena_installed": true,
"skip_daemon": false,
"force_sync": false,
"moonlight_app": "Launch_Flatpak_Game",
"moonlight_machine": "shadow",
"start_script": "/usr/local/bin/wake-thick-client",
"stop_script": "/usr/local/bin/shutdown-thick-client",
"skip_stop_command": false,
"time_limit": true
},
By automatically generating configurations like this based upon the user settings we automate populating all known programs and do not have to fall back to shell scripts. In fact if I did not insist my machine always be off and VMs be used these scripts would not even be needed as they are optional keys.
And the asset can be platform specific. For example the above is a remote Linux machine, the below is a Windows machine:
"SimCity Societies Destinations": {
"script": "",
"asset": "./scripts/assets/SimCity Societies Destinations.bat",
"ip": "192.168.1.27",
"live_check": "192.168.1.14",
"remote_client_type": "moonlight",
"os": "windows",
"skip_assets": false,
"athena_installed": true,
"skip_daemon": false,
"force_sync": false,
"moonlight_app": "Desktop",
"moonlight_machine": "Shadow-Virtual",
"start_script": "/usr/local/bin/windows-vm",
"stop_script": "/usr/local/bin/windows-vm-shutdown",
"skip_stop_command": true,
"time_limit": true
},
You may notice the IP and start\stop scripts as well as assets are different. This is because this entry is for a Windows VM.
And we can extend this for additional usecases. For example I enjoy playing an old Android game I have had since my teens. As a result I have waydroid installed so here is the entry for that:
"Star Traders": {
"layer": "waydroid",
"script": "",
"asset": "com.corytrese.games.startraderselite",
"time_limit": true
},
But nothing is perfect and the old script system is sometimes the best solution. For example my partner and I enjoy playing Minecraft. We have a VM that migrates between two servers; it is on a lower performing server when we are not playing and it migrates to my machine in the living room (I connect into a different VM running on the same box). This is an edge case and it doesn't make much sense to build an entire block of code for it:
"FTB": {
"script": "./scripts/dist/local/ftb",
"live_check": "192.168.1.75",
"ip": "192.168.1.75",
"time_limit": false
},
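To show how entries like the ones above can drive behavior without any per-program script, here is a minimal sketch of a dispatcher. It only uses keys that appear in the entries above; the function name plan_launch, the step tuples it returns, and the exact meaning I give skip_stop_command (do not run the stop script) are assumptions for illustration, not Athena's real implementation.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, ...]


def plan_launch(entry: Dict[str, Any]) -> List[Step]:
    """Turn one config entry into an ordered list of abstract steps."""
    steps: List[Step] = []

    # Optional setup, e.g. waking a machine or starting a VM.
    if entry.get("start_script"):
        steps.append(("run_script", entry["start_script"]))

    # Pick the launch method from the entry's type keys.
    if entry.get("remote_client_type") == "moonlight":
        steps.append(("moonlight", entry["moonlight_machine"], entry["moonlight_app"]))
    elif entry.get("layer") == "waydroid":
        steps.append(("waydroid", entry["asset"]))
    elif entry.get("script"):
        steps.append(("run_script", entry["script"]))  # old script system as fallback
    else:
        raise ValueError("entry has no known launch method")

    # Optional teardown, unless the entry asks to skip it.
    if entry.get("stop_script") and not entry.get("skip_stop_command", False):
        steps.append(("run_script", entry["stop_script"]))
    return steps
```

The useful property is that adding a new kind of application means adding one branch and a couple of keys, instead of generating another script per program.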
Effectively I added code that generates entries for the config by scanning the local system and remote systems. The code block for the curious is here. This largely uses the user configuration located here.
Scanning for programs does get very complicated though as Athena supports Windows, Linux, and macOS (my partner uses a Mac as her main machine) for both remote\local systems. So sometimes the system calls (or paths in the case of Windows) between these vary wildly.
Creating the remote daemon
It's amazing the security concerns you run into when writing a daemon whose entire purpose is to execute commands on a machine. For the curious this is a lot like the guest tools you would find in a virtual machine. Just with additional security.
To be specific, data to and from the machine is encrypted with a key that is generated at the first client start and then copied to the remote by the user. Could this be automated? Yes, but this being a manual process really reduces the attack surface as well as the time I have to spend on it.
Overall implementation wasn't too difficult. Windows systems can make this a scheduled task or convert the process to a service using 3rd party utilities. Linux systems can just add a systemd service (or OpenRC, I do not judge...but not sure that works well with Sunshine).
Effectively what we needed to do was: handle start commands for the machine, handle stop commands for the machine, copy over the start script for the application, and scan the system for applications to be sent back to the client.
Start and stop are easy. Just wait for the command on the TCP socket and execute a shell or bat script depending upon the OS. For assets you have to scan specific directories based upon the OS\user configuration (this is configurable for obvious reasons)...you then just send them back over the wire.
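A minimal sketch of the "wait for the command on the TCP socket and execute a script" part is below. It is deliberately simplified and not the real daemon: there is no encryption here (the real one encrypts traffic with the shared key), and the newline-terminated command format, the reply strings, and the names build_argv and handle_connection are all made up for illustration.

```python
import socket
import subprocess
from typing import Callable, Dict, List


def build_argv(script: str, os_name: str) -> List[str]:
    """Pick how to execute a script based on the configured OS."""
    if os_name == "windows":
        return ["cmd.exe", "/c", script]  # .bat files go through cmd.exe
    return ["/bin/sh", script]            # everything else is a shell script


def handle_connection(
    conn: socket.socket,
    scripts: Dict[str, str],
    os_name: str,
    run: Callable[[List[str]], int] = subprocess.call,
) -> None:
    """Read one newline-terminated command, run its script, reply with a status."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        command = reader.readline().strip()
        script = scripts.get(command)
        if script is None:
            # Only commands from the allow-list are ever executed.
            reply = "error unknown-command\n"
        else:
            reply = f"done {run(build_argv(script, os_name))}\n"
        conn.sendall(reply.encode("utf-8"))
```

Mapping command names to a fixed set of scripts, instead of executing whatever text arrives, is one cheap way to keep a daemon like this from becoming a remote shell.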
As an aside...many launchers of games on Windows don't actually allow you to know when the program has closed. This is an issue Sunshine has had for a long time where Moonlight cannot know that the remote application has finished execution because the game launcher is still running. As much of a hack as it is I just execute a full reboot of the VM when disconnecting from the game as that simplifies things, but this is personal configuration and some might be fine just closing it manually. The stop commands in my configuration are mainly used to take system backups.
You might be asking yourself...why not integrate this directly into Moonlight\Sunshine, Lucia? You know C\C++, you are fully capable of that, right? Well yes...but that's not a great decision. This kind of system overlaps heavily with what one would see in a commercial environment and I was skeptical the changes would be accepted upstream. Even if they were accepted upstream it would then need to be adopted by not only Sunshine, but also Apollo and other forks people use. A separate daemon is simpler and less administrative overhead.
Creating a usable API
I wanted anyone to be able to integrate their own tooling with the configuration provided by Athena. As a result I ended up making a library effectively. There are 3 applications which use this library: the REST API (for remotely starting\stopping systems and preparing applications to stream), the CLI (for quick uses, ex: athena start "Noita" will start Noita on a remote machine for me), and the original ncurses interface which also works on Windows (there is a 3rd party library that addresses some quirks on Windows with ncurses).
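As an illustration of how thin a front end can be once the logic lives in a library, here is a sketch of a CLI shaped like the athena start "Noita" example. The stop subcommand and the actions mapping (standing in for library functions) are assumptions for the sketch, not Athena's actual interface.

```python
import argparse
from typing import Callable, Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per action."""
    parser = argparse.ArgumentParser(prog="athena")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("start", "stop"):
        cmd = sub.add_parser(name)
        cmd.add_argument("program", help="program name as it appears in the config")
    return parser


def main(argv: List[str], actions: Dict[str, Callable[[str], None]]) -> int:
    """Parse argv and hand the program name to the matching library function."""
    args = build_parser().parse_args(argv)
    actions[args.command](args.program)
    return 0
```

A REST handler or the ncurses menu could call the same functions in actions, which is the whole point of splitting the library out.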
This is also where I stopped. By this point I was less than 1 month from my procedure and wanted to instead focus on polish prior to going under.
Managing the machine state
This was a strange one. Because there isn't an API to cleanly determine the state of a remote bare metal system on consumer hardware I was left scratching my head. Ultimately all I could think of was that I should record the machine state myself so that I could do delayed shutdowns (in case I decided to run a different program). Seemed simple enough.
For good measure I also added the ability to remove a system from state tracking just in the event the system malfunctioned or for development purposes. Ex: a flaky cable dropping large amounts of UDP (Sunshine\Moonlight) traffic, but TCP (Athena) is still able to get through. This lets you fix the cable rather than just trigger a shutdown.
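Here is a small sketch of the idea: record when a machine went idle, only shut it down after a grace period, and allow removing it from tracking entirely. The class name MachineTracker, its methods, and the single grace period are hypothetical choices for the example, not how Athena actually stores state.

```python
import time
from typing import Callable, Dict, List, Set


class MachineTracker:
    """Remember which machines we started and when they became idle."""

    def __init__(self, grace_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._grace = grace_seconds
        self._clock = clock
        self._running: Set[str] = set()
        self._idle_since: Dict[str, float] = {}  # machine -> time its last session ended

    def session_started(self, machine: str) -> None:
        self._running.add(machine)
        self._idle_since.pop(machine, None)  # a new program cancels a pending shutdown

    def session_ended(self, machine: str) -> None:
        if machine in self._running:
            self._idle_since[machine] = self._clock()

    def untrack(self, machine: str) -> None:
        """Forget a machine so it is never shut down automatically."""
        self._running.discard(machine)
        self._idle_since.pop(machine, None)

    def due_for_shutdown(self) -> List[str]:
        """Return machines idle for at least the grace period and stop tracking them."""
        now = self._clock()
        due = sorted(m for m, since in self._idle_since.items() if now - since >= self._grace)
        for machine in due:
            self.untrack(machine)
        return due
```

Switching programs inside the grace period just calls session_started again, so the machine stays up, and untrack covers the flaky-cable case where you want to debug without a shutdown firing.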
Lessons learned
In hindsight, had I had the time, I would have completely rewritten Athena in something like Rust or at a minimum gone with the approach of multiple languages in the project. The daemon being in Python is suboptimal from a performance perspective, especially when transferring known applications\scanning a remote system. The local process that keeps the machine state, while fairly light, is not a great fit for Python either since it is a system component that runs 24/7. I also wanted to add a UI; much to my surprise the state of UIs in Python is still not great.
That said...I won't be too hard on myself for this one. This was a tool written quickly that ended up evolving to be a fairly complex system and the time I had to rework it was very limited. While it's not perfect at least it exists today. Long term I will likely do at least partial rewrites of Athena into Rust using native Python modules and full Rust binaries where it makes sense.
The state today
Athena is a fully working application that I use on a regular basis to run programs on machines across my server cluster. It's almost entirely automated and relatively bug free in my uses of it. If a remote connection closes I can remove the machine from tracking and debug it. I have a few friends who have reviewed and adopted portions of either the design or the code entirely. It also did its job. It provided a very easy to use remote interface for me when I was recovering.