
Welcome to My Homelab

November 27, 2025

Tags: lab, homelab, kubernetes, gitops, rocm

Hey there! I'm Cody Blevins, and this is my little corner of the internet where I share projects, experiments, and learnings from my homelab. What started as "I wonder if I can run LLMs locally" has evolved into a full-blown datacenter in my partially finished basement/game room.

I've been tinkering with technology for as long as I can remember, and this site is where I document the interesting stuff I'm building:

  • Radeon GPU Inference - Running LLMs locally on AMD hardware using ROCm
  • AI Agent Systems - Building and orchestrating autonomous AI agents
  • Kubernetes Everything - Because why run one container when you can orchestrate hundreds?
  • Fun Side Projects - Weekend hacks that sometimes turn into real tools

The Philosophy

My day job involves a lot of integration architecture and cloud infrastructure. The homelab lets me experiment with cutting-edge tech without the constraints of production environments (or cloud bills that make your CFO cry). There's something deeply satisfying about running your own infrastructure—knowing exactly where your data lives and having full control over the stack.

Plus, let's be honest: repurposing old gaming rigs into GPU compute nodes is just fun.

The Hardware Zoo

Over time I've accumulated what I affectionately call "the zoo": a collection of machines ranging from enterprise iron to retired gaming PCs, all working together in surprising harmony.

The Backbone: Dell R730xd

The heart of the operation is a Dell PowerEdge R730xd running Harvester HCI. This beast handles all my virtualization and provides distributed storage via Longhorn. It's not the quietest machine (good thing it's in the basement), but it's rock solid and gives me the flexibility to spin up VMs on demand.

From this single server, I run:

  • 3 K3s control plane nodes (HA Kubernetes, because I've been burned before)
  • Multiple worker VMs for general compute
  • Storage pools that back the entire cluster
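
Since Harvester builds on KubeVirt, each of those VMs is itself just a Kubernetes object. Here's a minimal sketch of what a control-plane VM definition could look like; the name, cores, memory, and PVC are illustrative, not copied from my actual cluster:

```yaml
# Hypothetical KubeVirt VirtualMachine for a K3s control-plane node.
# Harvester manages VMs through KubeVirt, so sizing a VM is just YAML;
# every value below is illustrative.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: k3s-cp-1
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 4
        resources:
          requests:
            memory: 8Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: k3s-cp-1-root  # backed by the Longhorn storage pool
```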

The Power Duo: AMD Radeon RX 7900 XTX (x2)

The crown jewels of the lab. I run two of these cards, and splitting them into specialized roles has proven more effective than pooling them behind a single endpoint:

  • Node 1 (cblevins-7900xtx): Dedicated to Text Inference via LlamaCPP. This card runs Qwen2.5-7B with speculative decoding and Nemotron-8B for agent orchestration.
  • Node 2 (cblevins-5930k): Dedicated to Image/Video Generation via ComfyUI. This card handles heavy Flux/Wan workloads (like WanVideo 2.1) without impacting chat responsiveness. (There's a sketch of how workloads get pinned to a node right after this list.)
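
How does a pod land on the right card? In plain Kubernetes terms it's a node selector plus a GPU resource request against AMD's device plugin. A hedged sketch, where the image, model paths, and draft model are placeholders rather than my real config:

```yaml
# Hypothetical Deployment excerpt: kubernetes.io/hostname pins the pod
# to the text-inference node, and amd.com/gpu is the resource name
# exposed by AMD's Kubernetes device plugin.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llamacpp-text
  namespace: ai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llamacpp-text
  template:
    metadata:
      labels:
        app: llamacpp-text
    spec:
      nodeSelector:
        kubernetes.io/hostname: cblevins-7900xtx
      containers:
        - name: llama-server
          image: registry.example.com/llamacpp-rocm:latest  # placeholder image
          args:
            - "--model"
            - "/models/qwen2.5-7b-instruct-q5_k_m.gguf"  # placeholder quant
            - "--model-draft"                            # enables speculative decoding
            - "/models/qwen2.5-0.5b-instruct-q8_0.gguf"  # assumed draft model
          resources:
            limits:
              amd.com/gpu: "1"
```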

The Legacy Branch

NVIDIA GTX 980 Ti - The old guard (cblevins-gtx980ti). Currently in a "break glass in case of emergency" state (node status: down), but remains in the cluster for legacy CUDA workloads that simply refuse to run on ROCm.
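
If that node ever spins back up, the trick is keeping everyday workloads off it while letting CUDA jobs opt in. A taint-and-toleration sketch, assuming a hypothetical gpu-vendor taint (applied once with `kubectl taint nodes cblevins-gtx980ti gpu-vendor=nvidia-legacy:NoSchedule`):

```yaml
# Hypothetical CUDA-only Pod: the toleration matches the assumed taint
# above, and the node selector makes the placement explicit.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-legacy-check
spec:
  nodeSelector:
    kubernetes.io/hostname: cblevins-gtx980ti
  tolerations:
    - key: gpu-vendor
      operator: Equal
      value: nvidia-legacy
      effect: NoSchedule
  restartPolicy: Never
  containers:
    - name: smi
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # placeholder tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"
```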

The Edge

Raspberry Pi 5 - Handling lightweight DNS and monitoring tasks. It's the silent observer of the chaos.

The Gaming PC Graveyard

Several machines in my cluster are former gaming rigs. There's poetry in watching an i7-5930K that used to render The Witcher 3's landscapes now serving LLM inference requests.

The Software Stack

All of this hardware is orchestrated through a GitOps workflow using FluxCD. Everything is defined in code, version controlled, and automatically reconciled.

Kubernetes (K3s)

The cluster runs K3s, with the control plane spread across three VMs (k3s-cp-1, k3s-cp-2, k3s-cp-3) and workers on both VMs (k3s-w-4 through k3s-w-7) and bare-metal GPU nodes.

All configuration lives in the platform/gitops repo. Here's a peek at the structure:

platform/gitops/
├── clusters/
│   └── home-cluster/     # Flux definitions
├── k3s/
│   ├── ai/
│   │   ├── litellm/      # Model routing
│   │   ├── llamacpp/     # Text inference
│   │   └── comfyui/      # Image/Video gen
│   ├── apps/             # Web services
│   └── infra/            # Cert-manager, monitoring
└── Dockerfiles/          # Custom images

This "everything as code" approach means I can rebuild the entire cluster from scratch just by bootstrapping Flux.

AI/ML Stack

  • LlamaCPP - The daily driver for text. Speculative decoding on RDNA3 has been a game changer.
  • ComfyUI - Running WanVideo and Flux for high-fidelity media generation.
  • LiteLLM - The universal adapter that makes all my local models look like OpenAI API endpoints to my applications (a config sketch follows this list).
  • vLLM - Currently mothballed. Great for high throughput, but LlamaCPP's GGUF efficiency won out for personal use.
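
For a sense of what that adapter layer looks like, here's a hedged sketch of a LiteLLM proxy config; the service hostnames are placeholder in-cluster DNS names, not my actual ones:

```yaml
# Hypothetical LiteLLM config: model_name is what clients request,
# and litellm_params route the call to a local OpenAI-compatible
# llama.cpp server. URLs below are illustrative cluster DNS names.
model_list:
  - model_name: qwen2.5-7b
    litellm_params:
      model: openai/qwen2.5-7b
      api_base: http://llamacpp-text.ai.svc.cluster.local:8080/v1
      api_key: "none"  # llama-server doesn't check keys by default
  - model_name: nemotron-8b
    litellm_params:
      model: openai/nemotron-8b
      api_base: http://llamacpp-agent.ai.svc.cluster.local:8080/v1
      api_key: "none"
```

Applications point their OpenAI SDK at the LiteLLM service, and swapping a backend model becomes a one-line Git change.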

Storage

Longhorn provides distributed block storage across the cluster. It's not the fastest, but the redundancy and Kubernetes-native integration make it worth the trade-off. Important data gets replicated, experiments get ephemeral storage.
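
That "important vs. ephemeral" split maps neatly onto storage classes. A sketch of what a replicated Longhorn class can look like; the replica count and timeout are illustrative, not my exact settings:

```yaml
# Hypothetical Longhorn StorageClass for the "important data" tier.
# An ephemeral tier would set numberOfReplicas: "1", or skip Longhorn
# entirely with emptyDir volumes for true scratch space.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"      # survive a single node failure
  staleReplicaTimeout: "30"  # minutes before a failed replica is cleaned up
```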

What's Next?

I'll be posting deep dives into specific projects—expect tutorials on getting local LLM inference humming on AMD GPUs, building AI agents that actually do useful things, and the occasional war story about what happens when you let Kubernetes autoscaling get too ambitious.

Topics on the horizon:

  • Speculative Decoding: Why 2 models are faster than 1
  • Building a multi-model inference router
  • The great migration: moving from Docker Compose to Kubernetes
  • Why Harvester might be the best hypervisor you've never heard of

Stay tuned, and feel free to reach out if you want to chat about any of this stuff. There's something special about the homelab community—we're all just tinkerers at heart, trying to build cool things with whatever hardware we can get our hands on.
