← Back to Demos
Model Gallery
A live registry of Large Language Models currently hosted on my homelab cluster. These models are served via LiteLLM and accelerated by AMD Radeon GPUs.
Active Deployments
llamacpp-qwen3-vl8bAMD 7900 XTX
qwen3-vl-8b
Qwen3-VL-8B-Instruct abliterated v2.0 GGUF - Vision/OCR model for data pipeline
Aliases
tei-embeddingsCPU
text-embedding-3-small
Hosted model.
Aliases
llamacpp-nemotronAMD 7900 XTX
nemotron-orchestrator
Nemotron-Orchestrator-8B (Agent Model) on LlamaCPP (Shared Node)
Aliases
llamacpp-qwen2p5-7b-specAMD 7900 XTX
qwen2.5-7b-spec
Qwen2.5-7B-Instruct-abliterated with speculative decoding (0.5B draft) - ~2x faster
Aliases