Thunder LLM - Self-Hosted LLM Setup
Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI.
Architecture
GPU Machine (this machine):
- Ollama: LLM inference engine with GPU acceleration
Web Server (thundersizzle.tech):
- Open WebUI: Web interface for interacting with LLMs
- Caddy: Reverse proxy for HTTPS access
This separation ensures only the GPU-intensive Ollama runs on the GPU machine, while the lightweight web interface runs with your other web services.
Prerequisites
On GPU Machine (this machine):
- Docker and Docker Compose installed
- NVIDIA GPU drivers installed
- NVIDIA Container Toolkit installed
# For Ubuntu/Debian:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
- Verify GPU is accessible in Docker:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
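The prerequisites above can be checked in one pass before starting anything. The following is a minimal sketch and not the repository's own check-system.sh; the function names `missing_cmds` and `preflight` are hypothetical, and the list of commands to probe is an assumption based on the tools this README uses.

```shell
#!/usr/bin/env sh
# Hypothetical preflight helpers; the names are illustrative, not part of this repo.

# Print each command from the argument list that is not found on PATH.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Check the tools named in the prerequisites; return non-zero if any is missing.
preflight() {
  local missing
  missing=$(missing_cmds docker nvidia-smi)
  if [ -n "$missing" ]; then
    # Unquoted on purpose: print one "Missing:" line per command.
    printf 'Missing: %s\n' $missing >&2
    return 1
  fi
  # "docker compose" is a plugin, so it has to be probed through docker itself.
  if ! docker compose version >/dev/null 2>&1; then
    echo 'Missing: docker compose plugin' >&2
    return 1
  fi
  echo 'All prerequisite commands found.'
}
```

After sourcing the file, run `preflight` on the GPU machine. It only confirms the commands exist; the `docker run ... nvidia-smi` check above is still what proves the GPU is visible inside a container.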
On Web Server:
- Docker and Docker Compose
- Caddy configured (already set up in thundersizzle.tech)
- Network connectivity to GPU machine on port 11434
Setup
1. GPU Machine Setup
Find your GPU machine's local IP:
ip addr show | grep "inet " | grep -v 127.0.0.1
# Or: hostname -I
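Both commands above can print several addresses. As a rough sketch, a helper like the one below picks the first non-loopback IPv4 address from a whitespace-separated list; `first_lan_ipv4` is a hypothetical name, and the "first address is the LAN address" rule is an assumption that you should confirm by eye.

```shell
# Hypothetical helper: print the first non-loopback IPv4 address from a
# whitespace-separated list such as the output of `hostname -I`.
first_lan_ipv4() {
  local addr
  for addr in $1; do
    case "$addr" in
      127.*) continue ;;                          # skip loopback
      *:*) continue ;;                            # skip IPv6
      *.*.*.*) printf '%s\n' "$addr"; return 0 ;; # first IPv4-looking entry wins
    esac
  done
  return 1
}
```

Usage: `first_lan_ipv4 "$(hostname -I)"`. On a host running Docker the list usually also contains bridge addresses such as 172.17.0.1, so check that the result is the address your web server can actually reach.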
Start Ollama:
cd /home/james/Thunder/code/thunder-llm
docker compose up -d
Verify GPU is being used:
docker exec ollama nvidia-smi
Download models:
# Download a smaller model (the default llama3.2 tag is 3B)
docker exec -it ollama ollama pull llama3.2
# Or a larger model (70B) if you have enough VRAM
docker exec -it ollama ollama pull llama3.1:70b
# Popular alternatives:
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull deepseek-coder
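If you want several of these models, the pulls can be scripted. This is a minimal sketch assuming the container is named `ollama` as in the commands above; `pull_models` and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: pull every model named on the command line.
# Set DRY_RUN=1 to print the commands instead of running them.
pull_models() {
  local model
  for model in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker exec ollama ollama pull $model"
    else
      # No -it here: a script has no interactive terminal to attach.
      docker exec ollama ollama pull "$model" || return 1
    fi
  done
}
```

Usage: `DRY_RUN=1 pull_models llama3.2 mistral` to preview, then drop `DRY_RUN=1` to download. The function stops at the first failed pull so a typo in a model name is not silently skipped.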
2. Web Server Setup
Configure environment on web server:
Edit /home/james/Thunder/code/thundersizzle.tech/.env and add:
# GPU Machine IP (replace with your actual IP)
GPU_MACHINE_IP=192.168.1.100
Add Caddy configuration:
Add the configuration from Caddyfile.snippet to your Caddy config in thundersizzle.tech/caddy/Caddyfile.
Start Open WebUI:
cd /home/james/Thunder/code/thundersizzle.tech
docker compose up -d open-webui
docker compose restart caddy # Reload Caddy config
Test connectivity:
# From web server, verify you can reach Ollama on GPU machine
curl http://<GPU_MACHINE_IP>:11434/api/tags
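The `/api/tags` endpoint returns JSON with a `models` array whose entries carry a `name` field. A sketch of how the check could be scripted follows; `list_model_names` and `wait_for_ollama` are hypothetical helpers, and the grep/sed parsing is a stand-in that assumes no escaped quotes in model names (jq would be more robust if it is installed).

```shell
# Hypothetical helper: read an /api/tags JSON response on stdin and print one
# model name per line. Plain grep/sed; not a general JSON parser.
list_model_names() {
  grep -o '"name":[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Hypothetical helper: poll the Ollama API until it answers or attempts run out.
# $1 = GPU machine IP or hostname, $2 = number of attempts (default 10).
wait_for_ollama() {
  local host="$1" attempts="${2:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    curl -fsS --max-time 5 "http://$host:11434/api/tags" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

Usage from the web server: `wait_for_ollama "$GPU_MACHINE_IP" && curl -fsS "http://$GPU_MACHINE_IP:11434/api/tags" | list_model_names`. An empty list with a successful request means Ollama is reachable but no models have been pulled yet.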
Usage
Access
- Web UI: https://llm.thundersizzle.tech
- Ollama API (from web server): http://<GPU_MACHINE_IP>:11434
First-time Setup
- Open https://llm.thundersizzle.tech
- Create an admin account (first user becomes admin)
- Select a model from the dropdown
- Start chatting!
Security Notes
- Authentication is enabled by default (WEBUI_AUTH=true)
- First registered user becomes the admin
- Ollama listens on all interfaces (0.0.0.0:11434) and has no built-in authentication, so any host that can route to the GPU machine can use the API; keep the machine on your local network
- Consider using firewall rules to restrict Ollama access to only your web server IP:
# On GPU machine
sudo ufw allow from <WEB_SERVER_IP> to any port 11434
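An allow rule on its own does not block other hosts unless the default policy already denies incoming traffic. The sketch below prints an allow/deny pair for review instead of executing anything; `ollama_firewall_rules` is a hypothetical helper, and the explicit deny rule is an addition that is not in the original setup.

```shell
# Hypothetical helper: print ufw commands that allow one host to reach the
# Ollama port and deny everyone else. Nothing is executed; review, then run.
# $1 = web server IPv4 address, $2 = port (default 11434).
ollama_firewall_rules() {
  local web_ip="$1" port="${2:-11434}"
  case "$web_ip" in
    ''|*[!0-9.]*) echo "expected an IPv4 address, got: '$web_ip'" >&2; return 1 ;;
  esac
  # ufw applies the first matching rule, so the allow must come before the deny.
  echo "sudo ufw allow from $web_ip to any port $port proto tcp"
  echo "sudo ufw deny $port/tcp"
}
```

One caveat worth verifying on your own host: Docker publishes container ports by writing iptables rules directly, and those are evaluated before ufw's, so a published port can stay reachable despite a ufw deny. After applying the rules, test from a third machine with `curl http://<GPU_MACHINE_IP>:11434/api/tags` and confirm the request is refused.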
Monitoring
GPU Machine:
Check Ollama status:
docker compose ps
docker compose logs ollama
Check GPU usage:
watch -n 1 nvidia-smi
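For logging or a quick check from a script, `nvidia-smi` can emit machine-readable CSV through its `--query-gpu` and `--format` options. The following sketch turns that output into a VRAM percentage per GPU; `vram_percent` is a hypothetical helper name.

```shell
# Hypothetical helper: read lines of "used, total" (MiB) on stdin, as produced by
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# and print the share of VRAM in use for each GPU (one input line per GPU).
vram_percent() {
  awk -F', *' 'NF == 2 && $2 > 0 {
    printf "GPU %d: %d%% (%d/%d MiB)\n", NR - 1, $1 * 100 / $2, $1, $2
  }'
}
```

Usage: `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_percent`. The percentage is truncated to a whole number, which is enough to see whether a model fits.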
Check disk usage:
docker system df
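sudo sh -c 'du -sh /var/lib/docker/volumes/thunder-llm_*' # The glob must expand as root; /var/lib/docker is usually not readable by normal users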
Web Server:
Check Open WebUI status:
cd /home/james/Thunder/code/thundersizzle.tech
docker compose logs open-webui
Maintenance
Update images:
GPU Machine:
cd /home/james/Thunder/code/thunder-llm
docker compose pull
docker compose up -d
Web Server:
cd /home/james/Thunder/code/thundersizzle.tech
docker compose pull open-webui
docker compose up -d open-webui
Backup data:
GPU Machine (models):
docker compose down
sudo sh -c "tar -czf llm-backup-$(date +%Y%m%d).tar.gz /var/lib/docker/volumes/thunder-llm_*" # The glob must expand as root
docker compose up -d
Web Server (user data):
cd /home/james/Thunder/code/thundersizzle.tech
docker compose down open-webui
sudo tar -czf open-webui-backup-$(date +%Y%m%d).tar.gz /var/lib/docker/volumes/thundersizzle.tech_open_webui_data
docker compose up -d open-webui
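A backup is only useful if the archive can be read back. As a small sketch, the helpers below build the dated file name used above and check that an archive is non-empty and listable; `backup_name` and `verify_backup` are hypothetical names.

```shell
# Hypothetical helper: print a dated archive name in the form
# <prefix>-backup-YYYYMMDD.tar.gz, matching the tar commands above.
backup_name() {
  printf '%s-backup-%s.tar.gz\n' "$1" "$(date +%Y%m%d)"
}

# Hypothetical helper: succeed only if the file is non-empty and tar can list
# its contents as a gzip-compressed archive.
verify_backup() {
  [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}
```

Usage: `verify_backup "$(backup_name llm)" || echo "backup is missing or corrupt"`. Listing the archive catches truncated or empty files; it does not prove that every model file inside is intact.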
Remove unused models:
docker exec -it ollama ollama list
docker exec -it ollama ollama rm model-name
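To review installed models from a script, the first column of `ollama list` can be extracted. This is a sketch that assumes the output starts with a single header row followed by one model per line; `installed_models` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ollama list` output on stdin and print the model
# names (first column), skipping the header row and any blank lines.
installed_models() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}
```

Usage: `docker exec ollama ollama list | installed_models`. Pass a name from that list to `docker exec -it ollama ollama rm` to delete it.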
Troubleshooting
GPU not detected:
- Verify NVIDIA drivers: nvidia-smi
- Verify Container Toolkit: docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
- Check Docker daemon has GPU support: docker info | grep -i runtime
Can't access web UI:
- Verify Open WebUI is running on web server: docker compose ps open-webui
- Check Caddy configuration includes llm.thundersizzle.tech
- Verify DNS points to your web server
- Test Caddy: docker compose exec caddy caddy validate --config /etc/caddy/Caddyfile
Open WebUI can't connect to Ollama:
- Verify Ollama is running on GPU machine: docker ps | grep ollama
- Test connection from web server: curl http://<GPU_MACHINE_IP>:11434/api/tags
- Check GPU_MACHINE_IP in thundersizzle.tech/.env is correct
- Verify firewall allows web server to access GPU machine port 11434
Out of memory:
- Use smaller models
- Limit concurrent requests
- Check available VRAM: nvidia-smi
Slow performance:
- Ensure GPU is being used (check nvidia-smi during inference)
- Check model size vs available VRAM
- Monitor CPU/RAM usage with htop