Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI.
Thunder 857cf29151 release: v0.4.0 — Phase 4 Automation & Output
Merges all Phase 4 work (M19–M24, M20.1, M21.1):
- Full output formatter system (--output human|json|xml|plaintext|md on all 27 commands)
- Batch pipeline sweep (bench pipeline sweep)
- Configurable discovery authors (bench discover authors)
- Pre-flight doctor (bench doctor)
- Agentic score granularity (weighted 0-100 composite score)
- Results export (bench export tps/agentic, CSV format)
- 317 → 436 tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 13:00:06 -05:00
.config feat(m6-s1): GREEN — DatabaseFactory + M001_InitialSchema; tool manifest; 3/3 tests pass 2026-05-07 13:40:50 -05:00
.github feat(m1): baseline & quick wins — 36→56 tps via 32k context 2026-05-05 16:20:59 -05:00
docs/work docs(phase4): closeout ceremonies — phase review, lessons migrated, ideas updated, STATUS complete 2026-05-11 12:59:58 -05:00
src/Thunder.LlmBench feat(m24-s2): GREEN — ExportRepository, ExportCommand, bench export tps/agentic 2026-05-11 12:45:16 -05:00
.env.example Initial commit: Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI. 2026-03-06 06:09:36 -06:00
.gitignore feat(m8-s3): deploy config — Makefile viewer-generate/viewer-deploy, Caddyfile.snippet docs route, .gitignore docs/viewer/ 2026-05-07 19:26:34 -05:00
bench.db fix: CtxProber/Command - thread model through for llamacpp compose YAML generation; add --model flag; 1 new test 2026-05-08 15:35:46 -05:00
Caddyfile.snippet fix(m8-s3): align deploy config with actual thundersizzle.tech infrastructure 2026-05-07 20:34:44 -05:00
check-system.sh Initial commit: Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI. 2026-03-06 06:09:36 -06:00
CLAUDE.md docs(phase4): closeout ceremonies — phase review, lessons migrated, ideas updated, STATUS complete 2026-05-11 12:59:58 -05:00
docker-compose.yaml feat(m5-s3): IQ4_NL_XL selected as primary agentic model; M5 complete 2026-05-07 07:59:44 -05:00
HARDWARE-CBA.md Initial commit: Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI. 2026-03-06 06:09:36 -06:00
llamacpp.Dockerfile chore: add sm_52 to CUDA archs, add llamacpp docker-compose service 2026-05-06 09:13:32 -05:00
Makefile fix(m8-s3): Makefile — use tabs not spaces (make requires tabs) 2026-05-07 20:40:31 -05:00
README.md Initial commit: Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI. 2026-03-06 06:09:36 -06:00
seed.db test(m9-s1): RED — process/docker/migration M002/server new fields 2026-05-08 05:39:51 -05:00

Thunder LLM - Self-Hosted LLM Setup

Self-hosted LLM using Ollama with NVIDIA GPU support and Open WebUI.

Architecture

GPU Machine (this machine):

  • Ollama: LLM inference engine with GPU acceleration

Web Server (thundersizzle.tech):

  • Open WebUI: Web interface for interacting with LLMs
  • Caddy: Reverse proxy for HTTPS access

This split keeps only the GPU-intensive Ollama service on the GPU machine; the lightweight web interface runs on the web server alongside your other web services.

Prerequisites

On GPU Machine (this machine):

  1. Docker and Docker Compose installed

  2. NVIDIA GPU drivers installed

  3. NVIDIA Container Toolkit installed

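    # For Ubuntu/Debian (the older nvidia-docker repository and apt-key are deprecated):
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker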
    
  4. Verify GPU is accessible in Docker:

    docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
    

On Web Server:

  • Docker and Docker Compose
  • Caddy configured (already set up in thundersizzle.tech)
  • Network connectivity to GPU machine on port 11434

Setup

1. GPU Machine Setup

Find your GPU machine's local IP:

ip addr show | grep "inet " | grep -v 127.0.0.1
# Or: hostname -I
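
The address lookup above can also be scripted. The following is a minimal sketch, assuming the iproute2 `ip` tool is available; the function name `first_lan_ipv4` and the list of skipped interface prefixes (loopback, Docker bridges, veth pairs) are illustrative choices, not part of this repository.

```shell
# first_lan_ipv4: read `ip -o -4 addr show` output on stdin and print the
# first IPv4 address that is not on a loopback or Docker-managed interface.
# The skipped prefixes (lo, docker*, br-*, veth*) are an assumption; adjust
# them to match the interfaces on your machine.
first_lan_ipv4() {
  awk '$2 !~ /^(lo$|docker|br-|veth)/ { split($4, a, "/"); print a[1]; exit }'
}
```

Usage: ip -o -4 addr show | first_lan_ipv4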

Start Ollama:

cd /home/james/Thunder/code/thunder-llm
docker compose up -d

Verify GPU is being used:

docker exec ollama nvidia-smi

Download models:

# Download a smaller model (llama3.2 defaults to the 3B variant)
docker exec -it ollama ollama pull llama3.2

# Or a larger model (70B) if you have enough VRAM (the default quantized build is roughly a 40 GB download)
docker exec -it ollama ollama pull llama3.1:70b

# Popular alternatives:
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull deepseek-coder
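
To judge whether a model is likely to fit in VRAM before pulling it, a back-of-the-envelope estimate helps: the weights alone take roughly (parameters × bits per weight / 8) bytes. The sketch below is only that rough rule; the function name `approx_weights_gb` is hypothetical, the bits-per-weight value depends on the quantization of the tag you pull, and the KV cache and runtime overhead come on top.

```shell
# approx_weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Prints the approximate size of the model weights in GB
# (params * bits / 8). This ignores KV cache and runtime overhead,
# so treat the result as a lower bound, not a guarantee.
approx_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

Usage: approx_weights_gb 70 4 estimates a 70B model at 4 bits per weight; compare the result with the total VRAM reported by nvidia-smi.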

2. Web Server Setup

Configure environment on web server:

Edit /home/james/Thunder/code/thundersizzle.tech/.env and add:

# GPU Machine IP (replace with your actual IP)
GPU_MACHINE_IP=192.168.1.100

Add Caddy configuration:

Add the configuration from Caddyfile.snippet to your Caddy config in thundersizzle.tech/caddy/Caddyfile.

Start Open WebUI:

cd /home/james/Thunder/code/thundersizzle.tech
docker compose up -d open-webui
docker compose restart caddy  # Reload Caddy config

Test connectivity:

# From web server, verify you can reach Ollama on GPU machine
curl http://<GPU_MACHINE_IP>:11434/api/tags
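
The /api/tags response is JSON, which is hard to read at a glance. A small sketch for listing just the model names follows; it assumes each model object carries a top-level "name" string field and uses only grep and sed rather than a JSON parser, so it is a convenience filter, not a robust parser. The function name `model_names` is hypothetical.

```shell
# model_names: read an Ollama /api/tags JSON response on stdin and print
# one model name per line. Assumes model entries contain "name":"<value>".
model_names() {
  grep -o '"name": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

Usage: curl -s http://<GPU_MACHINE_IP>:11434/api/tags | model_names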

Usage

Access

First-time Setup

  1. Open https://llm.thundersizzle.tech
  2. Create an admin account (first user becomes admin)
  3. Select a model from the dropdown
  4. Start chatting!

Security Notes

  • Authentication is enabled by default (WEBUI_AUTH=true)
  • First registered user becomes the admin
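  • The Ollama port is published on all interfaces of the GPU machine (0.0.0.0:11434), and the Ollama API has no authentication of its own, so anything that can reach that port can use it
  • Use firewall rules to restrict Ollama access to only your web server IP. Docker manages its own iptables rules, so a published container port may bypass ufw; verify the restriction from another machine after applying it:
    # On GPU machine
    sudo ufw allow from <WEB_SERVER_IP> to any port 11434 proto tcp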
    

Monitoring

GPU Machine:

Check Ollama status:

docker compose ps
docker compose logs ollama

Check GPU usage:

watch -n 1 nvidia-smi

Check disk usage:

docker system df
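# The glob must expand as root: /var/lib/docker is normally not readable by regular users
sudo sh -c 'du -sh /var/lib/docker/volumes/thunder-llm_*'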

Web Server:

Check Open WebUI status:

cd /home/james/Thunder/code/thundersizzle.tech
docker compose logs open-webui

Maintenance

Update images:

GPU Machine:

cd /home/james/Thunder/code/thunder-llm
docker compose pull
docker compose up -d

Web Server:

cd /home/james/Thunder/code/thundersizzle.tech
docker compose pull open-webui
docker compose up -d open-webui

Backup data:

GPU Machine (models):

docker compose down
# Run the whole command in a root shell so the glob can expand inside /var/lib/docker
sudo sh -c 'tar -czf "llm-backup-$(date +%Y%m%d).tar.gz" /var/lib/docker/volumes/thunder-llm_*'
docker compose up -d

Web Server (user data):

cd /home/james/Thunder/code/thundersizzle.tech
docker compose down open-webui
sudo tar -czf open-webui-backup-$(date +%Y%m%d).tar.gz /var/lib/docker/volumes/thundersizzle.tech_open_webui_data
docker compose up -d open-webui
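
A backup is only useful if it can be read back. The sketch below checks that an archive produced by the tar commands above can be listed end to end and is not empty. The function name `verify_backup` is hypothetical, and the check confirms readability only, not that every expected file is present.

```shell
# verify_backup ARCHIVE: succeed only if the gzip-compressed tar archive
# can be listed without errors and contains at least one entry.
verify_backup() {
  listing=$(tar -tzf "$1" 2>/dev/null) || return 1
  [ -n "$listing" ]
}
```

Usage: verify_backup llm-backup-YYYYMMDD.tar.gz && echo "backup readable"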

Remove unused models:

docker exec -it ollama ollama list
docker exec -it ollama ollama rm <model-name>

Troubleshooting

GPU not detected:

  • Verify NVIDIA drivers: nvidia-smi
  • Verify Container Toolkit: docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
  • Check Docker daemon has GPU support: docker info | grep -i runtime

Can't access web UI:

  • Verify Open WebUI is running on web server: docker compose ps open-webui
  • Check Caddy configuration includes llm.thundersizzle.tech
  • Verify DNS points to your web server
  • Validate the Caddy config: docker compose exec caddy caddy validate --config /etc/caddy/Caddyfile

Open WebUI can't connect to Ollama:

  • Verify Ollama is running on GPU machine: docker ps | grep ollama
  • Test connection from web server: curl http://<GPU_MACHINE_IP>:11434/api/tags
  • Check GPU_MACHINE_IP in thundersizzle.tech/.env is correct
  • Verify firewall allows web server to access GPU machine port 11434
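
When curl fails, it helps to separate "port unreachable" (firewall, wrong IP, Ollama not running) from "port reachable but the HTTP request fails". A minimal sketch of a plain TCP check follows, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are installed on the web server; the function name `port_open` and the 3-second limit are illustrative.

```shell
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened within 3 seconds. Uses bash's /dev/tcp pseudo-device, so it
# needs bash even when called from another POSIX shell.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage: port_open <GPU_MACHINE_IP> 11434 && echo "port reachable" || echo "port blocked or closed"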

Out of memory:

  • Use smaller models
  • Limit concurrent requests
  • Check available VRAM: nvidia-smi
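
For a compact view of VRAM pressure, nvidia-smi can emit machine-readable values that are easy to summarize. The sketch below formats "used, total" pairs into one line per GPU; the function name `vram_usage` is hypothetical, and the GPU number printed is simply the line order of the input, which is assumed to match nvidia-smi's GPU index order.

```shell
# vram_usage: read "used, total" memory pairs in MiB on stdin (one GPU per
# line, as printed by nvidia-smi's CSV query mode) and print a summary
# with the percentage of VRAM in use, truncated to a whole number.
vram_usage() {
  awk -F', *' '{ printf "GPU %d: %d/%d MiB used (%d%%)\n", NR - 1, $1, $2, $1 * 100 / $2 }'
}
```

Usage: nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_usage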

Slow performance:

  • Ensure GPU is being used (check nvidia-smi during inference)
  • Check model size vs available VRAM
  • Monitor CPU/RAM usage with htop