AI & GPU

Odysseus with a GPU: run local AI models on a GPU server

For fast local models, Odysseus needs a GPU. Here is how to enable NVIDIA support in the Docker setup and serve your own models on a GPU server.

by Moritz Möller, July 3, 2026

Why a GPU server?

Local models are the resource-intensive part of the setup. How large a model you can run depends mostly on your GPU's VRAM. As a rough guide at 4-bit quantization:

Model size	VRAM needed	Suitable GPU
7B to 14B	approx. 6 to 10 GB	RTX 4000 Ada (20 GB)
32B	approx. 22 GB	RTX 6000 Ada (48 GB)
70B	approx. 42 GB	RTX 6000 Ada (48 GB)
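The rule of thumb behind the table can be written as a small estimator. This is a rough sketch, not something Odysseus ships. It assumes the weights take params × bits / 8 bytes, adds an assumed 20 % runtime overhead, and reserves an assumed flat 2 GB for the KV cache and CUDA context. Those two constants are illustrative guesses, chosen so the results land within a couple of GB of the table.

```python
# Rough VRAM estimate for serving a local model, plus a card picker.
# ASSUMPTIONS (not from Odysseus): weights = params * bits / 8 bytes,
# ~20 % runtime overhead, and a flat ~2 GB for KV cache and CUDA context.
from typing import Optional

# VRAM per card in GB, taken from the table above.
GPUS_GB = {"RTX 4000 Ada": 20, "RTX 6000 Ada": 48}


def estimate_vram_gb(params_billion: float, bits: int = 4,
                     overhead: float = 1.2, reserve_gb: float = 2.0) -> float:
    """Approximate VRAM in GB needed to serve a model of the given size."""
    weights_gb = params_billion * bits / 8  # 1B params at 8 bit ~ 1 GB
    return round(weights_gb * overhead + reserve_gb, 1)


def pick_gpu(params_billion: float, bits: int = 4) -> Optional[str]:
    """Return the smallest card whose VRAM covers the estimate, or None."""
    need = estimate_vram_gb(params_billion, bits)
    for name, vram in sorted(GPUS_GB.items(), key=lambda kv: kv[1]):
        if need <= vram:
            return name
    return None


if __name__ == "__main__":
    for size in (7, 14, 32, 70):
        print(f"{size}B: ~{estimate_vram_gb(size)} GB -> {pick_gpu(size)}")
```

The 32B case shows why the table recommends the 48 GB card. The weights alone (about 16 GB) would fit in 20 GB, but context and runtime overhead push the total past that limit.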

A ComputeBox GPU server delivers exactly that: dedicated NVIDIA cards with 20 or 48 GB of VRAM, full root access, and German data centers, ready in minutes and without hourly cost traps.

Local models with full GPU power: RTX 4000 Ada (20 GB) from €99/month, dedicated.

Step 5: Load and serve models

Open Odysseus and go to the Cookbook. There you get hardware-aware model recommendations, and you can download models and serve them through Odysseus. Downloaded models land in ./data/huggingface and the serve engines in ./data/local; both survive when the container is recreated.
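You can confirm that downloads really persist by checking how much the two data directories hold before and after recreating the container. This is a minimal sketch that assumes it runs from the directory containing ./data. It skips symlinks because Hugging Face style caches typically link snapshot files to shared blobs, and counting both would double the total.

```python
# Report how much is stored in the data directories that persist across
# container recreation. ASSUMPTION: run from the directory containing ./data.
from pathlib import Path

DATA_DIRS = ("./data/huggingface", "./data/local")


def dir_size_bytes(path: str) -> int:
    """Sum sizes of regular files below path; symlinks are skipped, 0 if missing."""
    root = Path(path)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*")
               if p.is_file() and not p.is_symlink())


if __name__ == "__main__":
    for d in DATA_DIRS:
        print(f"{d}: {dir_size_bytes(d) / 1e9:.1f} GB")
```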

GPU passthrough is not the same as a CUDA build

A successful nvidia-smi inside the container only confirms GPU access. If the Cookbook reports "Unable to find cudart" or the model runs on the CPU, the serve engine was installed without CUDA support. Reinstall it via Cookbook → Dependencies to get a CUDA-enabled build.
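To tell the two situations apart inside the container, you can test them separately. This heuristic sketch is not part of Odysseus. It treats a successful `nvidia-smi` as proof of passthrough, and it asks the dynamic loader (via `ctypes.util.find_library`) whether a CUDA runtime is visible. A serve engine may bundle its own CUDA libraries, so a missing system `cudart` is only a hint.

```python
# Distinguish "GPU visible" from "CUDA runtime loadable" inside the container.
# Heuristic only: serve engines may ship their own CUDA libraries.
import ctypes.util
import shutil
import subprocess


def gpu_visible() -> bool:
    """True if nvidia-smi exists and exits successfully (passthrough works)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def cudart_found() -> bool:
    """True if the dynamic loader can locate a CUDA runtime (libcudart)."""
    return ctypes.util.find_library("cudart") is not None


if __name__ == "__main__":
    print("GPU passthrough:", gpu_visible())
    print("CUDA runtime:   ", cudart_found())
```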

Alternative: connect Ollama

If Ollama is already running on the host (started with OLLAMA_HOST=0.0.0.0:11434 ollama serve), simply add the endpoint http://host.docker.internal:11434/v1 in the Odysseus settings.
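Before adding the endpoint, you can check that it answers and see which models it offers. This sketch assumes Ollama's OpenAI-compatible `/v1/models` route, which returns a list of the form `{"data": [{"id": ...}]}`. It also assumes you run it inside the Odysseus container so that `host.docker.internal` resolves the same way it does for Odysseus. On the host itself, swap in `http://localhost:11434/v1`.

```python
# List the models behind an OpenAI-compatible endpoint such as Ollama's /v1.
# ASSUMPTION: executed inside the Odysseus container (host.docker.internal).
import json
import urllib.request
from typing import List

BASE_URL = "http://host.docker.internal:11434/v1"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} list."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 5.0) -> List[str]:
    """Fetch <base_url>/models and return the model ids it reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    print(list_models())
```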

Problems?

Troubleshooting

nvidia-smi fails inside the container

The NVIDIA Container Toolkit is missing or the GPU overlay is not active. Run steps 2 and 3 again and restart the stack.

Unable to find cudart / runs on CPU

This is not a passthrough problem. Reinstall the serve engine via Cookbook → Dependencies to get a CUDA build.

The wrong GPU is detected

The Cookbook only sees the GPUs that Docker passes through. Check what is passed through by running ./scripts/check-docker-gpu.sh without options.
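To compare what the host sees with what the container sees, you can list the GPUs in both places and diff the output. This is a sketch built on nvidia-smi's CSV query mode; it complements ./scripts/check-docker-gpu.sh and does not replace it. GPU names are assumed not to contain commas, and the index and memory fields are split off the ends of each line.

```python
# List visible GPUs (index, name, memory in MiB) via nvidia-smi's CSV query mode.
# Run on the host and inside the container, then compare the two lists.
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text: str) -> List[Tuple[int, str, int]]:
    """Parse 'index, name, memory.total' lines into tuples (memory in MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        idx, rest = line.split(",", 1)   # index is the first field
        name, mem = rest.rsplit(",", 1)  # memory is the last field
        gpus.append((int(idx.strip()), name.strip(), int(mem.strip())))
    return gpus


def list_gpus() -> List[Tuple[int, str, int]]:
    """Run nvidia-smi and return the parsed GPU list (raises if it fails)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for idx, name, mib in list_gpus():
        print(f"GPU {idx}: {name}, {mib / 1024:.0f} GiB")
```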

FAQ

Frequently asked questions

Which GPU do I need for 70B models?

At 4-bit quantization, a 70B model needs about 42 GB and fits in 48 GB of VRAM, so the RTX 6000 Ada is the right card. For 7B to 14B models, the RTX 4000 Ada with 20 GB is enough.

Does this work with AMD GPUs?

Yes. Odysseus also supports AMD GPUs via ROCm through a separate overlay. The setup is similar but uses docker/gpu.amd.yml and the AMD diagnostic.

Do I have to use local models?

No. You can run Odysseus with API models only. The GPU pays off once you want to serve models locally and privately.