How to Run Ollama in Docker (with GPU + Open WebUI)
By LocalLLMGear Editorial · Editorial Team · Updated 2026-06-29
We test hardware hands-on and may use AI tools in research; every guide is human-reviewed.
We may earn a commission from links in this article, at no extra cost to you.
Running Ollama in Docker keeps your local-LLM setup tidy and reproducible: one image, one volume, no system packages to manage, and a clean teardown when you’re done. The catch most people hit is the GPU — a container can’t see your NVIDIA card by default, so generation crawls on the CPU. This guide fixes that, then bolts on a real chat UI.
The 30-second answer: with the NVIDIA Container Toolkit installed, run
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
to get a GPU-accelerated Ollama server in a container, with models persisted in a named volume. Add Open WebUI via docker compose for a ChatGPT-style interface on http://localhost:3000.
Run Ollama in Docker
The official image is published as ollama/ollama on Docker Hub. The simplest possible
launch — CPU only, just to confirm it works:
docker run -d -p 11434:11434 --name ollama ollama/ollama
-d runs it in the background, -p 11434:11434 exposes Ollama’s API on the usual port, and
--name ollama gives the container a memorable handle. Confirm it’s alive:
docker exec -it ollama ollama --version
docker exec -it ollama ollama run llama3
The second command downloads llama3 (if needed) and drops you into a chat inside the
container. That proves the server works — but two things are still missing: GPU acceleration,
and somewhere durable to keep the models. Let’s add both.
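Because docker run -d returns as soon as the container starts, the API can take a moment before it accepts requests, which matters once these commands go into a script. A minimal sketch of a readiness check, assuming curl is installed on the host and polling Ollama’s GET /api/version endpoint; the helper name wait_for_ollama and its arguments are hypothetical:

```shell
# Hypothetical helper: poll Ollama's GET /api/version endpoint until it
# answers, giving up after roughly the given number of seconds.
# Assumes curl is installed on the host.
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"   # base URL of the Ollama API
  local limit="${2:-30}"                     # attempts, one per second
  local waited=0
  while [ "$waited" -lt "$limit" ]; do
    # -s hides progress output; -f turns HTTP errors into a non-zero exit
    if curl -sf --max-time 2 "$url/api/version" >/dev/null 2>&1; then
      echo "Ollama is answering at $url"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "No answer from $url after ${limit} attempts" >&2
  return 1
}
```

For example, wait_for_ollama http://localhost:11434 60 && docker exec -it ollama ollama run llama3 opens the chat only once the server answers. Polling a cheap, read-only endpoint avoids loading a model just to test connectivity.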
Enable the NVIDIA GPU
Without a GPU, even an 8B model is painfully slow. To pass an NVIDIA card into a container on
Linux you need the NVIDIA Container Toolkit installed on the host (this is separate from
your GPU driver, which must already be working — check with nvidia-smi).
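Before installing the toolkit, a quick preflight check can confirm the host side is in place. A minimal sketch built on the POSIX command -v builtin; the helper require_cmds is hypothetical:

```shell
# Hypothetical helper: check that every named command is on the PATH,
# reporting each missing one on stderr. Returns 0 only if all are found.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, require_cmds docker nvidia-smi && nvidia-smi checks that both tools exist and that the driver responds; after the install below, require_cmds nvidia-ctk confirms the toolkit CLI is on the PATH.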
Install and wire it into Docker:
# Add the NVIDIA Container Toolkit repo, then:
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
(Follow NVIDIA’s current install docs for the exact repo lines for your distro — they change
occasionally.) Once Docker has been restarted, you can hand the GPU to any container with
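--gpus all. The CPU-only container from the first step still holds the name ollama, so remove it, then re-run Ollama with the flag added:
docker rm -f ollama
docker run -d --gpus all -p 11434:11434 --name ollama ollama/ollama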
Verify the container actually sees the GPU:
docker exec -it ollama nvidia-smi
If nvidia-smi lists your card from inside the container, you’re set — Ollama will offload
the model to VRAM automatically. On Windows, the equivalent is Docker Desktop with the
WSL2 backend plus a recent NVIDIA driver; --gpus all then works the same way. On macOS,
Docker can’t pass through the Apple GPU, so containerized Ollama runs on CPU — Mac users are
better off running the native Ollama app instead.
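For a scripted check instead of reading the nvidia-smi table by eye, nvidia-smi can print selected fields as CSV. A sketch, assuming the --query-gpu=name,memory.total and --format=csv,noheader,nounits options (memory is reported in MiB); summarize_gpus is a hypothetical helper:

```shell
# Hypothetical helper: read "name, memory.total" CSV rows (MiB, no header,
# no units) on stdin and print one readable line per GPU.
# Exits non-zero when no rows arrive, e.g. when the container sees no GPU.
summarize_gpus() {
  awk -F', *' '
    NF >= 2 { printf "GPU %d: %s (%d MiB VRAM)\n", n, $1, $2; n++ }
    END     { if (n == 0) exit 1 }
  '
}
```

Pipe the container’s output into it with docker exec ollama nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | summarize_gpus. The -it flags are left off on purpose: a pseudo-terminal (-t) adds carriage returns to piped output. A non-zero exit means no GPU rows came back.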
Persist your models with a volume
By default a container’s filesystem is disposable. Ollama stores everything — model weights,
config — in /root/.ollama, which means a docker rm throws away every gigabyte you
downloaded. Mount a named volume so the data outlives the container (remove the current container first, since the name ollama is still taken):
docker rm -f ollama
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama
Now ollama is a Docker-managed volume holding your models. Remove and re-create the
container, upgrade to a newer image — the models stay put. You can pull more into it from the
host without ever opening a shell:
docker exec -it ollama ollama pull qwen2.5:14b
docker exec -it ollama ollama list
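Re-running ollama pull on a model you already have fetches its latest version (only the changed parts are downloaded), so the two commands above combine into a simple bulk update. A sketch, assuming the container is named ollama and that the first column of ollama list is the model name; both helper names are hypothetical:

```shell
# Hypothetical helper: print only the NAME column of `ollama list` output
# read from stdin, skipping the header row and any blank lines.
list_model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Hypothetical helper: re-pull every installed model so each tag picks up
# its latest published version. Assumes the container is named "ollama".
update_all_models() {
  docker exec ollama ollama list | list_model_names | while read -r model; do
    echo "Updating $model"
    # </dev/null keeps docker from reading the loop's stdin
    docker exec ollama ollama pull "$model" </dev/null
  done
}
```

Running update_all_models after pulling a newer ollama/ollama image refreshes the models too; the volume keeps them between container re-creations either way.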
The docker run command with both --gpus all and the volume is the one worth saving. It’s GPU-accelerated, persistent, and reachable on
localhost:11434 for any app on your machine.
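Any HTTP client can use that port. A minimal sketch of a one-shot request to Ollama’s POST /api/generate endpoint with streaming turned off, assuming curl is installed and the model has already been pulled; json_escape and ollama_generate are hypothetical helpers, and the escaping only covers backslashes and double quotes, so prompts must stay on one line:

```shell
# Hypothetical helper: escape backslashes, then double quotes, so a
# one-line string can be embedded in a JSON string. Newlines not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: send one prompt to /api/generate with streaming
# disabled and print the raw JSON reply. Assumes curl is installed.
ollama_generate() {
  local model="$1"
  local prompt="$2"
  local url="${3:-http://localhost:11434}"
  curl -sf "$url/api/generate" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$(json_escape "$model")\", \"prompt\": \"$(json_escape "$prompt")\", \"stream\": false}"
}
```

For example, ollama_generate llama3 "Why is the sky blue?" prints a JSON object whose response field holds the answer; if jq is installed, appending | jq -r .response prints just the text.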
Add Open WebUI with docker compose
The terminal works, but a proper chat UI makes a local model
feel finished. Rather than juggle two docker run commands, define the whole stack in one
docker-compose.yml so the two containers share a network and start together:
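services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama            # models persist in a named volume
    # GPU reservation: the compose equivalent of --gpus all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    # no "ports:" here; Open WebUI reaches Ollama over the compose network
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"                     # browse to http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # service name as hostname
    volumes:
      - open-webui:/app/backend/data    # accounts and chat history
    depends_on:
      - ollama
volumes:
  ollama:
  open-webui: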
A few things to notice. The GPU is requested through the deploy.resources block — the
compose equivalent of --gpus all. Open WebUI reaches Ollama at http://ollama:11434, using
the service name as the hostname over the shared internal network, so Ollama needs no
published host port at all. And each service has its own named volume, so both your models and
your chat history survive updates.
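Bring it up. If the standalone ollama container from the earlier steps still exists, remove it first with docker rm -f ollama, because Compose can’t create a second container with that name: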
docker compose up -d
Verify it all works
Give the containers a few seconds, then check the stack:
docker compose ps   # both services should be listed with a status of "Up"
docker exec -it ollama ollama pull llama3
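To script the same check, a sketch that compares each service’s state against running, assuming a Compose v2 release whose ps command accepts a Go template via --format with the .Service and .State fields; services_running is a hypothetical helper:

```shell
# Hypothetical helper: read "<service> <state>" lines on stdin and succeed
# only if every service named in the arguments is present and "running".
services_running() {
  local input svc
  input=$(cat)
  for svc in "$@"; do
    # -x: whole-line match, -F: treat the pattern as a fixed string
    if ! printf '%s\n' "$input" | grep -qxF -- "$svc running"; then
      echo "not running: $svc" >&2
      return 1
    fi
  done
  echo "all running: $*"
}
```

Call it as docker compose ps --all --format '{{.Service}} {{.State}}' | services_running ollama open-webui. The --all flag includes stopped containers, which plain docker compose ps hides by default; the helper treats a missing service as not running either way.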
Open http://localhost:3000 in your browser. The first account you create becomes the
admin, and any model you’ve pulled shows up in the picker automatically. Pick llama3, send a
message, and you’ve got a fully private, GPU-accelerated ChatGPT-style app — every byte staying
on your own hardware.
If generation feels slow, the bottleneck is almost never Docker — it’s the model not fitting in your VRAM and spilling to CPU. Matching model size to your card is a hardware question; the Ollama guide has a VRAM sizing table to help you pick.
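To check for that spill directly, ollama ps reports where each loaded model is running. A sketch, assuming the PROCESSOR column format of recent Ollama releases (100% GPU when the model fits entirely in VRAM, otherwise 100% CPU or a split such as 48%/52% CPU/GPU); models_not_on_gpu is a hypothetical helper:

```shell
# Hypothetical helper: read `ollama ps` output on stdin and print the name
# of every loaded model that is not reported as "100% GPU", i.e. models
# running partly or fully on the CPU.
# Returns 1 if any such model is found, 0 if everything is on the GPU.
models_not_on_gpu() {
  awk '
    NR == 1 || NF == 0 { next }            # skip the header row and blanks
    !/100% GPU/        { print $1; spilled = 1 }
    END                { exit (spilled ? 1 : 0) }
  '
}
```

Run docker exec ollama ollama ps | models_not_on_gpu right after sending a prompt, since ollama ps lists only models that are currently loaded. Any name it prints is a candidate for a smaller or more heavily quantized variant.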
Common gotchas
- could not select device driver "nvidia": the NVIDIA Container Toolkit isn’t installed, or Docker wasn’t restarted after nvidia-ctk runtime configure. Re-run those steps.
- Models vanish after an update: you forgot the -v ollama:/root/.ollama volume. Always mount it.
- Open WebUI shows no models: confirm OLLAMA_BASE_URL points at http://ollama:11434 (the service name), and that you’ve pulled at least one model into the Ollama container.
- Port 11434 already in use: a native Ollama install is probably already running on the host. Stop it, publish the container on a different host port (for example -p 11435:11434), or drop the -p 11434:11434 mapping and reach Ollama only from other containers on the same Docker network, as Open WebUI does in the compose setup.
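For the port conflict, a sketch that checks whether anything on the host is already listening on a TCP port, assuming iproute2’s ss with -H (no header), -l, -t and -n; port_listening is a hypothetical helper that reads the listing on stdin:

```shell
# Hypothetical helper: read `ss -Hltn` output (listening TCP sockets,
# numeric, no header) on stdin and succeed if any socket's local address
# (4th column) ends in ":<port>".
port_listening() {
  awk -v port="$1" '
    $4 ~ (":" port "$") { found = 1 }
    END                 { exit (found ? 0 : 1) }
  '
}
```

For example, ss -Hltn | port_listening 11434 && echo "port 11434 is taken". If the native install came from Ollama’s Linux install script, it typically runs as the ollama systemd service, so sudo systemctl stop ollama frees the port.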
Docker gives you a clean, repeatable local-LLM box you can rebuild in one command. If you want to genuinely understand what’s running inside it (prompting, embeddings, RAG and fine-tuning), a structured course shortcuts a lot of trial and error.