vLLM with Open-WebUI


Setting Up vLLM with Open-WebUI Using Docker Compose

Overview

Leverage vLLM and Open-WebUI with Docker Compose for a streamlined, containerized deployment. This approach simplifies setup, ensures reproducibility, and offers easy scalability.


Why Use Docker Compose?

  • Simple Setup: Manage everything with one command.
  • Reproducibility: Consistent environments across deployments.
  • Isolation: Separate services in containers.
  • Scalability: Add or remove services easily.
  • Easy Maintenance: Restart, update, or remove containers effortlessly.


1. Prerequisites

  • Docker: Install Docker
  • Docker Compose: Comes bundled with Docker Desktop or can be installed separately.
  • NVIDIA Container Toolkit: (If using GPU) Install Guide
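
If you plan to use a GPU, it is worth confirming that containers can actually see it before going further. The sample workload from the NVIDIA Container Toolkit documentation is a quick check:

docker run --rm --gpus all ubuntu nvidia-smi

If this prints your GPU table, the toolkit is wired up correctly.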

2. Docker Compose Setup

Step 1: Create Project Directory

mkdir vllm-openwebui
cd vllm-openwebui

Step 2: Create Docker Compose File

Create a file named docker-compose.yml:

version: '3.8'

services:
  vllm:
    image: vllm/vllm-openai:latest
    command: ["--model", "meta-llama/Llama-2-7b-chat-hf"]
    ports:
      - "8000:8000"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
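      # Llama-2 is a gated model on Hugging Face; if you use it, uncomment the line
      # below and supply an access token that has been granted access to the model.
      # - HUGGING_FACE_HUB_TOKEN=<your Hugging Face access token>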
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    restart: always

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    environment:
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=dummy-key   # vLLM does not check this unless started with --api-key
    ports:
      - "3000:8080"
    depends_on:
      - vllm
    restart: always

Step 3: Launch the Services

docker-compose up -d
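
Once the stack is up, a quick smoke test is to list the models served by vLLM's OpenAI-compatible endpoint (the first start can take a while because the model weights have to be downloaded):

curl http://localhost:8000/v1/models

Open-WebUI itself should then be reachable in your browser at http://localhost:3000.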

3. Secure and Scale

Secure with NGINX (Optional)

  • Add a reverse proxy with HTTPS using NGINX and Let’s Encrypt.
  • Enable basic authentication for the WebUI and API endpoints.
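
As a rough starting point, a minimal NGINX server block for the WebUI could look like the sketch below. The domain and certificate paths are placeholders (Certbot/Let’s Encrypt can manage the TLS files), and it assumes NGINX runs on the same host that publishes port 3000:

server {
    listen 443 ssl;
    server_name chat.example.com;          # placeholder domain

    ssl_certificate     /etc/letsencrypt/live/chat.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;  # Open-WebUI's published port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket upgrade headers, needed for the streaming chat UI
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

The same pattern with a second server block (or a separate location) can front the vLLM API on port 8000 if you want it exposed at all.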

Scale with Multiple GPUs

Modify docker-compose.yml:

command: ["python", "-m", "vllm.entrypoints.api_server", "--model", "meta-llama/Llama-2-7b-chat-hf", "--tensor-parallel-size", "2"]

Add Persistent Volumes

volumes:
  vllm-models:
  openwebui-data:

services:
  vllm:
    volumes:
      - vllm-models:/root/.cache/huggingface
  open-webui:
    volumes:
      - openwebui-data:/app/backend/data
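
With these named volumes in place, downloaded model weights and WebUI data survive container recreation. You can confirm the volumes exist with docker volume ls; by default Compose prefixes their names with the project name (the directory name, vllm-openwebui here):

docker volume ls | grep vllm-openwebui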

4. Maintenance Commands

  • Check Logs: docker-compose logs -f
  • Stop Services: docker-compose down
  • Update Containers: docker-compose pull && docker-compose up -d

5. Conclusion

Using Docker Compose for vLLM and Open-WebUI provides simplicity, scalability, and easy maintenance. You can quickly deploy your LLM interface with minimal overhead while retaining full control and privacy.


📩 Need Help Setting Up?

  • Full Docker Compose Configurations & Optimizations
  • Secure, Scalable Deployments (on-premises or cloud)
  • Integrations with Slack, Notion, or other tools
  • Ongoing Support and Performance Tuning

Contact Me – Let’s make your LLM solution a reality!
