vLLM with Open-WebUI


Setting Up vLLM with Open-WebUI Using Docker Compose

Overview

Leverage vLLM and Open-WebUI with Docker Compose for a streamlined, containerized deployment. This approach simplifies setup, ensures reproducibility, and offers easy scalability.


Why Use Docker Compose?

  • Simple Setup: Manage everything with one command.
  • Reproducibility: Consistent environments across deployments.
  • Isolation: Separate services in containers.
  • Scalability: Add or remove services easily.
  • Easy Maintenance: Restart, update, or remove containers effortlessly.


1. Prerequisites

  • Docker: Install Docker
  • Docker Compose: Comes bundled with Docker Desktop or can be installed separately.
  • NVIDIA Container Toolkit: (If using GPU) Install Guide
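
If you plan to use a GPU, it is worth confirming that containers can actually see it before going further. The sample workload from the NVIDIA Container Toolkit documentation is a quick check:

docker run --rm --gpus all ubuntu nvidia-smi

If this prints your GPU table, the toolkit is wired up correctly.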

2. Docker Compose Setup

Step 1: Create Project Directory

mkdir vllm-openwebui
cd vllm-openwebui

Step 2: Create Docker Compose File

Create a file named docker-compose.yml:

version: '3.8'

services:
  vllm:
    image: vllm/vllm-openai:latest
    command: ["--model", "meta-llama/Llama-2-7b-chat-hf"]
    ports:
      - "8000:8000"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
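      # Llama-2 is a gated model on Hugging Face; if you use it, uncomment the line
      # below and supply an access token that has been granted access to the model.
      # - HUGGING_FACE_HUB_TOKEN=<your Hugging Face access token>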
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    restart: always

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    environment:
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=dummy-key   # vLLM does not check this unless started with --api-key
    ports:
      - "3000:8080"
    depends_on:
      - vllm
    restart: always

Step 3: Launch the Services

docker-compose up -d
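
Once the stack is up, a quick smoke test is to list the models served by vLLM's OpenAI-compatible endpoint (the first start can take a while because the model weights have to be downloaded):

curl http://localhost:8000/v1/models

Open-WebUI itself should then be reachable in your browser at http://localhost:3000.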

3. Secure and Scale

Secure with NGINX (Optional)

  • Add a reverse proxy with HTTPS using NGINX and Let’s Encrypt.
  • Enable basic authentication for the WebUI and API endpoints.
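
As a rough starting point, a minimal NGINX server block for the WebUI could look like the sketch below. The domain and certificate paths are placeholders (Certbot/Let’s Encrypt can manage the TLS files), and it assumes NGINX runs on the same host that publishes port 3000:

server {
    listen 443 ssl;
    server_name chat.example.com;          # placeholder domain

    ssl_certificate     /etc/letsencrypt/live/chat.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;  # Open-WebUI's published port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket upgrade headers, needed for the streaming chat UI
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

The same pattern with a second server block (or a separate location) can front the vLLM API on port 8000 if you want it exposed at all.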

Scale with Multiple GPUs

Modify docker-compose.yml:

command: ["python", "-m", "vllm.entrypoints.api_server", "--model", "meta-llama/Llama-2-7b-chat-hf", "--tensor-parallel-size", "2"]

Add Persistent Volumes

volumes:
  vllm-models:
  openwebui-data:

services:
  vllm:
    volumes:
      - vllm-models:/root/.cache/huggingface
  open-webui:
    volumes:
      - openwebui-data:/app/backend/data
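
With these named volumes in place, downloaded model weights and WebUI data survive container recreation. You can confirm the volumes exist with docker volume ls; by default Compose prefixes their names with the project name (the directory name, vllm-openwebui here):

docker volume ls | grep vllm-openwebui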

4. Maintenance Commands

  • Check Logs: docker-compose logs -f
  • Stop Services: docker-compose down
  • Update Containers: docker-compose pull && docker-compose up -d

5. Conclusion

Using Docker Compose for vLLM and Open-WebUI provides simplicity, scalability, and easy maintenance. You can quickly deploy your LLM interface with minimal overhead while retaining full control and privacy.


📩 Need Help Setting Up?

  • Full Docker Compose Configurations & Optimizations
  • Secure, Scalable Deployments (on-premises or cloud)
  • Integrations with Slack, Notion, or other tools
  • Ongoing Support and Performance Tuning

Contact Me – Let’s make your LLM solution a reality!
