Category: Self Hosting

An Oregon-based software and hardware company

A Distributed RAG System with Kafka, ChromaDB, and gRPC

At JoLoMo LLC, we are building a scalable, distributed retrieval-augmented generation (RAG) system leveraging Kafka (a 3-node KRaft cluster), ChromaDB, and gRPC for high-performance document ingestion, storage, and retrieval. Our event-driven architecture lets us process large workloads efficiently across multiple servers while ensuring fast, intelligent querying using embeddings from Ollama.
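The core flow described above — documents pulled off a topic, embedded, indexed, then retrieved by vector similarity — can be sketched with the standard library alone. This is a minimal illustration, not our production code: the in-process queue stands in for a Kafka topic, the dict for ChromaDB, and `embed()` is a toy stand-in for Ollama's embedding model.

```python
import math
from queue import Queue

# Toy stand-in for Ollama's embedding model: a hypothetical 3-d "embedding".
def embed(text: str) -> list[float]:
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(vowels), float(len(text) - vowels), float(len(text))]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

ingest_topic: Queue = Queue()                          # stands in for a Kafka topic
vector_store: dict[str, tuple[str, list[float]]] = {}  # stands in for ChromaDB

# Producer side: publish documents for ingestion.
for doc_id, text in [("d1", "kafka streams events"), ("d2", "grpc handles rpc")]:
    ingest_topic.put((doc_id, text))

# Consumer side: embed each document and index it.
while not ingest_topic.empty():
    doc_id, text = ingest_topic.get()
    vector_store[doc_id] = (text, embed(text))

# Query side: rank stored documents by cosine similarity to the query embedding.
def retrieve(query: str, k: int = 1) -> list[str]:
    qe = embed(query)
    ranked = sorted(vector_store.items(),
                    key=lambda kv: cosine(qe, kv[1][1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

In the real pipeline the ranking quality comes from the embedding model; the surrounding structure — consume, embed, index, query — is the same.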

By choosing gRPC over REST, we ensure low-latency, high-throughput communication, with native streaming support for handling large documents. Our system is already distributed, but we’re working on further scalability improvements, including advanced query optimization, Kubernetes orchestration, and multi-tenant support.
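To illustrate why native streaming matters here, a gRPC service for chunked document upload and streamed query results might be defined along these lines. This is a hypothetical sketch, not our actual `.proto` — all service and message names are made up:

```protobuf
syntax = "proto3";

package rag.v1;

service DocumentService {
  // Client-streaming upload: a large document arrives as a stream of chunks,
  // so it never has to fit in a single request body.
  rpc IngestDocument (stream DocumentChunk) returns (IngestAck);

  // Server-streaming retrieval: matching passages stream back as they rank.
  rpc Query (QueryRequest) returns (stream Passage);
}

message DocumentChunk {
  string document_id = 1;
  bytes  data        = 2;
}

message IngestAck {
  string document_id = 1;
  bool   accepted    = 2;
}

message QueryRequest {
  string text  = 1;
  uint32 top_k = 2;
}

message Passage {
  string document_id = 1;
  string text        = 2;
  float  score       = 3;
}
```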

Read the full article to learn how we’re building the future of AI-powered document retrieval and knowledge management.

vLLM with Open-WebUI

Setting Up vLLM with Open-WebUI Using Docker Compose

Leverage vLLM and Open-WebUI with Docker Compose for a streamlined, containerized deployment. This approach simplifies setup, ensures reproducibility, and offers easy scalability.

Why use Docker Compose? ✅ Simple setup: manage everything with one command. ✅ Reproducibility: consistent environments across deployments. ✅ Isolation: separate services in containers.…
Read more
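As a sketch of what such a Compose file can look like — the model name, port mappings, and environment values here are illustrative assumptions, not a drop-in config:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest   # serves an OpenAI-compatible API on 8000
    command: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]  # example model
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OPENAI_API_BASE_URL=http://vllm:8000/v1  # point the UI at vLLM
      - OPENAI_API_KEY=unused                    # vLLM accepts any key unless one is configured
    ports:
      - "3000:8080"
    depends_on:
      - vllm
```

One `docker compose up -d` then brings up both the inference server and the chat UI together.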

RTX 4090 vs. RTX 6000 Ada for Self-Hosted vLLM Interfaces

Comparing RTX 4090 vs. RTX 6000 Ada for Self-Hosted vLLM Interfaces As the demand for self-hosted large language models (LLMs) grows, choosing the right GPU for a vLLM interface becomes a crucial decision. Two popular choices for self-hosted LLMs are the NVIDIA RTX 4090 and the NVIDIA RTX 6000 Ada. In this article, we will…
Read more
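A quick back-of-the-envelope check shows why VRAM is often the deciding factor between these two cards (24 GB on the RTX 4090 vs. 48 GB on the RTX 6000 Ada). Weights alone need roughly two bytes per parameter at FP16 — before KV cache and activation overhead, which this rough sketch deliberately ignores:

```python
def fp16_weight_gb(params_billions: float) -> float:
    """Approximate VRAM for model weights at FP16: ~2 bytes per parameter."""
    return params_billions * 2.0  # 1e9 params * 2 bytes ~= 2 GB

def fits(params_billions: float, vram_gb: float) -> bool:
    """Very rough check: do the weights alone fit in VRAM?"""
    return fp16_weight_gb(params_billions) <= vram_gb

RTX_4090_GB = 24
RTX_6000_ADA_GB = 48

# A 7B model's weights (~14 GB) fit on either card; a 13B model's
# (~26 GB) only fit on the RTX 6000 Ada; a 33B model's (~66 GB)
# fit on neither without quantization or multi-GPU sharding.
for params in (7, 13, 33):
    print(params, fits(params, RTX_4090_GB), fits(params, RTX_6000_ADA_GB))
```

In practice a serving engine like vLLM also reserves VRAM for the KV cache, so the usable headroom is smaller than this estimate suggests.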