Articles

An Oregon-based software and hardware company

Spiking Neural Networks: Event-Driven Processing in AI Models

Artificial intelligence has rapidly evolved over the past few decades, primarily driven by deep learning architectures such as convolutional and recurrent neural networks. However, these models often rely on massive computational resources and operate with a continuous flow of information. An alternative paradigm, inspired by biological neural processes, is gaining attention: Spiking Neural Networks (SNNs)…
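To make the event-driven idea concrete, here is a minimal leaky integrate-and-fire (LIF) neuron, the classic building block of SNNs. This is an illustrative sketch, not code from the article: the threshold, leak factor, and input values are arbitrary assumptions chosen to show spiking behavior.

```python
# Minimal leaky integrate-and-fire (LIF) neuron: a sketch of the
# event-driven processing behind SNNs. Parameter values (threshold,
# leak, input current) are illustrative assumptions.

def lif_neuron(inputs, threshold=1.0, leak=0.9, v_reset=0.0):
    """Integrate input current each step; emit a spike (1) when the
    membrane potential crosses the threshold, then reset."""
    v = 0.0
    spikes = []
    for current in inputs:
        v = leak * v + current        # leaky integration
        if v >= threshold:
            spikes.append(1)          # spike event
            v = v_reset               # reset after firing
        else:
            spikes.append(0)
    return spikes

# A steady input drives periodic spikes; zero input produces no events.
print(lif_neuron([0.6] * 6))  # → [0, 1, 0, 1, 0, 1]
```

Note how the neuron only "communicates" at spike events rather than on every step; that sparsity is what neuromorphic hardware exploits for efficiency.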
Read more

A Distributed RAG System with Kafka, ChromaDB, and gRPC

At JoLoMo LLC, we are building a scalable, distributed RAG system leveraging Kafka (three-node KRaft cluster), ChromaDB, and gRPC for high-performance document ingestion, storage, and retrieval. Our event-driven architecture lets us process large workloads efficiently across multiple servers while ensuring fast, intelligent querying using embeddings from Ollama.

By choosing gRPC over REST, we ensure low-latency, high-throughput communication, with native streaming support for handling large documents. Our system is already distributed, but we’re working on further scalability improvements, including advanced query optimization, Kubernetes orchestration, and multi-tenant support.
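The streaming point above can be sketched in a few lines: a gRPC client stream typically uploads a large document as a sequence of fixed-size chunk messages. The chunk size and the surrounding framing below are illustrative assumptions, not JoLoMo's actual proto definitions.

```python
# Sketch of splitting a large document into chunks for a gRPC
# client-streaming upload. CHUNK_SIZE and the shape of the data are
# assumptions for illustration, not the system's real schema.

CHUNK_SIZE = 64 * 1024  # 64 KiB per streamed message (assumed)

def chunk_document(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield successive chunks of `data`; a gRPC stub would send each
    one as a separate streamed request message."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

doc = b"x" * (150 * 1024)  # a 150 KiB "document"
chunks = list(chunk_document(doc))
print(len(chunks), [len(c) for c in chunks])  # 3 chunks: 64 + 64 + 22 KiB
```

Because chunks are yielded lazily, the whole document never has to sit in a single request buffer, which is the practical advantage of streaming over a one-shot REST upload.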

Read the full article to learn how we’re building the future of AI-powered document retrieval and knowledge management.

Scalable RAG System

Introduction: At JoLoMo LLC, we specialize in AI-driven solutions that enhance business efficiency and scalability. Our Retrieval-Augmented Generation (RAG) system leverages cutting-edge technologies to deliver real-time, context-aware data retrieval. This article explores our approach to developing and deploying a high-performance RAG system, powered by Ollama LLM, ChromaDB, Elasticsearch, Redis, and LangChain. System Architecture Overview: Our…
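At the heart of any RAG system is the retrieval step: rank stored documents by similarity between their embeddings and the query embedding. Here is a stripped-down sketch of that step; a production system would use ChromaDB or Elasticsearch with real Ollama embeddings, and the three-dimensional toy vectors and document names below are made up for illustration.

```python
# Toy retrieval step of a RAG pipeline: rank documents by cosine
# similarity to a query embedding. Vectors and ids are illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

store = {  # document id -> (tiny, made-up) embedding
    "invoice-guide": [0.9, 0.1, 0.0],
    "hiring-policy": [0.1, 0.8, 0.3],
    "api-reference": [0.2, 0.1, 0.9],
}

def retrieve(query_vec, k=2):
    """Return the top-k document ids most similar to the query."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, store[d]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.05]))  # → ['invoice-guide', 'hiring-policy']
```

The top-k documents would then be stuffed into the LLM prompt as context, which is the "augmented" part of Retrieval-Augmented Generation.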
Read more

vLLM with Open-WebUI

Setting Up vLLM with Open-WebUI Using Docker Compose. Overview: Leverage vLLM and Open-WebUI with Docker Compose for a streamlined, containerized deployment. This approach simplifies setup, ensures reproducibility, and offers easy scalability. Why Use Docker Compose? ✅ Simple Setup: Manage everything with one command. ✅ Reproducibility: Consistent environments across deployments. ✅ Isolation: Separate services in containers.…
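A minimal Compose file for this pairing might look like the following. Treat it as a sketch: the image tags, the model name, and the port mapping are assumptions to adapt, though `vllm/vllm-openai` and `ghcr.io/open-webui/open-webui` are the projects' published images, and Open-WebUI can point at vLLM's OpenAI-compatible endpoint via `OPENAI_API_BASE_URL`.

```yaml
# Sketch of a docker-compose.yaml for vLLM + Open-WebUI.
# Model name and tags are examples; verify against current docs.
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model mistralai/Mistral-7B-Instruct-v0.2   # example model
    ports:
      - "8000:8000"               # vLLM's OpenAI-compatible API
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # requires NVIDIA Container Toolkit
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"               # UI at http://localhost:3000
    environment:
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
    depends_on:
      - vllm
```

With this in place, `docker compose up -d` starts both services, and Open-WebUI reaches vLLM over the Compose network by service name.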
Read more

AI-Powered Family Scribe

Preserving Memories & Strengthening Connections. In an era where technology often distances us from genuine human interactions, we’re harnessing AI to bring families closer together. Our AI-powered Q&A texting service is designed to capture memories, spark meaningful conversations, and create a digital family historian—all through simple text messages. The Problem: Life Moves Fast, Memories Get…
Read more

RTX 4090 vs. RTX 6000 Ada for Self-Hosted vLLM Interfaces

Comparing RTX 4090 vs. RTX 6000 Ada for Self-Hosted vLLM Interfaces. As the demand for self-hosted large language models (LLMs) grows, choosing the right GPU for a vLLM interface becomes a crucial decision. Two popular choices for self-hosted LLMs are the NVIDIA RTX 4090 and the NVIDIA RTX 6000 Ada. In this article, we will…
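A large part of this decision comes down to VRAM arithmetic: weights alone need roughly `parameters × bytes-per-parameter`. The sketch below uses the cards' actual memory sizes (24 GB on the RTX 4090, 48 GB on the RTX 6000 Ada); the model sizes are illustrative, and real serving needs extra headroom for the KV cache and activations on top of the weights.

```python
# Back-of-the-envelope VRAM estimate for hosting an LLM's weights.
# 24 GB (RTX 4090) and 48 GB (RTX 6000 Ada) are the real card specs;
# model sizes are examples, and KV-cache overhead is ignored here.

def weights_gb(n_params_billions, bytes_per_param=2):  # 2 bytes = fp16
    """GB needed just to hold the weights at the given precision."""
    return n_params_billions * 1e9 * bytes_per_param / 1e9

for params in (7, 13, 34):
    need = weights_gb(params)
    print(f"{params}B fp16: {need:.0f} GB"
          f"  fits 4090 (24 GB): {need < 24}"
          f"  fits 6000 Ada (48 GB): {need < 48}")
```

By this rough measure a 7B fp16 model fits either card, a 13B model already needs the 48 GB card (or quantization), and a 34B model fits neither at fp16.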
Read more

Mixture-of-Experts (MoE) Models and Neuromorphic Hardware Integration

Exploring the Future: Mixture-of-Experts (MoE) Models and Neuromorphic Hardware Integration. Artificial Intelligence (AI) continues to evolve, and one of the most promising areas is the convergence of Mixture-of-Experts (MoE) models with neuromorphic hardware. This innovative approach offers exciting possibilities for creating efficient, adaptive, and scalable AI systems. Let’s explore what MoE models and neuromorphic hardware…
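The defining mechanism of an MoE layer is the gate: a small scorer picks which few experts actually run for a given input, so most of the model stays idle. Here is a toy sketch of top-k routing; the gate scores and the three "experts" are made up for illustration, not a real MoE layer.

```python
# Toy sketch of MoE routing: a softmax gate scores the experts and
# only the top-k run, their outputs weighted by renormalized
# probabilities. Scores and expert functions are illustrative.
import math

def softmax(xs):
    m = max(xs)                     # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]

def moe_forward(x, gate_scores, k=2):
    """Route x to the top-k experts and mix their outputs."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i],
                 reverse=True)[:k]
    norm = sum(probs[i] for i in top)   # renormalize over chosen experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Only the two highest-scoring experts contribute; the third never runs.
print(moe_forward(3.0, [2.0, 1.0, 0.5], k=2))
```

This sparsity is exactly why MoE pairs naturally with neuromorphic hardware: both activate only the parts of the system that the current input needs.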
Read more

SpiNNaker2 & Neuromorphic Chips

SpiNNaker2 and the Roadmap for Neuromorphic Chips at Scale. Introduction: SpiNNaker2 is a cutting-edge neuromorphic computing platform designed to emulate the parallel processing capabilities of the human brain. Developed by SpiNNcloud Systems in collaboration with the Technical University of Dresden, this system represents a significant advancement in artificial intelligence (AI) and high-performance computing. SpiNNaker2 Architecture…
Read more

Setting Up Ollama

Setting Up Ollama on Linux and Windows: A Comprehensive Guide. Introduction: Ollama is a powerful local large language model (LLM) runner designed for efficient on-device inference. Whether you’re a developer, researcher, or hobbyist, Ollama offers a seamless experience for running LLMs locally. In this guide, we will walk you through the installation and setup of…
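On Linux, the short version of the setup looks like this. The install one-liner is Ollama's own published installer; the model name is just an example, so swap in whichever model you want to run.

```shell
# Install Ollama on Linux using the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run a model (llama3 is an example; pick any supported model)
ollama pull llama3
ollama run llama3

# If the background service isn't running, start the API server manually
ollama serve
```

Once the server is up, Ollama exposes a local HTTP API (on port 11434 by default) that other tools can call for on-device inference.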
Read more

What are Large Language Models (LLMs)?

Understanding Large Language Models (LLMs). Imagine having a super-smart assistant who can answer your questions, write essays, or create stories in seconds. This is what a Large Language Model (LLM) does—it’s an advanced form of artificial intelligence (AI) trained to understand and generate human language. What Is an LLM? A Large Language Model is a…
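The core idea behind "generating human language" is next-word prediction. Real LLMs do this with neural networks over billions of tokens; the miniature sketch below makes the same point with nothing but word-pair counts over a ten-word toy corpus, all of it invented for illustration.

```python
# Miniature illustration of the LLM idea: predict the next word from
# what came before. This toy bigram model just counts word pairs;
# a real LLM learns the same kind of statistics with a neural network.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" ("cat" follows "the" most often here)
```

Chaining such predictions word after word is, in spirit, how an LLM writes an essay; scale, tokenization, and attention are what separate this toy from the real thing.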
Read more