RAG Knowledge Base

A production-ready Retrieval-Augmented Generation (RAG) backend service for document ingestion and question answering. It ingests PDF, Markdown, and TXT files, generates embeddings, stores them in a vector database, and answers questions with citations.

Architecture

Built on Node.js with TypeScript and Express.js, the service implements a robust document-processing pipeline. LangChain serves as the orchestration framework, managing document loaders for PDF, Markdown, and TXT files, text chunking with configurable chunk size (1,000 characters) and overlap (200 characters), and retrieval chains for question answering.

Ollama provides local LLM capabilities: the nomic-embed-text model generates embeddings for semantic search, while llama3.2 powers answer generation, removing any dependency on external API services. ChromaDB serves as the vector database, storing embeddings and enabling efficient similarity search for context retrieval. SQLite maintains document metadata, processing status, and query history.

An in-memory query cache (LRU with TTL) lets repeated queries bypass retrieval and generation, cutting computational overhead. Server-Sent Events (SSE) deliver real-time status updates during document ingestion and query processing, so users get live progress feedback. The whole system is containerized with Docker and supports horizontal scaling through a stateless API design.
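
The ingestion path can be pictured with a short LangChain sketch. The package paths follow recent LangChain JS releases and may differ by version; the collection name and Chroma URL are illustrative assumptions, not the project's actual configuration.

```typescript
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OllamaEmbeddings } from "@langchain/ollama";
import { Chroma } from "@langchain/community/vectorstores/chroma";

export async function ingestPdf(path: string): Promise<void> {
  // Load the raw document pages.
  const docs = await new PDFLoader(path).load();

  // Chunk with the sizes described above: 1,000 chars with 200 overlap.
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitDocuments(docs);

  // Embed locally with nomic-embed-text and persist to ChromaDB.
  const embeddings = new OllamaEmbeddings({ model: "nomic-embed-text" });
  await Chroma.fromDocuments(chunks, embeddings, {
    collectionName: "knowledge-base",  // hypothetical collection name
    url: "http://localhost:8000",      // default ChromaDB server address
  });
}
```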
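
On the query side, the project uses LangChain retrieval chains; the sketch below is a simplified stand-in that retrieves context with a manual similarity search and prompts llama3.2 directly. The prompt format and top-k value are assumptions.

```typescript
import { ChatOllama, OllamaEmbeddings } from "@langchain/ollama";
import { Chroma } from "@langchain/community/vectorstores/chroma";

export async function answer(question: string): Promise<string> {
  // Reconnect to the collection created at ingestion time.
  const store = await Chroma.fromExistingCollection(
    new OllamaEmbeddings({ model: "nomic-embed-text" }),
    { collectionName: "knowledge-base", url: "http://localhost:8000" },
  );

  // Retrieve the top-k chunks most similar to the question.
  const context = await store.similaritySearch(question, 4);

  // Ask the local generation model, numbering chunks so the answer
  // can cite them.
  const llm = new ChatOllama({ model: "llama3.2" });
  const prompt = [
    "Answer using only the context below and cite sources by number.",
    ...context.map((d, i) => `[${i + 1}] ${d.pageContent}`),
    `Question: ${question}`,
  ].join("\n\n");

  const res = await llm.invoke(prompt);
  return typeof res.content === "string"
    ? res.content
    : JSON.stringify(res.content);
}
```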
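
The query cache can be sketched as a small Map-based LRU with per-entry TTL; the class name, capacity, and TTL below are assumed values for illustration.

```typescript
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private maxSize = 100, private ttlMs = 5 * 60_000) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired: drop and miss
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.entries.size >= this.maxSize && !this.entries.has(key)) {
      // Map preserves insertion order, so the first key is least recent.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```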
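
Finally, the SSE progress updates follow the standard Express pattern; the route and event payload shape below are assumptions, with a timer standing in for the real ingestion worker's events.

```typescript
import express from "express";

const app = express();

app.get("/documents/:id/progress", (req, res) => {
  // Standard SSE headers: keep the connection open and unbuffered.
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.flushHeaders();

  const send = (stage: string, pct: number) =>
    res.write(`data: ${JSON.stringify({ stage, pct })}\n\n`);

  // Simulated progress; the real pipeline would emit these events
  // from the ingestion worker.
  let pct = 0;
  const timer = setInterval(() => {
    pct += 25;
    send(pct < 100 ? "embedding" : "done", pct);
    if (pct >= 100) {
      clearInterval(timer);
      res.end();
    }
  }, 1000);

  // Stop emitting if the client disconnects.
  req.on("close", () => clearInterval(timer));
});
```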

Technologies

Node.js, TypeScript, Express.js, LangChain, ChromaDB, Ollama, SQLite, Server-Sent Events

Project Links