
Multi-Modal RAG
Problem
Enterprise teams handling sensitive documents needed a RAG system that could process multi-format files (PDF, DOCX, PPTX, URLs), extract text, tables, and images, and answer questions with citation-backed accuracy. It also had to support fully local LLM deployment for zero data leakage.
System Design




Key Features
- Multi-modal document processing: PDF, DOCX, PPTX, and URLs with text, table, and image extraction
- Hybrid retrieval with vector + keyword search, multi-query expansion, reranking, and RRF
- LangGraph multi-agent system with supervisor orchestration and SSE streaming
- Three-layer input guardrails: toxicity, prompt-injection, and PII detection
- Citation tracking for grounded, verifiable responses
- Configurable local LLM support (Ollama/LLaMA) for zero data leakage
- RAGAS-validated, with ~80% higher accuracy than a traditional RAG baseline (N = 30)
Details
Built to learn how production RAG pipelines work — multi-modal ingestion, hybrid retrieval, agentic generation, and evaluation — by building one from scratch.
Built a scalable RAG application using FastAPI, with S3 presigned URLs so clients upload files directly to storage and Celery/Redis workers for asynchronous document processing. Moving uploads and parsing off the API server reduced backend load by 80% and let ingestion run in the background with its progress visible to users.
Implemented multi-modal processing for four formats (PDF, DOCX, PPTX, URLs) using Unstructured, extracting three content types (text, tables, images) and storing them in PostgreSQL with pgvector as 1536-dimensional embeddings.
Developed a hybrid retrieval system combining vector + keyword search, multi-query expansion, reranking, and Reciprocal Rank Fusion (RRF), achieving ~30% higher retrieval accuracy with configurable search strategies per project.
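As an illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The function name, the chunk IDs, and the constant `k = 60` (a commonly used default for RRF) are assumptions for this example; the project's actual parameters are not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the vector and keyword retrievers.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for chunk_id, score in fused:
    print(f"{chunk_id}: {score:.5f}")
```

Chunks that both retrievers return (`c1`, `c3`) rise above chunks that only one retriever returns, which is the property that makes RRF useful for combining vector and keyword results without calibrating their raw scores.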
Created a LangGraph-based multi-agent system with supervisor orchestration, three-layer input guardrails (toxicity, prompt-injection, PII detection), citation tracking for grounded responses, and SSE streaming that emits token and citation events for the client to consume in real time.
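To make the streaming format concrete, the sketch below frames token and citation events as Server-Sent Events. The event names (`token`, `citation`, `done`) and the payload fields are hypothetical; the project's real event schema may differ.

```python
import json


def sse_event(event, data):
    """Format one Server-Sent Event: an event name, a JSON data line, a blank line."""
    payload = json.dumps(data, ensure_ascii=False)  # single line, so one data: field
    return f"event: {event}\ndata: {payload}\n\n"


def stream_answer(tokens, citations):
    """Yield token events as they arrive, then citation events, then a final marker."""
    for token in tokens:
        yield sse_event("token", {"text": token})
    for citation in citations:
        yield sse_event("citation", citation)
    yield sse_event("done", {})


# Hypothetical generation output and one citation record.
for frame in stream_answer(["Hel", "lo"], [{"chunk_id": "c1", "page": 3}]):
    print(frame, end="")
```

In a FastAPI app, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`, and a browser client could subscribe to each event name separately.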
Supported configurable local LLM deployment (Ollama/LLaMA) so organizations can process sensitive, proprietary documents on-premises without external API calls, keeping data inside their own infrastructure.
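One way to enforce a local-only mode is to validate the configured endpoint before any request is sent. The sketch below is an assumption about how such a check could look, not the project's actual configuration code: `LLMConfig`, `local_only`, and `validate` are hypothetical names, and the model name is only an example. The one concrete detail is that Ollama listens on port 11434 by default.

```python
import ipaddress
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class LLMConfig:
    provider: str            # e.g. "ollama" for local, "openai" for hosted (hypothetical labels)
    base_url: str
    model: str
    local_only: bool = False  # when True, refuse endpoints outside the local network


def is_local_endpoint(url):
    """Return True only for localhost or loopback/private IP addresses."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name we cannot verify, so treat it as external
    return address.is_loopback or address.is_private


def validate(config):
    """Reject a local-only configuration that points at an external endpoint."""
    if config.local_only and not is_local_endpoint(config.base_url):
        raise ValueError(f"local_only is set but {config.base_url} is not a local endpoint")
    return config


local = validate(LLMConfig("ollama", "http://localhost:11434", "llama3", local_only=True))
print(local.base_url)
```

Treating unresolvable hostnames as external is a deliberately conservative choice: a misconfigured URL fails at startup instead of silently sending document content off-premises.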
Validated quality using the RAGAS evaluation framework, demonstrating ~80% higher accuracy than a traditional RAG baseline (N = 30).