Rajat Sharma — AI Engineer

Multi-Modal RAG

FastAPICeleryRedisPostgreSQLPGvectorAWS S3SupabaseClerkLangGraphOllamaDockerUnstructuredLangSmithRAGAS

Problem

Enterprise teams handling sensitive documents needed a RAG system that could process multi-format files (PDF, DOCX, PPTX, URLs), extract text, tables, and images, and answer questions with citation-backed accuracy. It also had to support fully local LLM deployment for zero data leakage.

System Design

Key Features

Multi-modal document processing: PDF, DOCX, PPTX, and URLs with text, table, and image extraction
Hybrid retrieval with vector + keyword search, multi-query expansion, reranking, and RRF
LangGraph multi-agent system with supervisor orchestration and SSE streaming
Three-layer input guardrails: toxicity, prompt-injection, and PII detection
Citation tracking for grounded, verifiable responses
Configurable local LLM support (Ollama/LLaMA) for zero data leakage
RAGAS-validated with ~80% higher accuracy than traditional RAG baseline

Details

Built to learn how production RAG pipelines work — multi-modal ingestion, hybrid retrieval, agentic generation, and evaluation — by building one from scratch.

Built a scalable RAG application using FastAPI, integrating S3 presigned URLs for direct uploads and Celery/Redis for asynchronous document processing, reducing backend load by 80% and enabling real-time background ingestion with full transparency.

Implemented multi-modal processing for 4 formats (PDF, DOCX, PPTX, URLs) using Unstructured, extracting three content types (text, tables, images) into PostgreSQL with PGvector for 1536-dimensional embeddings.

Developed a hybrid retrieval system combining vector + keyword search, multi-query expansion, reranking, and Reciprocal Rank Fusion (RRF), achieving ~30% higher retrieval accuracy with configurable search strategies per project.

Created a LangGraph-based multi-agent system with supervisor orchestration, three-layer input guardrails (toxicity, prompt-injection, PII detection), citation tracking for grounded responses, and SSE streaming emitting token and citation events consumed in real time.

Ensured enterprise-grade data privacy with configurable local LLM support (Ollama/LLaMA) enabling zero data leakage for sensitive documents, allowing organizations to process proprietary information on-premises without external API calls.

Validated quality using RAGAS evaluation framework, demonstrating ~80% higher accuracy than traditional RAG baseline (N = 30).