Optimized RAG Pipeline
Configurable retrieval pipeline with evaluation tooling and reranking optimization.
Key Result: Reduced p95 latency by 28% while increasing retrieval recall@10 by 11%.
1. Overview
Built a retrieval-augmented generation pipeline to improve answer quality in domain-specific QA settings.
2. Architecture Diagram
Client -> API -> Query Rewriter -> Retriever(FAISS) -> Reranker -> LLM -> Response
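The flow above can be read as a chain of stages that each take and return a shared state. The following is a minimal sketch of that idea only, not the project's code: `PipelineState`, `run_pipeline`, and every stage function are hypothetical names, the retriever uses keyword overlap as a stand-in for FAISS vector search, and the final stage is a placeholder for the LLM call. The Client and API layers are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Hypothetical container for data handed from stage to stage."""
    query: str
    candidates: List[str] = field(default_factory=list)
    answer: str = ""


Stage = Callable[[PipelineState], PipelineState]


def rewrite_query(state: PipelineState) -> PipelineState:
    # Stand-in for the Query Rewriter: lowercase and collapse whitespace.
    state.query = " ".join(state.query.lower().split())
    return state


def make_retriever(corpus: List[str], k: int) -> Stage:
    # Stand-in for the FAISS retriever: keyword overlap, not vector search.
    def retrieve(state: PipelineState) -> PipelineState:
        terms = set(state.query.split())
        scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        state.candidates = [doc for score, doc in scored[:k] if score > 0]
        return state
    return retrieve


def make_reranker(top_n: int) -> Stage:
    # Stand-in for the Reranker: order by share of tokens matching the query.
    def rerank(state: PipelineState) -> PipelineState:
        terms = set(state.query.split())

        def density(doc: str) -> float:
            tokens = doc.lower().split()
            return len(terms & set(tokens)) / len(tokens)

        state.candidates = sorted(state.candidates, key=density, reverse=True)[:top_n]
        return state
    return rerank


def generate(state: PipelineState) -> PipelineState:
    # Placeholder for the LLM call: report how much context was passed on.
    state.answer = f"[{len(state.candidates)} passage(s)] {state.query}"
    return state


def run_pipeline(query: str, stages: List[Stage]) -> PipelineState:
    # Apply each stage in order, mirroring the arrows in the diagram.
    state = PipelineState(query=query)
    for stage in stages:
        state = stage(state)
    return state
```

Keeping each stage behind the same signature is one way to make the pipeline configurable: a stage can be swapped or removed by editing the list passed to `run_pipeline`.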
3. Technical Stack
- PyTorch
- FastAPI
- FAISS
- PostgreSQL
- Redis
4. Experimental Results
- Recall@10: +11%
- p95 latency: -28%
- Cost/query: -14%
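For reference, metrics like these are commonly computed as follows. This is a sketch under assumptions: the function names are hypothetical, recall@k is taken as the fraction of relevant items found in the top k, and p95 uses the nearest-rank percentile definition. The evaluation tooling behind the numbers above may define them differently.

```python
import math
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Multiply before dividing so integer percentiles avoid float rounding.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

With made-up inputs, `recall_at_k(["a", "b", "c"], {"a", "c", "y", "z"})` returns 0.5, and `percentile_nearest_rank` over twenty latency samples with `pct=95` returns the 19th-smallest sample.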
5. Tradeoffs / Lessons
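Increasing retrieval depth improved recall, but it also produced a larger candidate set for the later stages to handle; reranking was necessary to cut that set back down and keep latency and context noise under control.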
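The tradeoff can be illustrated with a toy retrieve-then-rerank function. It is a sketch only: `retrieve_then_rerank`, `toy_retrieve`, and `toy_score` are hypothetical, the corpus is made up, and token overlap stands in for a real reranker model. `depth` controls how many candidates the first stage returns, and `keep` caps how many reach the LLM.

```python
from typing import Callable, List

Retriever = Callable[[str, int], List[str]]
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str, retrieve: Retriever, score: Scorer, depth: int, keep: int
) -> List[str]:
    """Fetch `depth` candidates, reorder them by `score`, and pass on only `keep`."""
    candidates = retrieve(query, depth)
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:keep]


# Made-up corpus: the only relevant passage sits at first-stage rank 8.
corpus = [f"filler passage {i}" for i in range(20)]
corpus[7] = "reranking trims context noise"


def toy_retrieve(query: str, depth: int) -> List[str]:
    # Hypothetical first stage: returns the first `depth` passages in a fixed order.
    return corpus[:depth]


def toy_score(query: str, doc: str) -> float:
    # Hypothetical reranker score: number of tokens shared with the query.
    return float(len(set(query.split()) & set(doc.split())))


shallow = retrieve_then_rerank("context noise", toy_retrieve, toy_score, depth=5, keep=3)
deep = retrieve_then_rerank("context noise", toy_retrieve, toy_score, depth=20, keep=3)
```

In this toy setup `shallow` misses the relevant passage entirely, while `deep` finds it and the reranker moves it to the front. In both cases only three passages are passed on, so the context size stays fixed as depth grows.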
6. Links
- GitHub
- Demo (private)
- Technical report