Project Detail
2026-01-12

Optimized RAG Pipeline

Configurable retrieval pipeline with evaluation tooling and reranking optimization.

Key Result: Reduced p95 latency by 28% while increasing retrieval recall@10 by 11%.

1. Overview

Built a retrieval-augmented generation pipeline to improve answer quality in domain-specific QA settings.

2. Architecture Diagram

Client -> API -> Query Rewriter -> Retriever(FAISS) -> Reranker -> LLM -> Response
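The flow above can be read as a chain of stages that each take the query plus the previous stage's output. The following is a minimal sketch of that chaining only, not the project's actual code: the class name `RagPipeline`, the stage signatures, the `retrieval_depth` and `context_size` parameters, and the toy stand-in stages are all assumptions made for illustration. In the real pipeline the retriever would presumably wrap a FAISS index and the generator would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    doc_id: str
    text: str


# Hypothetical stage signatures; the source names the stages but not their interfaces.
Rewriter = Callable[[str], str]
Retriever = Callable[[str, int], List[Passage]]
Reranker = Callable[[str, List[Passage]], List[Passage]]
Generator = Callable[[str, List[Passage]], str]


@dataclass
class RagPipeline:
    rewrite: Rewriter
    retrieve: Retriever
    rerank: Reranker
    generate: Generator
    retrieval_depth: int = 50  # assumed: candidates fetched from the retriever
    context_size: int = 5      # assumed: passages passed on to the LLM

    def answer(self, query: str) -> str:
        rewritten = self.rewrite(query)
        candidates = self.retrieve(rewritten, self.retrieval_depth)
        ranked = self.rerank(rewritten, candidates)
        # Assumption: the generator sees the rewritten query, not the raw one.
        return self.generate(rewritten, ranked[: self.context_size])


# Toy stand-ins so the sketch runs without FAISS or an LLM.
CORPUS = [
    Passage("d1", "faiss builds vector indexes"),
    Passage("d2", "redis is a cache"),
    Passage("d3", "faiss supports approximate search"),
]


def overlap(query: str, passage: Passage) -> int:
    """Number of query terms that also appear in the passage."""
    return len(set(query.split()) & set(passage.text.split()))


def toy_rewrite(query: str) -> str:
    return query.strip().lower()


def toy_retrieve(query: str, depth: int) -> List[Passage]:
    return [p for p in CORPUS if overlap(query, p) > 0][:depth]


def toy_rerank(query: str, passages: List[Passage]) -> List[Passage]:
    return sorted(passages, key=lambda p: overlap(query, p), reverse=True)


def toy_generate(query: str, context: List[Passage]) -> str:
    return f"{query} -> " + ",".join(p.doc_id for p in context)


pipeline = RagPipeline(toy_rewrite, toy_retrieve, toy_rerank, toy_generate,
                       retrieval_depth=10, context_size=2)
print(pipeline.answer("  FAISS search "))  # prints "faiss search -> d3,d1"
```

Keeping each stage behind a plain callable is one way to make the pipeline configurable: a stage can be swapped or disabled for an evaluation run without touching the others.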

3. Technical Stack

  • PyTorch
  • FastAPI
  • FAISS
  • PostgreSQL
  • Redis

4. Experimental Results

  • Recall@10: +11%
  • p95 latency: -28%
  • Cost/query: -14%
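For readers unfamiliar with these metrics, here is a minimal sketch of how recall@k and a p95 latency are commonly computed. The source does not say how its figures were measured or whether the changes are relative or absolute, so the function names, the nearest-rank percentile method, and every number in the usage lines are illustrative assumptions, not the project's evaluation tooling or data.

```python
import math
from typing import Iterable, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def percentile_nearest_rank(samples: Iterable[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def relative_change(baseline: float, candidate: float) -> float:
    """Signed relative change, e.g. -0.25 means a 25% reduction."""
    return (candidate - baseline) / baseline


# Made-up inputs, for illustration only.
print(recall_at_k(["a", "b", "c"], {"a", "d"}, k=10))   # 0.5
print(percentile_nearest_rank([10, 20, 30, 40], 95))    # 40
print(relative_change(200.0, 150.0))                    # -0.25
```

Averaging `recall_at_k` over all evaluation queries gives a single recall@10 figure, and comparing baseline and candidate runs with `relative_change` yields percentage deltas of the kind listed above.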

5. Tradeoffs / Lessons

Increasing retrieval depth improved recall, but reranking was necessary to control latency and context noise.
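The tradeoff can be illustrated with a small sketch: fetch a deeper candidate list from a cheap first stage, then let a more precise scorer pick the few passages that reach the LLM. The function `retrieve_then_rerank`, its `depth` and `keep` parameters, and the scores below are hypothetical; the source does not describe the reranker or the depths that were used.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_then_rerank(
    first_stage: Sequence[Tuple[str, float]],
    rerank_score: Callable[[str], float],
    depth: int,
    keep: int,
) -> List[str]:
    """Take the top `depth` first-stage hits, re-score them, return the best `keep` IDs.

    `first_stage` is assumed to be sorted by its own (cheap) score, best first.
    """
    candidates = [doc_id for doc_id, _ in first_stage[:depth]]
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return reranked[:keep]


# Made-up scores: the cheap retriever ranks d4 fourth, the precise scorer ranks it first.
FIRST_STAGE = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.6), ("d5", 0.5)]
PRECISE = {"d1": 0.2, "d2": 0.1, "d3": 0.3, "d4": 0.95, "d5": 0.4}

shallow = retrieve_then_rerank(FIRST_STAGE, PRECISE.__getitem__, depth=2, keep=2)
deep = retrieve_then_rerank(FIRST_STAGE, PRECISE.__getitem__, depth=5, keep=2)
print(shallow)  # ['d1', 'd2']
print(deep)     # ['d4', 'd5']
```

In this toy case the deeper fetch surfaces `d4`, which the shallow fetch never sees, while `keep` holds the context handed to the LLM at two passages regardless of depth.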

6. Links