ML Systems Portfolio

Aryan Ashta

Math + CS @ UIUC | Research-driven engineering

Project Detail | 2026-01-12

Optimized RAG Pipeline

Configurable retrieval pipeline with evaluation tooling and reranking optimization.

Key Result: Reduced p95 latency by 28% while increasing retrieval recall@10 by 11%.

1. Overview

Built a retrieval-augmented generation pipeline to improve answer quality in domain-specific QA settings.

2. Architecture Diagram

Client -> API -> Query Rewriter -> Retriever(FAISS) -> Reranker -> LLM -> Response
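The stage order in the diagram can be sketched as a chain of pluggable callables. This is a minimal illustration, not the project's actual code: the class name `RagPipeline`, the stage signatures, the default depths, and the toy stages are all assumptions, and the word-overlap retriever only stands in for the FAISS retriever.

```python
from dataclasses import dataclass
from typing import Callable, List

# Every name here is hypothetical. The real stages (FAISS retriever,
# reranker model, LLM client) would sit behind these signatures.


@dataclass
class RagPipeline:
    rewrite: Callable[[str], str]
    retrieve: Callable[[str, int], List[str]]
    rerank: Callable[[str, List[str], int], List[str]]
    generate: Callable[[str, List[str]], str]
    retrieve_k: int = 50  # assumed first-stage retrieval depth
    context_k: int = 5    # assumed number of passages passed to the LLM

    def answer(self, query: str) -> str:
        # Query Rewriter -> Retriever -> Reranker -> LLM, as in the diagram.
        rewritten = self.rewrite(query)
        candidates = self.retrieve(rewritten, self.retrieve_k)
        context = self.rerank(rewritten, candidates, self.context_k)
        return self.generate(rewritten, context)


# Toy stand-ins so the sketch runs without any external service.
DOCS = [
    "faiss builds vector indexes",
    "redis caches hot queries",
    "postgres stores document metadata",
]


def toy_rewrite(query: str) -> str:
    return query.strip().lower()


def toy_retrieve(query: str, k: int) -> List[str]:
    # Rank documents by word overlap with the query (placeholder for FAISS).
    terms = set(query.split())
    ranked = sorted(DOCS, key=lambda d: len(terms & set(d.split())), reverse=True)
    return ranked[:k]


def toy_rerank(query: str, candidates: List[str], k: int) -> List[str]:
    # Placeholder: keeps the first-stage order and only truncates.
    return candidates[:k]


def toy_generate(query: str, context: List[str]) -> str:
    # Placeholder for the LLM call: echoes the top passage.
    return f"{query} -> {context[0]}" if context else query


pipeline = RagPipeline(toy_rewrite, toy_retrieve, toy_rerank, toy_generate)
```

Keeping each stage behind a plain callable is one way to make the pipeline configurable: a stage can be swapped or disabled for an evaluation run without touching the others.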

3. Technical Stack

PyTorch
FastAPI
FAISS
PostgreSQL
Redis

4. Experimental Results

Recall@10: +11%
p95 latency: -28%
Cost/query: -14%
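For reference, the metrics above could be computed along these lines. This is a sketch under assumptions: the function names are hypothetical, p95 uses the nearest-rank definition, and the deltas are treated as relative changes against a baseline, which the page does not state.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def p95(latencies_ms: Sequence[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("need at least one latency sample")
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def relative_change(before: float, after: float) -> float:
    """Signed relative change, e.g. -0.28 for a 28% reduction."""
    return (after - before) / before
```

In an evaluation run, `recall_at_k` would be averaged over the query set, while `p95` would be taken over per-request latencies for the baseline and the optimized pipeline separately before comparing them with `relative_change`.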

5. Tradeoffs / Lessons

Increasing retrieval depth improved recall, but a reranking step was needed to trim the candidates passed to the LLM, which kept latency and context noise under control.
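The depth-versus-context tradeoff can be illustrated with a small retrieve-then-rerank sketch. Everything here is an assumption for illustration: the vectors and reranker scores are toy values, the dot-product retriever stands in for FAISS, and `depth` and `keep` are hypothetical parameter names.

```python
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec: Sequence[float],
             doc_vecs: Dict[str, Sequence[float]],
             depth: int) -> List[str]:
    # Cheap first stage: rank every document by inner product with the query.
    ranked = sorted(doc_vecs, key=lambda d: dot(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:depth]


def rerank(candidates: List[str],
           score: Callable[[str], float],
           keep: int) -> List[str]:
    # Second stage: re-score only the candidates, then truncate so the
    # LLM context stays small regardless of retrieval depth.
    return sorted(candidates, key=score, reverse=True)[:keep]


# Toy data: d4 is the passage the (hypothetical) reranker finds most
# relevant, but the first stage ranks it last.
DOC_VECS = {"d1": [0.9, 0.1], "d2": [0.8, 0.2], "d3": [0.7, 0.3], "d4": [0.6, 0.4]}
RERANK_SCORES = {"d1": 0.2, "d2": 0.1, "d3": 0.3, "d4": 0.9}
QUERY = [1.0, 0.0]

shallow = rerank(retrieve(QUERY, DOC_VECS, depth=2), RERANK_SCORES.get, keep=2)
deep = rerank(retrieve(QUERY, DOC_VECS, depth=4), RERANK_SCORES.get, keep=2)
```

With `depth=2` the passage `d4` never reaches the reranker, so no amount of re-scoring can recover it. With `depth=4` it is retrieved and promoted, while `keep=2` holds the context size constant.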

6. Links

GitHub
Demo (private)
Technical report