Transformer Implementation From Scratch
End-to-end transformer reimplementation with component-level ablation tests.
Key Result: Matched baseline perplexity to within 2% while exposing bottlenecks in attention kernels.
1. Overview
Reimplemented the transformer architecture to deeply understand each component and failure mode.
2. Architecture Diagram
Token Embeddings -> [Multi-Head Attention -> MLP -> LayerNorm] x N decoder blocks (with residual connections) -> Decoder Head
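As a concrete illustration of the diagram, a single decoder block might look like the sketch below. It mirrors the ordering shown above (attention, then MLP, with LayerNorm after each sublayer); the dimensions (d_model, n_heads, d_ff) are illustrative placeholders, not the configuration used in this project.

```python
# Minimal sketch of one decoder block from the diagram above (illustrative
# dimensions, not the project's actual configuration).
import torch
import torch.nn as nn


class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each position only sees earlier tokens.
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        x = self.ln1(x + attn_out)      # residual connection around attention
        x = self.ln2(x + self.mlp(x))   # residual connection around the MLP
        return x
```

A component-level check in the spirit of the ablation tests mentioned above could then verify that the causal mask really isolates earlier positions, for example:

```python
# Perturbing the last token must not change the block's output at earlier positions.
block = DecoderBlock().eval()
x = torch.randn(1, 16, 512)
x_perturbed = x.clone()
x_perturbed[0, -1] += 1.0
with torch.no_grad():
    assert torch.allclose(block(x)[0, :-1], block(x_perturbed)[0, :-1], atol=1e-6)
```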
3. Technical Stack
- PyTorch
- NumPy
- Weights & Biases (experiment tracking)
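Since Weights & Biases appears in the stack for experiment tracking, a minimal logging loop might look like the sketch below. The project name, config fields, metric keys, and the synthetic loss values are placeholders, not the ones used in this repo.

```python
# Minimal sketch of metric logging with Weights & Biases; project name, config,
# and the synthetic values below are placeholders, not real results.
import wandb

run = wandb.init(
    project="transformer-from-scratch",   # hypothetical project name
    config={"d_model": 512, "n_layers": 6, "lr": 3e-4},
)

for step in range(100):
    loss = 4.0 * 0.99 ** step             # placeholder standing in for the real training loss
    wandb.log({"train/loss": loss, "throughput/tokens_per_sec": 1700.0}, step=step)

run.finish()
```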
4. Experimental Results
- Perplexity: within 2% of baseline
- Training throughput: 1700 tokens/s on an NVIDIA A100
- Memory savings: 12% with fused ops
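For context on how numbers like these are typically obtained, the sketch below measures validation perplexity (exponentiated mean next-token cross-entropy), training throughput in tokens/s, and peak GPU memory. It is a generic benchmark harness under assumed interfaces (a model returning logits of shape (batch, seq, vocab)), not the project's actual measurement code.

```python
# Generic sketch of how perplexity, tokens/s throughput, and peak memory can be
# measured; the model, optimizer, and batch are assumed interfaces, not project code.
import time
import torch
import torch.nn.functional as F


def next_token_loss(model, input_ids):
    # Cross-entropy of each position predicting the next token.
    logits = model(input_ids)               # (batch, seq, vocab) assumed
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )


@torch.no_grad()
def perplexity(model, input_ids):
    # Perplexity is the exponentiated mean next-token cross-entropy.
    return torch.exp(next_token_loss(model, input_ids)).item()


def training_throughput(model, optimizer, input_ids, n_steps=20, device="cuda"):
    model.to(device)
    input_ids = input_ids.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(n_steps):
        loss = next_token_loss(model, input_ids)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start

    tokens_per_sec = input_ids.numel() * n_steps / elapsed
    peak_mem_gb = torch.cuda.max_memory_allocated(device) / 1e9
    return tokens_per_sec, peak_mem_gb
```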
5. Tradeoffs / Lessons
Kernel-level optimizations improved throughput but increased implementation complexity and debugging time.
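One concrete instance of this tradeoff (illustrative, not the project's actual kernels): a naive attention implementation materializes the full (seq x seq) score matrix, which is easy to inspect and debug, while a fused kernel such as PyTorch's scaled_dot_product_attention avoids that intermediate, which is where memory savings of the kind reported above usually come from, at the cost of far less visibility when something goes wrong.

```python
# Illustrative comparison of a readable, naive attention against PyTorch's fused
# scaled_dot_product_attention kernel; not the kernels used in this project.
import torch
import torch.nn.functional as F


def naive_attention(q, k, v, causal=True):
    # Materializes the (seq, seq) score matrix explicitly: easy to inspect, memory-hungry.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    if causal:
        seq = q.size(-2)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


def fused_attention(q, k, v, causal=True):
    # Dispatches to a fused kernel (e.g. FlashAttention) when one is available.
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)


if __name__ == "__main__":
    q, k, v = (torch.randn(1, 8, 256, 64) for _ in range(3))
    print(torch.allclose(naive_attention(q, k, v), fused_attention(q, k, v), atol=1e-5))
```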