This is my personal list of resources related to machine learning systems. Feel free to drop me an email if you think there’s something worth mentioning. I will try to update this page frequently to include the most recent work in MLSys.

Courses

Labs & Faculty

Tutorials

LLM Optimization

Communication

Seminars

Papers

This section could become extremely long.

Training

Really broad topic…

LLM

You can also refer to Awesome-LLM.

NAS

  • Puzzle: Distillation-Based NAS for Inference-Optimized LLMs: Applies block-wise local distillation to every alternative sub-block replacement in parallel, scoring each candidate for quality and inference cost to build a “library” of blocks. It then uses mixed-integer programming to assemble a heterogeneous architecture that optimizes quality under constraints such as throughput, latency, and memory usage.
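
To make the assembly step concrete, here is a minimal sketch of the selection problem: pick one block variant per layer so that total quality is maximized while latency and memory stay within budget. Everything in it is an assumption for illustration and not taken from the paper: the `BlockOption` fields, the variant names, the scores, and the additive quality model are all hypothetical. It also swaps the mixed-integer-programming solver for plain exhaustive enumeration, which only works for toy-sized libraries.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BlockOption:
    """One candidate block for a layer (all fields are hypothetical)."""
    name: str
    quality: float     # higher is better, e.g. a distillation-based score
    latency_ms: float  # estimated inference latency of this block
    memory_mb: float   # estimated memory footprint of this block


def select_architecture(library, max_latency_ms, max_memory_mb):
    """Choose one option per layer, maximizing summed quality under budgets.

    Brute-force stand-in for a MIP solver: enumerates every combination,
    so the cost grows exponentially with the number of layers.
    Returns (None, -inf) when no combination fits the budgets.
    """
    best_choice, best_quality = None, float("-inf")
    for choice in product(*library):
        latency = sum(block.latency_ms for block in choice)
        memory = sum(block.memory_mb for block in choice)
        if latency > max_latency_ms or memory > max_memory_mb:
            continue  # violates a constraint
        quality = sum(block.quality for block in choice)
        if quality > best_quality:
            best_choice, best_quality = choice, quality
    return best_choice, best_quality


# Hypothetical block library: one list of alternatives per layer.
LIBRARY = [
    [
        BlockOption("full_attn", 1.00, 4.0, 40.0),
        BlockOption("gqa_attn", 0.95, 3.0, 20.0),
        BlockOption("no_op", 0.60, 0.0, 0.0),
    ],
    [
        BlockOption("full_ffn", 1.00, 6.0, 60.0),
        BlockOption("half_ffn", 0.90, 3.0, 30.0),
    ],
    [
        BlockOption("full_attn", 1.00, 4.0, 40.0),
        BlockOption("linear", 0.80, 1.0, 10.0),
    ],
]

if __name__ == "__main__":
    choice, quality = select_architecture(
        LIBRARY, max_latency_ms=10.0, max_memory_mb=100.0
    )
    print([block.name for block in choice], round(quality, 2))
```

With these made-up numbers, the all-original architecture exceeds the 10 ms latency budget, so the search settles on a heterogeneous mix of cheaper and full-size blocks. A real setup would hand the same objective and constraints to a MIP solver so that it scales to dozens of layers and many variants per layer.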

Diffusion

KV Cache

Datasets

ML Compilers

Graph Optimization

Inference

Multitenancy

Dynamic Neural Network

Auto Placement

Reasoning LLM

Federated Learning

Switch & ML

Memory Management

System Design

Trade-off

Structured LLM Generation

Async Training

Self-play

Costs

RAG