vLLM Deep Dive
From nano-vllm to Production Inference Engine
A hands-on course that walks through the internals of vLLM by first studying nano-vllm — a minimal ~2000-line reimplementation — then mapping each concept to the production vLLM codebase. You will understand how requests flow through the engine, how continuous batching and PagedAttention work, and how models are loaded and executed on GPUs.
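To preview the kind of mechanism the course unpacks, here is a minimal sketch of the idea behind PagedAttention: each sequence's KV cache lives in fixed-size blocks tracked by a per-sequence block table, so GPU memory is allocated on demand rather than reserved up front for the maximum sequence length. The names (`BlockManager`, `append_token`, `BLOCK_SIZE`) are illustrative assumptions for this sketch, not the actual nano-vllm or vLLM API.

```python
# Illustrative sketch only -- not nano-vllm's or vLLM's actual code.
# Core PagedAttention bookkeeping: a pool of physical KV-cache blocks
# plus a per-sequence "block table" mapping logical positions to them.

BLOCK_SIZE = 16  # tokens stored per KV-cache block (a common default)

class BlockManager:
    def __init__(self, num_blocks: int):
        # All physical block ids start out free.
        self.free_blocks = list(range(num_blocks))
        # seq_id -> list of physical block ids backing that sequence.
        self.block_tables: dict[int, list[int]] = {}

    def append_token(self, seq_id: int, seq_len: int) -> None:
        """Grow a sequence's block table only when it crosses a block boundary."""
        table = self.block_tables.setdefault(seq_id, [])
        blocks_needed = (seq_len + BLOCK_SIZE - 1) // BLOCK_SIZE
        while len(table) < blocks_needed:
            table.append(self.free_blocks.pop())  # allocate one physical block

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

Because blocks are allocated one at a time as a sequence grows and returned to the pool the moment it finishes, many sequences of very different lengths can share the same fixed cache budget, which is what makes continuous batching practical. The course covers how the real implementations handle the parts this sketch omits, such as prefix sharing and copy-on-write.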