vLLM Large Scale Serving: DeepSeek 2.2k tok/s/H200 with Wide-EP

Inside vLLM: Anatomy of a High-Throughput LLM Inference System

vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention