the value of a performance oracle

Optimising the Fibonacci Benchmark

Modular: TileTensor Part 1 - Safer, More Efficient GPU Kernels

Advice: working with two compilers

TIL: Even with the new if let guards, match arms still need a fallback. Can someone help me understand the compiler's logic here?

Tracing a Full MoE Training Step Through the XLA Compiler

From SIMT to Systolic Part 2: A Kernel Author's Field Report

I got it

vLLM IR: A Functional Intermediate Representation for vLLM

Wastrel milestone: full hoot support, with generational gc as a treat