When XLA Isn't Enough: From Pallas to VLIW with Splash Attention on TPU

From JAX to VLIW: Tracing a Computation Through the TPU Compiler Stack