BERT is just a single text diffusion step

Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming