Can gzip be a language model?

BERT is just a single text diffusion step

Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming