Forcing Flash Attention onto a TPU and Learning the Hard Way