Pre-training a 15B-parameter language model over the internet