Writing an LLM from scratch, part 28 – training a base model from scratch on an RTX 3090

Writing an LLM from scratch, part 22 – training our LLM

Writing an LLM from scratch, part 20 – starting training, and cross entropy loss

The maths you need to start understanding LLMs

Writing an LLM from scratch, part 13 – attention heads are dumb

Writing an LLM from scratch, part 8 – trainable self-attention

Writing an LLM from scratch, part 10 – dropout

It’s still worth blogging in the age of AI

The benefits of learning in public