Transformer neural net learns to run Conway's Game of Life just from examples

Analyzing Data 170,000x Faster with Python

How the BPE tokenization algorithm used by large language models works

Notes on training BERT from scratch on an 8GB consumer GPU

A toy example of Bayesian hyperparameter optimization on parallel cloud VMs

The Fourier transform is a neural network

From scratch: reverse-mode automatic differentiation (in Python)

Piano Tuning

Piano Tuning