Writing an optimizing tensor compiler from scratch