LLM Architecture Gallery

The Big LLM Architecture Comparison

A Technical Tour of the DeepSeek Models from V3 to v3.2

GPT-OSS vs. Qwen3 and a detailed look at how things evolved since GPT-2

LLM architecture comparison

The State of Reinforcement Learning for LLM Reasoning

Understanding Reasoning LLMs

Building LLMs from the Ground Up: A 3-Hour Coding Workshop

Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch

Coding Self-Attention, Multi-Head Attention, Cross-Attention, Causal-Attention

Ten Noteworthy AI Research Papers of 2023

Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation)

AI and Open Source in 2023

Training and aligning LLMs with RLHF and RLHF alternatives

Optimizing LLMs from a Dataset Perspective

Understanding Llama 2 and the New Code Llama LLMs

Why the original transformer figure is wrong, and some other tidbits about LLMs

Finetuning Large Language Models

Understanding large language models: A cross-section of the relevant literature

Understanding and coding the self-attention mechanism of large language models

Understanding Large Language Models – A Transformative Reading List

A Short Chronology of Deep Learning for Tabular Data

Running PyTorch on the M1 GPU

Machine Learning with PyTorch and Scikit-Learn – The *New* Python ML Book

Intro to Deep Learning Course – 170 videos from the basics to transformers

Scientific Computing in Python: Introduction to NumPy and Matplotlib -- Including Video Tutorials

Key differences between Python 2.7.x and Python 3.x (2014)

A step by step guide for getting Python 3.6 & TensorFlow up and running on AWS EC2 instances

Implementing a Principal Component Analysis In Python (2014)

Python, Machine Learning, and Language Wars
