OBS Studio Gets a New Renderer

I rebuilt FlashAttention in Triton to understand the performance archaeology

Faster practical modular inversion

Tachyon: High frequency statistical sampling profiler

OpenSSL Performance Still Under Scrutiny

Python 3.15’s interpreter for Windows x86-64 should hopefully be 15% faster

From profiling to kernel patch: the journey to an eBPF performance fix

Leaving Intel

Why are we worried about memory access semantics? Full barriers should be enough for anybody

Package managers keep using Git as a database, it never works out

System Observability: Metrics, Sampling, and Tracing

Microarchitecture: What Happens Beneath

Partial inlining

GNU Mes and the module system

How to speed up the Rust compiler in December 2025

Super-Flat ASTs

Zmij: Faster floating point double-to-string conversion

Helldivers 2 on-disk size 85% reduction

What's the point of lightweight code with modern computers?

Compressing callstacks: a bitpacked DAG powered by a keyless hashmap

OSS Friday Update - Fibers are the Future of Ruby

LionsOS Design, Implementation and Performance

mmap in Go considered harmful

When SIMD Fails: Floating Point Associativity

Performance Excuses Debunked - Also, many examples of successful rewrites

UringMachine Benchmarks

It's Not Always ICache (2021)

Better Code: Concurrency (2017)

Building small Docker images faster

Event Library - A lightweight, zero boilerplate, high performance event bus for JVM
