CPU Dispatching: Make your code both portable and fast (2020)

A story of a large loop with a long instruction dependency chain

Avoiding register spills in vectorized code with many constants

Unexpected ways memory subsystem interacts with branch prediction

Faster hash maps, binary trees etc. through data layout modification

Make your programs run faster by better using the data cache (2020)

Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code

Horrible Code, Clean Performance

Decreasing the number of memory accesses

Decreasing the Number of Memory Accesses: The Compiler's Secret Life 2/2

Frugal Programming: Saving Memory Subsystem Bandwidth

Loop Optimizations: interpreting the compiler optimization report

How branches influence the performance of your code and what can you do about it?