The Longest Nvidia PTX Instruction

Parsing JSON in C & C++: Singleton Tax

CPU Ports and Latency Hiding on x86

Over-engineering 5x Faster Set Intersections in SVE2, AVX-512, & NEON

35% Discount on Keyword Arguments in Python

The Painful Pitfalls of C++ STL Strings

NumPy vs BLAS: Losing 90% of Throughput

Mastering C++ with Google Benchmark (2022)

Accelerating JavaScript arrays by 10x for Vector Search

Binding a C++ Library to 10 Programming Languages 🔟

Publishing 28 Billion Molecule Embeddings with AWS ⚗️

How to avoid PyBind11 and write 5x faster CPython bindings 🐍

Python, C, Assembly – Faster Cosine Similarity

JavaScript for AI: Array or TypedArray for Vector Search 🏹

Show HN: Faking SIMD to Search and Sort Strings 5x Faster

From Dating to Vector Search – “Stable Marriages” on a Global Scale

Abusing vector search for texts, maps, and chess