Matrix Core Programming on AMD GPUs

Advanced Matrix Multiplication Optimization on Multi-Core Processors (2024)

Beating cuBLAS in Single-Precision General Matrix Multiplication

Beating OpenBLAS and MKL in 150 lines of C Code: A Tutorial on High-Performance Matrix Multiplication

Beating NumPy matrix multiplication in 150 lines of C