Optimizing Matrix Multiplication on RDNA3

Deep Dive into Matrix Optimization on AMD GPUs