Attempting to naively optimize matrix multiplication...
attempts below:
n: 4 k: 4 m: 2 tx: 2 ty: 2 swap: 0 use matrix: 0 use mad: 0