This MATLAB rendition creates temporary index vectors and temporary vectors of extracted elements. That works, but at the low level it is not as efficient as compiled code along the lines of
Result = 0
for K = 1 : n    % n = number of elements; loop bounds are assumed here
    Result = Result + v1(Off1) * v2(Off2)
    Off1 = Off1 + inc1
    Off2 = Off2 + inc2
end
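For illustration, here is a minimal C sketch of the strided loop above. The name vidot and the argument order (vector, increment, vector, increment, count) are assumptions inferred from how the function is called later in this post, not a confirmed signature.

```c
#include <stddef.h>

/* Strided dot product: sums v1[k*inc1] * v2[k*inc2] for k = 0 .. n-1.
   No index vectors and no extracted copies are created; only two
   running offsets are kept. Name and argument order are assumptions. */
double vidot(const double *v1, ptrdiff_t inc1,
             const double *v2, ptrdiff_t inc2, size_t n)
{
    double result = 0.0;
    ptrdiff_t off1 = 0;   /* offset of the current element in v1 */
    ptrdiff_t off2 = 0;   /* offset of the current element in v2 */

    for (size_t k = 0; k < n; k++) {
        result += v1[off1] * v2[off2];
        off1 += inc1;
        off2 += inc2;
    }
    return result;
}
```

With inc1 = 2 the routine walks every second element of v1 in place, which is the work the temporary index vector did in the MATLAB version.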
Now let us consider
That involves a whole series of
Which is vidot(&A(1,k), 1, &B(k,1), size(B,1), size(A,1)),
where & is intended to indicate "address of".
You could implement a matrix multiply with temporary index vectors and temporary extracted vectors, but that involves a lot of temporary operations and storage management that are clearly unnecessary once you know the distance between vector elements.
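As a rough sketch of that idea, the C code below builds a matrix multiply from strided dot products, assuming column-major storage as MATLAB uses. The function name matmul_colmajor, the argument order of vidot, and the row-of-A times column-of-B indexing are illustrative assumptions, not the exact call shown above.

```c
#include <stddef.h>

/* Strided dot product (hypothetical helper; argument order assumed). */
double vidot(const double *v1, ptrdiff_t inc1,
             const double *v2, ptrdiff_t inc2, size_t n)
{
    double result = 0.0;
    ptrdiff_t off1 = 0, off2 = 0;
    for (size_t k = 0; k < n; k++) {
        result += v1[off1] * v2[off2];
        off1 += inc1;
        off2 += inc2;
    }
    return result;
}

/* C = A * B with all matrices stored column-major:
   A is m-by-p, B is p-by-n, C is m-by-n, and element (i,j) of an
   m-row matrix lives at index i + j*m (0-based). */
void matmul_colmajor(const double *A, const double *B, double *C,
                     size_t m, size_t p, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            /* Row i of A starts at &A[i] with stride m;
               column j of B starts at &B[j*p] with stride 1. */
            C[i + j * m] = vidot(&A[i], (ptrdiff_t)m, &B[j * p], 1, p);
        }
    }
}
```

Each entry of C needs only two start addresses, two strides and a count, so nothing is copied or allocated inside the loops.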