Complex matrix multiplication with pagemtimes

I'm optimizing a function and I end up having to do voxelwise quadratic matrix multiplication a'Ba on complex matrices. While testing the speed of different approaches, I came across a weird coincidence: when one of a or B is real and the other complex, my matrix multiplication is an order of magnitude slower! Does anyone have an explanation for this? The workaround is just to add 1i*eps, but you'd think this shouldn't be necessary in software as mature as MATLAB...
%clc
a = rand(50,1)+1i*eps;
b = rand(50,50,10000); %+1i*eps ;
tic
for n=1:100
c= pagemtimes( a',pagemtimes(b,a));
end
toc
Elapsed time is 16.289774 seconds.
tic
for n=1:100
c = sum(pagemtimes(b,a).*conj(a),1);
end
toc
Elapsed time is 16.440757 seconds.
b=b+1i*eps;
tic
for n=1:100
c= pagemtimes( a',pagemtimes(b,a));
end
toc
Elapsed time is 2.386708 seconds.
tic
for n=1:100
c = sum(pagemtimes(b,a).*conj(a),1);
end
toc
Elapsed time is 2.532941 seconds.
  3 Comments
tiwwexx
tiwwexx on 7 Jul 2023
a is always a vector and is always complex. b is not page-symmetric.
For context this is calculating E field heating from a phased RF array on a vectorized 3D volume using Q matrix formalism.
James Tursa
James Tursa on 8 Jul 2023
See my edits below for faster ways of doing this.


Accepted Answer

James Tursa
James Tursa on 7 Jul 2023
Edited: James Tursa on 7 Jul 2023
This really has nothing to do with the maturity of MATLAB ... every language that uses BLAS routines in the background will have the same issue. The explanation lies in how real and complex variables are stored, and how they are multiplied in the background by the BLAS libraries. Real variables are stored in a contiguous block of memory with all the elements next to each other. Since R2018a, complex variables are stored in a contiguous block of memory with the real and imaginary parts interleaved. Matrix multiplication of arrays in MATLAB is accomplished by calling the appropriate BLAS library routine in the background, but there are no mixed complex-real routines in this library: you either pass in two real variables or two complex variables. So when you have a complex-times-real situation, the real variable first has to be deep copied to new memory in an interleaved fashion with 0's filling the imaginary spots, and then the BLAS library is called with these inputs. These deep copies (to twice the memory size) and all those 0 multiplies are the culprit. If you have mixed complexity types and want to use BLAS routines in the background, this penalty can't be avoided (and would be the same in any other language that uses BLAS routines).
BTW, if possible you should be careful doing a' or conj(a) ahead of time as these can cause deep copies also. It might be best to use the transpX and transpY options of pagemtimes( ) to essentially get the same result without making explicit deep copies of the inputs.
Side Note: In MATLAB R2017b and earlier, complex variables were stored with the real and imaginary parts in separate blocks of memory, so mixed matrix multiplies could be accomplished by calling the real*real routine on the individual parts in the background, avoiding the deep copies and a lot of 0 multiplies. Bottom line: mixed matrix multiplies in R2017b and earlier are faster than in R2018a and later.
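The same promotion penalty shows up in any BLAS-backed environment. As a hedged illustration (a NumPy sketch, variable names mine, not part of the original answer), mixing a real and a complex operand forces a promotion of the real array to complex, a deep copy with zero-filled imaginary parts, before the complex BLAS routine can run:

```python
import numpy as np

B = np.random.rand(50, 50)                         # float64 (real)
a = np.random.rand(50) + 1j * np.random.rand(50)   # complex128

# Mixing real and complex promotes B to complex128 behind the scenes
# (a deep copy with zero imaginary parts) before complex BLAS is called.
out = B @ a
print(out.dtype)   # complex128

# Converting B once up front avoids repeating that copy inside a loop,
# analogous to the b = b + 1i*eps workaround in the question.
Bc = B.astype(np.complex128)
assert np.allclose(out, Bc @ a)
```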
*** EDIT ***
If "a" is a complex vector and B is real with symmetric pages, you can mimic what happens in R2017b by splitting up "a" into its real and imaginary parts and doing the computations separately. This avoids all unnecessary deep data copies and 0 multiplies and should be significantly faster. E.g., in this particular case you would have
ar = real(a); ai = imag(a);
Then do the pagewise calculations ar'*B*ar + ai'*B*ai to get the result. Because of the B symmetry, all of the imaginary result can be shown to be 0, so this method avoids explicitly computing it entirely. And you don't even need pagemtimes for this. You can turn your B array into one big 2D array and do normal matrix multiplies. E.g.,
N = numel(a);
BN = reshape(B,N,[]);
result = ar' * reshape( ar' * BN, N, [] ) + ai' * reshape( ai' * BN, N, [] );
Test using reshape and normal matrix multiply:
a = rand(50,1) + rand(50,1)*1i; % a complex vector
B = rand(50,50,100000); % make it big and real
B = B + pagetranspose(B); % make it symmetric
tic
N = numel(a);
ar = real(a); ai = imag(a);
BN = reshape(B,N,[]);
result_r = ar' * reshape( ar' * BN, N, [] ) + ai' * reshape( ai' * BN, N, [] );
toc
Elapsed time is 0.193764 seconds.
Test using pagemtimes:
tic
result_p = pagemtimes( a,'ctranspose',pagemtimes(B,a),'none');
toc
Elapsed time is 1.626860 seconds.
Compare:
result_r(1:10)
ans = 1×10
1.0e+03 * 1.1392 1.1815 1.1726 1.1622 1.1707 1.1688 1.1556 1.1768 1.1739 1.1910
reshape(result_p(1:10),1,[])
ans =
1.0e+03 * 1.1392 + 0.0000i 1.1815 - 0.0000i 1.1726 + 0.0000i 1.1622 + 0.0000i 1.1707 + 0.0000i 1.1688 - 0.0000i 1.1556 + 0.0000i 1.1768 + 0.0000i 1.1739 + 0.0000i 1.1910 - 0.0000i
So, same result but a lot faster, and without those residual imaginary values that are only non-zero because of numerical effects. E.g.,
norm(imag(result_p(:)))
ans = 5.0187e-11
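The identity behind the speedup is language-independent and easy to check. A minimal NumPy sketch (names mine), assuming a single symmetric real page B: the quadratic form a'*B*a is purely real and equals ar'*B*ar + ai'*B*ai, so all arithmetic can stay real.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(50) + 1j * rng.random(50)
B = rng.random((50, 50))
B = B + B.T                        # symmetric real page

ar, ai = a.real, a.imag
full = np.conj(a) @ B @ a          # a' * B * a (complex arithmetic)
split = ar @ B @ ar + ai @ B @ ai  # real-only arithmetic

# For symmetric B the quadratic form is purely real, so the
# split reproduces it exactly (up to round-off).
assert np.allclose(full.imag, 0.0, atol=1e-9)
assert np.allclose(full.real, split)
```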
If the B pages are not symmetric, then the imaginary part of the result will not be 0. However, you can still probably save time by splitting things up and avoiding the intermediate 0 multiplies. E.g., just do the pagewise equivalent of:
ar'*B*ar + ai'*B*ai + (ar'*B*ai - ai'*B*ar)*1i
which would be:
tic
N = numel(a);
ar = real(a); ai = imag(a);
BN = reshape(B,N,[]);
result = complex(ar' * reshape( ar' * BN, N, [] ) + ai' * reshape( ai' * BN, N, [] ),...
ai' * reshape( ar' * BN, N, [] ) - ar' * reshape( ai' * BN, N, [] )); % minus sign on the cross terms
toc
Elapsed time is 0.385149 seconds.
Still quite a bit faster than the pagemtimes( ) method with all those unnecessary 0 multiplies. When the B pages are symmetric, that last imaginary part is identically 0 and drops out.
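The non-symmetric split can also be checked directly. A hedged NumPy sketch (names mine) of the same expansion, with the minus sign on the cross term as in the formula ar'*B*ar + ai'*B*ai + (ar'*B*ai - ai'*B*ar)*1i above:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random(50) + 1j * rng.random(50)
B = rng.random((50, 50))           # real, NOT symmetric

ar, ai = a.real, a.imag
full = np.conj(a) @ B @ a          # a' * B * a

re = ar @ B @ ar + ai @ B @ ai     # real part
im = ar @ B @ ai - ai @ B @ ar     # imaginary part (note the minus sign)

assert np.allclose(full, re + 1j * im)
```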
  3 Comments
Bruno Luong
Bruno Luong on 8 Jul 2023
Edited: Bruno Luong on 8 Jul 2023
A few methods and timings I played with on my laptop, which seems about twice as fast as the TMW online machine.
I don't know whether BLAS is behind tensorprod; if someone knows, please let me know. But it is competitive with pagemtimes.
a = complex(rand(50,1),rand(50,1));
b = rand(50,50,100000);
tic
c = pagemtimes( a',pagemtimes(b,a));
toc % Elapsed time is 0.809244 seconds.
Elapsed time is 1.704063 seconds.
tic
c = pagemtimes( a',pagemtimes(complex(b),a));
toc % Elapsed time is 0.846693 seconds.
Elapsed time is 1.629519 seconds.
tic
tmp = tensorprod(a',b,2,1);
c = tensorprod(tmp,a,2,1);
c = reshape(c,1,1,size(b,3));
toc % Elapsed time is 0.821394 seconds.
Elapsed time is 1.623105 seconds.
tic
A = conj(a)*a.';
c = tensorprod(A,b,[1 2],[1 2]);
c = reshape(c,1,1,size(b,3));
toc % Elapsed time is 0.796169 seconds.
Elapsed time is 1.523866 seconds.
[m,n,p] = size(b);
tic
ar = reshape(real(a),1,m);
ai = reshape(imag(a),1,m);
b = reshape(b,m,n*p);
d = [ar; -ai]*b;
d = complex(d(1,:),d(2,:));
d = reshape(d,n,p);
c = reshape(a.'*d,1,1,p);
b = reshape(b,m,n,p);
toc % Elapsed time is 0.089465 seconds.
Elapsed time is 0.207498 seconds.
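The trick in Bruno's last method is to stack real(a') and -imag(a') into one real 2-by-m matrix, so a single all-real GEMM produces both the real and imaginary parts of a'*b per page. A hedged NumPy sketch of the same idea (names mine; note NumPy reshapes are row-major, unlike MATLAB's column-major reshape, so the flattening here is arranged accordingly):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 50, 50, 100
a = rng.random(m) + 1j * rng.random(m)
b = rng.random((m, n, p))

# Reference: pagewise a' * b(:,:,k) * a
ref = np.einsum('i,ijk,j->k', np.conj(a), b, a)

# Stack real(a') and -imag(a') so one real GEMM replaces the
# mixed real/complex multiply entirely.
ar, ai = a.real, a.imag
bf = b.reshape(m, n * p)              # flatten pages into columns
d = np.vstack([ar, -ai]) @ bf         # 2 x (n*p), all-real GEMM
d = (d[0] + 1j * d[1]).reshape(n, p)  # a' * b(:,:,k), one column per page
c = a @ d                             # a.' * (a' * b) -> a' * b * a per page

assert np.allclose(c, ref)
```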
Bruno Luong
Bruno Luong on 8 Jul 2023
Edited: Bruno Luong on 8 Jul 2023
One more method, so far the fastest:
a = complex(rand(50,1),rand(50,1));
b = rand(50,50,100000);
tic
[m,n,p] = size(b);
A = conj(a)*a.';
b = reshape(b,m*n,p);
A = reshape(A,1,m*n);
Ar = real(A);
Ai = imag(A);
c = [Ar; Ai]*b;
c = reshape(complex(c(1,:),c(2,:)),1,1,p);
b = reshape(b,m,n,p);
toc % Elapsed time is 0.059821 seconds.
Elapsed time is 0.106855 seconds.
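This method exploits the fact that a'*B*a is the Frobenius inner product of the rank-1 matrix A = conj(a)*a.' with each page of B, so the whole job collapses to one real 2-by-(m*n) times (m*n)-by-p GEMM. A hedged NumPy sketch of the same identity (names mine):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 50, 50, 100
a = rng.random(m) + 1j * rng.random(m)
b = rng.random((m, n, p))

# Reference: pagewise a' * b(:,:,k) * a
ref = np.einsum('i,ijk,j->k', np.conj(a), b, a)

# a'*B*a == sum over (i,j) of conj(a_i)*a_j*B(i,j): a Frobenius
# inner product with the flattened rank-1 matrix conj(a)*a.'
A = np.outer(np.conj(a), a).reshape(m * n)
bf = b.reshape(m * n, p)
c2 = np.vstack([A.real, A.imag]) @ bf   # one all-real 2 x (m*n) GEMM
c = c2[0] + 1j * c2[1]

assert np.allclose(c, ref)
```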


Release: R2022b