A single precision matrix multiplication problem
Script:
a=[-0.1708981,-0.1415759,-0.0727073,-0.0554292;
0.0000000,-0.1715775,-0.1147511,-0.1098115;
0.0000000,0.0000000,-0.4013846,-0.3982548;
0.0000000,0.0000000,0.0000000,-0.0430463];
a=single(a);
b=[16557086;2829575;3241423;7206131];
b=single(b);
y=a*b;
z=zeros(4,1);
z=single(z);
for ii=1:1:4
for jj=1:1:4
z(ii)=a(ii,jj)*b(jj)+z(ii);
end
end
c=y-z;
plot(c) %=> It's not all 0
The algorithm says y and z should be the same, but c is not all zeros. Why is this?
Even multiplying individual rows and columns makes a difference.
Running here on Answers the result, c, is all zeros. I'm not sure if I should be surprised by that result or not, as it suggests that, for these inputs, the loop is doing computations in the exact same order as mtimes.
a=[-0.1708981,-0.1415759,-0.0727073,-0.0554292;
0.0000000,-0.1715775,-0.1147511,-0.1098115;
0.0000000,0.0000000,-0.4013846,-0.3982548;
0.0000000,0.0000000,0.0000000,-0.0430463];
a=single(a);
b=[16557086;2829575;3241423;7206131];
b=single(b);
y=a*b;
z=zeros(4,1);
z=single(z);
for ii=1:1:4
for jj=1:1:4
z(ii)=a(ii,jj)*b(jj)+z(ii);
end
end
c=y-z;
max(abs(c))
plot(c) %=> It's not all 0
jian
on 18 May 2023
Dyuman Joshi
on 18 May 2023
Edited: Dyuman Joshi
on 18 May 2023
I ran it on my laptop with R2021a and it produced the same results as the Live Editor is producing here.
Edit - I don't think there have been any updates to the functions used in this code from 2019 onwards.
Paul
on 18 May 2023
The screen cap indicates you used R2021a Update 5
Dyuman Joshi
on 18 May 2023
Edited: Dyuman Joshi
on 18 May 2023
My bad, I used the command "version" to show the version of my MATLAB, but typed out 2019. It seems my mind still confuses 2021 with 2019 :/
I will edit my comment above. Thanks for pointing it out.
Paul
on 18 May 2023
Can you comment on the results in this thread, where Answers (R2023a) and R2021a Update 5 have mtimes yield the same result as the loop, but that is not the case with R2019a? I didn't see any changes to mtimes in the release notes since R2019a. Is it possible mtimes works differently (order of computations) on different hardware?
Steven Lord
on 19 May 2023
If I recall correctly we upgraded some of the libraries used to perform matrix multiplication during one or more of the releases between release R2019a and release R2021a to fix bugs, improve performance, and/or improve accuracy.
Paul
on 19 May 2023
I checked the release notes from R2019a onwards and didn't see any mention of changes to mtimes, which is surprising if such a heavy-use function actually had a behavior change. I didn't check all of the bug fix pages, but I'd be surprised to find it there, at least relative to this issue.
James Tursa
on 19 May 2023
Edited: James Tursa
on 19 May 2023
"... Is it possible mtimes works differently (order of computations) on different hardware? ..."
Yes. The function mtimes( ) for full double and single matrices is performed in the background by a multi-threaded BLAS library. Anytime that library changes you should expect there might be small differences in the outputs. Different MATLAB versions might use different libraries, and different machines/hardware might use different libraries or call different routines in the libraries (e.g., symmetric vs generic), or maybe the same library is used but with a different number of threads available. All of this can affect the order of computations. MATLAB used to get its BLAS & LAPACK libraries from a third party. I haven't checked lately but I presume it still does. Bottom line is you shouldn't expect floating point calculations to necessarily match exactly when comparing different MATLAB environments.
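A quick language-agnostic illustration of the order effect (a Python sketch, not MATLAB's actual BLAS; single precision is simulated by round-tripping every partial sum through a 32-bit float with the struct module):

```python
# Illustration: summation order changes the result under limited precision.
# IEEE single precision is simulated by rounding each intermediate sum
# through a 32-bit float via struct pack/unpack.
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

terms = [1e8] + [1.0] * 16   # 1e8 and sixteen 1.0s, all exactly representable in single

# Order 1: left to right, like a naive accumulation loop
seq = 0.0
for t in terms:
    seq = f32(seq + t)       # each +1.0 is lost: 1e8 + 1 rounds back to 1e8

# Order 2: add the small terms first, then the large one
grouped = f32(f32(sum(terms[1:])) + terms[0])

print(seq)      # 100000000.0
print(grouped)  # 100000016.0 -- same data, different rounding
```

Same inputs, two mathematically identical sums, two different single-precision answers. This is exactly the kind of reordering a multi-threaded or blocked BLAS routine is free to do.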
James Tursa
on 19 May 2023
Edited: James Tursa
on 22 May 2023
Somewhat related side comments:
In general, the BLAS/LAPACK libraries do not contain mixed type argument routines. E.g., there are no BLAS matrix multiply routines to multiply a single precision matrix with a double precision matrix. To accomplish this with BLAS routines one must first make a temporary copy of the single matrix as a double matrix and then call the appropriate BLAS routine. This of course will chew up extra time and memory. The only exceptions to this are some accumulation routines (e.g., dot product of single precision inputs but using a double precision accumulator). That being said, I don't know if MATLAB ever makes use of these extra precision accumulator routines that are available.
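For illustration, here is what such an extended-precision accumulator buys you (a Python sketch in the spirit of the sdsdot/dsdot-style routines mentioned above, with single precision simulated via struct; not MATLAB's actual code path):

```python
# Single-precision inputs, two accumulator strategies:
# (1) a single-precision running sum (every partial sum rounded to single),
# (2) a double-precision running sum, rounded to single once at the end.
import struct

def f32(x):
    return struct.unpack('f', struct.pack('f', x))[0]

x = [f32(v) for v in [1e8] + [1.0] * 16]   # single-precision inputs

acc_s = 0.0
for v in x:
    acc_s = f32(acc_s + v)   # single accumulator: each 1.0 is rounded away

acc_d = f32(sum(x))          # double accumulator: exact here, one final rounding

print(acc_s)  # 100000000.0 -- the sixteen 1.0s were all lost
print(acc_d)  # 100000016.0
```

The double accumulator recovers the small terms that the single accumulator throws away at every step.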
Related to that, there are also no mixed real/complex argument routines in the BLAS/LAPACK libraries. Either all arguments have to be real or all arguments have to be complex. And all of the complex variables are assumed to be interleaved memory model (this was true even back in the R2017b and earlier days).
MATLAB R2017b and prior used a separate complex memory model, meaning the real part of a variable was held in contiguous memory that was separate from the imaginary part which was also held in contiguous memory. For MATLAB R2018a and later this changed to an interleaved complex memory model, where the real part is interleaved with the imaginary part in memory. This now aligns better with the BLAS/LAPACK libraries and how complex variables are typically held in memory in other languages such as Fortran and C/C++, but has implications for how the BLAS/LAPACK routines can be used in the background. A couple of examples:
(Real Matrix) * (Complex Matrix)
R2017b and earlier: This can be accomplished with a series of Real BLAS routine calls on the individual real and imaginary parts. No need for temporary copies.
R2018a and later: This requires that a temporary copy of the real matrix be made as an interleaved complex matrix with imaginary part 0 so that the Complex BLAS routine can be used. Takes extra time and memory compared to R2017b and earlier, and also lots of unnecessary 0 multiplies will be done (which might give some NaN results that don't appear in R2017b if there are inf's involved).
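The NaN remark can be seen already in scalar arithmetic (a Python sketch; the same identity (a+bi)(c+di) = (ac - bd) + (ad + bc)i is what the complex multiply computes):

```python
# Why padding a real value with imaginary part 0 can create NaNs:
# the complex product computes (a+bi)(c+di) = (ac - bd) + (ad + bc)i,
# so an inf in the real input meets the padded 0 and inf*0 = NaN.
import math

r = float('inf')           # a real entry that happens to be inf
z = complex(2.0, 0.0)      # a complex entry

real_path = r * z.real               # real multiply: inf * 2.0 = inf, no NaN
complex_path = complex(r, 0.0) * z   # complex multiply on the zero-padded value

print(real_path)                       # inf
print(complex_path.imag)               # inf*0 + 0*2 = nan
print(math.isnan(complex_path.imag))   # True
```

The real-routine path never touches the padded zero, so the inf propagates cleanly; the complex-routine path manufactures a NaN out of an inf*0 product that the mathematics never required.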
inv(Complex Matrix)
R2017b and earlier: This involves calling complex LAPACK routines, so the separate real and imaginary parts of the variable must first be copied into an equivalent interleaved variable. Then the appropriate LAPACK routines are called. Then the interleaved result must be copied into a separate memory model variable. Takes extra time and memory compared to R2018a and later.
R2018a and later: This can be accomplished with direct complex LAPACK routine calls. No need for temporary copies.
Symmetric Matrix Results:
A = rand(5000);
AT = A'; % Explicit transpose constructed in memory
tic; x = A' * A; toc % Symmetric matrix multiply routine called, A' not explicitly formed
tic; y = AT * A; toc % Generic matrix multiply routine called using explicitly formed A'
isequal(x,y)
isequal(x,x') % Strictly symmetric result
isequal(y,y') % Might not be strictly symmetric result
So, with this last example, you can influence which BLAS routines are called in the background by how you write your code. Care must be taken to maintain strict symmetry of results. In the A' * A example, MATLAB can recognize the symmetry of the inputs and call a symmetric routine in the background without explicitly forming the transpose A'. This takes less time and guarantees a strictly symmetric result. In the AT * A example, the transpose A' is explicitly formed but MATLAB has no clue that AT is the exact transpose of A, so a generic matrix multiply routine is called, which takes longer and doesn't guarantee a strictly symmetric result. In that last case we happened to get strict symmetry, but this will depend heavily on your MATLAB/BLAS version and can't be guaranteed.
Paul
on 22 May 2023
Does Matlab identify strict symmetry for temporary variables? That is, if I have
C = B' * A' * A * B
I don't think that Matlab would do the A' * A first using the symmetric routine (I recall seeing an mlint message about that). If, instead, I do
C = (A*B)' * (A*B)
does Matlab recognize the symmetry and use the symmetric routine?
For this:
C = B' * A' * A * B
MATLAB will do this left to right, so ...
C = ((B' * A') * A) * B % all generic routines called
For this:
C = (A*B)' * (A*B) % all generic routines called?
Unless the parser is smarter than it used to be, the first A*B will be calculated, and the second A*B will separately be calculated. Since these are physically two different matrices, MATLAB does not recognize the symmetry and generic routines will be called for the middle multiply.
The correct way to do this calculation to guarantee MATLAB will recognize and use the symmetry is:
AB = A*B; % generic routine called
C = AB' * AB; % symmetric routine called, C guaranteed symmetric
This leaves AB in your workspace, of course, which is not the case with the temporary A*B copies above. That's the price you pay to make sure the symmetric routine is called. But it's a very small price given that you can always clear AB right afterwards if you don't want it hanging around.
Now a test of online MATLAB:
A = rand(5000);
B = rand(5000);
tic; AB = A*B; C = AB' * AB; toc
tic; D = (A*B)' * (A*B); toc
isequal(C,D)
isequal(C',C)
isequal(D',D)
So based on the timings it looks like the online parser isn't yet smart enough to recognize the symmetry in the (A*B)' * (A*B) case, and generic matrix multiply routines are called in the background.
When I run this code on my desktop PC version R2021a I get these results:
Elapsed time is 2.058999 seconds.
Elapsed time is 3.581072 seconds.
ans =
logical
0
ans =
logical
1
ans =
logical
1
So C and D are both symmetric, but they are not equal, and you get two different results with this MATLAB/BLAS version, which should not be surprising given that different background routines are called. When you compare the difference:
max(abs((C(:)-D(:))./C(:)))
ans =
3.6221e-16
You can see that the relative difference is down in the least significant bits. Both answers are "correct" in that sense. But you can save execution time and temporary memory and guarantee symmetric results if you take care in how you write your code.
Paul
on 23 May 2023
Do you know if there is any efficiency to be gained by assigning to C and then overwriting, instead of creating the temporary AB (assuming a case where A*B is square so C doesn't have to change size)?
%AB = A*B; % generic routine called
%C = AB' * AB; % symmetric routine called, C guaranteed symmetric
C = A*B;
C = C' * C;
James Tursa
on 23 May 2023
Edited: James Tursa
on 23 May 2023
There is probably no way to answer this definitively. It will depend on the MATLAB parser and memory manager as to what exactly happens in the background for cases like this, and those rules may change between MATLAB versions. Knowing when the mxArray structure addresses and data pointers get reused vs reallocated is not something TMW publishes. For small sized matrices you might see subtle differences in timings depending on how you code things. But for large sized matrices you will definitely see timings dominated by unnecessary temporary memory copies and/or generic vs symmetric routines.
This does bring up a side discussion about so-called "inplace" operations. If you call a function from within another function using R = myfunction(R,etc.), and that function has a signature like R = myfunction(R,etc.), then MATLAB can sometimes do inplace operations on the variable R (writing into current data pointer) rather than reallocating new memory and freeing old memory.
Paul
on 23 May 2023
Do you know if there is a symmetric multiplication BLAS routine that does the computations in place that Matlab could potentially take advantage of? Something like
subroutine symmmlt(A) ! returns trans(A)*A in A
James Tursa
on 23 May 2023
Edited: James Tursa
on 23 May 2023
You can look here to get an idea of what the routine signatures are for the BLAS: https://netlib.org/blas/
Note that the symmetric routines work with only the upper or lower triangle. If you want the full result you have to manually write extra code to copy one triangle into the other before returning the result back to MATLAB.
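For a concrete picture of that "one triangle plus a copy" pattern, here is a Python sketch (illustrative only, not MATLAB's implementation) of forming C = A' * A with guaranteed bitwise symmetry:

```python
# Sketch of the "compute one triangle, mirror into the other" pattern:
# each C[i][j] for j >= i is computed once, then copied (not recomputed)
# into C[j][i], so C is bitwise symmetric regardless of rounding.
import random

m, n = 3, 4
A = [[random.random() for _ in range(n)] for _ in range(m)]  # m x n matrix

C = [[0.0] * n for _ in range(n)]   # will hold A' * A (n x n)
for i in range(n):
    for j in range(i, n):           # upper triangle only
        C[i][j] = sum(A[k][i] * A[k][j] for k in range(m))
        C[j][i] = C[i][j]           # mirror: an exact copy, never a re-sum

# Symmetry is exact by construction, not merely up to rounding
assert all(C[i][j] == C[j][i] for i in range(n) for j in range(n))
```

Because the mirrored entry is a copy rather than an independently rounded dot product, strict symmetry cannot be lost, which is the guarantee the symmetric BLAS routines provide.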
The Level 3 routines do the matrix multiplies, and all of the signatures (generic and symmetric) indicate that the multiplies involving A (and B) accumulate into a separate matrix C. Of course you could pass the pointer to A in the C spot, but my guess is that this is undefined behavior since the routines are likely allowed to accumulate into C in multiple steps. I.e., A would be overwritten in the middle of the matrix multiply algorithms, invalidating the result. It wouldn't be too hard to code up a mex routine to see what the MATLAB BLAS does in this situation.
James Tursa
on 23 May 2023
Edited: James Tursa
on 23 May 2023
So I went ahead and did this test and got what I expected, an invalid result. With the correct mex code:
>> A = rand(5000);
>> B = rand(5000);
>> C = A*B;
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,Cmex)
ans =
logical
1
With the incorrect inplace mex code:
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,A) % comparing to A because that's where inplace result should be
ans =
logical
0
>> max(abs(A(:)-C(:)))
ans =
1.1143e+27
And here is the code, calling with square matrices so that A and C are the same size for the inplace test:
// Does generic matrix multiply using BLAS dgemm( )
// C = A * B
/* Includes ----------------------------------------------------------- */
#include "mex.h"
#include "blas.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
char TRANS = 'N';
mwSignedIndex M, N, K, LDA, LDB, LDC;
double *A, *B, *C;
double ALPHA = 1.0, BETA = 0.0;
if( nlhs > 1 )
mexErrMsgTxt("Too many outputs.");
if( nrhs != 2 || !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||
mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ||
mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||
mxGetNumberOfDimensions(prhs[0]) != 2 || mxGetNumberOfDimensions(prhs[1]) != 2 )
mexErrMsgTxt("Need exactly two full double real 2D matrix inputs.");
M = mxGetM(prhs[0]);
K = mxGetN(prhs[0]);
N = mxGetN(prhs[1]);
LDA = M;
LDB = mxGetM(prhs[1]);
LDC = M;
A = mxGetData(prhs[0]);
B = mxGetData(prhs[1]);
if( LDB != K )
mexErrMsgTxt("Incorrect dimensions for matrix multiplication.");
plhs[0] = mxCreateDoubleMatrix( M, N, mxREAL );
C = mxGetData(plhs[0]);
// dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, C, &LDC); // the correct code
dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, A, &LDC); // the incorrect inplace code
}
Compiled using this helper function:
function compile_blas(varargin)
libdir = 'microsoft';
lib_blas = fullfile(matlabroot,'extern','lib',computer('arch'),libdir,'libmwblas.lib');
mex(varargin{:},lib_blas,'-largeArrayDims');
end