Multithreaded sparse matrix multiplication?
I am performing several thousand matrix multiplications of an NxN sparse matrix (~1-2% nonzeros), let's call it B, with an NxM dense matrix, let's call it A (where M<N). Both N and M are large, on the order of several thousand.
Now, usually, matrix multiplications and most other matrix operations are implicitly parallelized in Matlab, i.e. they make use of multiple threads automatically. This appears NOT to be the case if either of the matrices is sparse (see e.g. this StackOverflow discussion and this largely unanswered MathWorks thread). This is a rather unhappy surprise to me. We can verify this with the following code:
clc; clear all;
N = 5000; % set matrix sizes
M = 3000;
A = randn(N,M); % create dense random matrix
B = sprand(N,N,0.015); % create sparse random matrix
Bf = full(B); % create a dense form of the otherwise sparse matrix B
for i=1:3 % test for 1, 2, and 4 threads
m(i) = 2^(i-1);
maxNumCompThreads(m(i)); % set the thread count available to Matlab
tic % starts timer
y = B*A;
walltime(i) = toc; % wall clock time
speedup(i) = walltime(1)/walltime(i);
end
% display number of threads vs. speedup relative to just a single thread
disp([m' speedup'])
This produces the following output, which illustrates that there is no difference between using 1, 2, and 4 threads:
If, on the other hand, I replace B by its dense form, referred to as Bf above, I get a significant speedup:
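For reference, the dense comparison is the same timing loop with Bf in place of B. The following is only a minimal self-contained sketch of that variant; the names nOrig, walltime_dense and speedup_dense are introduced here for illustration, and the timings will of course depend on the machine:

```matlab
N = 5000; M = 3000;              % same sizes as in the sparse test
A  = randn(N,M);                 % dense random matrix
Bf = full(sprand(N,N,0.015));    % sparse random matrix stored in dense form
nOrig = maxNumCompThreads;       % remember the current thread limit
m = [1 2 4];                     % thread counts to test
walltime_dense = zeros(size(m)); % preallocate the timing vector
for i = 1:numel(m)
    maxNumCompThreads(m(i));     % set the thread count available to Matlab
    tic                          % starts timer
    y = Bf*A;                    % dense*dense product
    walltime_dense(i) = toc;     % wall clock time
end
maxNumCompThreads(nOrig);        % restore the original thread limit
% speedup relative to a single thread
speedup_dense = walltime_dense(1)./walltime_dense;
disp([m' speedup_dense'])
```

The only substantive change relative to the sparse test is the line y = Bf*A; the thread limit is restored at the end so the rest of the session is unaffected.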
So, my question: is there any way at all to access a parallelized/threaded version of matrix operations for sparse matrices without converting them to dense form? Alternatively, is this something one might expect to be implemented in future versions of Matlab? I found a suggestion involving .mex files here, but the links appear to be dead, and the approach is not well documented and has received no feedback.
This seems to be a rather severe restriction of the implicit parallelism functionality, since sparse matrices abound in computationally heavy problems, and multithreaded functionality is highly desirable in these cases.
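One manual workaround that might be worth trying is explicit rather than implicit parallelism: split A into column blocks and multiply each block by B in a parfor loop. The sketch below assumes the Parallel Computing Toolbox is available (without it, parfor simply runs as an ordinary serial loop); the block count nBlocks and the variable names are hypothetical choices for illustration, and smaller sizes are used to keep it quick. I have not measured whether the cost of sending B and the blocks of A to the workers leaves any net speedup.

```matlab
N = 2000; M = 1200;                      % smaller sizes, for illustration only
A = randn(N,M);                          % dense random matrix
B = sprand(N,N,0.015);                   % sparse random matrix
nBlocks = 4;                             % hypothetical number of column blocks
edges = round(linspace(0,M,nBlocks+1));  % column boundaries of the blocks
Ablk = cell(1,nBlocks);
for k = 1:nBlocks
    Ablk{k} = A(:,edges(k)+1:edges(k+1)); % k-th column block of A
end
Yblk = cell(1,nBlocks);
parfor k = 1:nBlocks
    Yblk{k} = B*Ablk{k};                 % sparse*dense product for one block
end
Y = [Yblk{:}];                           % reassemble the full N-by-M result
```

Since each column of B*A depends only on the corresponding column of A, concatenating the block results reproduces the full product.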
I appreciate anyone's consideration of this issue at any level - thanks a bunch in advance! -Thomas
Eric Sampson on 15 Aug 2014
I believe the sparse matrix code is implemented by a few specialized TMW engineers rather than an external library like BLAS/LAPACK/LINPACK/etc. So it will be quite a bit of work for them, but you never know... Send them flowers, candy, etc!! :)