Parallel rank calculation for sparse matrices -- suggestions?

I need to calculate the rank of large (> 1 terabyte of non-zero elements) sparse matrices with MATLAB. Exploring the Parallel Toolbox, but can't seem to find anything that convinces me what is offers will be helpful. If I'm wrong, can someone point me in the right direction? Ideally, I'd just want to take my existing code that has "r=sprank(A)" in it and have that library call run in parallel, perhaps with some additional annotation as needed. Would using a GPUARRAY help here, for example? Doesn't seem like it, but perhaps I'm wrong. Hoping someone here can help me out. Thanks!
Barry Fagin
Professor of Computer Science
barry.fagin@afacademy.af.edu

9 Comments

What is 1T? Is there any exploitable structure to the sparse matrix?
Hi James. Sorry, edited now to say > 1 terabyte of non-zero elts. Alas, there is no exploitable structure to these matrices that I can figure out, although if I knew what might help with parallelizability I might look a little harder :-). --BF
So this afternoon I didn't understand that, but now I do. I will take it from here. Thanks Bruno! --BF
I'll move my comment so you can accept it if it helps
sparse arrays are fairly inefficient on gpu array. Do not use gpuarray for this purpose.
After doing a little digging, I now believe sprank() merely counts the number of non-zero columns. This is not what I require. I need something that will give me the identical value of r produced by this MATLAB code:
***
A = sparse(tall,skinny)
[some calculations with A]
fA = dense(A);
r = rank(fA)
***
If someone believes I am mistaken about sprank(), I'm all ears.
I did find this:
but I do not yet understand how to interpret the results it produces. In particular, I have examples where the null space is empty, but the results obtained by the above routine do not make sense to me.
A = sparse(tall,skinny)
How tall? How skinny? What currently rules out the approach already discussed below of using rank(full(A.'*A))?
"After doing a little digging, I now believe sprank() merely counts the number of non-zero columns. This is not what I require."
It does something more intelligent than that it match the row to column through Dulmange Mendelsohn permutation. In your case might be all the non-zero column can be matched. But that is exactly what structural rank means.

Sign in to comment.

 Accepted Answer

Bruno Luong
Bruno Luong on 30 Aug 2022
Moved: Bruno Luong on 30 Aug 2022
Something I don't get : sprank doc tells when you run witth threadpool it runs in parallel.

5 Comments

That "extended capability" section means simply that sprank can run on a thread-based worker (not all functions can), not that a single call to sprank can be parallelised over multiple thread-based workers. See more here in the doc.
Simple question : the doc claims in the extendedd capability section "...accelerate code with Parallel Computing Toolbox™ ThreadPool."
So is ot faster or not using Parallel Toolbox?
A single call to sprank is not accelerated using a thread pool. The extended capability means it is possible to make multiple simultaneous calls to sprank on a thread pool.
IMO the doc should clarify this.
But then I still have some doubt. If user runs multiple sprank in parallel, it will take less worker to work on one sprank. So at the end do we really speed up the whole thing?
Yes, the "extended capability" description could probably do with some refinement to make it clear exactly what works. (Today, there are a good number of MATLAB functions that cannot be run on a thread pool worker, and this "extended capability" really means simply that the function can run on a thread pool worker).
The performance situation is ... complicated, unfortunately. There is no single simple answer as to whether running N copies of a given function concurrently on workers is faster than running N copies sequentially on the client. It depends on all sorts of details of the implementation of that function - but primarily whether the function is already intrinsically multithreaded by MATLAB itself.

Sign in to comment.

More Answers (2)

Is the matrix square or is it tall and thin? If the later, then it may be an easier computation to compute rank(A.'*A).

9 Comments

NOTE: sprank is NOT rank
S = sparse(ones(3));
sprank(S)
ans = 3
rank(full(S))
ans = 1
Interesting. But the OP seems to use "rank" and "sprank" interchangeably, so either could be the intended computation.
I can only use sprank since my matrix is sparse, too large to be converted to full. So I'll use sprank and make the best of it. The reference to 'svd' below comes from an internal MATLAB library, all I did to generate the message below was call rank() on a sparse matrix. --BF
***
Error using svd
SVD does not support sparse matrices. Use SVDS to compute a subset of the singular values and vectors of a sparse matrix.
Error in rank (line 14)
s = svd(A);
Yes, but I think you do not understand that IF your matrix is tall but sufficiently narrow, then you CAN do this:
rank(full(A.'*A))
The problem with this solution is it loses a lot of precision. So a matrix that is CLOSE to singular but is not truly singular will suffer a problem. For example:
format long g
A = [1:5;2:6]';
A(:,3) = A(:,2) + randn(5,1)*1e-12
A = 5×3
1 2 1.99999999999926 2 3 2.99999999999993 3 4 3.99999999999839 4 5 5.00000000000203 5 6 6.00000000000069
So it is not truly singular, nor is it even sparse, but the sparsity or the size of it are irrelevant here.
rank(A)
ans =
3
So rank knows it to be full rank. However, the trick by forming A'*A causes rank to be confused.
rank(full(A.'*A))
ans =
2
As you can see, rank fails here, even though we know that A itself was of full rank. The problem is, when you form A'*A, this operation squares the singular values of A.
svd(A)
ans = 3×1
15.3157988606437 0.652920562027581 1.62159646645945e-12
but here, squaring the singular values causes a loss of precision, and rank is unable to see the true rank of the product matrix, only computing the numerical rank of that matrix, which is now only 2.
svd(A'*A)
ans = 3×1
234.573694739694 0.426305260318399 4.07630245872306e-17
"it may be an easier computation to compute rank(A.'*A)."
This is not true in general for complex matrix
A=[1 i; i -1]
A =
1.0000 + 0.0000i 0.0000 + 1.0000i 0.0000 + 1.0000i -1.0000 + 0.0000i
rank(A.'*A)
ans = 0
rank(A)
ans = 1
the correct formula is
rank(A'*A)
ans = 1
btw, For fat and wide matrix
rank(A) == rank(A*A')
There is no reason why the trick works for one diimension but not the other.
FYI, my matrices are tall and skinny. --BF
I like the A.'*A trick. Or one could ...wait for it ...
... use MATLAB's rref() command, which works on sparse matrices, and then count the nonzero rows. Sigh.
But that in turn begs the question of roundoff errors. Time to move to a new question, which I'll post shortly.
--BF
I would take caution before relying on rref. It has been often found to lack robustness, see e.g.,

Sign in to comment.

For thin and tall sparse matrix A of size (m x n), m>>n and n in the order of 1000s, it might be possible to compute the rank using q-less qr, which is better than rank(A'*A) which has the drawback of square the condition number.
A=sprand(100000,10,0.1)*sprand(10,100,0.1);
R=qr(A,0);
rank(full(R))
ans = 10

Categories

Find more on Linear Algebra in Help Center and File Exchange

Asked:

on 30 Aug 2022

Commented:

on 6 Sep 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!