Why is memory reduced for an identical copy of a sparse array?

2 views (last 30 days)
A = sparse([1,1],[1,1],[pi pi],1,1);
AA = A(1:end,1:end);
whos A; whos AA;
The minimal example above reveals a phenomenon I currently can't explain (A 48 bytes vs. AA 32 bytes)
In what way is data stored differently in A than in AA?

Accepted Answer

Bruno Luong
Bruno Luong on 7 Aug 2020
Edited: Bruno Luong on 7 Aug 2020
Nah nothing to do with name length, it's the number of internal non zeros of ('Ir" and "Pr" old notation) array reserved by MATLAB. In the first case calling with SPARSE it's 2 doubles, in the second case it's 1, the minimal possible value, or something like that.
The copied matrix has even a "fictive" size, since their data array must be shared. It will be copied when data eventually is written later.
These arrays will have the same size since we force the number of non-zeros reserved to be 1.
AAAAAAA = sparse([1],[1],[pi+pi],1,1,1); % Check out the NZ argument of sparse
A = AAAAAAA(1:end,1:end);
>> whos AAAAAAA
Name Size Bytes Class Attributes
AAAAAAA 1x1 32 double sparse
>> whos A
Name Size Bytes Class Attributes
A 1x1 32 double sparse
  4 Comments
Bruno Luong
Bruno Luong on 7 Aug 2020
Edited: Bruno Luong on 8 Aug 2020
No, there is not separate pointers, the internal coding is three arrays: Ir+ Pr + Jc.
Ir is nzmax x 8 bytes
Pr is nzmax x 8 bytes for real sparse, nzmax x 16 bytes for complex
Jc is (size(S,2) + 1) x 8 bytes
The smallest nzmax is ... non zeros elements (nnz), but user can allocate nzmax > nz in case of changing a 0 element, and prevent MATLAB to reallocate arrays.
I think first SPARSE command with 2 elements accumulate at the same address, estimates the NZMAX elements = 2, allocates array before accumulates for the reason of speed; and it can be wasting memory. Not sure if there is way to reduce NZMAX using pure MATLAB command, but it's easy with MEX programming.
I shouldn't worry if you can afford to have some percentage of waste.
Bruno Luong
Bruno Luong on 8 Aug 2020
Edited: Bruno Luong on 8 Aug 2020
Reverse enginering tests to see when MATLAB Trims
According to my test, MATLAB leaves alone allocation internal arrays when NZMAX < 3*NNZ, but for ratio larger than 2, MATLAB will trims
% 2D grid undirected graoh adjacent matrix
[X,Y] = ndgrid(1:1000);
n = numel(X);
I1 = sub2ind(size(X),X(1:end-1,:),Y(1:end-1,:));
J1 = I1(:)+1;
I2 = sub2ind(size(X),X(:,1:end-1),Y(:,1:end-1));
J2 = I2(:)+size(X,1);
I0 = [I1(:); I2(:)];
J0 = [J1(:); J2(:)];
[I0,J0] = deal([I0(:);J0(:)],[J0(:);I0(:)]);
V0 = 1+zeros(size(I0));
% here is the matrix
% A = sparse(I0,J0,V0,n,n);
% Loop to create different proportions of duplicated I/J
for nrep = 1:0.1:4
% Duplicate the I/J at a given proportion
nc = ceil(nrep);
p = ceil(size(I0,1)*nrep);
I = repmat(I0,nc,1);
J = repmat(J0,nc,1);
V = repmat(V0,nc,1);
I = I(1:p);
J = J(1:p);
V = V(1:p);
% Create sparse matrix
A = sparse(I,J,V,n,n);
% Check surplus
fprintf('nrep=%f, allocation surplus ratio = %f\n', nrep, nzmax(A)/nnz(A));
end
Result
>> toto
nrep=1.000000, allocation surplus ratio = 1.000000
nrep=1.100000, allocation surplus ratio = 1.100000
nrep=1.200000, allocation surplus ratio = 1.200000
nrep=1.300000, allocation surplus ratio = 1.300000
nrep=1.400000, allocation surplus ratio = 1.400000
nrep=1.500000, allocation surplus ratio = 1.500000
nrep=1.600000, allocation surplus ratio = 1.600000
nrep=1.700000, allocation surplus ratio = 1.700000
nrep=1.800000, allocation surplus ratio = 1.800000
nrep=1.900000, allocation surplus ratio = 1.900000
nrep=2.000000, allocation surplus ratio = 2.000000 % <= YOUR TEST IS HERE
nrep=2.100000, allocation surplus ratio = 2.100000
nrep=2.200000, allocation surplus ratio = 2.200000
nrep=2.300000, allocation surplus ratio = 2.300000
nrep=2.400000, allocation surplus ratio = 2.400000
nrep=2.500000, allocation surplus ratio = 2.500000
nrep=2.600000, allocation surplus ratio = 2.600000
nrep=2.700000, allocation surplus ratio = 2.700000
nrep=2.800000, allocation surplus ratio = 2.800000
nrep=2.900000, allocation surplus ratio = 2.900000
nrep=3.000000, allocation surplus ratio = 1.000000 % ****** IT TRIMS FROM 3 ******
nrep=3.100000, allocation surplus ratio = 1.000000
nrep=3.200000, allocation surplus ratio = 1.000000
nrep=3.300000, allocation surplus ratio = 1.000000
nrep=3.400000, allocation surplus ratio = 1.000000
nrep=3.500000, allocation surplus ratio = 1.000000
nrep=3.600000, allocation surplus ratio = 1.000000
nrep=3.700000, allocation surplus ratio = 1.000000
nrep=3.800000, allocation surplus ratio = 1.000000
nrep=3.900000, allocation surplus ratio = 1.000000
nrep=4.000000, allocation surplus ratio = 1.000000
>>

Sign in to comment.

More Answers (3)

Bjorn Gustavsson
Bjorn Gustavsson on 7 Aug 2020
It seems to be so that your difference of 16 bytes is simply due to the longer name of your AA:
AA = sparse([1,1,2],[1,1,2],[pi pi exp(1)],2,2);
A = AA(1:end,1:end);
whos A; whos AA;
% Returned:
% Name Size Bytes Class Attributes
%
% A 2x2 56 double sparse
%
% Name Size Bytes Class Attributes
%
% AA 2x2 72 double sparse
Since the sice-difference seems independent on the size of A and AA and this example shows the opposite sizes compared to your example...

Walter Roberson
Walter Roberson on 7 Aug 2020
Edited: Walter Roberson on 7 Aug 2020
The operation AAAAAAA(1:end,1:end) is not the same as AAAAAAA by itself.
>> V=[1 -2 3 -4 5]
V =
1 -2 3 -4 5
>> format debug
>> V
V =
Structure address = 1cc30d9c0
m = 1
n = 5
pr = 60c004eb8b60
1 -2 3 -4 5
>> B=V
B =
Structure address = 1cc30d9c0
m = 1
n = 5
pr = 60c004eb8b60
1 -2 3 -4 5
so a direct copy of the entire matrix gets the same data pointer
>> C=V(1:end)
C =
Structure address = 1cc30c400
m = 1
n = 5
pr = 60c004eb8bc0
1 -2 3 -4 5
but a copy of all of the elements does not get the same data pointer.
In every case where you access a subset of values, the data pointer must not be the same (at least for non-sparse arrays). Hypothetically the code could special-case the situation where the indices were total and in the normal order... but the code does not do that. As it is, the way to signal that you want a deep copy of a normal variable is to use 1:end, so an optimization to re-use the data pointer for the case where the entire array was being accessed would probably lead to subtle problems.
The other piece of information you need to know is that when you use a sparse array in an expression that is not using the total array alone, then MATLAB always strips off the slack non-zeros. For example if you had
A = spalloc(50, 50, 75); %75 is number of non-zeros
A = A + speye(50);
then the A + speye(50) will give a temporary sparse array that has non-zeros stripped out, and it is that stripped array that would be assigned over top of A.
This is similar to how you can define
B = complex(ones(1,10), zeros(1,10))
and B will be datatype complex, but as soon as you do any calculation with it, like
C = B + 0
then the temporary value B + 0 will be examined and found to have an all-zero complex part and the complex part will be stripped.

Matt J
Matt J on 8 Aug 2020
Edited: Matt J on 8 Aug 2020
Are you sure you have to worry about it? According to what I'm seeing (in R2019a,R2020a), the extra 16 bytes occurs only anomolously when you have 2 and only 2 repeated entries. Any more than that, and you're fine:
Bytes=nan(1,10);
for N=1:numel(Bytes);
I=ones(1,N);
J=ones(1,N);
S=pi*ones(1,N);
A=sparse(I,J,S,1,1);
Bytes(N)=getfield(whos('A'),'bytes');
end
bar(Bytes)
ylabel 'Bytes'
xlabel 'N'
  1 Comment
Matt J
Matt J on 8 Aug 2020
Edited: Matt J on 8 Aug 2020
Well, I guess it might be a problem if you have lots of distinct pairs of duplicate entry coordinates. In any case, here is a way you could pre-trim the data using splitapply or accumarray,
%% Construction with pre-accumulation
[IJ,~,G]=unique([I(:),J(:)],'rows');
Sacc=accumarray(G(:),S(:));
A=sparse(IJ(:,1),IJ(:,2),Sacc);
A modified version of the test above shows that we get the proper nzmax for the resulting matrix:
M=300;
Bytes=nan(10,2);
for N=1:size(Bytes,1)
I=(1:M).' * ones(1,N);
J=ones(M,N);
S=pi*ones(M,N);
%%Conventional construction
A0=sparse(I,J,S);
Bytes(N,1)=getfield(whos('A0'),'bytes');
%% Construction with pre-accumulation
[IJ,~,G]=unique([I(:),J(:)],'rows');
Sacc=accumarray(G(:),S(:));
A=sparse(IJ(:,1),IJ(:,2),Sacc);
Bytes(N,2)=getfield(whos('A'),'bytes');
end
bar(Bytes)
ylabel 'Bytes'
xlabel 'N'
legend('Conventional','With Pre-Accumulation')

Sign in to comment.

Categories

Find more on Sparse Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!