How can I extract a certain 'cluster' of elements according to a particular condition on the elements?

Ansh on 21 Jan 2016
Edited: Ansh on 28 Jan 2016
I have a matrix (about 342 by 342) denoted by C(k,l), and I want to identify all clusters of indices of the original matrix according to the condition C(k,l) > rho. That is, I want all square matrices C'(a,b) taken from C(k,l) such that C'(a,b) > rho for all pairs of indices a and b.
For example, if I have the matrix C(i,j) as:
C = 1    0.8  0.7
    0.8  1    0.5
    0.7  0.5  1
And rho = 0.6 then a correct square matrix I want my code to identify is:
C' = 1    0.7
     0.7  1
This is not unique, of course, and the result in the example above is not necessarily a contiguous submatrix (here it uses rows and columns 1 and 3). I am not sure how to do this, or what the best way to do it is, in MATLAB. If possible, I would also like to identify what a and b are for each possible matrix, e.g. for my example above a and b can be 1 or 3. The matrices are always symmetric and the diagonal entries are always 1.
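As a minimal sketch of the example above (the names idx, Csub and isCluster are illustrative and not from the original post), indexing C with the same index vector for rows and columns extracts the matrix for a candidate cluster, and the condition can then be checked on every entry:

```matlab
C = [1 0.8 0.7; 0.8 1 0.5; 0.7 0.5 1];
rho = 0.6;
idx = [1 3];                     % candidate cluster: a and b range over {1, 3}
Csub = C(idx, idx);              % rows and columns 1 and 3 of C
isCluster = all(Csub(:) > rho);  % true only if every entry exceeds rho
```

With idx = [1 2 3] the same check fails, because C(2,3) = 0.5 is not greater than rho.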
  8 Comments
Ansh on 22 Jan 2016
Image Analyst,
The matrices being considered are correlation matrices. I am using a clustering procedure to find a cluster of indices (in this case the indices are stocks) that are highly correlated with each other. This involves extracting a square submatrix from the original matrix such that all of its entries are >= rho, as described in the problem. In the actual correlation matrices to be used (which are taken from empirical data) this may not give a unique cluster (possibly not, given the size), which is why I have asked for all such submatrices. Does this help in any way?

Answers (2)

Kirby Fears on 21 Jan 2016
Edited: Kirby Fears on 21 Jan 2016
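Assuming you only want to find contiguous submatrices along the diagonal of C, the following code collects every square submatrix whose entries all exceed rho and stores them in a table S, together with each submatrix's size and its starting position on the diagonal. This should be a good starting point for whatever assumptions you end up deciding on.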
% make data
sizeC = 342;
rho = 0.6;
c = rand(sizeC);
c(1:(sizeC+1):end) = 1;            % set the diagonal entries to 1
% prep
S = cell((sizeC-2)*(sizeC-1),3);   % generous preallocation, trimmed below
varNames = {'S','sizeS','diagC'};
idxRho = c>rho;
counterS = 1;
% traverse submatrix size, largest first
for sizeS = (sizeC-1):-1:2
    % traverse diagonal of c; the last valid start is sizeC-sizeS+1
    for d = 1:(sizeC-sizeS+1)
        idx = d:(d+sizeS-1);
        % store valid submatrix with meta info
        if all(all(idxRho(idx,idx)))
            S(counterS,:) = {c(idx,idx),sizeS,d};
            counterS = counterS + 1;
        end
    end
end
% drop extra rows of S
if counterS<=size(S,1)
    S(counterS:end,:) = [];
end
% convert S to table
S = array2table(S,'VariableNames',varNames);
Hope this helps.
  9 Comments
Ansh on 27 Jan 2016
Hi Kirby Fears,
I was attempting to use the sets of indices to clarify what I wanted, but it seems to have had the opposite effect. Thank you for taking the time to answer anyway. In hindsight, this question could have been posed better from the start to avoid the confusion caused later. Many thanks for the best wishes.

Stephen on 23 Jan 2016
Edited: Stephen on 24 Jan 2016
Assuming that the input matrix is always square and symmetric:
>> D = [1,0.8,0.9,0.5;0.8,1,0.6,0.1;0.9,0.6,1,0.7;0.5,0.1,0.7,1]
D =
1 0.8 0.9 0.5
0.8 1 0.6 0.1
0.9 0.6 1 0.7
0.5 0.1 0.7 1
>> rho = 0.6;
>> [R,C] = find(tril(D,-1)>rho);
>> out = arrayfun(@(r,c)D([r,c],[r,c]),R,C,'UniformOutput',false);
>> out{:}
ans =
1 0.8
0.8 1
ans =
1 0.9
0.9 1
ans =
1 0.7
0.7 1
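To also report which indices a and b form each 2-by-2 matrix, the row and column subscripts returned by find can be placed side by side. A small sketch reusing D and rho from above (the name pairs is illustrative, not part of the original answer):

```matlab
D = [1,0.8,0.9,0.5;0.8,1,0.6,0.1;0.9,0.6,1,0.7;0.5,0.1,0.7,1];
rho = 0.6;
[R,C] = find(tril(D,-1)>rho);   % subscripts of below-diagonal entries > rho
pairs = [C,R];                  % one index pair per row, smaller index first
```

Row k of pairs corresponds to out{k}, with the two indices listed in ascending order. Because the condition is strict, the entry D(3,2) = 0.6 does not produce a pair.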
  5 Comments
Ansh on 28 Jan 2016
Thank you Stephen for your answer; that is partly the reason I first posted here. I shall go away and think of an alternative procedure. If I can analyse the data I have and come up with an upper bound on the size of the cluster, that could help.
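If such an upper bound on the cluster size is available, one possible brute-force sketch is to enumerate every index subset up to that size and keep those whose entries all exceed rho. This is only an illustration, not a method proposed in the thread: kmax is an assumed bound, clusters is a hypothetical name, and D is the small example matrix from Stephen's answer.

```matlab
D = [1,0.8,0.9,0.5;0.8,1,0.6,0.1;0.9,0.6,1,0.7;0.5,0.1,0.7,1];
rho = 0.6;
kmax = 3;                          % assumed upper bound on cluster size
n = size(D,1);
clusters = {};                     % each cell holds one valid index set
for k = 2:min(kmax,n)
    combos = nchoosek(1:n,k);      % every index subset of size k, one per row
    for r = 1:size(combos,1)
        idx = combos(r,:);
        block = D(idx,idx);        % matrix restricted to this index set
        if all(block(:) > rho)     % every pair must exceed rho
            clusters{end+1} = idx; %#ok<AGROW>
        end
    end
end
```

The number of subsets grows combinatorially with k, so for a 342-by-342 matrix this is likely to be practical only for small kmax.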
