How can I extract a certain 'cluster' of elements according to a particular condition on the elements?

I have a matrix (about 342 by 342) denoted by C(k,l) and I want to identify all cluster of indices of the original according to the condition C(k,l) > rho. I.e. I want all square matrices C'(a,b) of C(k,l) such that C'(a,b) > rho for all pairs of indices a and b
For example, if I have the matrix C(i,j) as:
C = 1 0.8 0.7
0.8 1 0.5
0.7 0.5 1
And rho = 0.6 then a correct square matrix I want my code to identify is:
C'= 1 0.7
0.7 1
This is not unique of course and the result as given by the example above is not necessarily a submatrix. I am not sure how/the best way to do this is in MATLAB? If possible, I would also like identify what a and b are for each possible matrix e.g. for my example above a and b can be 1 or 3. The matrices are always symmetric and the diagonal entries are always 1.

8 Comments

C' = 1
or
C' = [1 0.8
0.8 1 ]
are also possible solutions ?
Best wishes
Torsten.
Thank you Torsten, I have modified the question so that I want the code to identify all submatrices which satisfy my condition. Does this clarify things?
Best wishes
Ansh
Do you have an upper bound for the number of matrices that satisfy your condition?
A 342 x 342 matrix has nchoosek(342, 171)^2 ~= 1.5e203 square submatrices of size 171 x 171. What if all of them satisfy your condition?
Well I would expect the size of the submatrix itself to be about 5-15 that gives a max ~= 3.29e51.
Also I guess if we start with
[row,col]=find(C > rho)
that would at least find the indices that could be a and b, then some further code on the result to find what we want? That might not be the best way to start, hence why I'm asking on here. Doing the above code on my matrix gives [row,col] to be about 1216 by 2 (with rho = 0.6)
Best
Ansh
Ansh,
Do you only want submatrices along the diagonal? Is each submatrix required to have 1's along the diagonal?
Also, if there is a 3x3 submatrix S with values greater than rho, there are also 4 2x2 submatrixes inside of S which are greater than rho. Do you want to count those too?
Obviously, as you can tell from the responses, this is confusing and not well described. We don't know why 0.8 is not in any of your output(s) even though it is > 0.6. Why is 0.8 not included anywhere even though it's greater than 0.6????
What if the connected blob consisting of numbers more than 0.6 is not a rectangle but some irregularly shaped blob? What then?
Help me get excited about this by describing the use case. Why do you want to do this? Knowing that may give us a clue as to a good approach to solving the main problem.
Kirby Fears,
Yes each submatrix is required to have 1's on the diagonal. What do you mean exactly by taking submatrices along the diagonal?
Many thanks
Image Analyst,
The matrices being considered are correlation matrices. I am using a clustering procedure to find a cluster of indices (in this case indices are stocks) that are highly correlated to each other. This involves extracting a square submatrix from the original matrix such that of all of its entries are >= rho as described in the problem. In the actual correlation matrices to be used (which are taken from empirical data) it is possible that this may not give a unique cluster (possibly not given the size), hence why I have asked for all such submatrices. Does this help in anyway?

Sign in to comment.

Answers (2)

Assuming you only want to find submatrices along the diagonal of C, the following code extracts all square submatrices (>rho) into a table S. This should be a good starting point for whatever assumptions you end up deciding on.
% make data
sizeC = 342;
rho = 0.6;
c = rand(sizeC);
c(1:(sizeC+1):end) = 1;
% prep
S = cell((sizeC-2)*(sizeC-1),3);
varNames = {'S','sizeS','diagC'};
idxRho = c>rho;
counterS = 1;
% traverse submatrix size
for sizeS = (sizeC-1):-1:2,
% traverse diagonal of c
for d = 1:(sizeC-sizeS),
% store valid submatrix with meta info
if all(idxRho(d:(d+sizeS-1),d:(d+sizeS-1))),
S(counterS,:) = {c(d:(d+sizeS-1),d:(d+sizeS-1)),...
sizeS,d};
counterS = counterS + 1;
end
end
end
% drop extra rows of S
if counterS<=size(S,1),
S(counterS:end,:)=[];
end
% convert S to table
S = array2table(S,'VariableNames',varNames);
Hope this helps.

9 Comments

Hi Kirby Fears,
I tried your code on the example given in my question and I only get the submatrix
C'= 1 0.8
0.8 1
But what about the submatrix the same as above but with 0.7 in place of the entries with 0.8? Is this what you mean by taking the submatrices along the diagonal?
Many thanks
Ansh
Ansh,
The matrix you are describing:
Cs = 1 0.7
0.7 1
Is not a submatrix of:
C = 1 0.8 0.7
0.8 1 0.5
0.7 0.5 1
Can you state exactly what you are trying to extract from C? In the following example, what are all of the Ds results you want to extract from D?
D = 1 0.8 0.9 0.5
0.8 1 0.6 0.1
0.9 0.6 1 0.7
0.5 0.1 0.7 1
Kirby Fears,
I suppose I was using the term 'submatrix' in an incorrect manner, so I shall use the word cluster from now on. With the condition C(i,j) > rho I want to extract the following clusters:
[ 1 0.8 , [ 1 0.9 , [ 1 0.7
0.8 1 ] 0.9 1 ] 0.7 1 ]
Phrased in another way (hopefully more helpful), I want the set of indices A according to the following condition
C(i,j) > rho for all i,j in A
i.e. A is simply a list in which each element satisfies 1 <= i <= size(C)
Does this help?
You can get the set of indices i,j where C(i,j) > rho with one line:
[i,j] = find(C>rho);
You can re-use these indices in C to retrieve values like this:
C(i(1),j(1))
C(i(2),j(2))
... etc
Yes, but I want the code to produce a square section of the matrix such that all elements in this section are > 0.6
Take the example in my question. Doing that will give e.g. [i,j]=[1,3] and [i,j]=[1,2], BUT C(2,3)=0.5 and so one of 2 or 3 cannot be included in the set of indices that includes 1. Of course, I want all possible combinations of indices that make a valid set.
Hi Ansh,
You asked for submatrices, which I provided in one manner and Stephen in another since you did not clarify what you mean by "submatrix". Then you said no, you want the index of C elements greater than rho, which I provided in the comment above. Now you want submatrices again.
For these reasons, I'm not going to respond anymore since my time is better spent on other questions. Nonetheless, good luck with your code.
Hi Stephen, but this doesn't produce the 3 by 3 cluster I have detailed in the comment to your code. (There could have been a delay in posting that further comment - sorry).
Hi Kirby Fears,
I was attempting to use the sets of indices to clarify what it is I wanted, obviously it seems as if it has had the opposite effect. Thank you for taking the time to answer anyway. In hindsight, this question could have been posed better from the start to avoid the confusion caused later. Many thanks for the best wishes.

Sign in to comment.

Assuming that the input matrix is always square and symmetric:
>> D = [1,0.8,0.9,0.5;0.8,1,0.6,0.1;0.9,0.6,1,0.7;0.5,0.1,0.7,1]
D =
1 0.8 0.9 0.5
0.8 1 0.6 0.1
0.9 0.6 1 0.7
0.5 0.1 0.7 1
>> rho = 0.6;
>> [R,C] = find(tril(D,-1)>rho);
>> out = arrayfun(@(r,c)D([r,c],[r,c]),R,C,'UniformOutput',false);
>> out{:}
ans =
1 0.8
0.8 1
ans =
1 0.9
0.9 1
ans =
1 0.7
0.7 1

5 Comments

But if we apply your code to
1 0.8 0.9 0.5
0.8 1 0.7 0.1
0.9 0.7 1 0.5
0.5 0.1 0.5 1
This does not produce the 3 by 3 cluster/section of the matrix given below?
1 0.8 0.9
0.8 1 0.7
0.9 0.7 1
It does however produce the 2 by 2 clusters that I want the code to identify
I understood your question to mean that you want all 2x2 submatrices: "I want all square matrices C'(a,b) ... for all pairs of indices a and b". This is also what you stated in a comment to Kirby Fears' answer: "I want to extract the following clusters:"
[ 1 0.8 , [ 1 0.9 , [ 1 0.7
0.8 1 ] 0.9 1 ] 0.7 1 ]
I answered your question using your examples: I cannot read your mind and discover other secret examples that you have hiding there, I can only use the data that you have shown us, which specifically listed only 2x2 matrices as being the correct output.
Do you really want all such submatrices, regardless of size? Do the submatrices have to be contiguous?
Yes, in that particular instance I was given the matrix by Kirby Fears in which there aren't any 3 by 3 instances of what I want. I should have also included this example (which is different to that example in Kirby Fear's) to clarify what I wanted, I do apologise for that.
Yes I want all such matrices, regardless of size (the possibility of introducing an upper limit on the size also exists e.g. 15 by 15 for a 342 by 342 matrix I am in particular considering) and no the matrices do not have to be contiguous e.g. the
1 0.9
0.9 1
matrix is not contiguous given the example in the Kirby Fears comments. Hopefully this clears things up a little!
Ansh
This task might not be solvable using a standard PC: there are potentially a lot of such matrices:
Thank you Stephen for your answer, it is partly that reason why I first posted on here. I shall go away and think of an alternative procedure. If I can analyse the data I have and come up with an upper bound on the size of the cluster that could help.

Sign in to comment.

Categories

Tags

Asked:

on 21 Jan 2016

Edited:

on 28 Jan 2016

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!