How to subset in matrix based on the first 3 columns?

Question

Clarisha Nijman on 1 Nov 2018

Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/427342-how-to-subset-in-matrix-based-on-the-first-3-columns

Commented: Clarisha Nijman on 3 Nov 2018

Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities. For such a small task, the code I wrote looks tremendously long, and the results are not good at all! Is there a better way to do this in MATLAB? Working with for loops and while loops is very difficult for me.

%given matrix
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
%Subsets B, formed from rows with identical A(i,1:3), should be:
%This part of the code works!
1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
1 2 3 3 4 2;
2 3 4 1 2 3;
2 3 4 2 3 4;
2 3 4 1 2 3;
1 4 3 2 3 4;
1 4 3 1 2 3;
1 3 4 3 2 4;
%final result matrix C with the probability of each distinct row within its subset should be:
%This is my problem! How to find the correct probabilities.
size(B,1)=4
1 2 3 2 3 4  2/4;
1 2 3 3 2 4  1/4;
1 2 3 3 4 2  1/4;
size(B,1)=3
2 3 4 1 2 3  2/3;
2 3 4 2 3 4  1/3;
size(B,1)=2
1 4 3 2 3 4  1/2;
1 4 3 1 2 3  1/2;
size(B,1)=1
1 3 4 3 2 4  1;

The code:

%add column to matrix for indicator variable
indicator=zeros(size( A,1),1);
A=[A indicator];
for i=1:size(A,1)
    if A(i,size(A,2))==0 %consider only not adjusted indicators
        k=0;
        while i+k<=size(A,1)%takes care that index is not exceeded
            if A(i,1:3)==A(i+k,1:3)
                A(i+k,size(A,2))=i;%indicator variable
            end
            k=k+1;
        end
    end
end
%add column to matrix for frequency in the subset
freq=zeros(size( A,1),1);
A=[A freq];
%start subsetting and  compute the pdf
j=1;
while j<=max(A(:,size(A,2)-1))
    B=A(A(:,size(A,2)-1)==j,:);%save the j-th subset in B
    for i=1:size(B,1)
        if B(i,size(B,2))==0 %consider only not adjusted indicators
            k=0;
            while i+k<=size(B,1)%takes care that index is not exceeded
                if B(i,1:6)==B(i+k,1:6)
                    B(i+k,size(B,2))=i;%indicator variable
                    B
                    %subsetting to find frequencies
                    for v=1:max(B(:,size(B,2)))
                        C=B(B(:,size(B,2))==v,:);%save the j-th subset in B
                        %computing probability of each element in subset
                        for w=1:size(C,1)
                            C(w,size(C,2))= 1/ C(w,size(C,1));
                            C
                        end
                        for w=1:size(C,1)
                            z=1;
                            while z+w<size(C,1)
                                if C(w,1:6)==C(w+z,1:6)
                                    C(w,size(C,2))=C(w,size(C,2))+C(w+z,size(C,2));
                                    C(w+z,size(C,2))=0;
                                end
                                z=z+1;
                            end
                            %remove lines with probability zero
                            % Specify conditions, which rows should be
                            % removed
                            weg = C(:,size(C,2))==0;
                            % remove
                            C(weg,:) = [];
                            E=[E;C];
                        end
                    end
                end
                k=k+1;
            end
        end
    end
    j=j+1;
end

3 Comments

JohnGalt on 1 Nov 2018

agreed with Bruno... "Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities" - find sub-matrices of what form? - computing probabilities of what?

Guillaume on 1 Nov 2018

My understanding is that all rows with identical columns 1 to 3 belong to a subset. The probability of a row is the number of times it appears in the matrix divided by the number of rows in the subset it belongs to.

I too have not tried to understand the code.
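
To make that definition concrete, here is a small worked sketch for one row. It assumes the four 6-column rows of A that start with 1 2 3, as they appear in this thread; the names B, target, timesSeen and p are illustrative only and are not taken from the original code.

```matlab
% Rows of A whose first three columns are [1 2 3] (one subset)
B = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     1 2 3 3 4 2];

% Row whose probability within the subset we want (illustrative choice)
target = [1 2 3 2 3 4];

% Number of times the target row occurs in the subset
timesSeen = sum(ismember(B, target, 'rows'));

% Probability = occurrences of the row / number of rows in the subset
p = timesSeen / size(B, 1);
```

Here the target row occurs twice among four rows, so p works out to 2/4.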

Answer 1

Guillaume on 1 Nov 2018

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/427342-how-to-subset-in-matrix-based-on-the-first-3-columns#answer_344595

Edited: Guillaume on 2 Nov 2018

If I understood correctly:

A=[ 1 2 3 2 3 4;
  1 2 3 3 2 4;
  1 2 3 2 3 4;
  2 3 4 1 2 3;
  2 3 4 2 3 4;
  1 2 3 3 4 2;
  1 4 3 2 3 4;
  1 3 4 3 2 4;
  1 4 3 1 2 3;
  2 3 4 1 2 3];
[~, ~, uid] = unique(A, 'rows');  %get unique id for each row of A
count = accumarray(uid, 1);  %get count of how many times each unique row of A appears
count = count(uid);  %and assign to each row
[~, ~, subset] = unique(A(:, 1:3), 'rows');  %identify which subset each row belongs to
subsetcount = accumarray(subset, 1);  %count the number of rows in each unique subset
subsetcount = subsetcount(subset);  %and assign to each row
probability = count ./ subsetcount;  %calculate the probability of each row in its subset
%for pretty display
table(A, subset, probability)

I'm using accumarray to compute histograms. You could replace each instance of accumarray(x, 1) with histcounts(x, 'BinMethod', 'integers')' (note the trailing transpose, which turns the result into a column) if that is clearer for you.
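
As a rough check of that equivalence, the sketch below compares the two calls on a made-up id vector. The vector uid here is hypothetical; the comparison assumes the ids are positive integers covering 1 to max(uid) without gaps, which is what the third output of unique produces.

```matlab
% Hypothetical id vector, shaped like the third output of unique
uid = [3; 1; 3; 2; 3; 1];

% Count occurrences of each id with accumarray (column vector result)
countAcc = accumarray(uid, 1);

% Same counts with histcounts; the transpose turns the row into a column
countHist = histcounts(uid, 'BinMethod', 'integers')';

% Both approaches should agree for gap-free ids starting at 1
sameCounts = isequal(countAcc, countHist);
```

If the ids did not start at 1, the two results would differ in length, since accumarray sizes its output by the largest id while histcounts only spans the observed range.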

4 Comments

Guillaume on 2 Nov 2018

You'll notice I used meaningful names in my answer. I have no idea what D, E, F are in your code. Code whose variables have meaningful names is instantly easier to understand.

Note that the sort in unique(sort(x)) is pointless. unique does a sort anyway, unless you use the 'stable' option.
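
A quick way to see this, using a made-up vector x (not from the code in this thread):

```matlab
% Hypothetical input with repeated values in unsorted order
x = [3 1 2 3 1];

% Sorting first makes no difference: unique already returns sorted output
withSort    = unique(sort(x));
withoutSort = unique(x);

% 'stable' keeps the order of first appearance instead of sorting
stableOrder = unique(x, 'stable');
```

withSort and withoutSort are identical, while stableOrder keeps the values in the order they first appear in x.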

If you don't want the repeated rows in each subset, one method:

[rows, urow, uid] = unique(A, 'rows'); %get unique rows, where they come from, and unique id for each
count = accumarray(uid, 1);  %histogram of rows, matches the rows variable
[~, ~, subset] = unique(A(:, 1:3), 'rows');  %identify which subset each row belongs to
subsetcount = accumarray(subset, 1);  %count the number of rows in each unique subset
subsetcount = subsetcount(subset);  %and assign to each row  
probability = count ./ subsetcount(urow);
%for pretty display
subset = subset(urow);
table(rows, subset, probability)
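
One way to sanity-check the result is to confirm that the probabilities in each subset add up to 1. The sketch below repeats the computation above on the same A and adds that check; the names total and tol are my own additions, and the tolerance value is an arbitrary assumption to absorb floating-point rounding.

```matlab
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];

% Same steps as above: unique rows, their counts, and subset sizes
[rows, urow, uid] = unique(A, 'rows');
count = accumarray(uid, 1);
[~, ~, subset] = unique(A(:, 1:3), 'rows');
subsetcount = accumarray(subset, 1);
subsetcount = subsetcount(subset);
probability = count ./ subsetcount(urow);
subset = subset(urow);

% Added check: sum the probabilities of the unique rows within each subset
total = accumarray(subset, probability);
tol = 1e-12;                        % assumed rounding tolerance
allSumToOne = all(abs(total - 1) < tol);
```

If allSumToOne is false, the subset ids and the unique-row ids have most likely been indexed inconsistently.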

Clarisha Nijman on 3 Nov 2018

Thanks a lot, Guillaume!
