How to subset a matrix based on the first 3 columns?

Hello, I am trying to find subsets (sub-matrices) of matrix A, based on the first 3 columns, and then compute probabilities. For such a small task the code I wrote looks tremendously long, and the results are not good at all! Is there a better way to do this in Matlab? Working with for loops and while loops is very difficult for me.
%given matrix
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
%Subsets deduced from A(i,1:3)= A(i+1,1:3)= A(i+2,1:3) B should be:
This part of the code works!
1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
1 2 3 3 4 2;
2 3 4 1 2 3;
2 3 4 2 3 4;
2 3 4 1 2 3;
1 4 3 2 3 4;
1 4 3 1 2 3;
1 3 4 3 2 4;
%final result matrix C with the probability of 1 element in the subset should be:
This is my problem! How to find the correct probabilities.
size(B,1)=4
1 2 3 2 3 4 2/4;
1 2 3 3 2 4 1/4;
1 2 3 3 4 2 1/4;
size(B,1)=3
2 3 4 1 2 3 2/3;
2 3 4 2 3 4 1/3;
size(B,1)=2
1 4 3 2 3 4 1/2;
1 4 3 1 2 3 1/2;
size(B,1)=1
1 3 4 3 2 4 1;
The code:
%add column to matrix for indicator variable
indicator=zeros(size( A,1),1);
A=[A indicator];
for i=1:size(A,1)
    if A(i,size(A,2))==0 %consider only not adjusted indicators
        k=0;
        while i+k<=size(A,1)%takes care that index is not exceeded
            if A(i,1:3)==A(i+k,1:3)
                A(i+k,size(A,2))=i;%indicator variable
            end
            k=k+1;
        end
    end
end
%add column to matrix for frequency in the subset
freq=zeros(size( A,1),1);
A=[A freq];
%start subsetting and compute the pdf
j=1;
while j<=max(A(:,size(A,2)-1))
    B=A(A(:,size(A,2)-1)==j,:);%save the j-th subset in B
    for i=1:size(B,1)
        if B(i,size(B,2))==0 %consider only not adjusted indicators
            k=0;
            while i+k<=size(B,1)%takes care that index is not exceeded
                if B(i,1:6)==B(i+k,1:6)
                    B(i+k,size(B,2))=i;%indicator variable
                    B
                    %subsetting to find frequencies
                    for v=1:max(B(:,size(B,2)))
                        C=B(B(:,size(B,2))==v,:);%save the j-th subset in B
                        %computing probability of each element in subset
                        for w=1:size(C,1)
                            C(w,size(C,2))= 1/ C(w,size(C,1));
                            C
                        end
                        for w=1:size(C,1)
                            z=1;
                            while z+w<size(C,1)
                                if C(w,1:6)==C(w+z,1:6)
                                    C(w,size(C,2))=C(w,size(C,2))+C(w+z,size(C,2));
                                    C(w+z,size(C,2))=0;
                                end
                                z=z+1;
                            end
                            %remove lines with probability zero
                            % Specify conditions, which rows should be
                            % removed
                            weg = C(:,size(C,2))==0;
                            % remove
                            C(weg,:) = [];
                            E=[E;C];
                        end
                    end
                end
                k=k+1;
            end
        end
    end
    j=j+1;
end
  3 Comments
JohnGalt on 1 Nov 2018
Agreed with Bruno. "Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities": find sub-matrices of what form? Computing probabilities of what?
Guillaume on 1 Nov 2018
My understanding is that all rows with identical columns 1 to 3 belong to the same subset. The probability of a row is the number of times it appears in the matrix divided by the number of rows in the subset it belongs to.
I too have not tried to understand the code.
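As a rough illustration of that definition (a sketch only, not taken from the original post), the probability of a single row can be checked by hand with ismember. The variable names row, occurrences and subsetsize are hypothetical, chosen for this example.

```matlab
% Matrix from the question
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];
row = [1 2 3 2 3 4];                                      % row to examine (hypothetical choice)
occurrences = sum(ismember(A, row, 'rows'));              % times the full row appears in A
subsetsize = sum(ismember(A(:, 1:3), row(1:3), 'rows'));  % rows sharing its first 3 columns
probability = occurrences / subsetsize                    % 2 occurrences / 4 rows = 0.5
```

This handles one row at a time, so it is only meant for checking the definition, not for processing the whole matrix.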


Accepted Answer

Guillaume on 1 Nov 2018
Edited: Guillaume on 2 Nov 2018
If I understood correctly:
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
[~, ~, uid] = unique(A, 'rows'); %get unique id for each row of A
count = accumarray(uid, 1); %get count of how many times each unique row of A appears
count = count(uid); %and assign to each row
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount; %calculate the probability of each row in its subset
%for pretty display
table(A, subset, probability)
I'm using accumarray to compute histograms. You could replace each instance of accumarray(x, 1) with histcounts(x, 'BinMethod', 'integers')' if that's clearer to you. The trailing ' transposes the row vector returned by histcounts into a column.
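A small sketch of that equivalence, assuming x contains consecutive positive integer ids starting at 1 (which is what the third output of unique produces). The vector x and the names viaAccumarray and viaHistcounts are made up for this illustration.

```matlab
x = [1; 1; 2; 3; 3; 3];                                  % example ids, as returned by unique
viaAccumarray = accumarray(x, 1);                        % column vector of counts per id
viaHistcounts = histcounts(x, 'BinMethod', 'integers')'; % row of counts, transposed to a column
sameCounts = isequal(viaAccumarray, viaHistcounts)       % true for ids of this form
```

If the ids did not start at 1, the two results would differ in length, because accumarray also creates entries for the missing smaller ids.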
  4 Comments
Guillaume on 2 Nov 2018
You'll notice I used meaningful names in my answer. I have no idea what D, E, F are in your code. Code whose variables have meaningful names is instantly easier to understand.
Note that the sort in unique(sort(x)) is pointless. unique does a sort anyway, unless you use the 'stable' option.
If you don't want the repeated rows in each subset, one method:
[rows, urow, uid] = unique(A, 'rows'); %get unique rows, where they come from, and unique id for each
count = accumarray(uid, 1); %histogram of rows, matches the rows variable
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount(urow);
%for pretty display
subset = subset(urow);
table(rows, subset, probability)
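One way to sanity-check a result of this kind (a sketch under the assumption that the probability is defined as above) is to confirm that the probabilities of the unique rows in each subset sum to 1. The names rowsubset and totals are hypothetical, introduced only for this check.

```matlab
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];
[rows, urow, uid] = unique(A, 'rows');          % unique rows and where they first occur
count = accumarray(uid, 1);                     % occurrences of each unique row
[~, ~, subset] = unique(A(:, 1:3), 'rows');     % subset id of every row of A
subsetcount = accumarray(subset, 1);            % number of rows in each subset
rowsubset = subset(urow);                       % subset id of each unique row
probability = count ./ subsetcount(rowsubset);  % probability of each unique row
totals = accumarray(rowsubset, probability);    % sum of probabilities per subset
assert(all(abs(totals - 1) < 1e-12), 'probabilities in a subset do not sum to 1');
```

The comparison uses a small tolerance rather than exact equality, since fractions such as 1/3 and 2/3 are not represented exactly in floating point.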
