How to aggregate one column according to the other column in MATLAB?

2 views (last 30 days)
I have .mat file, where are two columns "Product" and "Customer". Customer number is repeated as many times as he purchased different products. The table looks like that:
Product Customer
114 1
112 2
112 1
113 4
115 3
113 2
111 2
113 3
And I need to make it like this:
Customer 111 112 113 114 115
1 0 1 0 1 0
2 1 1 1 0 0
3 0 0 1 0 1
4 0 0 1 0 0
In new table have to be "Customer" column and five more columns for each product and if the customer "1" bought product "112" there should be 1 ind if he didn't buy it should be 0. How can I do it with MATLAB? Any help would be very nice!

Answers (2)

Stephen23
Stephen23 on 10 Apr 2015
Edited: Stephen23 on 10 Apr 2015
Sorry I do not have tables on my version of MATLAB, but you should be able to adapt this for your purposes. First define some simple numeric column vectors with the customer and product data:
>> Product = [114;112;112;113;115;113;111;113];
>> Customer = [1;2;1;4;3;2;2;3];
These can be obtained from a table using the syntax mytable.variable.
Then use unique to get the indices that correspond to the product codes:
>> [~,~,Y] = unique(Product);
And finally use sub2ind to locate the positions of the ones in the final matrix:
>> Z = zeros(max(Customer),max(Y));
>> X = sub2ind(size(Z),Customer,Y);
>> Z(X) = 1
Z =
0 1 0 1 0
1 1 1 0 0
0 0 1 0 1
0 0 1 0 0
Note that this assumes that the customers are numbered 1:N, otherwise you can use another unique call to get their indices too.

Jonathan Campelli
Jonathan Campelli on 10 Apr 2015
%Built the reference MAT file from which your table is pulled
product=[114,112,112,113,115,113,111,113]';
customer=[1 2 1 4 3 2 2 3]';
reftable=table(product,customer);
save('referencefile', 'reftable') %Save table 'reftable' to new MAT file 'referencefile'
%Pulled tabular data from MAT file.
prod=reftable.product(1:end); %Pull data from reftable column 'product'
cust=reftable.customer(1:end);%Pull data from reftable column 'customer'
Matrix=sortrows([cust,prod],2)%Combine 'prod' and 'cust', and sort by 'prod'
%I agree with Stephen Cobeldick's use to sub2ind
Logic=zeros(4,5); %Prefill 4(customers)x5(products) matrix 'Logic' with zeros
[~,~,indexed] = unique(Matrix(:,2),'stable') %Tag product numbers in order they appear
X = sub2ind(size(Logic),Matrix(:,1),indexed); %Return indices of all
%elements [r,c] in matrix with size 4x5 (like Logic) where r = Matrix(:,1)
%and c = indexed. Matrix(:,1) = customers assigned to the sorted product
%number column array found in Matrix(:,2).
Logic(X)=1;
%Turn matrix 'Logic' columns into individual variables as column vectors
Customer=sort(unique(customer));
n111=Logic(:,1);
n112=Logic(:,2);
n113=Logic(:,3);
n114=Logic(:,4);
n115=Logic(:,5);
%Build final table by pieceing the above variables together
JonsTable=table(Customer,n111,n112,n113,n114,n115)
  2 Comments
Stephen23
Stephen23 on 10 Apr 2015
Edited: Stephen23 on 10 Apr 2015
Note that:
  • a more robust solution is to skip that awkward table function and use array2table instead, directly on the variable Logic. This would avoid creating variables dynamically (which the current method would require for a general solution).
  • the name prod should not be used for any variable as it is the name of the inbuilt function prod.
  • the output of unique is sorted by default, so calling sort(unique(customer)) is superfluous.
  • the indexing (1:end) achieves absolutely nothing, as the syntax reftable.customer already returns the whole column from the table in that order.
  • concatenating the cust and prod vectors into one matrix and then running sortrows on this matrix is not required because the input order of subscript indices to the function sub2ind is completely irrelevant: they do not need to be sorted.
  • although MATLAB is case sensitive, it is usually considered bad practice to name variables that only differ by the character case: customer and Customer.
Walter Roberson
Walter Roberson on 10 Feb 2016
Jonathan Campelli comments,
Thanks for the solid advice, but too much testosterone. Instead of being pumped to tighten up my code, your remarks inspire a feeling of defensiveness: "Awkward", "achieves absolutely nothing", "completely irrelevant".

Sign in to comment.

Categories

Find more on Tables in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!