MATLAB Answers

Splitting a table into smaller ones based on two columns

8 views (last 30 days)
Caroline
Caroline on 2 Oct 2019
Commented: Adam Danz on 3 Oct 2019
Hello,
The code I use produces a master table with 8 columns and around 800 rows. I would like to sort the data in the following steps:
  1. Take first line of data from master table and look at the values in column six and eight.
  2. Find all other rows in the table with a column six value within 0.5 of the line one column six value and a column eight value within 2 of the line one column eight value.
  3. Create new table with this data.
  4. Delete sorted data from master table.
  5. Repeat 1-> 4 until no data left.
  6. Be left with multiple tables.
Is there a way to do this?
Thank you for your help.

  0 Comments

Sign in to comment.

Accepted Answer

Adam Danz
Adam Danz on 2 Oct 2019
Edited: Adam Danz on 3 Oct 2019
That can easily be done in a loop but it doesn't sound like a good idea. Breaking apart well-organized tabular data into sub-tables is like moving into a new house by unpacking each box in the driveway and carrying in each item from the box individually rather than just carying in the box. Keep the data together whenever possible.
Instead, each row of the table can be assigned a subgroup number and then you can use those row numbers to pull out data as needed.
Here's a functional demo with comments to illustrate this method. 'rowGroup' is used to identify subtable rows.
% Create demo data
T = array2table(rand(20,8).*2);
T{:,8} = T{:,8} * 5;
% Identify the group number of each row based on
% col 6 & 8 values and their given tolerance levels.
rowGroup = zeros(size(T,1),1); % This will store the group number for each row
while any(rowGroup==0)
% find next unassigned row, starting at the top
rowNum = find(rowGroup==0,1,'first');
% find all rows in col 6 that are within tolerance
group1idx = abs(T{rowNum,6} - T{:,6}) <= 0.5;
% find all rows in col 8 that are within tolerance
group2idx = abs(T{rowNum,8} - T{:,8}) <= 2.0;
% identify the rows that fit into this group
rowGroup(group1idx & group2idx & rowGroup==0) = max(rowGroup)+1;
end
% rowGroup is a column vector of row numbers that identify the subgroups.
% Your values will differ due to using random data
% >> rowGroup(1:5)
% ans =
% 1
% 2
% 3
% 4
% 4
% now you can access sub-groups of data like this
T(rowGroup==1,:) % for group 1
To see the number of subgroups and the number of rows within each subgroup,
subgroupSummary = table((min(rowGroup):max(rowGroup))', ...
histcounts(rowGroup,min(rowGroup):max(rowGroup)+1)', ...
'VariableNames', {'Group', 'nRows'})

  2 Comments

Caroline
Caroline on 3 Oct 2019
Thank you very much for your help this solves my problem.

Sign in to comment.

More Answers (1)

David K.
David K. on 2 Oct 2019
Edited: David K. on 2 Oct 2019
I would do it as such:
n = 1;
a = your table% I used table([1;2;3;4;5;],[1;5;1;3;1]) to sort of test
while ~isempty(a) % while your table is still empty
% here is finding the index of the logic you did - the second & might be an or (|) if
% you want all the rows that satisfy the col 7 or 8 requirements instead of col 7 AND 8 reqs
% these indices can also more cleanly be done the way adam does.
toRem = find(a.col7<a.col7(1)+0.5 & a.col7>a.col7(1)-0.5 & a.col8<a.col8(1)+2 & a.col8>a.col8(1)-2);
newTables{n} = a(toRem,:); % Put the found values into a new table inside a cell array
% putting them into a cell array is the easiest way to have different sized variables created in a loop
a(toRem,:)=[]; % remove from original table
n=n+1; % increment n
end
All of your resultant tables will now be in newTables.
I agree with Adam that this would not be a great idea in theory, but here is how you could do it if you still really want to sort them this way.

  1 Comment

Sign in to comment.

Sign in to answer this question.