# Extracting data based on categories

5 views (last 30 days)
Allie on 11 Jul 2019
Edited: Guillaume on 12 Jul 2019
I have a table (z) with two columns, year and annual precipitation. I have a separate vector (E) of three categories (0,1,2) that correspond to the years in the first table. The categories are unevenly spaced. I would like to create three new tables of year and annual precipitation, one for each category. How do I extract the correct year and precipitation data for each category and populate new tables?
z=['01-Jan-1973' 0.0114583333333333
'01-Jan-1974' 0.0918518518518519
'01-Jan-1975' 0.0529032258064516
'01-Jan-1976' 0.213571428571429
'01-Jan-1977' 0.0857692307692308]
E=[0 1 0 2 1]
I would like to create three new tables in the same format as z, like below.
A=['01-Jan-1973' 0.0114583333333333
'01-Jan-1975' 0.0529032258064516]
B=['01-Jan-1974' 0.0918518518518519
'01-Jan-1977' 0.0857692307692308]
C=['01-Jan-1976' 0.213571428571429]
My actual tables and vectors are 1x79, making extracting the three new tables challenging to do by hand.

Guillaume on 11 Jul 2019
Can you give an example of inputs and desired output. Preferably using valid matlab syntax.
Allie on 11 Jul 2019

Star Strider on 11 Jul 2019
The splitapply function (R2015b and later releases) is perfect for this!
z = {'01-Jan-1973' 0.0114583333333333
'01-Jan-1974' 0.0918518518518519
'01-Jan-1975' 0.0529032258064516
'01-Jan-1976' 0.213571428571429
'01-Jan-1977' 0.0857692307692308};
E=[0 1 0 2 1];
Out = splitapply(@(x){x}, z, E(:)+1);
A = Out{1}
B = Out{2}
C = Out{3}
producing:
A =
2×2 cell array
{'01-Jan-1973'} {[0.0114583333333333]}
{'01-Jan-1975'} {[0.0529032258064516]}
B =
2×2 cell array
{'01-Jan-1974'} {[0.0918518518518519]}
{'01-Jan-1977'} {[0.0857692307692308]}
C =
1×2 cell array
{'01-Jan-1976'} {[0.213571428571429]}

Show 1 older comment
Allie on 12 Jul 2019
When I used both codes on my data, split array only produced single column arrays with the precipitation data and not the associated dates. Could this be because the dates are in datetime format?
The accumarry function produced both columns I wanted!
Star Strider on 12 Jul 2019
I did not know your dates were in datetime format, so that is likely the reason the splitapply code failed.
I added the accumarray code later when I realised that you may not have splitapply. I’m glad it worked!
Guillaume on 12 Jul 2019
Note: If you are actually using a table (or a timetable, which would be better probably), the best solution might be not to split the table at all and just add the E as an extra table variable.
Matlab has plenty of aggregation functions (rowfun, varfun, splitapply as shown here, groupsummary) which allows you to apply the same function to each group of a table all at once. More often, the code is actually simpler and faster if the data is not split beforehand.
e.g:
%demo table
T = table(datetime({'01-Jan-1973'; '01-Jan-1974'; '01-Jan-1975'; '01-Jan-1976'; '01-Jan-1977'}), ...
[0.0114583333333333; 0.0918518518518519; 0.0529032258064516; 0.213571428571429; 0.0857692307692308], ...
'VariableNames', {'Date', 'Value'});
T.group = [0; 1; 0; 2; 1]
groupsummary(T, 'group', {'mean', 'sum', 'std'}, 'Value')
outputs:
ans =
3×5 table
group GroupCount mean_Value sum_Value std_Value
_____ __________ __________________ __________________ ___________________
0 2 0.0321807795698924 0.0643615591397849 0.0293059645132893
1 2 0.0888105413105414 0.177621082621083 0.00430106261490964
2 1 0.213571428571429 0.213571428571429 0