I need to break down the flow data into groups of data with months for every year.
13 views (last 30 days)
Show older comments
For example: All the entries for the month of Jan for all the years (suppose 2010-2022) are stored in one group. Similarly, all the entries for the month of Feb for all the years.(suppose 2010-2022) are stored in second group and so on. My end goal is to calculate the percentile of monthly data for all the years.
Date Flow
12/1/2010 100
12/2/2010 200
12/3/2010 150
12/4/2010 176
.....
.....
12/1/2022 145
I tried to create the vlookup function to create the matrix but i wasn't able to. So, I decided to group all months together and then perform operations on them.
0 Comments
Accepted Answer
Cris LaPierre
on 20 Jun 2022
6 Comments
Cris LaPierre
on 25 Jun 2022
I think the best approach here is to create a second table that is a subset of the full data set, but just contains the date ranges of interest. Then you can use groupsummary to compute the desired stats grouped by year. I don't know what you want to use for percentile, so I use 42.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1043800/Example.xlsx',...
'ReadVariableNames',false)
% Use 'day of year' to identify dates within the desired range for all years
drng = day(datetime(2004,9,27):caldays(1):datetime(2004,10,27),'dayofyear');
d = day(T.Var1,'dayofyear');
% Create subtable
idx= ismember(d,drng);
Tsub = T(idx,:)
% compute stats for each year
Tstats = groupsummary(Tsub,'Var1','year',{@(A) prctile(A,42),'max','min'},'Var2');
Tstats.Properties.VariableNames = ["Tyr","GroupCnt","Tmin","Tmax","Tpct"]
Cris LaPierre
on 26 Jun 2022
Leap years are going to cause some issues with my previous answer. Unfortunately, I couldn't find a slick way to do this, so here's a brute force approach.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1043800/Example.xlsx',...
'ReadVariableNames',false);
% Define range of interest
startD = datetime(2004,9,27);
stopD = datetime(2004,10,27);
% find data for all years between the indicated month/days
y = unique(year(T.Var1));
idx = zeros(height(T),1);
for i = 1:length(y)
idx = idx + isbetween(T.Var1,datetime(y(i),month(startD),day(startD)),...
datetime(y(i),month(stopD),day(stopD)));
end
Tsub = T(logical(idx),:)
% Compute summary stats
Tstats = groupsummary(Tsub,'Var1','year',{@(A) prctile(A,42),'max','min'},'Var2');
Tstats.Properties.VariableNames = ["Tyr","GroupCnt","Tmin","Tmax","Tpct"]
More Answers (0)
See Also
Categories
Find more on Data Preprocessing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!