Find max within due date and find max within due date by class section

7 views (last 30 days)
I am able to find the max of the date_submitted. How do I find the max of the date_submitted but within the Lab_due_date column? Also, how do I find the max of the date_submitted within the Lab_due_date by class_section?

Accepted Answer

Ayush Modi
Ayush Modi on 20 May 2024
Edited: Ayush Modi on 20 May 2024
Hi,
To find the latest date_submitted that is before the Lab_due_date for each class_section, you first need to format the data in the two column to be comparable. You can acheive this using datatime function.
data.date_submitted = datetime(data.date_submitted, 'InputFormat', 'MM/dd/yyyy HH:mm');
data.Lab_due_date = datetime(data.Lab_due_date, 'InputFormat', 'MM/dd/yyyy, hh:mm a zzz', 'TimeZone', 'America/Chicago', 'Locale', 'en_US');
data.Lab_due_date.TimeZone = '';% Group by user_id
Then, group the data by the desired category using findgroups function.
grouped = findgroups(data.user_id);
To get the date_submitted that is before the Lab_due_date:
uniqueGroups = unique(data.class_section); % Assuming class_section is a cell array of strings
maxDatesByGroup = NaT(size(uniqueGroups)); % Preallocate with Not-a-Time (NaT)
for i = 1:length(uniqueGroups)
% Filter data for the current group using strcmp for cell array comparison
currentGroupFilter = strcmp(data.class_section, uniqueGroups{i});
currentGroupData = data(currentGroupFilter, :);
% Ensure Lab_due_date and date_submitted are datetime arrays before comparison
validSubmissions = currentGroupData.date_submitted <= currentGroupData.Lab_due_date(1);
% Find the maximum date_submitted that meets the criteria
if any(validSubmissions)
maxDatesByGroup(i) = max(currentGroupData.date_submitted(validSubmissions));
end
end
Please refer to the following MathWorks documentation for more information on:
  3 Comments
Sunshine
Sunshine on 23 May 2024
Ok. I updated the Lab_due_date column to a comparable date/time format to date_submitted prior to importing the data to make it easier, I thought.
From there, I followed each of the lines provided. It seemed to work until I got to the for loop.
>> for i = 1:length(uniqueGroups)
% Filter data for the current group using strcmp for cell array comparison
currentGroupFilter = strcmp(data.class_section, uniqueGroups{i});
currentGroupData = data(currentGroupFilter, :);
% Ensure Lab_due_date and date_submitted are datetime arrays before comparison
validSubmissions = currentGroupData.date_submitted <= currentGroupData.Lab_due_date(1);
% Find the maximum date_submitted that meets the criteria
if any(validSubmissions)
maxDatesByGroup(i) = max(currentGroupData.date_submitted(validSubmissions));
end
end
I received the error: Error using indexing
Cell contents reference from a non-cell array object.
So I updated this line: currentGroupFilter = strcmp(data.class_section, uniqueGroups{i});
to use parentheses instead of curly brackets. So now, I have
currentGroupFilter = strcmp(lab31437lowerparticipantscopyupdate.class_section, uniqueGroups(i));
But then I get the error: Index exceeds the number of array elements. Index must not exceed 0.
I'm not quite sure how to fix this. How do I fix this?
Sunshine
Sunshine on 23 May 2024
Ok. After working with the code, I see that I needed to make the class_section column a string. It was imported as categorical, once I did that, I am able to follow the code and get the max dates by class_section but within the Lab_due_date. Thank you, thank you, thank you!
So now, how do I automate finding the max of the date_submitted but within the Lab_due_date column, meaning not by class_section but just the overall max date but within the Lab_due_date?

Sign in to comment.

More Answers (1)

Peter Perkins
Peter Perkins on 29 May 2024
This is much simpler than the accepted answers suggests. And you want to use categorical for categorical data, not text.
%t = readtable("lab3_14_37_lower_participants_copy.csv");
%t.date_submitted.TimeZone = "America/Chicago";
%t.Lab_due_date = datetime(t.Lab_due_date,InputFormat="MM/dd/uuuu, HH:mm a z",TimeZone="America/Chicago")
I'm gonna assume you don't want to work with time zones; your updated file does not contain the CDT offsets.
t = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/1702791/lab3_14_37_lower_participants_copy_update.csv");
t = convertvars(t,["user_id" "email" "class_section"],"categorical");
Your file has exactly one unique due date. Let's add some more data.
n = 50;
t = [t; t; t]
t = 150x8 table
user_id email class_section date_submitted is_submission score max_score Lab_due_date _______ ________ _______________ ________________ _____________ _____ _________ ________________ 226340 Student1 section 001/011 09/18/2023 19:26 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:27 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:36 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:41 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:48 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:16 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:20 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:31 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:32 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:35 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:38 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:40 1 0 5 09/27/2023 23:59 1387758 Student2 section 004/014 12/12/2023 05:05 1 5 5 09/27/2023 23:59 1389302 Student3 section 004/014 12/08/2023 01:42 1 5 5 09/27/2023 23:59 1565527 Student4 section 003/013 09/20/2023 17:30 1 0 5 09/27/2023 23:59 1565527 Student4 section 003/013 09/20/2023 17:32 1 0 5 09/27/2023 23:59
t.date_submitted(51:100) = t.date_submitted(51:100) + calmonths(1);
t.Lab_due_date(51:100) = t.Lab_due_date(51:100) + calmonths(1);
t.date_submitted(101:150) = t.date_submitted(101:150) + calmonths(2);
t.Lab_due_date(101:150) = t.Lab_due_date(101:150) + calmonths(2);
Now that that's done, computing the max submit dates is one line. No loops.
groupsummary(t,"Lab_due_date","day","max","date_submitted")
ans = 3x3 table
day_Lab_due_date GroupCount max_date_submitted ________________ __________ __________________ 27-Sep-2023 50 12/12/2023 05:05 27-Oct-2023 50 01/12/2024 05:05 27-Nov-2023 50 02/12/2024 05:05
With two grouping vars, still one line.
groupsummary(t,["Lab_due_date" "class_section"],["day" 'none'],"max","date_submitted")
ans = 15x4 table
day_Lab_due_date class_section GroupCount max_date_submitted ________________ _______________ __________ __________________ 27-Sep-2023 section 001/011 21 09/25/2023 03:58 27-Sep-2023 section 002/012 3 09/19/2023 19:33 27-Sep-2023 section 003/013 14 09/24/2023 22:19 27-Sep-2023 section 004/014 10 12/12/2023 05:05 27-Sep-2023 section 005/015 2 09/25/2023 03:54 27-Oct-2023 section 001/011 21 10/25/2023 03:58 27-Oct-2023 section 002/012 3 10/19/2023 19:33 27-Oct-2023 section 003/013 14 10/24/2023 22:19 27-Oct-2023 section 004/014 10 01/12/2024 05:05 27-Oct-2023 section 005/015 2 10/25/2023 03:54 27-Nov-2023 section 001/011 21 11/25/2023 03:58 27-Nov-2023 section 002/012 3 11/19/2023 19:33 27-Nov-2023 section 003/013 14 11/24/2023 22:19 27-Nov-2023 section 004/014 10 02/12/2024 05:05 27-Nov-2023 section 005/015 2 11/25/2023 03:54
  2 Comments
Sunshine
Sunshine on 29 May 2024
Edited: Sunshine on 29 May 2024
You are correct. I removed the timezone portion of the data. It just seemed easier to remove it. What you have is great and almost supplies what I am attempting to get. For now, I want to look at all dates submitted within the due date, and only those within the due date. If a date submitted is beyond the due date, I don't want to return that info in the results currently. I want to return the closest submitted date within the due date. How would I exclude data in which the date submitted is beyond the due date but return the next closest date that's within the due date? For instance, in "https://www.mathworks.com/matlabcentral/answers/uploaded_files/1702791/lab3_14_37_lower_participants_copy_update.csv", in section 004/014, the max date submitted is 12-Dec-2023 05:05:00 and the next max date is 08-Dec-2023 01:42:00. Both are beyond the due date of 27-Sep-2023 23:59:00. Just eyeballing the 50 rows in the csv, it looks like 24-Sep-2023 02:27:00 is the max within the due date for section 004/014. That's the max date I want to show for section 004/014. How do I return only the max dates within the due date for the 50 submissions and then by section?
Peter Perkins
Peter Perkins on 29 May 2024
doc groupsummary: "You also can specify method as a function handle that returns one output per group whose first dimension has length 1. For table input data, the function operates on each table variable separately."

Sign in to comment.

Categories

Find more on Dates and Time in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!