Would like a script that removes repeat data

I'm looking to create a script that removes dates that repeat one after the other. For some reason, the program I used to collect the data sometimes sends a prompt twice on the same day, but I only want the prompt to be sent once. I want those repeated dates deleted. For example:
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21']
Dates_Wrong = 5×6 char array
    '2/4/21'
    '2/5/21'
    '2/5/21'
    '2/6/21'
    '2/7/21'
You can see here, the 2/5/21 date repeats. I would like to create a script that eliminates that repeat data.
The hard part is that I can't just call unique(x) on the entire dates column, because different subjects have the same dates, and that is why I'm having trouble. The script has to identify two identical dates in sequence and remove the second one. Here is what the previous dates would look like with the repeated date removed:
Dates_Right = ['2/4/21';'2/5/21';'2/6/21';'2/7/21']
Dates_Right = 4×6 char array
    '2/4/21'
    '2/5/21'
    '2/6/21'
    '2/7/21'
This is sort of what I was thinking of doing, but I'm not sure if it makes sense:
for x=1:length(MorningPrompt.SurveyStartedDate)
    % This is where I'm having trouble. I think the rest of the script is fine,
    % but I'm not sure how to make this test compare the date strings, since x
    % is only the loop index and not the actual string stored in that variable.
    if x-1==x
        MorningPrompt(x,:) = [];
    end
end

Accepted Answer

Steven Lord on 23 Sep 2022
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21']
Dates_Wrong = 5×6 char array
    '2/4/21'
    '2/5/21'
    '2/5/21'
    '2/6/21'
    '2/7/21'
dt = datetime(Dates_Wrong, 'InputFormat', 'M/d/yy')
dt = 5×1 datetime array
   04-Feb-2021
   05-Feb-2021
   05-Feb-2021
   06-Feb-2021
   07-Feb-2021
differences = diff(dt)
differences = 4×1 duration array
   24:00:00
   00:00:00
   24:00:00
   24:00:00
repeated = differences ~= 0
repeated = 4×1 logical array
   1
   0
   1
   1
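Note that differences and repeated are both one element shorter than dt. Despite its name, repeated is true wherever a date differs from the one before it, so it marks the elements to keep. Add a true as the first element if you always want to keep the first date of each run of repeats, or as the last element if you want to keep the last.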
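As a minimal sketch of that padding step, using the sample data from the question: the variable names changed, keepFirst, and keepLast are illustrative and not part of the original answer.

```matlab
% Sample data from the question: one date per row of a char array
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21'];
dt = datetime(Dates_Wrong, 'InputFormat', 'M/d/yy');

% changed(i) is true when dt(i+1) differs from dt(i)
changed = diff(dt) ~= 0;

% Prepend true: element 1 is always kept, so the FIRST date of each run survives
keepFirst = [true; changed];

% Append true: the final element is always kept, so the LAST date of each run survives
keepLast = [changed; true];

% Logical indexing on the rows of the original char array
Dates_Right = Dates_Wrong(keepFirst, :);
```

For plain dates both masks give the same result. The choice only matters when the rows carry other data, such as the rest of a survey record, that differs between the two repeats.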
  1 Comment
BA on 23 Sep 2022
Thank you! This is wonderful.
I just had a few questions.
1) Since I want the first value of each of the repeats, would I just have to set the last element of the logical array to 1?
2) For the logical indexing, this is the command I'm using. I think it works, but it's the first time I've used logical indexing, so I'm not sure:
%My code adapted using your code
Dates = Dataset.Dates;
dt = datetime(Dates, 'InputFormat', 'M/d/yy');
differences = diff(dt);
repeated = differences ~= 0
%Indexing
NonRepeats = Dataset(repeated(:,1)==1, :);
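One caveat about the indexing above: the mask from diff is one element shorter than the table has rows, so it does not line up row for row until it is padded. The sketch below pads it to keep the first row of each run. It also compares a subject column so that a date shared by two different subjects is not treated as a repeat. The table contents and the Subject variable are hypothetical stand-ins, since the real Dataset is not shown.

```matlab
% Hypothetical stand-in for Dataset; the Subject column is an assumption
Subject = [1; 1; 1; 2; 2];
Dates = {'2/4/21'; '2/5/21'; '2/5/21'; '2/5/21'; '2/6/21'};
Dataset = table(Subject, Dates);

dt = datetime(Dataset.Dates, 'InputFormat', 'M/d/yy');

% A row starts a new run if its date OR its subject differs from the previous row
newDate = diff(dt) ~= 0;
newSubject = diff(Dataset.Subject) ~= 0;

% Prepend true so the mask has one entry per row and row 1 is always kept
keep = [true; newDate | newSubject];

% Keep the first row of each run; the mask is already logical, so no ==1 is needed
NonRepeats = Dataset(keep, :);
```

Here the third row (subject 1, second 2/5/21) is dropped, while the fourth row (subject 2, 2/5/21) is kept because the subject changed.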

Release

R2022a
