MATLAB Answers

Subset variable data in one .csv based on values from same variable in a second .csv

2 views (last 30 days)
Louise Wilson
Louise Wilson on 20 Apr 2020
Commented: Peter Perkins on 13 May 2020
I have a large .csv file (~7GB) where the first column is filled with datetimes, and the remaining columns filled with data corresponding to that datetime.
e.g.:
20191025241533, 65.2, 35.2, 75.4
20200122061522, 32.2, 74.5, 61.3
20190411075120, 54.1, 87.0, 22.1
I then have a second smaller .csv which contains a list of datetimes which are of interest to me.
e.g.
20191025241533
20191025241533
I would like to subset the first large .csv based on the datetime values in the second, so I would only keep rows with datetimes present in the second .csv.
Secondly, I would like to use the datetime values in the second smaller .csv to extract files (where the filename is datetime) from a larger folder of files into a separate folder which contains only those files that have the same filename (datetime) as those recorded in the smaller .csv.
Thanks for your help!

  11 Comments

Show 8 older comments
Louise Wilson
Louise Wilson on 6 May 2020
Hi Peter,
Thank you. Yep, I have used withtol as it seems there is a slight difference in the datetimes.
I will use readtable from now on and I'm using head() to preview.
The large .csv is a spreadsheet which contains frequency values in the first row (1-7200Hz) and datenums in the first column. These datenums once converted to datetime correspond to the filenames of different .wav files e.g.yymmddHHMMss.wav
I wrote these files using datenum as the time vector as this is what a previous colleague had done and I've been building on their code. These spreadsheet is the output of an acoustic analysis which takes days to write so I'm afraid I can't go back and re-do them all as it would be so time consuming. So I have to work with this datenum in the first column. I converted the datenum to datetime and then changed the table to a timetable.
The small file contains the .wav file names in the first column and these are in text form as you say but correspond to datetimes.
Why do you advise not to use datenum?
I had used withtol and a tolderance of 5 seconds which got me a believable amount of matches. Using ismember I get a much smaller number of matches so it looks like there is a bit of a difference, something I'll investigate as I'd expect them to be the same.
Louise Wilson
Louise Wilson on 6 May 2020
I have just realised that the reason they don't match is because when I convert from datenum to datetime, the timing is off by 1 second. Is there a way to fix that?
Peter Perkins
Peter Perkins on 13 May 2020
They are almost certainly not off by 1 sec, but rather off by a small negative amount, and the display is off. Add .SSS to the3 display format and you will see. Because datenum counts in "days", they incur round-off for almost every value that is not a whole day, or at some "nice" whole hour. It's that round-off that is causing this. datetime tries to account for that, but it can't possibly be perfect.
So ... this is one of the reasons why not to use datenums. datetime has been around since R2014b. It's a better choice. If you have no control over the file, and you know the time stamps are to a resolution of whole seconds, use dateshift to round your datetimes.

Sign in to comment.

Answers (0)

Products


Release

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!