How do query timestamps when reading fileDatastore
Show older comments
Hello,
I am saving a substantial amount of RAW sensor data at specific intervals into individual files. Then, I am using `fileDatastore` to read all those files and compile them into one variable, as demonstrated below. The issue I'm facing is that I have data spanning over a year, but I only want to read the last three months of data. How can I modify my code so that once it starts reading a file, if it encounters timestamps older than three months, it stops progressing through that datastore?
The purpose of this is to speed up my data compilation by avoiding unnecessary data reads.
fds = fileDatastore("Raw Sensor Data\"+deviceid,"ReadFcn",@load,"FileExtensions",".mat")
readlist=readall(fds)
readlist_size=size(readlist);
temp_live_data_downloaded_data=[];
temp_ref_data_timestamp=[];
temp_ref_data_sensordata=[];
for packet = 1:readlist_size(1)
temp_live_data_downloaded_data =[temp_live_data_downloaded_data readlist{packet}.live_data_downloaded_data];
temp_ref_data_timestamp =[temp_ref_data_timestamp readlist{packet}.ref_data_timestamp];
temp_ref_data_sensordata =[temp_ref_data_sensordata readlist{packet}.ref_data_sensordata];
end
3 Comments
Star Strider
on 24 Jan 2024
If I understand this correctly, the problem is not with datastore, and instead with loading the individual files. And that depends on what are in the .mat files.
Depending on how the files are saved, they would likely have to be completely loaded and then read. You could extract the requisite data from each file using the appropriate logic (perhaps using isbetween), however the entire file would probably have to be loaded first. The matfile function might be an alternative option, however you would have to experiment with that.
Dharmesh Joshi
on 24 Jan 2024
Star Strider
on 24 Jan 2024
Are the timestamps part of the file name?
Answers (1)
Walter Roberson
on 24 Jan 2024
fds = fileDatastore("Raw Sensor Data\"+deviceid,"ReadFcn",@load,"FileExtensions",".mat")
Instead of passing in the directory to fileDatastore(), you would pass in a cell array of validated file names.
rdpath = "Raw Sensor Data";
dinfo = dir(fullfile(rdpath, deviceid, "*.mat"));
filenames_only = {dinfo.name};
%ADJUST THIS SECTION
%for the sake of arguement assume the first 8 characters of the file name
%are constant and the timestamp portion ends with "_"
timestamps = regexp(filenames_only, '^(?<=.{8})[^_]*', 'once');
ts_datetime = datetime(timestamps);
now = datetime('now');
now_minus_3 = now - calmonths(3);
mask = isbetween(timestamps, now_minus_3, now);
dinfo = dinfo(mask);
selected_fullnames = fullfile({dinfo.folder}, {dinfo.name});
fds = fileDatastore(selected_fullnames, "ReadFcn", @local);
The portion marked ADJUST THIS SECTION needs to be adjusted to suit your purposes. The code as-is assumes that the first 8 characters of the file name are to be ignored, and that then there is a timestamp that ends at the first "_", and that timestamp can be fed into datetime() directly. This is example code that needs to be changed according to the actual filename structure you have.
Categories
Find more on Big Data Processing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!