datastore=detectImportOptions Pro Max?
I compared datastore and detectImportOptions, and datastore seems more powerful: it appears to cover almost everything detectImportOptions does.
1. datastore can read multiple files, or specified files from different folders, at once, while detectImportOptions works on only one file at a time.
2. readall on a datastore automatically concatenates the data from multiple files.
3. detectImportOptions can set MissingRule, but after T = readall(ds) (where ds is the datastore), missing values can also be handled with anymissing | rmmissing | fillmissing | missing | isnan | ismissing.
It seems there is nothing detectImportOptions can do that datastore cannot. Is there any situation where only detectImportOptions can be used and datastore cannot?
In that case, should detectImportOptions be combined with datastore?
Can fds = fileDatastore(location,"ReadFcn",@fcn) be used to read a file with a specific format, instead of using detectImportOptions?
I think this comment is very helpful. Thanks a lot to Walter Roberson for the help!
Answers (1)
Walter Roberson
on 22 Jul 2023
When you call readmatrix() or readtable() on a file, the import options are now detected automatically. But the automatically detected options are not always the correct ones for the situation. Sometimes you need to call detectImportOptions() to get a basic options object, modify the detected options, and then pass the modified options to the appropriate reading routine.
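As a rough illustration of that detect, modify, read sequence, here is a minimal sketch. The file name, the column names id and value, and the sample data are all made up for the example; the assumption is that automatic detection would treat the zero-padded id column as numeric, which is the kind of guess you might want to override.

```matlab
% Write a small sample file so the sketch is self-contained.
% The file name and its contents are hypothetical.
sampleFile = fullfile(tempdir, 'sample_ids.csv');
fid = fopen(sampleFile, 'w');
fprintf(fid, 'id,value\n007,1.5\n012,2.5\n');
fclose(fid);

% Step 1: let MATLAB guess the import options for this file.
opts = detectImportOptions(sampleFile);

% Step 2: adjust the guess. Here the id column is forced to string
% so that leading zeros are kept instead of being parsed as numbers.
opts = setvartype(opts, 'id', 'string');

% Step 3: pass the modified options to the reading routine.
T = readtable(sampleFile, opts);
disp(T)
```

Other properties of the options object (for example MissingRule or the selected variable names) can be adjusted in the same way before the call to readtable().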
The default reading routines for datastores use the default options, so they might not always read the data correctly. However, if you are aware that this is happening, you can specify a custom reading function that takes appropriate steps to read the data correctly.
detectImportOptions is a utility routine that was never intended to manage sets of data, and never intended to read the data and make it available: it is only intended to give good guesses about the format of specific data files in order to inform reading routines such as readtable().
5 Comments
Walter Roberson
on 23 Jul 2023
When datastore() internally calls readmatrix() or readtable(), those routines call detectImportOptions() or similar routines. The detection of the import options can be relatively expensive: the detection functions will read up to the first 100 megabytes to try to guess the file format accurately.
Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, make small adjustments (like setting variable types or setting datetime time zones or formats), and store the resulting import options. Then configure the reading routine (which datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file.
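One possible way to set that up is sketched below, using fileDatastore with a custom ReadFcn that reuses a single stored options object. The folder name, file names, and column names are invented for the example, and the sketch assumes every file has the same layout as the representative one.

```matlab
% Create two small CSV files in a temporary folder (hypothetical data).
folder = fullfile(tempdir, 'ds_opts_demo');
if ~isfolder(folder)
    mkdir(folder);
end
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('part%d.csv', k)), 'w');
    fprintf(fid, 'id,value\n%03d,%.1f\n%03d,%.1f\n', ...
        k, k + 0.5, k + 10, k + 1.5);
    fclose(fid);
end

% Detect the import options once, on one representative file,
% and make the adjustments that are needed.
opts = detectImportOptions(fullfile(folder, 'part1.csv'));
opts = setvartype(opts, 'id', 'string');

% Every file is read with the stored options, so the format is not
% guessed again per file. UniformRead makes readall concatenate the
% per-file tables vertically.
fds = fileDatastore(folder, ...
    'ReadFcn', @(f) readtable(f, opts), ...
    'FileExtensions', '.csv', ...
    'UniformRead', true);
T = readall(fds);
disp(T)
```

If the files do not all share one layout, a single stored options object is not enough, and the read function would need to pick or detect options per file.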
This can be especially important for efficiency if you are reading from a network drive such as OneDrive or Google Drive, as reading from network drives can be fairly slow.