MATLAB readmatrix inconsistently reading CSV files

I'm using MATLAB's readmatrix function to read data from a CSV file and store it in a variable. The CSV files are identical in format, with a block of text lines at the start before the data begins on line 21. However, readmatrix seems to behave inconsistently: sometimes it captures all the text at the start of the file and stores it as NaN, and other times it ignores the header lines and grabs only the data. Why is this? What is a better way to do this?
  7 Comments
Christian Taylor on 24 Aug 2023
Update: I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour.
Stephen23 on 24 Aug 2023
Edited: Stephen23 on 24 Aug 2023
"I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour."
Yes, differences between the files are most likely the cause.
Of course the algorithm used by READTABLE et al. is not perfect (there is no such thing) and it cannot read minds: what is obvious to a human is not obvious to a machine. It is always possible to trick or confuse an algorithm with the right combination of data; such things are mathematically unavoidable.
Note that relying on what files "look like" in MS Excel is the number one mistake to avoid: MS Excel mangles data in all sorts of horrible ways that are invisible from inside Excel, e.g. adding or changing delimiters. It can also change data without any warning.
If you want reliable data processing do NOT open and save text files using MS Excel. It is a great tool for Excel spreadsheets... but for anything else... beware of dragons!

Accepted Answer

Steven Lord on 24 Aug 2023
If you know exactly how many header lines your file contains, I would specify the NumHeaderLines name-value argument in your readmatrix call.
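As a rough sketch of that first option: the snippet below builds a small throwaway CSV (the file name and contents are made up for illustration, not taken from the original files) with 20 lines of text followed by numeric data, then reads it with NumHeaderLines set to 20, on the assumption that "data starts at line 21" means there are 20 lines to skip.

```matlab
% Build a sample file: 20 lines of header text, then numeric data.
% The file name and contents are hypothetical, for illustration only.
fname = fullfile(tempdir, 'sample_with_header.csv');
fid = fopen(fname, 'w');
for k = 1:20
    fprintf(fid, 'Header text line %d\n', k);
end
% Writes the rows 1,2,3 / 4,5,6 / 7,8,9
fprintf(fid, '%d,%d,%d\n', [1 2 3; 4 5 6; 7 8 9].');
fclose(fid);

% Data starts on line 21, so tell readmatrix to skip the 20 lines before it
% rather than letting it guess where the header ends.
M = readmatrix(fname, 'NumHeaderLines', 20);
disp(size(M))
```

If the header lines were still being picked up, M would have extra leading rows of NaN, so checking its size against the number of data rows you expect is a quick sanity test.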
Alternately you can create a file import options object using detectImportOptions. Once it's been created, check that the properties specifying where the data is located (either DataRange or DataLines) and where any variable metadata is located (VariableNamesLine, VariableDescriptionsLine, VariableUnitsLine, or the corresponding Range properties for SpreadsheetImportOptions) match what you expect from the format of your files. Once you've confirmed that they match your expectations, pass that import options object into readmatrix as the opts input argument.
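A minimal sketch of the import options route, again on a made-up sample file with 20 header lines. The fallback branch, which re-runs detection with NumHeaderLines and Delimiter hints when the detected data location doesn't match, is just one possible way to handle a mismatch and is an assumption on my part; what detection reports for your own files may differ.

```matlab
% Build a sample file: 20 lines of header text, then numeric data.
% The file name and contents are hypothetical, for illustration only.
fname = fullfile(tempdir, 'sample_for_opts.csv');
fid = fopen(fname, 'w');
for k = 1:20
    fprintf(fid, 'Header text line %d\n', k);
end
fprintf(fid, '%d,%d,%d\n', [1 2 3; 4 5 6; 7 8 9].');
fclose(fid);

% Let MATLAB detect the layout, then inspect what it decided.
opts = detectImportOptions(fname);
disp(opts.DataLines)          % first and last line of the data block
disp(opts.VariableNamesLine)  % 0 means no variable-names line was detected

% Compare against what we know about the file (data starts on line 21).
expectedFirstDataLine = 21;
if opts.DataLines(1) ~= expectedFirstDataLine
    % Detection disagreed with the known format, so detect again with hints.
    opts = detectImportOptions(fname, ...
        'NumHeaderLines', expectedFirstDataLine - 1, ...
        'Delimiter', ',');
end

% Read using the checked options object.
M = readmatrix(fname, opts);
disp(size(M))
```

Reusing one checked options object for every file in a batch should also make the reads consistent, since nothing is being re-detected per file.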
If the import options properties don't match what you expect, and reviewing the file doesn't indicate to you why MATLAB is detecting the values for those properties that it is, please send a sample data file that demonstrates this behavior to Technical Support using this link along with the import options object and describe the results you expect. It's possible that you've identified a bug or an ambiguous edge case in the import options detection algorithm.

Release

R2021a
