Issue with data from .csv being read as string when some are read as a number. Both columns contain commas...

46 views (last 30 days)
Hi all,
importing data from a .csv file as shown:
Here you can see that each column (aside from the initial datetime column) utilize commas to indicate thousands place, however, when I read it into matlab via the readtable function, it loads the data and then I convert each column to it's own variable using table2array.
There are two issues:
1: Columns 2, 4, 5 are reading anything with a comma as NaN, and I'm assuming that is attributed to the comma so I need to find the most efficient means of eliminating that comma (my intuition is to use a for loop to cycle through each element and to dynamically reconstruct) This is a horrid idea considering I have about 170000 data points to iterate through. Any idea for the most efficient method to eliminate the comma so I'm not losing any data to NaN's?
2: Columns 3,6,7 are being read as strings, so they are being stored as cells, which need to be converted to numerical datatypes... I'll be researching this next but I figured I'd mention this since I was asking about issue 1 already. My thought is to convert column's 3, 6, 7 to doubles, but that will lead anything with a comma to be read as a NaN, so I'll apply the fix for issue one.
Finally, I'm curious as to why some are being read as strings and some are being read as doubles when importing the data via readtable. I would imagine all of them would be read as doubles... Quite confused as to what MATLAB is doing here.
Here is an image of the data as read by MATLAB:
Any help is appreciated!

Accepted Answer

Walter Roberson
Walter Roberson on 10 Sep 2020
MATLAB tries to convert entire columns to numeric with str2double. If none of the conversions work (all nan results) then matlab stores the unconverted text. If some of the conversions work then matlab stores the converted results, getting nan for the unconvertable entries, which is the entries that include comma.
I will check later to see if I can find any comma processing.
  1 Comment
Walter Roberson
Walter Roberson on 12 Sep 2020
In fairly recent versions of MATLAB, readtable() can now process text files that use comma as decimal point. If it does not automatically detect the comma, then use the option 'decimal', 'comma'
This does not apply to spreadsheets (such as .xlsx)

Sign in to comment.

More Answers (1)

Cmullica
Cmullica on 10 Sep 2020
Update: I've rectified the issue by just formatting it within the .csv in excel... fixed all of my issues, however, I am still curious as to how this would be done from within matlab if I didn't have access to the excel sheet.

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!