Issue with data from .csv being read as string when some are read as a number. Both columns contain commas...
46 views (last 30 days)
Show older comments
Cmullica
on 10 Sep 2020
Commented: Walter Roberson
on 12 Sep 2020
Hi all,
importing data from a .csv file as shown:
Here you can see that each column (aside from the initial datetime column) utilize commas to indicate thousands place, however, when I read it into matlab via the readtable function, it loads the data and then I convert each column to it's own variable using table2array.
There are two issues:
1: Columns 2, 4, 5 are reading anything with a comma as NaN, and I'm assuming that is attributed to the comma so I need to find the most efficient means of eliminating that comma (my intuition is to use a for loop to cycle through each element and to dynamically reconstruct) This is a horrid idea considering I have about 170000 data points to iterate through. Any idea for the most efficient method to eliminate the comma so I'm not losing any data to NaN's?
2: Columns 3,6,7 are being read as strings, so they are being stored as cells, which need to be converted to numerical datatypes... I'll be researching this next but I figured I'd mention this since I was asking about issue 1 already. My thought is to convert column's 3, 6, 7 to doubles, but that will lead anything with a comma to be read as a NaN, so I'll apply the fix for issue one.
Finally, I'm curious as to why some are being read as strings and some are being read as doubles when importing the data via readtable. I would imagine all of them would be read as doubles... Quite confused as to what MATLAB is doing here.
Here is an image of the data as read by MATLAB:
Any help is appreciated!
0 Comments
Accepted Answer
Walter Roberson
on 10 Sep 2020
MATLAB tries to convert entire columns to numeric with str2double. If none of the conversions work (all nan results) then matlab stores the unconverted text. If some of the conversions work then matlab stores the converted results, getting nan for the unconvertable entries, which is the entries that include comma.
I will check later to see if I can find any comma processing.
1 Comment
Walter Roberson
on 12 Sep 2020
In fairly recent versions of MATLAB, readtable() can now process text files that use comma as decimal point. If it does not automatically detect the comma, then use the option 'decimal', 'comma'
This does not apply to spreadsheets (such as .xlsx)
More Answers (1)
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!