Why does Readtable return NaN for values greater than 1000?
9 views (last 30 days)
Show older comments
I was working with a dataset that contains the annual GDP of all countries in billion USD. However, when importing the dataset with readtable, the corresponding table omits all data points that are greater than 1000.
fileName2 = 'GDP by Country 1999-2022.csv';
GDP = readtable(fileName2);
The following is the original CSV data (see Canada)
Whereas the table GDP that readtable produces is as below
Is there a reason for this happening?
4 Comments
Stephen23
on 27 Nov 2023
"the authors used double-quotes for any number greater than 1000"
Ugh.
"Any suggestions on how to convert these strings to double while importing?"
Upload a sample data file by clicking the paperclip button.
Accepted Answer
Stephen23
on 27 Nov 2023
Edited: Stephen23
on 27 Nov 2023
Ugh, double-quoting number values really is an abuse of CSV norms. Note that is not possible to simply remove/ignore the double quotes globally (i.e. throughout the file), because
- they are also significant in the first column of text, e.g. "Bahamas, The"
- the thousands separator is the same as the field delimiter character.
So we need a way of removing/ignoring them only from numeric values... and to know if they are numeric requires removing them first before testing if they are numeric so that the quotes can be removed... catch 22.
Ugh! The only reason someone would go out of their way to make that file more complex to parse (i.e. by adding superfluous quotes) is if they want to make everyone else's lives harder. Avoid the person who designed that file.
One approach to importing that very unfortunate datafile would be to leverage STR2DOUBLE, which by default will accept the "," thousands delimiter. We just need to merge its output with the (already) numeric data:
C = readcell('GDP by Country 1999-2022.csv');
M = str2double(C);
X = isnan(M);
X(:,1) = false;
M(X) = [C{X}] % all data are now numeric (just ignore the first column)
M(24,2:end) % check the first row with double-quoted numbers
And if you really want it in a table:
T = array2table(M(2:end,2:end), 'VariableNames',string(M(1,2:end)), 'RowNames',C(2:end,1))
T{'Brazil',:} % check the first row with double-quoted numbers
More Answers (1)
See Also
Categories
Find more on Cell Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!