Error in formatting csv files imported from python

4 views (last 30 days)
Hi,
I am trying to format two csv files imported from Python. The following set of commands work fine for one csv file but raises error with another csv file. The two csv files are of the same type, the only difference are the data. I am confused as to why one is working while the other doesn't work.
rawdata = readcell('medcoordsdof.csv'); % Use readcell
data = cellfun(@(x) strtrim(erase(x,{'[' ']'})),rawdata(2,:),'uni',0); % Remove the characters such as '[' & ']'
datacell = cellfun(@(x) str2double(strsplit(x,' ')),data,'uni',0);
The above code works fine with the attached medcoordsdof.csv file.
However, the same lines of code raises error for the adcoordsdof.csv file. For example,
rawdata = readcell('adcoordsdof.csv'); % Use readcell
data = cellfun(@(x) strtrim(erase(x,{'[' ']'})),rawdata(2,:),'uni',0); % Remove the characters such as '[' & ']'
datacell = cellfun(@(x) str2double(strsplit(x,' ')),data,'uni',0);
raises the error:
Error using erase (line 40)
First argument must be text.
Error in Bfield>@(x)strtrim(erase(x,{'[',']'})) (line 4)
data = cellfun(@(x) strtrim(erase(x,{'[' ']'})),rawdata(2,:),'uni',0); % Remove the characters such as '[' & ']'
Error in Bfield (line 4)
data = cellfun(@(x) strtrim(erase(x,{'[' ']'})),rawdata(2,:),'uni',0); % Remove the characters such as '[' & ']'
Any help or suggestion would be extremely useful and will be greatly appreciated.
Thank you.

Answers (2)

Shlok
Shlok on 15 Oct 2024
Hi Avishek,
This error is being caused due to incorrect delimiter detection.
When you provide only the filenames to the “readcell” function, MATLAB attempts to automatically detect the import options, including delimiters, data formats, and whitespace handling. You can inspect the detected import options using the “detectImportOptions” function. This function creates an import options object, which allows you to define how to read data from files, such as CSV or Excel.
Let us observe what import options are detected for both files:
detectImportOptions("medcoordsdof.csv")
ans =
DelimitedTextImportOptions with properties: Format Properties: Delimiter: {','} Whitespace: '\b\t ' LineEnding: {'\n' '\r' '\r\n'} CommentStyle: {} ConsecutiveDelimitersRule: 'split' LeadingDelimitersRule: 'keep' TrailingDelimitersRule: 'ignore' EmptyLineRule: 'skip' Encoding: 'UTF-8' Replacement Properties: MissingRule: 'fill' ImportErrorRule: 'fill' ExtraColumnsRule: 'addvars' Variable Import Properties: Set types by name using setvartype VariableNames: {'X', 'Y', 'Z' ... and 3 more} VariableTypes: {'char', 'char', 'char' ... and 3 more} SelectedVariableNames: {'X', 'Y', 'Z' ... and 3 more} VariableOptions: [1-by-6 matlab.io.VariableImportOptions] Access VariableOptions sub-properties using setvaropts/getvaropts VariableNamingRule: 'modify' Location Properties: DataLines: [2 Inf] VariableNamesLine: 1 RowNamesColumn: 0 VariableUnitsLine: 0 VariableDescriptionsLine: 0 To display a preview of the table, use preview
detectImportOptions("adcoordsdof.csv")
ans =
DelimitedTextImportOptions with properties: Format Properties: Delimiter: {'\t' ' '} Whitespace: '\b' LineEnding: {'\n' '\r' '\r\n'} CommentStyle: {} ConsecutiveDelimitersRule: 'join' LeadingDelimitersRule: 'ignore' TrailingDelimitersRule: 'ignore' EmptyLineRule: 'skip' Encoding: 'UTF-8' Replacement Properties: MissingRule: 'fill' ImportErrorRule: 'fill' ExtraColumnsRule: 'addvars' Variable Import Properties: Set types by name using setvartype VariableNames: {'Var1', 'Var2', 'Var3'} VariableTypes: {'double', 'double', 'double'} SelectedVariableNames: {'Var1', 'Var2', 'Var3'} VariableOptions: [1-by-3 matlab.io.VariableImportOptions] Access VariableOptions sub-properties using setvaropts/getvaropts VariableNamingRule: 'modify' Location Properties: DataLines: [2 Inf] VariableNamesLine: 0 RowNamesColumn: 0 VariableUnitsLine: 0 VariableDescriptionsLine: 0 To display a preview of the table, use preview
For “medcoordsdof.csv”, a comma (,) is detected as the delimiter, while for “adcoordsdof.csv”, both tab and space (\t and ' ') are detected. As a result, the two files are parsed differently, causing the error.
To resolve this, you should manually specify the import options, particularly the delimiters. In this case, explicitly setting the delimiter to a comma will fix the problem. Here is the updated code:
rawdata = readcell('medcoordsdof.csv', Delimiter=','); % Use readcell
data = cellfun(@(x) strtrim(erase(x,{'[' ']'})),rawdata(2,:),'uni',0); % Remove the characters such as '[' & ']'
datacell = cellfun(@(x) str2double(strsplit(x,' ')),data,'uni',0)
datacell = 1x6 cell array
{1x420 double} {1x420 double} {1x420 double} {1x420 double} {1x420 double} {1x420 double}
rawdata = readcell('adcoordsdof.csv', Delimiter=','); % Use readcell
data = cellfun(@(x) strtrim(erase(x,{'[' ']'})),rawdata(2,:),'uni',0); % Remove the characters such as '[' & ']'
datacell = cellfun(@(x) str2double(strsplit(x,' ')),data,'uni',0)
datacell = 1x6 cell array
{1x835 double} {1x835 double} {1x835 double} {1x835 double} {1x835 double} {1x835 double}
For more details on the “readcell” and “detectImportOptions” functions, refer to the following MATLAB documentation links:

Stephen23
Stephen23 on 15 Oct 2024
Edited: Stephen23 on 15 Oct 2024
Well, those are some very badly designed files: if they had stored the data as a standard CSV file with one column for each variable then you would have no problems whatsoever. Instead each variable consists of double-quoted text containing all of its values in one long text vector... ugh UGH!
A much better way to arrange that data is to use a table with the data in columns (which is exactly how they should have been saved in the first place), here using the original column/variable names:
Ta = readtable( 'adcoordsdof.csv', Delimiter=',')
Ta = 1x6 table
X Y Z dof1 dof2 dof3 _________________________________________________________________ ________________________________________________________________ ________________________________________________________________ _____________________________________________________________________________ _____________________________________________________________________________ _____________________________________________________________________________ {'[0.0055738690809404 0.2060174762881647 -0.2440319918546928...'} {'[5.7481198875571859 5.2973121804358341 5.3303267945197774...'} {'[0.0000000000000000 0.0000000000000000 0.0000000000000000...'} {'[ 63 99 102 1371 1374 1446 2223 2298 2226 750 753 693 288 342...'} {'[ 64 100 103 1372 1375 1447 2224 2299 2227 751 754 694 289 343...'} {'[ 65 101 104 1373 1376 1448 2225 2300 2228 752 755 695 290 344...'}
Tm = readtable('medcoordsdof.csv', Delimiter=',')
Tm = 1x6 table
X Y Z dof1 dof2 dof3 ________________________________________________________________ ________________________________________________________________ ________________________________________________________________ _____________________________________________________________________________ _____________________________________________________________________________ _____________________________________________________________________________ {'[3.7879565228811010 3.6184820614440758 3.2968428848988309...'} {'[0.1969263933952541 0.6619506649727049 0.3069983046790851...'} {'[0.0000000000000000 0.0000000000000000 0.0000000000000000...'} {'[1059 1014 1017 444 498 495 447 732 735 675 615 558 618 969...'} {'[1060 1015 1018 445 499 496 448 733 736 676 616 559 619 970...'} {'[1061 1016 1019 446 500 497 449 734 737 677 617 560 620 971...'}
fh = @(c)sscanf(c{:}(2:end),'%f');
Ta = varfun(fh,Ta);
Ta.Properties.VariableNames = strrep(Ta.Properties.VariableNames,'Fun_','') % optional
Ta = 835x6 table
X Y Z dof1 dof2 dof3 _________ _______ _ ____ ____ ____ 0.0055739 5.7481 0 63 64 65 0.20602 5.2973 0 99 100 101 -0.24403 5.3303 0 102 103 104 6.2122 -2.1462 0 1371 1372 1373 6.7154 -1.9188 0 1374 1375 1376 6.7429 -2.3638 0 1446 1447 1448 -0.16182 -5.8618 0 2223 2224 2225 0.12843 -6.1984 0 2298 2299 2300 -0.38404 -6.2799 0 2226 2227 2228 4.4283 1.8579 0 750 751 752 4.6498 2.2479 0 753 754 755 4.1076 2.1596 0 693 694 695 2.7002 6.0682 0 288 289 290 3.0728 5.864 0 342 343 344 3.0047 6.3897 0 291 292 293 3.3624 6.1667 0 345 346 347
Tm = varfun(fh,Tm);
Tm.Properties.VariableNames = strrep(Tm.Properties.VariableNames,'Fun_','') % optional
Tm = 420x6 table
X Y Z dof1 dof2 dof3 ________ _______ _ ____ ____ ____ 3.788 0.19693 0 1059 1060 1061 3.6185 0.66195 0 1014 1015 1016 3.2968 0.307 0 1017 1018 1019 0.18045 2.6531 0 444 445 446 0.12482 3.157 0 498 499 500 0.53643 2.9942 0 495 496 497 -0.23809 2.9119 0 447 448 449 0.81052 -3.5895 0 732 733 734 0.39543 -3.7384 0 735 736 737 0.40402 -3.2137 0 675 676 677 0.41216 4.8861 0 615 616 617 -0.08543 4.5228 0 558 559 560 0.26104 4.3724 0 618 619 620 2.8604 0.34739 0 969 970 971 3.198 0.74789 0 966 967 968 -2.2207 -0.103 0 222 223 224

Products


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!