readtable cannot handle double quotation marks very well

I have CSV files saved with LibreOffice with text flanked by double quotation marks (Format quoted field as text).
When I tried to read one such two-row CSV file with readtable,
T0 = readtable('file1.csv',...
'Encoding','UTF-8','delimiter',',','ReadVariableNames',true);
readtable failed to read the first row.
Then I used the following commands, which read both rows:
opts1 = delimitedTextImportOptions('Encoding','UTF-8','Delimiter',',','DataLines',[2 Inf],'VariableNamesLine',1);
T1 = readtable('file1.csv',opts1);
However, the contents of the table were not what I wanted:
ans = 2×1 cell
'"optotagging"'
'"behaviour"'
The double quotation marks remained in some columns.
The 'QuoteRule','remove' option of setvaropts appeared promising, but I could not get it to work.
setvaropts(opts1,'QuoteRule','remove')
How do I nicely remove double quotation marks in CSVs?

Answers (1)

Kouichi C. Nakamura on 6 Jan 2021
Edited: Kouichi C. Nakamura on 7 Jan 2021
I asked MathWorks about this, and their answer was helpful:
% detectImportOptions almost works for this case, but it detects the first line as a "meta-data" line because it is all string/blank
opts = detectImportOptions('file1.csv','NumHeaderLines',0,'Delimiter',',');
% Setting DataLines explicitly works around that issue
opts.DataLines = [2,inf];
T2 = readtable('file1.csv',opts);
With this code, I can read both rows and remove double quotation marks nicely.
According to MathWorks:
> The solution shared, is very specific to your workflow and is an undocumented method which might change without notice.
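Since that approach may change, one possible fallback is to strip the surrounding quotation marks after import. The following is only a minimal sketch and not something MathWorks suggested: the sample table, its variable names (task, id), and the loop are hypothetical stand-ins for a table such as T1 whose text columns still carry the quotes.

```matlab
% Hypothetical sample table that mimics T1 from the question:
% the text column still has its surrounding double quotation marks.
T = table({'"optotagging"'; '"behaviour"'}, [1; 2], ...
    'VariableNames', {'task', 'id'});

% Walk over every variable and clean only the text columns.
for k = 1:width(T)
    col = T.(k);
    if iscellstr(col) || isstring(col)
        % Remove one leading and one trailing double quotation mark, if present.
        T.(k) = regexprep(col, '^"|"$', '');
    end
end

disp(T.task)
```

Numeric columns are left untouched, and quotation marks inside a value are kept because the pattern is anchored to the start and end of each entry. Separately, setvaropts returns a modified copy of the options object rather than changing it in place, so the call in the question would have to be written as opts1 = setvaropts(opts1,'QuoteRule','remove'); for the setting to reach readtable. Whether that alone fixes this particular file is not confirmed here.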
