TEXTSCAN is my personal nemesis: Quoted strings with commas
Dear all,
I thought I had acquired some MATLAB knowledge over the years, but textscan really makes me feel dumb. I cannot figure out the syntax to read a text file with quoted fields.
Let's say I have a text file with the following format:
 "L1: quoted text, with commas","111.111",123\n
 "L2: quoted text, with commas","222.222",234\n
 ...
(The field widths are the same here just for the example; consider them variable in the real world.)
Now my understanding of textscan's formatSpec would be:
format = '%q,"%f",%f';
delim  = '\n';
res = textscan(fileId,format,'Delimiter',delim,'ReturnOnError',false);
This is because I am assuming that %q treats everything between two quotes as part of the same string, as the help states:
 "If the string begins with a double quotation mark ("), omit the leading quotation mark
  and its accompanying closing mark, which is the second instance of a lone double 
  quotation mark"
However, this fails with the ominous error message:
 Error using textscan
 Mismatch between file and format string.
 Trouble reading 'Literal' field from file (row number 1, field number 2) ==>
 "L2: quoted text, with commas","222.222",234\n
So the commas in the quoted string somehow screw up the parsing, since using the regexp-like
 format = '"%[^"]","%f",%f';
instead of %q works perfectly fine.
So my questions are:
- What am I doing wrong?
- Why is there a %q option if it does not treat everything in between the quotes as one string (otherwise it would simply be a shorthand for "%s")?
- Am I using the delimiter option correctly?
 
Any help appreciated!
Greetings, David
Accepted Answer
  Stephen23 on 13 Feb 2017 (edited on 13 Feb 2017)
fmt = '%q"%f"%f';                       % quoted text, quoted number, plain number
fid = fopen('temp5.txt','rt');
C = textscan(fid,fmt,'Delimiter',',');  % the comma is the only delimiter
fclose(fid);
Gives this:
>> C{:}
ans = 
    'L1: quoted text, with commas'
    'L2: quoted text, with commas'
ans =
  111.1110
  222.2220
ans =
   123
   234
And your questions:
- You are over-specifying the delimiter, and you are also confusing the delimiter with the end-of-line character.
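To see the delimiter point without creating a file, here is a minimal sketch that feeds the accepted format to textscan from a character vector (textscan accepts text as well as a file ID). The in-memory string `str` is an assumption standing in for temp5.txt. Only the comma is passed as a delimiter; the newline characters end each row by themselves, so they appear neither in the format nor in the 'Delimiter' option.

```matlab
% Two rows in the question's format, held in memory instead of a file.
% sprintf turns each \n into a real newline character.
str = sprintf(['"L1: quoted text, with commas","111.111",123\n', ...
               '"L2: quoted text, with commas","222.222",234\n']);

% %q reads the quoted label (commas included), the literal " characters
% skip the quotes around the second number, and the comma is the only
% delimiter. Newlines act as end-of-line markers by default.
fmt = '%q"%f"%f';
C = textscan(str,fmt,'Delimiter',',');
```

C{1} should hold the two labels as a cell array, C{2} the quoted numbers and C{3} the plain numbers, matching the output shown above.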
 
  Stephen23 on 13 Feb 2017 (edited on 13 Feb 2017)
@David J. Mack: the task is complicated by having numeric fields surrounded by double quotes, which is rather erroneous because double quotes always indicate strings. Two solutions for this are:
- Change the file writing so that it does not put double quotes around numeric values.
- Import the numeric values as strings using %q, and quickly convert them using str2double once inside MATLAB.
 
In both of these cases you could go back to defining the delimiter as just the comma, which would avoid collapsing fields together.
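A minimal sketch of the second suggestion, assuming a file in the question's format. The sample file is written to a temporary name (`fname` is a hypothetical variable) only so the example is self-contained. Both quoted fields are read with %q, the comma is the only delimiter, and str2double converts the quoted numbers afterwards.

```matlab
% Write a small sample file in the question's format (temporary, hypothetical name).
fname = [tempname '.txt'];
fid = fopen(fname,'wt');
fprintf(fid,'"L1: quoted text, with commas","111.111",123\n');
fprintf(fid,'"L2: quoted text, with commas","222.222",234\n');
fclose(fid);

% Read both quoted fields as text with %q; the comma is the only delimiter,
% so commas inside the quotes stay part of the first field.
fid = fopen(fname,'rt');
C = textscan(fid,'%q%q%f','Delimiter',',');
fclose(fid);
delete(fname);

labels = C{1};             % cell array of the quoted labels
values = str2double(C{2}); % quoted numbers converted after import
counts = C{3};             % unquoted numbers read directly by %f
```

Reading the quoted numbers as text keeps the format string free of literal quote characters, at the cost of one extra str2double call.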
More Answers (2)
  Jeremy Hughes on 16 Feb 2017
If you have access to R2016b, try using detectImportOptions with readtable.
opts = detectImportOptions(yourfile);
T = readtable(yourfile,opts)
I think this will do what you want without a lot of fuss.
  David J. Mack on 13 Feb 2017 (edited on 13 Feb 2017)
  
      
  Serge on 27 May 2018
Any suggestions for this question:
https://au.mathworks.com/matlabcentral/answers/402792-yet-another-textscan-question