Import text file with blank lines. Matlab not replacing them with NaN

Matlab is not replacing the blank lines in my txt file with NaN; it just joins all the data together. Unfortunately I need the data in the exact order it appears in, because each line corresponds to a unique timestamp, but the times are not included in the txt file.
Any ideas? I tried importdata and textscan with no luck. I am using R2014b.

 Accepted Answer

There remain (at least) two possibilities:
  • a loop over fgetl
  • read the file as one string, replace empty lines by 'nan nan ... ' and parse with textscan
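The first possibility, a loop over fgetl, could look roughly like the sketch below. It is only an illustration under stated assumptions: the demo file, the variable names and the fixed column count (ncols) are hypothetical and not part of the original answer.

```matlab
% Create a small demo file (hypothetical name and content) with one blank line.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '16 2 3 13\n5 11 10 8\n\n9 7 6 12\n4 14 15 1\n');
fclose(fid);

ncols = 4;     % assumption: every data line holds exactly ncols numbers
rows  = {};    % collects one 1-by-ncols row per line of the file

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                     % fgetl returns -1 at end of file
    if isempty(strtrim(tline))          % empty or whitespace-only line
        rows{end+1} = nan(1, ncols);    %#ok<SAGROW>
    else
        rows{end+1} = sscanf(tline, '%f', [1, ncols]); %#ok<SAGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                          % remove the demo file again

M = vertcat(rows{:});                   % blank lines are kept as NaN rows
```

Because every line is handled individually, the row order of the file is preserved, at the cost of being slower than a single textscan call on large files.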
Example (R2013a)
>> cac = cssm;
>> cac{:}
ans =
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
where
function cac = cssm
str = fileread('cssm.txt');
str = regexprep( str, '(?<=\r?\n)[ ]*(?=\r?\n)', 'nan nan nan nan');
cac = textscan( str, '%f%f%f%f', 'CollectOutput', true );
end
and where cssm.txt contains (the third line is a blank line holding at least one space)
16 2 3 13
5 11 10 8
 
9 7 6 12
4 14 15 1
Replace
str = regexprep( str, '(?<=\r?\n)[ ]*(?=\r?\n)', 'nan nan nan nan');
by
str = regexprep( str, '(?<=\r?\n)[ ]*(?=\r?\n)' ...
, 'nan nan nan nan', 'emptymatch' );
to also handle lines that are completely empty (zero characters).
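A minimal sketch of what the 'emptymatch' option changes, using a hypothetical two-column string instead of the file from the example. By default regexprep ignores zero-length matches, so a line with no characters at all is left untouched.

```matlab
% Hypothetical input: two data lines separated by a completely empty line.
str = sprintf('1 2\n\n3 4\n');
pat = '(?<=\r?\n)[ ]*(?=\r?\n)';

% Default behaviour: the zero-length match between the two new-lines is
% ignored, so nothing is replaced.
without_opt = regexprep(str, pat, 'nan nan');

% With 'emptymatch' the zero-length match is replaced as well.
with_opt = regexprep(str, pat, 'nan nan', 'emptymatch');

cac = textscan(with_opt, '%f%f', 'CollectOutput', true);
M = cac{1};    % numeric matrix with one NaN row for the empty line
```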

10 Comments

Requires at least one char(32) in "empty" lines
Hi Per,
I would replace your pattern by
'(?<=(\r?\n|^))\s*(?=(\r?\n|$))'
to capture potential tabs or spaces on empty lines as well as cases where the file starts or ends with an empty line.
Hi Cedric,
Regular expressions require thorough testing. You propose three modifications
  • add |^ to handle leading empty lines. Yes, I agree.
  • replace [ ]* by \s* to handle potential delimiters on empty lines. OK, but there is one problem: \s* matches new-line. Replacing it by \s*? (lazy) solves that.
  • add |$ to handle trailing empty lines. No, because one occurrence of new-line at the end of the file does not indicate an empty line. Interactively created files may or may not have new-line at the end of the last line. Automatically created files "always" have new-line at the end of the last line.
My new expression is
'(?<=\r?\n|^)\s*?(?=\r?\n)'
I agree about point 3, but for point 2: \s* should not consume the new-line when the new-line is needed by the look-ahead. In that case the pattern still matches, because the zero-occurrence case allowed by the * is satisfied.
>> regexprep( 'ab', '(?<=a)b*(?=b)', 'z', 'emptymatch' )
ans =
azb
Hi Cedric,
To illustrate how I think, I have created three text files, cssm_0.txt, cssm_1.txt and cssm_2.txt, with zero, one and two empty lines at the end, respectively. The images are clips of the files in Notepad++.
With the expression
'(?<=\r?\n|^)\s*?(?=\r?\n)'
I get the results below
>> clear all,cac = cssm('cssm_0.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
>> clear all,cac = cssm('cssm_1.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
NaN NaN NaN NaN
>> clear all,cac = cssm('cssm_2.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
NaN NaN NaN NaN
NaN NaN NaN NaN
>>
\s* or \s*?
I can reproduce your example
>> regexprep( 'ab', '(?<=a)b*(?=b)', 'z', 'emptymatch' )
ans =
azb
and the lazy ? doesn't hurt
>> regexprep( 'ab', '(?<=a)b*?(?=b)', 'z', 'emptymatch' )
ans =
azb
However it doesn't work with the string from the text file. With the expression
'(?<=\r?\n|^)\s*(?=\r?\n)'
I get
>> clear all,cac = cssm('cssm_1.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
and with the expression
'(?<=\r\n|^)\s*(?=\r\n)'
I get
>> clear all, clear classes,cac = cssm('cssm_1.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
NaN NaN NaN NaN
The problem is with the "?" in \r?\n - I think. In this context "\s*" matches "\r" and the look-ahead is happy with "\n". With "\s*?" the "\r" goes to the look-ahead.
I used "\r?\n" in the first place to match both the Unix and the Windows style of new-line.
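The difference between the greedy and the lazy quantifier can be made visible by asking regexp what each pattern actually matches. This is only a sketch on a hypothetical two-column string with Windows line endings, not on the files from the thread.

```matlab
% Hypothetical input with Windows (CR LF) line endings and one empty line.
str = sprintf('1 2\r\n\r\n3 4\r\n');

% Greedy \s*: first grabs "\r\n" of the empty line, then backtracks one
% character so that the look-ahead can still find a "\n".
greedy = regexp(str, '(?<=\r?\n|^)\s*(?=\r?\n)',  'match', 'emptymatch');

% Lazy \s*?: tries the zero-length match first and leaves "\r\n" intact
% for the look-ahead.
lazy   = regexp(str, '(?<=\r?\n|^)\s*?(?=\r?\n)', 'match', 'emptymatch');

greedy_codes = double(greedy{1});   % character codes of the greedy match
lazy_codes   = double(lazy{1});     % empty for a zero-length match
```

With the greedy pattern the matched "\r" is removed by the replacement, so the resulting string mixes "\n" and "\r\n" line endings; the lazy pattern leaves every line ending as it was.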
Thank you for the illustration, I will have a look in half an hour!
Hi Per,
Thank you for all the illustrations, I agree with all of your conclusions! After spending quite a bit of time working on alternative approaches based on tokens (which ultimately turn out to be a little slower), I just realized that we don't need to match the optional \r in the look-behind.
Hi Cedric,
You are right, "\r" is not needed in the "look behind". And possibly, it saves on execution time to exclude it.
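Putting the conclusions of this discussion together, the pattern might end up looking like the sketch below. The exact final expression is not spelled out in the thread, so the pattern here is an assumption derived from the comments above (look-behind without "\r", lazy \s*?, look-ahead with optional "\r"); the two-column demo string is hypothetical as well.

```matlab
% Hypothetical input: Windows line endings, a leading empty line, a line
% holding only a tab, and a single new-line at the end of the last line.
str = sprintf('\r\n1 2\r\n\t\r\n3 4\r\n');

% Assumed final pattern: no \r in the look-behind, lazy whitespace match,
% optional \r in the look-ahead.
pat = '(?<=\n|^)\s*?(?=\r?\n)';
str = regexprep(str, pat, 'nan nan', 'emptymatch');

cac = textscan(str, '%f%f', 'CollectOutput', true);
M = cac{1};    % leading and whitespace-only lines become NaN rows
```

The single new-line that terminates the last data line does not produce an extra NaN row, in line with point 3 above.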


Asked:

on 29 Jan 2015

Edited:

on 30 Jan 2015
