How can I determine the number of header lines for varying, non-rectangular text files so that I can parse them with textscan?

I would like to use textscan to read in the tabular integer and floating-point data, keying off the *NODE line. This line can be anywhere in the file, and other non-comment strings and integer lines can appear in the file as well. How can I find the varying number of header lines for any given input file?
My code and example input file are as follows. Thanks!
fid4 = fopen('E:\scratch\ANSYS_macro\MATLAB dyna beams\sample.k');
g = textscan(fid4,'%d %f %f %f','Delimiter','\n','headerlines',15);
celldisp(g);
fclose(fid4);
*KEYWORD
*TITLE
*DATABASE_FORMAT
$ 1IFORM 2IBINARY
0
$
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$ NODE DEFINITIONS $
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$
*NODE
$ 1NID 2X 3Y 4Z 5TC 6RC
1 0.141746 0.55315 -0.00592088
2 0.141746 0.538028 -0.00592088
3 0.126928 0.55315 -0.00669746
4 0.126926 0.538027 -0.00669757
5 0.112141 0.55315 -0.00747244
6 0.112138 0.538025 -0.00747256
7 0.0973459 0.55315 -0.0082478
8 0.0973435 0.538024 -0.00824792
9 0.0825538 0.55315 -0.00902302
10 0.0825514 0.538022 -0.00902315
11 0.0677682 0.55315 -0.0097979
$

Answers (1)

I'm not sure whether your file contains one block of numerical data or several. Here is a function that handles both cases.
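function g = ccsm()
    % Read the whole file into memory as one character vector.
    str = fileread( 'cssm.txt' );
    % Capture the text after each "*NODE" up to the next "*KEYWORD"
    % or, failing that, the end of the file.
    cac = regexp(str,'(?<=\*NODE\s+).+?(?=((\*KEYWORD)|($)))','match');
    g = cell( 1, numel( cac ) );
    for jj = 1 : numel( cac )
        % Parse one block; lines starting with "$" are skipped as comments.
        g{jj} = textscan(cac{jj},'%d%f%f%f', 'CommentStyle','$');
    end
end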
returns a cell array g, where
>> g{3}
ans =
[11x1 int32] [11x1 double] [11x1 double] [11x1 double]
and where cssm.txt contains three copies of the text you included in your question.
Comments:
  • the entire text file must fit in memory to use this approach
  • it is not possible to read and parse the file in one step with textscan
  • it is safer to work from a definition of the file format than to guess based on one sample
  • regexp is powerful and fast, but the expression I used assumes that each block of numerical data is enclosed by "*NODE" and "*KEYWORD", or by "*NODE" and the end of the file.
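
If the file is too large to hold in memory, or you want the header line count itself, one alternative is to scan the file line by line until the *NODE line is found. The following is only a sketch: the function name readNodeBlock is hypothetical, and it assumes that *NODE sits on a line of its own and that the first *NODE block is the one you want.

```matlab
function [g, nHeader] = readNodeBlock(filename)
% readNodeBlock  Hypothetical helper, not part of MATLAB.
% Counts the lines up to and including the first "*NODE" line and
% passes that count to textscan as 'HeaderLines'.
fid = fopen(filename, 'r');
if fid == -1
    error('readNodeBlock:open', 'Cannot open %s', filename);
end
cleaner = onCleanup(@() fclose(fid));   % close the file even on error

nHeader = 0;
found = false;
while true
    tline = fgetl(fid);
    if ~ischar(tline)                   % fgetl returns -1 at end of file
        break
    end
    nHeader = nHeader + 1;
    % Assumption: the keyword is alone on its line.
    if strcmp(strtrim(tline), '*NODE')
        found = true;
        break
    end
end
if ~found
    error('readNodeBlock:noNode', 'No *NODE line found in %s', filename);
end

% Rewind and let textscan skip the header; "$" lines are comments.
frewind(fid);
g = textscan(fid, '%d %f %f %f', ...
    'HeaderLines', nHeader, 'CommentStyle', '$');
end
```

Called as [g, n] = readNodeBlock('sample.k'), this returns the parsed columns in g and the header line count in n. textscan stops at the first line that does not fit the format, so any keyword section after the node block is left unread.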

1 Comment

Ah, a regexp challenge, I take it! ;-)
I'd propose the following:
blocks = regexp( content, '\*NODE(.*?\n){2}([\s\d\-\.]+)', 'tokens' ) ;
if the block doesn't always end with a $ character, and
blocks = regexp( content, '\*NODE(.*?\n){2}([^$]+)', 'tokens' ) ;
if it does. Then blocks{1} (the only cell if there is only one block) is a cell array whose first cell contains the header and whose second cell contains the data.
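
The tokens are still text, so they need one more step to become numbers. A minimal sketch, assuming each block ends with a $ line as in the sample file. The content string here is a two-node stand-in built with sprintf; in practice it would come from fileread.

```matlab
% Stand-in for content = fileread('sample.k'); the data is illustrative.
content = sprintf(['*KEYWORD\n*NODE\n$ 1NID 2X 3Y 4Z\n', ...
    '1 0.141746 0.55315 -0.00592088\n', ...
    '2 0.141746 0.538028 -0.00592088\n$\n']);

% One cell per *NODE block; each cell holds that block's tokens.
blocks = regexp(content, '\*NODE(.*?\n){2}([^$]+)', 'tokens');

data = cell(1, numel(blocks));
for k = 1:numel(blocks)
    % The last token of each block is the numeric text.
    data{k} = textscan(blocks{k}{end}, '%d %f %f %f');
end
```

Here data{k}{1} holds the node IDs of block k as int32, and data{k}{2} to data{k}{4} hold the coordinates as double.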


Asked: 30 Apr 2014

Edited: 1 May 2014
