Importing line-by-line from a text file

Hi I have a problem choosing what kind of tool to import data from a text file. The data looks something like this although the lines are like 4000 and
35, 0401-BA1,1,6,8,00000,0,01,053,50530,'Channel ',,
35, 0401-CB1,1,6,8,00000,0,01,051,50510,'Spillway ',,
35, 0401-EB1,1,3,2,00052,0,01,051,50510,'RioGrande ',,
35, 0401-FS1,1,6,8,00203,0,01,011,27340,'Spillway ',,
35, 0401-MB1,1,6,8,00000,0,01,037,0 ,'Canal ',,
I used the fscanf , textscan but I don't understand how to tackle the spaces. for the second space is it like %15c ?? Also How do I switch to another line should I use /n? Please help me here guys. Thank you

1 Comment

Attaching the text file (or a representative part of it) would be nice. (Use the ‘paperclip’ icon.)

Sign in to comment.

 Accepted Answer

per isakson
per isakson on 3 Apr 2015
Edited: per isakson on 4 Apr 2015
Hint
str = '35, 0401-BA1,1,6,8,00000,0,01,053,50530,''Channel '',,';
fmt = '%d%s%d%d%d%d%d%d%d%d%s%s';
cac = textscan( str, fmt ...
, 'Delimiter' , ',' ...
, 'CollectOutput', true );
  • " how to tackle the spaces" &nbsp depends on how you want the spaces in the resulting string variable values
  • "line-by-line" &nbsp what exactly do you mean?
&nbsp
Addendum I
This illustrates an approach. However, due to various mismatches between the header row and the data rows the result is not useful.
sas = cssm()
returns
1x4000 struct array with fields:
STATE_CODE_001
STRUCTURE_NUMBER_008
RECORD_TYPE_005A
ROUTE_PREFIX_005B
SERVICE_LEVEL_005C
ROUTE_NUMBER_005D
DIRECTION_005E
HIGHWAY_DISTRICT_002
COUNTY_CODE_003
PLACE_CODE_004
....
where
function sas = cssm
ffs = 'c:\m\cssm\NM14.txt';
fid = fopen( ffs );
str = fgetl( fid );
col_head_list = parse_header( str );
sas = repmat( cell2struct( repmat({''},1,112), col_head_list(1:112), 2 ), 1,4000 );
jj = 0;
while not( feof(fid) )
str = fgetl( fid );
buf = parse_data_row( str );
jj = jj + 1;
sas(1,jj) = cell2struct( buf(1:112), col_head_list(1:112), 2 );
end
fclose( fid );
end
function cac = parse_header( str )
cac = strsplit( str, ',' );
cac = strtrim( cac );
end
function cac = parse_data_row( str )
cac = strsplit( str, ',' );
cac = strtrim( cac );
end
&nbsp
Addendum II
I was fooled by the function, strsplit, which as default treats multiple delimiters as one. In my world that only applies to space. Anyhow, now I have fixed the function cssm.
Read the data to a structure
>> sas=cssm;
find all bridges, which have STATUS_WITH_10YR_RULE equal to '2'.
>> is_sw10yrEq2 = strcmp('2',{sas.STATUS_WITH_10YR_RULE});
>> sum(is_sw10yrEq2)
ans =
324
there are 324 of them. List their building year
>> {sas(is_sw10yrEq2).YEAR_BUILT_027}
ans =
Columns 1 through 9
'1915' '1936' '1928' '1931' '1920' '1951' '1933' '1939' '1939'
Columns 10 through 18
'1924' '1931' '1931' '1931' '1932' '1932' '1936' '1937' '1936'
....
where
function sas = cssm
ffs = 'c:\m\cssm\NM14.txt';
fid = fopen( ffs );
str = fgetl( fid );
colhead_list = parse_header( str );
n_colhead = length( colhead_list );
sas = repmat( cell2struct ...
( repmat({''},1,n_colhead), colhead_list, 2 ) ...
, 1,4000 );
jj = 0;
while not( feof(fid) )
str = fgetl( fid );
buf = parse_data_row( str );
jj = jj + 1;
sas(1,jj) = cell2struct( buf, colhead_list, 2 );
end
fclose( fid );
sas(1,jj+1:end) = [];
end
function cac = parse_header( str )
cac = strsplit( str, ',', 'CollapseDelimiters',false );
cac = strtrim( cac );
end
function cac = parse_data_row( str )
cac = strsplit( str, ',', 'CollapseDelimiters',false );
cac = strtrim( cac );
end
&nbsp
Addendum III
Caveat: I hardly used the new (couple of years old), data type table. Nevertheless, table is targeted at problems like yours and I think you should
  • use Import Data interactively to read from a file of format B (one header line, comma separated, varying width) to a table. Make sure to uncheck Multiple Delimiters as One and select table. I've tested; it worked fine.
  • spend time with the documentation of table instead of fighting file formats.
&nbsp
Addendum IV
It is straightforward to read the file of format B (one header line, comma separated, varying width) with textscan
>> cac = cssm
cac =
{3951x134 cell}
where
function cac = cssm
ffs = 'c:\m\cssm\NM14.txt'; % one header line, comma separated, varying width
fid = fopen( ffs );
str = fgetl( fid );
chl = strtrim( strsplit( str, ',', 'CollapseDelimiters',false ) );
len = length( chl );
frm = repmat( '%s', 1, len );
cac = textscan( fid, frm, 'CollectOutput',true );
fclose( fid );
end

7 Comments

line-by-line means, scanning the text file row after row. As I put an example in the question. I tried coming up with this code.
NBIFile = 'D:\Re\nbi 2014\NM14.txt';
fid = fopen(NBIFile);
while ~feof(fid)
NBIData = fgetl(fid);
if feof(fid) break
end
% selects only the items of interest for a bridge
nos=['1' '5':'8']; % define the numbers in the keep list
if (NBIRecord(19)=='1') & (NBIRecord(374)=='Y') & ismember(NBIRecord(200),nos)
What I did above was for the item at posiotn 19, 374 and 200 I took the value and if it is 1 and Y and values 1,5thru8 I need to extract them into another matrix in MATLAB. But I don't know how to do it.
My actual scenario is the text file please find it in the attached, has different items(120 of them) recorded for each bridge and stored row wise for each of the bridge. What I need to do is extract some of the items which I need to compare later and do some analysis. What I did was as the text file had fixed with for each item, I took the physical positions and put that in a if condition, but I am not able to go further. Please could suggest me on this.
per isakson
per isakson on 3 Apr 2015
Edited: per isakson on 3 Apr 2015
Given
  • the entire file fits easily in memory
  • each row is approx. 480 bytes
  • the file has 120 columns
  • It's not practical to write a format-specifier "manually"
Issues
  • "text file had fixed with" are you sure? Inspection in editor show me that all columns don't have fixed width. I doubt that it is safe to rely on position.
  • there are 134 names in the header line, but only 123 columns in the first two data rows. Why?
  • all rows do not contain the same number of columns
I'm sure it has fixed width. Also those 134 names in header file have got all the columns. OK I can say all the data is comma separated, because when I import it in excel its exactly fits in.
I think I got an idea, with the comma separated values can we extract data based on that? Yes as you said relying on the position gave me problems while i was working just now. So with the comma separated values how do we move ahead?
adrooney
adrooney on 3 Apr 2015
Edited: per isakson on 3 Apr 2015
Thank you per isakson for the addendum. I am glad you did that.
  • I found out a text file with fixed width's this time, I think its gonna be easy. But I have a doubt how do I compare a fixed width for example a state code is 356 and is placed at the start of each line. So for an if statement should I use eg: if(NBIdata(1:3)=='356')
  • Also how do I write to direct the pointer from one line to another?
Now I have seen three different file formats:
A, in the question: no header, comma separated, fixed width
<<picture deleted to allow upload of another picture>>
it didn't help to delete this picture :(
&nbsp
B, first NMT.txt: one header line, comma separated, varying width
&nbsp
and the ends of the rows in B
&nbsp
Import Data doesn't get the columns right either
&nbsp
I was fooled by Matlab, which as default treats multiple delimiters as one. I thought that only applied to space.
&nbsp
C, second NMT.txt: no header, no separator, fixed width
&nbsp
The C option is much more easy. It has got fixed widths. I tried to uses textscan but not successful. Could you give your ideas on the C option. I am attaching the field widths in an xls sheet
"[...] ideas on the C option" &nbsp With Matlab, fixed widths should be avoided and when one cannot:
cac = textscan( fid, '%w1s%w2s%w3s%w4s%w5s...', .... );
where w1,w2,w3, etc. are the widths of the columns, e.g. '%8s%12s%30s'. Convert the "numeric columns" to numerical variables in a second step.

Sign in to comment.

More Answers (0)

Categories

Asked:

on 2 Apr 2015

Edited:

on 4 Apr 2015

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!