read ascii non-delimited file

How do I read a numeric, non-delimited file in a MATLAB program? I tried
fid=fopen('filename','r')
which returned fid=-1, and then textscan does not work.
The load function does not work on non-delimited files.
Is there any other way to read such a file?
Here is an example of what my data file looks like:
-99.9999-99.9999-99.9999-99.9999 -1.2828 -1.2812
 -1.0910 -1.1864 -1.2920-99.9999-99.9999-99.9999
-99.9999-99.9999-99.9999-99.9999-99.9999-99.9999
Thanks in advance!

Joseph Cheng on 25 Jun 2014
Edited: Joseph Cheng on 25 Jun 2014
There might be something a bit strange going on with your fopen statement. It shouldn't matter what the file format is; you should be getting a positive number. A -1 means the file failed to open (file not found, a permission problem, etc.), so textscan couldn't work anyway since the file never opened. I'd revisit the fopen statement.
Is the data all negative, or is it delimited by '-'? How do you differentiate 99.999999.9999, or even 99.99990.9999?
dpb on 25 Jun 2014
Edited: dpb on 26 Jun 2014
Is the data all negative, or is it delimited by '-'? How do you differentiate 99.999999.9999, or even 99.99990.9999?
It's a fixed-width format; each field is 8 columns wide. The values with preceding '-' signs are negative, the rest aren't. The substring you made up, 99.999999.9999, isn't valid; two positive 99.9999 values would be printed as ' 99.9999 99.9999' because the format used was F8.4 (%8.4f in C/Matlab). In this file the -99.9999 value is probably a missing-value indicator.
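As a rough illustration of the fixed-width layout described above, the sketch below writes a few values with %8.4f and shows that every field occupies exactly 8 columns, so two fields can only touch when the right-hand one carries a minus sign. Treating -99.9999 as a missing-value flag is only the guess made above, and the variable names are made up for the example.

```matlab
% Values written with the 8.4 format: each field is exactly 8 columns wide.
vals = [99.9999 99.9999 -99.9999 -1.2828];
rec  = sprintf('%8.4f', vals);          % sprintf recycles the format for each value
fprintf('"%s" (%d characters)\n', rec, numel(rec));

% Assumption: -99.9999 flags missing data, so map it to NaN after reading.
data = [-99.9999 -1.2828; -1.0910 -99.9999];
missingFlag = -99.9999;
data(abs(data - missingFlag) < 1e-6) = NaN;
disp(data)
```

The tolerance comparison is a cautious choice; an exact == test would probably also work here since the flag is parsed from the same decimal text.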
My comment was made before he formatted it that way. Previously it was all concatenated together and appeared to be on one line.
Perhaps you had to come from a Fortran background going back to punch-card days, where column-delimited fields were/are the norm, to have presumed that was only a fig-newton of the word-wrap formatting... I'll not bother to mention how old that must make me. :)
Luckily/unluckily I never really "had" to do stuff on punch cards. However, I've been on the receiving end of some horrendous text-file formatting where neither delimiters nor fixed widths could be taken for granted.
If I had seen the question as the poster has reformatted it now, it would have been clear there was a formatting pattern.
...been on the receiving end of some horrendous text-file formatting where neither delimiters nor fixed widths could be taken for granted.
And probably most came from C or similar languages, where the "stream file" concept of run-on output runs rampant. In Fortran even "list-directed" output is quite easy to parse.
I still can't believe I whiffed on the precision specifier for so long when reading fixed-width fields with Matlab, though. I've not yet taken the time to go back to an earlier release to check that it worked there, too... maybe I'll find it was broken before, but I'm not holding my breath.


 Accepted Answer

I know it's not too fancy, but it's really simple, so how about:
fid = fopen('vacube.dat');
entireLine = fgetl(fid); % Read first line.
row = 1;
while ischar(entireLine)
    fprintf('Line %d = "%s"\n', row, entireLine);
    % Parse out the 6 numbers from the 48 character long string.
    numbers(row, 1) = str2double(entireLine(1:8));
    numbers(row, 2) = str2double(entireLine(9:16));
    numbers(row, 3) = str2double(entireLine(17:24));
    numbers(row, 4) = str2double(entireLine(25:32));
    numbers(row, 5) = str2double(entireLine(33:40));
    numbers(row, 6) = str2double(entireLine(41:48));
    % Get the next line (if any).
    entireLine = fgetl(fid);
    row = row + 1; % Increment row counter for the numbers array.
end
fclose(fid); % Close the file.
% Report the numbers to the command line.
numbers
It looks like it's pretty similar to dpb's.


For 360 fields of 8, try this:
fid = fopen('vacube.dat');
entireLine = fgetl(fid); % Read first line.
row = 1;
numNumbers = 360; % Number of 8-character fields per line.
while ischar(entireLine)
    fprintf('Line %d = "%s"\n', row, entireLine);
    % Parse out the numNumbers (360) numbers from the 8*numNumbers (2880) character long string.
    for k = 1 : numNumbers
        numbers(row, k) = str2double(entireLine((k-1)*8+1:8*k));
    end
    % Get the next line (if any).
    entireLine = fgetl(fid);
    row = row + 1; % Increment row counter for the numbers array.
end
fclose(fid); % Close the file.
% Report the numbers to the command line.
numbers
Thanks. I got an error, though. Once I used str2double, the next entireLine=fget1(fid) within the while loop gives me the error: Undefined function 'fget1' for input arguments of type 'double'.
You should have copied my code. If you had, you'd see it's FGETL, not FGET1. You must have typed it in yourself and thought the lower-case l (ELL) was a 1 (one).
Thank you so very much.
You're welcome. If it works for you, you can "Vote" for the Answer and officially "Accept" it. Thanks.
Use Per's version or my perturbation thereof instead -- there's no need to scan the file line-by-line, and performance will be much better if you don't.


More Answers (3)

per isakson on 26 Jun 2014
Edited: per isakson on 27 Jun 2014
Try
fid = fopen( 'cssm.txt', 'r' );
cac = textscan( fid, '%8.4f%8.4f%8.4f%8.4f%8.4f%8.4f', 'Whitespace',' ');
fclose( fid )
celldisp( cac )
which returns
cac{1} =
-99.9999
-1.0910
-99.9999
cac{2} =
-99.9999
-1.1864
-99.9999
cac{3} =
-99.9999
-1.2920
-99.9999
cac{4} =
-99.9999
-99.9999
-99.9999
cac{5} =
-1.2828
-99.9999
-99.9999
cac{6} =
-1.2812
-99.9999
-99.9999
>>
and where cssm.txt contains
-99.9999-99.9999-99.9999-99.9999 -1.2828 -1.2812
 -1.0910 -1.1864 -1.2920-99.9999-99.9999-99.9999
-99.9999-99.9999-99.9999-99.9999-99.9999-99.9999
In response to comments:
To read many, N, columns with identical format use
N = 6;
format_spec = repmat( '%8.4f', 1, N );
fid = fopen( 'cssm.txt', 'r' );
cac = textscan( fid, format_spec, 'CollectOutput', true );
fclose( fid );
celldisp( cac )
which returns
cac{1} =
-99.9999 -99.9999 -99.9999 -99.9999 -1.2828 -1.2812
-1.0910 -1.1864 -1.2920 -99.9999 -99.9999 -99.9999
-99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999
One more revisit: The precision specifier is NOT needed
N = 6;
% format_spec = repmat( '%8.4f', 1, N );
format_spec = repmat( '%8f', 1, N );
fid = fopen( 'cssm.txt', 'r' );
cac = textscan( fid, format_spec, 'CollectOutput', true );
fclose( fid );
celldisp( cac )
returns
cac{1} =
-99.9999 -99.9999 -99.9999 -99.9999 -1.2828 -1.2812
-1.0910 -1.1864 -1.2920 -99.9999 -99.9999 -99.9999
-99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999
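Since the thread leaves open how this behaves across releases, a small round-trip check along the following lines may help confirm it on your own installation. It is only a sketch: the temporary file name and variable names are invented for the example, and it relies on textscan behaving as reported in this answer.

```matlab
% Sample records from the question, written out with the 8.4 format.
sample = [-99.9999 -99.9999 -99.9999 -99.9999  -1.2828  -1.2812
           -1.0910  -1.1864  -1.2920 -99.9999 -99.9999 -99.9999
          -99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999];
N = size(sample, 2);
tmpFile = [tempname '.txt'];                 % hypothetical scratch file
fid = fopen(tmpFile, 'w');
fprintf(fid, [repmat('%8.4f', 1, N) '\n'], sample.');  % one record per line
fclose(fid);

% Read it back with one fixed-width field specifier per column.
fid = fopen(tmpFile, 'r');
cac = textscan(fid, repmat('%8.4f', 1, N), 'CollectOutput', true);
fclose(fid);
delete(tmpFile);

data = cac{1};                               % CollectOutput gives one matrix
fprintf('Read %d rows x %d columns, max difference %g\n', ...
    size(data, 1), size(data, 2), max(abs(data(:) - sample(:))));
```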

dpb on 26 Jun 2014
Edited: dpb on 26 Jun 2014
+123!!!
Per, I don't know why I hadn't ever thought to explicitly encode the precision as well as the field width (although in my defense I'll note that in all my years of complaining, nobody else has ever pointed it out, either). The latter includes the response from TMW on the previous behavior/enhancement request.
That does, however, cause the scanner to actually stop and then pick up again at the right spot.
Now, can you explain how/why that's enough when a simple '%8f' isn't, and is that somehow/somewhere documented within C? I've never seen anything that implied it in all the looking I've done, but I'm certainly anything but a C-lizard.
Anyway, thanks for sticking that in here! I'll make the note on the aforementioned request and ask for a documentation enhancement.
PS. The 'whitespace' parameter isn't needed--I wondered at first if that was the trick that somehow I had missed initially but it is simply having the explicit exact format that is the deal, it seems.
Thanks for the answer. When I wrote the example of the data, I kept it short by writing only 6 numbers in each line. The data I am working on actually has 360 fixed width 8.4f numbers. What is the solution for reading that data without writing %8.4f 360 times?
per isakson on 26 Jun 2014
Edited: per isakson on 26 Jun 2014
vacube, I've added a modified script to my answer.
per isakson on 26 Jun 2014
Edited: per isakson on 26 Jun 2014
dpb, I cannot recall how I came to try this format specification. Anyhow, it was definitely not based on the documentation of C. I've never used C. To me, this works automagically.
I'm not always comfortable with default values. Obviously, the whitespace parameter is not needed. My first attempt at reading was interrupted at " -1.", so I added 'Whitespace',' ' and it worked. Now I cannot reproduce the problem. However, with 'Whitespace','' it doesn't work.
It is a good idea to ask TMW for documentation.
dpb on 26 Jun 2014
Edited: dpb on 26 Jun 2014
OK, Per, I thought perhaps you had intimate knowledge of the C spec as a basis. Certainly the explicit format string should work; why I hadn't used it in my many previous set-tos with Matlab and fixed-column-width files I can't fathom, as I certainly did much with them in Fortran.
I understand the empty string for whitespace not working; that would eliminate the needed blanks in some of the fields. Who knows about the occasional irreproducible glitch? Generally it's a typo that can't be reproduced or some other "gotcha".
Anyway, again, I'm certainly pleased to have discovered the result even if there are still nuances I don't fully understand.
I've had a multitude of conversations with TMW support over the years on it, although I guess I have never posed the question of what's different in parsing a fixed-width field of W characters with %Wf vis-a-vis %W.Pf, that's true.
Anyway, thanks again, and thanks for the followup...I'll go away now on this particular subject. :)
Surprise:
>> sscanf( '1.23.45.6', '%3f%3f%3f' )
ans =
1.2000
3.4000
5.6000
and
>> textscan( '1.23.45.6', '%3f%3f%3f' )
ans =
[1.2000] [3.4000] [5.6000]
dpb on 27 Jun 2014
Edited: dpb on 27 Jun 2014
The sample case here is an anomaly; all columns are separated by either a '-' or a blank and that's enough to get the parser back on track. When it's truly only column-width w/o the help of either blanks or signs to help, then the parser gets lost. To see the issue, try the test.dat file in the earlier thread.
Or, even easier to see...
>>sscanf( '1.2   5.6', '%3f%3f%3f' )
ans =
1.2000
5.6000
>>
The intermediate 'bbb' column (three blanks) gets skipped over...
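One way around the skipped blank field, sketched here under the assumption that every field has the same known width, is to slice the string by column position and convert each piece separately, so an all-blank field comes back as NaN instead of vanishing. The variable names are illustrative only.

```matlab
% Three 3-column fields; the middle one is entirely blank.
s = ['1.2' blanks(3) '5.6'];
w = 3;                                    % known field width

% Reshape so each row holds one field, then convert field by field.
fields = cellstr(reshape(s, w, []).');    % {'1.2'; ''; '5.6'}
vals   = str2double(fields);              % blank field becomes NaN
disp(vals.')
```

This keeps the column count intact, at the cost of converting text field by field rather than letting the scanner do it.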


I see IA answered the problem about opening the file; that's just the beginning, as Matlab has no way to read a nondelimited, fixed-width file with native methods. There's an extended thread on this, whose status I checked earlier today, for a long and ugly story.
For your case, with a fixed set of fields for all the data, probably the simplest approach is to read the file as character data and then parse the columns in order. I put the two records into a file and...
>> type vacube.dat
-99.9999-99.9999-99.9999-99.9999 -1.2828 -1.2812 -1.0910 -1.1864 -1.2920
-99.9999-99.9999-99.9999-99.9999-99.9999-99.9999-99.9999-99.9999-99.9999
>> c=textread('vacube.dat','%s','delimiter','\n'); c=char(c{:});
>> d=zeros(size(c,1),size(c,2)/8);
>> j=0;for i=1:8:size(c,2),j=j+1;d(:,j)=str2num(c(:,i:i+7));end
>> d
d =
-99.9999 -99.9999 -99.9999 -99.9999 -1.2828 -1.2812 -1.0910 -1.1864 -1.2920
-99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999 -99.9999
>>
The above
a) reads the file into a cell array and converts it to a character array
b) preallocates a data array based on the length of the line and the known 8-wide data fields
c) loops over the columns 8 at a time and converts the substrings, storing the results in the appropriate column of the data array.
Is filename a variable? If so, try it without the quotes:
fid = fopen(filename,'rt')
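To make the difference concrete, here is a small sketch that creates a scratch file, then opens it once with the name quoted and once through the variable, checking the fopen return value each time. The scratch file and variable names are invented for the example, and it assumes no file literally called "filename" exists on your path.

```matlab
% Create a small file whose name is held in a variable (hypothetical name).
filename = [tempname '.dat'];
fid = fopen(filename, 'w');
fprintf(fid, '%8.4f%8.4f\n', -99.9999, -1.2828);
fclose(fid);

% Quoted: MATLAB looks for a file literally called "filename".
[fidQuoted, msg] = fopen('filename', 'r');
if fidQuoted < 0
    fprintf('fopen(''filename'') failed: %s\n', msg);
else
    fclose(fidQuoted);   % a file with that literal name happened to exist
end

% Unquoted: the contents of the variable are used as the file name.
[fid, msg] = fopen(filename, 'r');
if fid < 0
    error('Could not open %s: %s', filename, msg);
end
firstLine = fgetl(fid);
fclose(fid);
delete(filename);
disp(firstLine)
```

Checking fid (and the message fopen returns) right after opening surfaces this kind of mistake before textscan ever runs.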


Thanks. Actually this worked. filename is a variable, and after I removed the quotes around it, it gave me positive fid. Thanks.
dpb on 26 Jun 2014
Edited: dpb on 26 Jun 2014
That's the repmat trick...
fmt=repmat('%8.4f',1,360);
data=cell2mat(textscan(fid,fmt));
The cell2mat above converts the cell array from textscan to a regular array for subsequent ease of addressing (by eliminating the {} notation).
As an aside, I still wish it followed the Fortran FORMAT style so one could write
fmt='360F8.4'
-- much simpler and more legible syntax than repmat
Since it's regular and you know the number of fields, you can also just use a single '%8.4f' and then reshape the returned vector, remembering that the values come back in file order (record by record) while reshape fills column by column, hence the 360-row reshape followed by a transpose.
dat=reshape(textread(filename,'%8.4f'),360,[]).';
While deprecated, textread has much of the functionality of textscan and a couple of advantages where you don't need all of its fancier cousin: it does away with the fopen/fclose by opening the file internally, and it returns a regular double array instead of a cell array, saving the extra step of casting.
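The reshape-then-transpose step is easy to get backwards, so here is a tiny stand-alone illustration with 3 values per record instead of 360. The vector v merely stands in for what a single-specifier read would return; no file is involved.

```matlab
% Values in the order they appear in the file: record 1 first, then record 2.
nPerRecord = 3;
v = (1:6).';                          % stand-in for the column vector read from file

% reshape fills column by column, so each record lands in a column;
% the transpose then turns records into rows.
dat = reshape(v, nPerRecord, []).';
disp(dat)
```

Reshaping directly to [] by nPerRecord without the transpose would interleave values from different records.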


Asked:

on 25 Jun 2014

Edited:

dpb
on 27 Jun 2014
