Read binary with ascii header

David
David on 24 Nov 2015
Commented: Guillaume on 24 Nov 2015
Hi,
I have a .vtu file which contains numerical data (velocity values at certain x,y,z points) in binary format. The format and other information about the file are given in a short (ASCII!) header at the beginning of the file:
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" compressor="vtkZLibDataCompressor">
<UnstructuredGrid>
<Piece NumberOfPoints=" 318105" NumberOfCells=" 1779159">
<Points>
<DataArray type="Float64" NumberOfComponents="3" format="binary">
NumberOfPoints specifies the length of the file, and there should be 7 columns of data. As specified, the data array is given in Float64 format.
Anyway, the question is: how can I read in the ASCII header, extract information from it, and then proceed to read the rest of the file as binary?
Any ideas?

Answers (2)

Guillaume
Guillaume on 24 Nov 2015
You can still use text-reading operations on a binary file, so use fgetl (or fgets) to read your text (assuming the header actually consists of lines ending with a newline), then normal binary reading (with fread).
fid = fopen('yourfile.vtu');
header = arrayfun(@(~) fgetl(fid), 1:5, 'UniformOutput', false);
numpoints = str2double(regexp(header{3}, '(?<=NumberOfPoints=" +)\d+', 'match', 'once'));
datatype = regexp(header{5}, '(?<=type=")[^"]+(?=")', 'match', 'once');
points = fread(fid, [numpoints 7], lower(datatype)); %assumes datatype is a valid precision for fread ('float64' is)
fclose(fid);
Note that I assumed that the format of your header is constant: always 5 lines, with the number of points on the 3rd line and the data type on the 5th.
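If the header is not guaranteed to be exactly five lines, one option is to keep reading lines until the DataArray line shows up. The following is only a minimal sketch, not a tested VTK reader. It writes a small stand-in file so that it can run on its own (the file name, the three-point payload and the variable names are made up for illustration), and it assumes that raw, uncompressed values start right after the DataArray line with the components of each point stored together.

```matlab
% Build a small stand-in file: ASCII header followed by raw float64 data.
% (Hypothetical test data; a real .vtu payload may be encoded differently.)
fname = [tempname '.vtu'];
fid = fopen(fname, 'w');
fprintf(fid, '<VTKFile type="UnstructuredGrid" byte_order="LittleEndian">\n');
fprintf(fid, '<UnstructuredGrid>\n');
fprintf(fid, '<Piece NumberOfPoints="      3" NumberOfCells="      1">\n');
fprintf(fid, '<Points>\n');
fprintf(fid, '<DataArray type="Float64" NumberOfComponents="3" format="binary">\n');
fwrite(fid, 1:9, 'float64', 0, 'l');
fclose(fid);

% Read header lines until the DataArray line instead of assuming a line count.
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);
header = {};
txt = fgetl(fid);
while ischar(txt)                       % fgetl returns -1 at end of file
    header{end+1} = txt; %#ok<AGROW>
    if ~isempty(strfind(txt, '<DataArray'))
        break
    end
    txt = fgetl(fid);
end

% Pull the values out of the header text with regexp tokens.
headerText = strjoin(header, ' ');
numpoints = str2double(regexp(headerText, 'NumberOfPoints="\s*(\d+)', 'tokens', 'once'));
ncomp     = str2double(regexp(headerText, 'NumberOfComponents="\s*(\d+)', 'tokens', 'once'));
datatype  = lower(char(regexp(headerText, '<DataArray type="([^"]+)"', 'tokens', 'once')));
byteorder = char(regexp(headerText, 'byte_order="([^"]+)"', 'tokens', 'once'));

% Assumption: values are stored point by point (x,y,z, x,y,z, ...),
% so read ncomp rows per column and transpose to one row per point.
points = fread(fid, [ncomp numpoints], datatype, 0, lower(byteorder(1))).';
fclose(fid);
delete(fname);
```

With the stand-in data, points comes back with one row per point. For a real file, the layout assumption in the last comment is the first thing to verify.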
  3 Comments
Guillaume
Guillaume on 24 Nov 2015
float64 (in lowercase) is a valid type for fread. It is equivalent to double.
There can be several reasons for the binary part not to read properly:
- End-of-line issues. As you probably know, there are different conventions for marking the end of a text line: Windows text files typically use '\r\n', Unix text files use '\n', and other exotic formats may use something else. When you read a file in text mode, MATLAB automatically detects the line ending and strips it off. In binary mode I'm not sure what happens (the doc is not clear on that), so possibly part of the end of the last line is left over and causes an offset error.
- Are you sure the data starts immediately after the text portion? If it doesn't, that would also cause an offset error.
- I assumed that 'float64' means IEEE 754 double precision float. It could mean something else.
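The first two points can be checked directly: once the header has been read, ftell reports where the next fread will start, and the number of bytes left in the file can be compared with the size the header implies. A minimal sketch, using a made-up stand-in file with Windows-style line endings so that it runs on its own (fname, dataStart, expectedBytes and the other names are illustrative only):

```matlab
% Stand-in file: two header lines ending in \r\n, then 6 raw float64 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fprintf(fid, '<Piece NumberOfPoints="2">\r\n');
fprintf(fid, '<DataArray type="Float64" NumberOfComponents="3">\r\n');
fwrite(fid, [1.5 2.5 3.5 4.5 5.5 6.5], 'float64', 0, 'l');
fclose(fid);

fid = fopen(fname, 'r');            % no 't' flag, so the file is opened in binary mode
fgetl(fid);                         % first header line
fgetl(fid);                         % second header line
dataStart = ftell(fid);             % byte offset where the next fread begins
fseek(fid, 0, 'eof');
fileSize = ftell(fid);              % position at end of file = file size in bytes
fseek(fid, dataStart, 'bof');       % go back to the start of the data

expectedBytes  = 2 * 3 * 8;         % points * components * bytes per float64
remainingBytes = fileSize - dataStart;
fprintf('data starts at byte %d, %d bytes remain, %d expected\n', ...
        dataStart, remainingBytes, expectedBytes);

values = fread(fid, [3 2], 'float64', 0, 'l').';
fclose(fid);
delete(fname);
```

If remainingBytes and expectedBytes disagree on a real file, either the data does not start where the header ends, or it is not stored as plain values of the assumed type.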
Guillaume
Guillaume on 24 Nov 2015
Another two things:
- Endianness seems to be encoded in the file. It may be safer to use it explicitly in the fread call, although if you are on Windows you were already reading in little endian:
endianness = regexp(header{1}, '(?<=byte_order=")[^"]+', 'match', 'once');
points = fread(fid, [numpoints 7], lower(datatype), 0, lower(endianness(1))); %assumes endianness starts with either 'L' or 'B'
- More importantly, your header includes compressor="vtkZLibDataCompressor">, which would indicate that your data is compressed with zlib. If that is the case, then it is a lot more complicated to read. There are submissions on the File Exchange to decompress zlib.
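As an alternative to a File Exchange submission, MATLAB can call the Java class java.util.zip.InflaterInputStream, which inflates zlib streams. The sketch below only shows that one step on a made-up byte vector. It assumes MATLAB's Java support is available, and it does not deal with how a .vtu file lays out or encodes its compressed blocks, which would have to be sorted out first.

```matlab
% Made-up payload: compress three doubles so the example is self-contained.
raw = typecast([1.5 2.5 3.5], 'uint8');
packer = java.io.ByteArrayOutputStream();
deflater = java.util.zip.DeflaterOutputStream(packer);
deflater.write(typecast(raw, 'int8'), 0, numel(raw));
deflater.close();
compressed = packer.toByteArray();      % arrives in MATLAB as an int8 column

% Inflate: read the zlib stream one byte at a time until it is exhausted.
% (Simple but slow; fine for a sketch, not for hundreds of megabytes.)
source = java.util.zip.InflaterInputStream(java.io.ByteArrayInputStream(compressed));
sink = java.io.ByteArrayOutputStream();
b = source.read();                      % returns -1 at end of stream
while b ~= -1
    sink.write(b);
    b = source.read();
end
source.close();

inflated = typecast(sink.toByteArray(), 'uint8');   % back to unsigned bytes
values = typecast(inflated, 'double').';            % reinterpret as doubles
```

Here values should match the three doubles that were compressed. On a real file, the bytes handed to the inflater would be the compressed block itself, without any surrounding header or text encoding.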



Thorsten
Thorsten on 24 Nov 2015
Edited: Thorsten on 24 Nov 2015
Use fgets to read the first ASCII lines and then use fread (with appropriate arguments) for the remaining binary stuff.
  2 Comments
David
David on 24 Nov 2015
Hi,
thanks for the reply. I can easily read the first header lines using fgets, but I'm not sure what arguments to give fread (or whatever good function to use for the binary reading) in order for it to know how to skip the header lines.
Do you have any suggestion on that?
Image Analyst
Image Analyst on 24 Nov 2015
Here are two lines of code from one of my utilities that reads 2-D image slices from a raw 3-D image:
dataLengthString = '*uint16'; % You need the *, otherwise fread returns doubles.
oneFullSlice = fread(fileHandle, [rows, columns], dataLengthString);
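To see what the asterisk changes, the following sketch reads the same made-up bytes twice, once with 'uint16' and once with '*uint16' (the temporary file and the variable names are illustrative). It also shows that fread simply continues from the current file position, so nothing extra is needed to skip a header that has already been read with fgets or fgetl.

```matlab
% Stand-in file: one text line followed by four uint16 values.
fname = [tempname '.raw'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line\n');
fwrite(fid, [10 20 30 40], 'uint16');
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid);                              % consume the text line
dataStart = ftell(fid);                  % the next fread starts here
asDouble = fread(fid, [2 2], 'uint16');  % values converted to double
fseek(fid, dataStart, 'bof');            % rewind to the start of the data
asUint16 = fread(fid, [2 2], '*uint16'); % values kept as uint16
fclose(fid);
delete(fname);

disp(class(asDouble));   % double
disp(class(asUint16));   % uint16
```

Both reads return the same numbers; only the class of the result differs, which matters mostly for memory use on large arrays.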
