Read binary with ascii header
Hi,
I have a .vtu file which contains numerical data (velocity value(s) at certain x,y,z-points) in binary format. The format and other information about the file are given in a short (ASCII!) header at the beginning of the file:
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" compressor="vtkZLibDataCompressor">
<UnstructuredGrid>
<Piece NumberOfPoints=" 318105" NumberOfCells=" 1779159">
<Points>
<DataArray type="Float64" NumberOfComponents="3" format="binary">
The NumberOfPoints specifies the length of the file, and there should be 7 columns of data. As specified, the data array is given in Float64 format.
Anyway, the question is: how can I read in the ASCII header, extract information from it, and then proceed to read the rest of the file as binary?
Any ideas?
Answers (2)
Guillaume
on 24 Nov 2015
You can still use text-reading operations on a binary file, so use fgetl (or fgets) to read your text (assuming the header actually consists of lines ending with a newline), then use normal binary reading (with fread).
fid = fopen('yourfile.vtu');
% read the 5 header lines as text
header = arrayfun(@(~) fgetl(fid), 1:5, 'UniformOutput', false);
% number of points is on line 3, data type on line 5
numpoints = str2double(regexp(header{3}, '(?<=NumberOfPoints=" +)\d+', 'match', 'once'));
datatype = regexp(header{5}, '(?<=type=")[^"]+(?=")', 'match', 'once');
points = fread(fid, [numpoints 7], lower(datatype)); %assume datatype is a valid type for fread; 'float64' is
fclose(fid);
Note that I assumed that the format of your header is constant: always 5 lines, with the number of points on the third and the data type on the fifth.
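If the header layout is not guaranteed to be constant, one option is to scan lines until the DataArray tag shows up instead of counting lines. The following is only a sketch under assumptions: it builds its own small stand-in file (the file name, the three sample points and the variable names are made up for illustration), it assumes the numbers follow the header as raw, uncompressed doubles, and it assumes the values are stored point by point (x, y, z, x, y, z, ...), which is why it reads a [ncomp numpoints] block and transposes it.

```matlab
% Build a small stand-in file: 5 header lines followed by raw doubles.
% (Hypothetical test data, not the real .vtu file.)
fname = [tempname '.vtu'];
xyz = [0 0 0; 1 2 3; 4 5 6];            % 3 points, 3 components each
fid = fopen(fname, 'w');
fprintf(fid, '<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian">\n');
fprintf(fid, '<UnstructuredGrid>\n');
fprintf(fid, '<Piece NumberOfPoints="      3" NumberOfCells="      1">\n');
fprintf(fid, '<Points>\n');
fprintf(fid, '<DataArray type="Float64" NumberOfComponents="3" format="binary">\n');
fwrite(fid, xyz.', 'float64', 0, 'l');  % one point after the other
fclose(fid);

% Scan the header line by line until the <DataArray ...> line is found.
fid = fopen(fname, 'r');
byteorder = ''; numpoints = NaN; ncomp = NaN; datatype = '';
while isempty(datatype)
    txt = fgetl(fid);
    if ~ischar(txt)                     % fgetl returns -1 at end of file
        fclose(fid);
        error('No <DataArray> line found before end of file.');
    end
    tok = regexp(txt, 'byte_order="([^"]+)"', 'tokens', 'once');
    if ~isempty(tok), byteorder = lower(tok{1}(1)); end   % 'l' or 'b'
    tok = regexp(txt, 'NumberOfPoints="\s*(\d+)"', 'tokens', 'once');
    if ~isempty(tok), numpoints = str2double(tok{1}); end
    if ~isempty(strfind(txt, '<DataArray'))
        tok = regexp(txt, '\stype="([^"]+)"', 'tokens', 'once');
        datatype = lower(tok{1});       % e.g. 'float64'
        tok = regexp(txt, 'NumberOfComponents="\s*(\d+)"', 'tokens', 'once');
        ncomp = str2double(tok{1});
    end
end

% Read ncomp values per point, then transpose to one row per point.
points = fread(fid, [ncomp numpoints], datatype, 0, byteorder).';
fclose(fid);
delete(fname);
```

Reading [numpoints ncomp] directly would fill the matrix column by column, which only matches the file if all x values are stored first, then all y values, and so on.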
3 Comments
Guillaume
on 24 Nov 2015
float64 (in lowercase) is a valid type for fread. It is equivalent to double.
There can be several reasons for the binary part not to read properly:
- End-of-line issues. As you probably know, there are different conventions for marking the end of a text line. Windows text files typically use '\r\n', Unix text files use '\n'. Other exotic formats may use something different. When you read files in text mode, MATLAB automatically detects the correct line ending and strips it off. In binary mode, I'm not sure what happens (the doc is not clear on that), so possibly part of the end of the last line is left over and causes an offset error.
- Are you sure the data starts immediately after the text portion? If not, this would also cause an offset error.
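One way to check for such an offset is to compare the number of bytes left after the header with the number of bytes you expect. This is a minimal sketch on a made-up file (two header lines with Windows line endings followed by six doubles); the file name and variable names are only for illustration.

```matlab
% Hypothetical stand-in file: two text lines, then six raw doubles.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line 1\r\n');      % Windows-style line endings
fprintf(fid, 'header line 2\r\n');
fwrite(fid, [1 2 3 4 5 6], 'float64');
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid); fgetl(fid);                 % skip the two header lines
dataStart = ftell(fid);                 % byte offset where the data should start
fseek(fid, 0, 'eof');
remaining = ftell(fid) - dataStart;     % bytes between header and end of file
fseek(fid, dataStart, 'bof');           % go back to the start of the data
expected = 6 * 8;                       % six values, 8 bytes per float64
if remaining ~= expected
    warning('Expected %d data bytes, found %d.', expected, remaining);
end
data = fread(fid, [1 6], 'float64');
fclose(fid);
delete(fname);
```

If the two numbers differ by one or two bytes, a leftover line-ending character is a likely suspect. A much larger difference suggests the data is not stored as plain doubles at all, or that something follows it in the file.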
Guillaume
on 24 Nov 2015
Another two things:
- Endianness seems to be encoded in the file. It may be safer to pass it explicitly to fread, although if you are on Windows, you were already reading in little endian:
endianness = regexp(header{1}, '(?<=byte_order=")[^"]+', 'match', 'once');
points = fread(fid, [numpoints 7], lower(datatype), 0, lower(endianness(1))); %assume endianness either starts with 'L' or 'B'
- More importantly, your header includes compressor="vtkZLibDataCompressor", which would indicate that your data is compressed with zlib. If that is the case, then it is a lot more complicated to read. There are submissions on the File Exchange that decompress zlib data.
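To give an idea of what the decompression step involves, here is a rough sketch that inflates a zlib-compressed byte buffer through the Java classes that ship with MATLAB and converts the result to doubles. It is not a .vtu reader: it compresses its own three sample values first (made-up data), and it assumes the compressed bytes have already been isolated. As far as I know, the VTK XML format additionally base64-encodes inline "binary" data and puts a small size header in front of the compressed blocks; the sketch handles neither. It also reads one byte at a time, which is simple but slow for large arrays.

```matlab
% Hypothetical stand-in data: compress three doubles with zlib via Java.
original = [1.5; -2.25; 3];
bos = java.io.ByteArrayOutputStream();
dos = java.util.zip.DeflaterOutputStream(bos);
dos.write(typecast(original, 'uint8'));
dos.close();
compressed = typecast(bos.toByteArray(), 'uint8');

% Inflate: read the decompressed stream one byte at a time.
% read() returns -1 once the end of the stream is reached.
zin = java.util.zip.InflaterInputStream(java.io.ByteArrayInputStream(compressed));
bytes = zeros(0, 1, 'uint8');
b = zin.read();
while b ~= -1
    bytes(end+1, 1) = uint8(b); %#ok<AGROW>
    b = zin.read();
end
zin.close();

% Reinterpret the raw bytes as doubles (uses the machine's byte order).
values = typecast(bytes, 'double');
```

typecast uses the byte order of the machine it runs on, so for a file whose byte order differs from the machine's, swapbytes would be needed on the result.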
Thorsten
on 24 Nov 2015
Edited: Thorsten
on 24 Nov 2015
Use fgets to read the first ASCII lines and then use fread (with appropriate arguments) for the remaining binary stuff.
2 Comments
Image Analyst
on 24 Nov 2015
Here are two lines of code from one of my utilities that reads 2-D image slices from a raw 3-D image:
dataLengthString = '*uint16'; % You need the *, otherwise fread returns doubles.
oneFullSlice = fread(fileHandle, [rows, columns], dataLengthString);
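To see what the * does, here is a small self-contained sketch that writes six uint16 values to a temporary file (a made-up file, just for illustration) and reads them back with and without the *.

```matlab
% Write six uint16 values to a temporary file.
fname = [tempname '.raw'];
fid = fopen(fname, 'w');
fwrite(fid, uint16(1:6), 'uint16');
fclose(fid);

fid = fopen(fname, 'r');
asDouble = fread(fid, [2 3], 'uint16');   % values converted to double
frewind(fid);                             % back to the start of the file
asUint16 = fread(fid, [2 3], '*uint16');  % values keep their uint16 class
fclose(fid);
delete(fname);

disp(class(asDouble));
disp(class(asUint16));
```

Both calls return the same numbers; only the class of the output differs, which matters for memory use with large files.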