reading from a binary file
I am getting '18„N¦NÆN' instead of '18' when using fread. What is wrong here?
Thanks.
Answers (1)
Jan
on 25 Feb 2011
The command you did not show is wrong. This might be better:
A = fread(FID, [1, 2], 'char=>char')
But this is only a bold guess. If you show your FREAD command and describe the corresponding data, useful help is more likely and less random.
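Jan's guess is that the size argument is the culprit: a read returns at most as many elements as requested, so asking for two should yield '18', while an unbounded read also returns whatever follows in the file. The sketch below illustrates that idea in Python rather than MATLAB, on hypothetical file contents (the ASCII text "18" followed by arbitrary binary bytes), not the OP's actual data.

```python
import io
from typing import BinaryIO


def read_text_prefix(stream: BinaryIO, count: int) -> str:
    """Read at most `count` bytes and map each byte to one character.

    latin-1 maps byte value k to code point k, so every byte becomes exactly
    one character. A negative `count` reads to the end of the stream.
    """
    return stream.read(count).decode("latin-1")


# Hypothetical file contents: "18" followed by unrelated binary bytes.
data = b"18" + bytes([0x00, 0x07, 0xFF, 0x10])

limited = read_text_prefix(io.BytesIO(data), 2)     # "18"
unlimited = read_text_prefix(io.BytesIO(data), -1)  # "18" plus 4 stray characters
```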
10 Comments
Doug
on 25 Feb 2011
Walter Roberson
on 26 Feb 2011
Are you reading characters or bytes? Asking to read a character can read as many bytes as are needed to decode a UTF-encoded character. The default is UTF-8, which can involve up to three bytes per character.
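To put numbers on the byte counts mentioned here: UTF-8 uses one to four bytes per character, and three bytes suffice for every character in the Basic Multilingual Plane (U+0000 to U+FFFF). A small Python sketch of the standard length rule, checked against Python's own encoder (the helper name is hypothetical):

```python
def utf8_byte_count(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode the single character `ch`."""
    cp = ord(ch)
    if cp < 0x80:
        return 1  # ASCII, e.g. "1" or "8"
    if cp < 0x800:
        return 2  # e.g. accented Latin letters such as e-acute (U+00E9)
    if cp < 0x10000:
        return 3  # remainder of the Basic Multilingual Plane, e.g. the euro sign (U+20AC)
    return 4      # supplementary planes, e.g. emoji


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    # Compare the rule with Python's built-in UTF-8 encoder.
    assert utf8_byte_count(ch) == len(ch.encode("utf-8"))
```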
If you want to read bytes instead of characters, then
fread(fid,2,'uint8=>char')
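The difference between reading characters and reading bytes can be sketched in Python (an analogy, not MATLAB's implementation): reading two characters from a UTF-8 text stream may consume more than two bytes, whereas reading two bytes and turning each byte value into a character, roughly what 'uint8=>char' does, can split one multi-byte character into two unrelated ones. The sample data and helper names are hypothetical.

```python
import io


def read_chars(data: bytes, n: int) -> str:
    """Read n characters, decoding the bytes as UTF-8 (character-oriented read)."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
    return text.read(n)


def read_bytes_as_chars(data: bytes, n: int) -> str:
    """Read n bytes and map each byte value to one character (byte-oriented read)."""
    return io.BytesIO(data).read(n).decode("latin-1")


data = "\u00e91".encode("utf-8")           # "é1" is 3 bytes in UTF-8: C3 A9 31
chars = read_chars(data, 2)                # "é1": 2 characters, 3 bytes consumed
byte_chars = read_bytes_as_chars(data, 2)  # "Ã©": the two bytes of "é", one char each
```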
Jan
on 27 Feb 2011
@Doug: Does this mean that you accept my answer?
Jan
on 27 Feb 2011
@Walter: help('fread') in MATLAB 2009a claims that FREAD(..., 'uint8=>char') and FREAD(..., '*char') yield the same result. And in the current online documentation I do not see any details about UTF or Unicode characters. Can you read multi-byte characters controlled by FREAD alone?
Walter Roberson
on 28 Feb 2011
Jan, look at the fread documentation, http://www.mathworks.com/help/techdoc/ref/fread.html
For char it says the size "Depends on the encoding scheme associated with the file. Set encoding with fopen."
In my opinion, you are misinterpreting an example. In the example you are looking at, all of the characters have single-byte representations under the default UTF-8 encoding, so in that special case, uint8=>char and *char would have the same effect. The author of the example was not making a statement about the general effect of those two precisions, only about that one example.
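The "special case" argument can be checked mechanically. In this Python sketch (hypothetical helper name, byte strings standing in for file contents), a character-oriented UTF-8 read and a byte-per-character read agree exactly when every byte is ASCII (below 0x80), which is the case the help example happens to cover.

```python
def reads_agree(data: bytes) -> bool:
    """True if decoding `data` as UTF-8 gives the same text as mapping each
    byte to one character (latin-1). This happens only for pure-ASCII bytes."""
    try:
        as_utf8 = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8; strict decoding fails
    return as_utf8 == data.decode("latin-1")


samples = [b"18", b"hello", "\u00e9".encode("utf-8"), b"\xff"]
for sample in samples:
    # Agreement coincides with "all bytes are ASCII".
    assert reads_agree(sample) == all(b < 0x80 for b in sample)
```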
Jan
on 28 Feb 2011
@Walter: The OP did not give any details, only that FREAD returned '18„N¦NÆN' instead of the wanted '18'. It is very unlikely that the OP's problem concerns the choice of a UINT8 or CHAR precision or the Unicode encoding; it much more likely concerns the size argument of FREAD. I definitely do *not* misinterpret an example. I simply see no reason to speculate about the encoding scheme as long as the OP does not mention it in any detail. In addition, his two-day-old comment that "fread(fid, 2, '*char')" solved his problem implies that a discussion about the encoding scheme is irrelevant here.
Thanks for explaining your opinion, but it does not seem useful to the OP, to me, or to other readers.
Walter Roberson
on 28 Feb 2011
What does the 2009a help for fread say? The 2008b help lists uint8 in the first table and specifically says it is 8 bits, but lists char in the second table, which is introduced with:
The following platform dependent formats are also supported but
they are not guaranteed to be the same size on all platforms.
and specifically says,
If the precision is 'char' or 'char*1', MATLAB reads characters
using the encoding scheme associated with the file. See FOPEN
for more information.
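Since char reads depend on the encoding associated with the file, the same bytes can yield different text. As a Python illustration (the encodings shown are examples, not necessarily what MATLAB uses on a given platform; the helper name is hypothetical), two bytes decode to one character or two depending on that encoding:

```python
from typing import Dict, List


def decode_under(data: bytes, encodings: List[str]) -> Dict[str, str]:
    """Decode the same raw bytes under each encoding and collect the results."""
    return {enc: data.decode(enc) for enc in encodings}


raw = bytes([0xC3, 0xA9])  # the UTF-8 encoding of "é"
results = decode_under(raw, ["utf-8", "cp1252"])
# results["utf-8"]  is "é"  (1 character)
# results["cp1252"] is "Ã©" (2 characters)
```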
The only hint I see that uint8 and char might yield the same result is a line in Examples,
This time, specify that you want each element read as an unsigned
8-bit integer and output as a character. (Using a precision of
'char=>char' or '*char' will produce the same result):
Examples, especially in the help document, are not considered to be definitive, and that particular example only deals with characters that would be encoded as single bytes in the default encoding. The interpretation that uint8=>char and *char are to be considered equivalent to each other in all circumstances directly contradicts the 2008b help, the 2008b "doc", and the 2010b "doc".
Jan
on 28 Feb 2011
@Walter: "The interpretation that uint8=>char and *char are to be considered equivalent...in all circumstances...contradicts the...doc": This is correct. So it is a good thing that nobody ever claimed such an equivalence.
I cannot see any point of conflict, so I see no reason to continue this pointless discussion: I know the difference between UINT8 and a Unicode character, you know it as well, and the OP's problem is not affected by it in any way.
Walter Roberson
on 28 Feb 2011
The data that the OP first read in appears to me to be potentially UTF-8 encoded, based on the characters the OP shows. I am concerned that if the OP continues to work with this data stream, the OP might encounter cases where the difference between reading "char" and "uint8" becomes important. It would be safer for the OP to use uint8=>char if reading bytes is the OP's intent.
Jan
on 28 Feb 2011
No. If the OP reads a Unicode file, he will most likely open it with the necessary encoding scheme. If he then switches from the working "fread(fid, 2, '*char')" to "fread(fid, 2, 'uint8=>char')", he will get a different, unwanted result.
Now you have found a point where we two disagree. But who cares?