Import timestamps from textfile with more than 10k entries (R2014b)

This is less of a question but more of a bug report: I got a text file with something like 80k data entries i.e. lines with numbers separated by a tab. Besides several numeric values the last two columns of this file contain the date and time of the moment the measured value was taken. When I use the built-in file import wizard and select to import the columns as column vectors, while giving each column the corresponding format -- so "Number" and "Datetime" -- and names, everything works fine. The importing takes some time, but finishes without any errors eventually. Nevertheless, the imported vector for the time column -- format "HH:mm:ss" -- only contains entries with todays date, no timestamp whatsoever ... Anyone else had the problem yet? I've solved the problem by writing a function which imports that column in blocks of size <10k -- a number I found to be the largest block size to still result in an actual import of the correct timestamp by try and error. I've attached the file, so anyone can try if he gets the same result:

3 Comments

Try to replace
"HH:mm:ss"
by
"HH:MM:SS"
if it's hours, minutes, seconds
Can't remember exactly, but I think that has been the definition of R2014a and earlier. HH:MM:SS in R2014b results in NaT -- "Not a Time". Apparently this whole datetime-thing of R2014b has changed things a bit, at least my import wizard now suggests "HH:mm:ss" as a default option to import dates.
I'm using R2013b and find it difficult to take-in that The MathWorks have made changes to the datetime format specifiers.
HH:MM:SS has "always" been the format string for hour,minute,second. And the R2014b documentation on datenum prescribe "MM" for minutes and "mm" for month. See datenum, Input Arguments. The doc on uiimport and dataimport doesn't say anything.
MM Month, numerical using two digits 04
mm Minute, using two digit 41
I'm warned!
&nbsp
IMO: textscan is the best function to read files like yours.

Sign in to comment.

Answers (2)

Yes, Star Strider is correct. The string "E16" appears 24 times in the first column. What's that supposed to indicate? And I found another problem
&nbsp
&nbsp
One way to read and parse the file:
>> cac = cssm
cac =
[80602x6 double] {80602x2 cell}
>>
where
function cac = cssm
str = fileread('2015-01-20_scan (1).txt');
str = strrep( str, 'E16', '1E16' );
str = strrep( str, '..', '.' );
cac = textscan( str, '%f%f%f%f%f%f%s%s' ...
, 'Delimiter' , '\t' ...
, 'CollectOutput' , true ...
, 'EmptyValue' , nan );
end
and then convert the datetime strings.
&nbsp
TreatAsEmpty takes care of E16
>> textscan( '123,E16,456', '%f%f%f', 'Delimiter',',', 'TreatAsEmpty','E16' )
ans =
[123] [NaN] [456]

3 Comments

Thank you guys for all your help. 'E16' is an error string from one of our sensors -- so the actual information ain't that important. So far I just take it as nan and set it to zero later. Those "double dots" cause a bit more headache at my end right now. Didn't actually see them so far, so thanks for the data scanning.
Uploaded a new file, shouldn't contain any '..' anymore. Really don't have a clue how those got into the data anyways.
  • I assume that "E16" stands for error 16. Is there a handful of other errors? With only a few error strings, TreatAsEmpty is the Matlab way.
  • My function, cssm, handles the double dot. Shouldn't be a headache. As long as the files fits in memory this approach works well. Of course, next time it is something different.

Sign in to comment.

Hannes, your data file format has serious problems. I worked for a few hours to get textscan to import it, and I was successful to an extent. The problem is with sections such as:
3024.699 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
3024.665 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
3024.670 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
3024.633 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.644 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.750 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.683 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.683 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.709 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:02
The irregularities make it impossible to read. You may have to edit it in a word processor, use fgets, or something else. Since I have no idea if the ‘E16’ information is important, I have no idea how to deal with it.

1 Comment

I wanted to wait to find out how you wanted to treat the ‘E16’ lines before I went further.
There are also several blank lines that would require a ‘restart’ from the current position in the file using:
[Dts, posn] = textscan(fidi, '%f%f%f%f%f%s%s', 'Delimiter',' ', 'CollectOutput',0, 'HeaderLines',1, 'TreatAsEmpty',{'E16'});
fseek(fidi, 0, 0);
for each one. (I use ‘fidi’ to indicate ‘file ID input’, and created by fopen.) This would require a loop with the iterations equal to the number of blank lines, or a while loop, beginning with:
while ~feof(fidi)
You could also go through and eliminate them manually.
I also found that the delimiter was a space (' ') not ('\t'). I leave the 'CollectOutput' option to you. You can easily use the cell2mat function to create a matrix out of the first 5 columns later if you choose to set it to 0. That would leave the date and time strings as a separate cell (here ‘dt’) that you could convert to date numbers with:
dtn = datenum(dt, 'dd.mm.yyyy HH:MM:SS');
You would have to deal with the double dots (‘..’) as a separate problem, before you use textscan, since it has no way to deal with them.
EDIT — There are more problems that are causing the datenum conversion to fail:
3024.564 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:22
3024.654 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:22
3024.555 -10.773204 7.350000 46.15 13:03:42
2974.471 -10.929406 7.350000 42.011411 90 20.01.2015 13:03:42
2974.453 -10.22
3024.562 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:23
3024.631 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:23
The lines are incomplete, so the format fails. Line #22167 is the first problem it encounters. There may be more.
Sorry, but I’m stopping here.

Sign in to comment.

Categories

Products

Asked:

on 22 Jan 2015

Edited:

on 23 Jan 2015

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!