Import timestamps from textfile with more than 10k entries (R2014b)
Show older comments
This is less of a question but more of a bug report: I got a text file with something like 80k data entries i.e. lines with numbers separated by a tab. Besides several numeric values the last two columns of this file contain the date and time of the moment the measured value was taken. When I use the built-in file import wizard and select to import the columns as column vectors, while giving each column the corresponding format -- so "Number" and "Datetime" -- and names, everything works fine. The importing takes some time, but finishes without any errors eventually. Nevertheless, the imported vector for the time column -- format "HH:mm:ss" -- only contains entries with todays date, no timestamp whatsoever ... Anyone else had the problem yet? I've solved the problem by writing a function which imports that column in blocks of size <10k -- a number I found to be the largest block size to still result in an actual import of the correct timestamp by try and error. I've attached the file, so anyone can try if he gets the same result:
3 Comments
per isakson
on 22 Jan 2015
Edited: per isakson
on 22 Jan 2015
Try to replace
"HH:mm:ss"
by
"HH:MM:SS"
if it's hours, minutes, seconds
John
on 22 Jan 2015
per isakson
on 22 Jan 2015
Edited: per isakson
on 22 Jan 2015
I'm using R2013b and find it difficult to take-in that The MathWorks have made changes to the datetime format specifiers.
HH:MM:SS has "always" been the format string for hour,minute,second. And the R2014b documentation on datenum prescribe "MM" for minutes and "mm" for month. See datenum, Input Arguments. The doc on uiimport and dataimport doesn't say anything.
R2014b datetime Properties, Assign date and time components and specify datetime display format should be the place to find out. Excerpts:
MM Month, numerical using two digits 04
mm Minute, using two digit 41
I'm warned!
 
IMO: textscan is the best function to read files like yours.
Answers (2)
per isakson
on 22 Jan 2015
Edited: per isakson
on 22 Jan 2015
Yes, Star Strider is correct. The string "E16" appears 24 times in the first column. What's that supposed to indicate? And I found another problem
 

 
One way to read and parse the file:
>> cac = cssm
cac =
[80602x6 double] {80602x2 cell}
>>
where
function cac = cssm
str = fileread('2015-01-20_scan (1).txt');
str = strrep( str, 'E16', '1E16' );
str = strrep( str, '..', '.' );
cac = textscan( str, '%f%f%f%f%f%f%s%s' ...
, 'Delimiter' , '\t' ...
, 'CollectOutput' , true ...
, 'EmptyValue' , nan );
end
and then convert the datetime strings.
 
TreatAsEmpty takes care of E16
>> textscan( '123,E16,456', '%f%f%f', 'Delimiter',',', 'TreatAsEmpty','E16' )
ans =
[123] [NaN] [456]
3 Comments
John
on 23 Jan 2015
John
on 23 Jan 2015
per isakson
on 23 Jan 2015
Edited: per isakson
on 23 Jan 2015
- I assume that "E16" stands for error 16. Is there a handful of other errors? With only a few error strings, TreatAsEmpty is the Matlab way.
- My function, cssm, handles the double dot. Shouldn't be a headache. As long as the files fits in memory this approach works well. Of course, next time it is something different.
Star Strider
on 22 Jan 2015
Hannes, your data file format has serious problems. I worked for a few hours to get textscan to import it, and I was successful to an extent. The problem is with sections such as:
3024.699 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
3024.665 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
3024.670 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:00
3024.633 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.644 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.750 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.683 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
E16 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.683 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:01
3024.709 -10.750235 7.350000 47.369776 51 20.01.2015 12:21:02
The irregularities make it impossible to read. You may have to edit it in a word processor, use fgets, or something else. Since I have no idea if the ‘E16’ information is important, I have no idea how to deal with it.
1 Comment
Star Strider
on 23 Jan 2015
Edited: Star Strider
on 23 Jan 2015
I wanted to wait to find out how you wanted to treat the ‘E16’ lines before I went further.
There are also several blank lines that would require a ‘restart’ from the current position in the file using:
[Dts, posn] = textscan(fidi, '%f%f%f%f%f%s%s', 'Delimiter',' ', 'CollectOutput',0, 'HeaderLines',1, 'TreatAsEmpty',{'E16'});
fseek(fidi, 0, 0);
for each one. (I use ‘fidi’ to indicate ‘file ID input’, and created by fopen.) This would require a loop with the iterations equal to the number of blank lines, or a while loop, beginning with:
while ~feof(fidi)
You could also go through and eliminate them manually.
I also found that the delimiter was a space (' ') not ('\t'). I leave the 'CollectOutput' option to you. You can easily use the cell2mat function to create a matrix out of the first 5 columns later if you choose to set it to 0. That would leave the date and time strings as a separate cell (here ‘dt’) that you could convert to date numbers with:
dtn = datenum(dt, 'dd.mm.yyyy HH:MM:SS');
You would have to deal with the double dots (‘..’) as a separate problem, before you use textscan, since it has no way to deal with them.
EDIT — There are more problems that are causing the datenum conversion to fail:
3024.564 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:22
3024.654 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:22
3024.555 -10.773204 7.350000 46.15 13:03:42
2974.471 -10.929406 7.350000 42.011411 90 20.01.2015 13:03:42
2974.453 -10.22
3024.562 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:23
3024.631 -10.773204 7.350000 46.682830 56 20.01.2015 12:27:23
The lines are incomplete, so the format fails. Line #22167 is the first problem it encounters. There may be more.
Sorry, but I’m stopping here.
Categories
Find more on Dates and Time in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!