Reading specific data from formatted txt files - looks very difficult
Hi,
I have 15 ASCII files. The file names are 1948_1950, 1950_1955, 1955_1960, 1960_1965, ..., 2010_2014 (all files except the first and the last have a 5-year span in the name). The 15th file is Kos.txt, which has only dates and hours from 1948 to 2010 (but not all dates in that period). I've attached the 1948_1950.txt and Kos.txt files.
When you open the files with years in their names, you'll see that they have the date and time next to the word 'CENTERS'. So, the first file has 480101 0000, indicating Jan 01, 1948 (date format is 'yymmdd') at hour 00. About 40 lines below is the following line:
"k io lon lat f c dp rd zs up vp lonv latv".
I need data that are below that line, in this case it would be:
"1 11 63.14 50.20 ...", etc.
If you go through the file you'll see that pattern. The date and time are next to the word 'CENTERS', and then about 40 lines below are the data that I need (always below the line that starts with "k io lon lat ...").
However, there are three additional problems:
- I need this information only for the dates and hours specified in the Kos.txt file
- The date formats in Kos.txt and in the files with years in their names are not the same
- The date format in the files with years in their names is 'yymmdd', but for the years 2000-2010 the leading zeros are missing: Jan 1st, 2000, which would be 000101, appears as 101.
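The missing leading zeros in the third point can probably be restored by zero-padding, as in the rough sketch below. The sample values in rawDates are made up, and the century pivot (two-digit years of 48 or more are read as 19yy, the rest as 20yy) is an assumption based on the files covering 1948 to 2014.

```matlab
% Date codes as they might be read numerically from the CENTERS lines
% (hypothetical sample values; leading zeros are lost for years 2000-2009).
rawDates = [480101 991231 101 91231 100101];

% Restore the leading zeros by zero-padding to six digits: 101 -> '000101'.
padded = arrayfun(@(d) sprintf('%06d', d), rawDates, 'UniformOutput', false);

% Split each code into two-digit year, month and day.
yy = floor(rawDates / 10000);
mm = floor(mod(rawDates, 10000) / 100);
dd = mod(rawDates, 100);

% Assumed century pivot: the data start in 1948, so yy >= 48 means 19yy.
yyyy = yy + 1900 + 100 * (yy < 48);

% Serial date numbers are easy to compare with dates parsed from another file.
dn = datenum(yyyy, mm, dd);
disp(datestr(dn, 'yyyy-mm-dd'))
```

Once both date lists are converted to serial date numbers (plus the hour), the differing formats no longer matter for the comparison.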
I know that this might be very challenging, but I would very much appreciate help.
Thanks in advance, Djordje
4 Comments
Image Analyst
on 2 Aug 2014
It's not hard, just a lot of grunt work that would take us more than a few minutes. Can't you figure it out using the ideas we gave you in your prior post?
djr
on 3 Aug 2014
per isakson
on 3 Aug 2014
Edited: per isakson
on 3 Aug 2014
Did you specify
- in what form you want the result
- the date format used in Kos.txt?
dpb
on 3 Aug 2014
As I showed you before, the easiest will likely be to read the whole file into memory and then select those wanted (or eliminate the unwanted).
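A minimal sketch of that read-everything-then-select idea follows. It rests on several assumptions: the cell array lines stands in for the file contents with made-up values, the date line is taken to look like 'CENTERS yymmdd hhmm', the data rows are taken to run from the header line to the next CENTERS line, and wanted is a hypothetical list that would really come from Kos.txt.

```matlab
% Stand-in for the file contents, one cell per line (made-up sample).
lines = { ...
    'CENTERS 480101 0000'
    'filler'
    'k io lon lat f c dp rd zs up vp lonv latv'
    '1 11 63.14 50.20'
    '2 11 70.00 55.10'
    'CENTERS 480101 0600'
    'filler'
    'k io lon lat f c dp rd zs up vp lonv latv'
    '1 11 64.00 51.00'};

% Wanted [yymmdd hhmm] pairs (hypothetical; would be derived from Kos.txt).
wanted = [480101 600];

centerIdx = find(strncmp(lines, 'CENTERS', 7));   % lines carrying date and time
headerIdx = find(strncmp(lines, 'k io', 4));      % column-header lines
blockEnd  = [centerIdx(2:end) - 1; numel(lines)]; % last line of each section

selected = {};
for n = 1:numel(centerIdx)
    stamp = sscanf(lines{centerIdx(n)}, 'CENTERS %d %d')';  % [yymmdd hhmm]
    if ismember(stamp, wanted, 'rows')
        % First header line after this CENTERS line; data rows follow it.
        h = headerIdx(find(headerIdx > centerIdx(n), 1));
        rows = lines(h + 1 : blockEnd(n));
        data = cellfun(@(s) sscanf(s, '%f')', rows, 'UniformOutput', false);
        selected{end + 1} = vertcat(data{:});   % one numeric matrix per section
    end
end
```

With a real file, the cell array could be filled by textscan with '%s' and a newline delimiter; the selection logic stays the same.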
The rest is pretty much as IA says: more or less trivial grunt work of counting lines, creating format strings, and using textscan and/or other I/O functions.
I don't see a piece of the puzzle that hasn't been addressed in one of the previous postings other than perhaps finding the given line. That's pretty much either
a) use a fixed headerlines count if the offset is fixed or
b) read line-by-line until you find the string. That is indeed pretty simple...
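while ~feof(fid)
    l = fgetl(fid);   % next line, without the trailing newline
    % stop at the first line that contains the target text
    if ~isempty(strfind(l, 'a unique pattern in the target string')), break, end
end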
If you need to find the number of lines to the given one the first time, so that you can use headerlines later for multiple sections that are at a fixed (but initially unknown) separation, then just add a counter to the loop.
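Adding the counter could look roughly like this. The temporary file and its contents are made up so the sketch runs on its own, and the search text 'k io lon lat' assumes single spaces between the column names, which may not match the real files.

```matlab
% Write a tiny stand-in file (made-up content; the real files have
% about 40 lines between the CENTERS line and the column header).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'CENTERS 480101 0000\nfiller line\nfiller line\n');
fprintf(fid, 'k io lon lat f c dp rd zs up vp lonv latv\n');
fprintf(fid, '1 11 63.14 50.20\n');
fclose(fid);

fid = fopen(fname, 'r');
nLines = 0;        % number of lines read so far
found = false;     % set once the header line is seen
while ~feof(fid)
    l = fgetl(fid);
    nLines = nLines + 1;
    if ~isempty(strfind(l, 'k io lon lat'))
        found = true;
        break
    end
end
fclose(fid);
delete(fname);     % remove the stand-in file

% nLines now holds the line number of the header line, which can be passed
% as the 'HeaderLines' value to textscan on a later read.
fprintf('header found: %d, at line %d\n', found, nLines);
```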