Reading specific data from formatted txt files - looks very difficult

Question


Hi,

I have 15 ASCII files. The file names are 1948_1950, 1950_1955, 1955_1960, 1960_1965, ..., 2010_2014 (all files except the first and the last have a 5-year span in the name). The 15th file is Kos.txt, which contains only dates and hours from 1948 to 2010 (but not all dates in that period). I've attached the 1948_1950.txt and Kos.txt files.

When you open the files with years in their names, you'll see a date and time next to the word 'CENTRES:'. So, the first file has 480101 0000 indicating date Jan 01, 1949 (date format is 'yymmdd') at hour 00. About 40 lines below that is the following line:

"k io lon lat f c dp rd zs up vp lonv latv".

I need the data below that line; in this case it would be:

"1 11 63.14 50.20 etc."

If you go through the file you'll see that pattern: the date and time are next to 'CENTRES:', and about 40 lines below are the data that I need (always below the line that starts with "k io lon lat ...").
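A minimal sketch of the per-block extraction described above, assuming one block's text is already in memory as a char array. The sample block and its numbers are hypothetical (a real block has about 40 more lines between the date line and the column header), and it assumes every data row has 13 numeric values, one per header name.

```matlab
% Hypothetical sample block; real blocks have ~40 more lines before the header
blk = sprintf(['CENTRES:  490101 1200\n', ...
               '(other lines omitted)\n', ...
               ' k io lon lat f c dp rd zs up vp lonv latv\n', ...
               ' 1  0 59.6 43.6 1037.9 -0.2 7.2 9.3 161.4 0.6 0.4 56.0 48.3\n', ...
               ' 2  1 61.0 45.1 1010.2  0.3 5.1 8.0 150.0 0.5 0.2 57.0 47.9\n']);

% Date ('yymmdd', possibly missing leading zeros) and time ('HHMM') after CENTRES:
tok = regexp(blk, 'CENTRES:\s*(\d+)\s+(\d{4})', 'tokens', 'once');
date_str = tok{1};
time_str = tok{2};

% Index of the last character of the column-header line
colhead = 'k\s+io\s+lon\s+lat\s+f\s+c\s+dp\s+rd\s+zs\s+up\s+vp\s+lonv\s+latv';
hdr_end = regexp(blk, colhead, 'end', 'once');

% Read every number after the header, then reshape to one row per data line
vals = sscanf(blk(hdr_end+1:end), '%f');
ncol = 13;                                  % assumed: one value per header name
assert(mod(numel(vals), ncol) == 0, 'Unexpected number of values');
data = reshape(vals, ncol, []).';
```

sscanf stops at the first non-numeric text, so in a real file the block should end where the data rows end (for example by splitting the file at each 'CENTRES:' first).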

However, there are three additional problems:

1. I need this information only for the dates and hours specified in the Kos.txt file.
2. The date formats in Kos.txt and in the files with years in their names are not the same.
3. The date format in the files with years in their names is 'yymmdd', but for the years 2000-2010 the leading zeros are missing, so Jan 1st, 2000 appears as 101 instead of 000101.
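One way to handle both date problems is to convert every stamp to a single key format before comparing. This is a minimal sketch, assuming (as in the accepted answer's code) that the data-file stamps are 'yymmdd HHMM' with only the leading zeros of the date missing, that Kos.txt uses 'ddmmyyyyHH', and that two-digit years 40-99 mean 19xx while 00-39 mean 20xx. The sample stamps are hypothetical; the 'yyyymmddTHHMM' key format is the one the answer uses.

```matlab
% Hypothetical stamps as they might appear after 'CENTRES:' in the data files
met_stamps = {'490101 1200', '101 0000', '91231 1800'};
met_keys = cell(size(met_stamps));
for k = 1:numel(met_stamps)
    s = strtrim(met_stamps{k});
    s = [repmat('0', 1, 11 - numel(s)), s];   % restore dropped leading zeros
    yy = str2double(s(1:2));
    year = 1900 + yy + 100 * (yy < 40);       % 40-99 -> 19xx, 00-39 -> 20xx
    met_keys{k} = sprintf('%04d%s%sT%s', year, s(3:4), s(5:6), s(8:11));
end

% Hypothetical Kos.txt stamps in the assumed 'ddmmyyyyHH' format
kos_stamps = {'0101194912', '0101200000'};
kos_keys = cell(size(kos_stamps));
for k = 1:numel(kos_stamps)
    s = kos_stamps{k};
    kos_keys{k} = sprintf('%s%s%sT%s00', s(5:8), s(3:4), s(1:2), s(9:10));
end

% Stamps present in both sources, now directly comparable as strings
common = intersect(met_keys, kos_keys);
```

Doing the conversion by hand keeps the zero padding and the century rule explicit; datevec with a pivot year, as in the accepted answer, achieves the same for the data files.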

I know that this might be very challenging, but I would very much appreciate help.

Thanks in advance, Djordje

4 Comments

per isakson on 3 Aug 2014

Edited: per isakson on 3 Aug 2014

Did you specify in what form you want the result, and the date format used in Kos.txt?

dpb on 3 Aug 2014


As I showed you before, the easiest approach will likely be to read the whole file into memory and then select the entries you want (or eliminate the unwanted ones).

The rest is, as IA said, more or less trivial grunt work: counting lines, creating format strings and using textscan and/or other I/O functions.
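A minimal sketch of the read-everything-then-select approach: the whole file is held in one char array, split into blocks at 'CENTRES:' (the marker the accepted answer splits on), and only the blocks whose stamp is in a wanted list are kept. The file content and the wanted stamps are hypothetical; in practice str would come from fileread and the wanted stamps from Kos.txt, converted to the same format.

```matlab
% Stand-in for str = fileread('1948_1950.txt'); the content is hypothetical
str = sprintf(['header text\n', ...
               'CENTRES: 490101 0000\ndata A\n', ...
               'CENTRES: 490101 0600\ndata B\n', ...
               'CENTRES: 490101 1200\ndata C\n']);

% One block per 'CENTRES:'; the text before the first one is not a block
blocks = regexp(str, 'CENTRES:', 'split');
blocks(1) = [];

% Date/time stamp at the start of each block, e.g. '490101 0000'
stamps = cellfun(@(b) char(regexp(b, '^\s*(\d+\s+\d{4})', 'tokens', 'once')), ...
                 blocks, 'UniformOutput', false);

% Keep the wanted blocks and eliminate the rest
wanted = {'490101 0000', '490101 1200'};
keep = ismember(stamps, wanted);
selected = blocks(keep);
```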

I don't see a piece of the puzzle that hasn't been addressed in one of the previous postings, other than perhaps finding the given line. That's pretty much either

a) use a fixed headerlines count if the offset is fixed or

b) read line-by-line until you find the string. That is indeed pretty simple...

while ~feof(fid)
  l=fgetl(fid);
  % strfind returns [] when the pattern is absent; isempty makes the test explicit
  if ~isempty(strfind(l,'a unique pattern in the target string')), break, end
end

If you need to count the lines up to the target line on the first pass, so that you can later use headerlines for multiple sections that are a fixed (but initially unknown) distance apart, just add a counter to the loop.
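A minimal sketch of the loop with a counter added. It writes a small hypothetical file to a temporary location so it runs on its own, and it searches for the substring 'k io lon lat', assuming that text appears with single spaces in the real files (the accepted answer's regular expression allows any whitespace, so adjust if needed). n_before counts the lines before the header line, so skipping n_before + 1 lines gets past the header itself.

```matlab
% Write a small hypothetical file so the sketch runs on its own
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, ['CENTRES: 490101 0000\n', 'line a\n', 'line b\n', ...
              ' k io lon lat f c dp rd zs up vp lonv latv\n', ...
              ' 1 0 59.6 43.6\n']);
fclose(fid);

% Read line by line, counting the lines that come before the target line
fid = fopen(fname, 'r');
n_before = 0;
found = false;
while ~feof(fid)
    l = fgetl(fid);
    if ~isempty(strfind(l, 'k io lon lat'))
        found = true;
        break
    end
    n_before = n_before + 1;
end
fclose(fid);
delete(fname);

header_lines = n_before + 1;   % lines to skip to get past the column header
```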


Accepted Answer

per isakson on 3 Aug 2014

Edited: per isakson on 3 Aug 2014


I disagree, it's not that simple. Ok, it depends.

I've chosen to divide the task into two steps:

1. Read the data file and put the required data into a containers.Map object. The object may be saved to a MAT-file, more data can be added to it later, and there are methods with which one may inspect the data interactively.
2. Loop over the "keys" of the key file and print the result to the screen. It's a demo after all.
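A minimal sketch of the two steps with made-up values: step 1 stores each block's numeric data in a containers.Map under a 'yyyymmddTHHMM' key (the key format used in the code below), and step 2 loops over the wanted keys and uses isKey to look each one up. The rows and the wanted keys are hypothetical.

```matlab
% Step 1: store each block's data under a normalised date/time key
lib = containers.Map('KeyType', 'char', 'ValueType', 'any');
lib('19490101T1200') = [1 0 59.6 43.6; 2 1 61.0 45.1];   % hypothetical rows
lib('19490101T1800') = zeros(0, 4);                      % block without data rows

% Step 2: loop over the wanted keys (e.g. converted from Kos.txt)
wanted = {'19490101T0000', '19490101T1200', '19490101T1800'};
found_keys = {};
for kk = 1:numel(wanted)
    key = wanted{kk};
    if isKey(lib, key) && ~isempty(lib(key))
        found_keys{end+1} = key;  %#ok<AGROW>
        fprintf('Key: %s, rows: %d\n', key, size(lib(key), 1));
    end
end
```

Because all the data sit in one object, the map can be stored with save('lib.mat', 'lib') and extended later by assigning new keys.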

Questions on performance and memory usage are postponed.

Error handling and more remain to be done, e.g. testing and documentation.


Demo:

>> specific_data
Key: 19490101T0000,  Data: 
Key: 19490101T0600,  Data: 
Key: 19490101T1200,  Data: 
   1.0e+03 *
  Columns 1 through 9
    0.0010         0    0.0596    0.0436    1.0379   -0.0002    0.0072    0.0093    0.1614
  Columns 10 through 13
    0.0006    0.0004    0.0560    0.0483
Key: 19490101T1800,  Data: 
Key: 19490102T0000,  Data: 
....

where

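    function    specific_data
        % Demo: look up the data for every date/time listed in Kos.txt
        key_filespec = 'h:\m\cssm\Kos.txt';
        met_filespec = 'h:\m\cssm\1948_1950.txt';
        % Map from 'yyyymmddTHHMM' keys to the numeric data of each block
        lib = containers.Map( 'KeyType', 'char', 'ValueType', 'any' );
        lib = met2lib( met_filespec, lib );
        % Read the wanted date/time stamps from the key file
        fid = fopen( key_filespec );
        cac = textscan( fid, '%s' );
        fclose(fid);
        for kk = 1 : length( cac{1} )
            % Kos.txt stamps are read as 'ddmmyyyyHH'; convert to the map's key format
            key = datestr( datevec( cac{1}(kk), 'ddmmyyyyHH' ) ...
                         , 'yyyymmddTHHMM' );
            if not(isrow( key ))
                keyboard    % debugging stop: conversion did not give a single key
            end
            fprintf( '\nKey: %s,  Data: \n', key )
            if isKey( lib, key )
                disp( lib( key ) )
            end
            pause(0.1)      % slow down the screen output for the demo
        end
    end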

and

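    function    lib = met2lib( filespec, lib )
        % Read the whole data file and split it into one block per 'CENTRES:'
        str = fileread( filespec );
        cac = strtrim( strsplit( str, 'CENTRES:' ) );
        cac(1) = [];    % the text before the first 'CENTRES:' is not a block
        for bb = 1 : length( cac )
            block_str    = cac{bb};
            % The block starts with 'yymmdd HHMM'; restore leading zeros that
            % are missing for 2000-2010 by right-aligning in an 11-char template
            datetime_str = repmat( '0', 1, 11 );
            str = strtrim( block_str(1:12) );
            datetime_str( end-length(str)+1 : end ) = str;
            % Pivot year 1940: yy = 40..99 -> 19yy, 00..39 -> 20yy
            timekey = datestr( datevec(datetime_str,'yymmdd HHMM',1940)...
                             , 'yyyymmddTHHMM' );
            % Everything after the column-header line is the numeric data
            colhead_xpr ...
            = 'k\s+io\s+lon\s+lat\s+f\s+c\s+dp\s+rd\s+zs\s+up\s+vp\s+lonv\s+latv\s+';
            str = regexp( block_str, ['(?<=',colhead_xpr,').+$'], 'match' );
            if not( isempty( str ) )
                num_val  = str2num( str{:} );   % one row per line of data
            else
                num_val  = [];                  % block without data rows
            end
            lib( timekey ) = num_val;
        end
    end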

5 Comments

per isakson on 5 Aug 2014

Edited: per isakson on 5 Aug 2014

"But it gives me a specification, like [3x13 double]": This comment indicates that you badly need to do some getting-started exercises with the Matlab Desktop before you start experimenting with deeply nested cell arrays.
"I figured out": The MathWorks forbids me to use the acronym RTFM. Even after 20+ years with Matlab I read the online help all the time.
"is there a way to make it automatic for all files": Yes, my code is the start of something automatic. But since you did not indicate how you will use the data, I just dumped it on the screen.
"very sophisticated": I tried to structure the code somewhat and I use regular expressions. Structured programming was regarded as sophisticated in the late seventies. My use of regular expressions might be sophisticated in the Matlab world.
My lib is way better than your eval.

/ not so humble

djr on 5 Aug 2014

Sorry if you are offended. As I said before, I just started using Matlab and I have to do this asap. I know that most of my questions are maybe even stupid but I have like 2 weeks to finish this and 2 weeks of Matlab experience so far.

Thanks... P.S. It's way better...
