Can REGEXP map values from different parts of a text file?

Question

0 votes

I have a text file with the following contents:

MSNout_BER (0:31) Observation #100 Rx'd at:  (58568.000) Msg. Time: (58568.000)
    Forward to IMU: true   Rcv Date: 2010121   Synch: f0f0   Rel Mode: Active
MSNout_SSS (0:32) Observation #101 Rx'd at:  (58569.000) Msg. Time: (58569.000)
    Forward to IRU: true   Rcv Date: 2010121   Synch: a0a0   Bel Mode: High
Type: 12    Malck ID: 12345 Time Tag: 58548.12345678
Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0
MSNout_BER (0:33) Observation #102 Rx'd at:  (58570.000) Msg. Time: (58570.000)
    Forward to IMU: true   Rcv Date: 2010121   Synch: f0f0   Rel Mode: Active
MSNout_SSS (0:34) Observation #103 Rx'd at:  (58571.000) Msg. Time: (58571.000)
    Forward to IRU: true   Rcv Date: 2010121   Synch: a0a0   Bel Mode: High
Type: 1 Malck ID: 12345 Time Tag: 58549.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58550.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58551.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58552.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58553.12345678
Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58554.12345678
Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58555.12345678
Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58556.12345678
Hand ID: 1  SV ID:   3  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0

I’m using the following commands to retrieve the values of Time Tag: and SV ID: (only SV ID values 1 and 2; all others are ignored):

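% Let the user pick the text file, then read it into one string.
[fn, pn] = uigetfile('*.txt', 'Select Text File');
OAMfilename = fullfile(pn, fn);
buffer  = fileread(OAMfilename);
% Token 1: the Time Tag value; token 2: the SV ID (1 or 2 only).
pattern = '*?Tag:\s+([\d\.]+).*?SV ID:\s+([12])\W';
tokens = regexp(buffer, pattern, 'tokens');
% Flatten the tokens and convert to an N-by-2 numeric array [time, SV ID].
data = reshape(str2double([tokens{:}]), 2, []).';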

Results:

1234567800  2
1234567800  2
1234567800  2
1234567800  2
1234567800  1
1234567800  1
1234567800  1

Initially, I thought the results were as expected. Then I noticed the time tag for the first occurrence of SV ID equal to 2 was wrong - 58549.12345678 is the proper time tag.

Is it possible to force MATLAB to recognize each Time Tag value that occurs just prior to each SV ID value? Could a Lookaround operator be used in this case?

Answer 1

per isakson on 7 Jun 2013

Edited: per isakson on 10 Jun 2013

1 vote

This seems to work.

    buf = fileread( 'cssm.txt' );
    rex = '(?<=Time Tag: )([\d\.]+).+?(?<=SV ID:[ ]+)(\d+)';
    cac = regexp( buf, rex, 'tokens' );
    cac{:}

returns

    ans = 
        '58548.12345678'    '51'
    ans = 
        '58549.12345678'    '2'
    ans = 
        '58550.12345678'    '2'
    ans = 
        '58551.12345678'    '2'
    ans = 
        '58552.12345678'    '2'
    ans = 
        '58553.12345678'    '1'
    ans = 
        '58554.12345678'    '1'
    ans = 
        '58555.12345678'    '1'
    ans = 
        '58556.12345678'    '3'

where cssm.txt contains your data

Comments on the regular expression:

The pattern captures two tokens.
Each token is the group of digits that follows an identifier and its trailing space.
The "identifier and space" parts are written as look-behind operators, so they must be present but are not part of the captured value.
This gives two groups of the form (?<= name)( value).
Between these two groups sits .+?, which is a lazy quantifier: it advances the current position by one character or more, but only as far as is necessary for the rest of the pattern to match.
A regular expression must match one contiguous sub-string, so something is needed to match the characters between the two groups. In this case that is done by .+?.

Most of the italic words are copy&paste from the on-line help.
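To make the effect of the lazy quantifier concrete, here is a minimal sketch on a made-up one-line string. The sample string and the variable names are assumptions for illustration only and are not taken from the data file. It runs the pattern from this answer once as written and once with a greedy .+ in place of .+?.

```matlab
% Made-up sample: two name-value pairs on one line (illustrative only).
str = 'Time Tag: 1.5 SV ID: 7 Time Tag: 2.5 SV ID: 8';

% Pattern from the answer: lazy .+? between the two look-behind groups.
lazyRex   = '(?<=Time Tag: )([\d\.]+).+?(?<=SV ID:[ ]+)(\d+)';
% Same pattern with a greedy .+ for comparison.
greedyRex = '(?<=Time Tag: )([\d\.]+).+(?<=SV ID:[ ]+)(\d+)';

lazyTok   = regexp(str, lazyRex,   'tokens');
greedyTok = regexp(str, greedyRex, 'tokens');

% Lazy: each time tag is paired with the nearest following SV ID.
for k = 1:numel(lazyTok)
    fprintf('lazy   %s  %s\n', lazyTok{k}{:});
end
% Greedy: the first time tag is paired with the last SV ID in the string.
for k = 1:numel(greedyTok)
    fprintf('greedy %s  %s\n', greedyTok{k}{:});
end
```

With the lazy form there should be two matches (1.5 with 7, 2.5 with 8); with the greedy form a single match that pairs 1.5 with 8.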

BTW: Your pattern works - after a little fixing:

rex = '*?Tag:\s+([\d\.]+).*?SV ID:\s+([125]{1,2})\W';

but what is the purpose of the leading *? and the trailing \W ?

A bit more robust:

rex = '(?<=Time Tag:)[ ]+([\d\.]+)[^\n]+?(?<=SV ID:)[ ]+(\d+)';

Replacing \s+ between name and value by [ ]+ excludes new-line, tab, etc.
Replacing .*? between the two name-value-pairs by |[^

9 Comments

Cedric on 13 Jun 2013

Edited: Cedric on 17 Jun 2013

Actually

'([\d\.]+)\s+Hand.+?SV ID:\s+(\d+)'

does match SV ID 51.

What was wrong with your initial pattern is that the first match is the whole:

 Tag: 58548.12345678
 Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0
 MSNout_BER (0:33) Observation #102 Rx'd at:  (58570.000) Msg. Time: (58570.000)
 Forward to IMU: true   Rcv Date: 2010121   Synch: f0f0   Rel Mode: Active
 MSNout_SSS (0:34) Observation #103 Rx'd at:  (58571.000) Msg. Time: (58571.000)
 Forward to IRU: true   Rcv Date: 2010121   Synch: a0a0   Bel Mode: High
 Type: 1 Malck ID: 12345 Time Tag: 58549.12345678
 Hand ID: 1  SV ID:   2

(which gives time=58548.12345678 and SVID=2)
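The mispairing can be reproduced on a shortened sample. This is only a sketch: the two records below are trimmed copies of lines from the question, the variable names are made up, and the pattern is the original one without its leading *?. It relies on MATLAB's default behaviour that . also matches newline characters.

```matlab
% Two shortened records modelled on the file in the question (illustrative).
buf = sprintf([ ...
    'Type: 12    Malck ID: 12345 Time Tag: 58548.12345678\n', ...
    'Hand ID: 0  SV ID:   51 Spam ID: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678\n', ...
    'Hand ID: 1  SV ID:   2  Spam ID: 0\n']);

% Original pattern, minus the leading *? (an assumption for this sketch).
pat = 'Tag:\s+([\d\.]+).*?SV ID:\s+([12])\W';
tok = regexp(buf, pat, 'tokens');

% The lazy .*? cannot stop at "SV ID:   51" because [12] does not match
% the 5, so it keeps expanding until it reaches the SV ID of the next record.
fprintf('%d match(es)\n', numel(tok));
fprintf('time = %s, SV ID = %s\n', tok{1}{:});
```

The single match pairs the time tag of the first record with the SV ID of the second one, which is the wrong row seen in the question.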

If you want to select only those with SV IDs 1 and 2, you can use

'([\d\.]+)\s+Hand[^B]+?SV ID:\s+([12])'

which works because there is no 'B' between the time tag and the SV ID (it appears only after the SV ID, in 'BOZ'). You could also use an expression that prevents another 'Time Tag' from appearing between the initial time tag and the SV ID, or limit the number of characters between the time tag and the SV ID (i.e. replace .+? with .{1,45}), but I think that [^B] is simpler. Of course, you could just stick to the expression which matches all entries and then filter out those with SV IDs not in {1,2} after conversion to numeric.
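Two of these alternatives can be sketched as follows. The sample records are trimmed copies of lines from the question, and the variable names are made up. The first pattern, which uses a negative lookahead to stop the gap from running over another 'Time Tag:', is one possible way to write such an expression and is an assumption, not a pattern given in this thread. The second approach uses the match-all expression above and filters afterwards.

```matlab
% Four shortened records modelled on the file in the question (illustrative).
buf = sprintf([ ...
    'Type: 12    Malck ID: 12345 Time Tag: 58548.12345678\n', ...
    'Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678\n', ...
    'Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58553.12345678\n', ...
    'Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58556.12345678\n', ...
    'Hand ID: 1  SV ID:   3  Spam ID: 0  BOZ/FAS: 1\n']);

% Alternative 1 (assumed pattern): every character in the gap is checked
% with (?!Time Tag:), so a match can never run into the next record.
% (?!\d) rejects multi-digit IDs such as 12 or 21.
patA = 'Time Tag:\s+([\d\.]+)(?:(?!Time Tag:).)*?SV ID:\s+([12])(?!\d)';
tokA = regexp(buf, patA, 'tokens');
dataA = reshape(str2double([tokA{:}]), 2, []).';

% Alternative 2: match every record, convert, then keep SV IDs 1 and 2.
patB = '([\d\.]+)\s+Hand.+?SV ID:\s+(\d+)';
tokB = regexp(buf, patB, 'tokens');
dataB = reshape(str2double([tokB{:}]), 2, []).';
dataB = dataB(ismember(dataB(:, 2), [1 2]), :);

% Both approaches should leave the same two rows: [time, SV ID].
fprintf('%.8f  %d\n', dataB.');
```

Filtering after conversion keeps the regular expression simple and makes it easy to change the set of wanted SV IDs later. The lookahead version does the selection inside the pattern at the cost of readability.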

Brad on 17 Jun 2013

Per, Cedric, after re-installing MATLAB I'm getting the proper results. I tried both approaches provided by the 2 of you and they run like a champ. Thanks for the help on this.
