What would be the best approach to solve this data mapping problem?

Question

Brad on 15 Jul 2013

https://ch.mathworks.com/matlabcentral/answers/82040-what-would-be-the-best-approach-to-solve-this-data-mapping-problem

Accepted Answer: Cedric

I’ve got a large text file whose content looks like this:

MSN_BER (0:31) Observation #1 Rx'd at: (58570.500) Msg. Time: (58568.000)

Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode

State Time: 12:00:00.000 (58570.500)

State Position: -1111.1111, -2222.2222, -3333.3333

MSN_RAM (0:32) Observation #100 Rx'd at: (58568.000) Msg. Time: (58568.000)

Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode

Fmt: 10 (AIRBORN__ARRAY_LOT) Length: 1234 Remote Num: 1 Number of Observations: 00

Type: 1 Track ID: 12345 Time Tag: 58567.00000000

Band ID: 1 AC ID: 1 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -10000.12345678 Y: 2000.123456789 Z: 30000.12345678

Performance: 1.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58568.00000000

Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -40000.12345678 Y: 5000.123456789 Z: 60000.12345678

Performance: 11.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58569.00000000

Band ID: 1 AC ID: 14 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -70000.12345678 Y: 8000.123456789 Z: 90000.12345678

Performance: 11.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58570.00000000

Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -10000.12345678 Y: 4000.123456789 Z: 30000.12345678

Performance: 8.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

MSN_BER (0:31) Observation #2 Rx'd at: (58590.000) Msg. Time: (58568.000)

Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode

State Time: 12:00:00.000 (58582.500)

State Position: -4444.4444, -5555.5555, -6666.6666

MSN_RAM (0:32) Observation #100 Rx'd at: (58569.000) Msg. Time: (58569.000)

Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode

Fmt: 10 (AIRBORN__ARRAY_LOT) Length: 5678 Remote Num: 1 Number of Observations: 01

Type: 1 Track ID: 12345 Time Tag: 58581.00000000

Band ID: 1 AC ID: 1 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -11000.12345678 Y: 4100.123456789 Z: 31000.12345678

Performance: 1.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58582.00000000

Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -21000.12345678 Y: 4200.123456789 Z: 32000.12345678

Performance: 4.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58585.00000000

Band ID: 1 AC ID: 6 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -31000.12345678 Y: 4300.123456789 Z: 33000.12345678

Performance: 7.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58586.00000000

Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -41000.12345678 Y: 4400.123456789 Z: 34000.12345678

Performance: 21.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

Type: 1 Track ID: 12345 Time Tag: 58588.00000000

Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0

Aircraft POS X: -51000.12345678 Y: 4500.123456789 Z: 35000.12345678

Performance: 20.12345 Hydro Pressures: 0.0000000 Compression: 0.000000

For processing and plotting, I’m looking for a way to do the following:

Create an n x 11 double array that includes the following parameters:

a. The state time (both hh:mm:ss.000 and UTC) and the three state position values from each _BER message (5 parameters).

b. The Time Tag (UTC), the AC ID value, the 3 platform position values (X, Y, Z), and the performance value of each entry with AC ID = 1 or 2 (6 parameters).

Currently, I’m using two separate REGEXP calls to extract the _BER parameters and the AC ID parameters. They are:

 % Parse out the BER State Times and Position Values
 exp = 'State Time:\s+([\d:\.]+).\s+\(([\d.]+)\).*?State Position:\s+([-?\d\.]+),\s+([-?\d\.]+),\s+([-?\d\.]+)';
 tokens = regexp(buffer, exp, 'tokens');
 BER_State_Data = reshape(str2double([tokens{:}]), 5, []).';
 % Parse out the AC ID values equal only to 1 or 2, their Time Tags, and
 % the 3 platform position values (x,y,z) and performance values.
 exp = '([\d\.]+)\s+Band[^A]+?AC ID:\s+([12]{1})\W.*?Aircraft POS X:\s+([-?\d\.]+).\s+Y:\s+([-?\d\.]+).\s+Z:\s+([-?\d\.]+).*?ance:\s+([\d\.e+-]+).';
 tokens = regexp(buffer, exp, 'tokens');
 AC12_data = reshape(str2double([tokens{:}]),6,[]).';

These 2 sets of commands yield:

 BER_State_Data =
    NaN 58570.5000000000 -1111.11110000000 -2222.22220000000 -3333.33330000000
    NaN 58582.5000000000 -4444.44440000000 -5555.55550000000 -6666.66660000000

 AC12_data =
    58567 1 -10000.1234567800 2000.12345678900 30000.1234567800 1.12345000000000
    58568 2 -40000.1234567800 5000.12345678900 60000.1234567800 11.1234500000000
    58570 2 -10000.1234567800 4000.12345678900 30000.1234567800 8.12345000000000
    58581 1 -11000.1234567800 4100.12345678900 31000.1234567800 1.12345000000000
    58582 2 -21000.1234567800 4200.12345678900 32000.1234567800 4.12345000000000
    58586 2 -41000.1234567800 4400.12345678900 34000.1234567800 21.1234500000000
    58588 2 -51000.1234567800 4500.12345678900 35000.1234567800 20.1234500000000
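Both calls rely on the same idiom. With 'tokens', REGEXP returns one cell per match, each holding that match's captured strings; [tokens{:}] flattens them into a single row, and reshape(..., nFields, []).' turns that row back into one row per match. A minimal sketch on a made-up string (not the real file format) shows the mechanics, and also why the first column of BER_State_Data is NaN:

```matlab
% Made-up input: two records of two numeric fields each
txt = 'a=1,b=2; a=3,b=4';
% One cell per match, each a 1x2 cell of captured strings
tokens = regexp(txt, 'a=(\d+),b=(\d+)', 'tokens');
% Flatten all matches into one 1x4 cell: {'1','2','3','4'}
flat = [tokens{:}];
% Convert to numbers, fill column-wise (one column per match), then transpose
data = reshape(str2double(flat), 2, []).'   % [1 2; 3 4]
% STR2DOUBLE cannot parse a clock time, hence the NaN column
t = str2double('12:00:00.000')               % NaN
```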

However, I need an n x 11 array that looks like this:

 NaN 58570.5 -1111.11 -2222.22 -3333.33 58567 1 -10000.1 2000.123 30000.12 1.12345
 NaN 58570.5 -1111.11 -2222.22 -3333.33 58568 2 -40000.1 5000.123 60000.12 11.12345
 NaN 58570.5 -1111.11 -2222.22 -3333.33 58570 2 -10000.1 4000.123 30000.12 8.12345
 NaN 58582.5 -4444.44 -5555.56 -6666.67 58581 1 -11000.1 4100.123 31000.12 1.12345
 NaN 58582.5 -4444.44 -5555.56 -6666.67 58582 2 -21000.1 4200.123 32000.12 4.12345
 NaN 58582.5 -4444.44 -5555.56 -6666.67 58586 2 -41000.1 4400.123 34000.12 21.12345
 NaN 58582.5 -4444.44 -5555.56 -6666.67 58588 2 -51000.1 4500.123 35000.12 20.12345

That is, the first set of state time and position data is mapped to each applicable AC ID entry that follows it, the second set to the AC ID entries that follow it, and so on until the end of the text file.
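In other words, each block contributes one row per AC ID 1/2 entry, with the block's five state values repeated in front of it. A minimal sketch of that expansion for the first block, assuming the values are already parsed (they are typed in by hand here from the sample data):

```matlab
% State row of block 1: [StateTime(NaN) StateTimeUTC X Y Z]
ber = [NaN 58570.5 -1111.1111 -2222.2222 -3333.3333];
% AC ID 1/2 rows that follow it: [TimeTag ACID X Y Z Performance]
ac = [58567 1 -10000.12345678 2000.123456789 30000.12345678  1.12345
      58568 2 -40000.12345678 5000.123456789 60000.12345678 11.12345
      58570 2 -10000.12345678 4000.123456789 30000.12345678  8.12345];
% Copy the single state row once per AC row and put it in front
rows = [repmat(ber, size(ac, 1), 1), ac];
size(rows)   % 3 x 11
```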

NOTES: The NaNs come from the hh:mm:ss.000 state time in _BER, which STR2DOUBLE cannot convert to a number; they are not a problem. The AC12 data messages always follow the BER state data message, but the number of AC12 data messages can vary from 1 to many.

I’m not exactly sure how to approach the problem given the use of REGEXP. Can I use another MATLAB command (along with REGEXP) to map the applicable BER state message data to the AC12 data messages? Or should I just write the BER state message data into the array directly?

Any ideas would be appreciated. Thank you.


Answer 1

Cedric on 15 Jul 2013

https://ch.mathworks.com/matlabcentral/answers/82040-what-would-be-the-best-approach-to-solve-this-data-mapping-problem#answer_91759

Edited: Cedric on 15 Jul 2013

It is not trivial, because the two REGEXP calls return two independent series of data with no information for relating one to the other. I see two options (without thinking too much):

1. Instead of calling REGEXP twice on the whole buffer, first call it to split the buffer into blocks at each 'MSN_BER'. You can then loop over these blocks and extract, from each block, the data to be mapped together. E.g. (not tested):

EDIT: splitting with REGEXP is simpler than my first proposal:

 bufferSplit = regexp(buffer, 'MSN_BER', 'split') ;
 for bId = 1 : length(bufferSplit)
    if isempty(bufferSplit{bId}),  continue ;  end
    % Here, your code based on two REGEXP using bufferSplit{bId} 
    % instead of buffer.
 end

This way you know that, at each iteration of the loop, BER_State_Data and AC12_data belong to the same block.
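To see why the split keeps related messages together, here is a toy buffer (hypothetical content, only mimicking the message order of the real file) in which the MSN_RAM messages of each block are counted separately, using the same loop skeleton:

```matlab
% Toy buffer: each MSN_BER is followed by its own MSN_RAM messages
buffer = sprintf('MSN_BER s1\nMSN_RAM a\nMSN_RAM b\nMSN_BER s2\nMSN_RAM c\n');
bufferSplit = regexp(buffer, 'MSN_BER', 'split');
nRam = [];   % number of MSN_RAM messages found in each block
for bId = 1 : length(bufferSplit)
    % The piece before the first MSN_BER is '' here, so skip it
    if isempty(bufferSplit{bId}),  continue ;  end
    nRam(end+1) = numel(strfind(bufferSplit{bId}, 'MSN_RAM'));
end
disp(nRam)   % 2 1
```

Each count only sees the messages between one MSN_BER and the next, which is exactly the property the mapping needs.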

First proposal (I leave it for the record):

 startPos = regexp(buffer, 'MSN_BER', 'start') ;
 nBlocks = length(startPos) ;
 for bId = 1 : nBlocks
    if bId < nBlocks
        miniBuffer = buffer(startPos(bId):startPos(bId+1)-1) ;
    else
        miniBuffer = buffer(startPos(bId):end) ;
    end
    % Here, your code based on two REGEXP using miniBuffer instead of buffer.
 end

2. If you can count on the fact (?) that the 'Time Tag:' values of the entries belonging to the same block as a given BER entry are <= the 'Rx'd at:' (or State Time) value of that BER entry, and greater than that of the previous BER entry, then you can build the join directly from what you already have, using the 2nd column of BER_State_Data and the 1st column of AC12_data. E.g. (not tested):

 for berId = 1 : size(BER_State_Data, 1)
    if berId == 1,  prev = 0 ;  else prev = BER_State_Data(berId-1,2) ;  end
    ac12Ids = AC12_data(:,1)>prev & AC12_data(:,1)<=BER_State_Data(berId,2) ;
    % Here you build whatever you want with
    %   BER_State_Data(berId,:)   and    AC12_data(ac12Ids,:)
 end
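Whether this assumption holds depends on which BER field is used. In the sample data the second MSN_BER message has 'Rx'd at:' 58590.000 but State Time 58582.500, and two of its AC entries (Time Tags 58586 and 58588) are later than that State Time. The sketch below is a vectorized variant of the loop above (an alternative, not part of the original answer; the helper name firstBerAtOrAfter is hypothetical). It computes, for each Time Tag, the BER message it would be assigned to under each choice of field, with times copied from the sample:

```matlab
rxdAt     = [58570.5; 58590.0];   % Rx'd at of the two MSN_BER messages
stateTime = [58570.5; 58582.5];   % State Time (UTC), column 2 of BER_State_Data
timeTag   = [58567; 58568; 58570; 58581; 58582; 58586; 58588];  % AC12_data(:,1)

% Count the BER times strictly below each time tag t; adding 1 gives the
% first BER whose time is >= t (BER times must be sorted ascending).
% A result of numel(berTimes)+1 means no BER message at or after t.
firstBerAtOrAfter = @(berTimes, t) sum(bsxfun(@lt, berTimes.', t), 2) + 1;

idxRx    = firstBerAtOrAfter(rxdAt, timeTag);
idxState = firstBerAtOrAfter(stateTime, timeTag);
disp([timeTag idxRx idxState])
```

With 'Rx'd at:' all seven entries land in the block the question expects; with State Time the last two get index 3, i.e. no BER message, and would be dropped. Given a valid index vector idx, the n x 11 array would then be [BER_State_Data(idx,:), AC12_data], after discarding rows where idx exceeds size(BER_State_Data, 1). Note that BER_State_Data as parsed only holds the State Time, so using 'Rx'd at:' would require capturing that field as well.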

=========================================================

PS: if you are the Brad who asked earlier about calling various functions based on a "per column" function ID, here is one example:

 f{1} = @sin ;
 f{2} = @(x) x.^(1/2) ;
 f{3} = @(x) -x ;
 M = magic(8)
 c = [1, 1, 1, 2, 2, 3, 3, 3] ;
 fM = arrayfun(@(cId) f{c(cId)}(M(:,cId)), 1:length(c), 'UniformOutput', false);
 cell2mat(fM)
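The same per-column dispatch can also be written as a plain loop over columns, which some find easier to read than ARRAYFUN with a cell output. A sketch reusing the f, M and c from the example above, checking that both forms agree:

```matlab
f{1} = @sin ;
f{2} = @(x) x.^(1/2) ;
f{3} = @(x) -x ;
M = magic(8) ;
c = [1, 1, 1, 2, 2, 3, 3, 3] ;   % function ID for each column of M
% Loop version: apply f{c(k)} to column k of a preallocated result
fMloop = zeros(size(M)) ;
for k = 1 : numel(c)
    fMloop(:,k) = f{c(k)}(M(:,k)) ;
end
% ARRAYFUN/CELL2MAT version from the example above
fM = cell2mat(arrayfun(@(cId) f{c(cId)}(M(:,cId)), 1:length(c), 'UniformOutput', false)) ;
isequal(fMloop, fM)   % returns true
```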

3 Comments

Cedric on 15 Jul 2013

Edited: Cedric on 15 Jul 2013

For #1, this is why I start the loop with

 if isempty(bufferSplit{bId}),  continue ;  end

REGEXP with 'split' returns what lies on both sides of each split, which means '' when there is nothing there. You can see this below:

 >> regexp('ABA', 'A', 'split')
 ans = 
    ''    'B'    ''
 >> regexp('BAB', 'A', 'split')
 ans = 
    'B'    'B'
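An alternative to the isempty test inside the loop is to remove the empty pieces once, right after the split, so the loop body only ever sees real blocks. A short sketch with CELLFUN:

```matlab
% Drop the empty pieces up front instead of skipping them in the loop
pieces = regexp('ABA', 'A', 'split') ;          % {'', 'B', ''}
pieces = pieces(~cellfun('isempty', pieces)) ;   % {'B'}
```

Applied to bufferSplit, the loop can then run over 1 : numel(bufferSplit) without the continue line.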

For #2 and #3: yes, my example only shows how to create a context that allows building the mapping; I left the mapping itself to you.

What you have to do is something like this:

 buffer = fileread('Brad3.txt') ;
 bufferSplit = regexp(buffer, 'MSN_BER', 'split') ;
 nBlocks = length(bufferSplit) ;   % one cell per block, not per character
 output  = cell(nBlocks, 1) ;
 % Your two patterns, renamed so that they do not shadow the built-in EXP.
 berPattern = 'State Time:\s+([\d:\.]+).\s+\(([\d.]+)\).*?State Position:\s+([-?\d\.]+),\s+([-?\d\.]+),\s+([-?\d\.]+)';
 acPattern  = '([\d\.]+)\s+Band[^A]+?AC ID:\s+([12]{1})\W.*?Aircraft POS X:\s+([-?\d\.]+).\s+Y:\s+([-?\d\.]+).\s+Z:\s+([-?\d\.]+).*?ance:\s+([\d\.e+-]+).';
 for bId = 1 : nBlocks
    if isempty(bufferSplit{bId}),  continue ;  end
    tokens = regexp(bufferSplit{bId}, berPattern, 'tokens');
    if isempty(tokens),  continue ;  end   % no state record in this piece
    BER_State_Data = reshape(str2double([tokens{:}]), 5, []).';
    tokens = regexp(bufferSplit{bId}, acPattern, 'tokens');
    if isempty(tokens),  continue ;  end   % no AC ID 1 or 2 entry in this block
    AC12_data = reshape(str2double([tokens{:}]),6,[]).';
    % Repeat the state row once per AC12 row and prepend it.
    output{bId} = [repmat(BER_State_Data, size(AC12_data, 1), 1), AC12_data] ;
 end
 output = vertcat(output{:}) ;   % empty cells (skipped blocks) are ignored

Brad on 19 Jul 2013

Cedric, it took some time to eliminate the bugs, but this approach works great.

Thanks again!
