How can I sort my data from regexp?

Linus Dock on 14 Oct 2016
Edited: Guillaume on 14 Oct 2016
Hi, I have a problem when using regexp with this command:
RVRtmp=regexp(TXTmod,'R\d\d\w\/\w*\d\d\d\D\>','match')
The output cell is mostly empty and looks like this:
[]
[]
[]
[]
[]
[]
[]
<1x4 cell>
<1x4 cell>
<1x4 cell>
<1x4 cell>
<1x4 cell>
[]
I would like to obtain the information in the <1x4 cell> entries. The information inside each of those cells looks like this:
'R01L/P1500N' 'R19R/0900VP1500N' 'R01R/0800V1400D' 'R19L/1000N'
Here I would like to obtain 'R01L' as a variable name or string, and the corresponding value '1500' as an element of a vector or cell. I'm having trouble extracting the data because my command does not work on the empty cells:
RVR1=regexp(RVRtmp{1072}{1},'\d{4}','match')
I would like to arrange the data like this:
R01L =
NaN
NaN
1500
2000
1000
500
700
NaN
TXTmod looks like this:
'METAR ESSA 200901220720Z 03003KT 1500 R01L/P1500N R19R/P1500N R01R/0700N R19L/0800V1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220750Z 04003KT 020V090 1500 R01L/P1500N R19R/P1500N R01R/0800V1000N R19L/0900N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220820Z 02003KT 320V100 1000 R01L/P1500N R19R/0900VP1500N R01R/0800V1400D R19L/1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220850Z 06004KT 0900 R01L/P1500N R19R/1100V1500U R01R/1000V1400N R19L/1200N FZFG VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 0700'
'METAR ESSA 200901220920Z 04003KT 360V060 1000 R01L/P1500N R19R/1200U R01R/0700N R19L/1000VP1500N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 1500'
'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'
'METAR ESSA 200901221020Z 01003KT 1700 BR BKN002 BKN017 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'
'METAR ESSA 200901221050Z 35004KT 2500 BKN002 BKN019 00/00 Q1004 01710173 08710164 51710170 NOSIG'

Accepted Answer

Guillaume on 14 Oct 2016
Edited: Guillaume on 14 Oct 2016
There is no real need for the intermediate regexp; you can get it all with just one regular expression:
tokens = regexp(TXTmod, '(R\d\d\w)/\w*(\d\d\d\d)\D\>', 'tokens'); %You were missing a \d in your regexp (which was captured by the \w* so it didn't matter)
Or, more efficiently (but a bit longer):
tokens = regexp(TXTmod, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens')
Note the inefficiency in your original expression: the \w*\d\d\d is going to cause a lot of backtracking by the regular expression engine, because \w also matches digits. Since * is greedy, \w* at first consumes every word character after the slash, digits included, and the engine then finds that nothing is left for the three \d. So it backtracks one character, tries again, and fails again. It has to keep giving back one character at a time until \w* holds only the leading characters, and the three \d and the \D can match the final three digits and the trailing letter.
The new regular expression skips an optional prefix (an optional group of 4 digits followed by one or more letters) and then captures the final group of 4 digits before the last letter. I've also added a start-of-word anchor: \<.
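As a quick check of the description above, the following minimal sketch runs the longer expression against the four token shapes shown in the question. The variable names samples, expr and tok are only illustrative, and the 'once' option is used here so that each sample returns a single runway/value pair.

```matlab
% The four runway visual range tokens quoted in the question
samples = {'R01L/P1500N', 'R19R/0900VP1500N', 'R01R/0800V1400D', 'R19L/1000N'};

% Runway designator, optional prefix (optional 4 digits + letters), final 4 digits
expr = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';

% With 'once', each element of tok is a 1x2 cell: {runway, value}
tok = regexp(samples, expr, 'tokens', 'once');

for k = 1:numel(samples)
    fprintf('%-18s -> %s, %s\n', samples{k}, tok{k}{1}, tok{k}{2});
end
```

For a token with a range such as 'R19R/0900VP1500N', the captured value is the last group of four digits (1500), not the first one.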
Another note: to rearrange the tokens of each string into a two-column cell array:
cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false)
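To get the layout asked for in the question (one numeric column per runway, with NaN where the runway is not reported), one possible approach is a simple loop over the tokens. This is only a sketch: the two input strings are shortened versions of the sample reports, and the names runway and values are chosen for illustration.

```matlab
% Two shortened reports: one with runway visual range groups, one without
TXTmod = {
    'METAR ESSA 200901220720Z 03003KT 1500 R01L/P1500N R19R/P1500N R01R/0700N R19L/0800V1000N BR VV002'
    'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 NOSIG'
    };

expr = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';
tokens = regexp(TXTmod, expr, 'tokens');   % one cell per report, empty if no match

runway = 'R01L';                  % runway to extract (illustrative choice)
values = NaN(numel(tokens), 1);   % stays NaN where the runway is not reported

for k = 1:numel(tokens)
    for j = 1:numel(tokens{k})
        if strcmp(tokens{k}{j}{1}, runway)
            values(k) = str2double(tokens{k}{j}{2});
        end
    end
end

disp(values)
```

Because values is preallocated with NaN, reports with no match need no special handling: the inner loop simply does not run for them.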
