Regexp expression to handle changing format

%dummy data
% t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
% t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501
S=fileread(filename);
myexpression = ['(?<tvar>w*,'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\w*\.*\w*),'...
'(?<HNL>\w*),'...
'(?<codeTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*,'... % <== This line handles the first line of dummy data
'(?<caprTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*\s*\d*,'... % <== This line handles the first line of dummy data
'(?<logAt>\w*\.*\w*']
parts = regexp(filtered,myexpression,'names')
The third-to-last and second-to-last variables (codeTm, caprTm) change format within the data. How can I modify the expression, or add logic, to accept 2 to 3 space-separated values within the variable "codeTm" and 3 to 4 space-separated values within the variable "caprTm"?
2 space-separated values: (000 00:00:00.00)
3 space-separated values: (000 00 00:00:00.00) or (343 19:54:20.684 8)
4 space-separated values: (21 343 19:54:20.684 8)
Thank you for the help. My apologies for making my expression so complicated. I am still learning the ins and outs of expression formats for regexp to read data.
  2 Comments
Stephen23 on 7 Mar 2022
It is not clear why you are using regular expressions to import this data: READTABLE et al. have options for handling missing field data. Have you considered using the inbuilt data importing functions?
jimmy zubiate on 9 Mar 2022
I am in the process of learning MATLAB. I pursued the regexp function to create a structure array in which I could maneuver through the values to perform the analysis I need.
What I think I should pursue is prepping the file to remove unwanted white space, headers, and other non-useful data, then importing it as a comma/space-delimited file. Then I can count the items inside each variable, which are marked by spaces, and move on to the next step.
The other option is to use the fgetl function and implement logic to read the useful data gracefully. I'm attaching dummy test data for your viewing. Thanks.


Answers (1)

Stephen23 on 7 Mar 2022
Edited: Stephen23 on 7 Mar 2022
You can easily make a group optional or occur a specific number of times using any suitable quantifier, for example:
(..)? % zero or one time
(..)* % zero or more times
(..){2,4} % two to four times
etc.
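As a rough sketch of the quantifier approach applied to the two variable-width fields from the question: the patterns below use the `{1,2}` quantifier on a non-capturing "digits plus space" group. The names `codeTmRgx`, `caprTmRgx`, `isCodeTm` and `isCaprTm` are made up for this illustration, the sample strings are copied from the dummy data, and the patterns assume every space-separated value consists of digits only.

```matlab
% Sketch only: quantifier-based patterns for the two variable-width fields.
% Assumption: each space-separated value is made of digits (as in the dummy data).
codeTmRgx = '^(?:\d+ ){1,2}\d+:\d+:\d+\.\d+$';      % 2 or 3 space-separated values
caprTmRgx = '^(?:\d+ ){1,2}\d+:\d+:\d+\.\d+ \d+$';  % 3 or 4 space-separated values

% Sample strings copied from the question's dummy data
codeTm = {'000 00:00:00.00', '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8', '21 343 19:54:20.684 8'};

% With 'once', regexp returns an empty char for any string that does not match
isCodeTm = ~cellfun(@isempty, regexp(codeTm, codeTmRgx, 'match', 'once'));
isCaprTm = ~cellfun(@isempty, regexp(caprTm, caprTmRgx, 'match', 'once'));
```

Both logical vectors should come out all true for these samples. Inside a larger expression the `^` and `$` anchors would be dropped and each pattern wrapped in its named group.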
However, rather than trying to match specific groups of characters I would use a simpler approach of matching sets of characters. I had to fix several other bugs in your regular expression to get this working, mostly missing backslashes and parentheses.
str = fileread('test.txt')
str =
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
     t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501'
rgx = ['^\s*(?<tvar>\w*),'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\d*\.?\d*),'...
'(?<HNL>\w*),'...
'(?<codeTm>[ :\w\.]*),'...
'(?<caprTm>[ :\w\.]*),'...
'(?<logAt>\d*\.?\d*)'];
parts = regexp(str,rgx,'names','lineanchors')
parts = 1×2 struct array with fields:
    tvar
    tmCodeRdr
    tmCodLvl
    HNL
    codeTm
    caprTm
    logAt
parts.codeTm
ans = '000 00:00:00.00'
ans = '000 00 00:00:00.00'
But personally I would not try to reinvent the wheel for such a data file; READTABLE is much simpler:
tbl = readtable('test.txt','delimiter',',');
tbl.Properties.VariableNames = {'tvar','tmCodeRdr','tmCodLv','HNL','codeTm','caprTm','logAt'}
tbl = 2×7 table
    tvar             tmCodeRdr             tmCodLv     HNL             codeTm                     caprTm               logAt 
    _____    _________________________    _______    _____    ______________________    _________________________    ______
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00:00:00.00'   }    {'343 19:54:20.684 8'   }    22.501
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00 00:00:00.00'}    {'21 343 19:54:20.684 8'}    22.501
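Once the fields are imported as text, the number of space-separated values in each one (the 2-to-3 and 3-to-4 cases from the question) can be counted by splitting on whitespace. A minimal sketch, with the values from the table copied into cell arrays so that it runs without the file; `countParts`, `nCode` and `nCapr` are names invented for this example:

```matlab
% Values copied from the codeTm and caprTm columns of the table
codeTm = {'000 00:00:00.00'; '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8'; '21 343 19:54:20.684 8'};

% strsplit without a delimiter argument splits at whitespace;
% strtrim avoids empty pieces caused by leading or trailing blanks
countParts = @(c) cellfun(@(s) numel(strsplit(strtrim(s))), c);

nCode = countParts(codeTm);   % [2; 3]
nCapr = countParts(caprTm);   % [3; 4]
```

With the imported table, the same helper could presumably be applied directly to the columns, e.g. `countParts(tbl.codeTm)`, since those columns are cell arrays of character vectors.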
  1 Comment
jimmy zubiate on 9 Mar 2022
That should work. Let me try to implement on my side and see what I get. Thanks Stephen!
