How can i regexp this sample text?
7 views (last 30 days)
Show older comments
im looking to grab all the data after "CHANNEL" into a cell array such that C{1,1} = Shell Angle and so on
and looking to grab all the data after "UNITS" into a separate cell array such that D{1:1} = deg and so on
Im having trouble with the non alphanumeric stuff such as @, (, ), etc
Any help greatly appreciated, Thanks!
0 Comments
Answers (2)
Stephen23
on 28 Mar 2019
Edited: Stephen23
on 28 Mar 2019
Here is a regexp-based solution which does NOT use hard-coded variable names (it automatically detects the variable names), and does NOT magically create variables in the MATLAB workspace... read this to know why dynamically naming variables is a bad way to write code:
The code is quite straighforward: the first regexp identifies the variable name and square bracket pairs, the second regexp identifies the list values themselves:
str = fileread('sample2.txt');
tkn = regexp(str,'(\w+)\s+=\s+\[([^\]]+)\]','tokens');
tkn = vertcat(tkn{:});
hdr = tkn(:,1);
tkn = regexp(tkn(:,2),'''([^'']+)''','tokens');
tkn = cellfun(@(c)[c{:}],tkn,'uni',0);
And checking:
>> hdr{2}
ans =
UNIT
>> tkn{2}
ans =
Columns 1 through 8
'deg' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)'
Columns 9 through 16
'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)'
Columns 17 through 20
'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)'
You can access the data directly from the cell arrays, or use them to define a convenient structure:
>> S = cell2struct(tkn,hdr,1);
>> S.UNIT
ans =
Columns 1 through 8
'deg' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)'
Columns 9 through 16
'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)'
Columns 17 through 20
'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)' 'lbf/(ft.s)'
>> S.CHANNEL
ans =
Columns 1 through 8
'Shell Angle' [1x30 char] [1x30 char] [1x30 char] [1x30 char] [1x30 char] [1x30 char] [1x30 char]
Columns 9 through 16
[1x30 char] [1x30 char] [1x31 char] [1x31 char] [1x31 char] [1x31 char] [1x31 char] [1x31 char]
Columns 17 through 20
[1x31 char] [1x31 char] [1x31 char] [1x31 char]
2 Comments
Guillaume
on 28 Mar 2019
Edited: Guillaume
on 28 Mar 2019
>> map = containers.Map(hdr, tkn);
>> map('UNIT')
ans =
1×20 cell array
Columns 1 through 3
{'deg'} {'lbf/(ft.s)'} {'lbf/(ft.s)'}
Columns 4 through 6
{'lbf/(ft.s)'} {'lbf/(ft.s)'} {'lbf/(ft.s)'}
Columns 7 through 9
{'lbf/(ft.s)'} {'lbf/(ft.s)'} {'lbf/(ft.s)'}
Columns 10 through 12
{'lbf/(ft.s)'} {'lbf/(ft.s)'} {'lbf/(ft.s)'}
Columns 13 through 15
{'lbf/(ft.s)'} {'lbf/(ft.s)'} {'lbf/(ft.s)'}
Columns 16 through 18
{'lbf/(ft.s)'} {'lbf/(ft.s)'} {'lbf/(ft.s)'}
Columns 19 through 20
{'lbf/(ft.s)'} {'lbf/(ft.s)'}
Whatever you do, do not use values coming out of a text file to name actual variables. Creating variables dynamically is a sure way to have bugs, you may well overwrite an existing variable if for some reason the input file is slightly different.
Adam Danz
on 21 Mar 2019
Edited: Adam Danz
on 28 Mar 2019
In this solution, the file is first read into matlab as a char array. Then I segregate the "CHANNEL" variable and convert it into Matlab syntax so it can be converted to a cell array. The same is then done with the "UNIT" variable.
file = 'C:\Users\name\Documents\MATLAB\sample2.txt';
c = fileread(file);
% segregate "Channel" array
chanTxt = regexp(c,'CHANNEL.+?\]', 'match');
% Replace ampersands with elipses
chanTxt = strrep(chanTxt, '&', '...');
% Replace square bracket with curly brackets
chanTxt = strrep(chanTxt, '[', '{');
chanTxt = strrep(chanTxt, ']', '}');
% Execute string to convert it to cell array of strings
eval(chanTxt{:}); % Now you have a variable named "CHANNEL"
% CHANNEL' % (first 5 lines)
% ans =
% 20×1 cell array
% {'Shell Angle' }
% {'Surface Heat Flux @ GridLine=1' }
% {'Surface Heat Flux @ GridLine=2' }
% {'Surface Heat Flux @ GridLine=3' }
% {'Surface Heat Flux @ GridLine=4' }
% {'Surface Heat Flux @ GridLine=5' }
% Now go through the same process for "UNIT"
unitTxt = regexp(c,'UNIT.+?\]', 'match');
unitTxt = strrep(unitTxt, '&', '...');
unitTxt = strrep(unitTxt, '[', '{');
unitTxt = strrep(unitTxt, ']', '}');
eval(unitTxt{:}); % Now you have a variable named "UNIT"
A great work station to develop regular expressions: https://regex101.com/
0 Comments
See Also
Categories
Find more on Multirate Signal Processing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!