How to take first Character if name starts with specified string

22 views (last 30 days)
Hi,
I have below cell array,
STA.TH01E2RT00
STA.J01E2RT00R99
STA.TB01E2RT00
Deafult.TY04KT00
Cond.TH09K12E3T00R15
STA.TH01E2RT00
PAR_STA.TH01E2RT00
I want to extract the first English character if the name starts with "STA.", If the first Character is "T", then take next character.
My desired outPut:
H
J
B
Deafult.TY04KT00
Cond.TH09K12E3T00R15
H
PAR_STA.TH01E2RT00
  2 Comments
John D'Errico
John D'Errico on 27 Apr 2018
Time to learn tools like strtok. Test for the substring 'STA' using isequal. Its just a couple of tests, really. Write a function that will extract the desired result on any string. Then apply to each cell of the array. So cellfun would suffice.
Why not try it?
Wick
Wick on 1 May 2018
Edited: Wick on 1 May 2018
You can also use sscanf in a nested manner. Something like:
Y = sscanf(string_to_check,'STA.%s')
Y will be empty if the string doesn't start with STA.
if Y(1) is 'T' then make another sscanf for the rest of the string and grab the first letter after that.

Sign in to comment.

Answers (3)

Stephen23
Stephen23 on 2 May 2018
Edited: Stephen23 on 2 May 2018
One solution:
C = {...
'STA.TH01E2RT00'
'STA.J01E2RT00R99'
'STA.TB01E2RT00'
'Deafult.TY04KT00'
'Cond.TH09K12E3T00R15'
'STA.TH01E2RT00'
'PAR_STA.TH01E2RT00'}
rgx = '(?<=^STA\.)T?\w';
D = regexp(C,rgx,'match','once');
Z = C;
X = cellfun('isempty',D);
Z(~X) = cellfun(@(s){s(end)},D(~X))
Giving:
>> Z{:}
ans = H
ans = J
ans = B
ans = Deafult.TY04KT00
ans = Cond.TH09K12E3T00R15
ans = H
ans = PAR_STA.TH01E2RT00
There might be a way to get rid of the second cellfun, with the right regular expression.
  2 Comments
Bohan Liu
Bohan Liu on 3 May 2018
Hey dude,I have a question regarding the option 'once'. It is pretty cool that with 'once' specified additionally, the output is cell array of character vectors, yet all matched are found. In my case(without 'once'), however, the output is nested cell array. So in essence the 'once' is not affecting the number of matches but the data type of the output? I am wondering why.
Stephen23
Stephen23 on 4 May 2018
Edited: Stephen23 on 4 May 2018
"So in essence the 'once' is not affecting the number of matches but the data type of the output? I am wondering why."
Actually the option 'once' does change how many times the regular expression is matched, exactly as the regexp documentation describes: "Match the expression as many times as possible (default), or only once.". You can test this quite easily:
>> C = regexp('ab12cd34ed','\d+','match','once')
C = 12
>> C = regexp('ab12cd34ed','\d+','match');
>> C{:}
ans = 12
ans = 34
The default option is 'all': because multiple matches are possible (but not necessary) with this option all outputs are nested into cell arrays. The exact nesting of the output also depends on whether the input was a string array, a char vector or a cell array, so to know what to expect you should read the documentation.

Sign in to comment.


Bohan Liu
Bohan Liu on 2 May 2018
Hi,using regexp function with lookaround assertions is a feasible walkaround for loops. Funnily enough, there is a glitch in the exp, as the program should look behind any char vector that starts with STA. either with one T appended or none at all(specified by ?). I am kind of baffled why the result includes T as well.
exp='(?<=^STA\.T?)\w';
CellArrayNew=regexp(CellArray,exp,'match');
  2 Comments
Wick
Wick on 2 May 2018
I think the OP wanted the entire string if there was a failed match. As slick as it is to extract all the strings in one shot, I'm not sure your bit of code would do that.
Bohan Liu
Bohan Liu on 2 May 2018
Yep, right. I will try to work on this and update the answer if I come up with something new. Thanks for the info.

Sign in to comment.


Jan
Jan on 2 May 2018
While regexp is very powerful, strncmp is faster usually.
C = {'STA.TH01E2RT00'
'STA.J01E2RT00R99'
'STA.TB01E2RT00'
'Deafult.TY04KT00'
'Cond.TH09K12E3T00R15'
'STA.TH01E2RT00'
'PAR_STA.TH01E2RT00'};
index = strncmp(C, 'STA.T', 5);
C(index) = cellfun(@(c) c(6), C(index), 'Uniformoutput', 0);
index = strncmp(C, 'STA.', 4);
C(index) = cellfun(@(c) c(5), C(index), 'Uniformoutput', 0)
  2 Comments
Mekala balaji
Mekala balaji on 4 May 2018
Sir,
It works, but its not sure 5th position or 4th position index = strncmp(C, 'STA.T', 5); index = strncmp(C, 'STA.', 4); only suirity it starts with "STA.", take the first two alphabets after STA., and if the first one is T, then take the 2nd alphabet, else the the first one
Jan
Jan on 4 May 2018
I don't understand this comment. My code does:
  1. If the string starts with 'STA.' and a following 'T', the 5th character is chosen. In otehr words: When it starts with 'STA.T'.
  2. If it starts with 'STA.' otherwise, the 4th character is chosen.
This replies the example output from your original question. Do you want something else?

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!