Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)
Matthew Worker
on 13 Oct 2013
Commented: Cedric
on 13 Oct 2013
I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
I want to obtain:
idx = [2, 3, 6];
I have written a very long piece of code that:
1. finds the elements with length not greater than length(str);
2. removes the elements with characters not included in str;
3. finally, for each remaining element, checks the characters one by one.
Essentially, it is an almost brute-force approach and it runs very slowly. I wonder if there is a simple way to do this fast.
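For reference, the three steps listed above could look roughly like the loop below. This is only a guess at the structure of such code, not the code the question refers to; the variable names pool, w and ok are made up for illustration.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};

idx = [];
for k = 1:numel(dic)
    w = dic{k};
    % Step 1: skip elements that are longer than str
    if numel(w) > numel(str), continue; end
    % Step 2: skip elements containing a character that is not in str
    if ~all(ismember(w, str)), continue; end
    % Step 3: check the characters one by one, removing each matched
    % character from a working copy of str so it cannot be reused
    pool = str;
    ok = true;
    for c = w
        p = find(pool == c, 1);
        if isempty(p)
            ok = false;
            break;
        end
        pool(p) = [];
    end
    if ok
        idx(end+1) = k; %#ok<AGROW>
    end
end
disp(idx)
```

The inner loop does a linear search through pool for every character of every element, which is the kind of work the answers in this thread replace with a single vectorised comparison per element.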
Accepted Answer
Mohsen Nosratinia
on 13 Oct 2013
You can sort the characters of each string and then match them against a regular expression built from str. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
u = unique(str);
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
s = cellfun(@sort, dic, 'uni', 0);
idx = find(~cellfun('isempty', regexp(s, t)));
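To see why the sorting matters, it can help to look at the intermediate values. The sketch below repeats the same idea with two small substitutions that are my own, not part of the answer: the counts are computed with arrayfun instead of histc, and u is converted with double so that the matrix passed to sprintf is explicitly numeric. It assumes str contains only letters or digits, so that no character needs escaping in the pattern.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};

u = unique(str);                        % 'actz' (sorted, no duplicates)
n = arrayfun(@(c) sum(str == c), u);    % [2 1 1 1], occurrences of each character
% sprintf walks the 2-by-4 matrix column by column: code, count, code, count, ...
t = ['^' sprintf('%c{0,%d}', [double(u); n]) '$'];
disp(t)                                 % ^a{0,2}c{0,1}t{0,1}z{0,1}$

% The pattern lists the characters in sorted order, so each element
% has to be sorted the same way before matching ('zt' becomes 'tz')
s = cellfun(@sort, dic, 'UniformOutput', false);
idx = find(~cellfun('isempty', regexp(s, t)));
disp(idx)                               % 2 3 6
```

Because each quantifier has the form {0,n}, an element may use a character fewer times than str does, but never more often.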
More Answers (3)
Cedric
on 13 Oct 2013
Edited: Cedric
on 13 Oct 2013
Here is another solution, for the fun of it ..
>> spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
>> str_spec = spectrum(str) ;
Then
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6
>> dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz', 'aaaz'} ;
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6 8
where we see that 'aaz' was taken into account but not 'aaaz', as it has more a's than str.
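As a small illustration of what spectrum computes: it is a per-character count stored in a 58-by-1 vector, where row k corresponds to the character with code k+64 ('A' is row 1, 'z' is row 58). The names present, letters, counts and excess below are only for this sketch.

```matlab
str = 'actaz';
spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]);
str_spec = spectrum(str);

present = find(str_spec).';                      % rows with a nonzero count
letters = char(present + 64);                    % 'actz'
counts  = reshape(str_spec(present), 1, []);     % [2 1 1 1]
disp(letters)
disp(counts)

% 'aaaz' is rejected because one row exceeds the count available in str
excess = find(spectrum('aaaz') > str_spec);      % row 33
disp(char(excess + 64))                          % a
```

An element passes the test only when every row of its spectrum is less than or equal to the corresponding row of str_spec, which is exactly the "no more repetitions than in str" condition.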
2 Comments
N/A
on 13 Oct 2013
Cedric, when I try your code it returns only 2.
When I use my real cell of strings, it returns this error:
??? Error using ==> accumarray
First input SUBS must contain positive integer subscripts.
Error in ==> @(s)accumarray(s.'-64,ones(size(s)),[58,1])
Error in ==> @(s)all(spectrum(s)<=str_spec)
I am trying to understand why this is happening.
Cedric
on 13 Oct 2013
Edited: Cedric
on 13 Oct 2013
A copy/paste with the following returns only 2?
str = 'actaz' ;
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'} ;
spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
str_spec = spectrum(str) ;
find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
It shouldn't.
In your real cell of strings, do you have special characters, numbers or spaces? I wrote this solution assuming that there would be only letters (lower or upper case). If there are spaces and digits, please use the updated version of spectrum:
spectrum = @(s) accumarray(s.'-31, ones(size(s)), [91,1]) ;
If there can be any special character, just use
spectrum = @(s) accumarray(s.'-0, ones(size(s)), [256,1]) ;
PS: s.'-0 could be replaced by double(s).', but I wanted to keep the previous structure for it not to be even more confusing ;-)
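If it is not obvious which elements trigger the accumarray error, a quick check along the following lines may help. It flags the elements containing a character outside the range 'A' to 'z', which is the range the first version of spectrum can index. The example data and the name outOfRange are hypothetical.

```matlab
% Hypothetical data: the second element has a space, the third a digit
dic = {'ac', 'z t', 'a1', 'Zac'};

% Characters with codes below 65 ('A') or above 122 ('z') would give
% subscripts outside 1..58 in the first version of spectrum
outOfRange = cellfun(@(s) any(s < 'A' | s > 'z'), dic);
disp(find(outOfRange))     % 2 3

% For example, the digit '1' has code 49, so the subscript is negative
disp(double('a1') - 64)    % 33 -15
```

Any element flagged here produces a non-positive or too-large subscript, which is what the "SUBS must contain positive integer subscripts" message points to.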
Jos (10584)
on 13 Oct 2013
Take a look at my function MATCHROW:
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
indices = find(cellfun(@(x) matchrow(str,x),dic))
The warning it issues when numel(str) > numel(dic{k}) can be ignored. The function can be found here: http://www.mathworks.com/matlabcentral/fileexchange/14520
Azzi Abdelmalek
on 13 Oct 2013
Edited: Azzi Abdelmalek
on 13 Oct 2013
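str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
% Keep only the elements that are no longer than str
idx1=find(cellfun(@numel,dic)<=numel(str))
dic1=dic(idx1)
% Of those, keep the elements with no repeated character
idx2=cellfun(@(x) numel(unique(x))==numel(x),dic1)
idx3=idx1(idx2)
dic3=dic1(idx2)
% Of those, keep the elements whose characters all occur in str
idx4= cellfun(@(x) all(ismember(x,str)),dic3)
indices=idx3(idx4)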
2 Comments
N/A
on 13 Oct 2013
Dear Azzi,
your solution is amazing and very elegant, although apparently it is also a bit slow (but faster than the code that I am currently using).
There is only one problem. Suppose that
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
the solution should be
indices = [2 3 6 8];
because 'a' appears twice in str. I realize I should have made clear that a character may appear at most n times in an element of dic if it appears n times in str, i.e. no character may be repeated more often than it is repeated in str.
Do you think your code can be adjusted to this condition?
Anyway, thank you a lot for your prompt answer!
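One possible way to adapt the ismember-based approach to this condition is to replace the "no repeated character" test with a per-character count comparison. The sketch below is an assumption about how that could be done, not a reply from the thread; the helper name fits is made up.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};

% An element fits if each of its distinct characters occurs no more
% often than in str. A character missing from str has count 0 there,
% so it fails the comparison without a separate ismember test.
fits = @(x) all(arrayfun(@(c) sum(x == c) <= sum(str == c), unique(x)));

indices = find(cellfun(fits, dic));
disp(indices)     % 2 3 6 8
```

This keeps the structure of a single cellfun call over dic, but it is still a nested loop in disguise, so it is unlikely to be faster than the count-vector or regular-expression approaches on large inputs.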