Finding duplicate string values in two 22124x1 cell arrays

I have a 22124x1 cell array that contains duplicate values. I want to know how many times each value is duplicated and the indices of the duplicates.
The first cell array contains these values. Datacell =
'221853_s_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'222031_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'
'39854_r_at'
'53202_at'
'53202_at'
'60528_at'
'60528_at'
'90610_at'
'90610_at'
symbol cell:
'OR1D4 '
' OR1D5'
' HLA-DRB4 '
' HLA-DRB5 '
' LOC100133661 '
' LOC100294036'
'UTP14A '
' UTP14C'
'GTF2H2 '
'ZNF324B '
' LOC644504'
'JMJD7 '
'ZNF324B '
' JMJD7-PLA2G4B'
'OR2A20P '
' OR2A5 '
' OR2A9P'
'ZNF324B '
' ZNF584'
'WHAMM '
' WHAMML1 '
'LOC100290658 '
' WHAMML2'
'NR1D1 '
' THRA'
'C7orf25 '
' PRR5 '
' PRR5-ARHGAP8'
'LOC100290658 '
'C7orf25 '
' SAP25'
'HIP1R '
' LOC100294412'
Any help will be highly appreciated.

1 Comment

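I added two lines to return the duplicate names and their indices:
function [dupNames, dupNdxs] = getDuplicates(aList)
% Find duplicate entries in the list of names.
    [uniqueList, ~, uniqueNdx] = unique(aList);   % uniqueNdx maps each entry to its row in uniqueList
    N = histc(uniqueNdx, 1:numel(uniqueList));    % number of occurrences of each unique name
    dupNames = uniqueList(N > 1);                 % names that occur more than once
    dupNdxs = arrayfun(@(x) find(uniqueNdx == x), find(N > 1), ...
        'UniformOutput', false);                  % positions of each duplicate in aList
end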
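
To make the output concrete, here is a minimal sketch that runs the same steps inline on the first eight rows of Datacell from the question. The sample is only a slice of the real 22124x1 array, and the names dupCounts and the loop variable i are introduced here purely for illustration.

```matlab
% Sample rows copied from Datacell in the question (not the full array).
Datacell = {'221853_s_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; ...
            '221971_x_at'; '222031_at';   '222031_at';   '31637_s_at'};

% uniqueNdx(i) is the row of uniqueList that matches Datacell{i}.
[uniqueList, ~, uniqueNdx] = unique(Datacell);
uniqueNdx = uniqueNdx(:);                        % force a column vector

% Count the occurrences of each unique string.
N = histc(uniqueNdx, 1:numel(uniqueList));
N = N(:);                                        % force a column vector

dupNames  = uniqueList(N > 1);                   % strings that occur more than once
dupCounts = N(N > 1);                            % how often each of them occurs
dupNdxs   = arrayfun(@(x) find(uniqueNdx == x), find(N > 1), ...
                     'UniformOutput', false);    % rows of Datacell for each duplicate

% Report each duplicate with its count and row numbers.
for i = 1:numel(dupNames)
    fprintf('%s occurs %d times at rows %s\n', ...
            dupNames{i}, dupCounts(i), mat2str(dupNdxs{i}'));
end
```

For this sample, dupNames holds '221971_x_at' and '222031_at', and dupNdxs holds the row numbers 2 to 5 and 6 to 7 respectively.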


Answers (1)

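Let C be your cell array of strings. Then
[UniqueC,~,k] = unique(C)
N = histc(k,1:numel(UniqueC))
gives you the unique elements in UniqueC and their frequencies in N. The third output k maps each element of C to its position in UniqueC, so C is the same as UniqueC(k).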

2 Comments

Thanks, but unfortunately it does not give me their indices.
The code given by Chuck Olosky gives the duplicate string names and their indices:
...
dupNames = uniqueList(N>1); % Names
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1),'UniformOutput',false); % Indexes
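
The MathWorks documentation marks histc as not recommended in newer releases, so here is a hedged alternative sketch that gets the counts and the grouped indices with accumarray instead. It also applies strtrim first, on the assumption that the stray leading and trailing spaces in the symbol cell are not meant to distinguish entries. The variable names symbols, idx, isDup, dupCounts and dupIdx are hypothetical, and the sample is a handful of entries copied from the question.

```matlab
% Sample entries copied from the symbol cell in the question.
symbols = {'ZNF324B '; ' LOC644504'; 'JMJD7 '; 'ZNF324B '; ...
           ' JMJD7-PLA2G4B'; 'ZNF324B '};

% Assumption: surrounding whitespace is noise, so remove it before comparing.
C = strtrim(symbols);

% k(i) is the row of UniqueC that matches C{i}.
[UniqueC, ~, k] = unique(C);
k = k(:);                                  % force a column vector

% Frequency of each unique string.
N = accumarray(k, 1);

% Group the row numbers of C by unique string. The values are sorted because
% accumarray does not promise the order in which it passes them to the function.
idx = accumarray(k, (1:numel(k))', [], @(v) {sort(v)});

% Keep only the strings that occur more than once.
isDup     = N > 1;
dupNames  = UniqueC(isDup);
dupCounts = N(isDup);
dupIdx    = idx(isDup);
```

On this sample the only duplicate is 'ZNF324B', which occurs three times at rows 1, 4 and 6. Whether trimming is appropriate depends on how the symbol cell was produced, so drop the strtrim call if the spaces are significant.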


Asked:

on 26 Jan 2016
