Using unique function on cell array

Hello,
This follows up on a previous post about combining two cells: http://www.mathworks.com/matlabcentral/answers/263633-combining-to-two-cells
The reason I did this is that I couldn't use the unique function unless the input was a cell array of strings, so I converted that one column. However, I can't apply this to the whole table, and I want to use X = unique(X,'stable'). In the attached picture, this would remove the second of the two highlighted rows. The unique function doesn't work because the cells contain a mix of data types.
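A minimal reproduction of that failure, assuming a hypothetical column that mixes character vectors and a number (the values are made up, loosely based on this thread). unique accepts a cell array only when every element is a character vector, so the call errors; the exact message text depends on the MATLAB release.

```matlab
% Hypothetical column: character vectors mixed with a number
c = {'13,14'; 123; '123'};

try
    unique(c, 'stable');   % cell input must contain only character vectors
    failed = false;
catch err
    failed = true;
    disp(err.message)      % message text varies between releases
end
```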
ALTERNATIVE:
Focus on three columns: one that mixes strings and numbers, one that is numeric, and one of dates (the first three columns in the sample Excel sheet).
For the string/number column I want to use something similar to num2str(). Following my previous question, I would probably create another column using a for loop:
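A = cell(size(p, 1), 1);               % preallocate one entry per row
for n = 1:size(p, 1)                   % size(p,1) counts rows; length(p) may not
    A{n} = [num2str(p{n,1}), k{n,1}];  % number as text, followed by the string
end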
and then for the numeric column I would use dpb's suggestion of cell2mat. For dates, unique works fine.
Putting them together, I would then find the unique row indices.
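One way to put the columns together is to extend the loop idea to all three columns: turn each column into text, join each row into a single key, and call unique(...,'stable') on the keys. This is a sketch under stated assumptions: the column names and values are hypothetical, the dates are assumed to be stored as text, and '|' is assumed never to occur in the data. Because num2str returns character input unchanged, the number 123 and the string '123' end up with the same key, which matches the Excel-style behaviour asked about here.

```matlab
% Hypothetical sample columns (values loosely based on the thread)
p  = {'13,14'; '13,14'; 123; '123'};                            % strings and numbers mixed
n  = {10700; 0; 200; 200};                                      % numbers only
dt = {'2016-01-13'; '2016-01-13'; '2016-01-14'; '2016-01-14'};  % dates stored as text

% Convert each column to text; num2str returns character input unchanged.
% Note: num2str rounds non-integers to about four decimal places by default.
txt1 = cellfun(@num2str, p, 'UniformOutput', false);
txt2 = cellfun(@num2str, n, 'UniformOutput', false);

% One key per row; '|' is assumed never to appear in the data
keys = strcat(txt1, '|', txt2, '|', dt);

[~, ia] = unique(keys, 'stable');   % first occurrence of each distinct row
```

The rows to keep are then selected with the same indices, e.g. A = A(ia,:) for the full cell array.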
Thanks, Stephan

9 Comments

Your attachment didn't make it. Make sure you confirm the file selection.
Is there a reason why column 1 contains strings? Are there any values in there that aren't numeric? If not, just convert column 1 to doubles or singles and use a double/single array for the whole lot.
Otherwise I guess you will have to convert the numeric elements of your cell array to strings in order to use the 'unique' function.
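A minimal sketch of that conversion, assuming a hypothetical mixed column: only the numeric cells are converted with num2str, after which the column is a cell array of character vectors and unique accepts it.

```matlab
% Hypothetical mixed column: character vectors and numbers
c = {'13,14'; 123; '123'; 'ND'};

% Convert only the numeric cells to character vectors
isNum = cellfun(@isnumeric, c);
c(isNum) = cellfun(@num2str, c(isNum), 'UniformOutput', false);

% c is now a cell array of character vectors, so unique accepts it
[u, ia] = unique(c, 'stable');   % ia = row where each distinct value first appears
```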
Outline the goal again more succinctly, and what the actual data are, before you go munging around willy-nilly changing numerics to string representations, etc. With the data only as an image, we can't do anything with it. And why are there multiple values in some cells and not others? Is that real, or a fignewton of the conversion to character instead of numeric? If those are some sort of compound component ID or somesuch, then indeed you can't use a numeric representation and unique, as you've got what would be arrays versus single values to do the comparison over.
OTOH, if the idea is to remove the rows that have duplicate values in the first column, then simply use the alternate return from unique -- see the Answer for that solution.
There's nothing keeping you from using unique on numeric data, although there's always the issue of floating-point comparison for non-integer values.
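A small illustration of that floating-point caveat, with made-up values: 0.1 + 0.2 is not exactly equal to 0.3 in double precision, so unique keeps both. Rounding to a fixed number of decimal places before comparing is one workaround; the scale of 1e10 (ten decimal places) is an arbitrary assumption to tune for the data. uniquetol (R2015a and later) is another option that compares within a tolerance directly.

```matlab
x = [0.1 + 0.2; 0.3; 1];   % made-up values; the first two differ only by rounding error

exact = unique(x, 'stable');                            % keeps both 0.1+0.2 and 0.3
scale = 1e10;                                           % assumed tolerance: 10 decimal places
rounded = unique(round(x * scale) / scale, 'stable');   % treats them as equal
```

Rounding is not a true tolerance test: two values that straddle a rounding boundary can still end up different.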
jgg on 13 Jan 2016
Edited: jgg on 13 Jan 2016
I'm unclear why unique does not work here:
Name = {'Fred';'Betty';'Betty';'Bob';'George';'Jane'};
[C,ia,ic] = unique(Name,'stable');
Name(ia)   % no semicolon, so the result is displayed
yields
'Fred'
'Betty'
'Bob'
'George'
'Jane'
as desired.
As mentioned below, I'm looking for a more generic unique function, like Excel's Remove Duplicates, which doesn't differentiate between the different cell types.
I will upload a data file, but it will not cover all possibilities (except for dates).
jgg - in your example you are only looking at strings, not a mixture of data types.
Stephen23 on 14 Jan 2016
Edited: Stephen23 on 14 Jan 2016
It looks like poor data design is making things more complicated too.
In Excel everything is stuck in one table... but MATLAB is not Excel. Despite this, many beginners put numeric (or mixed) data into cell arrays, without realizing that they should keep data in the simplest array possible to minimize processing complications: numeric data in numeric arrays, and strings in cell arrays (or char arrays).
If you search this forum you will find lots of beginners attempting to manipulate numeric values inside cell arrays. The usual solution is to extract the values from the cell array into a numeric array and then perform the desired operation. The best solution is not to put them in cell arrays in the first place.
Perhaps the data structure should be revised to reflect the data types that it contains, and the flow of the algorithm.
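One way to revise such a column, as a sketch: move a mixed cell column into a plain double array, parsing numeric text with str2double and leaving non-numeric text codes (such as 'ND') as NaN, with a logical mask recording which entries were text. The sample values are hypothetical. Note that str2double treats commas as thousands separators, so comma-separated IDs like '13,14' should stay as text rather than go through this conversion.

```matlab
% Hypothetical mixed column: numbers, numeric text, and a text code
c = {1; 2; 'ND'; '3'};

isNum = cellfun(@(v) isnumeric(v) && isscalar(v), c);
x = NaN(size(c));                    % numeric column; NaN where no number is available
x(isNum)  = cell2mat(c(isNum));      % numbers copied as-is
x(~isNum) = str2double(c(~isNum));   % numeric text such as '3' is parsed; 'ND' gives NaN

isText = ~isNum & isnan(x);          % remember which entries were non-numeric text
```

Keep in mind that unique treats every NaN as distinct, so rows flagged by isText may need separate handling.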
I would like to simplify the data! But I think the data are input by various sources and cannot be standardized. Many of the columns are a mixture of numbers and strings, in the format 1, 2, ... x, 'ND'.
I edited the question: given your input, it may be easier to focus on the data types I know. It's not the exact solution, but given the inconsistency in the data it is probably the easiest to implement in MATLAB.


Answers (1)

>> ccc % a sample cell array similar to shown...
ccc =
'13,14' [10700]
'13,14' [ 0]
'123' [ 200]
'123' [ 200]
>> [~,ia]=unique(cell2mat(ccc(:,2)),'stable') % get the unique indices from the 2nd column
ia =
1
2
3
>> ccc(ia,:) % show the result
ans =
'13,14' [10700]
'13,14' [ 0]
'123' [ 200]
>>
To pare the table, simply reassign:
ccc=ccc(ia,:);

4 Comments

Thanks, dpb. I assume you are suggesting fixing column 2, which has the issue. However, I have shown only a sample of the data I have to manipulate. It has another 50+ columns whose cell contents could vary. Is there a more dynamic way of doing what you did above? I guess a for loop, although that may take a huge amount of time.
Well, as described in my earlier comment, we need to know the precise data structure, and what's real versus what's a figment of your having converted from the original form to try to make something work rather than the actual data form.
Specifically, again, what is the deal with the first column as shown, and what is the real underlying problem to be solved? Is it the redundant values in the one column above, the duplicate ID in the first column (in which case, judging from the data shown, the first two rows would seem to be an issue as well), or what, precisely? We can't solve a problem that isn't formulated.
Also, attach a short section of an actual data file, not the image. It doesn't have to be large in either dimension, but it must represent the various constraints and conditions that are to be handled. Posting the desired result along with it is always a plus.
Well, the data structure may vary. It may be a string or a cell, i.e. there are no restrictions on the input unless it is a date.
I can upload a data file, but the data would not cover all possibilities. I just want to simply remove duplicates as you can in Excel, which doesn't differentiate between the different cell types (or maybe it does its own conversion).
I believe that, internally, Excel does that operation by comparing each column individually behind the scenes and then combining those logical results. If your data really are as ill-formed as you say, and you can't (or won't???) clean them up while importing to make them more manageable, then I'd posit the above is the only option you've left yourself.
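A sketch of that column-by-column approach for any number of columns, assuming each cell holds a scalar number, a character vector, or is empty. All-numeric columns are compared numerically, every other column is compared as text via num2str, each column is reduced to group numbers with unique, and the rows of group numbers are then deduplicated with unique(...,'rows','stable'). The sample data are hypothetical. It makes one unique call per column, so looping over 50+ columns should not be a bottleneck.

```matlab
% Hypothetical sample: column 1 mixes text and numbers, column 2 is numeric
data = {'13,14', 10700
        '13,14', 0
        123,     200
        '123',   200};

[nRows, nCols] = size(data);
groups = zeros(nRows, nCols);   % group number of each cell within its column

for j = 1:nCols
    col = data(:, j);
    if all(cellfun(@(v) isnumeric(v) && isscalar(v), col))
        vals = cell2mat(col);                                   % numeric column: compare as numbers
    else
        vals = cellfun(@num2str, col, 'UniformOutput', false);  % mixed column: compare as text
    end
    [~, ~, groups(:, j)] = unique(vals, 'stable');              % equal values share a group number
end

% Rows whose group numbers match in every column are duplicates
[~, ia] = unique(groups, 'rows', 'stable');
data = data(ia, :);
```

As with unique on numeric data, NaN values never match each other here, so rows containing NaN are kept.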


Asked: 13 Jan 2016
Commented: dpb on 14 Jan 2016
