Arrange Matrix - Same Values of a Row to one Column
Show older comments
Hello, I have a huge database that needs to be re-arranged:
System_Values ; System_Name
1 ; ABC1
3 ; ABC2
8 ; ABC1
7 ; ABC3
0 ; ABC1
5 ; ABC3
8 ; ABC2
4 ; ABC3
7 ; ABC2
6 ; ABC2
So as you can see there are three different system names ABC1..3. ABC1 for example has several values in the first column: 1,8 and 0. I want matlab to find the systems and adding the values of each system in one column:
ABC1 ; ABC2 ; ABC3
1 ; 3 ; 7
8 ; 8 ; 5
0 ; 7 ; 4
empty ; 6 ; empty
I have absolutely no idea how to do this in a matlab script. I hope you can help me, please let me know if you don`t understand my problem.
Thanks
4 Comments
Guillaume
on 26 Jan 2015
What output do you expect if one system has 4 values and the others only 3?
Marv
on 26 Jan 2015
A 2d cell array has to have the same number of rows for all columns. Hence if one column is made up of 130 rows, so are all the other. So what do you put in there after the first 5 rows for the shorter column? NaN? 0? empty matrix?
The other option is to have a 1d row cell array made up of 1d column cell arrays. The inner cell arrays can then have different number of rows. It's lot more unwieldy though.
Hence my question.
Marv
on 26 Jan 2015
Answers (6)
It is unclear how you want to export the result, but here is how you can extract the data by system name:
isMatch = strcmpi( 'ABC1', System_Name ) ;
data_ABC1 = System_Values(isMatch) ;
and you can do the same for ABC2 and ABC3, or build a loop like:
nSys = 3 ;
names = cell( 1, nSys ) ;
values = cell( 1, nSys ) ;
for sysId = 1 : nSys
names{sysId} = sprintf( 'ABC%d', sysId ) ;
isMatch = strcmpi( names{sysId}, System_Name ) ;
values{sysId} = System_Values(isMatch) ;
end
EDIT 01/26/2015@7:38pm: taking in account your last comments, here is a solution.
sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
[names, ~, cId] = unique( sourceData(:, 2) ) ;
buf = bsxfun( @eq, 1:max( cId ), cId ) ;
rId = sum( cumsum( buf ) .* buf, 2 ) ;
destData = [names.'; cell( max( rId ), numel( names ))] ;
destData(sub2ind( size(destData), 1+rId, cId )) = sourceData(:, 1) ;
which leads to:
>> destData
destData =
'ABC1' 'ABC2' 'ABC3'
[ 1] [ 3] [ 7]
[ 8] [ 8] [ 5]
[ 0] [ 7] [ 4]
[] [ 6] []
Both this and Guillaume's solutions have the disadvantage that you won't understand them if you are not familiar with MATLAB though. If I were you, I would go for a much simpler solution based on STRCMPI and a loop, and the call to UNIQUE that Star Strider proposed first.
Let us know if you are interested. It would involve a bit more code, but it would have the advantage that you would understand it better (especially if you come back to the code a few months/years from now).
EDIT 01/27/2015@9:30am: here is a solution that involves only basic MATLAB operations. My advice is that you try to fully understand it, even if you end up implementing Guillaume or my solution above, because they are more concise but significantly more complicated.
% - Init: extract unique names, initialize data structures for storing
% both cell data and statistics (as a numeric array).
names = unique( sourceData(:,2) ) ;
nNames = numel( names ) ;
data = cell( 1, nNames ) ;
stat = zeros( 4, nNames ) ;
% - Extract data and compute stat.
for nId = 1 : nNames
% Extract data for current name, store as cell array and convert
% to num for computing statistics.
isMatch = strcmpi( names{nId}, sourceData(:,2) ) ;
data{nId} = sourceData(isMatch,1) ;
dataNum = cell2mat( data{nId} ) ;
% Compute statistics, add extra row with data count (useful for
% initializing destination array later).
stat(1, nId) = mean( dataNum ) ;
stat(2, nId) = var( dataNum ) ;
stat(3, nId) = std( dataNum ) ;
stat(4, nId) = numel( dataNum ) ;
end
% - Build output cell array.
destData = cell( max( stat(4,:) ), nNames ) ;
% Copy data cell arrays to destination cell array.
for nId = 1 : nNames
destData(1:stat(4,nId), nId) = data{nId} ;
end
% Frame with names and statistics. Being a numeric array, the latter
% is converted to cell array first.
destData = [names.'; destData; num2cell( stat(1:3,:) )] ;
% - Export, e.g. to XLSX file.
xlswrite( 'Output.xlsx', destData ) ;
2 Comments
Guillaume
on 27 Jan 2015
I think Cedric's latest edition is probably the best way for a beginner to process and export the data indeed.
The downside is that the format of the output destData is not very good for visualisation within matlab. If you want to visualise it in matlab, I'd then convert the cells/matrices into tables. I'd also keep the stats and data separate.
.. and then maybe learning how to use Tables; but our solutions based on BSXFUN, ACCUMARRAY, ARRAY/CELLFUN, will require a lot of work if you want to understand what we do.
Note that I could update this solution (based on explicit loops) so it uses a numeric array of data. The only information that I/we still need is what kind of export you need exactly. You showed the content and said "text file", but is it really ASCII/plain text, or wouldn't you prefer an Excel file?
Star Strider
on 26 Jan 2015
Edited: Star Strider
on 26 Jan 2015
Edited to use the new (20:00 GMT) version of ‘M’:
This works:
M = {1 'ABC1'
3 'ABC2'
8 'ABC1'
7 'ABC3'
0 'ABC1'
5 'ABC3'
8 'ABC2'
4 'ABC3'
7 'ABC2'
6 'ABC2'};
[ABC, ia, ic] = unique(M(:,2));
for k1 = 1:size(ABC,1)
ABCv{k1} = [ABC{k1} M(ic == k1, 1)']';
end
C1 = ABCv{1} % Display Output
C2 = ABCv{2}
C3 = ABCv{3}
produces:
C1 =
'ABC1'
[1.0000e+000]
[8.0000e+000]
[0.0000e+000]
C2 =
'ABC2'
[3.0000e+000]
[8.0000e+000]
[7.0000e+000]
[6.0000e+000]
C3 =
'ABC3'
[7.0000e+000]
[5.0000e+000]
[4.0000e+000]
Guillaume
on 26 Jan 2015
accumarray is quite good for this. The difficulty is filling shorter columns with empty:
C = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'}; %demo data
[names, ~, in] = unique(C(:, 2)); %get unique names and their position
namesdata = accumarray(in, cell2mat(C(:, 1)), [], @(x) {x}); %redistribute name data into cell arrays
%now what's left is to expand all the cell arrays to the same size
maxsize = max(cellfun(@numel, namesdata)); %get max length
namesdata = cellfun(@(c) [num2cell(c); cell(maxsize-numel(c), 1)], namesdata, 'UniformOutput', false); %fill shorter columns with empty cells
result = [names'; horzcat(namesdata{:})] %and concatenate the lot
1 Comment
Guillaume
on 27 Jan 2015
Following your comments, what you really need are tables. Here is one way to do it:
C = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'}; %demo data
[names, ~, in] = unique(C(:, 2)); %get unique names and their position
namesdata = accumarray(in, cell2mat(C(:, 1)), [], @(x) {x}); %redistribute name data into cell arrays
maxsize = max(cellfun(@numel, namesdata)); %get max length
namesdata = cell2mat(cellfun(@(c) [c; nan(maxsize-numel(c), 1)], namesdata, 'UniformOutput', false)');
At this point namesdata is now a matrix of your data, with NaN for empty elements. You can make a table with:
tdata = array2table(namesdata, 'VariableNames', names)
To create the statistics table:
stats = [nanmean(namesdata); nanvar(namesdata, 1); nanstd(namesdata, 1)]; %requires stat toolbox
tstats = array2table(stats, 'VariableNames', names, 'RowNames', {'mean', 'variance', 'standard_deviation'})
To write it to a text file:
writetable([tdata;tstats], 'somefile.txt', 'WriteRowNames', true);
Marv
on 26 Jan 2015
0 votes
7 Comments
Star Strider
on 26 Jan 2015
Did you try them with your actual data?
I named my variable ‘ABC’ out of convenience, and to make the code clearer. You could call it any legal MATLAB variable name you want, and it would work as well. My code should work regardless of the names of the variables, or their number or the number of repeats or the type of the numeric variable in the first column. It only depends on the unique function that is reasonably robust, so you would be constrained only by its limitations.
I haven’t run Cedric’s code, but I would imagine it is as robust.
Star Strider
on 27 Jan 2015
@Cedric: Hi and Happy New Year as well! Welcome back — you’ve been missed!
Star Strider
on 27 Jan 2015
My pleasure, Cedric! Your significant expertise is always welcome. I always learn from your Answers and Comments, many sending me back to the Documentation to learn something new about a particular function. I hope your absence was simply an essentially voluntary hiatus. Guillaume is keeping the bar high in your absence.
The spammers still exist, but the (albeit imperfect) spam filter actually traps a lot of them. (I doubt ‘cracked’ triggered it, but if it did, it will catch this one, too. Testing...)
The spam filter seems to be heuristic, so some spam still show up, but most get quarantined in Potentially Spam Questions (‘Flagged Content’ -> ‘Potentially Spam Questions’ in the upper right). Those with 3K privileges can get others’ erroneously-flagged Questions, Answers, and Comments out of ’spam purgatory’, or you can by clicking on the ‘Contact’ link in the spam banner above your Answer or Comment and telling the powers-that-be that your contribution is not spam. I had to do that a couple times, and I’ve rescued a few (such as yours) that I found.
The new default Profile page is causing some controversy (I’ve complained), but if you click on ‘My Profile’ on the upper left above ‘Ask a Question’, you’re back to the previous version, at least for your own Profile. It fortunately seems to be a work in progress, but why it got ‘fixed’ when it wasn’t ‘broken’ mystifies me.
And then there’s HG2...
Thank you Star Strider! It is a voluntary hiatus driven by the fact that I have too much work AND because I know too well how "too much" time I can spend answering questions if I allow me to interact on the forum.
Thank you for the link and for the information about these new features. I am not too fond of the new design for profiles either. Hopefully, as you say, it is a work in progress!
Star Strider
on 27 Jan 2015
My pleasure! I’m glad it is a voluntary hiatus. There are worse things in life than too much work, providing you’re getting paid appropriately. (Harking back to my Internal Medicine residency when my workload seemed infinite but our pay on an hourly basis was about one third the minimum wage.) ‘Answers’ can definitely expand to fill all available time, so I’ll likely reduce my participation in a few months.
I had to ask about getting access to the ‘old’ Answers features, since it was very much less than obvious. It really hinders spam-deletion efforts. I am assured that this is ‘Version 1.0’ of the ‘new’ Answers, but then the ‘old’ Answers wasn’t broken, so the need to ‘fix’ it still mystifies.
Marv
on 27 Jan 2015
4 Comments
Guillaume
on 27 Jan 2015
1. The empty values are "[]"
This is what you asked. [] is the empty matrix. How else would you represent empty in matlab?
2. How can I export this cell array in a text file ?
In that format, not easily.
3. I would like to calculate the standard deviation of each columm and outputting this at the end of each column
Again, the data format is a hindrance. Furthermore, you're overloading the data in each column. In addition to a column header, and the data, you now want statistics stored in the column. With a cell array, it's only your own convention that tells you what each row is. Matlab won't track this for you.
In the end a cell array with empty elements is probably not what you want (hence, my original question about what output you expect). For a start, I would use NaNs instead of empty, so your data can be a matrix. Statistics on a matrix is very straightforward. On a cell array not so much. If you want headers you can store the matrix in a table with column headers. The stats, I would store in another table with both column and row headers. Tables are the easiest thing to export to file in matlab.
See comments to my answer on how to do this.
Marv
on 27 Jan 2015
Guillaume
on 27 Jan 2015
I would say in your particular case, which involves redistributing single column data into multicolumn data, it's much easier in matlab.
You can of course do it in VBA, if you're more familiar with it. I doubt you could just do it with Excel formulas.
In any case, I've added code as a comment to my own answer on how to do what you want using tables. The same would be significantly more code in VBA.
Cedric
on 27 Jan 2015
See the edit in my post. There is nothing complicated overall, but you have to understand data manipulation in MATLAB, and in particular the difference between numeric arrays and cell arrays.
Andrei Bobrov
on 27 Jan 2015
Edited: Andrei Bobrov
on 27 Jan 2015
sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
[a,~,c] = unique(sourceData(:,2));
[v,ii] = sort(c);
t = [1;diff(v)]>0;
k = ones(size(t));
n = (1:numel(v))';
xx = n(t);
k(xx(2:end)) = 1-diff(find(t));
out = [a(:)';accumarray([cumsum(k),v],ii,[],@(x)sourceData(x,1))];
or used Cedric's idea
[a,~,c] = unique(sourceData(:,2));
b = bsxfun( @eq, 1:max( c ), c );
[ii,jj] = find(b);
d = cumsum(b).*b;
out2 = [a(:)';accumarray([d(d>0),jj],ii,[],@(x)sourceData(x,1))];
d2 = out2(2:end,:);
d2(cellfun('isempty',d2)) = {nan};
d2 = cell2mat(d2);
out3 = [out2;num2cell([nanmean(d2);nanstd(d2);nanvar(d2)])];
2 Comments
Marv
on 27 Jan 2015
As written in my answer Marv (EDIT 01/27/2015@9:30am), you should understand the basic approach first, because it would take a lot of time for us to explain in depth what Andrei, Guillaume, or I implemented using BSXFUN, ACCUMARRAY, etc., and you would not be able to re-use this material in another context.
If you focus on the basic approach, however, you'll have to get more familiar with e.g. numeric and cell types of arrays, and this won't be lost because they are used everywhere in MATLAB.
I am speaking for myself here, but I guess that it can be extended to Andrei and Guillaume (correct me if I am wrong): I/we usually start writing these concise solutions because we find them elegant, despite not knowing at first how proficient the original poster (OP) is with MATLAB. But, as far as I am concerned, these aren't good solutions (unless the OP is very proficient already), because they are too complex for an economy of just a few lines of code. In addition, more lines of code and explicit loops, don't mean less efficient, as the basic approach is the second fastest code according to the profiler (in this context, after repeating sourceData 1000 times for working with a larger data set).
Categories
Find more on Tables in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!