'reshape' by identifiers

Hi, I am trying to reshape data by two identifiers: date and id. The original data can be simplified as follows:
date id v1
2000 1 99
2001 1 84
1997 2 74
1998 2 89
1999 2 48
2000 2 43
2001 2 45
2002 2 49
I need to convert the original data into a matrix like the one below:
date id1 id2
1997 . 74
1998 . 89
1999 . 48
2000 99 43
2001 84 45
2002 . 49
I used to use a nested for loop that goes through every element of the original data and copies it into the output matrix whenever both date and id match. But the original data now has over 3 million rows, and building the matrix took more than 30 minutes. I don't think the plain reshape function can solve this problem. Can anyone help me? Thank you.
Minsoo

 Accepted Answer

This should probably be much faster:
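% Columns of Matrix are [date id v1]; ids are assumed to be integers 1..N.
[uyear, m, yearidx] = unique(Matrix(:,1));
OutMat = nan(length(uyear), 1 + max(Matrix(:,2)));  % one value column per id
OutMat(:,1) = uyear(:);
% size(OutMat) replaces length(OutMat), which returns the larger dimension.
OutMat(sub2ind(size(OutMat), yearidx, 1 + Matrix(:,2))) = Matrix(:,3);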
(Function corrected as the output of the previous iteration required a reshape())
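The answer above uses the id itself as a column offset, which works when the ids are the integers 1..N. As a minimal sketch, assuming the ids may be arbitrary or non-consecutive numbers, the same idea can be applied with a second unique() call that maps each id to a column position. The variable names uid and ididx are introduced here for illustration and are not from the original answer; the data is the sample from the question.

```matlab
% Sample data from the question: columns are [date id v1].
Matrix = [2000 1 99; 2001 1 84; 1997 2 74; 1998 2 89; ...
          1999 2 48; 2000 2 43; 2001 2 45; 2002 2 49];

% The third output of unique() maps every observation to its sorted position.
[uyear, ~, yearidx] = unique(Matrix(:,1));   % row position for each observation
[uid,   ~, ididx]   = unique(Matrix(:,2));   % column position for each observation

% NaN marks (date, id) pairs that have no observation.
OutMat = nan(numel(uyear), 1 + numel(uid));
OutMat(:,1) = uyear;

% Convert (row, column) pairs to linear indices and assign all values at once.
OutMat(sub2ind(size(OutMat), yearidx, 1 + ididx)) = Matrix(:,3);
```

Column k+1 of OutMat then holds the values for uid(k). The sketch assumes each (date, id) pair occurs at most once; if a pair repeats, the last assigned value wins.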

More Answers (2)

L = M(:,2)==1;
M2 = M(~L,:);
M2(ismember(M2(:,1),M(L,1)),2) = M(L,3)

3 Comments

M = [2000 1 99
2001 1 84
1997 2 74
1998 2 89
1999 2 48
2000 2 43
2001 2 45
2002 2 49
2003 1 55];
>> L = M(:,2)==1;
>> M2 = M(~L,:);
>> M2(ismember(M2(:,1),M(L,1)),2) = M(L,3)
??? Subscripted assignment dimension mismatch.
If you use the original matrix from the question instead of this slightly augmented one,
M =[ 2000 1 99
2001 1 84
1997 2 74
1998 2 89
1999 2 48
2000 2 43
2001 2 45
2002 2 49]
>> L = M(:,2)==1;
>> M2 = M(~L,:);
>> M2(ismember(M2(:,1),M(L,1)),2) = M(L,3)
M2 =
1997 2 74
1998 2 89
1999 2 48
2000 99 43
2001 84 45
2002 2 49
Notice the 2's left in column 2 :(
Hi Walter!
I agree with you; my variant only answers this specific question.
'2' can be easily replaced by any number or NaN -> M2(M2(:,2)==2,2) = NaN;
But not if the id that _should_ go in the second column is 2.
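One way to sidestep both issues raised in these comments (the dimension mismatch and the leftover placeholder values) is to pre-fill the value columns with NaN instead of reusing the id column, and to build the row list from the dates of both ids. The following is only a sketch for the two-id case on the augmented matrix above; the names dates, out, r1 and r2 are illustrative, and it assumes each (date, id) pair occurs at most once.

```matlab
% Augmented sample data from the comment above: columns are [date id v1].
M = [2000 1 99; 2001 1 84; 1997 2 74; 1998 2 89; 1999 2 48; ...
     2000 2 43; 2001 2 45; 2002 2 49; 2003 1 55];

L = M(:,2) == 1;                        % rows that belong to id 1
dates = union(M(L,1), M(~L,1));         % every date that occurs, sorted
out = [dates, nan(numel(dates), 2)];    % value columns start as NaN, not as an id

% The second output of ismember gives the row in dates for each observation.
[~, r1] = ismember(M(L,1),  dates);
[~, r2] = ismember(M(~L,1), dates);
out(r1, 2) = M(L,3);                    % values for id 1
out(r2, 3) = M(~L,3);                   % values for id 2
```

Because missing entries are NaN from the start, a legitimate value of 2 is never confused with a placeholder, and the 2003 row that exists only for id 1 is kept.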


Minsoo Kim on 25 Jun 2011


Hi Walter and Andrei! Thank you very much for your helpful answers. Yes, Andrei's answer can easily be completed by deleting the leftover numbers in each column. I accepted Walter's solution because it does not need a for loop to be applied to my original data of about 3 million observations and 20K unique ids. Thank you very much.
Minsoo
