# 2 data sets - Need to make the same size!

Mate 2u on 22 Jun 2012
EDITED: I had an error; the array size of B has now been corrected.
Hi Everybody..
I have 2 big data sets.
Data set 1 is contained in two arrays: A, which is a cell array of size 65,000,000 x 1, and B, which is a normal array of size 65,000,000 x 1.
Data set 2 is also contained in two arrays: C, which is a cell array of size 61,500,000 x 1, and D, which is a normal array of size 61,500,000 x 1.
Formats
A and C:
A is the date and time stamp for the corresponding prices in B.
C is the date and time stamp for the corresponding prices in D.
The date and time stamps are increasing in both data sets and are irregularly spaced (so the price changes at different times in each). The format is:
'20090501 00:00:00.365'
'20090501 00:00:00.371'
'20090501 00:00:00.605'
'20090501 00:00:00.863'
--------
B and D are the prices which correspond to the dates and times of A and C respectively. The prices look like this:
98.9020000000000
98.8990000000000
98.8850000000000
98.8890000000000
What I want
I want to amend this so that A, B, C and D are all the same size. I do not want to lose any information.
So let's look at the first data set. We have A and B (time and price). We now look at C and add into A all the entries of C which are not already in A. Then, for each of these new dates and times in A, we need to add a corresponding price in B, which is simply the price at the nearest earlier time (the entry just above it in the array).
For example, if we add '20090501 00:00:00.645' into A, then the corresponding B entry would be the price at the time before this time. So if we already had '20090501 00:00:00.605' in A and 98.9020000000000 in B, then the B entry for the newly added time would remain 98.9020000000000.
I look forward to some great and interesting answers.

Walter Roberson on 24 Jun 2012
Convert to serial date numbers. Then use the two-output form of histc(); the second output will be the bin number of the highest vector entry that does not exceed the probe times.
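A small illustration of what the second output of histc() returns may help here. This is only a sketch with made-up numbers; the names `edges`, `probes`, `bin` and `idx` are invented for the example, and the -inf/inf padding is one way to make sure every probe lands in some bin.

```matlab
% Hypothetical "known" times 1, 2, 3, padded with -inf and inf so that
% every probe value falls into some bin.
edges  = [-inf; 1; 2; 3; inf];
probes = [0.5; 1; 2.5; 4];

% bin(k) is the index of the last edge that does not exceed probes(k).
[counts, bin] = histc(probes, edges);

% Subtract 1 to undo the extra -inf edge: idx(k) is then the index of the
% latest known time that is <= probes(k), or 0 if the probe is earlier
% than all of them.
idx = bin - 1;   % here: [0; 1; 2; 3]
```

A probe that equals a known time exactly (the value 1 above) is assigned to that time's own bin, which is the behaviour wanted for "latest entry not after the probe".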
Walter Roberson on 24 Jun 2012
A_datenum = datenum(A, 'yyyymmdd HH:MM:SS.FFF');
C_datenum = datenum(C, 'yyyymmdd HH:MM:SS.FFF');
% Pad the edges with -inf and inf so every time in C falls into some bin.
[count, C_bin] = histc(C_datenum, [-inf; A_datenum(:); inf]);
Now, the entry C{K} falls before everything in A if C_bin(K) is 1; otherwise it is at least as late as the A entry A{C_bin(K) - 1} and earlier than the next A entry, if there is one. The -1 is because of the extra -inf bin that was added to catch times before anything in A.
You did not say anything about how prices should be handled.
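If the prices are meant to be carried forward from the most recent earlier time stamp, as the question describes, the bin index can be used to fill them in. The following is a minimal sketch on toy data, not a tested solution for the full 65,000,000-row arrays. It assumes that both time vectors are sorted, that the same fill rule should be applied to D as to B, and that times earlier than the first entry of a data set should get NaN; the names `tAll`, `idxA`, `idxC`, `Bnew` and `Dnew` are made up for the example, and the D values are invented.

```matlab
% Toy stand-ins for the real arrays (the D prices are illustrative only).
A = {'20090501 00:00:00.365'; '20090501 00:00:00.605'; '20090501 00:00:00.863'};
B = [98.902; 98.899; 98.885];
C = {'20090501 00:00:00.371'; '20090501 00:00:00.605'; '20090501 00:00:00.900'};
D = [97.100; 97.200; 97.300];

fmt = 'yyyymmdd HH:MM:SS.FFF';
tA  = datenum(A, fmt);
tC  = datenum(C, fmt);

% Common time grid: every time that occurs in either data set, sorted,
% with duplicates removed.
tAll = union(tA, tC);

% For each grid time, find the latest entry of A (and of C) that is not
% later than it. An index of 0 means "before the first entry".
[~, binA] = histc(tAll, [-inf; tA(:); inf]);
[~, binC] = histc(tAll, [-inf; tC(:); inf]);
idxA = binA - 1;
idxC = binC - 1;

% Carry the last known price forward; leave NaN where there is none yet.
Bnew = nan(size(tAll));
Dnew = nan(size(tAll));
Bnew(idxA > 0) = B(idxA(idxA > 0));
Dnew(idxC > 0) = D(idxC(idxC > 0));
```

After this, tAll, Bnew and Dnew all have the same length (5 in the toy case), so a single time vector serves both price series. If the time stamps are needed as strings again, datestr(tAll, fmt) should convert them back.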