2 data sets - Need to make the same size!

Question

0 votes

EDITED: I had an error and have now changed the array size of B.

Hi Everybody..

I have 2 big data sets.

Data set 1 is contained in two arrays: A, which is a cell array of size 65,000,000 x 1, and B, which is a normal array of size 65,000,000 x 1.

Data set 2 is also contained in two arrays: C, which is a cell array of size 61,500,000 x 1, and D, which is a normal array of size 61,500,000 x 1.

Formats

A and C:

A is the date and time stamp for the corresponding prices in B

C is the date and time stamp for the corresponding prices in D.

The timestamps are increasing in both data sets and irregularly spaced, so the prices change at different times in each set. The format is:

'20090501 00:00:00.365'

'20090501 00:00:00.371'

'20090501 00:00:00.605'

'20090501 00:00:00.863'

--------

B and D are the prices which correspond to the dates and times of A and C respectively. The prices look like this:

98.9020000000000

98.8990000000000

98.8850000000000

98.8890000000000

What I want

I want to amend this so that A, B, C and D are all the same size. I do not want to lose any information.

So let's look at our first data set. We have A and B (time and price). We now look at C and add into A all the entries of C which are not already in A. Then, for each of these new dates and times in A, we need to add the corresponding price in B, which will just be the price at the nearest earlier time (the entry directly above it). The same is then done in the other direction for C and D.

For example, If we add

'20090501 00:00:00.645' into A then the corresponding B entry would be the price of the time before this time. So if we already had '20090501 00:00:00.605' in A and 98.9020000000000 in B then the new B entry for the new time added would remain 98.9020000000000 .

I look forward to some great and interesting answers.


Answer 1

Walter Roberson on 24 Jun 2012

0 votes

Convert to serial date numbers. Then use the two-output form of histc(); the second output will be the bin number of the highest vector entry that does not exceed the probe times.
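
To illustrate what the second output of histc() returns, here is a small sketch with made-up numbers (the values of edges and probes are hypothetical, not taken from the question). It also shows why the edge vector is usually padded with -inf and inf: probes outside the edge range come back as bin 0.

```matlab
% Hypothetical toy data: sorted reference times and probe times (arbitrary units).
edges  = [1; 2; 4; 7];
probes = [0.5; 1; 3.9; 4; 10];

% Second output: for each probe, the index k such that
% edges(k) <= probe < edges(k+1), or 0 if the probe lies outside
% the range [edges(1), edges(end)].
[~, bin] = histc(probes, edges);

disp(bin.')
```

Here 0.5 and 10 fall outside the edges and get bin 0, while 3.9 maps to the edge 2 and 4 maps to the edge 4. Note that newer MATLAB releases discourage histc in favour of histcounts or discretize, whose edge conventions differ slightly.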

1 Comment

Walter Roberson on 24 Jun 2012

A_datenum = datenum(A, 'yyyymmdd HH:MM:SS.FFF');

C_datenum = datenum(C, 'yyyymmdd HH:MM:SS.FFF');

[count, C_bin] = histc(C_datenum, [-inf; A_datenum(:); inf]);

Now, the entry C{K} falls before anything in A if C_bin(K) is 1, and otherwise is at least as late as the A entry A{C_bin(K) - 1}. The -1 is because of the extra bin that was added at the front to catch times before anything in A.

You did not say anything about how prices should be handled.
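
If the prices are meant to be carried forward from the previous timestamp, as the question's example suggests, the bin indices can be used directly. The following is only a sketch under that assumption, using a few made-up rows (the values of A, B, C and D below, and the names tAll, Tnew, Bnew and Dnew, are hypothetical). It assumes the timestamps in each set are already sorted, and it puts NaN wherever a set has no earlier price at all.

```matlab
% Hypothetical toy inputs in the question's format.
A = {'20090501 00:00:00.365'; '20090501 00:00:00.605'; '20090501 00:00:00.863'};
B = [98.902; 98.899; 98.885];
C = {'20090501 00:00:00.371'; '20090501 00:00:00.605'; '20090501 00:00:00.900'};
D = [97.100; 97.200; 97.300];

fmt = 'yyyymmdd HH:MM:SS.FFF';
tA = datenum(A, fmt);
tC = datenum(C, fmt);

% Common time grid: every distinct timestamp from either set, sorted.
% idx points back into the stacked strings, so no datestr call is needed.
allStr = [A; C];
[tAll, idx] = unique([tA; tC]);
Tnew = allStr(idx);

% For each grid time, index of the last A (resp. C) entry at or before it;
% 0 means the grid time is earlier than everything in that set.
[~, iA] = histc(tAll, [-inf; tA; inf]);
[~, iC] = histc(tAll, [-inf; tC; inf]);
iA = iA - 1;
iC = iC - 1;

% Carry the previous price forward; leave NaN where no earlier price exists.
Bnew = nan(size(tAll));
Bnew(iA > 0) = B(iA(iA > 0));
Dnew = nan(size(tAll));
Dnew(iC > 0) = D(iC(iC > 0));
```

After this, Tnew, Bnew and Dnew all have the same number of rows, one per distinct timestamp. With tens of millions of rows the datenum conversion and the stacked arrays will need a lot of time and memory, so it may be necessary to process the data in chunks.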
