# Data manipulation of a financial time series

2 views (last 30 days)
Mate 2u on 5 Oct 2012
Hi all I have a 1x1 Struct called "ans". Within this I have a textdata matrix of size 5 million x 4.
Now in this textdata I have 4 columns, 1st column is dates, 2nd column is time, 3rd column is prices and 4th column is symbol.
OBJECTIVE
I need to rearrange the 2nd column times so it gives me the prices in 30 second intervals. Currently it is not regular.
Example of current format of column 2 is:'18:42:20.000' '18:42:22.000' '18:42:30.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:43.000' '18:42:49.000' '18:43:00.000'
I need to do it so that it gives me the times only every 30 seconds and the corresponding price (column 3) at that specific time. The goal is to have regular 30 second intervals alongside the price at that current time. So we only need to work with columns 2 and 3.
Please message me if I never explained properly, and thanks so much for the help in advance.

Show 1 older comment
Mate 2u on 5 Oct 2012
I do not need to distinguish between days. In your example the price at 18:43:00 would be the last price at 18:42:40 as it stays the same from the last 42.40 to 43......so we have to think of it as a financial application
José-Luis on 5 Oct 2012
Well, you would still need to distinguish between days then. Say the price in Monday at 18:40:02 is 412 bananas and on Tuesday at 18:40:02 it is 24 bananas. That would have an impact in your calculations. Similary, you have several data with the same time stamp:
• 18:42:00 12 bananas
• 18:42:00 13 bananas
• 18:42:00 0 bananas
Which is the valid one? Are the prices guaranteed to be the same? Or how do you calculate it? The average? The minimum? The maximum? The last one? Or maybe I am missing something?
Mate 2u on 5 Oct 2012
Hi, you are right 1st column is dates in the format of 01/03/2007...
The prices are not guaranteed to be the same.
Let me explain properly: I need the prices in 30 second even intervals. Currently it is uneven. If there are multiple prices for a certain time that corresponds to many trades occuring in a second and the last price is what the asset would remain until the next trade? Does this help?

José-Luis on 5 Oct 2012
Edited: José-Luis on 5 Oct 2012
Here is what you can do, assuming your_data is a cell matrix containing strings.
your_data{1} = '01/08/2007';
your_data{1,2} = '18:04:01';
your_data{2,1} = '04/08/2007';
your_data{2,2} = '18:05:01';
your_data{1,3} = '12';
your_data{2,3} = '13';
%Getting Matlab's date number
your_dates = cellfun(@(a,b) datenum([a ' ' b],'dd/mm/yy HH:MM:SS'),...
your_data(:,1),your_data(:,2));
%Creating time stamp with 1/2 hour interval
interpDates = (floor(your_dates(1)):1/2880:ceil(your_dates(end)));
interpDates(interpDates < your_dates(1)) = [];
interpDates(interpDates > your_dates(end)) = [];
%Keeping only the last time stamp, from the data
[dummy idx dummy] = unique(your_dates,'last');
your_dates = your_dates(idx);
%Getting data values
your_bananas = cellfun(@(a) str2double(a), your_data(:,3));
your_bananas = your_bananas(idx);
%Function to find the nearest value
find_banana = @(x) find(your_dates>=x,1,'first'); %There is probably a more efficient way
%Interpolating
your_interp_bananas = arrayfun(@(x) your_bananas(find_banana(x)),interpDates);

Show 1 older comment
Mate 2u on 5 Oct 2012
Data sample:
'01/03/2007' '15:30:06.000' '55.90' 'CTH07'
'01/03/2007' '15:30:30.000' '55.75' 'CTH07'
'01/03/2007' '15:30:42.000' '55.80' 'CTH07'
'01/03/2007' '15:30:53.000' '55.85' 'CTH07'
'01/03/2007' '15:30:57.000' '55.75' 'CTH07'
'01/03/2007' '15:31:17.000' '55.70' 'CTH07'
'01/03/2007' '15:31:23.000' '55.65' 'CTH07'
'01/03/2007' '15:31:36.000' '55.55' 'CTH07'
'01/03/2007' '15:31:38.000' '55.60' 'CTH07'
'01/03/2007' '15:31:43.000' '55.55' 'CTH07'
'01/03/2007' '15:31:44.000' '55.60' 'CTH07'
'01/03/2007' '15:31:50.000' '55.70' 'CTH07'
'01/03/2007' '15:32:07.000' '55.55' 'CTH07'
'01/03/2007' '15:32:07.000' '55.90' 'CTH07'
'01/03/2007' '15:40:41.000' '55.30' 'CTH07'
'01/03/2007' '15:40:43.000' '55.40' 'CTH07'
'01/03/2007' '15:40:52.000' '55.30' 'CTH07'
'01/03/2007' '15:40:54.000' '55.50' 'CTH07'
'01/03/2007' '15:41:33.000' '55.15' 'CTH07'
OUTPUT REQUIRED
'01/03/2007' '15:30:30.000' '55.75' 'CTH07'
'01/03/2007' '15:31:00.000' '55.75' 'CTH07'
'01/03/2007' '15:31:30.000' '55.65' 'CTH07'
'01/03/2007' '15:32:00.000' '55.70' 'CTH07'
'01/03/2007' '15:32:30.000' '55.90' 'CTH07'
'01/03/2007' '15:33:00.000' '55.90' 'CTH07'
'01/03/2007' '15:33:30.000' '55.90' 'CTH07'
etc.......
José-Luis on 5 Oct 2012
Modified original code. I thought it was 30 min interval. To do a 30 second interval, use:
interpDates = (floor(your_dates(1)):1/2880:ceil(your_dates(end)));
Also I realized an error in the find_bananas function, it should have been written the other way around:
find_banana = @(x) find(your_dates>=x,1,'first');
Mate 2u on 5 Oct 2012
OK, I am running it on my 5,000,000 x 4 cell array. It is Busy, I will let you know when it has run and if it works.