# Grouping data by date and performing different operations

Harsh Rob on 29 Dec 2019
Commented: Walter Roberson on 2 Jan 2020
I have a set of data that needs to be grouped by date. Further calculations, such as RV, BPV, etc., then need to be done on a daily basis.
I tried using newStr = extractBefore(str,11) to extract only the dates and then group on them. However, since ' is present in the string, the code is not able to extract the date ('08-Jan-2015 01:33:28').
Can someone help me group this data by date and perform further operations?
This refers to the final_use variable in the workspace:
The first column in the attached .mat file holds the dates and times (in Unix format) and the second column holds the prices. I have calculated the returns, which are in the third column.
The code below converts the Unix time into a character array.
% I wrote this function to convert the Unix time into a datenum:
function dn = unixtime_to_datenum( unix_time )
dn = unix_time/86400 + 719529; %# == datenum(1970,1,1)
end
% I then convert the datenum into a string:
time = final_use(:,1)
str = datestr( unixtime_to_datenum( time ) )

Cris LaPierre on 30 Dec 2019
I'm not sure what version of MATLAB you are using, but why not use the datetime data type? This line of code will convert the first column of data to datetimes, removing the need to create a separate function for this.
time = datetime(final_use(:,1),'ConvertFrom','epochtime',"Epoch",'1970-01-01')
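As a rough check that the datetime conversion agrees with the datenum formula from the question, here is a minimal sketch. The timestamp is a hypothetical value chosen to correspond to 08-Jan-2015 01:33:28 UTC, not a value taken from the attached file, and the variable names are illustrative only.

```matlab
% Hypothetical sample timestamp (seconds since 1970-01-01 UTC), not from the attached data
unixTime = 1420680808;

% Two equivalent ways to convert Unix time to a datetime
t1 = datetime(unixTime,'ConvertFrom','posixtime');
t2 = datetime(unixTime,'ConvertFrom','epochtime','Epoch','1970-01-01');

% The datenum formula from the question, converted to datetime for comparison
dn = unixTime/86400 + 719529;
t3 = datetime(dn,'ConvertFrom','datenum');

% Drop the time of day so that values can be compared or grouped by calendar day
dayOnly = dateshift(t1,'start','day');

disp([t1; t2; t3; dayOnly])
```

Because the result is a datetime rather than a character array, no string extraction is needed to isolate the date part.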
Once you have the data as datetimes, you can compute summary statistics by grouping the data into specific time intervals (minutes, hours, days, weeks, months, etc.). Use the groupsummary function and specify the desired groupbin. For example, the mean price for each day can be computed this way (I convert the matrix to a table first):
time = datetime(final_use(:,1),'ConvertFrom','epochtime',"Epoch",'1970-01-01')
price = final_use(:,2);
returns = final_use(:,3);
dataTbl = table(time,price,returns);
summaryTbl = groupsummary(dataTbl,"time","day",'mean',"price")

Harsh Rob on 30 Dec 2019
Thank you for the answers. With this code I am not able to run my own calculations. In other words, I don't want to calculate the mean; rather, I want to calculate other values such as RV, BPV, TPV, G and H, for which I have created functions.
I want to call these functions and get the corresponding values for each date as output.
The main task is to group the data and then compute these values for each day.
time = datetime(final_use(:,1),'ConvertFrom','epochtime',"Epoch",'1970-01-01')
price = final_use(:,2);
returns = final_use(:,3);
dataTbl = table(time,price,returns);
summaryTbl = groupsummary(dataTbl,"time","day",'mean',"price")
%The function which I have created for calculating my values on each day.
% Please note that I have about 96 returns (15-min returns) on each day
function [RV,BV,RJ,ZJ,TP,RT,JT]=rvbpvariation(r,alpha)
if nargin < 2, alpha = 0.999; end % default only, so a supplied alpha is not overwritten
%This function tests for the presence of a jump using the bipower variation test
%The inputs are r, a Tx1 vector of realised returns; alpha is a constant,
%e.g. alpha == 0.999 means that the jump detection threshold is set to 0.1%
%The outputs are RV, the realised variance
%BV is the realised bipower variation
%RJ is the unnormalised jump z-score
%ZJ is the normalised jump z-score
%TP is the tripower quarticity
%RT is the daily return (sum(r))
%JT is the estimated jump size (0 if no jump present)
m= length(r);
RV=sum(r.^2);
bv=0;
for i=2:m
bv=bv+abs(r(i))*abs(r(i-1));
end
BV =(pi./2).*(m./(m-1))*bv;
RJ=(RV-BV)./RV;
tp=0;
k=4/3;
mu=(2.^(k./2)).*gamma((k+1)./2)./gamma(0.5);
for i=3:m
q=(abs(r(i-2)).^(k)).*(abs(r(i-1)).^(k)).*(abs(r(i)).^(k));
tp=tp+q;
end
TP=m.*mu.^(-3).*(m./(m-2)).*tp;
ZJ=RJ./(sqrt((((pi./2).^2)+pi-5).*(1./m).*max(1,TP./(BV.^2))));
RT=sum(r);
JT=sign(RT).*sqrt((RV-BV).*(ZJ>=norminv(alpha)));
Cris LaPierre on 2 Jan 2020
It is still possible to use your own equations. However, when performing calculations on groups, the result must be a single value for each group, so the function can only return a single output variable containing a single value. You therefore need to create a function for each value you want to compute. You can then use function handles to have those values computed for each group.
Some of your values use the results of previous calculations, so they can only be computed once the variables they depend on have been computed. I might set it up like this.
% Set up the data
time = datetime(final_use(:,1),'ConvertFrom','posixtime')
price = final_use(:,2);
returns = final_use(:,3);
dataTbl = table(time,price,returns);
% Define constants
alpha =0.999;
k=4/3;
mu=(2.^(k./2)).*gamma((k+1)./2)./gamma(0.5);
% Define functions
RV = @(r) sum(r.^2);
BV = @(r) (pi./2).*(length(r)./(length(r)-1)).*sum(abs(r(2:end)).*abs(r(1:end-1)));
TP = @(r) length(r).*mu.^(-3).*(length(r)./(length(r)-2)).*sum((abs(r(1:end-2)).^(k)).*(abs(r(2:end-1)).^(k)).*(abs(r(3:end)).^(k)));
% Compute dependent variables
summaryTbl = groupsummary(dataTbl,"time","day",{RV,BV,TP,'sum'},"returns");
summaryTbl.Properties.VariableNames(3:end) = ["RV" "BV" "TP" "RT"];
% Compute remaining values
summaryTbl.RJ = (summaryTbl.RV-summaryTbl.BV)./summaryTbl.RV;
summaryTbl.ZJ = summaryTbl.RJ./(sqrt((((pi./2).^2)+pi-5).*(1./summaryTbl.GroupCount).*max(1,summaryTbl.TP./(summaryTbl.BV.^2))));
summaryTbl.JT = sign(summaryTbl.RT).*sqrt((summaryTbl.RV-summaryTbl.BV).*(summaryTbl.ZJ>=norminv(alpha)))
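To see that the vectorised function handles reproduce the loops in the original function, the two forms can be compared on a small input. This is only a sketch: the return vector is made up for illustration, and BVloop and TPloop are hypothetical names for the loop-based results.

```matlab
% Hypothetical returns, for illustration only
r = [0.01; -0.02; 0.015; -0.005; 0.03];
m = length(r);
k = 4/3;
mu = (2.^(k./2)).*gamma((k+1)./2)./gamma(0.5);

% Loop versions, as written in rvbpvariation
bv = 0;
for i = 2:m
    bv = bv + abs(r(i))*abs(r(i-1));
end
BVloop = (pi./2).*(m./(m-1))*bv;

tp = 0;
for i = 3:m
    tp = tp + (abs(r(i-2)).^k).*(abs(r(i-1)).^k).*(abs(r(i)).^k);
end
TPloop = m.*mu.^(-3).*(m./(m-2)).*tp;

% Vectorised versions, as used with groupsummary
BV = @(r) (pi./2).*(length(r)./(length(r)-1)).*sum(abs(r(2:end)).*abs(r(1:end-1)));
TP = @(r) length(r).*mu.^(-3).*(length(r)./(length(r)-2)).*sum((abs(r(1:end-2)).^(k)).*(abs(r(2:end-1)).^(k)).*(abs(r(3:end)).^(k)));

% Each row shows the loop result next to the vectorised result
disp([BVloop BV(r); TPloop TP(r)])
```

The shifted index ranges (2:end against 1:end-1, and so on) pair each return with its predecessors in the same way the loop counters do.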
Walter Roberson on 2 Jan 2020
There is a trick: you can return a cell array. That qualifies as "a single output variable containing a single value."
If you return a row vector then afterwards you can cell2mat() and then array2table() if you want a table() of results.
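A minimal sketch of the cell-array trick follows. It uses findgroups and splitapply, which is an assumption on my part since the comment does not name a grouping function, and it runs on synthetic data rather than the poster's file. The names stats, dayList and resultTbl are illustrative only.

```matlab
% Hypothetical data: two days of 15-minute returns (not the poster's data)
t = datetime(2015,1,8) + minutes(15*(0:191))';
r = 0.001*sin((1:192)');

% One group index per calendar day
[G, dayList] = findgroups(dateshift(t,'start','day'));

% Return several statistics at once, wrapped in a scalar cell
stats = @(x) {[sum(x.^2), sum(x)]};      % {[RV, RT]} as a 1x2 row vector
C = splitapply(stats, r, G);             % one cell per day

% Unpack the cells into a table with one row per day
resultTbl = array2table(cell2mat(C), 'VariableNames', {'RV','RT'});
resultTbl.day = dayList;
disp(resultTbl)
```

The same pattern should extend to a function that returns all seven values in one row vector, provided the row has the same length for every group so that cell2mat can stack the rows.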