handling irregular observations. Maybe more progress needs to be made by Matlab team
Dear all,
Since my analysis uses irregular time series observations that do not have a standard frequency (such as daily, monthly, quarterly, or yearly), I was wondering how useful MATLAB can be in this case.
To give an example, please take a look at the following link, which shows how SAS (which I am not familiar with) can handle such problems "automatically".
Here is the table "Output 14.3.1 Measured Defect Rates":
1 13JAN1992 55
2 27JAN1992 73
3 19FEB1992 84
4 08MAR1992 69
5 27MAR1992 66
6 05APR1992 77
7 29APR1992 63
8 11MAY1992 81
9 25MAY1992 89
10 07JUN1992 94
11 23JUN1992 105
12 11JUL1992 97
13 15AUG1992 112
14 29AUG1992 89
15 10SEP1992 77
16 27SEP1992 82
We have irregular observations, and after the interpolation we get monthly averages:
Obs date defects
1 JAN1992 59.323
2 FEB1992 82.000
3 MAR1992 66.909
4 APR1992 70.205
5 MAY1992 82.762
6 JUN1992 99.701
7 JUL1992 101.564
8 AUG1992 105.491
9 SEP1992 79.206
I had a discussion with Oleg regarding one of my previous questions on how to obtain monthly averages from irregular observations. If I apply Oleg's approach, half the values in the output matrix interpData{b} are the same as in the original input matrix A. But as you can see from the second table above, none of its values match those of the first table.
So, going back to my previous question (http://www.mathworks.de/matlabcentral/answers/44968-data-frequency-conversion-problem),
is it possible to do something similar to what the SAS program does? If not, then it is a pity that a program as powerful as MATLAB is worse than SAS at converting irregular time series observations to other frequencies.
Thank you
10 Comments
per isakson
on 6 Aug 2012
Edited: per isakson
on 6 Aug 2012
I'd like to pose a question. Assume you have bimonthly data
Jan&Feb 17
Mar&Apr 71
May&Jun 43
and I claim that the "best" monthly averages are
Jan 17
Feb 17
Mar 71
Apr 71
May 43
Jun 43
I guess you don't agree, but what arguments would you use to convince me that there are "better" estimates?
There is no magic trick!
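To make the point concrete, here is a minimal MATLAB sketch (variable names and the offsets are illustrative, not from the thread). It shows that repeating each bimonthly value is fully consistent with the data, and that a quite different monthly series is just as consistent, so choosing between them takes assumptions that the data alone cannot supply.

```matlab
% Bimonthly averages from the example above
bimonthly = [17; 71; 43];

% Step ("repeat") expansion: each month inherits its bimonthly value
monthlyStep = kron(bimonthly, ones(2,1));   % [17 17 71 71 43 43]'

% Another candidate with the same pairwise averages (offsets are arbitrary)
offsets    = [-5; 5; 10; -10; 3; -3];
monthlyAlt = monthlyStep + offsets;

% Averaging consecutive pairs of months recovers the bimonthly data in both cases
backStep = mean(reshape(monthlyStep, 2, []), 1)';
backAlt  = mean(reshape(monthlyAlt,  2, []), 1)';
disp([bimonthly backStep backAlt])
```

All three displayed columns are identical, even though the two monthly series differ in every month.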
Accepted Answer
Oleg Komarov
on 5 Aug 2012
Edited: Oleg Komarov
on 6 Aug 2012
I took a look at SAS and honestly I don't understand how they got those values!
My approach was to take intra-month averages (my attempt at interpreting SAS's method) and then interpolate them:
A = {
1 '13JAN1992' 55
2 '27JAN1992' 73
3 '19FEB1992' 84
4 '08MAR1992' 69
5 '27MAR1992' 66
6 '05APR1992' 77
7 '29APR1992' 63
8 '11MAY1992' 81
9 '25MAY1992' 89
10 '07JUN1992' 94
11 '23JUN1992' 105
12 '11JUL1992' 97
13 '15AUG1992' 112
14 '29AUG1992' 89
15 '10SEP1992' 77
16 '27SEP1992' 82}
% Convert dates to serial dates and store with data in a double matrix
data = [datenum(A(:,2),'ddmmmyyyy') cat(1,A{:,3})];
% Retrieve month year day
[yy mm dd] = datevec(data(:,1));
% Create aggregation subs for accumarray
subsr = repmat((yy-yy(1))*12 + mm-mm(1) + 1,2,1);
subsc = repmat(1:size(data,2),size(data,1),1);
% Take averages
avgData = accumarray([subsr subsc(:)], data(:),[],@nanmean);
% Interpolate
xi = datenum(1992,1:9,1);
intData = interp1(avgData(:,1),avgData(:,2),xi,'linear','extrap')
% Also, direct interpolation without averaging
intData2 = interp1(data(:,1),data(:,2),xi,'linear','extrap');
% Plot
plot(data(:,1),data(:,2),'-db',xi,intData,'--om',xi,intData2,'-.+r')
axis tight
grid on
set(gca,'Xtick',xi)
datetick('x','mmm yy','keepticks')
legend('your data','interpolation of averages','direct interpolation','location','NorthWest')
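Another way to read "interpolate, then take monthly averages" is to fit a smooth curve through the raw observations, evaluate it on every day, and average the daily values within each month. The sketch below does that with a cubic spline. This is only an assumption about the general idea, not a reconstruction of SAS's procedure, and it should not be expected to reproduce the SAS table. The variable names (dates, defects, daily, monthlyAvg) are illustrative.

```matlab
% Same sample as above: observation dates (serial date numbers) and defect counts
dates = datenum({'13JAN1992','27JAN1992','19FEB1992','08MAR1992', ...
                 '27MAR1992','05APR1992','29APR1992','11MAY1992', ...
                 '25MAY1992','07JUN1992','23JUN1992','11JUL1992', ...
                 '15AUG1992','29AUG1992','10SEP1992','27SEP1992'}', 'ddmmmyyyy');
dates   = dates(:);   % make sure we have a column vector
defects = [55 73 84 69 66 77 63 81 89 94 105 97 112 89 77 82]';

% Evaluate a cubic spline on every day inside the observed range (no extrapolation)
days  = (dates(1):dates(end))';
daily = interp1(dates, defects, days, 'spline');

% Group the daily values by calendar month and average them
[yy, mm] = datevec(days);
subs = (yy - yy(1))*12 + (mm - mm(1)) + 1;
monthlyAvg = accumarray(subs, daily, [], @mean);
disp(monthlyAvg)
```

Note that the first and last months are averaged only over the days between the first and last observation, since nothing is extrapolated here. With this approach every monthly value differs from the raw observations, but that says nothing about whether the result is more trustworthy.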
I feel a clarification is needed in response to salva's comments:
I don't know how many times I have already said this, but manipulating data is dodgy, and even more so the way SAS does it, which is not clear from the link.
If you're doing research in finance/economics and you manipulate your data because you need it at certain points in time (e.g. at the beginning of the month), the result is already artificial, but acceptable.
If you think SAS is fancy because it changes ALL the values, I assure you that is not where SAS's power lies.
MATLAB may lack some functions, but nobody stops you from writing your own and sharing them on the FEX.
MATLAB is not just a program but a programming language and it's not limited to statistics!
So yes, SAS could be better suited to statistical analysis because it has more built-in functions.
8 Comments
Oleg Komarov
on 6 Aug 2012
First of all, you have to quantify how much of the population you lose if you completely discard the irregular and the bi-monthly series. Then decide which series to keep.
Then I would suggest applying some selection rules, a very standard approach: filter out of the analysis those series that do not pass the rules, i.e. those with very irregular spacing in time. To decide on the rules, refer to literature that has already dealt with your type of analysis/data.
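As a rough illustration of such a selection rule, the sketch below keeps a series only if no gap between consecutive observations exceeds a threshold. The threshold maxGapDays and both example series are made up; the actual rule should come from the literature for your kind of data.

```matlab
% Hypothetical rule: keep a series only if no gap between consecutive
% observations exceeds maxGapDays (the threshold is an assumption)
maxGapDays = 45;

% Two illustrative series of serial dates (made-up observation times)
series = { datenum(1992,1,[1 15 32 60 91]), ...    % fairly regular spacing
           datenum(1992,1,[1 5 120 125 300]) };    % very irregular spacing

keep = false(size(series));
for k = 1:numel(series)
    gaps    = diff(series{k});           % spacing between observations, in days
    keep(k) = max(gaps) <= maxGapDays;   % apply the selection rule
end
disp(keep)
```

Here the first series passes (largest gap 31 days) and the second does not (largest gap 175 days).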
You can aggregate the monthly data to the bi-monthly frequency; that wouldn't affect your results as much as interpolation would.
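A minimal sketch of that aggregation, assuming a monthly series indexed by calendar month (the values are made up): months are grouped in calendar pairs and averaged with accumarray.

```matlab
% Illustrative monthly series (values are made up), January to June
monthNum = (1:6)';                     % calendar month of each observation
monthly  = [10; 14; 20; 22; 31; 29];

% Group index: 1 = Jan&Feb, 2 = Mar&Apr, 3 = May&Jun
grp = ceil(monthNum / 2);

% Average the months that fall into the same bimonthly period
bimonthly = accumarray(grp, monthly, [], @mean);
disp(bimonthly)
```

This uses only observed values, so no points are invented; the cost is that the monthly series lose half of their time resolution.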