Histogram Fit: Scaling and offset

I have one-dimensional data (~12500 entries) with values ranging from ~135 to ~1150, showing 3 peaks (see attachment).
Now I want to create a histogram showing the data distribution, as well as a fitting curve and a goodness of fit (chi squared) test.
So far I have the following:
load Data.mat
bins = round(sqrt(length(Data))); % Number of bins
[f, x] = hist(Data,bins); % Calculate histogram
pd = fitdist(x','Kernel'); % Calculate fit
y = pdf(pd,f); % Calculate pdf
figure(1)
dx = diff(x(1:2));
bar(x, f/sum(f*dx)); % Normalizing and plotting
hold on
plot(x,y,'Linewidth',2) % Plot fit
hold off
[h,p] = chi2gof(x,'CDF',pd,'Alpha',0.05); % Chi squared test
While my chi2gof test yields the expected result (h = 0, p = 0.9983), my plot doesn't look too good:
The scale of the fitting curve seems to be way off for all 3 peaks. Additionally, I'd expect the curve to get a lot closer to 0 for very low and very high values.
Thanks in advance for any suggestions on how to improve/fix my code!

2 Comments

Regarding the scaling problem, what is sum(y*dx)?
Regarding the nonzero tails of the estimated pdf, do they drop off when you compute the density over a wider range, e.g. -500 to 2000? If so, the problem may be that the kernel bandwidth is not optimal. By default, MATLAB picks the bandwidth that is optimal for estimating a normal density. Your data looks much more like a mixture of three different distributions, so the default bandwidth may be pretty far from optimal.
@Scaling problem
sum(y*dx) = 0.3353
I also attached the data file to the OP, in case it's needed :)
@ 0-tails
Thanks, I managed to achieve better tails by adjusting the bandwidth of the kernel!
pd = fitdist(x','Kernel','Width',75);
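For anyone who wants to try the two checks from these comments, here is a rough sketch. It assumes the Statistics and Machine Learning Toolbox and uses synthetic three-peak data as a stand-in for Data.mat (the peak positions and spreads are made up). The widths 5 and 75 are simply the values that appear in this thread, and the variable names are illustrative.

```matlab
% Synthetic stand-in for the real data (assumption: three roughly normal peaks)
rng(0);
Data = [250 + 30*randn(4000,1); ...
        600 + 40*randn(4500,1); ...
        950 + 35*randn(4000,1)];

% Evaluate on a wide grid so the tails of the estimate are included
xi = linspace(-500, 2000, 5001)';

pdDefault = fitdist(Data, 'Kernel');               % default bandwidth
pdNarrow  = fitdist(Data, 'Kernel', 'Width', 5);   % narrow bandwidth
pdWide    = fitdist(Data, 'Kernel', 'Width', 75);  % wide bandwidth

yDefault = pdf(pdDefault, xi);
yNarrow  = pdf(pdNarrow,  xi);
yWide    = pdf(pdWide,    xi);

% A density evaluated over its whole support should integrate to about 1
areaDefault = trapz(xi, yDefault);
areaNarrow  = trapz(xi, yNarrow);
areaWide    = trapz(xi, yWide);
fprintf('areas: default %.4f, width 5 %.4f, width 75 %.4f\n', ...
        areaDefault, areaNarrow, areaWide);

% Compare how the bandwidth changes peak height and tail behavior
plot(xi, [yDefault yNarrow yWide], 'LineWidth', 1.5)
legend('default width', 'Width = 5', 'Width = 75')
```

If the areas come out close to 1 here but sum(y*dx) does not in your own script, the curve is probably being evaluated at the wrong points rather than being scaled wrongly.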


 Accepted Answer

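I think there are a couple of problems: the distribution should be fit to the raw data rather than to the bin centers, and the pdf should be evaluated at the bin centers x rather than at the counts f. Try this: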
load Data.mat
bins = round(sqrt(length(Data))); % Number of bins
[f, x] = hist(Data,bins); % Calculate histogram
pd = fitdist(Data,'Kernel','Width',5); % Calculate fit
y = pdf(pd,x); % Calculate pdf of bin values
figure(1)
dx = diff(x(1:2));
bar(x, f/sum(f*dx)); % Normalizing and plotting
hold on
y = y / sum(y*dx);
plot(x,y,'Linewidth',2) % Plot fit
hold off
[h,p] = chi2gof(Data,'CDF',pd,'Alpha',0.05); % Chi squared test on the raw data, not the bin centers
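As a possible alternative to normalizing the counts by hand, histcounts (or histogram) can return density-scaled bars directly via 'Normalization','pdf'. The sketch below is only an illustration under assumptions: the data is synthetic (Data.mat is not reproduced here), the width of 5 is taken from the code above, and names such as fpdf and centers are made up for this example.

```matlab
% Synthetic stand-in for the real data (assumption: three roughly normal peaks)
rng(0);
Data = [250 + 30*randn(4000,1); ...
        600 + 40*randn(4500,1); ...
        950 + 35*randn(4000,1)];

bins = round(sqrt(length(Data)));            % same bin rule as above

% Bar heights are scaled so that the total bar area equals 1
[fpdf, edges] = histcounts(Data, bins, 'Normalization', 'pdf');
centers = edges(1:end-1) + diff(edges)/2;    % bin centers for plotting
area = sum(fpdf .* diff(edges));             % total area under the bars

pd = fitdist(Data, 'Kernel', 'Width', 5);    % fit to the raw data
y  = pdf(pd, centers);                       % density at the bin centers

bar(centers, fpdf, 1)                        % density-scaled histogram
hold on
plot(centers, y, 'LineWidth', 2)             % kernel estimate, same scale
hold off
```

With both the bars and the curve on a density scale, the curve should not need a separate rescaling step, as long as it is evaluated at x values and not at counts.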

1 Comment

Works perfectly! Thanks a lot!
I haven't worked much with fitdist and pdf yet, so I'm glad you were able to point out my mistakes :)
