Use Kernel Density Estimation to get the probability of a new observation

7 views (last 30 days)
Hi all, I would like to use KDE to fit a 1d variable and then getting the probability of a new observation given the fitted model using pdf( kde, new observation ). I used fitdist with the 'Kernel' option and then plotted the corresponding pdf. My question is, shouln't the pdf be normalized? How can I get the probability of a new observation then?
Thank you
x = [15.276,15.277,15.279,15.28,15.281,15.186,15.187,15.188,15.19,15.191,15.193,15.194,15.195,15.197,15.198,15.199,15.201,15.202,15.204,15.205,15.206,15.208,15.209,15.211,15.212,15.213,15.215,15.216,15.217,15.219,15.22,15.222,15.223,15.224,15.226,15.227,15.229,15.23,15.231,15.233,15.234,15.236,15.237,15.238,15.24,15.241,15.243,15.244,15.245,15.247,15.248,15.249,15.251,15.252,15.254,15.255,15.283,15.284,15.286,15.287,15.288,15.29,15.291,15.172,15.173,15.174,15.176,15.177,15.179,15.18,15.181,15.183,15.184,15.293,15.294,15.167,15.169,15.17,15.295,15.297,15.298,15.299,15.301,15.302,15.304,15.305,15.306,15.308,15.309,15.311,15.312,15.313,15.315,15.316,15.145,15.147,15.148,15.149,15.151,15.152,15.154,15.155,15.156,15.158,15.159,15.161,15.162,15.163,15.165,15.166,15.293,15.294,15.167,15.317,15.319,15.32,15.322,15.323,15.324,15.326,15.327,15.329,15.33,15.331,15.333,15.334,15.336,15.337,15.338,15.34,15.341,15.343,15.344,15.345,15.347,15.348,15.349,15.351,15.352,15.354,15.355,15.356,15.358,15.359,15.361,15.362,15.363,15.365,15.366,15.368]';
t = max( min(x) - 0.1, 0 ) : 0.01 : min( max(x) + 0.1, 24);
kde = fitdist(x,'Kernel','Kernel','normal','Support','positive')
pdf_kde = pdf( kde, t );
plot(t, pdf_kde, 'g-'), hold on;

Accepted Answer

Brendan Hamm
Brendan Hamm on 1 Apr 2015
The pdf integrates to be 1, so I am not sure why you think it needs to be normalized? Furthermore, this gives you a continuous distribution and therefore P(x|distribution) = 0 for all x, as points have no mass. the pdf is NOT telling you a probability, but rather just the probability density function evaluation at this point. Probability is the area under the curve between two values (the limits of the integral). You could ask, "what is the probability a new observation is less than or equal to 15.2?" and the answer would be found from the cdf:
cdf(kde,15.2) % P(x <= 15.2)
ans =
0.2727
or, "What is the probability the sample will be less than 15.4 but greater than 15.2?":
cdf(kde,15.4) - cdf(kde,15.2)% P(15.2 < x < 15.4) = P(x < 15.4) - P(x < 15.2);
ans =
0.7105
I hope this helps clear this up.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!