How to plot probability density curve?

I have modify the strip as per my data but its the resulst are not expected. Why the movemedian=25 is fixed here.
X = readmatrix('R_0.01.csv');
r_a=[X(1,:)];
r_b=[X(2,:)];
r_c=[X(3,:)];
r_d=[X(4:8,:)];
r_e=[X(9:12,:)];
r_f=[X(13:16,:)];
r_g=[X(17:26,:)];
r_h=[X(27:48,:)];
r_i=[X(49:120,:)];
r_j=[X(121:186,:)];
r_aam = (r_a(~isnan(r_a)));
r_abm = r_b(~isnan(r_b));
r_acm = r_c(~isnan(r_c));
r_adm = r_d(~isnan(r_d));
r_aem = r_e(~isnan(r_e));
r_afm = r_f(~isnan(r_f));
r_agm = r_g(~isnan(r_g));
r_ahm = r_h(~isnan(r_h));
r_aim = r_i(~isnan(r_i));
r_ajm= r_j(~isnan(r_j));
pd = makedist('Normal')
[f1,x1,flo1,fup1] = ecdf(r_aam);
[f2,x2,flo2,fup2] = ecdf(r_abm);
[f3,x3,flo3,fup3] = ecdf(r_acm);
[f4,x4,flo4,fup4] = ecdf(r_adm);
[f5,x5,flo5,fup5] = ecdf(r_aem);
[f6,x6,flo6,fup6] = ecdf(r_afm);
[f7,x7,flo7,fup7] = ecdf(r_agm);
[f8,x8,flo8,fup8] = ecdf(r_ahm);
[f9,x9,flo9,fup9] = ecdf(r_aim);
[f10,x10,flo10,fup10] = ecdf(r_ajm);
figure
plot(x, f)
grid
title('Empirical CDF')
dfdxs1 = smoothdata(gradient(f1)./gradient(x1), 'movmedian',25);
dfdxs2 = smoothdata(gradient(f2)./gradient(x2), 'movmedian',20);
dfdxs3 = smoothdata(gradient(f3)./gradient(x3), 'movmedian',25);
dfdxs4 = smoothdata(gradient(f4)./gradient(x4), 'movmedian',25);
dfdxs5 = smoothdata(gradient(f5)./gradient(x5), 'movmedian',25);
dfdxs6 = smoothdata(gradient(f6)./gradient(x6), 'movmedian',25);
dfdxs7 = smoothdata(gradient(f7)./gradient(x7), 'movmedian',25);
dfdxs8 = smoothdata(gradient(f8)./gradient(x8), 'movmedian',25);
dfdxs9 = smoothdata(gradient(f9)./gradient(x9), 'movmedian',25);
dfdxs10 = smoothdata(gradient(f10)./gradient(x10), 'movmedian',1000);
aaa1=smooth(dfdxs1)
aaa2=smooth(dfdxs2)
aaa3=smooth(dfdxs3)
aaa4=smooth(dfdxs4)
aaa5=smooth(dfdxs5)
aaa6=smooth(dfdxs6)
aaa7=smooth(dfdxs7)
aaa8=smooth(dfdxs8)
aaa9=smooth(dfdxs9)
aaa10=smooth(dfdxs10)
figure
plot(x1, aaa1)
plot(x2, aaa2)
plot(x3, aaa3)
plot(x4, aaa4)
plot(x5, aaa5)
plot(x6, aaa6)
plot(x7, aaa7)
plot(x8, aaa8)
plot(x9, aaa9)
plot(x10, aaa10)

 Accepted Answer

For data with an unknown distribution, I generally use the empirical cumulative distribution (ecdf) function to get the CDF, and the use the gradient function to derive the PDF. This is generally more robust than estimating the PDF directly, at least in my experience.
.

10 Comments

Exactly, I need to plot the emprical probability density but hav eno idea how to fix it for my data set
Is it the plot() function you're having trouble figuring out how to use?
Andi
Andi on 8 Dec 2021
Edited: Andi on 8 Dec 2021
May be your judgement about the question is not that much good, If you carefully look over the attached figure, it shows absolute values for emperical density distribution not the cummulative.
For a random vector, for example:
y = [5*randn(1,175)+15 2*randn(1,150)+45];
[f,x,flo,fup] = ecdf(y);
figure
plot(x, f)
grid
title('Empirical CDF')
dfdxs = smoothdata(gradient(f)./gradient(x), 'movmedian',25);
figure
plot(x, dfdxs)
grid
title('Empirical PDF')
ylim([0 0.3])
text([15 45], [0.1 0.15], compose('\\leftarrow Gaussian Peak \\mu = %.1f', [15 45]), 'Horiz','left', 'Vert','bottom', 'Rotation',90)
I created normal distributions with means at 15 and 45 to demonstrate this. Taking the derivative of the empirical cumulative distribution shows them distinctly. I used the smoothdata function on the derivative to filter out some of the noise (the derivative operation is a highpass filter that emphasizes noise) so that the peaks are more easily seen. Use whatever smoothing methods are appropriate. (Filtering the CDF first and then taking the derivative did not perform as well as filtering the derivative.)
So, the approach works!
.
May you look over the main question.
Please get rid of the numbered variables, put all of them in arrays (cell arrays here), and use loops to process them!
The original posted code is now —
X = readmatrix('https://www.mathworks.com/matlabcentral/answers/uploaded_files/826810/R_0.01.csv');
SzX = size(X)
SzX = 1×2
186 53
r{1}=[X(1,:)]; % First, Put The Previously Numbered Variables Into A Cell Array ...
r{2}=[X(2,:)];
r{3}=[X(3,:)];
r{4}=[X(4:8,:)];
r{5}=[X(9:12,:)];
r{6}=[X(13:16,:)];
r{7}=[X(17:26,:)];
r{8}=[X(27:48,:)];
r{9}=[X(49:120,:)];
r{10}=[X(121:186,:)];
for k = 1:numel(r) % ... Then, Remove The 'NaN' Elements ...
ra{k} = r{k}(~isnan(r{k}));
N{k} = numel(ra{k});
end
SegLen = [N{:}].' % ... Then, Determine The Segment Lengths ...
SegLen = 10×1
43 53 53 265 211 212 527 1073 3736 3423
pd = makedist('Normal')
pd =
NormalDistribution Normal distribution mu = 0 sigma = 1
for k = 1:numel(r) % ... Then, Calculate The 'ecdf' Variables ...
[f{k},x{k},flo{k},fhi{k}] = ecdf(ra{k});
end
figure
hold on
for k = 1:numel(r) % ... Then, Plot Them ...
plot(x{k}, f{k}, 'DisplayName',sprintf('r{%2d}',k))
end
hold off
grid
legend('Location','best')
title('Empirical CDF')
for k = 1:numel(r) % ... Then, Smooth Them ...
WinLen = max([5 N{k}/20]) % ... Adjusting Window Length To Vector Length ...
dfdxs{k} = smoothdata(gradient(f{k})./gradient(x{k}), 'movmedian',WinLen);
end
WinLen = 5
WinLen = 5
WinLen = 5
WinLen = 13.2500
WinLen = 10.5500
WinLen = 10.6000
WinLen = 26.3500
WinLen = 53.6500
WinLen = 186.8000
WinLen = 171.1500
for k = 1:numel(r) % ... Then, Smooth Them Again ...
aaa{k} = smooth(dfdxs{k});
end
for k = 1:numel(r) % ... Then, Find The Peaks & Return Related Data ...
[Pks{k},Locs{k},Wdh{k},Prm{k}] = findpeaks(aaa{k}, 'MinPeakProminence',0.25);
end
figure
hold on
for k = 1:numel(r) % ... Then, Plot The End Result!
plot(x{k}, aaa{k}, 'DisplayName',sprintf('aaa{%2d}',k))
end
hold off
grid
legend('Location','best')
figure
for k = 1:numel(r) % ... Then, Plot The End Result!
subplot(numel(r)/2, 2, k)
plot(x{k}, aaa{k})
hold on
plot(x{k}(Locs{k}), Pks{k}, '^r', 'MarkerFaceColor','r')
hold off
grid
ylim([0 max(ylim)])
title(sprintf('aaa{%2d}',k))
end
sgtitle('Individual PDF Plot & Peak Locations')
It may be necessary to smooth them a bit more.
There are clearly several peaks in many of those traces. Adjust the 'MinPeakProminence' value to give the desired result. The peaks can then be fitted with a Gaussian distribution using normpdf (as part of the objective function) and the appropriate nonlinear parameter estimation routine. Use the peak locations (here ‘x{k}(Locs{k})’) for the initial μ val;ue, and half the FWHM value (here ‘Wdh{k}’) for the initial σ value. It will probably require more experimentation to identify and fit all of them, however that should be possible for most of the records (1, 3, and 9 could be challenging).
.
Hi, I attempted multiple times but unable to smoothen the curve may you give some suggestions
With the smoothdata function, the best option might be 'sgolay', although it restricts itself to a second-order polynomial, so changing the window length is likely the only option. If you have the Signal Processing Toolbox, experiment with the sgolayfilt function. I generally use a third-order polynomial with varying frame lengths and just experiment until I am satisfied with the result.
.
Torsten
Torsten on 30 Mar 2022
Edited: Torsten on 30 Mar 2022
Use MATLAB's "ksdensity".
I try with the ksdensity but results are not expected.

Sign in to comment.

More Answers (0)

Products

Tags

Asked:

on 8 Dec 2021

Commented:

on 30 Mar 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!