pearsrnd and pearspdf are not coherent for the Pearson IV distribution?

We are having some trouble using the Pearson 4 implementation of the functions pearsrnd, pearspdf and pearscdf.
We think that pearsrnd does not generate samples from the distribution defined in pearspdf. In particular, we've found that pearsrnd over generates observations in the tails of the distribution.
In order to see this, we've implemented two simple tests:
1. We've compared the PDF of the Pearson 4 distribution (using pearspdf) to the PDF obtained from a simulation using pearsrnd, but we've found that the two are very different (even with a very long sample). This problem doesn't happen for other distributions. In particular, we've implemented the same test with the Pearson 6, and the results are satisfactory.
2. In addition, to test if the generatated sample is compatible with its CDF, we've applied the function pearscdf to the generated sample. We should have got a uniform distribution, because if X is a random variable and F its CDF, then F(X) is uniform in [0,1]. This is indeed what we've got with the Pearson 6, as expected, but not what we've got using the Pearson 4.
We've conducted a similar experiment using Mathematica, and we get the correct result.
We've attached the code we've used.
Any idea why we're getting this result?
Thank you very much.
%% Seed
rng(123)
%% Parameters
%Type IV
mu4 = 5;
sigma4 = 1;
skew4 = 1;
kurt4 = 10;
% Check the distribution type
[ign, type4] = pearsrnd(mu4, sigma4, skew4, kurt4, 1);
disp("This Pearson is of type " + type4)
This Pearson is of type 4
% Type VI
mu6 = 2;
sigma6 = 1;
skew6 = 2;
kurt6 = 10;
% Check the distribution type
[ign, type6] = pearsrnd(mu6, sigma6, skew6, kurt6, 1);
disp("This Pearson is of type " + type6)
This Pearson is of type 6
%% Simulate n observations according to both Pearson distributions defined above
n = 100000;
samples4 = pearsrnd(mu4, sigma4, skew4, kurt4, 1, n);
samples6 = pearsrnd(mu6, sigma6, skew6, kurt6, 1, n);
%% First test: we compare the simulated pdf (i.e. histogram of the sample) to the closed form PDF
pdf4 = @(x) pearspdf(x, mu4, sigma4, skew4, kurt4);
pdf6 = @(x) pearspdf(x, mu6, sigma6, skew6, kurt6);
x_ = linspace(-10, 10, 10000);
y4 = pdf4(x_);
y6 = pdf6(x_);
figure('Position', [100, 100, 1000, 400]); % [left, bottom, width, height]
subplot(1, 2, 1);
plot(x_, y4, Color='red', LineWidth=2)
title('Simulated and closed form PDF for Pearson 4')
hold on
histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold off
subplot(1, 2, 2);
plot(x_, y6, Color='red', LineWidth=2)
title('Simulated and closed form PDF for Pearson 6')
hold on
histogram(samples6, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold off
sgtitle('Test 1: Comparison between Pearson 4 and 6');
%% Second test: Let X be a random variable and F its CDF, then F(X) is uniform in [0,1].
% We apply the Pearson CDF to the simulated observations and plot its
% histogram. We should obtain something resembling a uniform distribution.
figure('Position', [100, 100, 1000, 400]); % [left, bottom, width, height]
subplot(1, 2, 1);
u4 = pearscdf(samples4, mu4, sigma4, skew4, kurt4);
histogram(u4)
title('F(X) for Pearson 4')
subplot(1, 2, 2);
u6 = pearscdf(samples6, mu6, sigma6, skew6, kurt6);
histogram(u6)
title('F(X) for Pearson 6')
sgtitle('Test 2: Comparison between Pearson 4 and 6');
%% Conclusion
%We get the expected results using the Pearson 6, but we cannot say the
%same for the Pearson 4. Why?

1 Comment

MathWorks has posted a bug report https://www.mathworks.com/support/bugreports/3219165
This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.

Sign in to comment.

 Accepted Answer

Hello Marco,
It appears that you are implicating the pearson rand function here, but I believe the real culprit is the pearson pdf.
I have the rand function but not the pdf since the latter does not appear until Matlab 2023b. I wrote my own pearson4 pdf function, code is below, and it agrees pretty well with the Matlab pearson rand function (that plot is not shown).
After obtaining a new_cdf by numerical integration, a histogram of new_cdf(pearson rand) for comparison to the ones you did is shown below. And it makes sense.
mu = 5;
sdev = 1;
skew = 1;
kurt = 10;
n = 100000;
samples4 = pearsrnd(mu,sdev,skew,kurt, 1,n);
figure(1); grid on
histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold on
x = -20:.01:20;
f = pears4pdf(x,mu,sdev,skew,kurt);
plot(x,f)
hold off
% --------
function f = pears4pdf(x,mu,sdev,skew,kurt)
% pearson type 4 pdf
m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);
b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));
J0 = J(m,0,b);
a = sdev*sqrt(J0/J(m,2,b));
z = (x-mu)/a;
z0 = -b/(2*(m-1));
arg = z+z0;
% f is normalized
f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));
end
% --------
function u = J(m,n,b)
% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz
% m > (n+1)/2
z0 = -b/(2*(m-1));
if m <= (n+1)/2
u = inf;
elseif n == -1
u = 0;
elseif n == 0
u = real((pi*gamma(2*m-1)) ...
/(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));
else
C = 2*(m-1)-(n-1);
u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;
end
end
% ---------
function w = gammalni(z)
% natural log of gamma function for input vector z with
% constant real part, for example z = 2 + i*(0:.01:6),
% by Stirling's approximation.
% see Whittaker and Watson, 4th edition, 12.33. note that
% error term for gamma(x+iy) is always less than that for gamma(x)
% David Goodmanson
%
% w = gammalni(z)
if any(diff(real(z))); error('Re(z) must be constant'); end
% these affect the precision of the calculation.
% higher values use more terms but in theory should be better.
zreal = 6; nterm = 5;
if real(z(1)) <(1/4) % use reflection property to keep x >0
end
reflect = 'true';
z = 1-z;
else
reflect = 'false';
end
m=0; % make Re(z) >= zreal so that |z| is large enough
while real(z(1)) < zreal
z = z+1;
m = m+1;
end
% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers
c(1) = 1/12; c(2) = -1/360; c(3) = 1/1260;
c(4) = -1/1680; c(5) = 1/1188; c(6) = -691/360360;
c(7) = 1/156; c(8) = -3617/122400; c(9) = 43867/244188;
c(10) = -174611/125400; c(11) = 77683/5796;
w = log(z).*(z-1/2) -z + log(sqrt(2*pi)); % log gamma for large z
zn = z;
for j=1:nterm;
zn = zn./(z.^2); % correction terms to log gamma are
w = w + c(j)*zn; % c(1) 1/z + c(2) 1/z^3 + ...
end
for n =1:m; % bring Re(z) back to its actual value.
z = z-1; % gamma(z-1) = gamma(z)/(z-1)
w = w -log(z);
end
% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument
% of sine get too large and generate infinity
if strcmp(reflect, 'true')
absy = abs(imag(z));
w = log(pi) -w + -pi*absy ...
-log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );
end
Histogram of new_cdf(pearsrnd). new_cdf is obtained by numerical integration of pears4pdf.

10 Comments

Hi David,
I tried to run your code to compare pears4pdf to pearspdf, but needed gammalni. I assume that's something you wrote. Is there an equivalent Matlab or toolbox function?
HI Paul,
I forgot about that function so I added it above. At one time I needed gamma(z) for a vector z of complex values, all of which have the same real part. The code here is uses a single scalar z which of course works as well.
I would not have had to write the code if Matlab had gamma(z) for complex z. Naturally to stay in business Mathworks has to adapt and address the work that people are doing now, neural networks for example. But It still baffles me why they are so casual about higher mathematical functions and symbolics. There are tons of examples on this site where people go to Wolfram Alpha (for free) to do a calculation that Matlab is incapable of. And the gamma function for complex argument is yet another example. Mathworks seems to have no shame about this.
On the brighter side, I see that you went past me on reputation points, and in view of your answers they are high quality points. (some people shotgun their answers, some people jump on really easy questions and some not very good answers get accepted). I'm getting older and slowing down, but you would have sped past anyway.
Hi David,
Thanks for posting your function and for your kind words. I know you've had longstanding displeasure with TMW implementation of special functions. Have you submitted an enhancement request?
Anyway, to the problem at hand, here's the comparison between your function and pearspdf
mu = 5;
sdev = 1;
skew = 1;
kurt = 10;
n = 100000;
samples4 = pearsrnd(mu,sdev,skew,kurt, 1,n);
figure(1); grid on
histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue','EdgeColor','none')
hold on
x = -20:.001:20;
fdg = pears4pdf(x,mu,sdev,skew,kurt);
plot(x,fdg)
ftmw = pearspdf(x,mu,sdev,skew,kurt);
plot(x,ftmw)
hold off
legend('histogram','DG','TMW')
format long e
[trapz(x,fdg) trapz(x,ftmw)]
ans = 1×2
9.999909458744830e-01 9.999367341876323e-01
% --------
function f = pears4pdf(x,mu,sdev,skew,kurt)
% pearson type 4 pdf
m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);
b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));
J0 = J(m,0,b);
a = sdev*sqrt(J0/J(m,2,b));
z = (x-mu)/a;
z0 = -b/(2*(m-1));
arg = z+z0;
% f is normalized
f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));
end
% --------
function u = J(m,n,b)
% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz
% m > (n+1)/2
z0 = -b/(2*(m-1));
if m <= (n+1)/2
u = inf;
elseif n == -1
u = 0;
elseif n == 0
u = real((pi*gamma(2*m-1)) ...
/(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));
else
C = 2*(m-1)-(n-1);
u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;
end
end
% ---------
function w = gammalni(z)
% natural log of gamma function for input vector z with
% constant real part, for example z = 2 + i*(0:.01:6),
% by Stirling's approximation.
% see Whittaker and Watson, 4th edition, 12.33. note that
% error term for gamma(x+iy) is always less than that for gamma(x)
% David Goodmanson
%
% w = gammalni(z)
if any(diff(real(z))); error('Re(z) must be constant'); end
zreal = 6; nterm = 5;
% use reflection property to keep x >0
if real(z(1)) <(1/4),
reflect = 'true';
z = 1-z;
else
reflect = 'false';
end
m=0;
while real(z(1)) < zreal % make Re(z) >= zreal so that |z| is large enough
z = z+1;
m = m+1;
end
% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers
c(1) = 1/12; c(2) = -1/360; c(3) = 1/1260;
c(4) = -1/1680; c(5) = 1/1188; c(6) = -691/360360;
c(7) = 1/156; c(8) = -3617/122400; c(9) = 43867/244188;
c(10) = -174611/125400; c(11) = 77683/5796;
w = log(z).*(z-1/2) -z + log(sqrt(2*pi)); %log gamma for large z
zn = z;
for j=1:nterm;
zn = zn./(z.^2); % correction terms to log gamma are
w = w + c(j)*zn; % c(1) 1/z + c(2) 1/z^3 + ...
end
for n =1:m; % bring Re(z) back to its actual value.
z = z-1; % gamma(z-1) = gamma(z)/(z-1)
w = w -log(z);
end
% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument
% of sine get too large and generate infinity
if strcmp(reflect, 'true')
absy = abs(imag(z));
w = log(pi) -w + -pi*absy ...
-log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );
end
end
HI Paul/Marco, Thanks Paul for the pdf comparison. Things seem pretty clear cut, and I submitted a bug report with a link to this question.
Hi Paul,
I went looking for possible reasons for this error. The simplest reason is that the pdf is a function of (x-mu)/a, and e.g. for the unit normal, 'a' is the standard deviation, sdev If in fact a ~= sdev then the pdf will be wrong, And that appears to be what's happening.
After the parameters m and b are calculated, the second moment comes out with some value or other, not related to the input sdev. So you have to make an adjustment, which in the code is
a = sdev*sqrt(J0/J(m,2,b)).
It looks like someone forgot to do that. In the code if you replace that line with a = sdev, you get what looks very much like the incorrect result. But I can't expand out your plot to really make sure. If you have time, could you expand the plot to see if that is the case, and maybe run another quick example? If you indeed get the 'correct incorrect' result I will submit an updated bug report and maybe a Mathworks fix would happen faster if they know what that fix is.
Hi David,
Looks like you got it. I added a flag to pears4pdf. If false, the code runs with the correct expression for a. If true, run with the incorrect expression for a.
Do you think this would also solve the CDF problem? I don't know how/if pearspdf is related to pearscdf.
mu = 5;
sdev = 1;
skew = 1;
kurt = 10;
figure(1); grid on
hold on
x = 0:.001:10;
fdgcorrect = pears4pdf(x,mu,sdev,skew,kurt,false);
fdgincorrect = pears4pdf(x,mu,sdev,skew,kurt,true);
plot(x,fdgcorrect)
plot(x,fdgincorrect,'o','MarkerIndices',1:100:numel(x))
ftmw = pearspdf(x,mu,sdev,skew,kurt);
plot(x,ftmw)
hold off
legend('DGcorrect','DGIncorrect','TMW')
% --------
function f = pears4pdf(x,mu,sdev,skew,kurt,flag)
% pearson type 4 pdf
m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);
b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));
J0 = J(m,0,b);
if flag
a = sdev;
else
a = sdev*sqrt(J0/J(m,2,b));
end
z = (x-mu)/a;
z0 = -b/(2*(m-1));
arg = z+z0;
% f is normalized
f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));
end
% --------
function u = J(m,n,b)
% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz
% m > (n+1)/2
z0 = -b/(2*(m-1));
if m <= (n+1)/2
u = inf;
elseif n == -1
u = 0;
elseif n == 0
u = real((pi*gamma(2*m-1)) ...
/(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));
else
C = 2*(m-1)-(n-1);
u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;
end
end
% ---------
function w = gammalni(z)
% natural log of gamma function for input vector z with
% constant real part, for example z = 2 + i*(0:.01:6),
% by Stirling's approximation.
% see Whittaker and Watson, 4th edition, 12.33. note that
% error term for gamma(x+iy) is always less than that for gamma(x)
% David Goodmanson
%
% w = gammalni(z)
if any(diff(real(z))); error('Re(z) must be constant'); end
zreal = 6; nterm = 5;
% use reflection property to keep x >0
if real(z(1)) <(1/4),
reflect = 'true';
z = 1-z;
else
reflect = 'false';
end
m=0;
while real(z(1)) < zreal % make Re(z) >= zreal so that |z| is large enough
z = z+1;
m = m+1;
end
% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers
c(1) = 1/12; c(2) = -1/360; c(3) = 1/1260;
c(4) = -1/1680; c(5) = 1/1188; c(6) = -691/360360;
c(7) = 1/156; c(8) = -3617/122400; c(9) = 43867/244188;
c(10) = -174611/125400; c(11) = 77683/5796;
w = log(z).*(z-1/2) -z + log(sqrt(2*pi)); %log gamma for large z
zn = z;
for j=1:nterm;
zn = zn./(z.^2); % correction terms to log gamma are
w = w + c(j)*zn; % c(1) 1/z + c(2) 1/z^3 + ...
end
for n =1:m; % bring Re(z) back to its actual value.
z = z-1; % gamma(z-1) = gamma(z)/(z-1)
w = w -log(z);
end
% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument
% of sine get too large and generate infinity
if strcmp(reflect, 'true')
absy = abs(imag(z));
w = log(pi) -w + -pi*absy ...
-log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );
end
end
Thanks, Paul. For the cdf, I just do numerical integration of the pdf since I doubt there could be an analytical expression for it (not counting some kind of infinite series). And that does solve the cdf problem as shown by the plot posted above in the answer as a comparison to Marco's discovery of the incorrect result. I modified the answer to try and make that conclusion more clear.
Thanks to Marco for finding this in the first place.
For the cdf integration I just found end points by experimentation that are good enough to represent +-inf. I'm working on code to determine the end points. Cubic spline is used for both the integration on a grid of points (with code I wrote, if Matlab has that function I'm not aware) and interpolation for the desired cdf points. But with a fine enough grid, trapz and interp1 will work as well and it's probably faster. It just seems kind of primitive.
Hi David,
pearspdf and pearscdf are both m-files, so the source code can be checked. You may be interested in checking out both to compare to your implementation.
pearscdf does numerical integration using integral, but intererestingly enough the pdf function that's integrated is defined locally inside pearscdf instead of defined with a call to pearspdf. I did verify (for this example) that calling integral on pearspdf gives a close result to pearscdf.
Hi David and Paul,
I'd like to thank both of you for taking the time to investigate the problem and come up with a solution.
I've been contacted by Mathworks and hopefully the bug will be fixed soon!
MathWorks has posted a bug report which includes a workaround for affected versions of 23b. See https://www.mathworks.com/support/bugreports/3219165
This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.

Sign in to comment.

More Answers (1)

Drew
Drew on 23 Feb 2024
Edited: Drew on 2 Jun 2024
MathWorks has posted a bug report https://www.mathworks.com/support/bugreports/3219165
This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.

Products

Release

R2023b

Asked:

on 23 Jan 2024

Edited:

on 7 Jun 2024

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!