pearsrnd and pearspdf are not coherent for the Pearson IV distribution?

Question

Marco on 23 Jan 2024

1
Link

Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/2073236-pearsrnd-and-pearspdf-are-not-coherent-for-the-pearson-iv-distribution

Edited: Drew on 7 Jun 2024

Accepted Answer: David Goodmanson

Test_Pearson.m

Open in MATLAB Online

We are having some trouble using the Pearson 4 implementation of the functions pearsrnd, pearspdf and pearscdf.

We think that pearsrnd does not generate samples from the distribution defined in pearspdf. In particular, we've found that pearsrnd over generates observations in the tails of the distribution.

In order to see this, we've implemented two simple tests:

1. We've compared the PDF of the Pearson 4 distribution (using pearspdf) to the PDF obtained from a simulation using pearsrnd, but we've found that the two are very different (even with a very long sample). This problem doesn't happen for other distributions. In particular, we've implemented the same test with the Pearson 6, and the results are satisfactory.

2. In addition, to test if the generatated sample is compatible with its CDF, we've applied the function pearscdf to the generated sample. We should have got a uniform distribution, because if X is a random variable and F its CDF, then F(X) is uniform in [0,1]. This is indeed what we've got with the Pearson 6, as expected, but not what we've got using the Pearson 4.

We've conducted a similar experiment using Mathematica, and we get the correct result.

We've attached the code we've used.

Any idea why we're getting this result?

Thank you very much.

%% Seed

rng(123)

%% Parameters

%Type IV

mu4 = 5;

sigma4 = 1;

skew4 = 1;

kurt4 = 10;

% Check the distribution type

[ign, type4] = pearsrnd(mu4, sigma4, skew4, kurt4, 1);

disp("This Pearson is of type " + type4)

This Pearson is of type 4

% Type VI

mu6 = 2;

sigma6 = 1;

skew6 = 2;

kurt6 = 10;

% Check the distribution type

[ign, type6] = pearsrnd(mu6, sigma6, skew6, kurt6, 1);

disp("This Pearson is of type " + type6)

This Pearson is of type 6

%% Simulate n observations according to both Pearson distributions defined above

n = 100000;

samples4 = pearsrnd(mu4, sigma4, skew4, kurt4, 1, n);

samples6 = pearsrnd(mu6, sigma6, skew6, kurt6, 1, n);

%% First test: we compare the simulated pdf (i.e. histogram of the sample) to the closed form PDF

pdf4 = @(x) pearspdf(x, mu4, sigma4, skew4, kurt4);

pdf6 = @(x) pearspdf(x, mu6, sigma6, skew6, kurt6);

x_ = linspace(-10, 10, 10000);

y4 = pdf4(x_);

y6 = pdf6(x_);

figure('Position', [100, 100, 1000, 400]); % [left, bottom, width, height]

subplot(1, 2, 1);

plot(x_, y4, Color='red', LineWidth=2)

title('Simulated and closed form PDF for Pearson 4')

hold on

histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')

hold off

subplot(1, 2, 2);

plot(x_, y6, Color='red', LineWidth=2)

title('Simulated and closed form PDF for Pearson 6')

hold on

histogram(samples6, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')

hold off

sgtitle('Test 1: Comparison between Pearson 4 and 6');

%% Second test: Let X be a random variable and F its CDF, then F(X) is uniform in [0,1].

% We apply the Pearson CDF to the simulated observations and plot its

% histogram. We should obtain something resembling a uniform distribution.

figure('Position', [100, 100, 1000, 400]); % [left, bottom, width, height]

subplot(1, 2, 1);

u4 = pearscdf(samples4, mu4, sigma4, skew4, kurt4);

histogram(u4)

title('F(X) for Pearson 4')

subplot(1, 2, 2);

u6 = pearscdf(samples6, mu6, sigma6, skew6, kurt6);

histogram(u6)

title('F(X) for Pearson 6')

sgtitle('Test 2: Comparison between Pearson 4 and 6');

%% Conclusion

%We get the expected results using the Pearson 6, but we cannot say the

%same for the Pearson 4. Why?

1 Comment
Show -1 older commentsHide -1 older comments

Drew on 22 Mar 2024

Edited: Drew on 2 Jun 2024

MathWorks has posted a bug report https://www.mathworks.com/support/bugreports/3219165

This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.

Sign in to comment.

Sign in to answer this question.

Answer 1

David Goodmanson on 15 Feb 2024

1
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/2073236-pearsrnd-and-pearspdf-are-not-coherent-for-the-pearson-iv-distribution#answer_1409113

Edited: David Goodmanson on 17 Feb 2024

Open in MATLAB Online

Hello Marco,

It appears that you are implicating the pearson rand function here, but I believe the real culprit is the pearson pdf.

I have the rand function but not the pdf since the latter does not appear until Matlab 2023b. I wrote my own pearson4 pdf function, code is below, and it agrees pretty well with the Matlab pearson rand function (that plot is not shown).

After obtaining a new_cdf by numerical integration, a histogram of new_cdf(pearson rand) for comparison to the ones you did is shown below. And it makes sense.

mu = 5;
sdev = 1;
skew = 1;
kurt = 10;
n = 100000;
samples4 = pearsrnd(mu,sdev,skew,kurt, 1,n);
figure(1); grid on
histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold on
x = -20:.01:20;
f = pears4pdf(x,mu,sdev,skew,kurt);
plot(x,f)
hold off
% --------
function f = pears4pdf(x,mu,sdev,skew,kurt)
% pearson type 4 pdf
m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);
b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));
J0 = J(m,0,b);
a = sdev*sqrt(J0/J(m,2,b));
z = (x-mu)/a;
z0 = -b/(2*(m-1));
arg = z+z0;
% f is normalized
f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));
end
% --------
function u = J(m,n,b)
% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz
% m > (n+1)/2
z0 = -b/(2*(m-1));
if m <= (n+1)/2
    u = inf;
elseif n == -1 
    u = 0;
elseif n == 0
    u = real((pi*gamma(2*m-1)) ...
        /(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));
else
    C = 2*(m-1)-(n-1);
    u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;  
end
end
% ---------
function w = gammalni(z)
% natural log of gamma function for input vector z with
% constant real part, for example z = 2 + i*(0:.01:6),
% by Stirling's approximation.
% see Whittaker and Watson, 4th edition, 12.33.  note that
% error term for gamma(x+iy) is always less than that for gamma(x)
% David Goodmanson
%
% w = gammalni(z)
if any(diff(real(z))); error('Re(z) must be constant'); end
  
% these affect the precision of the calculation.
% higher values use more terms but in theory should be better. 
zreal = 6;  nterm = 5;
if real(z(1)) <(1/4)       % use reflection property to keep x >0
end
  reflect = 'true';
  z = 1-z;
else
  reflect = 'false';
end
m=0;                       % make Re(z) >= zreal so that |z| is large enough
while real(z(1)) < zreal 
  z = z+1;
  m = m+1;
end
% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers
 c(1) = 1/12;             c(2) = -1/360;         c(3) = 1/1260;
 c(4) = -1/1680;          c(5) = 1/1188;         c(6) = -691/360360; 
 c(7) = 1/156;            c(8) = -3617/122400;   c(9) = 43867/244188;
c(10) = -174611/125400;  c(11) = 77683/5796;
w = log(z).*(z-1/2) -z + log(sqrt(2*pi));     % log gamma for large z
zn  = z;                           
for j=1:nterm;
  zn = zn./(z.^2);           %  correction terms to log gamma are    
  w = w + c(j)*zn;           %  c(1) 1/z + c(2) 1/z^3 + ...
end
for n =1:m;                  % bring Re(z) back to its actual value.
  z = z-1;                   % gamma(z-1) = gamma(z)/(z-1)
  w = w -log(z);
end
% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument
% of sine get too large and generate infinity
if strcmp(reflect, 'true')
  absy = abs(imag(z));
  w = log(pi) -w + -pi*absy ...
      -log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );
end

Histogram of new_cdf(pearsrnd). new_cdf is obtained by numerical integration of pears4pdf.

10 Comments
Show 8 older commentsHide 8 older comments

David Goodmanson on 16 Feb 2024

HI Paul,

I forgot about that function so I added it above. At one time I needed gamma(z) for a vector z of complex values, all of which have the same real part. The code here is uses a single scalar z which of course works as well.

I would not have had to write the code if Matlab had gamma(z) for complex z. Naturally to stay in business Mathworks has to adapt and address the work that people are doing now, neural networks for example. But It still baffles me why they are so casual about higher mathematical functions and symbolics. There are tons of examples on this site where people go to Wolfram Alpha (for free) to do a calculation that Matlab is incapable of. And the gamma function for complex argument is yet another example. Mathworks seems to have no shame about this.

On the brighter side, I see that you went past me on reputation points, and in view of your answers they are high quality points. (some people shotgun their answers, some people jump on really easy questions and some not very good answers get accepted). I'm getting older and slowing down, but you would have sped past anyway.

Paul on 16 Feb 2024

Edited: Paul on 17 Feb 2024

Open in MATLAB Online

Hi David,

Thanks for posting your function and for your kind words. I know you've had longstanding displeasure with TMW implementation of special functions. Have you submitted an enhancement request?

Anyway, to the problem at hand, here's the comparison between your function and pearspdf

mu = 5;

sdev = 1;

skew = 1;

kurt = 10;

n = 100000;

samples4 = pearsrnd(mu,sdev,skew,kurt, 1,n);

figure(1); grid on

histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue','EdgeColor','none')

hold on

x = -20:.001:20;

fdg = pears4pdf(x,mu,sdev,skew,kurt);

plot(x,fdg)

ftmw = pearspdf(x,mu,sdev,skew,kurt);

plot(x,ftmw)

hold off

legend('histogram','DG','TMW')

format long e

[trapz(x,fdg) trapz(x,ftmw)]

ans = 1×2

9.999909458744830e-01 9.999367341876323e-01

% --------

function f = pears4pdf(x,mu,sdev,skew,kurt)

% pearson type 4 pdf

m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);

b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));

J0 = J(m,0,b);

a = sdev*sqrt(J0/J(m,2,b));

z = (x-mu)/a;

z0 = -b/(2*(m-1));

arg = z+z0;

% f is normalized

f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));

end

% --------

function u = J(m,n,b)

% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz

% m > (n+1)/2

z0 = -b/(2*(m-1));

if m <= (n+1)/2

u = inf;

elseif n == -1

u = 0;

elseif n == 0

u = real((pi*gamma(2*m-1)) ...

/(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));

else

C = 2*(m-1)-(n-1);

u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;

end

% ---------

function w = gammalni(z)

% natural log of gamma function for input vector z with

% constant real part, for example z = 2 + i*(0:.01:6),

% by Stirling's approximation.

% see Whittaker and Watson, 4th edition, 12.33. note that

% error term for gamma(x+iy) is always less than that for gamma(x)

% David Goodmanson

%

% w = gammalni(z)

if any(diff(real(z))); error('Re(z) must be constant'); end

zreal = 6; nterm = 5;

% use reflection property to keep x >0

if real(z(1)) <(1/4),

reflect = 'true';

z = 1-z;

else

reflect = 'false';

end

m=0;

while real(z(1)) < zreal % make Re(z) >= zreal so that |z| is large enough

z = z+1;

m = m+1;

end

% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers

c(1) = 1/12; c(2) = -1/360; c(3) = 1/1260;

c(4) = -1/1680; c(5) = 1/1188; c(6) = -691/360360;

c(7) = 1/156; c(8) = -3617/122400; c(9) = 43867/244188;

c(10) = -174611/125400; c(11) = 77683/5796;

w = log(z).*(z-1/2) -z + log(sqrt(2*pi)); %log gamma for large z

zn = z;

for j=1:nterm;

zn = zn./(z.^2); % correction terms to log gamma are

w = w + c(j)*zn; % c(1) 1/z + c(2) 1/z^3 + ...

end

for n =1:m; % bring Re(z) back to its actual value.

z = z-1; % gamma(z-1) = gamma(z)/(z-1)

w = w -log(z);

end

% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument

% of sine get too large and generate infinity

if strcmp(reflect, 'true')

absy = abs(imag(z));

w = log(pi) -w + -pi*absy ...

-log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );

end

David Goodmanson on 17 Feb 2024

Open in MATLAB Online

Hi Paul,

I went looking for possible reasons for this error. The simplest reason is that the pdf is a function of (x-mu)/a, and e.g. for the unit normal, 'a' is the standard deviation, sdev If in fact a ~= sdev then the pdf will be wrong, And that appears to be what's happening.

After the parameters m and b are calculated, the second moment comes out with some value or other, not related to the input sdev. So you have to make an adjustment, which in the code is

a = sdev*sqrt(J0/J(m,2,b)).

It looks like someone forgot to do that. In the code if you replace that line with a = sdev, you get what looks very much like the incorrect result. But I can't expand out your plot to really make sure. If you have time, could you expand the plot to see if that is the case, and maybe run another quick example? If you indeed get the 'correct incorrect' result I will submit an updated bug report and maybe a Mathworks fix would happen faster if they know what that fix is.

Paul on 17 Feb 2024

Open in MATLAB Online

Hi David,

Looks like you got it. I added a flag to pears4pdf. If false, the code runs with the correct expression for a. If true, run with the incorrect expression for a.

Do you think this would also solve the CDF problem? I don't know how/if pearspdf is related to pearscdf.

mu = 5;

sdev = 1;

skew = 1;

kurt = 10;

figure(1); grid on

hold on

x = 0:.001:10;

fdgcorrect = pears4pdf(x,mu,sdev,skew,kurt,false);

fdgincorrect = pears4pdf(x,mu,sdev,skew,kurt,true);

plot(x,fdgcorrect)

plot(x,fdgincorrect,'o','MarkerIndices',1:100:numel(x))

ftmw = pearspdf(x,mu,sdev,skew,kurt);

plot(x,ftmw)

hold off

legend('DGcorrect','DGIncorrect','TMW')

% --------

function f = pears4pdf(x,mu,sdev,skew,kurt,flag)

% pearson type 4 pdf

m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);

b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));

J0 = J(m,0,b);

if flag

a = sdev;

else

a = sdev*sqrt(J0/J(m,2,b));

end

z = (x-mu)/a;

z0 = -b/(2*(m-1));

arg = z+z0;

% f is normalized

f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));

end

% --------

function u = J(m,n,b)

% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz

% m > (n+1)/2

z0 = -b/(2*(m-1));

if m <= (n+1)/2

u = inf;

elseif n == -1

u = 0;

elseif n == 0

u = real((pi*gamma(2*m-1)) ...

/(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));

else

C = 2*(m-1)-(n-1);

u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;

end

% ---------

function w = gammalni(z)

% natural log of gamma function for input vector z with

% constant real part, for example z = 2 + i*(0:.01:6),

% by Stirling's approximation.

% see Whittaker and Watson, 4th edition, 12.33. note that

% error term for gamma(x+iy) is always less than that for gamma(x)

% David Goodmanson

%

% w = gammalni(z)

if any(diff(real(z))); error('Re(z) must be constant'); end

zreal = 6; nterm = 5;

% use reflection property to keep x >0

if real(z(1)) <(1/4),

reflect = 'true';

z = 1-z;

else

reflect = 'false';

end

m=0;

while real(z(1)) < zreal % make Re(z) >= zreal so that |z| is large enough

z = z+1;

m = m+1;

end

% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers

c(1) = 1/12; c(2) = -1/360; c(3) = 1/1260;

c(4) = -1/1680; c(5) = 1/1188; c(6) = -691/360360;

c(7) = 1/156; c(8) = -3617/122400; c(9) = 43867/244188;

c(10) = -174611/125400; c(11) = 77683/5796;

w = log(z).*(z-1/2) -z + log(sqrt(2*pi)); %log gamma for large z

zn = z;

for j=1:nterm;

zn = zn./(z.^2); % correction terms to log gamma are

w = w + c(j)*zn; % c(1) 1/z + c(2) 1/z^3 + ...

end

for n =1:m; % bring Re(z) back to its actual value.

z = z-1; % gamma(z-1) = gamma(z)/(z-1)

w = w -log(z);

end

% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument

% of sine get too large and generate infinity

if strcmp(reflect, 'true')

absy = abs(imag(z));

w = log(pi) -w + -pi*absy ...

-log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );

end

Marco on 19 Feb 2024

Hi David and Paul,

I'd like to thank both of you for taking the time to investigate the problem and come up with a solution.

I've been contacted by Mathworks and hopefully the bug will be fixed soon!

Drew on 23 Feb 2024

Edited: Drew on 7 Jun 2024

MathWorks has posted a bug report which includes a workaround for affected versions of 23b. See https://www.mathworks.com/support/bugreports/3219165

This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.

Sign in to comment.

Answer 2

Drew on 23 Feb 2024

0
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/2073236-pearsrnd-and-pearspdf-are-not-coherent-for-the-pearson-iv-distribution#answer_1415443

Edited: Drew on 2 Jun 2024

MathWorks has posted a bug report https://www.mathworks.com/support/bugreports/3219165

This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

pearsrnd and pearspdf are not coherent for the Pearson IV distribution?

1 Comment
Show -1 older commentsHide -1 older comments

Accepted Answer

10 Comments
Show 8 older commentsHide 8 older comments

More Answers (1)

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

pearsrnd and pearspdf are not coherent for the Pearson IV distribution?

1 Comment Show -1 older commentsHide -1 older comments

Accepted Answer

10 Comments Show 8 older commentsHide 8 older comments

More Answers (1)

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

1 Comment
Show -1 older commentsHide -1 older comments

10 Comments
Show 8 older commentsHide 8 older comments

0 Comments
Show -2 older commentsHide -2 older comments