probplot

Probability plots

Syntax

probplot(y)

probplot(y,cens)

probplot(y,cens,freq)

probplot(dist,___)

probplot(ax,___)

probplot(ax,pd)

probplot(ax,fun,params)

probplot(___,'noref')

h = probplot(___)

Description

probplot(y) creates a normal probability plot comparing the distribution of the data in y to the normal distribution.

probplot plots each data point in y using marker symbols and draws a reference line that represents the theoretical distribution. If the sample data has a normal distribution, then the data points appear along the reference line. The reference line connects the first and third quartiles of the data and extends to the ends of the data. If the data comes from a distribution other than normal, the plotted points curve away from the reference line.

probplot(y,cens) creates a probability plot using the censoring data in cens.

probplot(y,cens,freq) creates a probability plot using the censoring data in cens and the frequency data in freq.

probplot(dist,___) creates a probability plot for the distribution specified by dist, using any of the input arguments in the previous syntaxes.

probplot(ax,___) adds a probability plot to the existing probability plot axes specified by ax, using any of the input arguments in the previous syntaxes.

probplot(ax,pd) adds a fitted line to the existing probability plot axes specified by ax to represent the probability distribution pd.

probplot(ax,fun,params) adds a fitted line to the existing probability plot axes specified by ax to represent the function fun with the parameters params.

probplot(___,'noref') omits the reference line from the plot.

h = probplot(___) returns graphics handles corresponding to the plotted lines.
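As a minimal sketch of the 'noref' option, the following plots an illustrative normal sample without the reference line. The sample x is made-up data used only for illustration.

```matlab
rng('default')              % For reproducibility
x = randn(50,1);            % Illustrative sample from a standard normal distribution
h = probplot(x,'noref');    % Plot the data points only, with no reference line
```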

Examples

Create Weibull Probability Plot

Generate sample data and create a probability plot.

Generate sample data. The sample x1 contains 500 random numbers from a Weibull distribution with scale parameter A = 3 and shape parameter B = 3. The sample x2 contains 500 random numbers from a Rayleigh distribution with scale parameter B = 3.

rng('default');  % For reproducibility
x1 = wblrnd(3,3,[500,1]);
x2 = raylrnd(3,[500,1]);

Create a probability plot to assess whether the data in x1 and x2 comes from a Weibull distribution.

figure
probplot('weibull',[x1 x2])
legend('Weibull Sample','Rayleigh Sample','Location','best')

[Figure: Probability Plot for Weibull Distribution (x-axis: Data, y-axis: Probability), showing the Weibull Sample and Rayleigh Sample data with their reference lines.]

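The probability plot shows that the data in x1 falls along its reference line, consistent with a Weibull distribution. The data in x2 also falls along a straight line, because a Rayleigh distribution with scale parameter B is a special case of the Weibull distribution, with scale parameter B√2 (about 4.24 for B = 3) and shape parameter 2. The two lines have different slopes because the two samples have different shape parameters (3 and 2).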

Alternatively, you can use wblplot to create a Weibull probability plot.

Add Fitted Line to Probability Plot

Create a probability plot and an additional fitted line on the same figure.

Generate sample data containing about 20% outliers in the tails. The left tail of the sample data contains 10 values randomly generated from an exponential distribution with parameter mu = 1. The right tail contains 10 values randomly generated from an exponential distribution with parameter mu = 5. The center of the sample data contains 80 values randomly generated from a standard normal distribution.

rng('default')  % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];

Create a probability plot to assess whether the sample data comes from a normal distribution.

probplot(data)

Plot a t location-scale curve on the same figure to compare with the data.

p = mle(data,'distribution','tLocationScale');
t = @(data,mu,sig,df)cdf('tLocationScale',data,mu,sig,df);
h = probplot(gca,t,p);
h.Color = 'r';
h.LineStyle = '-';
title('{\bf Probability Plot}')
legend('Normal','Data','t','Location','NW')

[Figure: Probability Plot (x-axis: Data, y-axis: Probability), showing the Data points, the Normal reference line, and the fitted t location-scale curve.]

The plot shows that neither the normal line nor the t location-scale curve fits the tails very well because of the outliers.

Identify Significant Effects with Half-Normal Probability Plot

Create a half-normal probability plot to identify significant effects in an experiment that studies factors that might influence the flow rate in a chemical manufacturing process. The four factors are the reactants A, B, C, and D. Each factor is tested at two levels (high and low concentration), and the experiment contains only one replicate of each combination of factor levels.

Load the sample data.

load flowrate

The first four columns of the table flowrate contain the coded design matrix for the factors A, B, C, and D, using 1 for the high factor level and -1 for the low factor level. The fifth column, rate, contains the measured flow rate.

Fit a linear regression model using rate as the response variable. Use predictor variables A, B, C, D, and all of their interaction terms.

mdl = fitlm(flowrate,'rate ~ A*B*C*D');

Calculate and store the absolute values of the factor effect estimates. To obtain the effect estimates, multiply the coefficient estimates from the fitted model by two. This step is necessary because a regression coefficient measures the change in the mean of y for a one-unit change in x, whereas an effect estimate measures the change for a two-unit change in x, from the low level (-1) to the high level (1) of the coded design matrix. Exclude the intercept (baseline) term, which is the first coefficient. Note that the order of the terms in mdl may differ from the order in the original design matrix.

effects = abs(mdl.Coefficients{2:end,1}*2);
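To see why the factor of two is needed, here is a small sketch with a single hypothetical two-level factor coded as -1 and 1. The responses are made-up values, not data from flowrate. The effect (mean response at the high level minus mean response at the low level) equals twice the fitted regression slope.

```matlab
% Hypothetical factor coded -1 (low) and 1 (high), with illustrative responses
x = [-1; -1; 1; 1];
y = [10; 12; 20; 22];

% Least-squares fit of y = b(1) + b(2)*x
b = [ones(4,1) x] \ y;

% Effect: mean response at the high level minus mean response at the low level
effect = mean(y(x == 1)) - mean(y(x == -1));

% The slope b(2) covers a one-unit change in x; the effect covers -1 to 1
disp([b(2) effect])
```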

Create a half-normal probability plot using the absolute value of the effects estimates, excluding the baseline.

figure
h = probplot('halfnormal',effects);

[Figure: Probability Plot for Half Normal Distribution (x-axis: Data, y-axis: Probability), showing the effect estimates and the reference line.]

Label the points and format the plot. First, return the index values that sort the effect estimates from lowest to highest. The points stored in the graphics handle (h(1).YData) are plotted in sorted order, so use these index values to put each plotted y-value back in the original order of effects.

[~,i] = sort(effects);   % Indices that sort effects in ascending order
prob(i) = h(1).YData;    % Reorder plotted y-values to match effects

Add a text label at each point. For each point, the x-value is the effect estimate and the y-value is the corresponding probability.

text(effects,prob,mdl.CoefficientNames(2:end),'FontSize',8,...
    'VerticalAlignment','top')
h(1).Color = 'r';

[Figure: Probability Plot for Half Normal Distribution (x-axis: Data, y-axis: Probability), with each point labeled by its coefficient name.]

The points located far from the reference line represent the significant effects.

Create a Normal Probability Plot Using Frequency Data

Generate simulated frequency data.

y = 1:10;
freq = [2 4 6 7 9 8 7 7 6 5];

Create a normal probability plot using the frequency data.

probplot(y,[],freq)

[Figure: Probability Plot for Normal Distribution (x-axis: Data, y-axis: Probability), showing the frequency data and the reference line.]

The normal probability plot shows that the data do not have a normal distribution.

Input Arguments

`y` — Sample data
numeric vector | numeric matrix

Sample data, specified as a numeric vector or numeric matrix. probplot displays each value in y using marker symbols including 'x' and 'o'. If y is a matrix, then probplot displays a separate line for each column of y.

Not all distributions are appropriate for all data sets. probplot returns an error if the data set is inappropriate for the specified distribution. See dist for the appropriate data range of each distribution.

`dist` — Distribution for probability plot
probability distribution object | `'normal'` | `'exponential'` | `'extreme value'` | `'half normal'` | `'lognormal'` | ...

Distribution for probability plot, specified as a probability distribution object or one of the following distribution names:

| Name | Plot Type | Data Range |
|---|---|---|
| `'normal'` | Normal probability plot | All values |
| `'exponential'` | Exponential probability plot | Nonnegative values |
| `'extreme value'` | Extreme value probability plot | All values |
| `'half normal'` | Half-normal probability plot | All values |
| `'lognormal'` | Lognormal probability plot | Positive values |
| `'logistic'` | Logistic probability plot | All values |
| `'loglogistic'` | Loglogistic probability plot | Positive values |
| `'rayleigh'` | Rayleigh probability plot | Positive values |
| `'weibull'` | Weibull probability plot | Positive values |

The default is 'normal' if you create a probability plot in a new figure. If you add a probability plot to a figure that already includes one by using the ax input argument, then the default is the plot type of the existing probability plot.

You can create a probability distribution object with specified parameter values using makedist. Alternatively, fit a probability distribution object to sample data using fitdist. For more information on probability distribution objects, see Working with Probability Distributions.

The y-axis scale is based on the selected distribution. The x-axis has a log scale for the Weibull, loglogistic, and lognormal distributions, and a linear scale for the others.

Example: 'weibull'
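To illustrate why a log-scaled x-axis suits the Weibull plot, the following sketch assumes the standard Weibull linearization, in which the probability axis uses the coordinate log(-log(1-p)). That transform is an assumption, not something stated on this page. Under it, exact Weibull quantiles fall on a straight line whose slope equals the shape parameter.

```matlab
a = 3; b = 3;                 % Weibull scale and shape parameters
p = ((1:9) - 0.5)/9;          % Plotting positions (i - 0.5)/N for N = 9
x = wblinv(p,a,b);            % Exact Weibull quantiles at those positions
u = log(x);                   % Log-scaled x-axis coordinate
v = log(-log(1 - p));         % Assumed Weibull probability-axis coordinate
c = polyfit(u,v,1);           % Straight-line fit; c(1) is the slope
```

Because log(x) = log(a) + log(-log(1-p))/b, the slope c(1) recovers the shape parameter b.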

`cens` — Censoring data
numeric vector

Censoring data, specified as a numeric vector. cens must have the same length as y and must contain 1 for each observation that is right-censored and 0 for each observation that is measured exactly.

Data Types: single | double
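For example, a life test that stops at a fixed time produces right-censored data. The following sketch simulates lifetimes, treats units still running at a hypothetical stopping time T as censored, and plots the result. The variables t and T are illustrative names, not part of the probplot interface.

```matlab
rng('default')                % For reproducibility
t = wblrnd(3,3,[100,1]);      % Simulated lifetimes (illustrative)
T = 4;                        % Hypothetical end of the life test
cens = double(t > T);         % 1 = still running at time T (right-censored)
t(cens == 1) = T;             % Record censored units at the stopping time
probplot('weibull',t,cens)
```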

`freq` — Frequency data
vector of integer values

Frequency data, specified as a vector of integer values. freq must have the same length as y, and each element of freq is the frequency of the corresponding element in y.

To create a probability plot using frequency data but not censoring data, specify empty brackets ([]) for cens.

Data Types: single | double
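If the raw observations contain repeated values, you can build y and freq yourself. This sketch uses unique and accumarray on illustrative data; raw is a made-up vector.

```matlab
raw = [3 1 2 3 3 1 4 2 3 4]';   % Illustrative data with repeated values
[y,~,idx] = unique(raw);        % Distinct values (sorted) and group indices
freq = accumarray(idx,1);       % Number of occurrences of each distinct value
probplot(y,[],freq)             % No censoring, so cens is []
```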

`ax` — Target axes
`Axes` object | `UIAxes` object

Target axes, specified as an Axes object or a UIAxes object. probplot adds the probability plot to the axes specified by ax. For details, see Axes Properties and UIAxes Properties.

Use gca to return the current axes for the current figure.
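As a sketch, the following adds a second illustrative sample to an existing Weibull probability plot. Because ax already contains a Weibull plot, the second call omits dist and uses the existing plot type by default. The samples x1 and x2 are made-up data.

```matlab
rng('default')                % For reproducibility
x1 = wblrnd(3,3,[100,1]);     % Illustrative Weibull samples
x2 = wblrnd(2,3,[100,1]);
probplot('weibull',x1)        % Create Weibull probability plot axes
ax = gca;                     % Current axes of the current figure
probplot(ax,x2)               % Add x2 using the existing Weibull plot type
```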

`pd` — Probability distribution for reference line
probability distribution object

Probability distribution for reference line, specified as a probability distribution object. probplot adds a fitted line to the axes specified by ax to represent the probability distribution specified by pd.

Create a probability distribution object with specified parameter values using makedist. Alternatively, fit a probability distribution object to sample data using fitdist. For more information on probability distribution objects, see Working with Probability Distributions.
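For example, this sketch fits a Weibull distribution object to an illustrative sample with fitdist and adds a line for the fitted distribution to the existing Weibull probability plot.

```matlab
rng('default')                % For reproducibility
x = wblrnd(3,3,[100,1]);      % Illustrative Weibull sample
probplot('weibull',x)         % Data and default reference line
pd = fitdist(x,'Weibull');    % Fitted Weibull distribution object
h = probplot(gca,pd);         % Add a line representing pd
```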

`fun` — Function for reference line
function handle

Function for reference line, specified as a function handle. probplot adds a fitted line to the axes specified by ax to represent the function specified by fun, evaluated at the parameters specified by params.

fun is a function handle to a cdf function, specified using the function handle operator @. The function must accept a vector of input values as its first argument, and return a vector containing the cdf evaluated at each input value. Specify the parameter values required to evaluate fun using the params argument. For more information on function handles, see Create Function Handle.

Example: @wblcdf

Data Types: function_handle
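As a sketch, the following passes the built-in Weibull cdf as fun and the maximum likelihood estimates from wblfit as params. The sample x is illustrative. As in the t location-scale example, each element of the params vector is supplied to fun as a separate argument after the data.

```matlab
rng('default')                     % For reproducibility
x = wblrnd(3,3,[100,1]);           % Illustrative Weibull sample
probplot('weibull',x)
params = wblfit(x);                % [scale shape] maximum likelihood estimates
h = probplot(gca,@wblcdf,params);  % Line for wblcdf(x,params(1),params(2))
```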

`params` — Reference line function parameters
vector of numeric values | cell array

Reference line function parameters, specified as a vector of numeric values or a cell array. probplot adds a fitted line to the axes specified by ax to represent the function specified by fun, evaluated at the parameters specified by params.

Specify params as a numeric vector or a cell array of the parameter values that fun requires after its first input argument. For example, in Add Fitted Line to Probability Plot, the function handle t accepts the parameters mu, sig, and df after the data, and the example passes their estimates as the numeric vector p returned by mle.

Output Arguments

`h` — Graphic handles for line objects
vector of `Line` graphic handles

Graphics handles for the plotted lines, returned as a vector of Line graphics handles. Graphics handles are unique identifiers that you can use to query and modify the properties of a specific line on the plot. For each column of y, probplot returns two handles:

The line representing the data points. probplot represents each data point in y using marker symbols such as '+' and 'o'.
The line showing the theoretical distribution for the probability plot, represented as a dashed line.

To view and set properties of line objects, use dot notation. For information on using dot notation, see Access Property Values. For information on the Line properties that you can set, see Line Properties.
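For example, with a single data column, h contains two handles: the data points first (as in the half-normal example, which reads h(1).YData) and then the reference line. This sketch uses an illustrative sample and sets the same Color and LineStyle properties that the examples on this page use.

```matlab
rng('default')            % For reproducibility
x = randn(100,1);         % Illustrative single-column sample
h = probplot(x);          % Two handles: data points and reference line
h(1).Color = 'r';         % Color the data markers red
h(2).LineStyle = '-';     % Draw the reference line solid instead of dashed
```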

Algorithms

probplot matches the quantiles of sample data to the quantiles of a given probability distribution. The sample data is sorted, scaled according to the choice of dist, and plotted on the x-axis. When dist is 'lognormal', 'loglogistic', or 'weibull', the scaling is logarithmic. Otherwise, the scaling is linear. The y-axis represents the quantiles of the distribution specified in dist, converted into probability values. The scaling depends on the given distribution and is not linear.

For the ith sorted value in a sample of size N (plotted on the x-axis), the y-axis value is the midpoint between consecutive steps of the empirical cumulative distribution function of the data. For uncensored data, this midpoint is $\frac{i - 0.5}{N}$.
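A minimal sketch of these plotting positions for an illustrative uncensored sample follows. The norminv step is an assumption about how the normal probability axis is scaled: it converts each probability to a standard normal quantile, which gives a manual normal probability plot with a quantile-labeled y-axis rather than probplot's probability labels.

```matlab
x = [4.2 1.7 3.1 2.8 5.0];   % Illustrative uncensored sample
N = numel(x);
xs = sort(x);                % x-axis values: sorted data
i = 1:N;
p = (i - 0.5)/N;             % Midpoint plotting positions
z = norminv(p);              % Standard normal quantiles of the positions
plot(xs,z,'+')               % Manual normal probability plot
```

For N = 5, the plotting positions are 0.1, 0.3, 0.5, 0.7, and 0.9.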

probplot superimposes a reference line to assess the linearity of the plot. If the data is uncensored, then the line goes through the first and third quartiles of the data. If the data is censored, then the line shifts accordingly. If the data is uncensored and dist is 'half normal', then probplot uses the zeroth and second quartiles instead.
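For uncensored data on a normal plot, the reference line can be sketched as the line through the data quartiles paired with the matching standard normal quantiles, extended to the ends of the data. This sketch uses the quantile function's default method, which may not match probplot's internal quartile computation exactly; the sample x is illustrative.

```matlab
x = [2 4 6 8 10 12 14 16]';      % Illustrative uncensored sample
q = quantile(x,[0.25 0.75]);     % First and third quartiles of the data
z = norminv([0.25 0.75]);        % Matching standard normal quantiles
slope = diff(z)/diff(q);         % Line through (q(1),z(1)) and (q(2),z(2))
intercept = z(1) - slope*q(1);
xr = [min(x) max(x)];            % Extend the line to the ends of the data
yr = intercept + slope*xr;
```

Because z(1) = -z(2), the line crosses the 0.5 probability level midway between the two quartiles.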

Version History

Introduced before R2006a