
boxplot

Syntax

boxplot(X)
boxplot(X,G)
boxplot(axes,X,...)
boxplot(...,'Name',value)

Description

boxplot(X) produces a box plot of the data in X. If X is a matrix, there is one box per column; if X is a vector, there is just one box. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. To control how far the whiskers extend, see the 'whisker' name-value pair argument.
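The following is a minimal sketch that uses hypothetical random data. It passes a three-column matrix, so boxplot should draw three boxes. The sketch reads the box count back through the 'Box' tag that is described later on this page.

```matlab
% Hypothetical sample data: 50 observations in each of 3 columns.
rng('default')                       % For reproducibility
X = [randn(50,1), 2 + randn(50,1), 0.5*randn(50,1)];

figure
boxplot(X)                           % One box per column of X

% Each box is a graphics object tagged 'Box'.
hBox = findobj(gca, 'Tag', 'Box');
fprintf('Number of boxes: %d\n', numel(hBox))
```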

boxplot(X,G) specifies one or more grouping variables G, producing a separate box for each set of X values sharing the same G value or values. Grouping variables must have one row per element of X, or one row per column of X. Specify a single grouping variable in G using a vector, a character array, a cell array of strings, or a vector categorical array; specify multiple grouping variables in G using a cell array of these variable types, such as {G1 G2 G3}, or by using a matrix. If multiple grouping variables are used, they must all be the same length. Groups that contain a NaN value or an empty string in a grouping variable are omitted, and are not counted in the number of groups considered by other parameters.

By default, character and string grouping variables are sorted in the order they initially appear in the data, categorical grouping variables are sorted by the order of their levels, and numeric grouping variables are sorted in numeric order. To control the order of groups, do one of the following:

  • Use categorical variables in G and specify the order of their levels.

  • Use the 'grouporder' parameter described below.

  • Pre-sort your data.
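The following sketch illustrates the first two options with hypothetical data. The names x, g, and gc are for illustration only. All three panels should show the same boxes, and only their left-to-right order should differ.

```matlab
% Hypothetical data: six observations in three groups.
x = [4; 7; 5; 8; 2; 3];
g = {'b'; 'a'; 'b'; 'a'; 'c'; 'c'};

figure
% Default: boxes follow the order of first appearance in g (b, a, c).
subplot(1,3,1)
boxplot(x, g)

% Option 1: reorder the boxes with the 'grouporder' parameter.
subplot(1,3,2)
boxplot(x, g, 'grouporder', {'c','a','b'})

% Option 2: use a categorical grouping variable whose levels are c, a, b.
gc = categorical(g, {'c','a','b'});
subplot(1,3,3)
boxplot(x, gc)
```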

boxplot(axes,X,...) creates the plot in the axes with handle axes.
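The following is a minimal sketch of drawing into a specific axes. It uses two subplots and hypothetical random data, and only the right-hand axes should receive the box plot.

```matlab
rng('default')
X = randn(40,2);                     % Hypothetical data: two columns

figure
ax1 = subplot(1,2,1);                % Left axes, left empty
ax2 = subplot(1,2,2);                % Right axes, target of the plot
boxplot(ax2, X)
title(ax2, 'Box plot in ax2')
```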

boxplot(...,'Name',value) specifies one or more optional parameter name/value pairs, as described in the following table. Specify Name in single quotes.

Name        Value
'plotstyle'
  • 'traditional' — Traditional box style. This is the default.

  • 'compact' — Box style designed for plots with many groups. This style changes the defaults for some other parameters, as listed in the table of 'compact' defaults that follows this parameter table.

'boxstyle'
  • 'outline' — Draws an unfilled box with dashed whiskers. This is the default.

  • 'filled' — Draws a narrow filled box with lines for whiskers.

'colorgroup'

One or more grouping variables, of the same type as permitted for G, specifying that the box color should change when the specified variables change. The default is [] for no box color change.

'colors'

Colors for boxes, specified as a single color (such as 'r' or [1 0 0]) or multiple colors (such as 'rgbm' or a three-column matrix of RGB values). The sequence is replicated or truncated as required, so for example 'rb' gives boxes that alternate in color. The default when no 'colorgroup' is specified is to use the same color scheme for all boxes. The default when 'colorgroup' is specified is a modified hsv colormap.

'datalim'

A two-element vector containing lower and upper limits, used by 'extrememode' to determine which points are extreme. The default is [-Inf Inf].

'extrememode'
  • 'clip' — Moves data outside the datalim limits to the limit. This is the default.

  • 'compress' — Evenly distributes data outside the datalim limits in a region just outside the limit, retaining the relative order of the points.

A dotted line marks the limit if any points are outside it, and two gray lines mark the compression region if any points are compressed. Values at +/–Inf can be clipped or compressed, but NaN values still do not appear on the plot. Box notches are drawn to scale and may extend beyond the bounds if the median is inside the limit; they are not drawn if the median is outside the limits.

'factordirection'
  • 'data' — Arranges factors with the first value next to the origin. This is the default.

  • 'list' — Arranges factors left-to-right if on the x axis or top-to-bottom if on the y axis.

  • 'auto' — Uses 'data' for numeric grouping variables and 'list' for strings.

'fullfactors'
  • 'off' — One group for each unique row of G. This is the default.

  • 'on' — Create a group for each possible combination of group variable values, including combinations that do not appear in the data.

'factorseparator'

Specifies which factors should have their values separated by a grid line. The value may be 'auto' or a vector of grouping variable numbers. For example, [1 2] adds a separator line when the first or second grouping variable changes value. 'auto' is [] for one grouping variable and [1] for two or more grouping variables. The default is [].

'factorgap'

Specifies an extra gap to leave between boxes when the corresponding grouping factor changes value, expressed as a percentage of the width of the plot. For example, with [3 1], the gap is 3% of the width of the plot between groups with different values of the first grouping variable, and 1% between groups with the same value of the first grouping variable but different values for the second. 'auto' specifies that boxplot should choose a gap automatically. The default is [].

'grouporder'

Order of groups for plotting, specified as a cell array of strings. With multiple grouping variables, separate values within each string with a comma. Using categorical arrays as grouping variables is an easier way to control the order of the boxes. The default is [], which does not reorder the boxes.

'jitter'

Maximum distance d to displace outliers along the factor axis by a uniform random amount, in order to make duplicate points visible. A d of 1 makes the jitter regions just touch between the closest adjacent groups. The default is 0.

'labels'

A character array, cell array of strings, or numeric vector of box labels. There may be one label per group or one label per X value. Multiple label variables may be specified via a numeric matrix or a cell array containing any of these types.

    Tip: To remove labels from a plot, use the following command:

    set(gca,'XTickLabel',{' '})

'labelorientation'
  • 'inline' — Rotates the labels to be vertical. This is the default when plotstyle is 'compact'.

  • 'horizontal' — Leaves the labels horizontal. This is the default when plotstyle has the default value of 'traditional'.

When the labels are on the y axis, both settings leave the labels horizontal.

'labelverbosity'
  • 'all' — Displays every label. This is the default.

  • 'minor' — Displays a label for a factor only when that factor has a different value from the previous group.

  • 'majorminor' — Displays a label for a factor when that factor or any factor major to it has a different value from the previous group.

'medianstyle'
  • 'line' — Draws a line for the median. This is the default.

  • 'target' — Draws a black dot inside a white circle for the median.

'notch'
  • 'on' — Draws comparison intervals using notches when plotstyle is 'traditional', or triangular markers when plotstyle is 'compact'.

  • 'marker' — Draws comparison intervals using triangular markers.

  • 'off' — Omits notches. This is the default.

Two medians are significantly different at the 5% significance level if their intervals do not overlap. Interval endpoints are the extremes of the notches or the centers of the triangular markers. The extremes correspond to q2 - 1.57(q3 - q1)/sqrt(n) and q2 + 1.57(q3 - q1)/sqrt(n), where q2 is the median (50th percentile), q1 and q3 are the 25th and 75th percentiles, respectively, and n is the number of observations without any NaN values. When the sample size is small, notches may extend beyond the end of the box.

'orientation'
  • 'vertical' — Plots X on the y axis. This is the default.

  • 'horizontal' — Plots X on the x axis.

'outliersize'

Size of the marker used for outliers, in points. The default is 6 (6/72 inch).

'positions'

Box positions specified as a numeric vector with one entry per group or X value. The default is 1:numGroups, where numGroups is the number of groups.

'symbol'

Symbol and color to use for outliers, using the same values as the LineSpec parameter in plot. The default is 'r+'. If the symbol is omitted, the outliers are invisible; if the color is omitted, the outliers have the same color as their corresponding box.

'whisker'

Maximum whisker length w. The default is a w of 1.5. Points are drawn as outliers if they are larger than q3 + w(q3 - q1) or smaller than q1 - w(q3 - q1), where q1 and q3 are the 25th and 75th percentiles, respectively. The default of 1.5 corresponds to approximately +/–2.7σ and 99.3% coverage if the data are normally distributed. The plotted whisker extends to the adjacent value, which is the most extreme data value that is not an outlier. Set whisker to 0 to give no whiskers and to make every point outside of q1 and q3 an outlier.

'widths'

A scalar or vector of box widths for when boxstyle is 'outline'. The default is half of the minimum separation between boxes, which is 0.5 when the positions argument takes its default value. The list of values is replicated or truncated as necessary.
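The grouping-related parameters above can be combined as in the following sketch. It uses hypothetical data with two grouping variables: g1 has two levels and g2 has three, which gives six boxes. The gap, separator, and color settings are arbitrary illustrations, not recommended values.

```matlab
% Hypothetical data: 120 observations and two grouping variables.
rng('default')
y  = randn(120,1);
g1 = repmat({'Low'; 'High'}, 60, 1);                 % First grouping variable
g2 = repmat({'A'; 'A'; 'B'; 'B'; 'C'; 'C'}, 20, 1);  % Second grouping variable

figure
boxplot(y, {g1, g2}, ...
    'factorgap', [5 1], ...        % 5% gap when g1 changes, 1% when only g2 changes
    'factorseparator', 1, ...      % Grid line wherever g1 changes value
    'colorgroup', g2, ...          % Box color changes with the value of g2
    'labelverbosity', 'minor')     % Label a factor only when its value changes
```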

When the plotstyle parameter takes the value 'compact', the following default values for other parameters apply.

Parameter              Default when plotstyle is 'compact'
'boxstyle'             'filled'
'factorseparator'      'auto'
'factorgap'            'auto'
'jitter'               0.5
'labelorientation'     'inline'
'labelverbosity'       'majorminor'
'medianstyle'          'target'
'outliersize'          4
'symbol'               'o'
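The following sketch combines several of the outlier and layout parameters from the parameter table. It uses hypothetical data that contains a few repeated and extreme values, and the limits and sizes are arbitrary choices. The top panel clips points beyond 'datalim' and jitters duplicate outliers. The bottom panel draws the same data horizontally and compresses the extreme points instead.

```matlab
% Hypothetical data: two columns plus repeated (4, 4) and extreme (12, -9) rows.
rng('default')
X = [randn(100,2); 4 4; 4 4; 12 -9];

figure
% Top: clip points outside [-5 5], enlarge and jitter the outlier markers,
% and place the two boxes at custom positions with custom widths.
subplot(2,1,1)
boxplot(X, 'datalim', [-5 5], 'extrememode', 'clip', ...
    'symbol', 'ko', 'outliersize', 8, 'jitter', 0.3, ...
    'positions', [1 3], 'widths', 0.8)

% Bottom: horizontal boxes, compressing points outside [-5 5].
subplot(2,1,2)
boxplot(X, 'orientation', 'horizontal', 'datalim', [-5 5], ...
    'extrememode', 'compress', 'labels', {'Sample 1', 'Sample 2'})
```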

You can see data values and group names using the data cursor in the figure window. The cursor shows the original values of any points affected by the datalim parameter. You can label the group to which an outlier belongs using the gname function.

To modify graphics properties of a box plot component, use findobj with the Tag property to find the component's handle. Tag values for box plot components depend on parameter settings, and are listed in the table below.

Parameter Settings        Tag Values

All settings

  • 'Box'

  • 'Outliers'

When 'plotstyle' is 'traditional'

  • 'Median'

  • 'Upper Whisker'

  • 'Lower Whisker'

  • 'Upper Adjacent Value'

  • 'Lower Adjacent Value'

When 'plotstyle' is 'compact'

  • 'Whisker'

  • 'MedianOuter'

  • 'MedianInner'

When 'notch' is 'marker'

  • 'NotchLo'

  • 'NotchHi'
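The following sketch shows the findobj approach. It uses hypothetical data and the default 'traditional' plot style, then restyles the boxes and medians and hides the outlier markers. The colors and line width are arbitrary choices.

```matlab
rng('default')
X = randn(100,3);                    % Hypothetical data

figure
boxplot(X)                           % Default 'traditional' plot style

% Make the boxes black and thicker.
hBox = findobj(gca, 'Tag', 'Box');
set(hBox, 'Color', 'k', 'LineWidth', 1.5)

% Recolor the median lines ('Median' exists in the traditional style).
hMed = findobj(gca, 'Tag', 'Median');
set(hMed, 'Color', [0.85 0.33 0.10])

% Hide the outlier markers ('Outliers' exists for all settings).
set(findobj(gca, 'Tag', 'Outliers'), 'Visible', 'off')
```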

Examples


Create Box Plots for Grouped Data

Load the sample data.

load carsmall

Create a box plot of the miles per gallon (MPG) measurements from the sample data, grouped by the vehicles' country of origin, Origin. Add a title and label the axes.

boxplot(MPG,Origin)
title('Miles per Gallon by Vehicle Origin')
xlabel('Country of Origin')
ylabel('Miles per Gallon (MPG)')

Create Notched Box Plots

Generate two sets of sample data. The first sample, x1, contains random numbers generated from a normal distribution with mu = 5 and sigma = 1. The second sample, x2, contains random numbers generated from a normal distribution with mu = 6 and sigma = 1.

rng default;  % For reproducibility
x1 = normrnd(5,1,100,1);
x2 = normrnd(6,1,100,1);

Create notched box plots of x1 and x2. Label each box with its corresponding mu value.

figure;
boxplot([x1,x2],'notch','on','labels',{'mu = 5','mu = 6'})

The difference between the medians of the two groups is approximately 1. Since the notches in the box plot do not overlap, you can conclude, with 95% confidence, that the true medians do differ.
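To check the overlap claim numerically, the following sketch recomputes the notch endpoints from the formula given under 'notch'. It assumes that prctile matches the percentiles boxplot uses internally closely enough for this check. Small differences are possible, so treat the printed endpoints as approximate.

```matlab
% Regenerate the sample data from this example.
rng default;  % For reproducibility
x1 = normrnd(5,1,100,1);
x2 = normrnd(6,1,100,1);

% Percentiles [q1 q2 q3] for each sample.
p1 = prctile(x1, [25 50 75]);
p2 = prctile(x2, [25 50 75]);

% Notch half-width: 1.57*(q3 - q1)/sqrt(n), where n counts non-NaN values.
h1 = 1.57 * (p1(3) - p1(1)) / sqrt(sum(~isnan(x1)));
h2 = 1.57 * (p2(3) - p2(1)) / sqrt(sum(~isnan(x2)));
notch1 = [p1(2) - h1, p1(2) + h1];
notch2 = [p2(2) - h2, p2(2) + h2];
fprintf('mu = 5: [%.3f, %.3f]\n', notch1)
fprintf('mu = 6: [%.3f, %.3f]\n', notch2)

% Two intervals overlap if each one starts before the other ends.
overlap = notch1(1) <= notch2(2) && notch2(1) <= notch1(2);
```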

The following code creates a box plot of the same data with the maximum whisker length set to 1.0 times the interquartile range. Points beyond the whiskers are displayed using +.

figure;
boxplot([x1,x2],'notch','on','labels',{'mu = 5','mu = 6'},'whisker',1)
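The outlier rule described under 'whisker' can be sketched directly, using a small hypothetical data vector with one unusually large value. The sketch assumes that prctile approximates the percentiles boxplot computes internally. The last lines check the stated normal-theory figures (about 2.7σ and 99.3% coverage for w = 1.5) with norminv and normcdf.

```matlab
% Hypothetical data with one unusually large value.
x = [1 2 3 4 5 6 7 8 9 40]';
w = 1.5;                                 % Default maximum whisker length

q = prctile(x, [25 75]);                 % q1 and q3
iqrX = q(2) - q(1);                      % Interquartile range q3 - q1
isOutlier = x > q(2) + w*iqrX | x < q(1) - w*iqrX;

upperAdjacent = max(x(~isOutlier));      % Where the upper whisker ends
lowerAdjacent = min(x(~isOutlier));      % Where the lower whisker ends
fprintf('Outliers: %s\n', mat2str(x(isOutlier)'))

% For standard normal data, the upper fence q3 + w*(q3 - q1), in units of sigma:
fence = norminv(0.75) + w*(norminv(0.75) - norminv(0.25));
coverage = normcdf(fence) - normcdf(-fence);
fprintf('Fence: %.3f sigma, coverage: %.4f\n', fence, coverage)
```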

Create Compact Box Plots

Create a 100-by-25 matrix of random numbers generated from a standard normal distribution to use as sample data.

rng('default');  % For reproducibility
X = randn(100,25);

Create two box plots for the data in X on the same figure. The top plot uses the default box plot formatting. The bottom plot uses compact formatting.

figure;

subplot(2,1,1)
boxplot(X)

subplot(2,1,2)
boxplot(X,'plotstyle','compact')

