mahal

Mahalanobis distance to reference samples

Syntax

d2 = mahal(Y,X)

Description

d2 = mahal(Y,X) returns the squared Mahalanobis distance of each observation in Y to the reference samples in X.

Examples

Compare Mahalanobis and Squared Euclidean Distances

Generate a correlated bivariate sample data set.

rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);

Specify four observations that are equidistant from the origin, and therefore approximately equidistant from the sample mean of X, in Euclidean distance.

Y = [1 1;1 -1;-1 1;-1 -1];

Compute the Mahalanobis distance of each observation in Y to the reference samples in X.

d2_mahal = mahal(Y,X)

d2_mahal = 4×1

    1.1095
   20.3632
   19.5939
    1.0137

Compute the squared Euclidean distance of each observation in Y from the mean of X.

d2_Euclidean = sum((Y-mean(X)).^2,2)

d2_Euclidean = 4×1

    2.0931
    2.0399
    1.9625
    1.9094

Plot X and Y by using scatter and use marker color to visualize the Mahalanobis distance of Y to the reference samples in X.

scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
hold on
scatter(Y(:,1),Y(:,2),100,d2_mahal,'o','filled')
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','best')

All observations in Y ([1,1], [-1,-1], [1,-1], and [-1,1]) are approximately equidistant from the mean of X in Euclidean distance. However, [1,1] and [-1,-1] are much closer to X than [1,-1] and [-1,1] in Mahalanobis distance, because they lie along the direction of the strong positive correlation (0.9) between the two variables in X. Because Mahalanobis distance accounts for the covariance of the data and the scales of the different variables, it is useful for detecting outliers.
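One common way to turn this into an outlier rule, assuming X is approximately multivariate normal, is to compare each squared distance with a chi-square quantile with m degrees of freedom. The sketch below uses a 97.5% cutoff, which is an arbitrary illustrative choice; `chi2inv` is from Statistics and Machine Learning Toolbox, and the chi-square reference distribution is only an approximation when the mean and covariance are estimated from X.

```matlab
% Sketch: flag observations in Y whose squared Mahalanobis distance to X
% exceeds a chi-square quantile (assumes X is roughly multivariate normal).
rng('default')                          % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);     % Correlated reference samples
Y = [1 1;1 -1;-1 1;-1 -1];              % Observations to test

d2 = mahal(Y,X);                        % Squared Mahalanobis distances, n-by-1

alpha = 0.025;                          % Illustrative tail probability (assumption)
m = size(X,2);                          % Number of variables = degrees of freedom
threshold = chi2inv(1 - alpha, m);      % Cutoff on the squared distance

isOutlier = d2 > threshold              % Logical n-by-1 vector
```

With two variables the cutoff is about 7.38, so with the distances computed in this example (about 1.1, 20.4, 19.6, and 1.0) only [1,-1] and [-1,1] would be flagged.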

Input Arguments

`Y` — Data
n-by-m numeric matrix

Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation.

X and Y must have the same number of columns, but can have different numbers of rows.

Data Types: single | double

`X` — Reference samples
p-by-m numeric matrix

Reference samples, specified as a p-by-m numeric matrix, where p is the number of samples and m is the number of variables in each sample.

X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.
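The row requirement on X follows from the covariance estimate: after centering, p samples span at most p-1 directions, so the m-by-m sample covariance is singular whenever p ≤ m and cannot be inverted. The short check below illustrates this with random data; the sizes are arbitrary, and note that collinear columns can also make the covariance singular even when p > m.

```matlab
% Illustrate why X needs more rows than columns: with p <= m samples,
% the sample covariance cov(X) is rank-deficient (rank at most p-1).
rng('default')
p = 2; m = 3;                  % Fewer samples than variables
Xsmall = randn(p,m);
rSmall = rank(cov(Xsmall))     % At most p-1 = 1, so cov(Xsmall) is singular

Xlarge = randn(50,m);          % Many more samples than variables
rLarge = rank(cov(Xlarge))     % Full rank m for generic random data
```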

Data Types: single | double

Output Arguments

`d2` — Squared Mahalanobis distance
n-by-1 numeric vector

Squared Mahalanobis distance of each observation in Y to the reference samples in X, returned as an n-by-1 numeric vector, where n is the number of observations in Y.

More About

Mahalanobis Distance

The Mahalanobis distance is a measure of the distance between a sample point and a distribution.

The Mahalanobis distance from a vector y to a distribution with mean μ and covariance Σ is

$d = \sqrt{(y - \mu)\,\Sigma^{-1}\,(y - \mu)'}.$

This distance represents how far y is from the mean in number of standard deviations.
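For a diagonal covariance, the formula reduces to the Euclidean distance after dividing each coordinate by its standard deviation, which is where the "number of standard deviations" reading comes from. The mean, covariance, and point below are made-up values for illustration only.

```matlab
% Mahalanobis distance from y to a distribution with known mean mu and
% diagonal covariance Sigma (illustrative values, not computed by mahal).
mu    = [0 0];
Sigma = diag([4 1]);                  % Standard deviations 2 and 1
y     = [6 0];                        % 6 units along the first axis

dy = y - mu;
d = sqrt(dy / Sigma * dy')            % (y-mu)*inv(Sigma)*(y-mu)': 3 standard deviations

% Same result by standardizing each coordinate first
dStd = norm(dy ./ sqrt(diag(Sigma))')
```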

mahal returns the squared Mahalanobis distance d² from an observation in Y to the reference samples in X. In the mahal function, μ and Σ are the sample mean and covariance of the reference samples, respectively.
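The definition above can be checked directly: estimate μ and Σ from X with `mean` and `cov`, then evaluate the quadratic form for each row of Y. This is a sketch of the computation, not necessarily how `mahal` is implemented internally, so the two results are expected to agree only up to rounding error. The implicit expansion in `Y - mu` assumes R2016b or later.

```matlab
% Recompute squared Mahalanobis distances from the definition and compare
% with mahal. Requires implicit expansion (R2016b or later).
rng('default')
X = mvnrnd([0;0],[1 .9;.9 1],1000);   % Reference samples, p-by-m
Y = [1 1;1 -1;-1 1;-1 -1];            % Observations, n-by-m

mu    = mean(X);                      % 1-by-m sample mean
Sigma = cov(X);                       % m-by-m sample covariance (normalized by p-1)
D     = Y - mu;                       % Deviation of each row of Y from mu

% Row i of (D / Sigma) is D(i,:)*inv(Sigma); multiply elementwise by D and
% sum across columns to get D(i,:)*inv(Sigma)*D(i,:)' for every row at once.
d2_manual = sum((D / Sigma) .* D, 2);

d2_mahal = mahal(Y,X);
maxAbsDiff = max(abs(d2_manual - d2_mahal))   % Expected to be near zero
```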

Version History

Introduced before R2006a