Many data sets have one or more missing values. It is convenient to code missing values as NaN (Not a Number) to preserve the structure of data sets across multiple variables and observations.
Normal MATLAB® arithmetic operations yield NaN values when operands are NaN. Removing the NaN values would destroy the matrix structure. Removing the rows containing the NaN values would discard data. Statistics and Machine Learning Toolbox™ functions in the following table remove NaN values only for the purposes of computation.
Function | Description |
---|---|
nancov | Covariance matrix, ignoring NaN values |
nanmax | Maximum, ignoring NaN values |
nanmean | Mean, ignoring NaN values |
nanmedian | Median, ignoring NaN values |
nanmin | Minimum, ignoring NaN values |
nanstd | Standard deviation, ignoring NaN values |
nansum | Sum, ignoring NaN values |
nanvar | Variance, ignoring NaN values |
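The following sketch contrasts the three options for handling missing values: letting NaN propagate through ordinary arithmetic, deleting incomplete rows, and ignoring NaN only during the computation. The matrix A and the variable names are made up for illustration.

```matlab
% Hypothetical sample data: 3 observations (rows) of 2 variables (columns).
A = [1 NaN; 2 4; 3 6];

% Option 1: ordinary arithmetic. The NaN propagates into the result.
m1 = mean(A);                      % second element is NaN

% Option 2: delete every row that contains a NaN. The matrix stays
% rectangular, but the valid value A(1,1) is discarded with the NaN.
completeRows = ~any(isnan(A), 2);
m2 = mean(A(completeRows, :));     % [2.5 5]

% Option 3: ignore NaN values only for the computation.
m3 = nanmean(A);                   % [2 5]
```

Only the third option uses every available observation while leaving A itself unchanged.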
Other Statistics and Machine Learning Toolbox functions also ignore NaN values. These include iqr, kurtosis, mad, prctile, range, skewness, and trimmean.
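As a brief illustration of this behavior, the sketch below applies two of the listed functions to a vector with one missing entry. The vector v is a hypothetical example, not data from this page.

```matlab
% Hypothetical vector with one missing observation.
v = [2 NaN 10 4];

% range is the maximum minus the minimum of the non-NaN values.
r = range(v);          % 10 - 2 = 8

% prctile drops the NaN before computing; the 50th percentile of
% the remaining values [2 4 10] is their median.
p = prctile(v, 50);    % 4
```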
Create a 3-by-3 matrix of sample data. Remove two data values by replacing them with NaN.
X = magic(3); X([1 5]) = [NaN NaN]
X = 3×3
NaN 1 6
3 NaN 7
4 9 2
Compute the sum for each column of the sample data matrix using the sum function.
s1 = sum(X)
s1 = 1×3
NaN NaN 15
If a column contains a NaN value, then the sum function will return NaN as the sum of the data in that column.
For comparison, compute the sum for each column of the sample data matrix using the nansum function.
s2 = nansum(X)
s2 = 1×3
7 10 15
If a column contains a NaN value, then the nansum function ignores the NaN value and returns the sum of the remaining values in the column.
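The same result can likely be reproduced without the toolbox function. The sketch below assumes a MATLAB release in which the base sum function accepts the 'omitnan' flag, and it also shows the edge case of a column with no valid values. The variable names s3 and sEmpty are illustrative.

```matlab
% Rebuild the sample matrix from the example above.
X = magic(3); X([1 5]) = [NaN NaN];

% Toolbox function: ignores NaN values column by column.
s2 = nansum(X);               % [7 10 15]

% Base MATLAB alternative (assumes 'omitnan' is supported in your release).
s3 = sum(X, 'omitnan');       % [7 10 15]

% Edge case: when every value is NaN, nothing remains to add,
% so the sum is 0 rather than NaN.
sEmpty = nansum([NaN; NaN]);  % 0
```

A sum of 0 for an all-NaN column is indistinguishable from a genuine zero total, so it may be worth checking all(isnan(X)) separately when that distinction matters.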