standardizeMissing

Insert standard missing values

Description

B = standardizeMissing(A,indicator) replaces the values in A that are specified in indicator with standard missing values and returns a standardized array or table.

Missing values are defined according to the data type of A:

  • NaN: double, single, duration, and calendarDuration

  • NaT: datetime

  • <missing>: string

  • <undefined>: categorical

  • ' ': char

  • {''}: cell of character vectors

If A is a table, then the data type of each column defines the missing value for that column.
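
As a minimal sketch of how the standard missing value follows the data type, the snippet below standardizes a string array and a datetime array. The marker "N/A" and the placeholder date are hypothetical values chosen only for illustration.

```matlab
% String data: the nonstandard marker "N/A" becomes <missing>.
s = ["alpha" "N/A" "charlie"];
sStd = standardizeMissing(s,"N/A");

% Datetime data: a placeholder date becomes NaT.
sentinel = datetime(1900,1,1);      % hypothetical placeholder date
d = [datetime(2020,5,17) sentinel];
dStd = standardizeMissing(d,sentinel);

% ismissing flags the standardized entries for either type.
disp(ismissing(sStd))
disp(ismissing(dStd))
```

In both cases, only the element that matched the indicator is replaced; the other elements pass through unchanged.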

B = standardizeMissing(___,Name,Value) specifies additional parameters for standardizing missing values using one or more name-value arguments. For example, standardizeMissing(A,indicator,'DataVariables',datavars) standardizes missing values in the variables specified by datavars when A is a table or timetable.

Examples

Create a row vector and replace all instances of -99 with the standard missing value for double data types, NaN.

A = [0 1 5 -99 8 3 4 -99 16];
B = standardizeMissing(A,-99)
B = 1×9

     0     1     5   NaN     8     3     4   NaN    16

Create a table containing Inf and 'N/A' to represent missing values.

dblVar = [NaN;3;Inf;7;9];
cellstrVar = {'one';'three';'';'N/A';'nine'};
charVar = ['A';'C';'E';' ';'I'];
categoryVar = categorical({'red';'yellow';'blue';'violet';''});

A = table(dblVar,cellstrVar,charVar,categoryVar)
A=5×4 table
    dblVar    cellstrVar    charVar    categoryVar
    ______    __________    _______    ___________

     NaN      {'one'   }       A       red        
       3      {'three' }       C       yellow     
     Inf      {0x0 char}       E       blue       
       7      {'N/A'   }               violet     
       9      {'nine'  }       I       <undefined>

Replace all instances of Inf with NaN and replace all instances of 'N/A' with the empty character vector, ''.

B = standardizeMissing(A,{Inf,'N/A'})
B=5×4 table
    dblVar    cellstrVar    charVar    categoryVar
    ______    __________    _______    ___________

     NaN      {'one'   }       A       red        
       3      {'three' }       C       yellow     
     NaN      {0x0 char}       E       blue       
       7      {0x0 char}               violet     
       9      {'nine'  }       I       <undefined>

Replace instances of Inf and 'N/A' occurring in specified variables of a table with the standard missing value indicators.

Create a table containing Inf and 'N/A' to represent missing values.

a = {'alpha';'bravo';'charlie';'';'N/A'};
x = [1;NaN;3;Inf;5];
y = [57;732;93;1398;Inf];

A = table(a,x,y)
A=5×3 table
         a          x      y  
    ___________    ___    ____

    {'alpha'  }      1      57
    {'bravo'  }    NaN     732
    {'charlie'}      3      93
    {0x0 char }    Inf    1398
    {'N/A'    }      5     Inf

For the variables a and x, replace instances of Inf with NaN and 'N/A' with the empty character vector, ''.

B = standardizeMissing(A,{Inf,'N/A'},'DataVariables',{'a','x'})
B=5×3 table
         a          x      y  
    ___________    ___    ____

    {'alpha'  }      1      57
    {'bravo'  }    NaN     732
    {'charlie'}      3      93
    {0x0 char }    NaN    1398
    {0x0 char }      5     Inf

Inf in the variable y remains unchanged because y is not included in the DataVariables name-value argument.

Input Arguments

Input data, specified as a vector, matrix, multidimensional array, table, or timetable. If A is a timetable, then standardizeMissing operates on the table data only and ignores NaT and NaN values in the vector of row times.

Data Types: double | single | char | string | cell | table | timetable | categorical | datetime | duration

Nonstandard missing value indicator, specified as a scalar, vector, or cell array. The elements of indicator define the values that standardizeMissing treats as missing. If A is an array, then indicator must be a vector. If A is a table or timetable, then indicator can also be a cell array with entries of multiple data types.

The data types specified in indicator match data types in the corresponding entries of A. The following are additional data type matches between the elements of indicator and elements of A:

  • double indicators match double, single, integer, and logical entries of A.

  • string and char indicators match categorical entries of A.

Example: B = standardizeMissing(A,'N/A') replaces the character vector 'N/A' with the empty character vector, ''.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | datetime | duration
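
The type-matching rules above can be sketched as follows. The sentinel values -99 and -1 and the category name 'unknown' are assumptions made for this illustration.

```matlab
% A double indicator matches single data. For array input, several
% indicators are passed as a vector.
A = single([1 -99 3 -1]);
B = standardizeMissing(A,[-99 -1]);

% A char indicator matches categorical data.
C = categorical({'red','unknown','blue'});
D = standardizeMissing(C,'unknown');

disp(class(B))          % the data type of the input is preserved
disp(isnan(B))
disp(isundefined(D))
```

The matched elements of the single array become NaN, and the matched element of the categorical array becomes <undefined>.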

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: standardizeMissing(T,indicator,'ReplaceValues',false)

Table variables to operate on, specified as one of the options in this table. The DataVariables value indicates which variables of the input table to standardize.

Other variables in the table not specified by DataVariables pass through to the output without being standardized.

Option | Description | Examples
Variable name | A character vector or scalar string specifying a single table variable name | 'Var1', "Var1"
Vector of variable names | A cell array of character vectors or string array where each element is a table variable name | {'Var1' 'Var2'}, ["Var1" "Var2"]
Scalar or vector of variable indices | A scalar or vector of table variable indices | 1, [1 3 5]
Logical vector | A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it | [true false true]
Function handle | A function handle that takes a table variable as input and returns a logical scalar | @isnumeric
vartype subscript | A table subscript generated by the vartype function | vartype('numeric')

Example: standardizeMissing(T,indicator,'DataVariables',["Var1" "Var2" "Var4"])
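
As a hedged illustration of the function-handle and vartype options, the sketch below restricts standardization to the numeric variables of a small table. The table contents and variable names are made up for this example.

```matlab
name = {'alpha';'N/A';'charlie'};
val = [1;Inf;3];
T = table(name,val);

% Select the numeric variables with a function handle ...
B1 = standardizeMissing(T,{Inf,'N/A'},'DataVariables',@isnumeric);

% ... or with a vartype subscript.
B2 = standardizeMissing(T,{Inf,'N/A'},'DataVariables',vartype('numeric'));

disp(B1)
```

Because only val is selected, Inf becomes NaN, while 'N/A' in name passes through unchanged.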

Replace values indicator (ReplaceValues), specified as one of these values when A is a table or timetable:

  • true or 1: Replace input table variables with table variables containing standardized data.

  • false or 0: Append table variables containing standardized data to the input table variables.

For vector, matrix, or multidimensional array input data, ReplaceValues is not supported.

B is the same size as A unless the value of ReplaceValues is false. If the value of ReplaceValues is false, then the width of B is the sum of the input data width and the number of data variables specified.

Example: standardizeMissing(T,indicator,'ReplaceValues',false)
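
The effect of ReplaceValues on the output width can be sketched as below, assuming a release in which standardizeMissing accepts ReplaceValues. The table and the sentinel -99 are hypothetical. The sketch relies only on the width rule stated above and on the appended variables coming after the original ones; it does not assume a particular name for the appended variable.

```matlab
x = [1;-99;3];
y = [10;20;-99];
T = table(x,y);

% Keep the original variables and append a standardized copy of x.
B = standardizeMissing(T,-99,'DataVariables','x','ReplaceValues',false);

disp(width(T))      % 2 input variables
disp(width(B))      % 2 input variables + 1 specified data variable
disp(B)
```

The original x still holds -99, and the appended variable holds the standardized values.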

Algorithms

standardizeMissing treats leading and trailing white space differently for cell arrays of character vectors, character arrays, and categorical arrays.

  • For cell arrays of character vectors, standardizeMissing does not ignore white space. All character vectors must match exactly a character vector specified in indicator.

  • For character arrays, standardizeMissing ignores trailing white space.

  • For categorical arrays, standardizeMissing ignores leading and trailing white space.
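
The exact-match rule for cell arrays of character vectors can be illustrated with a short sketch. The entries are made up for this example, and only the cell array case is shown.

```matlab
% Only the first entry matches 'N/A' exactly; the other two differ
% by a trailing or a leading space.
A = {'N/A';'N/A ';' N/A'};
B = standardizeMissing(A,'N/A');

disp(ismissing(B))
```

Entries with extra white space are left as they are, so such data may need trimming (for example with strtrim) before standardizing.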

Version History

Introduced in R2013b
