
standardizeMissing

Insert standard missing values

Description

B = standardizeMissing(A,indicator) replaces values in A that match a value in the nonstandard missing value indicator with standard missing values.

Standard missing values are defined according to the data type of A, or if A is a table, the data type of each variable:

  • NaN for double, single, duration, and calendarDuration

  • NaT for datetime

  • <missing> for string

  • <undefined> for categorical

  • {''} for cell arrays of character vectors
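The replacement value depends on the type of the input. The following minimal sketch standardizes three small arrays of different types. The placeholder values "N/A" and "unset" are arbitrary values chosen for illustration.

```matlab
% String array: the matched element becomes <missing>
s = ["alpha" "N/A" "charlie"];
sB = standardizeMissing(s,"N/A");

% Categorical array: the matched element becomes <undefined>
c = categorical(["red" "unset" "blue"]);
cB = standardizeMissing(c,"unset");

% Cell array of character vectors: the matched element becomes ''
k = {'alpha','N/A','charlie'};
kB = standardizeMissing(k,{'N/A'});

% In each output, only the second element is now a standard missing value
ismissing(sB)
isundefined(cB)
cellfun(@isempty,kB)
```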

B = standardizeMissing(___,Name=Value) specifies additional parameters for standardizing missing values using one or more name-value arguments. For example, standardizeMissing(A,indicator,DataVariables=datavars) standardizes missing values in the specified variables when A is a table or timetable.

Examples

Create a row vector and replace all instances of -99 with the standard missing value for double data types, NaN.

A = [0 1 5 -99 8 3 4 -99 16];
B = standardizeMissing(A,-99)
B = 1×9

     0     1     5   NaN     8     3     4   NaN    16

Replace instances of Inf and "N/A" occurring in all variables of a table with the standard missing values.

Create a table containing Inf and "N/A" to represent nonstandard missing values.

x = ["alpha"; "bravo"; "charlie"; missing; "N/A"];
y = [1; NaN; 3; Inf; 5];
z = [57; 732; 93; 398; Inf];

A = table(x,y,z)
A=5×3 table
        x         y      z 
    _________    ___    ___

    "alpha"        1     57
    "bravo"      NaN    732
    "charlie"      3     93
    <missing>    Inf    398
    "N/A"          5    Inf

Replace all instances of Inf with NaN and "N/A" with <missing>.

B = standardizeMissing(A,{Inf,'N/A'})
B=5×3 table
        x         y      z 
    _________    ___    ___

    "alpha"        1     57
    "bravo"      NaN    732
    "charlie"      3     93
    <missing>    NaN    398
    <missing>      5    NaN

Replace instances of Inf and "N/A" occurring in specified variables of a table with the standard missing values.

Create a table containing Inf and "N/A" to represent nonstandard missing values.

x = ["alpha"; "bravo"; "charlie"; missing; "N/A"];
y = [1; NaN; 3; Inf; 5];
z = [57; 732; 93; 398; Inf];

A = table(x,y,z)
A=5×3 table
        x         y      z 
    _________    ___    ___

    "alpha"        1     57
    "bravo"      NaN    732
    "charlie"      3     93
    <missing>    Inf    398
    "N/A"          5    Inf

For the variables x and y, replace instances of Inf with NaN and "N/A" with <missing>. Inf remains unchanged in table variable z because z is not included in the DataVariables name-value argument.

B = standardizeMissing(A,{Inf "N/A"},DataVariables=["x" "y"])
B=5×3 table
        x         y      z 
    _________    ___    ___

    "alpha"        1     57
    "bravo"      NaN    732
    "charlie"      3     93
    <missing>    NaN    398
    <missing>      5    Inf

Input Arguments

A

Input data, specified as a vector, matrix, multidimensional array, table, or timetable. If A is a timetable, then standardizeMissing operates on the table data only and ignores NaT and NaN values in the vector of row times.

Data Types: single | double | char | string | table | timetable | cell | categorical | datetime | duration
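The timetable behavior can be checked with a small sketch. It assumes a timetable whose row times include a NaN duration and whose data use -99 as a placeholder; the names rowTimes and reading are hypothetical.

```matlab
% Row times include a NaN duration; the data contain the placeholder -99
rowTimes = seconds([1; NaN; 3]);
reading = [10; -99; 30];
tt = timetable(rowTimes,reading);

ttB = standardizeMissing(tt,-99);

% The data variable is standardized: expected values 10, NaN, 30
ttB.reading
% The row times, including the NaN, are left exactly as they were
isequaln(ttB.Properties.RowTimes,tt.Properties.RowTimes)
```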

indicator

Nonstandard missing value indicator, specified as a scalar, vector, or cell array of values that the standardizeMissing function should treat as missing.

  • If A is an array, then indicator must be a scalar or vector.

  • If A is a table or timetable, then indicator can also be a cell array with entries of multiple data types.
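For array input, a vector indicator lets you list several placeholder codes at once. A minimal sketch, assuming a numeric matrix that uses both -99 and 999 as placeholders:

```matlab
% Two different placeholder codes in one numeric matrix
A = [1 -99 3; 999 5 -99];

% A vector indicator lists every value to treat as missing
B = standardizeMissing(A,[-99 999]);
% Expected: [1 NaN 3; NaN 5 NaN]
```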

The entries of indicator override all standard missing value indicators. To specify additional indicators while maintaining the standard list, include missing as an element in indicator.

Each indicator is used to identify missing elements in A of the same class. These indicators are also matched to elements in A of different classes:

  • double indicators also match numeric entries of A.

  • Indicators that are strings, character vectors, or cell arrays of character vectors also match categorical and string entries of A.

  • single, integer, and logical indicators also match double entries of A.
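The cross-class matching rules can be illustrated with a short sketch. It relies only on the two rules stated above: an integer indicator matching double entries, and a character-vector indicator matching string entries. The sample values are arbitrary.

```matlab
A = [1 2 3 2];

% An int8 indicator still matches the double entries equal to 2
B1 = standardizeMissing(A,int8(2));
% Expected: [1 NaN 3 NaN], still of class double

% A char indicator matches string entries
s = ["keep" "N/A"];
B2 = standardizeMissing(s,'N/A');
% Expected: ["keep" <missing>]
```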

Example: B = standardizeMissing(A,0) recognizes only 0 as a missing value.

Example: B = standardizeMissing(A,["Unset" missing]), for categorical array A, recognizes "Unset" as a missing value in addition to the standard missing value for categorical arrays, <undefined>.

Example: B = standardizeMissing(T,{-99 missing}), for table T, recognizes -99 as a missing value in addition to the standard missing value for the type of each table variable.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: standardizeMissing(T,indicator,ReplaceValues=false)

DataVariables

Table variables to operate on, specified as one of the options below. The DataVariables value indicates which variables of the input table to examine for missing values.

Other variables in the table not specified by DataVariables pass through to the output without being standardized.

Indexing scheme: Variable name

  Values to specify:

  • A string scalar or character vector

  • A string array or cell array of character vectors

  • A pattern object

  Examples:

  • "A" or 'A': a variable named A

  • ["A" "B"] or {'A','B'}: two variables named A and B

  • "Var"+digitsPattern(1): variables named "Var" followed by a single digit

Indexing scheme: Variable index

  Values to specify:

  • An index number that refers to the location of a variable in the table

  • A vector of numbers

  • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 (false) values.

  Examples:

  • 3: the third variable from the table

  • [2 3]: the second and third variables from the table

  • [false false true]: the third variable

Indexing scheme: Function handle

  Values to specify:

  • A function handle that takes a table variable as input and returns a logical scalar

  Examples:

  • @isnumeric: all the variables containing numeric values

Indexing scheme: Variable type

  Values to specify:

  • A vartype subscript that selects variables of a specified type

  Examples:

  • vartype("numeric"): all the variables containing numeric values

Example: standardizeMissing(T,indicator,DataVariables=["Var1" "Var2" "Var4"])
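Variables can also be selected by type rather than by name. The following sketch reuses the table from the Examples section and selects its numeric variables in two ways, with a function handle and with a vartype subscript. Because x is not selected, its "N/A" entry passes through unchanged.

```matlab
x = ["alpha"; "bravo"; "charlie"; missing; "N/A"];
y = [1; NaN; 3; Inf; 5];
z = [57; 732; 93; 398; Inf];
A = table(x,y,z);

% Select the numeric variables (y and z) with a function handle
B1 = standardizeMissing(A,{Inf "N/A"},DataVariables=@isnumeric);

% Select the same variables with a vartype subscript
B2 = standardizeMissing(A,{Inf "N/A"},DataVariables=vartype("numeric"));

% x is not a data variable, so its "N/A" entry is still "N/A"
B1.x(5)
```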

ReplaceValues

Whether to replace values in a table or timetable, specified as one of these values:

  • true or 1: Replace input table variables containing missing entries with standardized table variables.

  • false or 0: Append to the input table all table variables that were checked for missing entries. The missing entries in the appended variables are standardized.

For array input data, ReplaceValues is not supported.

B is the same size as A unless the value of ReplaceValues is false. If the value of ReplaceValues is false, then the width of B is the sum of the input data width and the number of data variables specified.

Example: standardizeMissing(T,indicator,ReplaceValues=false)
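The width rule can be checked with a small sketch that reuses the example table. With three input variables and two data variables, B is expected to have 3 + 2 = 5 variables. The names of the appended variables are generated by the function and are not assumed here; inspect B.Properties.VariableNames to see them.

```matlab
x = ["alpha"; "bravo"; "charlie"; missing; "N/A"];
y = [1; NaN; 3; Inf; 5];
z = [57; 732; 93; 398; Inf];
A = table(x,y,z);

% Check only y and z, and append standardized copies instead of replacing
B = standardizeMissing(A,Inf,DataVariables=["y" "z"],ReplaceValues=false);

width(B)                      % 3 input variables + 2 data variables
B.y(4)                        % the original y is untouched and still Inf
B.Properties.VariableNames    % names of the original and appended variables
```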

Output Arguments

B

Standardized data, returned as a vector, matrix, multidimensional array, table, or timetable. B has the same size and data type as A, except when ReplaceValues is false.

Algorithms

standardizeMissing treats leading and trailing white space differently for cell arrays of character vectors, character arrays, and categorical arrays.

  • For cell arrays of character vectors, standardizeMissing does not ignore white space. A character vector must exactly match a character vector specified in indicator.

  • For character arrays, standardizeMissing ignores trailing white space.

  • For categorical arrays, standardizeMissing ignores leading and trailing white space.
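The exact-match rule for cell arrays of character vectors can be seen in a short sketch. It assumes three cells that differ only in surrounding white space.

```matlab
% Three cells that differ only in leading or trailing white space
A = {'N/A', 'N/A ', ' N/A'};
B = standardizeMissing(A,{'N/A'});

% Only the exact match is replaced with ''; the padded values are kept as-is
B
```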

Alternative Functionality

Live Editor Task

In addition to standardizing missing values, you can interactively find, fill, or remove missing data by adding the Clean Missing Data task to a live script.

Clean Missing Data task in the Live Editor

Extended Capabilities

Version History

Introduced in R2013b
