# Clean Outlier Data

Find, fill, or remove outliers in the Live Editor

## Description

The Clean Outlier Data task lets you interactively handle outliers in data. The task automatically generates MATLAB® code for your live script.

Using this task, you can:

- Find, fill, or remove outliers from data in a workspace variable.
- Customize the methods for finding and filling outliers.
- Visualize the outlier data and cleaned data.

### Related Functions

Clean Outlier Data generates code that uses the `isoutlier`, `filloutliers`, and `rmoutliers` functions.
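As a rough illustration of how these three functions relate, the following sketch applies each one to a small made-up vector using default detection settings. The vector `A` and the output variable names are assumptions chosen for illustration; the code that the task actually generates depends on the options you select and may look different.

```matlab
% Hypothetical sample data with two obvious outliers (100 and 300).
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Find: logical mask marking the outliers (default detection method).
tf = isoutlier(A);

% Fill: replace outliers by linear interpolation of neighboring values.
filled = filloutliers(A,"linear");

% Remove: delete outliers and return a mask of the removed elements.
[removed,tfRemoved] = rmoutliers(A);
```

All three calls share the same detection step, so `tf` and `tfRemoved` mark the same elements here; only the handling of the flagged values differs.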

## Open the Task

To add the Clean Outlier Data task to a live script in the MATLAB Editor:

- On the **Live Editor** tab, select **Task** > **Clean Outlier Data**.
- In a code block in the script, type a relevant keyword, such as `outlier`, `clean`, `fill`, or `remove`. Select `Clean Outlier Data` from the suggested command completions. For some keywords, the task automatically updates one or more corresponding parameters.

## Examples

### Remove Outliers from Table

Interactively remove outliers from a table using the Clean Outlier Data task in the Live Editor.

Create a table using patient height and weight data from a sample file.

    load("patients.mat","Height","Weight")
    T = table(Height,Weight);
    head(T)

        Height    Weight
        ______    ______

          71       176
          69       163
          64       131
          67       133
          64       119
          68       142
          64       142
          68       180

Open the Clean Outlier Data task in the Live Editor. To clean the patient data, select `T` as the input data. Then, compute on the `Height` and `Weight` variables by selecting `All supported variables`.

The Clean Outlier Data task can fill or remove outlier data. To remove the table rows corresponding to patients with outlier height or weight measurements, use the **Cleaning method** field to select `Remove outliers`. Then, to define outliers as elements below the 10th percentile or above the 90th percentile, use the **Detection method** field to select `Percentiles`.

Then, to visualize the cleaned height and weight data, use the **Variable to display** field to select all variables.

This task returns a table of the cleaned data and a logical vector indicating the rows removed from the input table. Use `outlierIndices` to determine the number of rows removed from the table.

    nrows = sum(outlierIndices)

    nrows = 24
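Outside the Live Editor, the selections in this example correspond roughly to a call to `rmoutliers` with the percentile method. This is only a sketch: the exact code that the task generates is not shown in this example, and the variable name `cleanedData` is an assumption.

```matlab
% Recreate the table from the example.
load("patients.mat","Height","Weight")
T = table(Height,Weight);

% Remove rows in which Height or Weight falls below the 10th percentile
% or above the 90th percentile. cleanedData is a hypothetical name.
[cleanedData,outlierIndices] = rmoutliers(T,"percentiles",[10 90]);

% Count the removed rows.
nrows = sum(outlierIndices)
```

The second output is a logical vector with one element per input row, so the number of remaining rows equals `height(T) - nrows`.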

## Parameters

`Input data`: Valid input data from workspace

vector | table | timetable

This task operates on input data contained in a vector, table, or timetable. The data can be of type `single` or `double`.

For table or timetable input data, to clean all variables with type `single` or `double`, select `All supported variables`. To choose which `single` or `double` variables to clean, select `Specified variables`.
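At the command line, a similar variable selection can be approximated with the `DataVariables` name-value argument of `filloutliers`. The sketch below is an assumption about how the two task options map onto that argument, not the code the task generates; the table `T` and its variable names are made up for illustration.

```matlab
% Hypothetical table with one string variable, one double variable,
% and one single variable.
ID = ["A";"B";"C";"D";"E";"F";"G"];
X  = [57;59;60;300;59;58;57];
Y  = single([1;2;40;2;1;2;1]);
T  = table(ID,X,Y);

% Roughly "All supported variables": operate on every variable for which
% isfloat returns true, that is, variables of type single or double.
allCleaned = filloutliers(T,"linear","DataVariables",@isfloat);

% Roughly "Specified variables": operate only on the named variable.
xCleaned = filloutliers(T,"linear","DataVariables","X");
```

Variables that are not selected, such as `ID`, pass through unchanged.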

`Cleaning method`: Cleaning method for filling outliers

`Linear interpolation` (default) | `Constant value` | `Convert to missing` | ...

Specify the method for filling outliers as one of these options.

| Fill Method | Description |
|---|---|
| `Linear interpolation` | Linear interpolation of neighboring, nonoutlier values |
| `Constant value` | Specified scalar value, which is `0` by default |
| `Convert to missing` | Convert to default definition of standard missing value |
| `Center value` | Center value determined by the detection method |
| `Clip to threshold value` | Lower threshold value for elements smaller than the lower threshold determined by the detection method; upper threshold value for elements larger than the upper threshold determined by the detection method |
| `Previous value` | Previous nonoutlier value |
| `Next value` | Next nonoutlier value |
| `Nearest value` | Nearest nonoutlier value |
| `Spline interpolation` | Piecewise cubic spline interpolation |
| `Shape-preserving cubic interpolation (PCHIP)` | Shape-preserving piecewise cubic spline interpolation |
| `Modified Akima cubic interpolation` | Modified Akima cubic Hermite interpolation |
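To compare a few of these fill methods outside the task, the following sketch calls `filloutliers` directly on a made-up vector. The pairing of each task option with a `filloutliers` fill method is an assumption based on the descriptions in the table, and the variable names are hypothetical.

```matlab
% Hypothetical sample data; the default detection flags 100 and 300.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

linearFill   = filloutliers(A,"linear");    % linear interpolation
constantFill = filloutliers(A,0);           % constant value 0
missingFill  = filloutliers(A,NaN);         % missing value (NaN for double)
centerFill   = filloutliers(A,"center");    % center value (median here)
previousFill = filloutliers(A,"previous");  % previous nonoutlier value
nextFill     = filloutliers(A,"next");      % next nonoutlier value

% Clipping replaces each outlier with the threshold it exceeded.
% The fourth output is the upper threshold used by the detection method.
[clippedFill,~,~,upperThreshold] = filloutliers(A,"clip");
```

Interpolating methods such as `"linear"` use neighboring values, while `"center"` and `"clip"` depend only on the statistics computed by the detection method.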

`Detection method`: Method for detecting outliers

`Moving median` (default) | `Median` | `Mean` | ...

Specify the detection method for finding outliers as one of these options.

| Method | Description |
|---|---|
| `Moving median` | Define outliers as elements more than the specified threshold of local scaled median absolute deviations (MAD) from the local median over a specified window. The default threshold is `3`. |
| `Median` | Define outliers as elements more than the specified threshold of scaled MAD from the median. The default threshold is `3`. For input data `A`, the scaled MAD is defined as `c*median(abs(A-median(A)))`, where `c=-1/(sqrt(2)*erfcinv(3/2))`. |
| `Mean` | Define outliers as elements more than the specified threshold of standard deviations from the mean. The default threshold is `3`. This method is faster but less robust than `Median`. |
| `Quartiles` | Define outliers as elements more than the specified threshold of interquartile ranges above the upper quartile (75 percent) or below the lower quartile (25 percent). The default threshold is `1.5`. This method is useful when the input data is not normally distributed. |
| `Grubbs` | Define outliers using Grubbs’ test, which removes one outlier per iteration based on hypothesis testing. This method assumes that the input data is normally distributed. |
| `Generalized extreme studentized deviate (GESD)` | Define outliers using the generalized extreme studentized deviate test for outliers. This iterative method is similar to `Grubbs` but can perform better when multiple outliers are masking each other. |
| `Moving mean` | Define outliers as elements more than the specified threshold of local standard deviations from the local mean over a specified window. The default threshold is `3`. |
| `Percentiles` | Define outliers as elements outside of the percentile range specified by an upper and lower threshold. The default lower percentile threshold is `10`, and the default upper percentile threshold is `90`. Valid threshold values are in the interval [0, 100]. |
| `Range` | Define outliers as elements outside of the range specified by an upper and lower threshold. Specify the thresholds as scalars or vectors matching the width of the input data. |
| `Workspace variable` | Define outlier locations using a workspace variable. Specify a logical array or table with logical variables, where elements with a value of `1` (`true`) correspond to outliers. |
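The `Median` definition in the table can be checked by hand. The sketch below computes the scaled MAD from the formula given above, applies the default threshold of `3`, and compares the result with `isoutlier`. The sample vector and variable names are assumptions for illustration.

```matlab
% Hypothetical sample data with two obvious outliers (100 and 300).
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Scaled MAD, following the formula in the table.
c = -1/(sqrt(2)*erfcinv(3/2));              % approximately 1.4826
scaledMAD = c*median(abs(A - median(A)));

% Elements more than 3 scaled MAD from the median.
manualMask = abs(A - median(A)) > 3*scaledMAD;

% The same detection using isoutlier.
medianMask = isoutlier(A,"median");

% The mean-based method is less robust: the value 300 inflates the
% standard deviation so much that 100 is no longer flagged.
meanMask = isoutlier(A,"mean");

% Percentile-based detection with the default task thresholds.
percentileMask = isoutlier(A,"percentiles",[10 90]);
```

For this vector, `manualMask` and `medianMask` agree and flag both 100 and 300, while `meanMask` flags only 300, which illustrates why the table calls `Mean` less robust than `Median`.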

`Moving window`: Window for moving methods

`Centered` (default) | `Asymmetric`

Specify the window type and size when the method for detecting outliers is `Moving median` or `Moving mean`.

| Window | Description |
|---|---|
| `Centered` | Specified window length centered about the current point |
| `Asymmetric` | Specified window containing the number of elements before the current point and the number of elements after the current point |

Window sizes are relative to the **X-axis** variable units.
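The two window types have natural counterparts in the `isoutlier` function, where a scalar window is centered about the current point and a two-element window gives the extent before and after it. The sketch below assumes that correspondence; the sample points `t`, the data, and the window sizes are made up for illustration.

```matlab
% Hypothetical data sampled every 0.5 units along the x-axis.
t = 0:0.5:7;
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Centered window of length 2, measured in the units of t.
centeredMask = isoutlier(A,"movmedian",2,"SamplePoints",t);

% Asymmetric window reaching 1.5 units back and 0.5 units forward.
asymmetricMask = isoutlier(A,"movmedian",[1.5 0.5],"SamplePoints",t);
```

Because the window is expressed in the units of `t`, changing the sample spacing changes how many elements fall inside each window.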

## Version History

**Introduced in R2019b**

### R2024b: Detect outliers using range or workspace variable

You can define outliers as elements outside of a range defined by an upper and lower threshold or as elements indicated by a value of `1` (`true`) in a workspace variable. Select the `Range` or `Workspace variable` detection method, respectively.

### R2022b: Plot multiple table variables

Simultaneously plot multiple table variables in the display of this Live Editor task. For
table or timetable data, to visualize all selected table variables at once in a tiled chart
layout, set the **Variable to display** field.

### R2022b: Convert outliers to missing

You can convert outlier data to missing data indicated by the value `NaN`. Set the **Cleaning method** field to `Fill outliers` and select the `Convert to missing` option.

### R2022b: Append cleaned table variables

Append input table variables with table variables containing cleaned data. For table or timetable input data, to append the cleaned data, set the **Output format** field.

### R2022a: Live Editor task does not run automatically if inputs have more than 1 million elements

This Live Editor task does not run automatically if the inputs have more than 1 million elements. In previous releases, the task always ran automatically for inputs of any size. If the inputs have a large number of elements, then the code generated by this task can take a noticeable amount of time to run (more than a few seconds).

When a task does not run automatically, the Autorun indicator is disabled. You can either run the task manually when needed or choose to enable the task to run automatically.

### R2021a: Operate on multiple table variables

This Live Editor task can operate on multiple table variables at the same time. For table or timetable input data, to operate on multiple variables, select `All supported variables` or `Specified variables`. Return all of the variables or only the modified variables, and specify which variable to visualize.

### R2021a: Visualize results with histogram

Visualize results with a histogram plot for most detection methods. The histogram can summarize the input data, outliers, cleaned data with outliers filled, and outlier detection thresholds and center value.

## See Also

### Functions

`isoutlier` | `filloutliers` | `rmoutliers` | `clip` | `isbetween`

### Live Editor Tasks

- Clean Missing Data | Find Change Points | Find Local Extrema | Smooth Data | Find and Remove Trends | Normalize Data | Compute by Group
