Pocket Guide

The Engineer’s Guide to Time Series Anomaly Detection

Explore how to identify events or patterns in data that differ from expected behavior using MATLAB.

This go-to guide is for engineers who want to design and deploy algorithms to detect anomalies in time series data.

Introduction to Anomaly Detection

What Is Anomaly Detection?

Anomaly detection is the process of identifying events or patterns in data that differ from expected behavior. These deviations, known as anomalies, can indicate important information like machine faults, security breaches, early warning signs, or process inefficiencies.

Industrial machinery anomalies are detected from vibration signals. The three-axis vibration data from a time window is processed and anomalies are detected by a machine learning model. The model decision boundary is displayed on a principal component plane, where red dots indicate anomalies.

Example Anomaly Detection Use Cases

Anomaly detection algorithms are useful in various industries and applications:

Predictive Maintenance: Anomaly detection algorithms deployed in industrial equipment can identify early signs of machine failure, improving safety and avoiding costly downtime.

Digital Health: Detecting anomalies in electrocardiogram (ECG) signals can reveal various cardiac conditions, such as arrhythmias, atrial fibrillation, and ventricular tachycardia.

Flight Test Data Analysis: Engineers rely on flight tests to understand system behavior and inform design choices. Anomaly detection can flag erratic sensor values and identify test conditions that fall outside the expected performance range.

Fault-Tolerant Controls: Embedded anomaly detection algorithms in an electric motor can flag changes in real time. This enables the controller to adjust the response to prevent damage and improve performance.

Types of Anomalies in Time Series Data

Although anomalies can appear in various types of data, such as image, video, and text, this guide focuses on time series data, which is prevalent in engineering applications. Time series data records sequential measurements, such as temperature, pressure, and vibration, over time. There are three main types of anomalies in time series data.

MATLAB figure showing a noisy sinusoidal curve with three outliers marked by red dots.

Point anomalies, or outliers, are individual data points that deviate from the expected value.

MATLAB figure showing a highlighted segment marking it as an anomaly.

Collective anomalies occur when a group of data points together deviate from expected patterns.

MATLAB figure showing highlighted segments marking anomalies in three curves.

Multivariate anomalies are anomalies that occur when analyzing multiple data sources together and may or may not be anomalies in a single data source.

Choosing an Anomaly Detection Algorithm

MATLAB provides a broad range of time series anomaly detection algorithms that fall into two major categories: statistical and distance-based methods and one-class AI models. These are all possible tools you can use when designing anomaly detection algorithms, but which tool to choose depends on your data and goals.

Statistical and Distance-Based Methods

Statistical methods rely on assumptions about the underlying data distribution. They apply thresholds, trends, or probability models to identify points that deviate from expected statistical behavior. A simple statistical anomaly detection approach might be computing the moving mean of a signal and flagging an anomaly when the mean exceeds a certain threshold.

Distance-based methods do not assume a particular data distribution; instead, they flag anomalies by calculating distances between individual observations or subsequences, identifying those that are far from typical patterns. For example, matrixProfile compares each subsequence in a time series and identifies anomalies as those that are the furthest distance from the others.

Using distanceProfile to identify anomalies in time series data.

When to use Statistical and Distance-Based Methods:

  • You have limited training data
  • You understand the baseline statistics and trends in your data well
  • You have limited compute resources

Characteristics:

+   Simple and interpretable
+   Low computational cost
+   Suitable for point and collective anomalies
-   May struggle with complex patterns or nonlinear dynamics
-   Sensitive to noise and data dimensionality

Example techniques in MATLAB:

  • isoutlier – Find outliers in data
  • findchangepts – Find abrupt changes in signal
  • robustcov – Compute robust multivariate covariance and mean estimate
  • mahal – Compute Mahalanobis distance to reference samples
  • distanceProfile – Compute distance profile between query subsequence and all other subsequences of a time series
  • matrixProfile – Compute matrix profile between all pairs of subsequences in a multivariable time series

Explore an example

Detect anomalies in motor data using matrix profile

Matrix profile is useful for efficiently detecting collective anomalies in long time series. This example shows how to apply matrix profile to detect anomalies in armature current measurements collected from a degrading DC motor.

Matrix profile of motor data with top motif pair and discord segments highlighted

One-Class AI Models

Anomaly detection methods in this category train a machine learning or deep learning model on a set of mostly normal baseline data. The model learns normal behavior and identifies deviations as anomalies. One-class AI models work best when anomalies are rare or lack a well-defined pattern. They can be applied across multiple channels of data for all types of anomalies.

When to use One-Class AI Models:

  • You want to train a model to identify anomalies in new data as it arrives
  • You have a lot of normal data for training
  • Anomalies are rare, unknown, or lack a well-defined pattern

Characteristics:

+   Suitable for all types of anomalies, including complex and nonlinear patterns
+   Robust to noise and data dimensionality
+   Deep learning methods do not require feature engineering
-    Longer training time and computational requirements
-    Often less interpretable than statistical and distance-based methods

Example techniques in MATLAB:

  • iforest – Isolation forest
  • ocsvm – One-class support vector machine
  • rccforest – Robust random cut forest
  • deepSignalAnomalyDetector – Deep learning models based on 1D convolutional autoencoder or long short-term memory network (LSTM)
  • tcnAD – Temporal convolutional network (TCN)
  • deepantAD – Convolutional neural network (CNN)
  • usAD – Dual-encoder network
  • vaelstmAD – Variational autoencoder and LSTM

Explore an example

Train deep learning models to detect anomalies

This example uses a deep learning model to detect abnormal heartbeat sequences in electrocardiogram (ECG) data.

ECG data with detected anomaly shown in red.

ECG data with detected anomaly shown in red.

Preparing Data for Anomaly Detection

Quality data is critical for building accurate anomaly detection algorithms. But collecting, exploring, and preprocessing data for anomaly detection is challenging—and not a one-size-fits-all approach. The techniques you apply depend on the data characteristics, hardware requirements, and the problem you’re trying to solve. Even with plenty of data, detecting anomalies can be difficult as the data comes from hundreds of noisy sensors. In these cases, extracting features and reducing data dimensions is essential.

Datasets for Anomaly Detection

Anomaly detection has flexible training data requirements compared to many traditional machine learning tasks—you don’t need labeled data. Anomaly detection algorithms are trained only on normal data or mostly normal data with some anomalies mixed in. With this type of data, you need to train an anomaly detection model to recognize what is normal—anything that isn’t normal is classified as an anomaly.

If you have a balanced, labeled dataset with two or more classes, apply an approach called supervised learning for multiclass fault detection.

Diagram of normal and mostly normal data is suitable for anomaly detection. Balanced and labeled data can be used for supervised learning.

Data suitable for anomaly detection.

Learn more about these approaches:

Exploring and Preprocessing Data

Before designing an algorithm, you need to understand what your data is telling you. Exploring and visualizing your data helps identify useful data sources, discern data patterns, reveal data quality, and determine necessary preprocessing steps. You can use the Signal Analyzer app to interactively explore and preprocess time series data.

Data collected from sensors is usually noisy, messy, and disorganized. There are numerous techniques and interactive apps in MATLAB for preprocessing steps like resampling, smoothing, organizing, filling missing data, and more.

Learn more about data preprocessing in MATLAB:

Extracting Features from Raw Data

While some AI models can operate directly on time series data, extracting features from data can improve model performance by highlighting the most important quantities that differentiate between normal and anomalous data. Feature extraction involves transforming raw data into meaningful numerical features that you can use to train a model.

For example, you might compute features like the mean, standard deviation, peak amplitude, or dominant frequency. These features summarize important behaviors in the time series data, making it easier for the model to learn what normal looks like.

Engineers familiar with their data often already know the most useful features to use. However, you can use the automated feature extraction process, which extracts common features (descriptive statistics, spectral measurements, and more) and then ranks them. The Diagnostic Feature Designer app in Predictive Maintenance Toolbox can help you extract, compare, and rank features interactively, then use these features to train an AI model.

Reducing Data Dimensionality

After collecting data and extracting features, it's common to end up with a huge number of them—sometimes hundreds per signal. This high dimensionality can make models slower to train, harder to interpret, and more prone to overfitting. Reducing dimensionality by selecting or combining only the most informative features helps simplify the model and focus the model training on the key features that indicate anomalies. Dimensionality reduction techniques, like Principal Component Analysis (PCA) or feature ranking, can help identify which features contribute most to anomaly detection.

Explore an example

Detect anomalies in industrial machinery using three-axis vibration data

This real-world example trains three different one-class AI models. The process includes data exploration, feature extraction, training the models, and evaluating their performance.

Three plots showing vibration data for three channels before and after maintenance.

Three-axis vibration data before and after maintenance on industrial equipment. The data after maintenance represents normal operation, while the data right before maintenance is an anomaly.

Implementation for the Real World

Engineers want to design anomaly detection algorithms so they can detect anomalies in real systems. Developing these algorithms is an iterative process.

This guide covers essential steps for developing an anomaly detection algorithm: exploring and preprocessing data, extracting features, and choosing algorithms. After development, you can evaluate the algorithm on new data to ensure it generalizes well. Finally, you can deploy it for real-time use and update it as needed with incremental learning.

Workflow of 5 steps in a loop: acquire and explore data, develop amomaly detection algorithm, test on new data, deploy to hardware, and update model.

Workflow of developing and deploying AI model-based anomaly detection model for real-world applications.

Deploying to Embedded Hardware

With MATLAB for Embedded AI, you can create Simulink blocks or generate C/C++ code to deploy anomaly detection algorithms to embedded devices (such as a microcontroller or ECU) for real-time anomaly detection. It is important to test these algorithms in simulation to ensure their effectiveness before deploying them to embedded hardware. Simulink provides a platform for integrating anomaly detection algorithms with other system components, enabling system-level real-time simulations.

Diagram representing deployment of an anomaly detection algorithm to a Raspberry Pi.

Deploying a Deep Signal Anomaly Detector Simulink block to Raspberry Pi® for real-time noise detection.

Incremental Anomaly Detection

Incremental anomaly detection methods continuously process incoming data and compute anomaly scores from a data stream in real time. An incremental model fits on data. MATLAB provides incrementalRobustRandomCutForest and incrementalOneClassSVM model objects for incremental anomaly detection. First, a model is trained to calculate an anomaly score. Then, the threshold is calculated and adjusted based on the anomaly score obtained from data used for training. The model will further detect anomalies, retrain itself, and adjust the threshold as new batches of data stream in.

Learn more about incremental anomaly detection.

MATLAB figure showing anomaly score, threshold, and detected anomalies over iterations across model fitting, threshold learning, and online estimating stages.

Model fitting, threshold learning, and online estimating stages of applying incremental learning techniques with time series data.

Key Takeaways

Time series anomaly detection involves identifying deviations from expected patterns, which can signal critical issues such as machine faults, health risks, or process inefficiencies. MATLAB provides a complete workflow for detecting anomalies in time series data, including data exploration and preprocessing, dimensionality reduction and feature extraction, and selecting appropriate algorithms. With Simulink, you can integrate anomaly detection algorithms into larger system models to verify overall system behavior through simulation. The workflow also supports embedded deployment, enabling you to generate C/C++ code to deploy anomaly detection algorithms on edge devices or in real-time applications.

Discover more about anomaly detection and predictive maintenance.