# incrementalDriftAwareLearner

## Description

`incrementalDriftAwareLearner` creates an `incrementalDriftAwareLearner` model object, which incorporates an incremental classification or regression learner and an incremental concept drift detector to provide a self-adjusting incremental machine learning model. `incrementalDriftAwareLearner` supports all classification and regression models for incremental learning and all concept drift detection methods supported by Statistics and Machine Learning Toolbox™.

Unlike most Statistics and Machine Learning Toolbox model objects, `incrementalDriftAwareLearner` can be called directly. After you create an `incrementalDriftAwareLearner` object, it is prepared for incremental drift-aware learning.

`incrementalDriftAwareLearner` is best suited for incremental learning that adapts to concept drift. For a traditional approach to batch drift detection, see `detectdrift`.

## Creation

You can create an `incrementalDriftAwareLearner` model object in the following ways:

- Initiate an incremental classification or regression learner using any incremental learner. Pass the incremental learning model as an input in the call to `incrementalDriftAwareLearner`. For example, `BaseLearner = incrementalClassificationLinear(); Mdl = incrementalDriftAwareLearner(BaseLearner);`

- Initiate an incremental classification or regression learner using any incremental learner. Initiate an incremental concept drift detector using `incrementalConceptDriftDetector`. Pass both the incremental learning model and the concept drift detector as inputs in the call to `incrementalDriftAwareLearner`. For example, `BaseLearner = incrementalRegressionKernel(); DDM = incrementalConceptDriftDetector("ddm"); Mdl = incrementalDriftAwareLearner(BaseLearner,DriftDetector=DDM);`

### Syntax

`Mdl = incrementalDriftAwareLearner(BaseLearner)`

`Mdl = incrementalDriftAwareLearner(BaseLearner,Name=Value)`

### Description

`Mdl = incrementalDriftAwareLearner(BaseLearner)` returns a drift-aware model `Mdl` for incremental learning with default model parameters and the default drift detector.

`Mdl = incrementalDriftAwareLearner(BaseLearner,Name=Value)` sets additional options using name-value arguments. For example, `incrementalDriftAwareLearner(BaseLearner,DriftDetector=CDDetector,TrainingPeriod=1000)` specifies the concept drift detector as a predefined `CDDetector` and sets the training period to 1000 observations.
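As a minimal sketch of the two syntaxes, the following creates one learner with defaults and one with name-value arguments, then reads back the settings stored at creation. It uses only functions named on this page. The variable names `MdlDefault` and `MdlCustom` are illustrative, and the displayed values depend on the defaults of the base learner.

```matlab
% Base learner and detector, created with functions described on this page
BaseLearner = incrementalClassificationLinear();
CDDetector = incrementalConceptDriftDetector("hddma");

% Syntax 1: default drift detector and default options
MdlDefault = incrementalDriftAwareLearner(BaseLearner);

% Syntax 2: predefined detector and a shorter training period
MdlCustom = incrementalDriftAwareLearner(BaseLearner, ...
    DriftDetector=CDDetector,TrainingPeriod=1000);

% Inspect read-only properties set at creation
disp(MdlDefault.TrainingPeriod)
disp(MdlCustom.TrainingPeriod)
disp(class(MdlCustom.DriftDetector))
```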

### Input Arguments

`BaseLearner` — Underlying incremental classification or regression model

`incrementalClassificationKernel` object | `incrementalClassificationLinear` object | `incrementalClassificationECOC` object | `incrementalClassificationNaiveBayes` object | `incrementalRegressionKernel` object | `incrementalRegressionLinear` object

Underlying incremental classification or regression model, specified as one of the model objects listed above. To learn how to create these learners, refer to the corresponding reference page.

**Name-Value Arguments**

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `BufferSize=5000,TrainingPeriod=8000,StableCountLimit=6000` specifies the buffer size as 5000, the training period as 8000, and the limit of consecutive stable states before a reset as 6000 observations.

`BufferSize` — Size of buffer to store loss values

7000 (default) | scalar integer

Size of the buffer to store the loss values of `BaseLearner` for each training observation, specified as a scalar integer.

**Example:** `BufferSize=5000`

**Data Types:** `single` | `double`

`DriftDetector` — Incremental concept drift detector

`HoeffdingDriftDetectionMethod` object | `DriftDetectionMethod` object

Incremental concept drift detector used for monitoring and detecting drift, specified as a `HoeffdingDriftDetectionMethod` or `DriftDetectionMethod` object.

- If `BaseLearner` is an incremental classification object, then the default detector is a `HoeffdingDriftDetectionMethod` object that uses the moving average method. That is, `incrementalDriftAwareLearner` creates the drift detector using `incrementalConceptDriftDetector("hddma")`.

- If `BaseLearner` is an incremental regression object, then the default detector is a `HoeffdingDriftDetectionMethod` object that uses the moving average method for continuous variables. That is, `incrementalDriftAwareLearner` creates the drift detector using `incrementalConceptDriftDetector("hddma",InputType="continuous")`.

To specify an incremental concept drift detector that uses a different method, see the `incrementalConceptDriftDetector` reference page.

**Example:** `DriftDetector=dd`

`TrainingPeriod` — Number of observations used for training

10000 (default) | scalar integer

Number of observations used for training, specified as a scalar integer.

If you specify the `TrainingPeriod` value as `Inf`, then the software always trains with incoming data.

If the `TrainingPeriod` value is smaller than the `BaseLearner.MetricsWarmupPeriod` value, then `incrementalDriftAwareLearner` sets the `TrainingPeriod` value to `BaseLearner.MetricsWarmupPeriod`.

**Example:** `TrainingPeriod=7000`

**Data Types:** `single` | `double`
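The adjustment rule for `TrainingPeriod` amounts to taking the larger of the two values. The following is a minimal sketch in plain MATLAB, not toolbox code. The variables `requestedTrainingPeriod` and `metricsWarmupPeriod` are illustrative stand-ins for the name-value argument and for `BaseLearner.MetricsWarmupPeriod`.

```matlab
requestedTrainingPeriod = 500;   % hypothetical value passed as TrainingPeriod
metricsWarmupPeriod = 1000;      % hypothetical BaseLearner.MetricsWarmupPeriod

% The training period is never shorter than the metrics warm-up period
effectiveTrainingPeriod = max(requestedTrainingPeriod,metricsWarmupPeriod);
disp(effectiveTrainingPeriod)    % displays 1000

% Inf exceeds any finite warm-up period, so it passes through unchanged
alwaysTrain = max(Inf,metricsWarmupPeriod);
disp(alwaysTrain)                % displays Inf
```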

`StableCountLimit` — Maximum number of consecutive `'Stable'` observations before soft reset

40000 (default) | scalar integer

Maximum number of consecutive `'Stable'` observations before a soft reset, specified as a scalar integer.

**Example:** `StableCountLimit=35000`

**Data Types:** `single` | `double`

`WarningCountLimit` — Maximum number of consecutive `'Warning'` observations before reset

1400 (default) | scalar integer

Maximum number of consecutive `'Warning'` observations before a reset, specified as a scalar integer.

**Example:** `WarningCountLimit=1000`

**Data Types:** `single` | `double`

## Properties

`BaseLearner` — Underlying incremental classification or regression model

`incrementalClassificationKernel` object | `incrementalClassificationLinear` object | `incrementalClassificationECOC` object | `incrementalClassificationNaiveBayes` object | `incrementalRegressionKernel` object | `incrementalRegressionLinear` object

This property is read-only.

Underlying incremental classification or regression model, specified as one of the model objects listed above.

This property is set by the `BaseLearner` input argument.

Access the properties of `BaseLearner` using the dot operator, for example, `Mdl.BaseLearner.Solver`.

`DriftDetector` — Underlying incremental concept drift detector

`HoeffdingDriftDetectionMethod` object | `DriftDetectionMethod` object

This property is read-only.

Underlying incremental concept drift detector, specified as either a `HoeffdingDriftDetectionMethod` or `DriftDetectionMethod` object.

This property is set by the `DriftDetector` name-value argument.

Access the properties of `DriftDetector` using the dot operator, for example, `Mdl.DriftDetector.WarningThreshold`.
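The following is a short sketch of reading nested properties with the dot operator. The property names `Solver` and `WarningThreshold` are the ones used as examples on this page. The model is created with default settings purely for illustration, and `solverName` and `warnThreshold` are hypothetical variable names.

```matlab
% Default drift-aware learner wrapping a default linear classifier
Mdl = incrementalDriftAwareLearner(incrementalClassificationLinear());

% Property of the wrapped base learner
solverName = Mdl.BaseLearner.Solver;

% Property of the wrapped drift detector
warnThreshold = Mdl.DriftDetector.WarningThreshold;

disp(solverName)
disp(warnThreshold)
```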

`TrainingPeriod` — Number of observations used for training

scalar integer

This property is read-only.

Number of observations used for training before the software starts monitoring for potential drift, specified as a scalar integer.

This property is set by the `TrainingPeriod` name-value argument.

**Data Types:** `double`

`StableCountLimit` — Maximum number of consecutive `'Stable'` observations before a soft reset

scalar integer

This property is read-only.

Maximum number of consecutive `'Stable'` observations before a soft reset, specified as a scalar integer.

This property is set by the `StableCountLimit` name-value argument.

**Data Types:** `double`

`PreviousDriftStatus` — Status of `DriftDetector` prior to training most recent data

`'Stable'` | `'Warning'` | `'Drift'`

This property is read-only.

Status of `DriftDetector` prior to training with the most recent data, specified as `'Stable'`, `'Warning'`, or `'Drift'`.

**Data Types:** `char`

`DriftStatus` — Current status of `DriftDetector`

`'Stable'` | `'Warning'` | `'Drift'`

This property is read-only.

Current status of `DriftDetector` after training with the most recent data, specified as `'Stable'`, `'Warning'`, or `'Drift'`.

**Data Types:** `char`

`DriftDetected` — Flag indicating whether `DriftStatus` is `'Drift'`

`false` or `0` | `true` or `1`

This property is read-only.

Flag indicating whether `DriftStatus` is `'Drift'`, specified as logical `0` (`false`) or `1` (`true`).

**Data Types:** `logical`

`WarningCountLimit` — Maximum number of consecutive `'Warning'` observations before a reset

scalar integer

This property is read-only.

Maximum number of consecutive `'Warning'` observations before a reset, specified as a scalar integer.

**Data Types:** `double`

`WarningDetected` — Flag indicating whether `DriftStatus` is `'Warning'`

`false` or `0` | `true` or `1`

This property is read-only.

Flag indicating whether `DriftStatus` is `'Warning'`, specified as logical `0` (`false`) or `1` (`true`).

**Data Types:** `logical`

`IsTraining` — Flag indicating whether `BaseLearner` continues training with incoming data

`false` or `0` | `true` or `1`

This property is read-only.

Flag indicating whether `BaseLearner` continues training with incoming data, specified as logical `0` (`false`) or `1` (`true`).

**Data Types:** `logical`

`IsWarm` — Flag indicating whether model tracks performance metrics

`false` or `0` | `true` or `1`

This property is read-only.

Flag indicating whether the incremental model tracks performance metrics, specified as logical `0` (`false`) or `1` (`true`).

`incrementalDriftAwareLearner` takes this property from `Mdl.BaseLearner`.

The incremental model `Mdl` is *warm* (`IsWarm` becomes `true`) after incremental fitting functions fit (`Mdl.BaseLearner.EstimationPeriod` + `MetricsWarmupPeriod`) observations to the incremental model.

Value | Description |
---|---|
`true` or `1` | The incremental model `Mdl` is warm. Consequently, `updateMetrics` and `updateMetricsAndFit` track performance metrics in the `Metrics` property of `Mdl`. |
`false` or `0` | The incremental model `Mdl` is not warm. `updateMetrics` and `updateMetricsAndFit` do not track performance metrics. |

**Data Types:** `logical`

`NumPredictors` — Number of predictor variables

nonnegative numeric scalar

This property is read-only.

Number of predictor variables, specified as a nonnegative numeric scalar.

`incrementalDriftAwareLearner` takes this property from `Mdl.BaseLearner`. You can specify the number of predictor variables during the initiation of `BaseLearner`.

**Data Types:** `double`

`NumTrainingObservations` — Number of observations fit to incremental model

`0` (default) | nonnegative numeric scalar

This property is read-only.

Number of observations fit to the incremental model `Mdl`, specified as a nonnegative numeric scalar.

`incrementalDriftAwareLearner` takes this property from `Mdl.BaseLearner`.

`NumTrainingObservations` increases when you pass `Mdl` and training data to `fit` or `updateMetricsAndFit`.

**Note**

If you convert a traditionally trained model to create `Mdl.BaseLearner`, `incrementalDriftAwareLearner` does not add the number of observations fit to the traditionally trained model to `NumTrainingObservations`.

**Data Types:** `double`

`Metrics` — Model performance metrics

table

This property is read-only.

Model performance metrics updated during incremental learning by `updateMetrics` or `updateMetricsAndFit`, specified as a table with two columns and *m* rows, where *m* is the number of metrics specified by the `Metrics` name-value argument during the initiation of `BaseLearner`.

`incrementalDriftAwareLearner` takes this property from `Mdl.BaseLearner`.

The columns of `Metrics` are labeled `Cumulative` and `Window`.

- `Cumulative` – Element `j` is the model performance, as measured by metric `j`, from the time the model becomes warm (`IsWarm` is `1`).

- `Window` – Element `j` is the model performance, as measured by metric `j`, evaluated over all observations within the window specified by the `MetricsWindowSize` property. The software updates `Window` after it processes `MetricsWindowSize` observations.

Rows are labeled by the specified metrics.

**Data Types:** `table`
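To make the difference between the two columns concrete, the following plain MATLAB sketch computes a running cumulative error and a window error from a made-up vector of per-observation losses. It is a simplified reading of the description above, not the toolbox implementation. The loss values, the variable names, and the assumption that the window value is refreshed once per full window are all illustrative.

```matlab
% Hypothetical per-observation classification losses (1 = misclassified)
losses = [0 1 0 0 1 1 0 0 0 1 0 0];
windowSize = 4;                      % stands in for MetricsWindowSize

cumulativeErr = zeros(size(losses));
windowErr = nan(size(losses));       % NaN until the first full window
lastWindow = NaN;
for k = 1:numel(losses)
    % Cumulative: average over everything processed so far
    cumulativeErr(k) = mean(losses(1:k));
    if mod(k,windowSize) == 0
        % Window: refreshed only after a full window is processed
        lastWindow = mean(losses(k-windowSize+1:k));
    end
    windowErr(k) = lastWindow;
end

fprintf('Cumulative: %.4f, Window: %.4f\n',cumulativeErr(end),windowErr(end))
```

In this sketch the cumulative value changes with every observation, while the window value stays fixed between window boundaries.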

`MetricsWarmupPeriod` — Number of observations fit before tracking performance metrics

nonnegative integer

This property is read-only.

Number of observations to which the incremental model must be fit before it tracks performance metrics in its `Metrics` property, specified as a nonnegative integer.

`incrementalDriftAwareLearner` takes this property from `Mdl.BaseLearner`. You can specify the metrics warm-up period during the initiation of `BaseLearner`.

**Data Types:** `double`

`MetricsWindowSize` — Number of observations to use to compute window performance metrics

positive integer

This property is read-only.

Number of observations to use to compute window performance metrics, specified as a positive integer.

`incrementalDriftAwareLearner` takes this property from `Mdl.BaseLearner`. You can specify the metrics window size during the initiation of `BaseLearner`.

**Data Types:** `double`

## Object Functions

Function | Description |
---|---|
`fit` | Train drift-aware learner for incremental learning with new data |
`loss` | Regression or classification error of incremental drift-aware learner |
`perObservationLoss` | Per observation regression or classification error of incremental drift-aware learner |
`predict` | Predict responses for new observations from incremental drift-aware learning model |
`reset` | Reset incremental drift-aware learner |
`updateMetrics` | Update performance metrics in incremental drift-aware learning model given new data |
`updateMetricsAndFit` | Update performance metrics in incremental drift-aware learning model given new data and train model |

## Examples

### Create Incremental Drift-Aware Learner Without Any Prior Information

Load the human activity data set. Randomly shuffle the data.

    load humanactivity;
    n = numel(actid);
    rng(1) % For reproducibility
    idx = randsample(n,n);

For details on the data set, enter `Description` at the command line.

Define the predictor and response variables.

    X = feat(idx,:);
    Y = actid(idx);

Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (`actid > 2`).

    Y = Y > 2;

Flip labels for the second half of the data set to simulate drift.

    Y(floor(numel(Y)/2):end,:) = ~Y(floor(numel(Y)/2):end,:);

Initiate a default incremental drift-aware model for classification as follows:

Create a default incremental linear SVM model for binary classification.

Initiate a default incremental drift-aware model using the incremental linear SVM model.

    incMdl = incrementalClassificationLinear();
    idaMdl = incrementalDriftAwareLearner(incMdl);

`idaMdl` is an `incrementalDriftAwareLearner` model. All its properties are read-only.

Specify the number of observations in each chunk for creating a stream of data, and preallocate the variable to store the classification error.

    numObsPerChunk = 50;
    nchunk = floor(n/numObsPerChunk);
    ce = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);

Preallocate variables for tracking drift status.

    status = zeros(nchunk,1);
    statusname = strings(nchunk,1);

Simulate a data stream with incoming chunks of 50 observations each. At each iteration:

Call `updateMetricsAndFit` to update the performance metrics and fit the drift-aware model to the incoming data. Overwrite the previous incremental model with the new one.

Store the cumulative and per iteration classification error in `ce`. The `Metrics` property of `idaMdl` stores the cumulative and window classification error, which is updated at each iteration.

    for j = 1:nchunk
        ibegin = min(n,numObsPerChunk*(j-1)+1);
        iend = min(n,numObsPerChunk*j);
        idx = ibegin:iend;
        idaMdl = updateMetricsAndFit(idaMdl,X(idx,:),Y(idx));
        statusname(j) = string(idaMdl.DriftStatus);
        if idaMdl.DriftDetected
            status(j) = 2;
        elseif idaMdl.WarningDetected
            status(j) = 1;
        else
            status(j) = 0;
        end
        ce{j,:} = idaMdl.Metrics{"ClassificationError",:};
    end

The `updateMetricsAndFit` function first evaluates the performance of the model by calling `updateMetrics` on incoming data, and then fits the model to data by calling `fit`:

The `updateMetrics` function evaluates the performance of the model as it processes incoming observations. The function writes specified metrics, measured cumulatively and within a specified window of processed observations, to the `Metrics` model property.

The `fit` function fits the model by updating the base learner and monitoring for drift given an incoming batch of data. When you call `fit`, the software performs the following procedure:

1. Trains the model until `NumTrainingObservations` reaches `TrainingPeriod`.
2. After training, the software starts tracking the model loss to see if any concept drift has occurred and updates the drift status accordingly.
3. When the drift status is `Warning`, the software trains a temporary model to replace the `BaseLearner` in preparation for an imminent drift.
4. When the drift status is `Drift`, the temporary model replaces the `BaseLearner`.
5. When the drift status is `Stable`, the software discards the temporary model.

For more information, see the **Algorithms** section.

Plot the cumulative and per window classification error. Mark the warmup and training periods, and where the drift was introduced.

    h = plot(ce.Variables);
    xlim([0 nchunk])
    ylabel("Classification Error")
    xlabel("Iteration")
    xline(idaMdl.MetricsWarmupPeriod/numObsPerChunk,"g-.","Warmup Period",LineWidth=1.5)
    xline(idaMdl.TrainingPeriod/numObsPerChunk,"b-.","Training Period",LabelVerticalAlignment="middle",LineWidth=1.5)
    xline(floor(numel(Y)/2)/numObsPerChunk,"m--","Drift",LabelVerticalAlignment="middle",LineWidth=1.5)
    legend(h,ce.Properties.VariableNames)
    legend(h,Location="best")

Plot the drift status versus the iteration number.

    figure()
    gscatter(1:nchunk,status,statusname,'gmr','*ox',[4 5 5],'on',"Iteration","Drift Status","filled")

### Compute Performance Metrics and Monitor Concept Drift

Create the random concept data and concept drift generator using the helper functions `HelperSineGenerator` and `HelperConceptDriftGenerator`, respectively.

    concept1 = HelperSineGenerator(ClassificationFunction=1,IrrelevantFeatures=true,TableOutput=false);
    concept2 = HelperSineGenerator(ClassificationFunction=3,IrrelevantFeatures=true,TableOutput=false);
    driftGenerator = HelperConceptDriftGenerator(concept1,concept2,15000,1000);

When `ClassificationFunction` is 1, `HelperSineGenerator` labels all points that satisfy *x1* < *sin(x2)* as 1; otherwise, the function labels them as 0. When `ClassificationFunction` is 3, this is reversed. That is, `HelperSineGenerator` labels all points that satisfy *x1* >= *sin(x2)* as 1; otherwise, the function labels them as 0 [2]. The software returns the data in matrices for use in incremental learners.
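The two labeling rules can be sketched in plain MATLAB as follows. This is not the helper's implementation. The predictor matrix `x`, its range, and the variable names are assumptions made for illustration.

```matlab
numObs = 5;
x = rand(numObs,2);                        % hypothetical predictors x1 and x2

labelsFunction1 = x(:,1) < sin(x(:,2));    % rule for ClassificationFunction = 1
labelsFunction3 = x(:,1) >= sin(x(:,2));   % rule for ClassificationFunction = 3

% The two rules are complements, so every label flips between concepts
labelsAreFlipped = all(labelsFunction1 == ~labelsFunction3);
disp(labelsAreFlipped)
```

Because every label flips when the stream moves from the first concept to the second, a model trained on the first concept is expected to see a sharp rise in classification error after the drift.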

`HelperConceptDriftGenerator` establishes the concept drift. The object uses a sigmoid function `1./(1+exp(-4*(numobservations-position)./width))` to decide the probability of choosing the first stream when generating data [3]. In this case, the position argument is 15000 and the width argument is 1000. As the number of observations exceeds the position value minus half of the width, the probability of sampling from the first stream when generating data decreases. The sigmoid function allows a smooth transition from one stream to the other. Larger width values indicate a larger transition period where both streams are approximately equally likely to be selected.
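To get a feel for the transition, the following sketch evaluates the sigmoid expression quoted above at a few observation counts around the position value. It only evaluates the formula with the position and width from this example. How the helper maps the result to a stream choice is defined inside the helper and is not shown here.

```matlab
position = 15000;
width = 1000;
numobservations = [14000 14500 15000 15500 16000];

% Expression quoted in the text, evaluated elementwise
s = 1./(1+exp(-4*(numobservations-position)./width));

% Prints 0.0180, 0.1192, 0.5000, 0.8808, 0.9820
fprintf('%.4f\n',s)
```

The value of the expression is 0.5 exactly at the position value, and most of the change happens within about half a width on either side of it.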

Initiate an incremental drift-aware model for classification as follows:

Create an incremental Naive Bayes classification model for binary classification.

Initiate an incremental concept drift detector that uses the Hoeffding's Bounds Drift Detection Method with moving average (HDDMA).

Using the incremental Naive Bayes model and the concept drift detector, initiate an incremental drift-aware model. Specify the training period as 5000 observations.

    BaseLearner = incrementalClassificationNaiveBayes(MaxNumClasses=2,Metrics="classiferror");
    dd = incrementalConceptDriftDetector("hddma");
    idal = incrementalDriftAwareLearner(BaseLearner,DriftDetector=dd,TrainingPeriod=5000);

Specify the number of observations in each chunk and the number of iterations for creating a stream of data.

    numObsPerChunk = 10;
    numIterations = 4000;

Preallocate the variables for tracking the drift status and drift time, and for storing the classification error.

    dstatus = zeros(numIterations,1);
    statusname = strings(numIterations,1);
    driftTimes = [];
    ce = array2table(zeros(numIterations,2),VariableNames=["Cumulative" "Window"]);

Simulate a data stream with incoming chunks of 10 observations each and perform incremental drift-aware learning. At each iteration:

Simulate predictor data and labels, and update `driftGenerator` using the helper function `hgenerate`.

Call `updateMetricsAndFit` to update the performance metrics and fit the incremental drift-aware model to the incoming data.

Track and record the drift status and the classification error for visualization purposes.

    rng(12); % For reproducibility
    for j = 1:numIterations
        % Generate data
        [driftGenerator,X,Y] = hgenerate(driftGenerator,numObsPerChunk);
        % Update performance metrics and fit
        idal = updateMetricsAndFit(idal,X,Y);
        % Record drift status and classification error
        statusname(j) = string(idal.DriftStatus);
        ce{j,:} = idal.Metrics{"ClassificationError",:};
        if idal.DriftDetected
            dstatus(j) = 2;
        elseif idal.WarningDetected
            dstatus(j) = 1;
        else
            dstatus(j) = 0;
        end
        if idal.DriftDetected
            driftTimes(end+1) = j;
        end
    end

Plot the cumulative and per window classification error. Mark the warmup and training periods, and where the drift was introduced.

    h = plot(ce.Variables);
    xlim([0 numIterations])
    ylim([0 0.22])
    ylabel("Classification Error")
    xlabel("Iteration")
    xline(idal.MetricsWarmupPeriod/numObsPerChunk,"g-.","Warmup Period",LineWidth=1.5)
    xline(idal.MetricsWarmupPeriod/numObsPerChunk+driftTimes,"g-.","Warmup Period",LineWidth=1.5)
    xline(idal.TrainingPeriod/numObsPerChunk,"b-.","Training Period",LabelVerticalAlignment="middle",LineWidth=1.5)
    xline(driftTimes,"m--","Drift",LabelVerticalAlignment="middle",LineWidth=1.5)
    legend(h,ce.Properties.VariableNames)
    legend(h,Location="best")

Plot the drift status versus the iteration number.

    gscatter(1:numIterations,dstatus,statusname,"gmr","o",5,"on","Iteration","Drift Status","filled")

### Monitor Concept Drift in Regression Model

Create the random concept data and the concept drift generator using the helper functions `HelperRegrGenerator` and `HelperConceptDriftGenerator`, respectively.

    concept1 = HelperRegrGenerator(NumFeatures=100,NonZeroFeatures=[1,20,40,50,55], ...
        FeatureCoefficients=[4,5,10,-2,-6],NoiseStd=1.1,TableOutput=false);
    concept2 = HelperRegrGenerator(NumFeatures=100,NonZeroFeatures=[10,20,45,56,80], ...
        FeatureCoefficients=[4,5,10,-2,-6],NoiseStd=1.1,TableOutput=false);
    driftGenerator = HelperConceptDriftGenerator(concept1,concept2,15000,1000);

`HelperRegrGenerator` generates streaming data using the features and feature coefficients for regression specified in the call to the function. At each step, the function samples the predictors from a normal distribution. Then, the function computes the response from the feature coefficients and predictor values, and adds random noise from a normal distribution with mean zero and the specified noise standard deviation. The software returns the data in matrices for use in incremental learners.
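The data-generating process described above can be sketched in plain MATLAB as follows. This is an illustration under stated assumptions, not the helper's code. It assumes standard normal predictors, since the text does not give the parameters of the predictor distribution, and the variable names are hypothetical.

```matlab
numObs = 10;
numFeatures = 100;
nonZeroFeatures = [1 20 40 50 55];        % values from the first concept
featureCoefficients = [4 5 10 -2 -6];
noiseStd = 1.1;

% Sparse coefficient vector: only the listed features affect the response
beta = zeros(numFeatures,1);
beta(nonZeroFeatures) = featureCoefficients;

X = randn(numObs,numFeatures);            % predictors, assumed N(0,1)
Y = X*beta + noiseStd*randn(numObs,1);    % linear response plus noise

disp(size(X))
disp(size(Y))
```

The second concept uses the same coefficients on a different set of features, so a model fit to the first concept weights predictors that mostly no longer drive the response after the drift.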

`HelperConceptDriftGenerator` establishes the concept drift. The object uses a sigmoid function `1./(1+exp(-4*(numobservations-position)./width))` to decide the probability of choosing the first stream when generating data [3]. In this case, the position argument is 15000 and the width argument is 1000. As the number of observations exceeds the position value minus half of the width, the probability of sampling from the first stream when generating data decreases. The sigmoid function allows a smooth transition from one stream to the other. Larger width values indicate a larger transition period where both streams are approximately equally likely to be selected.

Initiate an incremental drift-aware model for regression as follows:

Create an incremental linear model for regression. Specify the linear regression model type and solver type.

Initiate an incremental concept drift detector that uses the Hoeffding's Bounds Drift Detection Method with moving average (HDDMA).

Using the incremental linear model and the concept drift detector, initiate an incremental drift-aware model. Specify the training period as 6000 observations.

    baseMdl = incrementalRegressionLinear(Learner="leastsquares",Solver="sgd",EstimationPeriod=1000,Standardize=false);
    dd = incrementalConceptDriftDetector("hddma",Alternative="greater",InputType="continuous",WarmupPeriod=1000);
    idal = incrementalDriftAwareLearner(baseMdl,DriftDetector=dd,TrainingPeriod=6000);

Specify the number of observations in each chunk and the number of iterations for creating a stream of data.

    numObsPerChunk = 10;
    numIterations = 4000;

Preallocate the variables for tracking the drift status and drift time, and for storing the regression error.

    dstatus = zeros(numIterations,1);
    statusname = strings(numIterations,1);
    driftTimes = [];
    ce = array2table(zeros(numIterations,2),VariableNames=["Cumulative" "Window"]);

Simulate a data stream with incoming chunks of 10 observations each and perform incremental drift-aware learning. At each iteration:

Simulate predictor data and responses, and update the drift generator using the helper function `hgenerate`.

Call `updateMetricsAndFit` to update the performance metrics and fit the incremental drift-aware model to the incoming data.

Track and record the drift status and the regression error for visualization purposes.

    rng(12); % For reproducibility
    for j = 1:numIterations
        % Generate data
        [driftGenerator,X,Y] = hgenerate(driftGenerator,numObsPerChunk);
        % Update performance metrics and fit
        idal = updateMetricsAndFit(idal,X,Y);
        % Record drift status and regression error
        statusname(j) = string(idal.DriftStatus);
        ce{j,:} = idal.Metrics{"MeanSquaredError",:};
        if idal.DriftDetected
            dstatus(j) = 2;
        elseif idal.WarningDetected
            dstatus(j) = 1;
        else
            dstatus(j) = 0;
        end
        if idal.DriftDetected
            driftTimes(end+1) = j;
        end
    end

Plot the cumulative and per window regression error. Mark the warmup and training periods, and where the drift was introduced.

    h = plot(ce.Variables);
    xlim([0 numIterations])
    ylabel("Mean Squared Error")
    xlabel("Iteration")
    xline((idal.MetricsWarmupPeriod+idal.BaseLearner.EstimationPeriod)/numObsPerChunk,"g-.","Warmup Period",LineWidth=1.5)
    xline(idal.TrainingPeriod/numObsPerChunk,"b-.","Training Period",LabelVerticalAlignment="middle",LineWidth=1.5)
    xline(driftTimes,"m--","Drift",LabelVerticalAlignment="middle",LineWidth=1.5)
    legend(h,ce.Properties.VariableNames)
    legend(h,Location="best")

Plot the drift status versus the iteration number.

    gscatter(1:numIterations,dstatus,statusname,'gmr','o',5,'on',"Iteration","Drift Status","filled")

## Algorithms

### Incremental Drift-Aware Learning

*Incremental learning*, or *online
learning*, is a branch of machine learning concerned with processing incoming
data from a data stream, possibly given little to no knowledge of the distribution of the
predictor variables, aspects of the prediction or objective function (including tuning
parameter values), or whether the observations are labeled. Incremental learning differs
from traditional machine learning, where enough labeled data is available to fit to a model,
perform cross-validation to tune hyperparameters, and infer the predictor distribution. For
more details, see Incremental Learning Overview.

Unlike other incremental learning functionality offered by Statistics and Machine Learning Toolbox, the `incrementalDriftAwareLearner` model object combines incremental learning and concept drift detection.

After creating an `incrementalDriftAwareLearner` object, use `updateMetrics` to update model performance metrics and `fit` to fit the base model to an incoming chunk of data, check for potential drift in the model performance (concept drift), and update or reset the incremental drift-aware learner, if necessary. You can also use `updateMetricsAndFit`. The `fit` function implements the Reactive Drift Detection Method (RDDM) [1] as follows:

1. After `Mdl.BaseLearner.EstimationPeriod` (if necessary) and `MetricsWarmupPeriod`, the function trains the incremental drift-aware model until `NumTrainingObservations` reaches `TrainingPeriod`. (If the `TrainingPeriod` value is smaller than the `Mdl.BaseLearner.MetricsWarmupPeriod` value, then `incrementalDriftAwareLearner` sets the `TrainingPeriod` value to `Mdl.BaseLearner.MetricsWarmupPeriod`.)

2. When `NumTrainingObservations > TrainingPeriod`, the software starts tracking the model loss. The software computes the per observation loss using the `perObservationLoss` function. While computing the per observation loss, the software uses the `"classiferror"` loss metric for classification models and `"squarederror"` for regression models. The function then appends the loss values computed using the last chunk of data to the existing buffer loss values.

3. Next, the software checks to see if any concept drift occurred by using the `detectdrift` function and updates `DriftStatus` accordingly.

Based on the drift status, `fit`

performs the following procedure:

- `DriftStatus` is `'Warning'`: The software first increases the consecutive `'Warning'` status count by 1.
  - If the consecutive `'Warning'` status count is less than the `WarningCountLimit` value and the `PreviousDriftStatus` value is `'Stable'`, then the software trains a temporary incremental learner (if one does not exist) and sets it (or the existing one) to `BaseLearner`. Then the software resets the temporary incremental learner using the learner's `reset` function.
  - If the consecutive `'Warning'` status count is less than the `WarningCountLimit` value and the `PreviousDriftStatus` value is `'Warning'`, then the software trains the existing temporary incremental model using the latest chunk of data.
  - If the consecutive `'Warning'` status count is more than the `WarningCountLimit` value, then the software sets the `DriftStatus` value to `'Drift'`.

- `DriftStatus` is `'Drift'`: The software performs the following steps.
  - Sets the consecutive `'Warning'` status count to 0.
  - Resets `DriftDetector` using the `reset` function.
  - Empties the buffer loss values and appends the loss values for the latest chunk of data to the buffer loss values.
  - If the temporary incremental model is not empty, then the software sets the current `BaseLearner` value to the temporary incremental model and empties the temporary incremental model.
  - If the temporary incremental model is empty, then the software resets the `BaseLearner` value by using the learner's `reset` function.

- `DriftStatus` is `'Stable'`: The software first increases the consecutive `'Stable'` status count by 1.
  - If the consecutive `'Stable'` status count is less than the `StableCountLimit` value and the `PreviousDriftStatus` value is `'Warning'`, then the software sets the number of warnings to zero and empties the temporary model.
  - If the consecutive `'Stable'` status count is more than the `StableCountLimit` value, then the software resets the `DriftDetector` using the `reset` function. Then the software tests all of the saved loss values in the buffer for concept drift by using the `detectdrift` function.
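The status handling above can be traced with a small plain-MATLAB sketch. It is a simplified illustration under stated assumptions, not the toolbox code: the limits, the `detected` sequence, and every variable name are hypothetical; a logical flag stands in for the temporary incremental learner; a count exactly equal to the limit is treated as not yet exceeding it; a warning is assumed to end a run of consecutive stable statuses; and the detector reset and buffer retest are left out.

```matlab
% Hypothetical limits and detector outputs (illustrative, not toolbox defaults).
warningCountLimit = 3;
stableCountLimit  = 5;
detected = ["Stable" "Warning" "Warning" "Warning" "Warning" "Stable"];

warningCount   = 0;
stableCount    = 0;
hasTempLearner = false;      % stands in for the temporary incremental learner
previousStatus = "Stable";
finalStatus    = strings(size(detected));

for k = 1:numel(detected)
    status = detected(k);

    if status == "Warning"
        warningCount = warningCount + 1;
        stableCount  = 0;                 % assumption: a warning breaks a stable run
        if warningCount > warningCountLimit
            status = "Drift";             % too many consecutive warnings
        else
            hasTempLearner = true;        % create or keep training the temporary learner
        end
    end

    if status == "Drift"
        warningCount   = 0;
        hasTempLearner = false;           % promoted to BaseLearner, or nothing to promote
    elseif status == "Stable"
        stableCount = stableCount + 1;
        if stableCount < stableCountLimit && previousStatus == "Warning"
            warningCount   = 0;           % the warning episode ended without drift
            hasTempLearner = false;
        end
        % When stableCount exceeds stableCountLimit, the detector would be
        % reset and the buffered losses retested (omitted in this sketch).
    end

    finalStatus(k) = status;
    previousStatus = status;
end

disp(finalStatus)
```

With these inputs, the fourth consecutive warning exceeds the limit of 3, so the fifth status is escalated to drift and the warning count returns to zero.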

Once `DriftStatus` is set to `'Drift'` and the `BaseLearner` and `DriftDetector` are reset, the software waits until it has processed `Mdl.BaseLearner.EstimationPeriod` + `Mdl.BaseLearner.MetricsWarmupPeriod` observations before it starts computing the performance metrics.

### Performance Metrics

- The `updateMetrics` and `updateMetricsAndFit` functions track model performance metrics (`Metrics`) from new data when the incremental model is *warm* (`Mdl.BaseLearner.IsWarm` property). An incremental model becomes warm after `fit` or `updateMetricsAndFit` fits the incremental model to `MetricsWarmupPeriod` observations, which is the *metrics warm-up period*.

  If `Mdl.BaseLearner.EstimationPeriod` > 0, the functions estimate hyperparameters before fitting the model to data. Therefore, the functions must process an additional `EstimationPeriod` observations before the model starts the metrics warm-up period.

- The `Metrics` property of the incremental model stores two forms of each performance metric as variables (columns) of a table, `Cumulative` and `Window`, with individual metrics in rows. When the incremental model is warm, `updateMetrics` and `updateMetricsAndFit` update the metrics at the following frequencies:

  - `Cumulative`: The functions compute cumulative metrics since the start of model performance tracking. The functions update metrics every time you call the functions, and base the calculation on the entire supplied data set until a model reset.
  - `Window`: The functions compute metrics based on all observations within a window determined by the `MetricsWindowSize` name-value argument. `MetricsWindowSize` also determines the frequency at which the software updates `Window` metrics. For example, if `MetricsWindowSize` is 20, the functions compute metrics based on the last 20 observations in the supplied data (`X((end - 20 + 1):end,:)` and `Y((end - 20 + 1):end)`).

  Incremental functions that track performance metrics within a window use the following process:

  1. Store `MetricsWindowSize` values for each specified metric, and store the same number of observation weights.
  2. Populate elements of the metrics values with the model performance based on batches of incoming observations, and store the corresponding observation weights.
  3. When the window of observations is filled, overwrite `Mdl.Metrics.Window` with the weighted average performance in the metrics window. If the window is overfilled when the function processes a batch of observations, the latest incoming `MetricsWindowSize` observations are stored, and the earliest observations are removed from the window. For example, suppose `MetricsWindowSize` is 20, there are 10 stored values from a previously processed batch, and 15 values are incoming. To compose the length-20 window, the functions use the measurements from the 15 incoming observations and the latest 5 measurements from the previous batch.

- The software omits an observation with a `NaN` score when computing the `Cumulative` and `Window` performance metric values.
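The window example above (a window of 20, 10 stored values, 15 incoming values) can be reproduced with a short plain-MATLAB sketch. The variable names and the loss values are hypothetical, all observation weights are assumed to be 1, and the sketch only mirrors the described bookkeeping rather than the toolbox internals.

```matlab
metricsWindowSize = 20;

% Hypothetical per-observation measurements and weights.
storedLoss      = (1:10)';     % 10 values kept from a previous batch
incomingLoss    = (11:25)';    % 15 values from the incoming batch
storedWeights   = ones(10,1);
incomingWeights = ones(15,1);

% Combine old and new values, then keep only the latest
% metricsWindowSize entries (the earliest ones drop out).
combinedLoss    = [storedLoss; incomingLoss];
combinedWeights = [storedWeights; incomingWeights];
firstKept       = max(1, numel(combinedLoss) - metricsWindowSize + 1);
windowLoss      = combinedLoss(firstKept:end);
windowWeights   = combinedWeights(firstKept:end);

% Once the window is full, the window metric is the weighted average.
if numel(windowLoss) == metricsWindowSize
    windowMetric = sum(windowWeights .* windowLoss) / sum(windowWeights);
end
```

Here the retained window holds the values 6 through 25: the latest 5 stored measurements followed by all 15 incoming ones.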

## References

[1] Barros, Roberto S. M., et al. "RDDM: Reactive Drift Detection Method." *Expert Systems with Applications*, vol. 90, Dec. 2017, pp. 344-55. https://doi.org/10.1016/j.eswa.2017.08.023.

[2] Bifet, Albert, et al. "New Ensemble Methods for Evolving Data Streams." *Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*. ACM Press, 2009, p. 139. https://doi.org/10.1145/1557019.1557041.

[3] Gama, João, et al. "Learning with Drift Detection." *Advances in Artificial Intelligence – SBIA 2004*, edited by Ana L. C. Bazzan and Sofiane Labidi, vol. 3171, Springer Berlin Heidelberg, 2004, pp. 286–95. https://doi.org/10.1007/978-3-540-28645-5_29.

## Version History

**Introduced in R2022b**
