Bagging, which stands for “bootstrap aggregation”, is a type of ensemble learning. To bag a weak learner such as a decision tree on a dataset, fitcensemble generates many bootstrap replicas of the dataset and grows decision trees on these replicas. fitcensemble obtains each bootstrap replica by randomly selecting N observations out of N with replacement, where N is the dataset size. To find the predicted response of a trained ensemble, predict takes an average over predictions from individual trees.
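
A minimal sketch of this workflow, assuming the ionosphere sample dataset that ships with Statistics and Machine Learning Toolbox (the variable names and the number of observations to predict are illustrative):

    load ionosphere                            % X: predictors, Y: class labels
    rng(1)                                     % make the bootstrap replicas reproducible
    Mdl = fitcensemble(X,Y,'Method','Bag');    % grow bagged decision trees
    label = predict(Mdl,X(1:5,:));             % aggregate predictions over all trees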
Drawing N out of N observations with replacement omits on average about 37% of the observations for each decision tree: any given observation is left out of a replica with probability (1 − 1/N)^N, which approaches 1/e ≈ 0.368 for large N. These omitted observations are called “out-of-bag” observations.
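
You can check the 1/e figure numerically; this sketch uses an arbitrary dataset size N and one simulated bootstrap replica:

    N = 1e5;                                   % dataset size
    idx = randi(N,N,1);                        % one replica: N draws with replacement
    fracOmitted = 1 - numel(unique(idx))/N     % fraction never drawn, close to 1/e = 0.368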
For each observation, oobLoss estimates the out-of-bag prediction by averaging over predictions from all trees in the ensemble for which this observation is out of bag, and then compares that prediction against the true response for the observation. oobLoss computes the out-of-bag error by comparing the out-of-bag predicted responses against the true responses for all observations used for training. This out-of-bag estimate is an unbiased estimator of the true ensemble error.
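
Continuing the earlier sketch, both the per-observation out-of-bag predictions and the overall out-of-bag error are available directly, because the ensemble was trained with 'Method','Bag':

    oobLabel = oobPredict(Mdl);                % per-observation out-of-bag predictions
    oobErr = oobLoss(Mdl)                      % out-of-bag classification error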