resubLoss
Regression error by resubstitution
Syntax
L = resubLoss(tree)
L = resubLoss(tree,Name=Value)
[L,se,NLeaf,bestLevel] = resubLoss(___)
Description
L = resubLoss(tree) returns the resubstitution loss, meaning the mean squared error computed for the training data that fitrtree used to create tree.
L = resubLoss(tree,Name=Value) returns the resubstitution loss with additional options specified by one or more name-value arguments.
[L,se,NLeaf,bestLevel] = resubLoss(___) also returns the standard error of the loss, the number of leaves in each pruned subtree, and the best pruning level, using any of the input argument combinations in the previous syntaxes.
Examples
Compute the In-Sample MSE
Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.
load carsmall
X = [Displacement Horsepower Weight];
Grow a regression tree using all observations.
Mdl = fitrtree(X,MPG);
Compute the resubstitution MSE.
resubLoss(Mdl)
ans = 4.8952
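As a quick check (not part of the original example), the resubstitution loss equals the loss computed on the training data stored in the model, assuming the default "mse" loss and default observation weights:
L1 = resubLoss(Mdl);          % resubstitution MSE
L2 = loss(Mdl,Mdl.X,Mdl.Y);   % same training data passed explicitly
isequal(L1,L2)                % expected: logical 1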
Examine the MSE for Each Subtree
Unpruned decision trees tend to overfit. One way to balance model complexity and out-of-sample performance is to prune a tree (or restrict its growth) so that in-sample and out-of-sample performance are satisfactory.
Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.
load carsmall
X = [Displacement Horsepower Weight];
Y = MPG;
Partition the data into training (50%) and validation (50%) sets.
n = size(X,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices
Grow a regression tree using the training set.
Mdl = fitrtree(X(idxTrn,:),Y(idxTrn));
View the regression tree.
view(Mdl,Mode="graph");
The regression tree has seven pruning levels. Level 0 is the full, unpruned tree (as displayed). Level 7 is just the root node (i.e., no splits).
Examine the training sample MSE for each subtree (or pruning level) excluding the highest level.
m = max(Mdl.PruneList) - 1;
trnLoss = resubLoss(Mdl,Subtrees=0:m)
trnLoss = 7×1
5.9789
6.2768
6.8316
7.5209
8.3951
10.7452
14.8445
The MSE for the full, unpruned tree is about 6 units.
The MSE for the tree pruned to level 1 is about 6.3 units.
The MSE for the tree pruned to level 6 (i.e., a stump) is about 14.8 units.
Examine the validation sample MSE at each level excluding the highest level.
valLoss = loss(Mdl,X(idxVal,:),Y(idxVal),Subtrees=0:m)
valLoss = 7×1
32.1205
31.5035
32.0541
30.8183
26.3535
30.0137
38.4695
The MSE for the full, unpruned tree (level 0) is about 32.1 units.
The MSE for the tree pruned to level 4 is about 26.4 units.
The MSE for the tree pruned to level 5 is about 30.0 units.
The MSE for the tree pruned to level 6 (i.e., a stump) is about 38.5 units.
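A plot (not in the original example) makes the comparison easier to see: the training MSE only increases with pruning, while the validation MSE bottoms out at level 4.
plot(0:m,trnLoss,"-o",0:m,valLoss,"-o")
xlabel("Pruning Level")
ylabel("MSE")
legend("Training sample","Validation sample")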
To balance model complexity and out-of-sample performance, consider pruning Mdl to level 4.
pruneMdl = prune(Mdl,Level=4);
view(pruneMdl,Mode="graph")
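Alternatively (a sketch, not part of the original example), you can let loss select the pruning level directly from the validation data. With the default TreeSize="se", the returned level is the highest one whose loss is within one standard error of the minimum, so it can differ from the level-4 choice above:
[~,~,~,bestLevel] = loss(Mdl,X(idxVal,:),Y(idxVal),Subtrees="all");
pruneMdl = prune(Mdl,Level=bestLevel);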
Input Arguments
tree — Regression tree
RegressionTree object
Regression tree, specified as a RegressionTree object created using the fitrtree function.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: L = resubLoss(tree,Subtrees="all") computes the resubstitution loss for all subtrees.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: L = resubLoss(tree,"Subtrees","all") computes the resubstitution loss for all subtrees.
LossFun — Loss function
"mse" (default) | function handle
Loss function, specified as a function handle or "mse" for mean squared error.
You can write your own loss function in the syntax described in Loss Functions.
Data Types: char | string | function_handle
Subtrees — Pruning level
0 (default) | vector of nonnegative integers | "all"
Pruning level, specified as a vector of nonnegative integers in ascending order or "all".
If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree, and max(tree.PruneList) indicates the completely pruned tree (in other words, just the root node).
If you specify "all", then resubLoss operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).
resubLoss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.
To invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting Prune="on", or prune tree using prune.
Example: Subtrees="all"
Data Types: single | double | char | string
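For instance, this sketch (assuming tree was grown with pruning information, as described above) requests the entire pruning sequence and all four outputs:
[L,se,NLeaf,bestLevel] = resubLoss(tree,Subtrees="all");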
TreeSize — Tree size
"se" (default) | "min"
Tree size, specified as one of the following:
- "se" — The resubLoss function returns the highest pruning level with loss within one standard deviation of the minimum (L+se, where L and se relate to the smallest value in Subtrees).
- "min" — The resubLoss function returns the element of Subtrees with the smallest loss, which is usually the smallest element of Subtrees.
Example: TreeSize="min"
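As a sketch of the difference (assuming a trained tree named tree with pruning information), the two settings can return different levels in the bestLevel output; "se" trades a slightly higher loss for a simpler tree:
[~,~,~,lvlSE] = resubLoss(tree,Subtrees="all",TreeSize="se");
[~,~,~,lvlMin] = resubLoss(tree,Subtrees="all",TreeSize="min");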
Output Arguments
L — Regression loss
numeric vector of positive values
Regression loss, returned as a numeric vector with the same length as Subtrees.
se — Standard error of loss
numeric vector of positive values
Standard error of loss, returned as a numeric vector with the same length as Subtrees.
NLeaf — Number of leaves
numeric vector of nonnegative integers
Number of leaves (terminal nodes) in the pruned subtrees, returned as a numeric vector with the same length as Subtrees.
bestLevel — Optimal pruning level
nonnegative numeric scalar
Optimal pruning level, returned as a nonnegative numeric scalar whose value depends on TreeSize:
- When TreeSize is "se", bestLevel is the highest pruning level with loss within one standard deviation of the minimum (L+se, where L and se relate to the smallest value in Subtrees).
- When TreeSize is "min", bestLevel is the element of Subtrees with the smallest loss, usually the smallest element of Subtrees.
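A common follow-up (a sketch, assuming a tree with pruning information) is to prune the tree to the returned level:
[~,~,~,bestLevel] = resubLoss(tree,Subtrees="all");
prunedTree = prune(tree,Level=bestLevel);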
More About
Loss Functions
The built-in loss function is "mse", meaning mean squared error.
To write your own loss function, create a function file of the form
function loss = lossfun(Y,Yfit,W)
where:
- N is the number of rows of tree.X.
- Y is an N-element vector representing the observed response.
- Yfit is an N-element vector representing the predicted responses.
- W is an N-element vector representing the observation weights.
- The output loss should be a scalar.
Pass the function handle @lossfun as the value of the LossFun name-value argument.
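For instance, here is a minimal custom loss (a hypothetical example, not from the original documentation): weighted mean absolute error. Save it as maeLoss.m on the MATLAB path:
function loss = maeLoss(Y,Yfit,W)
% Hypothetical custom loss: weighted mean absolute error
loss = sum(W.*abs(Y - Yfit))/sum(W);
end
Then pass its handle to resubLoss:
L = resubLoss(tree,LossFun=@maeLoss);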
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
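A minimal sketch, assuming Parallel Computing Toolbox and a supported GPU are available:
load carsmall
X = gpuArray([Displacement Horsepower Weight]);
Y = gpuArray(MPG);
Mdl = fitrtree(X,Y);   % train on GPU arrays
L = resubLoss(Mdl)     % loss computed from the GPU-trained model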
Version History
Introduced in R2011a
See Also
resubPredict | loss | fitrtree