cvloss

Classification error by cross-validation for classification tree model

Syntax

`E = cvloss(tree)`
`E = cvloss(tree,Name=Value)`
`[E,SE,Nleaf,BestLevel] = cvloss(___)`

Description


`E = cvloss(tree)` returns the cross-validated classification error (loss) `E` for the trained classification tree model `tree`. The `cvloss` function uses stratified partitioning to create cross-validated sets. That is, for each fold, each partition of the data has roughly the same class proportions as in the data used to train `tree`.


`E = cvloss(tree,Name=Value)` specifies additional options using one or more name-value arguments. For example, you can specify the pruning level, tree size, and number of cross-validation samples.


`[E,SE,Nleaf,BestLevel] = cvloss(___)` also returns the standard error of `E`, the number of leaf nodes of `tree`, and the optimal pruning level for `tree`, using any of the input argument combinations in the previous syntaxes.

Examples


Compute the cross-validation error for a default classification tree.

Load the `ionosphere` data set.

`load ionosphere`

Grow a classification tree using the entire data set.

`Mdl = fitctree(X,Y);`

Compute the cross-validation error.

```
rng(1); % For reproducibility
E = cvloss(Mdl)
```

```
E = 0.1168
```

`E` is the 10-fold misclassification error.

Apply k-fold cross-validation to find the best level at which to prune a classification tree, considering all of its subtrees.

Load the `ionosphere` data set.

`load ionosphere`

Grow a classification tree using the entire data set. View the resulting tree.

```
Mdl = fitctree(X,Y);
view(Mdl,'Mode','graph')
```

Compute the 5-fold cross-validation error for each subtree except for the highest pruning level. Specify to return the best pruning level over all subtrees.

```
rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1
```

```
m = 7
```
`[E,~,~,bestLevel] = cvloss(Mdl,'Subtrees',0:m,'KFold',5)`
```
E = 8×1

    0.1282
    0.1254
    0.1225
    0.1282
    0.1282
    0.1197
    0.0997
    0.1738
```

```
bestLevel = 6
```

Of the pruning levels `0` through `7`, the best pruning level is `6`.

Prune the tree to the best level. View the resulting tree.

```
MdlPrune = prune(Mdl,'Level',bestLevel);
view(MdlPrune,'Mode','graph')
```

Input Arguments


`tree`: Classification tree model, specified as a `ClassificationTree` model object trained with `fitctree`.

Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `[E,SE,Nleaf,BestLevel] = cvloss(tree,KFold=5)` specifies to use 5 cross-validation samples.

`Subtrees`: Pruning level, specified as a vector of nonnegative integers in ascending order or `"all"`.

If you specify a vector, then all elements must be at least `0` and at most `max(tree.PruneList)`. `0` indicates the full, unpruned tree, and `max(tree.PruneList)` indicates the completely pruned tree (that is, just the root node).

If you specify `"all"`, then `cvloss` operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using `0:max(tree.PruneList)`.

`cvloss` prunes `tree` to each level specified by `Subtrees`, and then estimates the corresponding output arguments. The size of `Subtrees` determines the size of some output arguments.

To use `Subtrees`, the `PruneList` and `PruneAlpha` properties of `tree` must be nonempty. In other words, grow `tree` by setting `Prune="on"` when you call `fitctree`, or prune `tree` using `prune`.

Example: `Subtrees="all"`

Data Types: `single` | `double` | `char` | `string`
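As an illustration of the `"all"` equivalence described above, the following sketch evaluates the same pruning sequence two ways; it assumes `tree` is any `ClassificationTree` with a nonempty `PruneList`, and resets the random seed so both calls use identical fold partitions:

```matlab
% Sketch only: 'tree' stands for any ClassificationTree with a
% nonempty PruneList (for example, tree = fitctree(X,Y)).
rng(1); E1 = cvloss(tree,Subtrees="all");
rng(1); E2 = cvloss(tree,Subtrees=0:max(tree.PruneList));
isequal(E1,E2)   % should return logical 1 (true)
```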

`TreeSize`: Tree size, specified as one of these values:

• `"se"`: `cvloss` returns the best pruning level (`BestLevel`), which is the highest pruning level whose loss is within one standard deviation of the minimum (`L`+`se`, where `L` and `se` relate to the smallest value in `Subtrees`).

• `"min"`: `cvloss` returns the best pruning level, which corresponds to the element of `Subtrees` with the smallest loss. This element is usually the smallest element of `Subtrees`.

Example: `TreeSize="min"`

Data Types: `char` | `string`
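To see the effect of this argument, the following sketch (reusing the `ionosphere` data set from the earlier examples) requests the best pruning level under both settings; the names `levelSE` and `levelMin` are illustrative only:

```matlab
load ionosphere
Mdl = fitctree(X,Y);
rng(1); [~,~,~,levelSE]  = cvloss(Mdl,Subtrees="all",TreeSize="se");
rng(1); [~,~,~,levelMin] = cvloss(Mdl,Subtrees="all",TreeSize="min");
% "se" picks the highest (most pruned) level whose loss is within one
% standard deviation of the minimum, so levelSE >= levelMin.
```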

`KFold`: Number of cross-validation samples, specified as a positive integer greater than 1. The default is `10`.

Example: `KFold=8`

Data Types: `single` | `double`

Output Arguments


`E`: Cross-validation classification error (loss), returned as a numeric vector of the same length as `Subtrees`.

`SE`: Standard error of `E`, returned as a numeric vector of the same length as `Subtrees`.

`Nleaf`: Number of leaf nodes in the pruned subtrees, returned as a vector of integer values that has the same length as `Subtrees`. Leaf nodes are terminal nodes, which give responses, not splits.

`BestLevel`: Best pruning level, returned as a numeric scalar whose value depends on `TreeSize`:

• When `TreeSize` is `"se"`, `cvloss` returns the highest pruning level whose loss is within one standard deviation of the minimum (`L`+`se`, where `L` and `se` relate to the smallest value in `Subtrees`).

• When `TreeSize` is `"min"`, `cvloss` returns the element of `Subtrees` with the smallest loss, usually the smallest element of `Subtrees`.

Alternatives

You can construct a cross-validated tree model with `crossval` and call `kfoldLoss` instead of `cvloss`. If you plan to examine the cross-validated tree more than once, this alternative can save time.

However, unlike `cvloss`, `kfoldLoss` does not return `SE`, `Nleaf`, or `BestLevel`. `kfoldLoss` also does not allow you to examine any error other than the classification error.
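For instance, a minimal sketch of this alternative, assuming the `ionosphere` data set from the earlier examples:

```matlab
load ionosphere
rng(1)                            % for reproducibility
Mdl = fitctree(X,Y);              % grow the full tree
CVMdl = crossval(Mdl,KFold=10);   % ClassificationPartitionedModel
E = kfoldLoss(CVMdl)              % 10-fold classification error
```

Because `CVMdl` stores the trained folds, later calls such as `kfoldLoss(CVMdl)` or `kfoldPredict(CVMdl)` reuse them instead of refitting the tree.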

Version History

Introduced in R2011a