# crossval

Cross-validate machine learning model

## Description

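`CVMdl = crossval(Mdl)` returns a cross-validated (partitioned) machine learning model (`CVMdl`) from a trained model (`Mdl`). By default, `crossval` uses 10-fold cross-validation on the training data.

`CVMdl = crossval(Mdl,Name,Value)` sets an additional cross-validation option. You can specify only one name-value argument. For example, you can specify the number of folds or a holdout sample proportion.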

## Examples

### Cross-Validate SVM Classifier

Load the `ionosphere` data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad (`'b'`) or good (`'g'`).

```
load ionosphere
rng(1); % For reproducibility
```

Train a support vector machine (SVM) classifier. Standardize the predictor data and specify the order of the classes.

```
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
```

`SVMModel` is a trained `ClassificationSVM` classifier. `'b'` is the negative class and `'g'` is the positive class.

Cross-validate the classifier using 10-fold cross-validation.

```
CVSVMModel = crossval(SVMModel)
```

```
CVSVMModel = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' 'x12' 'x13' 'x14' 'x15' 'x16' 'x17' 'x18' 'x19' 'x20' 'x21' 'x22' 'x23' 'x24' 'x25' 'x26' 'x27' 'x28' 'x29' 'x30' 'x31' 'x32' 'x33' 'x34'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
```

`CVSVMModel` is a `ClassificationPartitionedModel` cross-validated classifier. During cross-validation, the software completes these steps:

1. Randomly partition the data into 10 sets of approximately equal size.
2. Train an SVM classifier on nine of the sets.
3. Repeat step 2 *k* = 10 times. The software leaves out one partition each time and trains on the other nine partitions.
4. Combine generalization statistics for each fold.

Display the first model in `CVSVMModel.Trained`.

```
FirstModel = CVSVMModel.Trained{1}
```

```
FirstModel = 
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 0.1286 0.5083 0.1879 0.4779 0.1567 0.3924 0.0875 0.3360 0.0789 0.3839 9.6066e-05 0.3562 -0.0308 0.3398 -0.0073 0.3590 -0.0628 0.4064 -0.0664 0.5535 -0.0749 0.3835 ... ] (1x34 double)
                    Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 0.5205 0.5040 0.4780 0.5649 0.4896 0.6293 0.4924 0.6606 0.4535 0.6133 0.4878 0.6250 0.5140 0.6075 0.5150 0.6068 0.5222 0.5729 0.5103 0.5061 0.5478 0.5712 0.5032 ... ] (1x34 double)
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]
```

`FirstModel` is the first of the 10 trained classifiers. It is a `CompactClassificationSVM` classifier.

You can estimate the generalization error by passing `CVSVMModel` to `kfoldLoss`.
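The sketch below shows that call. It repeats the setup from this example so that it runs on its own. It assumes the default classification loss, which is the misclassification rate. The variable names `genError` and `genAccuracy` are illustrative.

```matlab
% Rebuild the 10-fold cross-validated SVM classifier from this example
load ionosphere
rng(1); % For reproducibility
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
CVSVMModel = crossval(SVMModel);

% Cross-validated misclassification rate (default classification loss)
genError = kfoldLoss(CVSVMModel)

% Corresponding cross-validated accuracy
genAccuracy = 1 - genError
```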

### Specify Holdout Sample Proportion for Naive Bayes Cross-Validation

Specify a holdout sample proportion for cross-validation. By default, `crossval` uses 10-fold cross-validation to cross-validate a naive Bayes classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or a holdout sample proportion.

Load the `ionosphere` data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad (`'b'`) or good (`'g'`).

```
load ionosphere
```

Remove the first two predictors for stability.

```
X = X(:,3:end);
rng('default'); % For reproducibility
```

Train a naive Bayes classifier using the predictors `X` and class labels `Y`. A recommended practice is to specify the class names. `'b'` is the negative class and `'g'` is the positive class. By default, `fitcnb` assumes that the predictors are conditionally independent given the class and that each predictor is normally distributed within each class.

```
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
```

`Mdl` is a trained `ClassificationNaiveBayes` classifier.

Cross-validate the classifier by specifying a 30% holdout sample.

```
CVMdl = crossval(Mdl,'Holdout',0.3)
```

```
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' 'x12' 'x13' 'x14' 'x15' 'x16' 'x17' 'x18' 'x19' 'x20' 'x21' 'x22' 'x23' 'x24' 'x25' 'x26' 'x27' 'x28' 'x29' 'x30' 'x31' 'x32'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
```

`CVMdl` is a `ClassificationPartitionedModel` cross-validated naive Bayes classifier.

Display the properties of the classifier trained using 70% of the data.

```
TrainedModel = CVMdl.Trained{1}
```

```
TrainedModel = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b' 'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
```

`TrainedModel` is a `CompactClassificationNaiveBayes` classifier.

Estimate the generalization error by passing `CVMdl` to `kfoldLoss`.

```
kfoldLoss(CVMdl)
```

```
ans = 0.2095
```

The out-of-sample misclassification error is approximately 21%.

Try to reduce the generalization error by selecting the five most important predictors, as ranked by `fscmrmr`.

```
idx = fscmrmr(X,Y);
Xnew = X(:,idx(1:5));
```

Train a naive Bayes classifier using the new predictors.

```
Mdlnew = fitcnb(Xnew,Y,'ClassNames',{'b','g'});
```

Cross-validate the new classifier by specifying a 30% holdout sample, and estimate the generalization error.

```
CVMdlnew = crossval(Mdlnew,'Holdout',0.3);
kfoldLoss(CVMdlnew)
```

```
ans = 0.1429
```

The out-of-sample misclassification error is reduced from approximately 21% to approximately 14%.

### Create Cross-Validated Regression GAM Using `crossval`

Train a regression generalized additive model (GAM) by using `fitrgam`, and create a cross-validated GAM by using `crossval` and the holdout option. Then, use `kfoldPredict` to predict responses for validation-fold observations using a model trained on training-fold observations.

Load the `patients` data set.

```
load patients
```

Create a table that contains the predictor variables (`Age`, `Diastolic`, `Smoker`, `Weight`, `Gender`, `SelfAssessedHealthStatus`) and the response variable (`Systolic`).

```
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
```

Train a GAM that contains linear terms for predictors.

```
Mdl = fitrgam(tbl,'Systolic');
```

`Mdl` is a `RegressionGAM` model object.

Cross-validate the model by specifying a 30% holdout sample.

```
rng('default') % For reproducibility
CVMdl = crossval(Mdl,'Holdout',0.3)
```

```
CVMdl = 
  RegressionPartitionedGAM
       CrossValidatedModel: 'GAM'
            PredictorNames: {'Age' 'Diastolic' 'Smoker' 'Weight' 'Gender' 'SelfAssessedHealthStatus'}
     CategoricalPredictors: [3 5 6]
              ResponseName: 'Systolic'
           NumObservations: 100
                     KFold: 1
                 Partition: [1x1 cvpartition]
         NumTrainedPerFold: [1x1 struct]
         ResponseTransform: 'none'
    IsStandardDeviationFit: 0
```

The `crossval` function creates a `RegressionPartitionedGAM` model object `CVMdl` with the holdout option. During cross-validation, the software completes these steps:

1. Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the `Trained` property of the cross-validated model object `RegressionPartitionedGAM`.

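You can choose a different cross-validation setting by specifying the `'CVPartition'`, `'KFold'`, or `'Leaveout'` name-value argument of `crossval` instead. The `'CrossVal'` name-value argument applies to fitting functions such as `fitrgam`, not to `crossval` (see Alternative Functionality).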

Predict responses for the validation-fold observations by using `kfoldPredict`. The function predicts responses for the validation-fold observations by using the model trained on the training-fold observations. The function assigns `NaN` to the training-fold observations.

```
yFit = kfoldPredict(CVMdl);
```

Find the validation-fold observation indexes, and create a table containing the observation index, observed response values, and predicted response values. Display the first eight rows of the table.

```
idx = find(~isnan(yFit));
t = table(idx,tbl.Systolic(idx),yFit(idx), ...
    'VariableNames',{'Observation Index','Observed Value','Predicted Value'});
head(t)
```

```
    Observation Index    Observed Value    Predicted Value
    _________________    ______________    _______________

            1                 124              130.22     
            6                 121              124.38     
            7                 130              125.26     
           12                 115              117.05     
           20                 125              121.82     
           22                 123              116.99     
           23                 114                 107     
           24                 128              122.52     
```

Compute the regression error (mean squared error) for the validation-fold observations.

```
L = kfoldLoss(CVMdl)
```

```
L = 43.8715
```
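The sketch below checks what `kfoldLoss` reports here by recomputing the mean squared error from the `kfoldPredict` output. It assumes the default equal observation weights. Under that assumption the two values should agree up to floating-point rounding. The names `isVal`, `mseManual`, and `mseBuiltIn` are illustrative.

```matlab
% Rebuild the holdout-validated GAM from this example
load patients
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Mdl = fitrgam(tbl,'Systolic');
rng('default') % For reproducibility
CVMdl = crossval(Mdl,'Holdout',0.3);

% Validation-fold predictions; training-fold observations are NaN
yFit = kfoldPredict(CVMdl);
isVal = ~isnan(yFit);

% Mean squared error computed directly over the validation observations
mseManual = mean((tbl.Systolic(isVal) - yFit(isVal)).^2)

% Built-in cross-validated loss, for comparison
mseBuiltIn = kfoldLoss(CVMdl)
```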

## Input Arguments

`Mdl` — Machine learning model

full regression model object | full classification model object

Machine learning model, specified as a full regression or classification model object, as given in the following tables of supported models.

**Regression Model Object**

Model | Full Regression Model Object |
---|---|
Gaussian process regression (GPR) model | `RegressionGP` (If you supply a custom `'ActiveSet'` in the call to `fitrgp`, then you cannot cross-validate the GPR model.) |
Generalized additive model (GAM) | `RegressionGAM` |
Neural network model | `RegressionNeuralNetwork` |

**Classification Model Object**

Model | Full Classification Model Object |
---|---|
Generalized additive model | `ClassificationGAM` |
k-nearest neighbor model | `ClassificationKNN` |
Naive Bayes model | `ClassificationNaiveBayes` |
Neural network model | `ClassificationNeuralNetwork` |
Support vector machine for one-class and binary classification | `ClassificationSVM` |

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.*

**Example:** `crossval(Mdl,'KFold',3)` specifies using three folds in a cross-validated model.

`CVPartition` — Cross-validation partition

`[]` (default) | `cvpartition` partition object

Cross-validation partition, specified as a `cvpartition` partition object created by `cvpartition`. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

You can specify only one of these four name-value arguments: `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`.

**Example:** Suppose you create a random partition for 5-fold cross-validation on 500 observations by using `cvp = cvpartition(500,'KFold',5)`. Then, you can specify the cross-validated model by using `'CVPartition',cvp`.
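One reason to pass a `cvpartition` object is to evaluate several models on the same folds, so that differences in loss do not come from different random splits. The sketch below uses the `ionosphere` data and compares an SVM classifier with a k-nearest neighbor classifier. The choice of models and the variable names are illustrative. Because `cvp` is created from the scalar number of observations, the partition is nonstratified.

```matlab
load ionosphere
rng('default') % For reproducibility

% One 5-fold partition of all observations, shared by both models
cvp = cvpartition(numel(Y),'KFold',5);

% Train two different classifiers on the full data set
SVMMdl = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
KNNMdl = fitcknn(X,Y,'ClassNames',{'b','g'});

% Cross-validate both models on the same training/validation splits
CVSVM = crossval(SVMMdl,'CVPartition',cvp);
CVKNN = crossval(KNNMdl,'CVPartition',cvp);

% Compare cross-validated misclassification rates
lossSVM = kfoldLoss(CVSVM)
lossKNN = kfoldLoss(CVKNN)
```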

`Holdout` — Fraction of data for holdout validation

scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify `'Holdout',p`, then the software completes these steps:

1. Randomly select and reserve `p*100`% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the `Trained` property of the cross-validated model. If `Mdl` does not have a corresponding compact object, then `Trained` contains a full object.

You can specify only one of these four name-value arguments: `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`.

**Example:** `'Holdout',0.1`

**Data Types:** `double` | `single`
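The sketch below shows one way to inspect a holdout split after you call `crossval`. It uses the `ionosphere` data and an SVM classifier, both illustrative choices. It reads the `Partition` property of the cross-validated model and passes it to the `test` function of `cvpartition` to find which observations were held out.

```matlab
load ionosphere
rng('default') % For reproducibility
Mdl = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

% Reserve 10% of the observations for validation
CVMdl = crossval(Mdl,'Holdout',0.1);

cvp = CVMdl.Partition;   % cvpartition object created by crossval
isVal = test(cvp);       % logical index of the held-out observations
nVal = sum(isVal)        % about 10% of the 351 observations

% The single model, trained on the remaining observations
CompactMdl = CVMdl.Trained{1};

% Misclassification rate on the held-out observations
holdoutError = kfoldLoss(CVMdl)
```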

`KFold` — Number of folds

`10` (default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify `'KFold',k`, then the software completes these steps:

1. Randomly partition the data into `k` sets.
2. For each set, reserve the set as validation data, and train the model using the other `k` – 1 sets.
3. Store the `k` compact, trained models in a `k`-by-1 cell vector in the `Trained` property of the cross-validated model. If `Mdl` does not have a corresponding compact object, then `Trained` contains a full object.

You can specify only one of these four name-value arguments: `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`.

**Example:** `'KFold',5`

**Data Types:** `single` | `double`
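Here is a minimal sketch of 5-fold cross-validation. It uses the `ionosphere` data and a k-nearest neighbor classifier, which is an illustrative choice. It checks that `Trained` holds one model per fold. It also passes the `'Mode','individual'` option to `kfoldLoss`, which returns one loss per fold instead of a single overall value.

```matlab
load ionosphere
rng('default') % For reproducibility
Mdl = fitcknn(X,Y,'NumNeighbors',5,'ClassNames',{'b','g'});

% 5-fold cross-validation: one trained model per fold
k = 5;
CVMdl = crossval(Mdl,'KFold',k);
nModels = numel(CVMdl.Trained)   % equals k

% One misclassification rate per validation fold
foldLoss = kfoldLoss(CVMdl,'Mode','individual')

% Single cross-validated misclassification rate over all folds
overallLoss = kfoldLoss(CVMdl)
```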

`Leaveout` — Leave-one-out cross-validation flag

`'off'` (default) | `'on'`

Leave-one-out cross-validation flag, specified as `'on'` or `'off'`. If you specify `'Leaveout','on'`, then for each of the *n* observations (where *n* is the number of observations, excluding missing observations, specified in the `NumObservations` property of the model), the software completes these steps:

1. Reserve the one observation as validation data, and train the model using the other *n* – 1 observations.
2. Store the *n* compact, trained models in an *n*-by-1 cell vector in the `Trained` property of the cross-validated model. If `Mdl` does not have a corresponding compact object, then `Trained` contains a full object.

You can specify only one of these four name-value arguments: `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`.

**Example:** `'Leaveout','on'`
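Here is a minimal leave-one-out sketch. It uses the `ionosphere` data (351 observations) and a k-nearest neighbor classifier. The k-nearest neighbor classifier is chosen only because each of the 351 fits is cheap. The number of trained models should equal `NumObservations`.

```matlab
load ionosphere
Mdl = fitcknn(X,Y,'NumNeighbors',5,'ClassNames',{'b','g'});

% Leave-one-out: train one model per observation, each time holding out
% a single observation for validation
CVMdl = crossval(Mdl,'Leaveout','on');

n = Mdl.NumObservations          % 351 for the ionosphere data
nModels = numel(CVMdl.Trained)   % equals n
looError = kfoldLoss(CVMdl)      % leave-one-out misclassification rate
```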

## Output Arguments

`CVMdl` — Cross-validated machine learning model

cross-validated (partitioned) model object

Cross-validated machine learning model, returned as one of the cross-validated (partitioned) model objects in the following tables, depending on the input model `Mdl`.

**Regression Model Object**

Model | Regression Model (`Mdl`) | Cross-Validated Model (`CVMdl`) |
---|---|---|
Gaussian process regression model | `RegressionGP` | `RegressionPartitionedGP` |
Generalized additive model | `RegressionGAM` | `RegressionPartitionedGAM` |
Neural network model | `RegressionNeuralNetwork` | `RegressionPartitionedNeuralNetwork` |

**Classification Model Object**

Model | Classification Model (`Mdl`) | Cross-Validated Model (`CVMdl`) |
---|---|---|
Generalized additive model | `ClassificationGAM` | `ClassificationPartitionedGAM` |
k-nearest neighbor model | `ClassificationKNN` | `ClassificationPartitionedModel` |
Naive Bayes model | `ClassificationNaiveBayes` | `ClassificationPartitionedModel` |
Neural network model | `ClassificationNeuralNetwork` | `ClassificationPartitionedModel` |
Support vector machine for one-class and binary classification | `ClassificationSVM` | `ClassificationPartitionedModel` |

## Tips

- Assess the predictive performance of `Mdl` on cross-validated data by using the *kfold* functions and properties of `CVMdl`, such as `kfoldPredict`, `kfoldLoss`, `kfoldMargin`, and `kfoldEdge` for classification and `kfoldPredict` and `kfoldLoss` for regression.
- Return a partitioned classifier with stratified partitioning by using the name-value argument `'KFold'` or `'Holdout'`.
- Create a `cvpartition` object `cvp` using `cvp = cvpartition(n,'KFold',k)`. Return a partitioned classifier with nonstratified partitioning by using the name-value argument `'CVPartition',cvp`.
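The sketch below contrasts the two partitioning schemes described in these tips. It uses the `ionosphere` data and an SVM classifier, both illustrative choices, and reports the fraction of `'g'` labels in each validation fold. With stratified partitioning, these fractions should be nearly identical across folds. With nonstratified partitioning, they can vary more. The variable names are illustrative.

```matlab
load ionosphere
rng('default') % For reproducibility
Mdl = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
k = 5;

% 'KFold' on a classifier: stratified partition
CVStrat = crossval(Mdl,'KFold',k);

% cvpartition with a scalar number of observations: nonstratified partition
cvp = cvpartition(Mdl.NumObservations,'KFold',k);
CVNonstrat = crossval(Mdl,'CVPartition',cvp);

% Fraction of 'g' labels in each validation fold under both schemes
fracStrat = zeros(k,1);
fracNonstrat = zeros(k,1);
for i = 1:k
    fracStrat(i) = mean(strcmp(Y(test(CVStrat.Partition,i)),'g'));
    fracNonstrat(i) = mean(strcmp(Y(test(cvp,i)),'g'));
end
table((1:k)',fracStrat,fracNonstrat, ...
    'VariableNames',{'Fold','Stratified','Nonstratified'})
```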

## Alternative Functionality

Instead of training a model and then cross-validating it, you can create a cross-validated model directly by using a fitting function and specifying one of these name-value arguments: `'CrossVal'`, `'CVPartition'`, `'Holdout'`, `'Leaveout'`, or `'KFold'`.
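For example, the two approaches below should both produce a `ClassificationPartitionedModel` for an SVM classifier on the `ionosphere` data. This is a sketch. Each call draws its own random partition, so the fold assignments, and therefore the loss values, generally differ between the two objects.

```matlab
load ionosphere
rng('default') % For reproducibility

% Option 1: train a full model, then cross-validate it with crossval
Mdl = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
CVMdl1 = crossval(Mdl,'KFold',5);

% Option 2: cross-validate directly by passing 'KFold' to the fitting function
CVMdl2 = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'},'KFold',5);

% Inspect the class of each cross-validated object
class(CVMdl1)
class(CVMdl2)
```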

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

- This function fully supports GPU arrays for a trained classification model specified as a `ClassificationKNN` or `ClassificationSVM` object.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced in R2012a**

### R2023b: A cross-validated regression neural network model is a `RegressionPartitionedNeuralNetwork` object

Starting in R2023b, a cross-validated regression neural network model is a `RegressionPartitionedNeuralNetwork` object. In previous releases, a cross-validated regression neural network model was a `RegressionPartitionedModel` object.

You can create a `RegressionPartitionedNeuralNetwork` object in two ways:

- Create a cross-validated model from a regression neural network model object `RegressionNeuralNetwork` by using the `crossval` object function.
- Create a cross-validated model by using the `fitrnet` function and specifying one of the name-value arguments `CrossVal`, `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

### R2022b: A cross-validated Gaussian process regression model is a `RegressionPartitionedGP` object

Starting in R2022b, a cross-validated Gaussian process regression (GPR) model is a `RegressionPartitionedGP` object. In previous releases, a cross-validated GPR model was a `RegressionPartitionedModel` object.

You can create a `RegressionPartitionedGP` object in two ways:

- Create a cross-validated model from a GPR model object `RegressionGP` by using the `crossval` object function.
- Create a cross-validated model by using the `fitrgp` function and specifying one of the name-value arguments `CrossVal`, `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Regardless of whether you train a full or cross-validated GPR model first, you cannot specify an `ActiveSet` value in the call to `fitrgp`.

## See Also
