testckfold
Compare accuracies of two classification models by repeated cross-validation
Syntax
Description
testckfold statistically assesses the accuracies of two classification models by repeatedly cross-validating the two models, determining the differences in the classification loss, and then formulating the test statistic by combining the classification loss differences. This type of test is particularly appropriate when sample size is limited.
You can assess whether the accuracies of the classification models are different, or whether one classification model performs better than another. Available tests include a 5-by-2 paired t test, a 5-by-2 paired F test, and a 10-by-10 repeated cross-validation t test. For more details, see Repeated Cross-Validation Tests. To speed up computations, testckfold supports parallel computing (requires a Parallel Computing Toolbox™ license).
h = testckfold(C1,C2,X1,X2) returns the test decision that results from conducting a 5-by-2 paired F cross-validation test. The null hypothesis is that the classification models C1 and C2 have equal accuracy in predicting the true class labels using the predictor and response data in the tables X1 and X2. h = 1 indicates to reject the null hypothesis at the 5% significance level.
testckfold conducts the cross-validation test by applying C1 and C2 to all predictor variables in X1 and X2, respectively. The true class labels in X1 and X2 must be the same. The response variable names in X1, X2, C1.ResponseName, and C2.ResponseName must be the same.
For examples of ways to compare models, see Tips.
h = testckfold(___,Name,Value) uses any of the input arguments in the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, you can specify the type of alternative hypothesis, the type of test, or the use of parallel computing.
Examples
Compare Classification Tree Predictor-Selection Algorithms
At each node, fitctree chooses the best predictor to split using an exhaustive search by default. Alternatively, you can choose to split the predictor that shows the most evidence of dependence with the response by conducting curvature tests. This example statistically compares classification trees grown via exhaustive search for the best splits and grown by conducting curvature tests with interaction.
Load the census1994 data set.
load census1994.mat
rng(1) % For reproducibility
Grow a default classification tree using the training set, adultdata, which is a table. The response-variable name is 'salary'.
C1 = fitctree(adultdata,'salary')
C1 =
  ClassificationTree
           PredictorNames: {'age' 'workClass' 'fnlwgt' 'education' 'education_num' 'marital_status' 'occupation' 'relationship' 'race' 'sex' 'capital_gain' 'capital_loss' 'hours_per_week' 'native_country'}
             ResponseName: 'salary'
    CategoricalPredictors: [2 4 6 7 8 9 10 14]
               ClassNames: [<=50K >50K]
           ScoreTransform: 'none'
          NumObservations: 32561
C1 is a full ClassificationTree model. Its ResponseName property is 'salary'. C1 uses an exhaustive search to find the best predictor to split on based on maximal splitting gain.
Grow another classification tree using the same data set, but specify to find the best predictor to split using the curvature test with interaction.
C2 = fitctree(adultdata,'salary','PredictorSelection','interactioncurvature')
C2 =
  ClassificationTree
           PredictorNames: {'age' 'workClass' 'fnlwgt' 'education' 'education_num' 'marital_status' 'occupation' 'relationship' 'race' 'sex' 'capital_gain' 'capital_loss' 'hours_per_week' 'native_country'}
             ResponseName: 'salary'
    CategoricalPredictors: [2 4 6 7 8 9 10 14]
               ClassNames: [<=50K >50K]
           ScoreTransform: 'none'
          NumObservations: 32561
C2 also is a full ClassificationTree model with ResponseName equal to 'salary'.
Conduct a 5-by-2 paired F test to compare the accuracies of the two models using the training set. Because the response-variable names in the data sets and the ResponseName properties are all equal, and the response data in both sets are equal, you can omit supplying the response data.
h = testckfold(C1,C2,adultdata,adultdata)
h = logical
0
h = 0 indicates to not reject the null hypothesis that C1 and C2 have the same accuracies at the 5% level.
Compare Accuracies of Two Different Classification Models
Conduct a statistical test comparing the misclassification rates of the two models using a 5-by-2 paired F test.
Load Fisher's iris data set.
load fisheriris;
Create a naive Bayes template and a classification tree template using default options.
C1 = templateNaiveBayes;
C2 = templateTree;
C1 and C2 are template objects corresponding to the naive Bayes and classification tree algorithms, respectively.
Test whether the two models have equal predictive accuracies. Use the same predictor data for each model. testckfold conducts a 5-by-2, two-sided, paired F test by default.
rng(1); % For reproducibility
h = testckfold(C1,C2,meas,meas,species)
h = logical
0
h = 0 indicates to not reject the null hypothesis that the two models have equal predictive accuracies.
Compare Classification Accuracies of Simple and Complex Models
Conduct a statistical test to assess whether a simpler model has better accuracy than a more complex model using a 10-by-10 repeated cross-validation t test.
Load Fisher's iris data set. Create a cost matrix that penalizes misclassifying a setosa iris twice as much as misclassifying a virginica iris as a versicolor.
load fisheriris;
tabulate(species)
       Value    Count   Percent
      setosa       50     33.33%
  versicolor       50     33.33%
   virginica       50     33.33%
Cost = [0 2 2;2 0 1;2 1 0];
ClassNames = {'setosa' 'versicolor' 'virginica'}; % Specifies the order of the rows and columns in Cost
The empirical distribution of the classes is uniform, and the classification cost is slightly imbalanced.
Create two ECOC templates: one that uses linear SVM binary learners and one that uses SVM binary learners equipped with the RBF kernel.
tSVMLinear = templateSVM('Standardize',true); % Linear SVM by default
tSVMRBF = templateSVM('KernelFunction','RBF','Standardize',true);
C1 = templateECOC('Learners',tSVMLinear);
C2 = templateECOC('Learners',tSVMRBF);
C1 and C2 are ECOC template objects. C1 is prepared for linear SVM binary learners. C2 is prepared for SVM binary learners with an RBF kernel.
Test the null hypothesis that the simpler model (C1) is at most as accurate as the more complex model (C2) in terms of classification costs. Conduct the 10-by-10 repeated cross-validation test. Request to return p-values and misclassification costs.
rng(1); % For reproducibility
[h,p,e1,e2] = testckfold(C1,C2,meas,meas,species,...
    'Alternative','greater','Test','10x10t','Cost',Cost,...
    'ClassNames',ClassNames)
h = logical
0
p = 0.1077
e1 = 10×10
0 0 0 0.0667 0 0.0667 0.1333 0 0.1333 0
0.0667 0.0667 0 0 0 0 0.0667 0 0.0667 0.0667
0 0 0 0 0 0.0667 0.0667 0.0667 0.0667 0.0667
0.0667 0.0667 0 0.0667 0 0.0667 0 0 0.0667 0
0.0667 0.0667 0.0667 0 0.0667 0.0667 0 0 0 0
0 0 0.1333 0 0 0.0667 0 0 0.0667 0.0667
0.0667 0.0667 0 0 0.0667 0 0 0.0667 0 0.0667
0.0667 0 0.0667 0.0667 0 0.1333 0 0.0667 0 0
0 0.0667 0.1333 0.0667 0.0667 0 0 0 0 0
0 0.0667 0.0667 0.0667 0.0667 0 0 0.0667 0 0
e2 = 10×10
0 0 0 0.1333 0 0.0667 0.1333 0 0.2667 0
0.0667 0.0667 0 0.1333 0 0 0 0.1333 0.1333 0.0667
0.1333 0.1333 0 0 0 0.0667 0 0.0667 0.0667 0.0667
0 0.1333 0 0.0667 0.1333 0.1333 0 0 0.0667 0
0.0667 0.0667 0.0667 0 0.0667 0.1333 0.1333 0 0 0.0667
0.0667 0 0.0667 0.0667 0 0.0667 0.1333 0 0.0667 0.0667
0.2000 0.0667 0 0 0.0667 0 0 0.1333 0 0.0667
0.2000 0 0 0.1333 0 0.1333 0 0.0667 0 0
0 0.0667 0.0667 0.0667 0.1333 0 0.2000 0 0 0
0.0667 0.0667 0 0.0667 0.1333 0 0 0.0667 0.1333 0.0667
The p-value is slightly greater than 0.10, so the test retains the null hypothesis that the simpler model is at most as accurate as the more complex model. This result holds for any significance level (Alpha) that is at most 0.10.
e1 and e2 are 10-by-10 matrices containing misclassification costs. Row r corresponds to run r of the repeated cross-validation. Column k corresponds to test-set fold k within a particular cross-validation run. For example, element (2,4) of e2 is 0.1333. This value means that in cross-validation run 2, when the test set is fold 4, the estimated test-set misclassification cost is 0.1333.
Select Features Using Statistical Accuracy Comparison
Reduce classification model complexity by selecting a subset of predictor variables (features) from a larger set. Then, statistically compare the accuracy between the two models.
Load the ionosphere data set.
load ionosphere
Train an ensemble of 100 boosted classification trees using AdaBoostM1 and the entire set of predictors. Inspect the importance measure for each predictor.
t = templateTree('MaxNumSplits',1); % Weak-learner template tree object
C = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t);
predImp = predictorImportance(C);
bar(predImp)
h = gca;
h.XTick = 1:2:h.XLim(2);
title('Predictor Importances')
xlabel('Predictor')
ylabel('Importance measure')
Identify the top five predictors in terms of their importance.
[~,idxSort] = sort(predImp,'descend');
idx5 = idxSort(1:5);
Test whether the two models have equal predictive accuracies. Specify the reduced data set and then the full predictor data. Use parallel computing to speed up computations.
s = RandStream('mlfg6331_64');
Options = statset('UseParallel',true,'Streams',s,'UseSubstreams',true);
[h,p,e1,e2] = testckfold(C,C,X(:,idx5),X,Y,'Options',Options)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
h = logical
0
p = 0.4161
e1 = 5×2
0.0686 0.0795
0.0800 0.0625
0.0914 0.0568
0.0400 0.0739
0.0914 0.0966
e2 = 5×2
0.0914 0.0625
0.1257 0.0682
0.0971 0.0625
0.0800 0.0909
0.0914 0.1193
testckfold treats trained classification models as templates, and so it ignores all fitted parameters in C. That is, testckfold cross-validates C using only the specified options and the predictor data to estimate the out-of-fold classification losses.
h = 0 indicates to not reject the null hypothesis that the two models have equal predictive accuracies. This result favors the simpler ensemble.
Input Arguments
C1 — Classification model template or trained classification model
classification model template object | trained classification model object
Classification model template or trained classification model, specified as any classification model template object or trained classification model object described in these tables.
Template Type  Returned By 

Classification tree  templateTree 
Discriminant analysis  templateDiscriminant 
Ensemble (boosting, bagging, and random subspace)  templateEnsemble 
Error-correcting output codes (ECOC), multiclass classification model  templateECOC 
Generalized Additive Model  templateGAM 
Gaussian kernel classification with support vector machine (SVM) or logistic regression learners  templateKernel 
kNN  templateKNN 
Linear classification with SVM or logistic regression learners  templateLinear 
Naive Bayes  templateNaiveBayes 
SVM  templateSVM 
Trained Model Type  Model Object  Returned By 

Classification tree  ClassificationTree  fitctree 
Discriminant analysis  ClassificationDiscriminant  fitcdiscr 
Ensemble of bagged classification models  ClassificationBaggedEnsemble  fitcensemble 
Ensemble of classification models  ClassificationEnsemble  fitcensemble 
ECOC model  ClassificationECOC  fitcecoc 
Generalized additive model (GAM)  ClassificationGAM  fitcgam 
kNN  ClassificationKNN  fitcknn 
Naive Bayes  ClassificationNaiveBayes  fitcnb 
Neural network  ClassificationNeuralNetwork (with observations in rows)  fitcnet 
SVM  ClassificationSVM  fitcsvm 
For efficiency, supply a classification model template object instead of a trained classification model object.
C2 — Classification model template or trained classification model
classification model template object | trained classification model object
Classification model template or trained classification model, specified as any classification model template object or trained classification model object described in these tables.
Template Type  Returned By 

Classification tree  templateTree 
Discriminant analysis  templateDiscriminant 
Ensemble (boosting, bagging, and random subspace)  templateEnsemble 
Error-correcting output codes (ECOC), multiclass classification model  templateECOC 
Generalized Additive Model  templateGAM 
Gaussian kernel classification with support vector machine (SVM) or logistic regression learners  templateKernel 
kNN  templateKNN 
Linear classification with SVM or logistic regression learners  templateLinear 
Naive Bayes  templateNaiveBayes 
SVM  templateSVM 
Trained Model Type  Model Object  Returned By 

Classification tree  ClassificationTree  fitctree 
Discriminant analysis  ClassificationDiscriminant  fitcdiscr 
Ensemble of bagged classification models  ClassificationBaggedEnsemble  fitcensemble 
Ensemble of classification models  ClassificationEnsemble  fitcensemble 
ECOC model  ClassificationECOC  fitcecoc 
Generalized additive model (GAM)  ClassificationGAM  fitcgam 
kNN  ClassificationKNN  fitcknn 
Naive Bayes  ClassificationNaiveBayes  fitcnb 
Neural network  ClassificationNeuralNetwork (with observations in rows)  fitcnet 
SVM  ClassificationSVM  fitcsvm 
For efficiency, supply a classification model template object instead of a trained classification model object.
X1 — Data used to apply to first full classification model or template
numeric matrix | table
Data used to apply to the first full classification model or template, C1, specified as a numeric matrix or table.
Each row of X1 corresponds to one observation, and each column corresponds to one variable. testckfold does not support multicolumn variables and cell arrays other than cell arrays of character vectors.
X1 and X2 must be of the same data type, and X1, X2, and Y must have the same number of observations.
If you specify Y as an array, then testckfold treats all columns of X1 as separate predictor variables.
Data Types: double | single | table
X2 — Data used to apply to second full classification model or template
numeric matrix | table
Data used to apply to the second full classification model or template, C2, specified as a numeric matrix or table.
Each row of X2 corresponds to one observation, and each column corresponds to one variable. testckfold does not support multicolumn variables and cell arrays other than cell arrays of character vectors.
X1 and X2 must be of the same data type, and X1, X2, and Y must have the same number of observations.
If you specify Y as an array, then testckfold treats all columns of X2 as separate predictor variables.
Data Types: double | single | table
Y — True class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors | character vector | string scalar
True class labels, specified as a categorical, character, or string array, a logical or numeric vector, a cell array of character vectors, or a character vector or string scalar.
For a character vector or string scalar, X1 and X2 must be tables, their response variables must have the same name and values, and Y must be the common variable name. For example, if X1.Labels and X2.Labels are the response variables, then Y is 'Labels' and X1.Labels and X2.Labels must be equivalent.
For all other supported data types, Y is an array of true class labels.
If Y is a character array, then each element must correspond to one row of the array.
X1, X2, and Y must have the same number of observations (rows).
You can omit Y when X1 and X2 are tables that contain the same response variable (same name and values) and the ResponseName properties of C1 and C2 name that variable. In that case, testckfold uses the common response variable in the tables. For example, if the response variables in the tables are X1.Labels and X2.Labels, and the values of C1.ResponseName and C2.ResponseName are 'Labels', then you do not have to supply Y.
Data Types: categorical | char | string | logical | single | double | cell
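As an illustration of passing Y as a variable name, the following sketch stores Fisher's iris data in a table and supplies the response by name. The table tbl, its variable names, and the choice of templates are assumptions made for this example only; the call relies on the behavior described above for character-vector Y.

```matlab
% Minimal sketch: supply Y as the name of the common response variable.
load fisheriris                                % provides meas and species
tbl = array2table(meas, ...
    'VariableNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'});
tbl.Labels = species;                          % response stored inside the table

rng(1)                                         % for reproducibility
% X1 and X2 are the same table, so the response variables match by construction.
h = testckfold(templateTree,templateNaiveBayes,tbl,tbl,'Labels')
```

Because both models receive the same table, the requirement that the response variables share a name and values is met trivially.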
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Alternative','greater','Test','10x10t','Options',statset('UseParallel',true) specifies to test whether the first set of predicted class labels is more accurate than the second set, to conduct the 10-by-10 t test, and to use parallel computing for cross-validation.
Alpha — Hypothesis test significance level
0.05 (default) | scalar value in the interval (0,1)
Hypothesis test significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the interval (0,1).
Example: 'Alpha',0.1
Data Types: single | double
Alternative — Alternative hypothesis to assess
'unequal' (default) | 'greater' | 'less'
Alternative hypothesis to assess, specified as the comma-separated pair consisting of 'Alternative' and one of the values listed in the table.
Value  Alternative Hypothesis Description  Supported Tests 

'unequal' (default)  For predicting Y, the sets of predictions resulting from C1 applied to X1 and C2 applied to X2 have unequal accuracies.  '5x2F', '5x2t', and '10x10t' 
'greater'  For predicting Y, the set of predictions resulting from C1 applied to X1 is more accurate than C2 applied to X2.  '5x2t' and '10x10t' 
'less'  For predicting Y, the set of predictions resulting from C1 applied to X1 is less accurate than C2 applied to X2.  '5x2t' and '10x10t' 
For details on supported tests, see Test.
Example: 'Alternative','greater'
X1CategoricalPredictors — Flag identifying categorical predictors
[] (default) | logical vector | numeric vector | 'all'
Flag identifying categorical predictors in the first test-set predictor data (X1), specified as the comma-separated pair consisting of 'X1CategoricalPredictors' and one of the following:
A numeric vector with indices from 1 through p, where p is the number of columns of X1.
A logical vector of length p, where a true entry means that the corresponding column of X1 is a categorical variable.
'all', meaning all predictors are categorical.
The default is [], which indicates that the data contains no categorical predictors.
For a kNN classification model, valid options are [] and 'all'.
You must specify X1CategoricalPredictors if X1 is a matrix and includes categorical predictors. testckfold does not use the CategoricalPredictors property of C1 when C1 is a trained classification model. If C1 is a trained model with categorical predictors, specify 'X1CategoricalPredictors',C1.CategoricalPredictors.
Example: 'X1CategoricalPredictors','all'
Data Types: single | double | logical | char | string
X2CategoricalPredictors — Flag identifying categorical predictors
[] (default) | logical vector | numeric vector | 'all'
Flag identifying categorical predictors in the second test-set predictor data (X2), specified as the comma-separated pair consisting of 'X2CategoricalPredictors' and one of the following:
A numeric vector with indices from 1 through p, where p is the number of columns of X2.
A logical vector of length p, where a true entry means that the corresponding column of X2 is a categorical variable.
'all', meaning all predictors are categorical.
The default is [], which indicates that the data contains no categorical predictors.
For a kNN classification model, valid options are [] and 'all'.
You must specify X2CategoricalPredictors if X2 is a matrix and includes categorical predictors. testckfold does not use the CategoricalPredictors property of C2 when C2 is a trained classification model. If C2 is a trained model with categorical predictors, specify 'X2CategoricalPredictors',C2.CategoricalPredictors.
Example: 'X2CategoricalPredictors','all'
Data Types: single | double | logical | char | string
ClassNames — Class names
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class names, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. You must set ClassNames using the data type of Y.
If ClassNames is a character array, then each element must correspond to one row of the array.
Use ClassNames to:
Specify the order of any input argument dimension that corresponds to class order. For example, use ClassNames to specify the order of the dimensions of Cost.
Select a subset of classes for testing. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train and test models using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.
The default is the set of all distinct class names in Y.
Example: 'ClassNames',{'b','g'}
Data Types: single | double | logical | char | string | cell | categorical
Cost — Classification cost
square matrix | structure array
Classification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure array.
If you specify the square matrix Cost, then Cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument.
If you specify the structure S, then S must have two fields:
S.ClassNames, which contains the class names as a variable of the same data type as Y. You can use this field to specify the order of the classes.
S.ClassificationCosts, which contains the cost matrix, with rows and columns ordered as in S.ClassNames.
For cost-sensitive testing, use testcholdout.
It is a best practice to supply the same cost matrix used to train the classification models.
The default is Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j.
Example: 'Cost',[0 1 2 ; 1 0 2; 2 2 0]
Data Types: double | single | struct
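The structure form of Cost can be sketched as follows. The cost values are reused from the iris example earlier on this page; the variable name S matches the description above, and the choice of templates is an assumption made for illustration.

```matlab
% Minimal sketch: pass classification costs as a structure with two fields.
load fisheriris                                         % provides meas and species
S.ClassNames = {'setosa','versicolor','virginica'};     % same data type as species
S.ClassificationCosts = [0 2 2; 2 0 1; 2 1 0];          % rows: true class, columns: predicted class

rng(1)                                                  % for reproducibility
[h,p] = testckfold(templateTree,templateNaiveBayes,meas,meas,species, ...
    'Cost',S)
```

Storing the class order inside the structure keeps the cost matrix and its row and column labels together, so a separate 'ClassNames' argument is not needed to fix the order.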
LossFun — Loss function
'classiferror' (default) | 'binodeviance' | 'exponential' | 'hinge' | function handle
Loss function, specified as the comma-separated pair consisting of 'LossFun' and 'classiferror', 'binodeviance', 'exponential', 'hinge', or a function handle.
The following table lists the available loss functions.
Value  Loss Function 
'binodeviance'  Binomial deviance 
'classiferror'  Classification error 
'exponential'  Exponential loss 
'hinge'  Hinge loss 
Specify your own function using function handle notation.
Suppose that n = size(X1,1) is the sample size and there are K unique classes. Your function must have the signature lossvalue = lossfun(C,S,W,Cost), where:
The output argument lossvalue is a scalar.
lossfun is the name of your function.
C is an n-by-K logical matrix with rows indicating which class the corresponding observation belongs to. The column order corresponds to the class order in the ClassNames name-value pair argument. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
S is an n-by-K numeric matrix of classification scores. The column order corresponds to the class order in the ClassNames name-value pair argument.
W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1.
Cost is a K-by-K numeric matrix of classification costs. For example, Cost = ones(K) - eye(K) specifies a cost of 0 for correct classification and a cost of 1 for misclassification.
Specify your function using 'LossFun',@lossfun.
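To make the required signature concrete, here is a minimal sketch of a custom loss function. The name weightedError is hypothetical, and the loss it computes (the weighted fraction of observations whose highest-scoring class differs from the true class) is only one possible choice; the Cost input is accepted but deliberately unused.

```matlab
% Minimal sketch: custom loss with the signature lossvalue = lossfun(C,S,W,Cost).
load fisheriris                                % provides meas and species
rng(1)                                         % for reproducibility
h = testckfold(templateTree,templateDiscriminant,meas,meas,species, ...
    'LossFun',@weightedError)

function lossvalue = weightedError(C,S,W,Cost) %#ok<INUSD>
    % C: n-by-K logical class membership, S: n-by-K scores,
    % W: n-by-1 weights (normalized to sum to 1), Cost: K-by-K (unused here).
    [~,predicted] = max(S,[],2);               % column with the highest score per row
    [~,truth] = max(double(C),[],2);           % column flagged true in each row of C
    lossvalue = sum(W(predicted ~= truth));    % weighted misclassification rate (scalar)
end
```

In a script file, the local function must appear at the end of the file, as shown; alternatively, save it in its own file named weightedError.m.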
Options — Parallel computing options
[] (default) | structure array returned by statset
Parallel computing options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. These options require Parallel Computing Toolbox. testckfold uses the 'Streams', 'UseParallel', and 'UseSubstreams' fields.
This table summarizes the available options.
Option  Description 

'Streams'  A
In that case, use a
cell array of the same size as the parallel pool.
If a parallel pool is not open, then the software
tries to open one (depending on your preferences),
and 
'UseParallel'  If you have Parallel Computing Toolbox, then you can invoke a pool of workers by setting 'UseParallel',true. 
'UseSubstreams'  Set to true to compute in parallel using the stream specified by 'Streams'. Default is false. For example, set Streams to a type allowing substreams, such as 'mlfg6331_64' or 'mrg32k3a'. 
Example: 'Options',statset('UseParallel',true)
Data Types: struct
Prior — Prior probabilities
'empirical' (default) | 'uniform' | numeric vector | structure
Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and 'empirical', 'uniform', a numeric vector, or a structure.
This table summarizes the available options for setting prior probabilities.
Value  Description 

'empirical'  The class prior probabilities are the class relative frequencies in Y. 
'uniform'  All class prior probabilities are equal to 1/K, where K is the number of classes. 
numeric vector  Each element is a class prior probability. Specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1. 
structure  A structure

Example: 'Prior',struct('ClassNames',{{'setosa','versicolor'}},'ClassProbs',[1,2])
Data Types: char | string | single | double | struct
Test — Test to conduct
'5x2F' (default) | '5x2t' | '10x10t'
Test to conduct, specified as the comma-separated pair consisting of 'Test' and one of the following: '5x2F', '5x2t', '10x10t'.
Value  Description  Supported Alternative Hypothesis 

'5x2F' (default)  5-by-2 paired F test. Appropriate for two-sided testing only.  'unequal' 
'5x2t'  5-by-2 paired t test  'unequal', 'less', 'greater' 
'10x10t'  10-by-10 repeated cross-validation t test  'unequal', 'less', 'greater' 
For details on the available tests, see Repeated Cross-Validation Tests. For details on supported alternative hypotheses, see Alternative.
Example: 'Test','10x10t'
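Because the default '5x2F' test supports only the 'unequal' alternative, a one-sided comparison needs one of the t tests. The following sketch pairs 'Test','5x2t' with 'Alternative','less'; the data set, the templates, and the significance level are assumptions chosen for illustration.

```matlab
% Minimal sketch: one-sided 5-by-2 paired t test at the 10% significance level.
load fisheriris                                % provides meas and species
rng(1)                                         % for reproducibility
% Alternative hypothesis: the discriminant model is less accurate than the tree.
[h,p] = testckfold(templateDiscriminant,templateTree,meas,meas,species, ...
    'Test','5x2t','Alternative','less','Alpha',0.1)
```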
Verbose — Verbosity level
0 (default) | 1 | 2
Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0, 1, or 2. Verbose controls the amount of diagnostic information that the software displays in the Command Window during training of each cross-validation fold.
This table summarizes the available verbosity level options.
Value  Description 

0  The software does not display diagnostic information. 
1  The software displays diagnostic messages every time it implements a new cross-validation run. 
2  The software displays diagnostic messages every time it implements a new cross-validation run, and every time it trains on a particular fold. 
Example: 'Verbose',1
Data Types: double | single
Weights — Observation weights
ones(size(X,1),1) (default) | numeric vector
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector.
The size of Weights must equal the number of rows of X1. The software weighs the observations in each row of X1 and X2 with the corresponding weight in Weights.
The software normalizes Weights to sum up to the value of the prior probability in the respective class.
Data Types: double | single
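As a small illustration of supplying observation weights, the sketch below doubles the weight of one class. The weight vector w, the doubled class, and the templates are assumptions made for this example only.

```matlab
% Minimal sketch: give observations of one class twice the weight of the others.
load fisheriris                                % provides meas and species
w = ones(size(meas,1),1);                      % one weight per row of the predictor data
w(strcmp(species,'virginica')) = 2;            % hypothetical choice: upweight virginica

rng(1)                                         % for reproducibility
h = testckfold(templateTree,templateDiscriminant,meas,meas,species, ...
    'Weights',w)
```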
Notes:
testckfold treats trained classification models as templates. Therefore, it ignores all fitted parameters in the model. That is, testckfold cross-validates using only the options specified in the model and the predictor data.
The repeated cross-validation tests depend on the assumption that the test statistics are asymptotically normal under the null hypothesis. Highly imbalanced cost matrices (for example, Cost = [0 100;1 0]) and highly discrete response distributions (that is, most of the observations are in a small number of classes) might violate the asymptotic normality assumption. For cost-sensitive testing, use testcholdout.
NaNs, <undefined> values, empty character vectors (''), empty strings (""), and <missing> values indicate missing data values.
Output Arguments
h — Hypothesis test result
1 | 0
Hypothesis test result, returned as a logical value.
h = 1 indicates the rejection of the null hypothesis at the Alpha significance level.
h = 0 indicates failure to reject the null hypothesis at the Alpha significance level.
Data Types: logical
p — p-value
scalar in the interval [0,1]
p-value of the test, returned as a scalar in the interval [0,1]. p is the probability that a random test statistic is at least as extreme as the observed test statistic, given that the null hypothesis is true.
testckfold estimates p using the distribution of the test statistic, which varies with the type of test. For details on test statistics, see Repeated Cross-Validation Tests.
e1 — Classification losses
numeric matrix
Classification losses, returned as a numeric matrix. The rows of e1 correspond to the cross-validation run and the columns correspond to the test fold.
testckfold applies the first test-set predictor data (X1) to the first classification model (C1) to estimate the first set of class labels.
e1 summarizes the accuracy of the first set of class labels predicting the true class labels (Y) for each cross-validation run and fold. The meaning of the elements of e1 depends on the type of classification loss.
e2 — Classification losses
numeric matrix
Classification losses, returned as a numeric matrix. The rows of e2 correspond to the cross-validation run and the columns correspond to the test fold.
testckfold applies the second test-set predictor data (X2) to the second classification model (C2) to estimate the second set of class labels.
e2 summarizes the accuracy of the second set of class labels predicting the true class labels (Y) for each cross-validation run and fold. The meaning of the elements of e2 depends on the type of classification loss.
More About
Repeated CrossValidation Tests
Repeated crossvalidation tests form the test statistic for comparing the accuracies of two classification models by combining the classification loss differences resulting from repeatedly crossvalidating the data. Repeated crossvalidation tests are useful when sample size is limited.
To conduct an R-by-K test:
1. Randomly divide (stratified by class) the predictor data sets and true class labels into K sets, R times. Each division is called a run and each set within a run is called a fold. Each run contains the complete, but divided, data sets.
2. For runs r = 1 through R, repeat these steps for k = 1 through K:
   a. Reserve fold k as a test set, and train the two classification models using their respective predictor data sets on the remaining K – 1 folds.
   b. Predict class labels using the trained models and their respective fold k predictor data sets.
   c. Estimate the classification loss by comparing the two sets of estimated labels to the true labels. Denote $$e_{crk}$$ as the classification loss when the test set is fold k in run r of classification model c.
   d. Compute the difference between the classification losses of the two models:
      $$\widehat{\delta}_{rk} = e_{1rk} - e_{2rk}.$$
   At the end of a run, there are K classification losses per classification model.
3. Combine the results of step 2. For each r = 1 through R:
   a. Estimate the within-fold averages of the differences (one average per run): $$\overline{\delta}_{r} = \frac{1}{K}\sum_{k=1}^{K}\widehat{\delta}_{rk}.$$
   b. Estimate the overall average of the differences: $$\overline{\delta} = \frac{1}{KR}\sum_{r=1}^{R}\sum_{k=1}^{K}\widehat{\delta}_{rk}.$$
   c. Estimate the within-fold variances of the differences: $$s_{r}^{2} = \frac{1}{K}\sum_{k=1}^{K}\left(\widehat{\delta}_{rk} - \overline{\delta}_{r}\right)^{2}.$$
   d. Estimate the average of the within-fold variances: $$\overline{s}^{2} = \frac{1}{R}\sum_{r=1}^{R}s_{r}^{2}.$$
   e. Estimate the overall sample variance of the differences: $$S^{2} = \frac{1}{KR - 1}\sum_{r=1}^{R}\sum_{k=1}^{K}\left(\widehat{\delta}_{rk} - \overline{\delta}\right)^{2}.$$
4. Compute the test statistic. All supported tests described here assume that, under H_{0}, the estimated differences are independent and approximately normally distributed, with mean 0 and a finite, common standard deviation. However, these tests violate the independence assumption, and so the test-statistic distributions are approximate.
   For K = 2, the test is a paired test. The two supported tests are a paired t test and a paired F test.
      The test statistic for the paired t test is
      $$t_{paired}^{\ast} = \frac{\widehat{\delta}_{11}}{\sqrt{\overline{s}^{2}}}.$$
      $$t_{paired}^{\ast}$$ has a t distribution with R degrees of freedom under the null hypothesis.
      To reduce the effects of correlation between the estimated differences, the quantity $$\widehat{\delta}_{11}$$ occupies the numerator rather than $$\overline{\delta}$$.
      5-by-2 paired t tests can be slightly conservative [4].
      The test statistic for the paired F test is
      $$F_{paired}^{\ast} = \frac{\frac{1}{RK}\sum_{r=1}^{R}\sum_{k=1}^{K}\left(\widehat{\delta}_{rk}\right)^{2}}{\overline{s}^{2}}.$$
      $$F_{paired}^{\ast}$$ has an F distribution with RK and R degrees of freedom.
      A 5-by-2 paired F test has comparable power to the 5-by-2 t test, but is more conservative [1].
   For K > 2, the test is a repeated cross-validation test. The test statistic is
      $$t_{CV}^{\ast} = \frac{\overline{\delta}}{S/\sqrt{\nu + 1}}.$$
      $$t_{CV}^{\ast}$$ has a t distribution with ν degrees of freedom. If the differences were truly independent, then ν = RK – 1. Because they are not independent, the degrees-of-freedom parameter must be optimized.
      For a 10-by-10 repeated cross-validation t test, the optimal degrees of freedom is between 8 and 11 ([2] and [3]). testckfold uses ν = 10.
      The advantage of repeated cross-validation tests over paired tests is that the results are more repeatable [3]. The disadvantage is that they require substantial computational resources.
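As a rough illustration of steps 3 and 4 for the 5-by-2 case, the following sketch combines a matrix of loss differences into the paired t and F statistics defined above. The values in delta are hypothetical, and the p-value lines are one plausible way to use the stated reference distributions, not necessarily how testckfold computes them internally. The subtraction delta - deltaBarR relies on implicit expansion (R2016b or later).

```matlab
% Hypothetical loss differences: rows are runs (R = 5), columns are folds (K = 2)
delta = [ 0.02 -0.01;
          0.03  0.01;
         -0.01  0.02;
          0.04  0.00;
          0.01  0.03];
[R,K] = size(delta);

deltaBarR = mean(delta,2);               % within-fold averages, one per run
sR2   = mean((delta - deltaBarR).^2,2);  % within-fold variances (1/K normalization)
sBar2 = mean(sR2);                       % average of the within-fold variances

tPaired = delta(1,1)/sqrt(sBar2);        % paired t statistic, R degrees of freedom
FPaired = mean(delta(:).^2)/sBar2;       % paired F statistic, RK and R degrees of freedom

% Illustrative p-values (tcdf and fcdf are in Statistics and Machine Learning Toolbox)
pT = 2*tcdf(-abs(tPaired),R);            % two-sided t test
pF = fcdf(FPaired,R*K,R,'upper');        % upper-tail F test
```

Only delta(1,1) enters the numerator of the t statistic, which is why the result of a paired t test depends on which fold of which run happens to come first.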
Classification Loss
Classification losses indicate the accuracy of a classification model or set of predicted labels. In general, for a fixed cost matrix, classification accuracy decreases as classification loss increases.
testckfold returns the classification losses (see e1 and e2) under the alternative hypothesis (that is, the unrestricted classification losses). In the definitions that follow:
The classification losses focus on the first classification model. The classification losses for the second model are similar.
n_{test} is the test-set sample size.
I(x) is the indicator function. If x is a true statement, then I(x) = 1. Otherwise, I(x) = 0.
$${\widehat{p}}_{1j}$$ is the predicted class assignment of classification model 1 for observation j.
y_{j} is the true class label of observation j.
Binomial deviance has the form
$$e_{1} = \frac{\sum_{j=1}^{n_{test}} w_{j}\log\left(1 + \exp\left(-2y_{j}f(X_{j})\right)\right)}{\sum_{j=1}^{n_{test}} w_{j}}$$
where:
y_{j} = 1 for the positive class and -1 for the negative class.
$$f({X}_{j})$$ is the classification score.
The binomial deviance has connections to the maximization of the binomial likelihood function. For details on binomial deviance, see [5].
Exponential loss is similar to binomial deviance and has the form
$$e_{1} = \frac{\sum_{j=1}^{n_{test}} w_{j}\exp\left(-y_{j}f(X_{j})\right)}{\sum_{j=1}^{n_{test}} w_{j}}.$$
y_{j} and $$f({X}_{j})$$ take the same forms here as in the binomial deviance formula.
Hinge loss has the form
$$e_{1} = \frac{\sum_{j=1}^{n_{test}} w_{j}\max\left\{0, 1 - y_{j}f(X_{j})\right\}}{\sum_{j=1}^{n_{test}} w_{j}}.$$
y_{j} and $$f({X}_{j})$$ take the same forms here as in the binomial deviance formula.
Hinge loss linearly penalizes misclassified observations and is related to the SVM objective function used for optimization. For more details on hinge loss, see [5].
Misclassification rate, or classification error, is a scalar in the interval [0,1] representing the proportion of misclassified observations. That is, the misclassification rate for the first classification model is
$${e}_{1}=\frac{{\displaystyle \sum _{j=1}^{{n}_{test}}{w}_{j}}I({\widehat{p}}_{1j}\ne {y}_{j})}{{\displaystyle \sum _{j=1}^{{n}_{test}}{w}_{j}}}.$$
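The four loss definitions above can be evaluated directly on a small test set. This is a minimal sketch with made-up labels, scores, and weights; the rule that maps a score to a predicted label (the sign of the score) is an assumption made only for this illustration.

```matlab
% Hypothetical binary test set
y = [1; -1; 1; 1; -1];              % true labels coded as +1 / -1
f = [0.8; -0.3; -0.2; 1.5; 0.4];    % classification scores f(X_j)
w = ones(5,1);                      % observation weights w_j (equal here)

eBinoDev = sum(w.*log(1 + exp(-2*y.*f)))/sum(w);   % binomial deviance
eExp     = sum(w.*exp(-y.*f))/sum(w);              % exponential loss
eHinge   = sum(w.*max(0,1 - y.*f))/sum(w);         % hinge loss

% Misclassification rate; assumes the predicted label is the sign of the score
yhat = sign(f);
eMisclass = sum(w.*(yhat ~= y))/sum(w);
```

With equal weights, each loss reduces to a plain average over the test set. The score-based losses also penalize correct predictions that have small margins, whereas the misclassification rate counts only wrong labels.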
Tips
Examples of ways to compare models include:
Compare the accuracies of a simple classification model and a more complex model by passing the same set of predictor data.
Compare the accuracies of two different models using two different sets of predictors.
Perform various types of feature selection. For example, you can compare the accuracy of a model trained using a set of predictors to the accuracy of one trained on a subset or different set of predictors. You can arbitrarily choose the set of predictors, or use a feature selection technique like PCA or sequential feature selection (see pca and sequentialfs).
If X1 and X2 are tables that contain the response variable under the same variable name, and C1.ResponseName and C2.ResponseName match that name, then you can omit supplying Y. Consequently, testckfold uses the common response variable in the tables.
One way to perform cost-insensitive feature selection is:
1. Create a classification model template that characterizes the first classification model (C1).
2. Create a classification model template that characterizes the second classification model (C2).
3. Specify two predictor data sets. For example, specify X1 as the full predictor set and X2 as a reduced set.
4. Enter testckfold(C1,C2,X1,X2,Y,'Alternative','less'). If testckfold returns 1, then there is enough evidence to suggest that the classification model that uses fewer predictors performs better than the model that uses the full predictor set.
Alternatively, you can assess whether there is a significant difference between the accuracies of the two models. To perform this assessment, remove the 'Alternative','less' specification in step 4. testckfold conducts a two-sided test, and h = 0 indicates that there is not enough evidence to suggest a difference in the accuracy of the two models.
The tests are appropriate for the misclassification rate classification loss, but you can specify other loss functions (see LossFun). The key assumptions are that the estimated classification loss differences are independent and normally distributed with mean 0 and finite common variance under the two-sided null hypothesis. Classification losses other than the misclassification rate can violate this assumption.
Highly discrete data, imbalanced classes, and highly imbalanced cost matrices can violate the normality assumption of classification loss differences.
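The feature-selection steps above might look like the following sketch. The fisheriris data set, the choice of tree templates, and the reduced predictor set (columns 3 and 4) are assumptions made for illustration. The sketch also requests the 5-by-2 paired t test explicitly through 'Test','5x2t', on the assumption that a one-sided alternative calls for a t test rather than the default F test.

```matlab
load fisheriris                 % meas: 150-by-4 predictors, species: class labels

C1 = templateTree;              % template for the first model
C2 = templateTree;              % template for the second model
X1 = meas;                      % full predictor set
X2 = meas(:,3:4);               % reduced predictor set (hypothetical choice)

rng(1)                          % make the random partitions reproducible
[h,p] = testckfold(C1,C2,X1,X2,species, ...
    'Alternative','less','Test','5x2t');
```

Because the partitions are random, h and p can change from call to call unless the random number generator is seeded first.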
Algorithms
If you specify to conduct the 10-by-10 repeated cross-validation t test using 'Test','10x10t', then testckfold uses 10 degrees of freedom for the t distribution to find the critical region and estimate the p-value. For more details, see [2] and [3].
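To make the role of the 10 degrees of freedom concrete, the sketch below forms the repeated cross-validation t statistic from a 10-by-10 matrix of loss differences and compares it with a two-sided critical value at the 5% level. The differences are simulated, so the numbers are hypothetical, and the calculation follows the formulas in Repeated Cross-Validation Tests rather than the internal code of testckfold.

```matlab
rng(0)                                  % reproducible hypothetical data
R = 10; K = 10;
delta = 0.01 + 0.02*randn(R,K);         % simulated loss differences

nu = 10;                                % degrees of freedom used for the 10-by-10 test
S  = std(delta(:));                     % square root of S^2, 1/(KR - 1) normalization
tCV = mean(delta(:))/(S/sqrt(nu + 1));  % repeated cross-validation t statistic

tCrit  = tinv(0.975,nu);                % two-sided critical value at the 5% level
pValue = 2*tcdf(-abs(tCV),nu);          % two-sided p-value
rejectH0 = abs(tCV) > tCrit;
```

Using ν + 1 = 11 in the denominator instead of RK = 100 widens the standard error, which compensates for the correlation among the 100 differences.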
Alternatives
Use testcholdout:
For test sets with larger sample sizes
To implement variants of the McNemar test to compare two classification model accuracies
For cost-sensitive testing using a chi-square or likelihood ratio test. The chi-square test uses quadprog (Optimization Toolbox), which requires an Optimization Toolbox™ license.
References
[1] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 11, No. 8, 1999, pp. 1885–1892.
[2] Bouckaert, R. “Choosing Between Two Learning Algorithms Based on Calibrated Tests.” International Conference on Machine Learning, 2003, pp. 51–58.
[3] Bouckaert, R., and E. Frank. “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms.” Advances in Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference, 2004, pp. 3–12.
[4] Dietterich, T. “Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 10, No. 7, 1998, pp. 1895–1923.
[5] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd Ed. New York: Springer, 2008.
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset:
"Options",statset("UseParallel",true)
For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
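A complete call with the parallel option could look like this sketch. The data set and the two templates are assumptions chosen for illustration, and the call needs a Parallel Computing Toolbox license to actually run in parallel.

```matlab
load fisheriris                        % meas: predictors, species: class labels

C1 = templateTree;                     % first model: classification tree
C2 = templateKNN;                      % second model: k-nearest neighbors

opts = statset("UseParallel",true);    % options structure with UseParallel set
h = testckfold(C1,C2,meas,meas,species,"Options",opts);
```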
Version History
Introduced in R2015a
See Also
testcholdout
 templateECOC
 templateEnsemble
 templateDiscriminant
 templateTree
 templateSVM
 templateNaiveBayes
 templateKNN