# ClassificationPartitionedKernelECOC

Cross-validated kernel error-correcting output codes (ECOC) model for multiclass classification

## Description

`ClassificationPartitionedKernelECOC` is an error-correcting output codes (ECOC) model composed of kernel classification models, trained on cross-validated folds. Estimate the quality of the classification by cross-validation using one or more “kfold” functions: `kfoldPredict`, `kfoldLoss`, `kfoldMargin`, and `kfoldEdge`.

Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation to one of five groups of roughly equal size. The training fold contains four of the groups (roughly 4/5 of the data), and the validation fold contains the remaining group (roughly 1/5 of the data). Cross-validation then proceeds as follows:

1. The software trains the first model (stored in `CVMdl.Trained{1}`) by using the observations in the last four groups and reserves the observations in the first group for validation.

2. The software trains the second model (stored in `CVMdl.Trained{2}`) using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.

3. The software proceeds in a similar fashion for the third, fourth, and fifth models.

If you validate by using `kfoldPredict`, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation.
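The fold bookkeeping described above can be sketched with `cvpartition`. This is a minimal illustration, assuming the Fisher iris data set and a five-fold partition; the variable names `c`, `inFold1`, and `outFold1` are chosen here for illustration only.

```matlab
load fisheriris
rng(1); % For reproducibility

% Randomly assign the 150 observations to five folds.
% Passing the labels stratifies the folds by class.
c = cvpartition(species,'KFold',5);

% Logical indices for the first fold: the first model is trained on the
% in-fold observations and validated on the out-of-fold observations.
inFold1  = training(c,1);
outFold1 = test(c,1);

% Cross-validate a kernel ECOC model on this partition.
CVMdl = fitcecoc(meas,species,'Learners','kernel','CVPartition',c);

% kfoldPredict returns one out-of-fold prediction per observation.
labels = kfoldPredict(CVMdl);
numel(labels)
```

Each observation belongs to exactly one validation fold, so `inFold1` and `outFold1` are complementary and `labels` has one entry per row of `meas`.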

### Note

`ClassificationPartitionedKernelECOC` model objects do not store the predictor data set.

## Creation

You can create a `ClassificationPartitionedKernelECOC` model by training an ECOC model using `fitcecoc` and specifying these name-value pair arguments:

• `'Learners'`: Set the value to `'kernel'`, a template object returned by `templateKernel`, or a cell array of such template objects.

• One of the arguments `'CrossVal'`, `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`.

For more details, see `fitcecoc`.
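As a brief sketch of the two requirements above, the following call combines `'Learners','kernel'` with one cross-validation argument. It assumes the Fisher iris data set; the choice of five folds is arbitrary.

```matlab
load fisheriris
rng(1); % For reproducibility

% 'Learners','kernel' selects kernel binary learners, and 'KFold',5
% requests five-fold cross-validation instead of the default ten folds.
CVMdl = fitcecoc(meas,species,'Learners','kernel','KFold',5);

class(CVMdl)          % class of the returned cross-validated model
CVMdl.KFold           % number of folds
numel(CVMdl.Trained)  % one compact ECOC model per fold
```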

## Properties


### Cross-Validation Properties

`CrossValidatedModel`: Cross-validated model name, specified as a character vector.

For example, `'KernelECOC'` specifies a cross-validated kernel ECOC model.

Data Types: `char`

`KFold`: Number of cross-validated folds, specified as a positive integer scalar.

Data Types: `double`

`ModelParameters`: Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the ECOC classifier. `ModelParameters` does not contain estimated parameters.

You can access the properties of `ModelParameters` using dot notation.

`NumObservations`: Number of observations in the training data, specified as a positive numeric scalar.

Data Types: `double`

`Partition`: Data partition indicating how the software splits the data into cross-validation folds, specified as a `cvpartition` model.

`Trained`: Compact classifiers trained on cross-validation folds, specified as a cell array of `CompactClassificationECOC` models. `Trained` has k cells, where k is the number of folds.

Data Types: `cell`

`W`: Observation weights used to cross-validate the model, specified as a numeric vector. `W` has `NumObservations` elements.

The software normalizes the weights used for training so that `nansum(W)` is `1`.

Data Types: `single` | `double`

`Y`: Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `Y` has `NumObservations` elements and has the same data type as the input argument `Y` that you pass to `fitcecoc` to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.)

Each row of `Y` represents the observed classification of the corresponding row of `X`.

Data Types: `categorical` | `char` | `logical` | `single` | `double` | `cell`
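The cross-validation properties can be inspected with dot notation. The following is a small sketch, assuming a model cross-validated on the Fisher iris data set with the default ten folds.

```matlab
load fisheriris
rng(1); % For reproducibility
CVMdl = fitcecoc(meas,species,'Learners','kernel','CrossVal','on');

CVMdl.CrossValidatedModel   % cross-validated model name
CVMdl.KFold                 % number of folds
CVMdl.NumObservations       % number of observations in the training data
CVMdl.Partition             % cvpartition describing the fold assignment
class(CVMdl.Trained{1})     % class of the compact model for the first fold
sum(CVMdl.W)                % normalized observation weights sum to 1
numel(CVMdl.Y)              % one observed label per observation
```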

### ECOC Properties

`BinaryLoss`: Binary learner loss function, specified as a character vector representing the loss function name.

By default, if all binary learners are kernel classification models using SVM, then `BinaryLoss` is `'hinge'`. If all binary learners are kernel classification models using logistic regression, then `BinaryLoss` is `'quadratic'`. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the `'BinaryLoss'` name-value pair argument of `kfoldPredict` or `kfoldLoss`.

Data Types: `char`
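To illustrate overriding the default binary loss, the sketch below compares the cross-validated error under the stored default with the error under `'quadratic'`. It assumes the Fisher iris data set; which loss gives the lower error depends on the data, so no particular outcome is implied.

```matlab
load fisheriris
rng(1); % For reproducibility
CVMdl = fitcecoc(meas,species,'Learners','kernel','CrossVal','on');

CVMdl.BinaryLoss   % default binary loss stored in the model

% Classification error using the default binary loss.
lossDefault = kfoldLoss(CVMdl);

% Classification error using a different binary loss for decoding.
lossQuadratic = kfoldLoss(CVMdl,'BinaryLoss','quadratic');

[lossDefault lossQuadratic]
```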

`BinaryY`: Binary learner class labels, specified as a numeric matrix or `[]`.

• If the coding matrix is the same across all folds, then `BinaryY` is a `NumObservations`-by-L matrix, where L is the number of binary learners (`size(CodingMatrix,2)`).

The elements of `BinaryY` are `-1`, `0`, or `1`, and the values correspond to dichotomous class assignments. This table describes how learner `j` assigns observation `k` to a dichotomous class corresponding to the value of `BinaryY(k,j)`.

| Value | Dichotomous Class Assignment |
| --- | --- |
| `-1` | Learner `j` assigns observation `k` to a negative class. |
| `0` | Before training, learner `j` removes observation `k` from the data set. |
| `1` | Learner `j` assigns observation `k` to a positive class. |

• If the coding matrix varies across folds, then `BinaryY` is empty (`[]`).

Data Types: `double`

`CodingMatrix`: Codes specifying class assignments for the binary learners, specified as a numeric matrix or `[]`.

• If the coding matrix is the same across all folds, then `CodingMatrix` is a K-by-L matrix, where K is the number of classes and L is the number of binary learners.

The elements of `CodingMatrix` are `-1`, `0`, or `1`, and the values correspond to dichotomous class assignments. This table describes how learner `j` assigns observations in class `i` to a dichotomous class corresponding to the value of `CodingMatrix(i,j)`.

| Value | Dichotomous Class Assignment |
| --- | --- |
| `-1` | Learner `j` assigns observations in class `i` to a negative class. |
| `0` | Before training, learner `j` removes observations in class `i` from the data set. |
| `1` | Learner `j` assigns observations in class `i` to a positive class. |

• If the coding matrix varies across folds, then `CodingMatrix` is empty (`[]`). You can obtain the coding matrix for each fold by using the `Trained` property. For example, `CVMdl.Trained{1}.CodingMatrix` is the coding matrix in the first fold of the cross-validated ECOC model `CVMdl`.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64`
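The relationship between `CodingMatrix` and `BinaryY` can be sketched as follows: row `k` of `BinaryY` is expected to match the row of `CodingMatrix` for the class of observation `k`. This is an illustrative check under that reading of the descriptions above, assuming the Fisher iris data set; the variable names `M`, `B`, `classIdx`, and `expectedB` are hypothetical.

```matlab
load fisheriris
rng(1); % For reproducibility
CVMdl = fitcecoc(meas,species,'Learners','kernel','CrossVal','on');

M = CVMdl.CodingMatrix;   % K-by-L, or [] if the coding varies across folds
B = CVMdl.BinaryY;        % NumObservations-by-L, or []

if ~isempty(M)
    % Map each observed label to its row (class index) in ClassNames.
    [~,classIdx] = ismember(CVMdl.Y,CVMdl.ClassNames);
    % Row k holds the coding-matrix row for the class of observation k.
    expectedB = M(classIdx,:);
    sameAssignments = isequal(double(B),double(expectedB))
else
    % Fall back to the coding matrix stored with an individual fold.
    M1 = CVMdl.Trained{1}.CodingMatrix
end
```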

### Other Classification Properties

`CategoricalPredictors`: Categorical predictor indices, specified as an empty numeric value. In general, `CategoricalPredictors` contains index values corresponding to the columns of the predictor data that contain categorical predictors. Because `ClassificationKernel` models can be trained on numeric predictor data only, this property is empty (`[]`).

`ClassNames`: Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `ClassNames` has the same data type as the observed class labels property `Y` and determines the class order.

Data Types: `categorical` | `char` | `logical` | `single` | `double` | `cell`

`Cost`: Misclassification costs, specified as a square numeric matrix. `Cost` has K rows and columns, where K is the number of classes.

`Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

Data Types: `double`

`PredictorNames`: Predictor names in order of their appearance in the predictor data `X`, specified as a cell array of character vectors of the form `{'x1','x2',...}`. The length of `PredictorNames` is equal to the number of columns in `X`.

Data Types: `cell`

`Prior`: Prior class probabilities, specified as a numeric vector. `Prior` has as many elements as there are classes in `ClassNames`, and the order of the elements corresponds to the elements of `ClassNames`.

Data Types: `double`

`ResponseName`: Response variable name, specified as `'Y'`. Because `ClassificationKernel` models cannot be trained on tabular data, this property is always `'Y'`.

Data Types: `char`

`ScoreTransform`: Score transformation function to apply to predicted scores, specified as a function name or function handle.

For a kernel classification model `Mdl`, and before the score transformation, the predicted classification score for the observation x (row vector) is $f\left(x\right)=T\left(x\right)\beta +b.$

• $T\left(\cdot\right)$ is a transformation of an observation for feature expansion.

• β is the estimated column vector of coefficients.

• b is the estimated scalar bias.

To change the `CVMdl` score transformation function to `function`, for example, use dot notation.

• For a built-in function, enter this code and replace `function` with a value from the table.

`CVMdl.ScoreTransform = 'function';`

| Value | Description |
| --- | --- |
| `'doublelogit'` | $1/(1 + e^{-2x})$ |
| `'invlogit'` | $\log(x/(1 - x))$ |
| `'ismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| `'logit'` | $1/(1 + e^{-x})$ |
| `'none'` or `'identity'` | $x$ (no transformation) |
| `'sign'` | –1 for $x < 0$; 0 for $x = 0$; 1 for $x > 0$ |
| `'symmetric'` | $2x - 1$ |
| `'symmetricismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1 |
| `'symmetriclogit'` | $2/(1 + e^{-x}) - 1$ |

• For a MATLAB® function or a function that you define, enter its function handle.

`CVMdl.ScoreTransform = @function;`

`function` must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.

Data Types: `char` | `function_handle`
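As an illustration of what a custom transformation function looks like, the handles below are handwritten element-wise versions of the standard `'logit'`, `'doublelogit'`, and `'symmetriclogit'` formulas. They are a sketch for comparison, not the toolbox's internal implementation, and the score matrix `s` is made-up example data.

```matlab
% Example scores: rows are observations, columns are classes.
s = [-2 0 2; 0.5 -0.5 0];

% Element-wise transformations that accept and return a matrix of scores.
logitFcn          = @(x) 1./(1 + exp(-x));       % same formula as 'logit'
doublelogitFcn    = @(x) 1./(1 + exp(-2*x));     % same formula as 'doublelogit'
symmetriclogitFcn = @(x) 2./(1 + exp(-x)) - 1;   % same formula as 'symmetriclogit'

t = logitFcn(s);
size(t)   % same size as the input score matrix

% A handle of this form can be assigned with dot notation, for example:
% CVMdl.ScoreTransform = logitFcn;
```

Each handle maps a matrix of scores to a matrix of the same size, which is the contract that a custom `ScoreTransform` function must satisfy.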

## Object Functions

| Function | Description |
| --- | --- |
| `kfoldEdge` | Classification edge for cross-validated kernel ECOC model |
| `kfoldLoss` | Classification loss for cross-validated kernel ECOC model |
| `kfoldMargin` | Classification margins for cross-validated kernel ECOC model |
| `kfoldPredict` | Classify observations in cross-validated kernel ECOC model |
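A short sketch of calling all four object functions on one model follows. It assumes the Fisher iris data set and default options; the output variable names are illustrative.

```matlab
load fisheriris
rng(1); % For reproducibility
CVMdl = fitcecoc(meas,species,'Learners','kernel','CrossVal','on');

% Out-of-fold predicted labels and negated average binary losses per class.
[labels,negLoss] = kfoldPredict(CVMdl);

% Scalar cross-validated classification loss.
L = kfoldLoss(CVMdl);

% One classification margin per observation.
m = kfoldMargin(CVMdl);

% Scalar classification edge (weighted mean of the margins).
e = kfoldEdge(CVMdl);
```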

## Examples


Create a cross-validated, multiclass kernel ECOC classification model using `fitcecoc`.

Load Fisher's iris data set. `X` contains flower measurements, and `Y` contains the names of flower species.

```
load fisheriris
X = meas;
Y = species;
```

Cross-validate a multiclass kernel ECOC classification model that can identify the species of a flower based on the flower's measurements.

```
rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on')
```

```
CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

  Properties, Methods
```

`CVMdl` is a `ClassificationPartitionedKernelECOC` cross-validated model. `fitcecoc` implements 10-fold cross-validation by default. Therefore, `CVMdl.Trained` is a 10-by-1 cell array of `CompactClassificationECOC` models, one for each fold. Each compact ECOC model is composed of binary kernel classification models.

Estimate the classification error by passing `CVMdl` to `kfoldLoss`.

```
error = kfoldLoss(CVMdl)
```

```
error = 0.0333
```

The estimated classification error is about 3%; that is, the cross-validated model misclassifies about 3% of the out-of-fold observations.

To change default options when training ECOC models composed of kernel classification models, create a kernel classification model template using `templateKernel`, and then pass the template to `fitcecoc`.
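A minimal sketch of the template workflow follows. The option values (logistic regression learners and 1024 expansion dimensions) are arbitrary choices for illustration, not recommendations, and the Fisher iris data set is assumed.

```matlab
load fisheriris
rng(1); % For reproducibility

% Kernel template with non-default options for the binary learners.
t = templateKernel('Learner','logistic','NumExpansionDimensions',2^10);

% Pass the template to fitcecoc and cross-validate with five folds.
CVMdl = fitcecoc(meas,species,'Learners',t,'KFold',5);

% Estimate the cross-validated classification error.
err = kfoldLoss(CVMdl)
```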