# resubPredict

Class: ClassificationNaiveBayes

Predict resubstitution labels of naive Bayes classifier

## Syntax

`label = resubPredict(Mdl)`
`[label,Posterior,Cost] = resubPredict(Mdl)`

## Description

`label = resubPredict(Mdl)` returns a vector of predicted class labels (`label`) for the trained naive Bayes classifier `Mdl` using the predictor data `Mdl.X`.

`[label,Posterior,Cost] = resubPredict(Mdl)` additionally returns posterior probabilities (`Posterior`) and predicted (expected) misclassification costs (`Cost`) corresponding to the observations (rows) in `Mdl.X`.
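Because resubstitution uses the training predictors stored in `Mdl.X`, the result should match a call to `predict` on those same predictors. The following is a minimal sketch of that comparison, assuming the Fisher iris data used in the examples on this page; the variable names `labelResub`, `labelPred`, and `sameLabels` are illustrative only.

```matlab
% Sketch: compare resubstitution predictions with an explicit call to
% predict on the stored training predictors.
load fisheriris
Mdl = fitcnb(meas,species);

labelResub = resubPredict(Mdl);    % uses the predictor data in Mdl.X
labelPred  = predict(Mdl,Mdl.X);   % same predictors, passed explicitly

% Logical flag: true when both calls return identical labels
sameLabels = isequal(labelResub,labelPred)
```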

## Input Arguments

`Mdl` is a fully trained naive Bayes classifier, specified as a `ClassificationNaiveBayes` model trained by `fitcnb`.

## Output Arguments

Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

`label`:

• Is the same data type as the observed class labels (`Y`) that trained `Mdl`. (The software treats string arrays as cell arrays of character vectors.)

• Has length equal to the number of rows of `Mdl.X`.

• Is the class yielding the lowest expected misclassification cost (`Cost`).
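The last property can be checked directly: taking the column with the smallest value in each row of `Cost` should recover `label`. This is a sketch under the assumption that no row has an exact tie between classes; `k`, `labelFromCost`, and `agreement` are names introduced here for illustration.

```matlab
% Sketch: recover the predicted labels from the expected cost matrix.
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

[label,~,Cost] = resubPredict(Mdl);

% Column index of the lowest expected cost in each row
[~,k] = min(Cost,[],2);
labelFromCost = Mdl.ClassNames(k);

% Fraction of observations where the two labelings agree
agreement = mean(strcmp(label,labelFromCost))
```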

Class posterior probabilities (`Posterior`), returned as a numeric matrix. `Posterior` has one row for each row of `Mdl.X` and one column for each distinct class in the training data (`size(Mdl.ClassNames,1)` columns).

`Posterior(j,k)` is the predicted posterior probability of class `k` (that is, class `Mdl.ClassNames(k)`) given the observation in row `j` of `Mdl.X`.

Data Types: `double`
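Each row of `Posterior` is a probability distribution over the classes, so its entries should sum to 1 up to floating-point rounding. A minimal check, assuming the Fisher iris data used in the examples on this page (`rowSumError` is an illustrative name):

```matlab
% Sketch: check the size of Posterior and that each row sums to 1.
load fisheriris
Mdl = fitcnb(meas,species);

[~,Posterior] = resubPredict(Mdl);

% One row per observation, one column per class
sizeOK = isequal(size(Posterior),[size(Mdl.X,1) numel(Mdl.ClassNames)]);

% Largest deviation of a row sum from 1
rowSumError = max(abs(sum(Posterior,2) - 1))
```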

Expected misclassification costs (`Cost`), returned as a numeric matrix. `Cost` has one row for each row of `Mdl.X` and one column for each distinct class in the training data (`size(Mdl.ClassNames,1)` columns).

`Cost(j,k)` is the expected misclassification cost of predicting the observation in row `j` of `Mdl.X` into class `k` (that is, class `Mdl.ClassNames(k)`).
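One way to read the expected cost is as the posterior-weighted average of the model's misclassification cost matrix. The sketch below assumes that `Mdl.Cost(i,k)` holds the cost of predicting class `k` when the true class is `i`, so that the expected costs equal `Posterior*Mdl.Cost`; treat that relationship as an assumption to verify rather than a statement of the internal implementation. `CostCheck` and `maxAbsDiff` are illustrative names.

```matlab
% Sketch: rebuild the expected costs from the posteriors and Mdl.Cost.
load fisheriris
Mdl = fitcnb(meas,species);

[~,Posterior,Cost] = resubPredict(Mdl);

% Assumed relationship: Cost(j,k) = sum over i of Posterior(j,i)*Mdl.Cost(i,k)
CostCheck = Posterior*Mdl.Cost;

% Largest absolute difference between the returned and rebuilt costs
maxAbsDiff = max(abs(Cost(:) - CostCheck(:)))
```

With the default cost matrix (0 on the diagonal, 1 elsewhere), this reduces to `Cost = 1 - Posterior`, which matches the `MisclassCost` values shown in the examples.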

## Examples

```
load fisheriris
X = meas;    % Predictors
Y = species; % Response
```

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.

```
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});
```

`Mdl` is a `ClassificationNaiveBayes` classifier.

Predict the training sample labels. Display the results for 10 randomly selected observations.

```
label = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames',...
    {'TrueLabel','PredictedLabel'})
```

```
ans=10×2 table
      TrueLabel       PredictedLabel
    ______________    ______________

    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'versicolor'}    {'versicolor'}
    {'setosa'    }    {'setosa'    }
    {'setosa'    }    {'setosa'    }
```

```
load fisheriris
X = meas;    % Predictors
Y = species; % Response
```

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.

```
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});
```

`Mdl` is a `ClassificationNaiveBayes` classifier.

Estimate posterior probabilities and expected misclassification costs for the training data. Display the results for 10 randomly selected observations.

```
[label,Posterior,MisclassCost] = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
Mdl.ClassNames
```

```
ans = 3x1 cell array
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```

```
table(Y(idx),label(idx),Posterior(idx,:),'VariableNames',...
    {'TrueLabel','PredictedLabel','PosteriorProbability'})
```

```
ans=10×3 table
      TrueLabel       PredictedLabel              PosteriorProbability
    ______________    ______________    _________________________________________

    {'setosa'    }    {'setosa'    }              1     3.8821e-16     5.5878e-24
    {'versicolor'}    {'versicolor'}     1.2516e-54              1     4.5001e-06
    {'virginica' }    {'virginica' }    5.5646e-188     0.00058232        0.99942
    {'setosa'    }    {'setosa'    }              1     4.5352e-20     3.1301e-27
    {'versicolor'}    {'versicolor'}     5.0002e-69        0.99989     0.00010716
    {'setosa'    }    {'setosa'    }              1     2.9813e-18     2.1524e-25
    {'versicolor'}    {'versicolor'}     4.6313e-60        0.99999     7.5413e-06
    {'versicolor'}    {'versicolor'}    7.9205e-100        0.94293       0.057072
    {'setosa'    }    {'setosa'    }              1      1.799e-19     6.0606e-27
    {'setosa'    }    {'setosa'    }              1     1.5426e-17     1.2744e-24
```

```
MisclassCost(idx,:)
```

```
ans = 10×3

    0.0000    1.0000    1.0000
    1.0000    0.0000    1.0000
    1.0000    0.9994    0.0006
    0.0000    1.0000    1.0000
    1.0000    0.0001    0.9999
    0.0000    1.0000    1.0000
    1.0000    0.0000    1.0000
    1.0000    0.0571    0.9429
    0.0000    1.0000    1.0000
    0.0000    1.0000    1.0000
```

The order of the columns of `Posterior` and `MisclassCost` corresponds to the order of the classes in `Mdl.ClassNames`.
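The resubstitution labels can also be summarized over the whole training set rather than a sample of 10 rows. The sketch below, which is an illustration rather than part of the original example, computes the fraction of misclassified training observations and a confusion matrix ordered by `Mdl.ClassNames`; `resubErr` and `C` are names introduced here.

```matlab
% Sketch: summarize resubstitution predictions over all training rows.
load fisheriris
X = meas;    % Predictors
Y = species; % Response
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});

label = resubPredict(Mdl);

% Fraction of training observations whose predicted label differs from Y
resubErr = mean(~strcmp(label,Y))

% Rows are true classes, columns are predicted classes,
% both in the order given by Mdl.ClassNames
C = confusionmat(Y,label,'Order',Mdl.ClassNames)
```

Because the model is evaluated on the same data used to train it, this error rate tends to be optimistic compared with an estimate from held-out data.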
