# fitcnet

Train neural network classification model

Since R2021a

## Syntax

`Mdl = fitcnet(Tbl,ResponseVarName)`

`Mdl = fitcnet(Tbl,formula)`

`Mdl = fitcnet(Tbl,Y)`

`Mdl = fitcnet(X,Y)`

`Mdl = fitcnet(___,Name,Value)`

## Description

Use `fitcnet` to train a feedforward, fully connected neural network for classification. The first fully connected layer of the neural network has a connection from the network input (predictor data), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix and then adds a bias vector. An activation function follows each fully connected layer. The final fully connected layer and the subsequent softmax activation function produce the network's output, namely classification scores (posterior probabilities) and predicted labels. For more information, see Neural Network Structure.
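The computation described above can be sketched for a single observation in base MATLAB. This is a minimal illustration, not how `fitcnet` or `predict` works internally. The layer sizes and random weights are hypothetical, ReLU is assumed for the hidden layers, and details such as predictor standardization and categorical expansion are ignored. The cell arrays `W` and `b` play roughly the role of a trained model's `LayerWeights` and `LayerBiases` properties.

```matlab
% Minimal forward pass of a fully connected classification network.
% All sizes and weights below are hypothetical, for illustration only.
rng(0)                               % reproducible random weights
numPredictors = 4;                   % hypothetical number of predictors
layerSizes    = [5 3];               % hypothetical hidden layer sizes
numClasses    = 2;                   % final layer has one output per class

sizes = [numPredictors layerSizes numClasses];
W = cell(1,numel(sizes)-1);          % W{i} is sizes(i+1)-by-sizes(i)
b = cell(1,numel(sizes)-1);          % b{i} is sizes(i+1)-by-1
for i = 1:numel(W)
    W{i} = randn(sizes(i+1),sizes(i));
    b{i} = randn(sizes(i+1),1);
end

relu    = @(z) max(z,0);                                 % hidden activation
softmax = @(z) exp(z - max(z)) ./ sum(exp(z - max(z)));  % numerically stable

x = randn(numPredictors,1);          % one observation as a column vector
a = x;
for i = 1:numel(W)-1
    a = relu(W{i}*a + b{i});         % weight matrix times input, plus bias
end
scores = softmax(W{end}*a + b{end}); % classification scores (sum to 1)
[~,predictedClass] = max(scores);    % predicted label = highest score
```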


`Mdl = fitcnet(Tbl,ResponseVarName)` returns a neural network classification model `Mdl` trained using the predictors in the table `Tbl` and the class labels in the `ResponseVarName` table variable.

`Mdl = fitcnet(Tbl,formula)` returns a neural network classification model trained using the sample data in the table `Tbl`. The input argument `formula` is an explanatory model of the response and a subset of the predictor variables in `Tbl` used to fit `Mdl`.

`Mdl = fitcnet(Tbl,Y)` returns a neural network classification model trained using the predictor variables in the table `Tbl` and the class labels in the vector `Y`.


`Mdl = fitcnet(X,Y)` returns a neural network classification model trained using the predictors in the matrix `X` and the class labels in the vector `Y`.


`Mdl = fitcnet(___,Name,Value)` specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can adjust the number of outputs of each fully connected layer and the activation functions for the fully connected layers by specifying the `LayerSizes` and `Activations` name-value arguments.

## Examples


Train a neural network classifier, and assess the performance of the classifier on a test set.

Read the sample file `CreditRating_Historical.dat` into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

```matlab
creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)
```

```
       ID     WC_TA     RE_TA   EBIT_TA   MVE_BVTD       S_TA Industry    Rating
    _____    ______    ______   _______   ________      _____ ________   _______

    62394     0.013     0.104     0.036      0.447      0.142        3   {'BB' }
    48608     0.232     0.335     0.062      1.969      0.281        8   {'A'  }
    42444     0.311     0.367     0.074      1.935      0.366        1   {'A'  }
    48631     0.194     0.263     0.062      1.017      0.228        4   {'BBB'}
    43768     0.121     0.413     0.057      3.647      0.466       12   {'AAA'}
    39255    -0.117    -0.799      0.01      0.179      0.082        4   {'CCC'}
    62236     0.087     0.158     0.049      0.816      0.324        2   {'BBB'}
    39354     0.005     0.181     0.034      2.597      0.388        7   {'AA' }
```

Each value in the `ID` variable is a unique customer ID; that is, `length(unique(creditrating.ID))` is equal to the number of observations in `creditrating`. Therefore, the `ID` variable is a poor predictor. Remove the `ID` variable from the table, and convert the `Industry` variable to a `categorical` variable.

```matlab
creditrating = removevars(creditrating,"ID");
creditrating.Industry = categorical(creditrating.Industry);
```

Convert the `Rating` response variable to an ordinal `categorical` variable.

```matlab
creditrating.Rating = categorical(creditrating.Rating, ...
    ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
```

Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use `cvpartition` to partition the data.

```matlab
rng("default") % For reproducibility of the partition
c = cvpartition(creditrating.Rating,"Holdout",0.20);
trainingIndices = training(c); % Indices for the training set
testIndices = test(c); % Indices for the test set
creditTrain = creditrating(trainingIndices,:);
creditTest = creditrating(testIndices,:);
```

Train a neural network classifier by passing the training data `creditTrain` to the `fitcnet` function.

`Mdl = fitcnet(creditTrain,"Rating")`
```
Mdl = 
  ClassificationNeuralNetwork
           PredictorNames: {'WC_TA' 'RE_TA' 'EBIT_TA' 'MVE_BVTD' 'S_TA' 'Industry'}
             ResponseName: 'Rating'
    CategoricalPredictors: 6
               ClassNames: [AAA AA A BBB BB B CCC]
           ScoreTransform: 'none'
          NumObservations: 3146
               LayerSizes: 10
              Activations: 'relu'
    OutputLayerActivation: 'softmax'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1x1 struct]
          TrainingHistory: [1000x7 table]
```

`Mdl` is a trained `ClassificationNeuralNetwork` classifier. You can use dot notation to access the properties of `Mdl`. For example, you can specify `Mdl.TrainingHistory` to get more information about the training history of the neural network model.

Evaluate the performance of the classifier on the test set by computing the test set accuracy, that is, one minus the test set classification error. Visualize the results by using a confusion matrix.

```matlab
testAccuracy = 1 - loss(Mdl,creditTest,"Rating", ...
    "LossFun","classiferror")
```
```testAccuracy = 0.7964 ```
`confusionchart(creditTest.Rating,predict(Mdl,creditTest))`
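The classification error used above is the fraction of misclassified observations, and accuracy is one minus that fraction. The sketch below computes both from hypothetical integer class labels, along with the counts that a confusion chart displays. It is unweighted. The `loss` function may normalize observation weights by the class prior probabilities, so its result can differ slightly from this simple fraction.

```matlab
% Unweighted classification error, accuracy, and confusion counts.
% trueLabels and predictedLabels are hypothetical class indices (1..K).
trueLabels      = [1 1 2 2 2 3 3 1]';
predictedLabels = [1 2 2 2 3 3 3 1]';
K = 3;                                            % number of classes

classError = mean(predictedLabels ~= trueLabels); % fraction misclassified
accuracy   = 1 - classError;

% Rows = true class, columns = predicted class (the confusionchart layout)
C = accumarray([trueLabels predictedLabels], 1, [K K]);
```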

Specify the structure of a neural network classifier, including the size of the fully connected layers.

Load the `ionosphere` data set, which includes radar signal data. `X` contains the predictor data, and `Y` is the response variable, whose values represent either good ("g") or bad ("b") radar signals.

`load ionosphere`

Separate the data into training data (`XTrain` and `YTrain`) and test data (`XTest` and `YTest`) by using a stratified holdout partition. Reserve approximately 30% of the observations for testing, and use the rest of the observations for training.

```matlab
rng("default") % For reproducibility of the partition
cvp = cvpartition(Y,"Holdout",0.3);
XTrain = X(training(cvp),:);
YTrain = Y(training(cvp));
XTest = X(test(cvp),:);
YTest = Y(test(cvp));
```
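`cvpartition` handles the stratification for you. To make the idea concrete, the sketch below draws a stratified holdout by hand. Within each class, it reserves about 30% of that class's observations for testing, so the training and test sets keep roughly the original class proportions. The labels are hypothetical, and the rounding rule is an assumption; `cvpartition` may round differently.

```matlab
% Hand-rolled stratified holdout (illustration only; cvpartition does this for you).
rng(1)                                          % reproducible shuffling
labels = [repmat('b',10,1); repmat('g',20,1)];  % hypothetical class labels
holdoutFraction = 0.3;

isTest  = false(numel(labels),1);
classes = unique(labels);
for k = 1:numel(classes)
    idx   = find(labels == classes(k));         % observations in class k
    idx   = idx(randperm(numel(idx)));          % shuffle within the class
    nTest = round(holdoutFraction*numel(idx));  % this class's share of the test set
    isTest(idx(1:nTest)) = true;
end
isTrain = ~isTest;
```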

Train a neural network classifier. Specify 35 outputs for the first fully connected layer and 20 outputs for the second fully connected layer. By default, both layers use a rectified linear unit (ReLU) activation function. You can change the activation functions for the fully connected layers by using the `Activations` name-value argument.

```matlab
Mdl = fitcnet(XTrain,YTrain, ...
    "LayerSizes",[35 20])
```
```
Mdl = 
  ClassificationNeuralNetwork
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
          NumObservations: 246
               LayerSizes: [35 20]
              Activations: 'relu'
    OutputLayerActivation: 'softmax'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1x1 struct]
          TrainingHistory: [47x7 table]
```
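The activation names that appear on this page (`'relu'`, `'tanh'`, `'sigmoid'`, and `'none'`) correspond to elementwise functions. The sketch below writes them out, assuming `'sigmoid'` denotes the logistic function and `'none'` the identity. It is an illustration, not the toolbox implementation.

```matlab
% Elementwise activation functions (sketch; definitions assumed, see text).
act.relu    = @(z) max(z,0);          % rectified linear unit
act.tanh    = @(z) tanh(z);           % hyperbolic tangent
act.sigmoid = @(z) 1./(1 + exp(-z));  % logistic sigmoid
act.none    = @(z) z;                 % identity (no activation)

z = [-2 -0.5 0 0.5 2];
names = fieldnames(act);
for i = 1:numel(names)
    fprintf('%-8s %s\n', names{i}, mat2str(act.(names{i})(z), 4))
end
```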

Access the weights and biases for the fully connected layers of the trained classifier by using the `LayerWeights` and `LayerBiases` properties of `Mdl`. The first two elements of each property correspond to the values for the first two fully connected layers, and the third element corresponds to the values for the final fully connected layer with a softmax activation function for classification. For example, display the weights and biases for the second fully connected layer.

`Mdl.LayerWeights{2}`
```
ans = 20×35

   0.0481   0.2501  -0.1535  -0.0934   0.0760  -0.0579  -0.2465   1.0411   0.3712  -1.2007   1.1162   0.4296   0.4045   0.5005   0.8839   0.4624  -0.3154   0.3454  -0.0487   0.2648   0.0732   0.5773   0.4286   0.0881   0.9468   0.2981   0.5534   1.0518  -0.0224   0.6894   0.5527   0.7045  -0.6124   0.2145  -0.0790
  -0.9489  -1.8343   0.5510  -0.5751  -0.8726   0.8815   0.0203  -1.6379   2.0315   1.7599  -1.4153  -1.4335  -1.1638  -0.1715   1.1439  -0.7661   1.1230  -1.1982  -0.5409  -0.5821  -0.0627  -0.7038  -0.0817  -1.5773  -1.4671   0.2053  -0.7931  -1.6201  -0.1737  -0.7762  -0.3063  -0.8771   1.5134  -0.4611  -0.0649
  -0.1910   0.0246  -0.3511   0.0097   0.3160  -0.0693   0.2270  -0.0783  -0.1626  -0.3478   0.2765   0.4179   0.0727  -0.0314  -0.1798  -0.0583   0.1375  -0.1876   0.2518   0.2137   0.1497   0.0395   0.2859  -0.0905   0.4325  -0.2012   0.0388  -0.1441  -0.1431  -0.0249  -0.2200   0.0860  -0.2076   0.0132   0.1737
  -0.0415  -0.0059  -0.0753  -0.1477  -0.1621  -0.1762   0.2164   0.1710  -0.0610  -0.1402   0.1452   0.2890   0.2872  -0.2616  -0.4204  -0.2831  -0.1901   0.0036   0.0781  -0.0826   0.1588  -0.2782   0.2510  -0.1069  -0.2692   0.2306   0.2521   0.0306   0.2524  -0.4218   0.2478   0.2343  -0.1031   0.1037   0.1598
   1.1848   1.6142  -0.1352   0.5774   0.5491   0.0103   0.0209   0.7219  -0.8643  -0.5578   1.3595   1.5385   1.0015   0.7416  -0.4342   0.2279   0.5667   1.1589   0.7100   0.1823   0.4171   0.7051   0.0794   1.3267   1.2659   0.3197   0.3947   0.3436  -0.1415   0.6607   1.0071   0.7726  -0.2840   0.8801   0.0848
   0.2486  -0.2920  -0.0004   0.2806   0.2987  -0.2709   0.1473  -0.2580  -0.0499  -0.0755   0.2000   0.1535  -0.0285  -0.0520  -0.2523  -0.2505  -0.0437  -0.2323   0.2023   0.2061  -0.1365   0.0744   0.0344  -0.2891   0.2341  -0.1556   0.1459   0.2533  -0.0583   0.0243  -0.2949  -0.1530   0.1546  -0.0340  -0.1562
  -0.0516   0.0640   0.1824  -0.0675  -0.2065  -0.0052  -0.1682  -0.1520   0.0060   0.0450   0.0813  -0.0234   0.0657   0.3219  -0.1871   0.0658  -0.2103   0.0060  -0.2831  -0.1811  -0.0988   0.2378  -0.0761   0.1714  -0.1596  -0.0011   0.0609   0.4003   0.3687  -0.2879   0.0910   0.0604  -0.2222  -0.2735  -0.1155
  -0.6192  -0.7804  -0.0506  -0.4205  -0.2584  -0.2020  -0.0008   0.0534   1.0185  -0.0307  -0.0539  -0.2020   0.0368  -0.1847   0.0886  -0.4086  -0.4648  -0.3785   0.1542  -0.5176  -0.3207   0.1893  -0.0313  -0.5297  -0.1261  -0.2749  -0.6152  -0.5914  -0.3089   0.2432  -0.3955  -0.1711   0.1710  -0.4477   0.0718
   0.5049  -0.1362  -0.2218   0.1637  -0.1282  -0.1008   0.1445   0.4527  -0.4887   0.0503   0.1453   0.1316  -0.3311  -0.1081  -0.7699   0.4062  -0.1105  -0.0855   0.0630  -0.1469  -0.2533   0.3976   0.0418   0.5294   0.3982   0.1027  -0.0973  -0.1282   0.2491   0.0425   0.0533   0.1578  -0.8403  -0.0535  -0.0048
   1.1109  -0.0466   0.4044   0.6366   0.1863   0.5660   0.2839   0.8793  -0.5497   0.0057   0.3468   0.0980   0.3364   0.4669   0.1466   0.7883  -0.1743   0.4444   0.4535   0.1521   0.7476   0.2246   0.4473   0.2829   0.8881   0.4666   0.6334   0.3105   0.9571   0.2808   0.6483   0.1180  -0.4558   1.2486   0.2453
      ⋮
```
`Mdl.LayerBiases{2}`
```
ans = 20×1

    0.6147
    0.1891
   -0.2767
   -0.2977
    1.3655
    0.0347
    0.1509
   -0.4839
   -0.3960
    0.9248
      ⋮
```

The final fully connected layer has two outputs, one for each class in the response variable. The number of layer outputs corresponds to the first dimension of the layer weights and layer biases.

`size(Mdl.LayerWeights{end})`
```
ans = 1×2

     2    20
```
`size(Mdl.LayerBiases{end})`
```
ans = 1×2

     2     1
```
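More generally, `LayerWeights{i}` has one row per output of layer i and one column per input to that layer, and `LayerBiases{i}` has one entry per output, as the sizes displayed above show. The sketch below derives all three weight sizes for this network and counts its learnable parameters. It assumes 34 predictor columns (the number of columns in the ionosphere `X` matrix) and that every column enters the first layer unchanged.

```matlab
% Expected weight/bias sizes for a network with LayerSizes = [35 20]
% and two classes. numPredictors = 34 is an assumption (see text).
numPredictors = 34;
layerSizes    = [35 20];
numClasses    = 2;

sizes = [numPredictors layerSizes numClasses];
numLayers = numel(sizes) - 1;            % fully connected layers, incl. final
weightSizes = zeros(numLayers,2);
numParams = 0;
for i = 1:numLayers
    weightSizes(i,:) = [sizes(i+1) sizes(i)];        % outputs-by-inputs
    numParams = numParams + sizes(i+1)*sizes(i) ...  % weights
                          + sizes(i+1);              % biases
end
disp(weightSizes)   % rows: [35 34], [20 35], [2 20]
disp(numParams)
```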

To estimate the performance of the trained classifier, compute the test set classification error for `Mdl`.

```matlab
testError = loss(Mdl,XTest,YTest, ...
    "LossFun","classiferror")
```
```testError = 0.0774 ```
`accuracy = 1 - testError`
```accuracy = 0.9226 ```

`Mdl` accurately classifies approximately 92% of the observations in the test set.

At each iteration of the training process, compute the validation loss of the neural network. Stop the training process early if the validation loss reaches a reasonable minimum.

Load the `patients` data set. Create a table from the data set. Each row corresponds to one patient, and each column corresponds to a diagnostic variable. Use the `Smoker` variable as the response variable, and the rest of the variables as predictors.

```matlab
load patients
tbl = table(Diastolic,Systolic,Gender,Height,Weight,Age,Smoker);
```

Separate the data into a training set `tblTrain` and a validation set `tblValidation` by using a stratified holdout partition. The software reserves approximately 30% of the observations for the validation data set and uses the rest of the observations for the training data set.

```matlab
rng("default") % For reproducibility of the partition
c = cvpartition(tbl.Smoker,"Holdout",0.30);
trainingIndices = training(c);
validationIndices = test(c);
tblTrain = tbl(trainingIndices,:);
tblValidation = tbl(validationIndices,:);
```

Train a neural network classifier by using the training set. Specify the `Smoker` column of `tblTrain` as the response variable. Evaluate the model at each iteration by using the validation set, and display the training information at each iteration by using the `Verbose` name-value argument. By default, the training process ends early if the validation cross-entropy loss is greater than or equal to the minimum validation cross-entropy loss computed so far, six times in a row. To change the number of times the validation loss is allowed to be greater than or equal to the minimum, specify the `ValidationPatience` name-value argument.

```matlab
Mdl = fitcnet(tblTrain,"Smoker", ...
    "ValidationData",tblValidation, ...
    "Verbose",1);
```
```
|==========================================================================================|
| Iteration  | Train Loss |  Gradient  |    Step    | Iteration  | Validation | Validation |
|            |            |            |            | Time (sec) |    Loss    |   Checks   |
|==========================================================================================|
|           1|    2.602935|   26.866935|    0.262009|    0.051823|    2.793048|           0|
|           2|    1.470816|   42.594723|    0.058323|    0.014575|    1.247046|           0|
|           3|    1.299292|   25.854432|    0.034910|    0.005318|    1.507857|           1|
|           4|    0.710465|   11.629107|    0.013616|    0.006658|    0.889157|           0|
|           5|    0.647783|    2.561740|    0.005753|    0.017113|    0.766728|           0|
|           6|    0.645541|    0.681579|    0.001000|    0.001492|    0.776072|           1|
|           7|    0.639611|    1.544692|    0.007013|    0.003282|    0.776320|           2|
|           8|    0.604189|    5.045676|    0.064190|    0.001400|    0.744919|           0|
|           9|    0.565364|    5.851552|    0.068845|    0.000701|    0.694226|           0|
|          10|    0.391994|    8.377717|    0.560480|    0.001128|    0.425466|           0|
|          11|    0.383843|    0.630246|    0.110270|    0.002463|    0.428487|           1|
|          12|    0.369289|    2.404750|    0.084395|    0.001113|    0.405728|           0|
|          13|    0.357839|    6.220679|    0.199197|    0.001086|    0.378480|           0|
|          14|    0.344974|    2.752717|    0.029013|    0.001361|    0.367279|           0|
|          15|    0.333747|    0.711398|    0.074513|    0.003426|    0.348499|           0|
|          16|    0.327763|    0.804818|    0.122178|    0.000920|    0.330237|           0|
|          17|    0.327702|    0.778169|    0.009810|    0.000796|    0.329095|           0|
|          18|    0.327277|    0.020615|    0.004377|    0.000755|    0.329141|           1|
|          19|    0.327273|    0.010018|    0.003313|    0.001056|    0.328773|           0|
|          20|    0.327268|    0.019497|    0.000805|    0.001192|    0.328831|           1|
|          21|    0.327228|    0.113983|    0.005397|    0.000600|    0.329085|           2|
|          22|    0.327138|    0.240166|    0.012159|    0.000572|    0.329406|           3|
|          23|    0.326865|    0.428912|    0.036841|    0.000787|    0.329952|           4|
|          24|    0.325797|    0.255227|    0.139585|    0.000781|    0.331246|           5|
|          25|    0.325181|    0.758050|    0.135868|    0.001576|    0.332035|           6|
|==========================================================================================|
```
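The stopping rule described above can be replayed with a simple counter. The sketch below feeds it the validation losses printed in the log above. It assumes a patience of 6 (the default stated above) and counts a loss greater than or equal to the running minimum as a failed check. The sketch reproduces the `Validation Checks` column, stops at iteration 25, and identifies iteration 19 as the best iteration, which matches the minimum found later in this example. It illustrates the rule and is not `fitcnet` source code.

```matlab
% Replay of the early-stopping rule on the validation losses from the log.
valLoss = [2.793048 1.247046 1.507857 0.889157 0.766728 0.776072 ...
           0.776320 0.744919 0.694226 0.425466 0.428487 0.405728 ...
           0.378480 0.367279 0.348499 0.330237 0.329095 0.329141 ...
           0.328773 0.328831 0.329085 0.329406 0.329952 0.331246 ...
           0.332035];
patience = 6;                  % default ValidationPatience, per the text

bestLoss = Inf;
bestIter = 0;
checks = zeros(size(valLoss)); % reproduces the "Validation Checks" column
count = 0;
stopIter = numel(valLoss);
for it = 1:numel(valLoss)
    if valLoss(it) >= bestLoss
        count = count + 1;     % no improvement on the best loss so far
    else
        count = 0;             % new minimum: reset the counter
        bestLoss = valLoss(it);
        bestIter = it;
    end
    checks(it) = count;
    if count >= patience
        stopIter = it;         % training stops at this iteration
        break
    end
end
```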

Create a plot that compares the training cross-entropy loss and the validation cross-entropy loss at each iteration. By default, `fitcnet` stores the loss information inside the `TrainingHistory` property of the object `Mdl`. You can access this information by using dot notation.

```matlab
iteration = Mdl.TrainingHistory.Iteration;
trainLosses = Mdl.TrainingHistory.TrainingLoss;
valLosses = Mdl.TrainingHistory.ValidationLoss;

plot(iteration,trainLosses,iteration,valLosses)
legend(["Training","Validation"])
xlabel("Iteration")
ylabel("Cross-Entropy Loss")
```

Check the iteration that corresponds to the minimum validation loss. The final returned model `Mdl` is the model trained at this iteration.

```matlab
[~,minIdx] = min(valLosses);
iteration(minIdx)
```
```ans = 19 ```

Assess the cross-validation loss of neural network models with different regularization strengths, and choose the regularization strength corresponding to the best performing model.

Read the sample file `CreditRating_Historical.dat` into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

```matlab
creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)
```

```
       ID     WC_TA     RE_TA   EBIT_TA   MVE_BVTD       S_TA Industry    Rating
    _____    ______    ______   _______   ________      _____ ________   _______

    62394     0.013     0.104     0.036      0.447      0.142        3   {'BB' }
    48608     0.232     0.335     0.062      1.969      0.281        8   {'A'  }
    42444     0.311     0.367     0.074      1.935      0.366        1   {'A'  }
    48631     0.194     0.263     0.062      1.017      0.228        4   {'BBB'}
    43768     0.121     0.413     0.057      3.647      0.466       12   {'AAA'}
    39255    -0.117    -0.799      0.01      0.179      0.082        4   {'CCC'}
    62236     0.087     0.158     0.049      0.816      0.324        2   {'BBB'}
    39354     0.005     0.181     0.034      2.597      0.388        7   {'AA' }
```

Each value in the `ID` variable is a unique customer ID; that is, `length(unique(creditrating.ID))` is equal to the number of observations in `creditrating`. Therefore, the `ID` variable is a poor predictor. Remove the `ID` variable from the table, and convert the `Industry` variable to a `categorical` variable.

```matlab
creditrating = removevars(creditrating,"ID");
creditrating.Industry = categorical(creditrating.Industry);
```

Convert the `Rating` response variable to an ordinal `categorical` variable.

```matlab
creditrating.Rating = categorical(creditrating.Rating, ...
    ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
```

Create a `cvpartition` object for stratified 5-fold cross-validation. `cvp` partitions the data into five folds, where each fold has roughly the same proportions of different credit ratings. Set the random seed to the default value for reproducibility of the partition.

```matlab
rng("default")
cvp = cvpartition(creditrating.Rating,"KFold",5);
```
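Stratification means that each fold gets roughly the same class proportions as the full data set. The sketch below shows one simple way to achieve this: after shuffling, it assigns folds round-robin within each class. The labels are hypothetical, and `cvpartition` may assign folds differently.

```matlab
% Hand-rolled stratified k-fold assignment (illustration only).
rng(2)                                            % reproducible shuffling
labels = [ones(1,12) 2*ones(1,7) 3*ones(1,16)]';  % hypothetical classes
k = 5;

fold = zeros(numel(labels),1);
classes = unique(labels);
for c = 1:numel(classes)
    idx = find(labels == classes(c));
    idx = idx(randperm(numel(idx)));              % shuffle within class
    fold(idx) = mod(0:numel(idx)-1, k)' + 1;      % deal out folds 1..k
end

% Per-class counts in each fold: rows = classes, columns = folds
counts = accumarray([labels fold], 1, [numel(classes) k]);
```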

Compute the cross-validation classification error for neural network classifiers with different regularization strengths. Try regularization strengths on the order of 1/n, where n is the number of observations. Standardize the data before training the neural network models.

`1/size(creditrating,1)`
```ans = 2.5432e-04 ```
```matlab
lambda = (0:0.5:5)*1e-4;
cvloss = zeros(length(lambda),1);
for i = 1:length(lambda)
    cvMdl = fitcnet(creditrating,"Rating","Lambda",lambda(i), ...
        "CVPartition",cvp,"Standardize",true);
    cvloss(i) = kfoldLoss(cvMdl,"LossFun","classiferror");
end
```
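The grid in the loop spans 0 to 5e-4 in steps of 0.5e-4, so it brackets 1/n. The sketch below checks this arithmetic. It takes n = 3932, the `NumObservations` value reported later for the model trained on the full data set.

```matlab
% Check that the lambda grid brackets 1/n, with n taken from NumObservations.
n = 3932;
lambda = (0:0.5:5)*1e-4;                 % same grid as in the loop above
oneOverN = 1/n;                          % about 2.5432e-04
below = lambda(find(lambda <= oneOverN, 1, 'last'));
above = lambda(find(lambda >= oneOverN, 1, 'first'));
fprintf('%d grid values; 1/n lies between %.1e and %.1e\n', ...
        numel(lambda), below, above)
```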

Plot the results. Find the regularization strength corresponding to the lowest cross-validation classification error.

```matlab
plot(lambda,cvloss)
xlabel("Regularization Strength")
ylabel("Cross-Validation Loss")
```

```matlab
[~,idx] = min(cvloss);
bestLambda = lambda(idx)
```
```bestLambda = 1.0000e-04 ```

Train a neural network classifier using the `bestLambda` regularization strength.

```matlab
Mdl = fitcnet(creditrating,"Rating","Lambda",bestLambda, ...
    "Standardize",true)
```
```
Mdl = 
  ClassificationNeuralNetwork
           PredictorNames: {'WC_TA' 'RE_TA' 'EBIT_TA' 'MVE_BVTD' 'S_TA' 'Industry'}
             ResponseName: 'Rating'
    CategoricalPredictors: 6
               ClassNames: [AAA AA A BBB BB B CCC]
           ScoreTransform: 'none'
          NumObservations: 3932
               LayerSizes: 10
              Activations: 'relu'
    OutputLayerActivation: 'softmax'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1×1 struct]
          TrainingHistory: [1000×7 table]
```

Train a neural network classifier using the `OptimizeHyperparameters` argument to improve the resulting classifier. Using this argument causes `fitcnet` to use Bayesian optimization to minimize the cross-validation loss over a set of hyperparameters (in this example, `Activations`, `Standardize`, `Lambda`, and `LayerSizes`, as shown in the optimization log).

Read the sample file `CreditRating_Historical.dat` into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

```matlab
creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)
```

```
ans=8×8 table
       ID     WC_TA     RE_TA   EBIT_TA   MVE_BVTD       S_TA Industry    Rating
    _____    ______    ______   _______   ________      _____ ________   _______

    62394     0.013     0.104     0.036      0.447      0.142        3   {'BB' }
    48608     0.232     0.335     0.062      1.969      0.281        8   {'A'  }
    42444     0.311     0.367     0.074      1.935      0.366        1   {'A'  }
    48631     0.194     0.263     0.062      1.017      0.228        4   {'BBB'}
    43768     0.121     0.413     0.057      3.647      0.466       12   {'AAA'}
    39255    -0.117    -0.799      0.01      0.179      0.082        4   {'CCC'}
    62236     0.087     0.158     0.049      0.816      0.324        2   {'BBB'}
    39354     0.005     0.181     0.034      2.597      0.388        7   {'AA' }
```

Each value in the `ID` variable is a unique customer ID; that is, `length(unique(creditrating.ID))` is equal to the number of observations in `creditrating`. Therefore, the `ID` variable is a poor predictor. Remove the `ID` variable from the table, and convert the `Industry` variable to a `categorical` variable.

```matlab
creditrating = removevars(creditrating,"ID");
creditrating.Industry = categorical(creditrating.Industry);
```

Convert the `Rating` response variable to an ordinal `categorical` variable.

```matlab
creditrating.Rating = categorical(creditrating.Rating, ...
    ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
```

Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use `cvpartition` to partition the data.

```matlab
rng("default") % For reproducibility of the partition
c = cvpartition(creditrating.Rating,"Holdout",0.20);
trainingIndices = training(c); % Indices for the training set
testIndices = test(c); % Indices for the test set
creditTrain = creditrating(trainingIndices,:);
creditTest = creditrating(testIndices,:);
```

Train a neural network classifier by passing the training data `creditTrain` to the `fitcnet` function, and include the `OptimizeHyperparameters` argument. For reproducibility, set the `AcquisitionFunctionName` field of the `HyperparameterOptimizationOptions` structure to `"expected-improvement-plus"`. To attempt to get a better solution, set the `MaxObjectiveEvaluations` field to 100 instead of the default 30, so that the optimization evaluates 100 points. `fitcnet` performs Bayesian optimization by default. To use grid search or random search, set the `Optimizer` field in `HyperparameterOptimizationOptions`.

```matlab
rng("default") % For reproducibility
Mdl = fitcnet(creditTrain,"Rating","OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions", ...
    struct("AcquisitionFunctionName","expected-improvement-plus", ...
    "MaxObjectiveEvaluations",100))
```
```
|=========================================================================================================================|
| Iter | Eval   | Objective | Objective |  BestSoFar | BestSoFar | Activations | Standardize |     Lambda |    LayerSizes |
|      | result |           |   runtime | (observed) |  (estim.) |             |             |            |               |
|=========================================================================================================================|
|    1 | Best   |   0.55944 |   0.85659 |    0.55944 |   0.55944 |        none |        true |    0.05834 |             3 |
|    2 | Best   |   0.21488 |     10.56 |    0.21488 |   0.22858 |        relu |        true | 5.0811e-08 |       [ 1 25] |
|    3 | Accept |   0.74189 |   0.38301 |    0.21488 |   0.21522 |     sigmoid |        true |    0.57986 |           126 |
|    4 | Accept |    0.4501 |   0.55193 |    0.21488 |   0.21509 |        tanh |       false |   0.018683 |            10 |
|    5 | Accept |   0.43071 |    6.8079 |    0.21488 |   0.21508 |        relu |        true | 3.3991e-06 |      [ 2 1 4] |
|    6 | Accept |   0.21678 |    30.867 |    0.21488 |   0.21585 |        relu |        true | 6.8351e-09 |      [ 2 179] |
|    7 | Accept |   0.27686 |    22.333 |    0.21488 |   0.21584 |        relu |        true | 1.3422e-06 |     [ 78 4 2] |
|    8 | Accept |   0.24571 |     13.56 |    0.21488 |   0.21583 |        tanh |       false | 1.8747e-06 |    [ 10 3 19] |
|    9 | Best   |   0.21297 |    39.621 |    0.21297 |   0.21299 |        tanh |       false |    0.00052 |    [ 1 61 64] |
|   10 | Accept |   0.74189 |   0.82366 |    0.21297 |   0.21299 |        tanh |       false |    0.15325 | [ 47 148 271] |
|   11 | Accept |   0.74189 |   0.28355 |    0.21297 |   0.21302 |        relu |       false |   0.091971 |     [ 3 2 64] |
|   12 | Accept |   0.22123 |    29.531 |    0.21297 |   0.21307 |        tanh |       false | 1.7719e-06 |    [ 3 64 38] |
|   13 | Accept |   0.74189 |   0.52092 |    0.21297 |     0.213 |        tanh |       false |    0.51268 |   [233 146 6] |
|   14 | Accept |   0.30197 |    46.694 |    0.21297 |     0.213 |        relu |        true | 3.4968e-08 |      [295 17] |
|   15 | Accept |    0.2136 |    21.808 |    0.21297 |   0.21302 |        tanh |       false | 4.2565e-05 |       [ 1 61] |
|   16 | Accept |   0.21519 |    27.504 |    0.21297 |   0.21378 |        tanh |       false |  3.562e-05 |     [ 1 2 91] |
|   17 | Accept |    0.2136 |    7.4304 |    0.21297 |   0.21379 |        relu |        true | 3.1901e-09 |             1 |
|   18 | Accept |   0.22028 |    31.251 |    0.21297 |   0.21296 |        tanh |       false | 6.7097e-05 |      [ 3 144] |
|   19 | Accept |   0.21615 |    36.667 |    0.21297 |   0.21399 |        tanh |       false | 7.8065e-08 |    [ 1 197 4] |
|   20 | Accept |    0.2651 |    27.152 |    0.21297 |   0.21401 |        tanh |       false | 3.3248e-09 |      [ 6 112] |
|   21 | Accept |   0.29339 |    19.958 |    0.21297 |   0.21399 |        relu |        true | 4.2341e-09 |   [ 27 10 54] |
|   22 | Accept |   0.25556 |    115.66 |    0.21297 |   0.21295 |        tanh |       false | 3.3922e-09 |   [277 228 2] |
|   23 | Accept |    0.2136 |    7.7187 |    0.21297 |   0.21294 |        tanh |       false | 3.9912e-07 |             1 |
|   24 | Accept |    0.2918 |    47.115 |    0.21297 |   0.21294 |        tanh |       false | 3.9317e-08 |   [154 20 55] |
|   25 | Accept |   0.22123 |    40.451 |    0.21297 |   0.21293 |        tanh |       false | 0.00066511 |       [273 7] |
|   26 | Accept |   0.21456 |    8.1443 |    0.21297 |   0.21294 |        tanh |        true |  1.745e-08 |        [ 1 2] |
|   27 | Accept |   0.28417 |    121.37 |    0.21297 |   0.21294 |        tanh |        true | 3.3445e-07 | [271 239 132] |
|   28 | Accept |   0.31882 |    34.873 |    0.21297 |   0.21294 |        tanh |        true | 3.2546e-09 |           259 |
|   29 | Accept |   0.21329 |     7.056 |    0.21297 |   0.21294 |        tanh |        true | 1.4764e-07 |             1 |
|   30 | Accept |   0.21488 |    7.9763 |    0.21297 |   0.21293 |        tanh |        true | 4.2304e-05 |        [ 1 3] |
|   31 | Accept |   0.28862 |      36.1 |    0.21297 |   0.21293 |        tanh |        true |  0.0026476 |   [ 1 12 193] |
|   32 | Accept |   0.23872 |    43.329 |    0.21297 |   0.21293 |        tanh |        true | 0.00012483 |           291 |
|   33 | Accept |   0.21551 |    9.2561 |    0.21297 |   0.21293 |        tanh |        true | 3.5356e-06 |        [ 1 9] |
|   34 | Accept |   0.74189 |   0.38512 |    0.21297 |   0.21293 |        tanh |        true |      5.226 |           284 |
|   35 | Accept |    0.2136 |    7.8087 |    0.21297 |   0.21293 |     sigmoid |       false |  2.953e-08 |             1 |
|   36 | Accept |   0.21742 |    6.1235 |    0.21297 |   0.21293 |     sigmoid |       false | 1.2958e-06 |             2 |
|   37 | Accept |    0.2918 |    72.069 |    0.21297 |   0.21303 |     sigmoid |       false | 1.2858e-07 |     [298 128] |
|   38 | Accept |   0.74189 |    4.0814 |    0.21297 |   0.21293 |     sigmoid |       false | 0.00049631 |   [ 1 56 285] |
|   39 | Accept |   0.21424 |    8.8157 |    0.21297 |   0.21293 |     sigmoid |       false | 2.3823e-07 |        [ 1 2] |
|   40 | Accept |   0.21488 |    11.584 |    0.21297 |   0.21293 |     sigmoid |       false |  3.231e-09 |       [ 1 34] |
|   41 | Accept |   0.21488 |    8.5467 |    0.21297 |   0.21293 |        none |       false | 3.9919e-09 |        [ 1 1] |
|   42 | Accept |    0.2206 |    17.637 |    0.21297 |   0.21301 |        none |       false | 1.4528e-07 |           103 |
|   43 | Accept |   0.21964 |     49.16 |    0.21297 |   0.21293 |        none |       false | 4.0062e-09 |      [289 77] |
|   44 | Accept |   0.21551 |    8.4409 |    0.21297 |   0.21293 |        none |       false | 1.8166e-05 |      [ 1 7 2] |
|   45 | Accept |   0.25302 |    6.8665 |    0.21297 |   0.21293 |        none |       false | 0.00093672 |     [273 5 1] |
|   46 | Accept |   0.21901 |     70.44 |    0.21297 |   0.21293 |        none |       false | 1.0943e-05 |  [285 133 97] |
|   47 | Accept |   0.74189 |   0.19575 |    0.21297 |     0.213 |        none |       false |    0.33807 |       [ 1 93] |
|   48 | Accept |   0.21615 |    33.742 |    0.21297 |   0.21292 |        none |       false | 3.1207e-08 |    [ 2 3 290] |
|   49 | Accept |   0.21837 |    21.618 |    0.21297 |     0.213 |        none |       false | 0.00010795 |       [239 5] |
|   50 | Accept |   0.21519 |    5.9516 |    0.21297 |   0.21292 |        none |       false | 1.0462e-06 |             1 |
|   51 | Accept |   0.21488 |    13.421 |    0.21297 |   0.21292 |        none |        true | 3.2351e-09 |       [ 66 1] |
|   52 | Accept |   0.21519 |    7.0643 |    0.21297 |   0.21292 |        none |        true | 1.3037e-07 |        [ 1 2] |
|   53 | Accept |   0.22028 |    33.638 |    0.21297 |     0.213 |        none |        true | 4.9681e-08 |    [272 17 4] |
|   54 | Accept |   0.21488 |    2.7953 |    0.21297 |   0.21292 |        none |        true | 1.1517e-08 |     [ 1 18 2] |
|   55 | Accept |    0.2206 |    33.822 |    0.21297 |   0.21292 |        none |        true | 5.4074e-06 |    [287 4 11] |
|   56 | Accept |   0.22441 |    28.892 |    0.21297 |     0.213 |     sigmoid |        true | 3.1871e-09 |    [ 1 141 5] |
|   57 | Accept |   0.28544 |    49.046 |    0.21297 |     0.213 |     sigmoid |        true | 1.5445e-07 |    [271 8 47] |
|   58 | Accept |   0.31151 |    42.681 |    0.21297 |     0.213 |     sigmoid |        true | 3.1992e-09 |           269 |
|   59 | Accept |   0.29371 |     58.27 |    0.21297 |     0.213 |        relu |       false | 3.3691e-09 |      [241 91] |
|   60 | Accept |   0.74189 |    0.4131 |    0.21297 |   0.21301 |        relu |        true |     30.931 |       [232 6] |
|   61 | Accept |   0.24348 |    9.6687 |    0.21297 |   0.21291 |     sigmoid |        true | 5.2088e-08 |      [ 1 4 1] |
|   62 | Accept |   0.64844 |    2.7232 |    0.21297 |   0.21301 |        relu |       false | 3.6858e-07 |     [ 1 21 1] |
|   63 | Accept |   0.21456 |     32.99 |    0.21297 |   0.21291 |        none |        true | 3.6582e-06 |   [ 1 80 188] |
|   64 | Best   |   0.21265 |     18.62 |    0.21265 |   0.21267 |     sigmoid |        true | 9.6673e-06 |       [ 1 75] |
|   65 | Accept |     0.226 |    11.419 |    0.21265 |   0.21268 |     sigmoid |        true | 1.5077e-06 |     [ 1 24 1] |
|   66 | Accept |   0.23331 |    102.48 |    0.21265 |   0.21268 |     sigmoid |        true | 1.5026e-05 |  [287 214 74] |
|   67 | Accept |    0.2206 |    30.992 |    0.21265 |   0.21267 |        none |        true | 7.5629e-07 |   [ 34 2 264] |
|   68 | Accept |   0.21869 |    4.3461 |    0.21265 |   0.21268 |        none |        true |  6.758e-05 |      [ 1 1 1] |
|   69 | Accept |   0.21869 |    51.008 |    0.21265 |   0.21268 |        none |        true | 6.1541e-05 |  [175 23 253] |
|   70 | Accept |   0.21519 |    46.352 |    0.21265 |   0.21267 |     sigmoid |       false | 5.8406e-07 |   [ 1 12 288] |
|   71 | Accept |   0.74189 |   0.35284 |    0.21265 |   0.21268 |     sigmoid |       false |       31.7 |      [151 36] |
|   72 | Accept |   0.29625 |    5.4205 |    0.21265 |   0.21268 |     sigmoid |        true | 0.00015423 |       [ 1 35] |
|   73 | Accept |   0.21647 |    2.6142 |    0.21265 |   0.21268 |        none |       false | 0.00024113 |       [ 1 35] |
|   74 | Accept |   0.21901 |    76.616 |    0.21265 |    0.2127 |        none |        true | 2.0906e-05 |  [ 6 235 284] |
|   75 | Accept |    0.2171 |    32.606 |    0.21265 |   0.21268 |        none |       false | 0.00010157 |    [ 6 5 298] |
|   76 | Accept |   0.21996 |    9.2912 |    0.21265 |   0.21268 |        tanh |        true | 0.00023083 |       [ 1 13] |
|   77 | Accept |   0.74189 |   0.32671 |    0.21265 |   0.21269 |        none |        true |     31.208 |           222 |
|   78 | Accept |   0.21519 |    35.616 |    0.21265 |   0.21269 |        tanh |       false | 4.4635e-06 |    [ 1 7 151] |
|   79 | Accept |   0.21392 |    9.7813 |    0.21265 |   0.21269 |        relu |        true | 1.5577e-08 |       [ 1 21] |
|   80 | Accept |   0.21488 |    21.138 |    0.21265 |   0.21269 |        none |       false | 2.1706e-07 |      [ 1 185] |
|   81 | Accept |   0.21424 |    69.272 |    0.21265 |   0.21118 |        tanh |       false | 5.8903e-07 |  [ 1 230 101] |
|   82 | Accept |   0.21488 |     27.59 |    0.21265 |   0.21113 |        none |        true | 9.4233e-09 |       [222 2] |
|   83 | Accept |   0.21933 |    52.768 |    0.21265 |   0.21112 |        none |       false | 1.0916e-06 |  [274 12 211] |
|   84 | Accept |   0.21456 |    43.454 |    0.21265 |   0.21106 |        tanh |        true | 4.2988e-08 |    [ 1 4 247] |
|   85 | Accept |   0.21488 |    9.6532 |    0.21265 |   0.21103 |        tanh |        true | 3.2433e-09 |      [ 1 4 2] |
|   86 | Accept |   0.21424 |    7.4065 |    0.21265 |   0.21104 |        tanh |        true | 6.8749e-07 |             1 |
|   87 | Accept |   0.25366 |    47.819 |    0.21265 |   0.21106 |     sigmoid |       false | 3.6866e-09 |      [292 20] |
|   88 | Accept |    0.2225 |    13.107 |    0.21265 |   0.21108 |        none |        true | 0.00035663 |      [235 12] |
|   89 | Accept |   0.21805 |    1.9952 |    0.21265 |   0.21114 |        none |        true | 0.00036004 |        [ 1 2] |
|   90 | Accept |   0.74189 |   0.96416 |    0.21265 |   0.21112 |        relu |       false |      30.55 | [275 169 155] |
|   91 | Accept |   0.21488 |    5.7708 |    0.21265 |   0.21119 |        none |        true | 3.2456e-09 |   [ 1 238 31] |
|   92 | Accept |   0.21392 |    31.018 |    0.21265 |   0.21122 |     sigmoid |       false | 9.3344e-09 |      [ 1 185] |
|   93 | Accept |   0.21488 |    8.0701 |    0.21265 |   0.21236 |        relu |        true | 6.5865e-09 |             1 |
|   94 | Accept |   0.34298 |    1.3016 |    0.21265 |   0.21267 |        tanh |       false | 0.00020571 |             1 |
|   95 | Accept |   0.29784 |    87.985 |    0.21265 |   0.21269 |        tanh |       false | 2.0857e-05 | [ 15 297 124] |
|   96 | Accept |   0.33153 |    30.766 |    0.21265 |   0.21302 |        tanh |       false | 0.00021639 |    [ 4 135 1] |
|   97 | Accept |   0.21519 |    20.949 |    0.21265 |   0.21299 |        tanh |        true | 2.1898e-05 |     [ 1 9 57] |
|   98 | Accept |   0.21996 |    51.698 |    0.21265 |   0.21389 |        none |       false | 3.8536e-05 |     [270 139] |
|   99 | Best   |   0.21202 |    49.605 |    0.21202 |   0.21386 |        none |       false | 1.7719e-08 |    [280 59 2] |
|  100 | Accept |   0.21488 |    3.0963 |    0.21202 |   0.21383 |        none |       false | 1.9173e-08 |             1 |
```

```
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 100 reached.
Total function evaluations: 100
Total elapsed time: 2577.3756 seconds
Total objective function evaluation time: 2526.3743

Best observed feasible point:
    Activations    Standardize      Lambda          LayerSizes
    ___________    ___________    __________    _________________

       none           false       1.7719e-08    280     59      2

Observed objective function value = 0.21202
Estimated objective function value = 0.21541
Function evaluation time = 49.6049

Best estimated feasible point (according to models):
    Activations    Standardize      Lambda        LayerSizes
    ___________    ___________    __________    _______________

       none           false       0.00010157    6      5    298

Estimated objective function value = 0.21383
Estimated function evaluation time = 32.5882
```
```
Mdl = 
  ClassificationNeuralNetwork
                       PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
                         ResponseName: 'Rating'
                CategoricalPredictors: 6
                           ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
                       ScoreTransform: 'none'
                      NumObservations: 3146
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                           LayerSizes: [6 5 298]
                          Activations: 'none'
                OutputLayerActivation: 'softmax'
                               Solver: 'LBFGS'
                      ConvergenceInfo: [1×1 struct]
                      TrainingHistory: [1000×7 table]

  Properties, Methods
```

`Mdl` is a trained `ClassificationNeuralNetwork` classifier. The model corresponds to the best estimated feasible point, as opposed to the best observed feasible point. (For details on this distinction, see `bestPoint`.) You can use dot notation to access the properties of `Mdl`. For example, you can specify `Mdl.HyperparameterOptimizationResults` to get more information about the optimization of the neural network model.

Find the classification accuracy of the model on the test data set. Visualize the results by using a confusion matrix.

```
modelAccuracy = 1 - loss(Mdl,creditTest,"Rating", ...
    "LossFun","classiferror")
```
```modelAccuracy = 0.8041 ```
`confusionchart(creditTest.Rating,predict(Mdl,creditTest))`

The model has all predicted classes within one unit of the true classes, meaning all predictions are off by no more than one rating.
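One way to check the "within one rating" observation numerically is to compare the ordinal positions of the true and predicted labels. The sketch below uses made-up rank vectors (1 = AAA through 7 = CCC) in place of real predictions, so its numbers are illustrative only. With the trained model you would instead use `double(creditTest.Rating)` and `double(predict(Mdl,creditTest))`, because `double` returns the category index of each element of a categorical array.

```matlab
% Hypothetical ranks for illustration only: 1 = AAA, 2 = AA, ..., 7 = CCC.
% With the real model, replace these with
%   trueRank = double(creditTest.Rating);
%   predRank = double(predict(Mdl,creditTest));
trueRank = [1 2 3 4 5 6 7 3];
predRank = [1 3 3 4 4 6 7 2];

% Fraction of exact matches (the same quantity as 1 - classification error)
accuracy = mean(predRank == trueRank);

% Distance, in rating steps, between each prediction and the true class
rankGap = abs(predRank - trueRank);
maxGap = max(rankGap);

% True when every prediction is off by no more than one rating
allWithinOne = all(rankGap <= 1);

fprintf('accuracy = %.3f, max gap = %d, all within one = %d\n', ...
    accuracy, maxGap, allWithinOne);
```

A confusion chart shows the same information visually: all off-diagonal counts sit in the cells directly adjacent to the diagonal.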

Train a neural network classifier using the `OptimizeHyperparameters` argument to improve the resulting classification accuracy. Use the `hyperparameters` function to specify larger-than-default values for the number of layers used and the layer size range.

Read the sample file `CreditRating_Historical.dat` into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency.

`creditrating = readtable("CreditRating_Historical.dat");`

Because each value in the `ID` variable is a unique customer ID (that is, `length(unique(creditrating.ID))` equals the number of observations in `creditrating`), the `ID` variable is a poor predictor. Remove the `ID` variable from the table, and convert the `Industry` variable to a `categorical` variable.

```
creditrating = removevars(creditrating,"ID");
creditrating.Industry = categorical(creditrating.Industry);
```

Convert the `Rating` response variable to an ordinal `categorical` variable.

```
creditrating.Rating = categorical(creditrating.Rating, ...
    ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
```

Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use `cvpartition` to partition the data.

```
rng("default") % For reproducibility of the partition
c = cvpartition(creditrating.Rating,"Holdout",0.20);
trainingIndices = training(c); % Indices for the training set
testIndices = test(c); % Indices for the test set
creditTrain = creditrating(trainingIndices,:);
creditTest = creditrating(testIndices,:);
```

List the hyperparameters available for this problem of fitting the `Rating` response.

```
params = hyperparameters("fitcnet",creditTrain,"Rating");
for ii = 1:length(params)
    disp(ii);disp(params(ii))
end
```
```
     1
  optimizableVariable with properties:
         Name: 'NumLayers'
        Range: [1 3]
         Type: 'integer'
    Transform: 'none'
     Optimize: 1

     2
  optimizableVariable with properties:
         Name: 'Activations'
        Range: {'relu'  'tanh'  'sigmoid'  'none'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

     3
  optimizableVariable with properties:
         Name: 'Standardize'
        Range: {'true'  'false'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

     4
  optimizableVariable with properties:
         Name: 'Lambda'
        Range: [3.1786e-09 31.7864]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

     5
  optimizableVariable with properties:
         Name: 'LayerWeightsInitializer'
        Range: {'glorot'  'he'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

     6
  optimizableVariable with properties:
         Name: 'LayerBiasesInitializer'
        Range: {'zeros'  'ones'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

     7
  optimizableVariable with properties:
         Name: 'Layer_1_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

     8
  optimizableVariable with properties:
         Name: 'Layer_2_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

     9
  optimizableVariable with properties:
         Name: 'Layer_3_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

    10
  optimizableVariable with properties:
         Name: 'Layer_4_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0

    11
  optimizableVariable with properties:
         Name: 'Layer_5_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0
```

To try more layers than the default of 1 through 3, set the range of `NumLayers` (optimizable variable 1) to its maximum allowable size, `[1 5]`. Also, set `Layer_4_Size` and `Layer_5_Size` (optimizable variables 10 and 11, respectively) to be optimized.

```
params(1).Range = [1 5];
params(10).Optimize = true;
params(11).Optimize = true;
```

Set the range of all layer sizes (optimizable variables 7 through 11) to `[1 400]` instead of the default `[1 300]`.

```
for ii = 7:11
    params(ii).Range = [1 400];
end
```
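With these settings, each objective evaluation still trains one network with a single `LayerSizes` vector. The iterative display suggests that only the first `NumLayers` of the five `Layer_k_Size` values end up in that vector. The sketch below mimics this bookkeeping with hypothetical sampled values; it illustrates the relationship and is not the toolbox's internal code.

```matlab
% Hypothetical values sampled for one optimization step
numLayers = 3;                      % NumLayers, now allowed in [1 5]
sampledSizes = [4 49 78 120 7];     % Layer_1_Size ... Layer_5_Size

% Assumed rule: only the first numLayers sizes form LayerSizes
layerSizes = sampledSizes(1:numLayers);

% Every size must stay inside the widened range [1 400]
assert(all(layerSizes >= 1 & layerSizes <= 400));
disp(layerSizes)
```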

Train a neural network classifier by passing the training data `creditTrain` to the `fitcnet` function, and include the `OptimizeHyperparameters` argument set to `params`. For reproducibility, set the `AcquisitionFunctionName` to `"expected-improvement-plus"` in a `HyperparameterOptimizationOptions` structure. To attempt to get a better solution, set the number of optimization steps to 100 instead of the default 30.

```
rng("default") % For reproducibility
Mdl = fitcnet(creditTrain,"Rating","OptimizeHyperparameters",params, ...
    "HyperparameterOptimizationOptions", ...
    struct("AcquisitionFunctionName","expected-improvement-plus", ...
    "MaxObjectiveEvaluations",100))
```
```
|============================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Activations | Standardize | Lambda | LayerSizes |
| | result | | runtime | (observed) | (estim.) | | | | |
|============================================================================================================================================|
| 1 | Best | 0.74189 | 2.2062 | 0.74189 | 0.74189 | sigmoid | true | 0.68961 | [104 1 5 3 1] |
| 2 | Best | 0.2225 | 70.081 | 0.2225 | 0.24316 | relu | true | 0.00058564 | [ 38 208 162] |
| 3 | Accept | 0.63891 | 13.086 | 0.2225 | 0.22698 | sigmoid | true | 1.9768e-06 | [ 1 25 1 287 7] |
| 4 | Best | 0.21933 | 33.886 | 0.21933 | 0.22307 | none | false | 1.3353e-06 | 320 |
| 5 | Accept | 0.74189 | 0.27024 | 0.21933 | 0.21936 | relu | true | 2.7056 | [ 1 2 1] |
| 6 | Accept | 0.29148 | 96.764 | 0.21933 | 0.21936 | relu | true | 1.0503e-06 | [301 31 400] |
| 7 | Accept | 0.6869 | 4.2153 | 0.21933 | 0.21936 | relu | true | 0.0113 | [ 97 5 56] |
| 8 | Accept | 0.74189 | 0.28736 | 0.21933 | 0.21936 | relu | true | 0.053563 | [ 2 92 1] |
| 9 | Accept | 0.25238 | 74.737 | 0.21933 | 0.2221 | relu | true | 0.00010812 | [ 8 137 232] |
| 10 | Accept | 0.29784 | 213.19 | 0.21933 | 0.21936 | relu | true | 2.3488e-07 | [ 30 397 364] |
| 11 | Accept | 0.74189 | 0.27991 | 0.21933 | 0.21936 | none | true | 10.18 | 204 |
| 12 | Best | 0.21392 | 35.925 | 0.21392 | 0.21395 | none | false | 3.4691e-06 | [ 7 355 2] |
| 13 | Accept | 0.74189 | 0.82149 | 0.21392 | 0.21395 | none | false | 31.657 | [193 53 5 90 355] |
| 14 | Accept | 0.21488 | 45.397 | 0.21392 | 0.21443 | none | false | 8.607e-06 | [126 80 2 86 2] |
| 15 | Accept | 0.2349 | 60.527 | 0.21392 | 0.21443 | relu | false | 9.4208e-06 | [ 38 6 379 4] |
| 16 | Accept | 0.21901 | 46.638 | 0.21392 | 0.21443 | relu | false | 0.0018197 | [ 6 20 205 30 51] |
| 17 | Accept | 0.22282 | 68.41 | 0.21392 | 0.21443 | relu | false | 1.2196e-07 | [ 5 3 91 45 163] |
| 18 | Accept | 0.74189 | 1.5076 | 0.21392 | 0.21387 | relu | false | 10.565 | [394 397 39] |
| 19 | Accept | 0.24348 | 57.89 | 0.21392 | 0.21442 | relu | false | 2.7033e-08 | [ 52 49 195 11 2] |
| 20 | Accept | 0.21933 | 54.865 | 0.21392 | 0.21411 | relu | false | 5.3281e-09 | [ 4 26 276 4] |
|============================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Activations | Standardize | Lambda | LayerSizes |
| | result | | runtime | (observed) | (estim.) | | | | |
|============================================================================================================================================|
| 21 | Accept | 0.21583 | 101.52 | 0.21392 | 0.21413 | relu | false | 0.00095213 | [ 98 25 120 70 321] |
| 22 | Accept | 0.74189 | 1.1203 | 0.21392 | 0.21413 | tanh | false | 10.324 | [ 5 19 325 100 286] |
| 23 | Accept | 0.2225 | 76.344 | 0.21392 | 0.21413 | tanh | true | 3.1717e-07 | [ 4 3 400] |
| 24 | Accept | 0.21996 | 39.348 | 0.21392 | 0.21412 | tanh | true | 6.0973e-06 | [ 6 3 202 2] |
| 25 | Accept | 0.74189 | 0.70734 | 0.21392 | 0.21389 | tanh | true | 0.47944 | [ 91 21 276 10 202] |
| 26 | Accept | 0.6424 | 7.8651 | 0.21392 | 0.21391 | relu | true | 4.153e-06 | [ 27 1 208 1 20] |
| 27 | Accept | 0.23808 | 124.09 | 0.21392 | 0.21391 | relu | false | 4.7143e-07 | [116 111 327 4 9] |
| 28 | Accept | 0.21869 | 59.477 | 0.21392 | 0.21394 | none | false | 0.00020517 | [213 245 1 45 6] |
| 29 | Accept | 0.74189 | 0.84795 | 0.21392 | 0.21394 | tanh | true | 0.066046 | [ 2 222 63] |
| 30 | Accept | 0.23013 | 44.975 | 0.21392 | 0.21394 | tanh | true | 1.6445e-07 | [184 1 32 21] |
| 31 | Accept | 0.21583 | 30.499 | 0.21392 | 0.214 | none | false | 8.3607e-09 | [172 13 1] |
| 32 | Accept | 0.29021 | 162.91 | 0.21392 | 0.2114 | relu | true | 0.0054118 | [ 79 385 325] |
| 33 | Accept | 0.22028 | 7.3966 | 0.21392 | 0.21435 | none | false | 6.2688e-07 | [ 5 13] |
| 34 | Accept | 0.21488 | 4.797 | 0.21392 | 0.21359 | none | false | 2.5162e-08 | [ 1 1 17] |
| 35 | Accept | 0.21805 | 10.065 | 0.21392 | 0.21515 | relu | false | 3.3182e-05 | [ 6 5 3 13] |
| 36 | Accept | 0.23268 | 9.1618 | 0.21392 | 0.21493 | relu | false | 3.9676e-09 | [ 36 4] |
| 37 | Accept | 0.21519 | 44.065 | 0.21392 | 0.21394 | none | false | 2.1955e-07 | [ 16 34 350 4 31] |
| 38 | Accept | 0.33249 | 26.542 | 0.21392 | 0.21231 | relu | false | 0.0010092 | [ 24 1 207] |
| 39 | Accept | 0.21583 | 21.537 | 0.21392 | 0.21394 | relu | false | 2.5221e-05 | [ 1 95] |
| 40 | Accept | 0.22123 | 89.369 | 0.21392 | 0.21394 | relu | true | 0.0002332 | [ 5 392 160] |
|============================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Activations | Standardize | Lambda | LayerSizes |
| | result | | runtime | (observed) | (estim.) | | | | |
|============================================================================================================================================|
| 41 | Accept | 0.28894 | 229.82 | 0.21392 | 0.21393 | relu | true | 5.2515e-05 | [153 394 315] |
| 42 | Accept | 0.22123 | 166.4 | 0.21392 | 0.21393 | none | false | 4.1509e-09 | [235 399 62 148] |
| 43 | Accept | 0.27654 | 19.776 | 0.21392 | 0.21392 | relu | false | 1.1969e-06 | [ 75 18] |
| 44 | Accept | 0.2705 | 91.89 | 0.21392 | 0.21393 | relu | false | 3.9338e-09 | [ 78 387 42 65] |
| 45 | Accept | 0.21678 | 159.34 | 0.21392 | 0.21396 | none | false | 3.3979e-05 | [ 2 350 376 2] |
| 46 | Accept | 0.21678 | 5.3698 | 0.21392 | 0.21396 | none | false | 0.00019489 | [ 10 4] |
| 47 | Best | 0.2136 | 40.323 | 0.2136 | 0.21359 | none | false | 5.8608e-08 | [ 21 382 2] |
| 48 | Accept | 0.22918 | 18.359 | 0.2136 | 0.21359 | relu | true | 3.1819e-09 | [ 3 71] |
| 49 | Accept | 0.27591 | 81.573 | 0.2136 | 0.21359 | relu | false | 8.1967e-06 | [ 55 388 56] |
| 50 | Accept | 0.29593 | 10.722 | 0.2136 | 0.21359 | tanh | true | 2.5573e-06 | 28 |
| 51 | Accept | 0.31532 | 81.712 | 0.2136 | 0.21361 | tanh | true | 1.7419e-06 | [216 24 25 62 94] |
| 52 | Accept | 0.21869 | 46.876 | 0.2136 | 0.21361 | relu | false | 3.3288e-09 | [ 25 1 310] |
| 53 | Accept | 0.21837 | 44.823 | 0.2136 | 0.21359 | none | false | 1.3416e-05 | [ 2 2 386 33] |
| 54 | Accept | 0.23872 | 86.465 | 0.2136 | 0.21359 | tanh | true | 3.1991e-09 | [ 9 2 233 13 297] |
| 55 | Accept | 0.21742 | 22.42 | 0.2136 | 0.21359 | none | false | 0.00017978 | [346 36] |
| 56 | Accept | 0.3506 | 53.374 | 0.2136 | 0.2136 | relu | false | 8.9375e-08 | [213 1 22 222] |
| 57 | Accept | 0.21583 | 47.939 | 0.2136 | 0.2136 | relu | false | 4.0858e-09 | [ 1 20 75 7 160] |
| 58 | Accept | 0.25048 | 63.899 | 0.2136 | 0.2136 | relu | false | 1.8367e-05 | [133 18 5 8 265] |
| 59 | Accept | 0.21392 | 24.587 | 0.2136 | 0.2136 | relu | false | 0.00025743 | [ 4 49 78] |
| 60 | Accept | 0.21996 | 57.638 | 0.2136 | 0.21361 | none | false | 6.077e-09 | [ 18 2 199 34 291] |
|============================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Activations | Standardize | Lambda | LayerSizes |
| | result | | runtime | (observed) | (estim.) | | | | |
|============================================================================================================================================|
| 61 | Accept | 0.21837 | 52.847 | 0.2136 | 0.21359 | none | false | 4.7921e-05 | [ 53 3 5 33 388] |
| 62 | Accept | 0.22028 | 46.08 | 0.2136 | 0.21359 | none | false | 4.2742e-09 | [206 87 9 20 39] |
| 63 | Accept | 0.21774 | 15.034 | 0.2136 | 0.21359 | none | false | 1.0053e-07 | [ 68 3] |
| 64 | Accept | 0.23554 | 68.289 | 0.2136 | 0.21359 | relu | true | 3.3518e-09 | [ 3 389 60] |
| 65 | Accept | 0.22759 | 2.6688 | 0.2136 | 0.2136 | none | false | 0.00079006 | 64 |
| 66 | Accept | 0.22187 | 55.67 | 0.2136 | 0.2136 | relu | false | 4.3532e-07 | [ 1 11 383] |
| 67 | Accept | 0.21805 | 113.63 | 0.2136 | 0.21359 | relu | false | 3.3578e-09 | [ 4 4 384 244] |
| 68 | Accept | 0.21742 | 39.749 | 0.2136 | 0.21359 | relu | false | 0.00042226 | [ 27 7 13 237] |
| 69 | Accept | 0.29911 | 22.327 | 0.2136 | 0.2136 | sigmoid | false | 3.1977e-09 | [ 66 31] |
| 70 | Accept | 0.28544 | 17.354 | 0.2136 | 0.21359 | sigmoid | false | 2.1618e-07 | 59 |
| 71 | Accept | 0.4342 | 17.862 | 0.2136 | 0.2136 | sigmoid | false | 1.1526e-05 | [ 53 28 9 27 2] |
| 72 | Accept | 0.24793 | 41.903 | 0.2136 | 0.21359 | sigmoid | false | 3.2532e-09 | 280 |
| 73 | Accept | 0.74189 | 0.24831 | 0.2136 | 0.21359 | sigmoid | false | 29.321 | [ 58 1 5 3] |
| 74 | Accept | 0.21805 | 11.378 | 0.2136 | 0.21359 | relu | false | 5.0967e-08 | [ 1 5 42] |
| 75 | Accept | 0.21964 | 16.802 | 0.2136 | 0.2136 | none | true | 3.3747e-09 | [ 56 273] |
| 76 | Accept | 0.21488 | 1.4504 | 0.2136 | 0.21359 | none | true | 3.6101e-09 | [ 1 19] |
| 77 | Accept | 0.21456 | 9.5126 | 0.2136 | 0.2136 | none | true | 1.8426e-07 | [ 1 76 2] |
| 78 | Accept | 0.21488 | 25.866 | 0.2136 | 0.21359 | none | true | 1.9217e-07 | [ 1 3 322 5] |
| 79 | Accept | 0.21996 | 7.2836 | 0.2136 | 0.20963 | none | true | 3.5146e-09 | 182 |
| 80 | Accept | 0.21996 | 26.22 | 0.2136 | 0.20986 | none | true | 1.9249e-08 | [ 51 79 345] |
|============================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Activations | Standardize | Lambda | LayerSizes |
| | result | | runtime | (observed) | (estim.) | | | | |
|============================================================================================================================================|
| 81 | Accept | 0.21996 | 16.72 | 0.2136 | 0.20976 | none | true | 5.6038e-08 | [269 6] |
| 82 | Accept | 0.21837 | 67.424 | 0.2136 | 0.21359 | none | true | 2.2486e-05 | [ 15 334 161] |
| 83 | Accept | 0.21901 | 52.193 | 0.2136 | 0.2136 | none | true | 2.325e-07 | [ 43 397 22 5 4] |
| 84 | Accept | 0.2136 | 25.949 | 0.2136 | 0.20893 | none | true | 1.4375e-05 | [ 3 23 161] |
| 85 | Accept | 0.22568 | 9.2788 | 0.2136 | 0.21359 | relu | false | 0.00036954 | [ 1 25] |
| 86 | Accept | 0.22123 | 9.0294 | 0.2136 | 0.2139 | none | true | 8.9433e-06 | 63 |
| 87 | Accept | 0.21551 | 73.231 | 0.2136 | 0.20857 | relu | false | 0.00013186 | [ 1 10 235 79 56] |
| 88 | Accept | 0.21996 | 45.161 | 0.2136 | 0.21359 | none | true | 4.6415e-06 | [274 61] |
| 89 | Accept | 0.24253 | 35.809 | 0.2136 | 0.21359 | none | true | 0.0043392 | [105 351 3 2 244] |
| 90 | Accept | 0.21392 | 26.066 | 0.2136 | 0.21359 | none | true | 0.0004037 | [ 68 57 5 189] |
| 91 | Accept | 0.24634 | 8.1577 | 0.2136 | 0.21359 | tanh | false | 3.2373e-09 | 11 |
| 92 | Accept | 0.23713 | 60.74 | 0.2136 | 0.2136 | tanh | false | 3.2168e-09 | [ 7 32 316 6] |
| 93 | Accept | 0.23331 | 46.265 | 0.2136 | 0.2136 | tanh | false | 2.7471e-07 | [ 7 6 6 255] |
| 94 | Accept | 0.22791 | 238.99 | 0.2136 | 0.2136 | tanh | false | 2.4117e-07 | [ 2 386 364 66] |
| 95 | Accept | 0.30769 | 66.556 | 0.2136 | 0.2136 | relu | true | 3.2605e-09 | [380 72] |
| 96 | Accept | 0.30038 | 70.252 | 0.2136 | 0.2136 | tanh | false | 9.629e-08 | [346 55] |
| 97 | Accept | 0.2136 | 240.45 | 0.2136 | 0.21358 | tanh | false | 3.0728e-08 | [ 1 9 319 337 168] |
| 98 | Accept | 0.21488 | 8.1832 | 0.2136 | 0.21358 | none | false | 4.8562e-09 | [ 1 108] |
| 99 | Accept | 0.31945 | 33.121 | 0.2136 | 0.20612 | relu | false | 5.058e-07 | [ 1 214 6 2 13] |
| 100 | Accept | 0.23299 | 79.247 | 0.2136 | 0.2058 | tanh | false | 1.4126e-07 | [204 1 298 3] |
```

```
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 100 reached.
Total function evaluations: 100
Total elapsed time: 4964.939 seconds
Total objective function evaluation time: 4901.9365

Best observed feasible point:
    Activations    Standardize      Lambda          LayerSizes
    ___________    ___________    __________    ________________

       none           false       5.8608e-08    21    382      2

Observed objective function value = 0.2136
Estimated objective function value = 0.21443
Function evaluation time = 40.3226

Best estimated feasible point (according to models):
    Activations    Standardize      Lambda       LayerSizes
    ___________    ___________    __________    _____________

       relu           false       0.00025743    4    49    78

Estimated objective function value = 0.2058
Estimated function evaluation time = 25.2207
```
```
Mdl = 
  ClassificationNeuralNetwork
                       PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
                         ResponseName: 'Rating'
                CategoricalPredictors: 6
                           ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
                       ScoreTransform: 'none'
                      NumObservations: 3146
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                           LayerSizes: [4 49 78]
                          Activations: 'relu'
                OutputLayerActivation: 'softmax'
                               Solver: 'LBFGS'
                      ConvergenceInfo: [1×1 struct]
                      TrainingHistory: [1000×7 table]

  Properties, Methods
```

Find the classification accuracy of the model on the test data set. Visualize the results by using a confusion matrix.

```
testAccuracy = 1 - loss(Mdl,creditTest,"Rating", ...
    "LossFun","classiferror")
```
```testAccuracy = 0.8117 ```
`confusionchart(creditTest.Rating,predict(Mdl,creditTest))`

The model has all predicted classes within one unit of the true classes, meaning all predictions are off by no more than one rating.

## Input Arguments


Sample data used to train the model, specified as a table. Each row of `Tbl` corresponds to one observation, and each column corresponds to one predictor variable. Optionally, `Tbl` can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

• If `Tbl` contains the response variable, and you want to use all remaining variables in `Tbl` as predictors, then specify the response variable by using `ResponseVarName`.

• If `Tbl` contains the response variable, and you want to use only a subset of the remaining variables in `Tbl` as predictors, then specify a formula by using `formula`.

• If `Tbl` does not contain the response variable, then specify a response variable by using `Y`. The length of the response variable and the number of rows in `Tbl` must be equal.

Response variable name, specified as the name of a variable in `Tbl`.

You must specify `ResponseVarName` as a character vector or string scalar. For example, if the response variable `Y` is stored as `Tbl.Y`, then specify it as `"Y"`. Otherwise, the software treats all columns of `Tbl`, including `Y`, as predictors when training the model.

The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If `Y` is a character array, then each element of the response variable must correspond to one row of the array.

A good practice is to specify the order of the classes by using the `ClassNames` name-value argument.

Data Types: `char` | `string`

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form `"Y~x1+x2+x3"`. In this form, `Y` represents the response variable, and `x1`, `x2`, and `x3` represent the predictor variables.

To specify a subset of variables in `Tbl` as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in `Tbl` that do not appear in `formula`.

The variable names in the formula must be both variable names in `Tbl` (`Tbl.Properties.VariableNames`) and valid MATLAB® identifiers. You can verify the variable names in `Tbl` by using the `isvarname` function. If the variable names are not valid, then you can convert them by using the `matlab.lang.makeValidName` function.
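The name check and conversion can be sketched with hypothetical variable names such as might come from a data file. The sketch flags invalid names with `isvarname`, repairs them with `matlab.lang.makeValidName` (which replaces invalid characters with underscores and prefixes `x` to names that do not start with a letter), and then joins the repaired names into a formula string. The response name `Rating` is borrowed from the example above.

```matlab
% Hypothetical raw variable names, as they might appear in a data file
rawNames = {'WC/TA', 'RE_TA', '1Y_Growth'};

% Check which names are already valid MATLAB identifiers
isValid = cellfun(@isvarname, rawNames);

% Convert invalid names; names that are already valid are returned unchanged
validNames = matlab.lang.makeValidName(rawNames);

% Build a formula string that uses the valid names as predictors
formulaStr = ['Rating~' strjoin(validNames, '+')];
disp(formulaStr)
```

In a table, you would assign the converted names back to `Tbl.Properties.VariableNames` before passing the formula to `fitcnet`.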

Data Types: `char` | `string`

Class labels used to train the model, specified as a numeric, categorical, or logical vector; a character or string array; or a cell array of character vectors.

• If `Y` is a character array, then each element of the class labels must correspond to one row of the array.

• The length of `Y` must be equal to the number of rows in `Tbl` or `X`.

• A good practice is to specify the class order by using the `ClassNames` name-value argument.

Data Types: `single` | `double` | `categorical` | `logical` | `char` | `string` | `cell`

Predictor data used to train the model, specified as a numeric matrix.

By default, the software treats each row of `X` as one observation, and each column as one predictor.

The length of `Y` and the number of observations in `X` must be equal.

To specify the names of the predictors in the order of their appearance in `X`, use the `PredictorNames` name-value argument.

Note

If you orient your predictor matrix so that observations correspond to columns and specify `'ObservationsIn','columns'`, then you might experience a significant reduction in computation time.

Data Types: `single` | `double`

Note

The software treats `NaN`, empty character vector (`''`), empty string (`""`), `<missing>`, and `<undefined>` elements as missing values, and removes observations with any of these characteristics:

• Missing value in the response variable (for example, `Y` or `ValidationData{2}`)

• At least one missing value in a predictor observation (for example, row in `X` or `ValidationData{1}`)

• `NaN` value or `0` weight (for example, value in `Weights` or `ValidationData{3}`)

• Class label with `0` prior probability (value in `Prior`)
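For purely numeric data, the removal rules in this note can be mimicked as follows. The data, labels, and weights are hypothetical, and the sketch is not the function's internal code. It does not cover the zero-prior rule, and for non-numeric labels the MATLAB `ismissing` function detects the other missing-value indicators (`<undefined>`, `<missing>`, empty text).

```matlab
% Hypothetical training data: 6 observations, 2 predictors
X = [1.0 2.0; NaN 0.5; 3.0 1.5; 0.2 0.1; 4.0 2.2; 2.5 0.3];
Y = [1; 2; NaN; 1; 2; 1];            % numeric class labels, NaN is missing
weights = [1; 1; 1; 0; NaN; 2];      % observation weights

% Flag observations that the rules in the note would remove
missingResponse  = isnan(Y);
missingPredictor = any(isnan(X), 2);
badWeight        = isnan(weights) | weights == 0;
dropRows = missingResponse | missingPredictor | badWeight;

% Keep only the observations that remain usable for training
Xkept = X(~dropRows, :);
Ykept = Y(~dropRows);
```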

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: ```fitcnet(X,Y,'LayerSizes',[10 10],'Activations',["relu","tanh"])``` specifies to create a neural network with two fully connected layers, each with 10 outputs. The first layer uses a rectified linear unit (ReLU) activation function, and the second uses a hyperbolic tangent activation function.

Neural Network Options


Sizes of the fully connected layers in the neural network model, specified as a positive integer vector. The ith element of `LayerSizes` is the number of outputs in the ith fully connected layer of the neural network model.

`LayerSizes` does not include the size of the final fully connected layer that uses a softmax activation function. For more information, see Neural Network Structure.

Example: `'LayerSizes',[100 25 10]`
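Because `LayerSizes` omits the final softmax layer, whose size equals the number of classes, it helps to count the learnable parameters of the whole network explicitly. The sketch below assumes numeric predictors only (dummy-coded categorical predictors would enlarge the input size) and uses hypothetical problem sizes. Each fully connected layer contributes an output-by-input weight matrix and one bias per output.

```matlab
% Assumed problem size for illustration
numPredictors = 5;      % numeric predictors only
layerSizes = [10 10];   % value passed as LayerSizes
numClasses = 7;         % size of the final softmax layer, not in LayerSizes

% Input and output size of every fully connected layer, including the final one
inSizes  = [numPredictors layerSizes];
outSizes = [layerSizes numClasses];

% Each layer has an outSize-by-inSize weight matrix and an outSize-by-1 bias
numWeights = sum(outSizes .* inSizes);
numBiases  = sum(outSizes);
numLearnables = numWeights + numBiases;
```

Here the three layers contribute 60, 110, and 77 parameters, for 247 in total.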

Activation functions for the fully connected layers of the neural network model, specified as a character vector, string scalar, string array, or cell array of character vectors with values from this table.

| Value | Description |
| --- | --- |
| `'relu'` | Rectified linear unit (ReLU) function. Performs a threshold operation on each element of the input, where any value less than zero is set to zero, that is, $f(x)=\begin{cases}x, & x\ge 0\\ 0, & x<0\end{cases}$ |
| `'tanh'` | Hyperbolic tangent (tanh) function. Applies the `tanh` function to each input element. |
| `'sigmoid'` | Sigmoid function. Performs the following operation on each input element: $f(x)=\frac{1}{1+e^{-x}}$ |
| `'none'` | Identity function. Returns each input element without performing any transformation, that is, $f(x)=x$. |

• If you specify one activation function only, then `Activations` is the activation function for every fully connected layer of the neural network model, excluding the final fully connected layer. The activation function for the final fully connected layer is always softmax (see Neural Network Structure).

• If you specify an array of activation functions, then the ith element of `Activations` is the activation function for the ith layer of the neural network model.

Example: `'Activations','sigmoid'`
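The activation functions in the table and the layer-by-layer structure can be illustrated with a forward pass through a tiny hypothetical network: 3 predictors, `LayerSizes` equal to `[4 2]`, `Activations` equal to `["relu","tanh"]`, and 3 classes. The random weights and zero biases are placeholders, not values that `fitcnet` would learn.

```matlab
% Activation functions from the table, written as anonymous functions
relu     = @(z) max(z, 0);
sigm     = @(z) 1 ./ (1 + exp(-z));
identity = @(z) z;
acts = {relu, @tanh};   % one activation per layer, like ["relu","tanh"]

% Hypothetical network: 3 predictors, LayerSizes = [4 2], 3 classes
rng(0);
W = {randn(4,3), randn(2,4), randn(3,2)};   % weights, output-by-input
b = {zeros(4,1), zeros(2,1), zeros(3,1)};   % biases
x = [0.5; -1.2; 2.0];                       % one observation as a column

% Hidden layers: multiply by weights, add bias, apply the layer activation
h = x;
for k = 1:numel(acts)
    h = acts{k}(W{k}*h + b{k});
end

% Final fully connected layer followed by softmax gives class scores
z = W{end}*h + b{end};
scores = exp(z - max(z)) / sum(exp(z - max(z)));
[~, predictedClass] = max(scores);
```

Subtracting `max(z)` before exponentiating leaves the softmax result unchanged but avoids overflow for large scores.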

Function to initialize the fully connected layer weights, specified as `'glorot'` or `'he'`.

| Value | Description |
| --- | --- |
| `'glorot'` | Initialize the weights with the Glorot initializer [1] (also known as the Xavier initializer). For each layer, the Glorot initializer independently samples from a uniform distribution with zero mean and variance `2/(I+O)`, where `I` is the input size and `O` is the output size for the layer. |
| `'he'` | Initialize the weights with the He initializer [2]. For each layer, the He initializer samples from a normal distribution with zero mean and variance `2/I`, where `I` is the input size for the layer. |

Example: `'LayerWeightsInitializer','he'`
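Both initializers can be reproduced from their stated variances. A uniform distribution on $[-a, a]$ has variance $a^2/3$, so choosing $a=\sqrt{6/(I+O)}$ gives the Glorot variance $2/(I+O)$. The layer sizes below are hypothetical, and the sketch only checks the sample variances; it is not the toolbox's initialization routine.

```matlab
rng(1);          % reproducible illustration
I = 200;         % hypothetical layer input size
O = 100;         % hypothetical layer output size

% Glorot: uniform on [-a, a] with a = sqrt(6/(I+O)), so the variance is 2/(I+O)
a = sqrt(6/(I + O));
Wglorot = -a + 2*a*rand(O, I);

% He: normal with zero mean and variance 2/I
Whe = sqrt(2/I) * randn(O, I);

fprintf('Glorot sample variance %.5f (target %.5f)\n', var(Wglorot(:)), 2/(I+O));
fprintf('He sample variance %.5f (target %.5f)\n', var(Whe(:)), 2/I);
```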

Type of initial fully connected layer biases, specified as `'zeros'` or `'ones'`.

• If you specify the value `'zeros'`, then each fully connected layer has an initial bias of 0.

• If you specify the value `'ones'`, then each fully connected layer has an initial bias of 1.

Example: `'LayerBiasesInitializer','ones'`

Data Types: `char` | `string`

Predictor data observation dimension, specified as `'rows'` or `'columns'`.

Note

If you orient your predictor matrix so that observations correspond to columns and specify `'ObservationsIn','columns'`, then you might experience a significant reduction in computation time. You cannot specify `'ObservationsIn','columns'` for predictor data in a table.

Example: `'ObservationsIn','columns'`

Data Types: `char` | `string`

Regularization term strength, specified as a nonnegative scalar. The software composes the objective function for minimization from the cross-entropy loss function and the ridge (L2) penalty term.

Example: `'Lambda',1e-4`

Data Types: `single` | `double`
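The regularized objective described above can be sketched numerically. The sketch assumes a mean cross-entropy over observations and a penalty of the form λ·Σw² on the weights only; the exact scaling, observation weighting, and treatment of biases that `fitcnet` uses are not stated here, so treat the result as illustrative. The scores, labels, and weights are hypothetical.

```matlab
% Hypothetical softmax scores for 3 observations and 3 classes (rows sum to 1)
scores = [0.7 0.2 0.1;
          0.1 0.8 0.1;
          0.3 0.3 0.4];
trueClass = [1; 2; 3];   % index of the true class of each observation

% Cross-entropy: average negative log score of the true class
idx = sub2ind(size(scores), (1:3)', trueClass);
crossEntropy = -mean(log(scores(idx)));

% Ridge (L2) penalty on hypothetical weights; the form lambda*sum(w.^2)
% is an assumption made for this sketch
w = [0.5 -1.0 2.0];
lambda = 1e-4;
objective = crossEntropy + lambda * sum(w.^2);
```

Larger values of `lambda` push the optimizer toward smaller weights at the cost of a higher cross-entropy term.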

Flag to standardize the predictor data, specified as a numeric or logical `0` (`false`) or `1` (`true`). If you set `Standardize` to `true`, then the software centers and scales each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.

Example: `'Standardize',true`

Data Types: `single` | `double` | `logical`
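A minimal sketch of the standardization step, assuming a hypothetical numeric matrix whose third column holds integer category codes (a stand-in for a categorical predictor) and MATLAB's default `std` normalization:

```matlab
% Hypothetical predictors: columns 1-2 numeric, column 3 holds category codes
X = [1 10 1;
     2 20 2;
     3 30 1;
     4 40 3];
isCategorical = [false false true];

% Center and scale numeric columns by their mean and standard deviation;
% leave the categorical column untouched
Z = X;
mu = mean(X(:, ~isCategorical));
sigma = std(X(:, ~isCategorical));
Z(:, ~isCategorical) = (X(:, ~isCategorical) - mu) ./ sigma;
```

After this step each numeric column has zero mean and unit standard deviation, which keeps predictors measured on very different scales (here 1 to 4 versus 10 to 40) from dominating the initial weighted sums.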

Convergence Control Options


Verbosity level, specified as `0` or `1`. The `'Verbose'` name-value argument controls the amount of diagnostic information that `fitcnet` displays at the command line.

| Value | Description |
| --- | --- |
| `0` | `fitcnet` does not display diagnostic information. |
| `1` | `fitcnet` periodically displays diagnostic information. |

By default, `StoreHistory` is set to `true`, and `fitcnet` stores the diagnostic information in `Mdl`. Use `Mdl.TrainingHistory` to access the diagnostic information.

Example: `'Verbose',1`

Data Types: `single` | `double`

Frequency of verbose printing, which is the number of iterations between printing to the command window, specified as a positive integer scalar. A value of 1 indicates to print diagnostic information at every iteration.

Note

To use this name-value argument, set `Verbose` to `1`.

Example: `'VerboseFrequency',5`

Data Types: `single` | `double`

Flag to store the training history, specified as a numeric or logical `0` (`false`) or `1` (`true`). If `StoreHistory` is set to `true`, then the software stores diagnostic information in `Mdl`, which you can access by using `Mdl.TrainingHistory`.

Example: `'StoreHistory',false`

Data Types: `single` | `double` | `logical`

Initial step size, specified as a positive scalar or `'auto'`. By default, `fitcnet` does not use the initial step size to determine the initial Hessian approximation used in training the model (see Training Solver). However, if you specify an initial step size $\|s_0\|_\infty$, then the initial inverse-Hessian approximation is $\frac{\|s_0\|_\infty}{\|\nabla \mathcal{L}_0\|_\infty} I$, where $\nabla \mathcal{L}_0$ is the initial gradient vector and $I$ is the identity matrix.

To have `fitcnet` determine an initial step size automatically, specify the value as `'auto'`. In this case, the function determines the initial step size by using $\|s_0\|_\infty = 0.5\|\eta_0\|_\infty + 0.1$, where $s_0$ is the initial step vector and $\eta_0$ is the vector of unconstrained initial weights and biases.

Example: `'InitialStepSize','auto'`

Data Types: `single` | `double` | `char` | `string`
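The two formulas for the initial step size can be evaluated with a few lines of base MATLAB. The vectors `eta0` and `grad0` are hypothetical; in `fitcnet` they come from the initialized network and the first gradient evaluation.

```matlab
% Hypothetical initial weights/biases and initial gradient
eta0  = [0.3; -1.5; 0.8; 0.0];    % unconstrained initial weights and biases
grad0 = [0.2; -0.4; 0.1; 0.05];   % initial gradient of the loss

% 'auto' rule: ||s0||_inf = 0.5*||eta0||_inf + 0.1
s0norm = 0.5 * norm(eta0, Inf) + 0.1;

% Initial inverse-Hessian approximation: (||s0||_inf / ||grad0||_inf) * I
H0inv = (s0norm / norm(grad0, Inf)) * eye(numel(eta0));
```

Here $\|\eta_0\|_\infty = 1.5$, so the automatic step size is $0.85$, and with $\|\nabla\mathcal{L}_0\|_\infty = 0.4$ the diagonal of the initial inverse-Hessian approximation is $2.125$.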

Maximum number of training iterations, specified as a positive integer scalar.

The software returns a trained model regardless of whether the training routine successfully converges. `Mdl.ConvergenceInfo` contains convergence information.

Example: `'IterationLimit',1e8`

Data Types: `single` | `double`

Relative gradient tolerance, specified as a nonnegative scalar.

Let $\mathcal{L}_t$ be the loss function at training iteration $t$, $\nabla \mathcal{L}_t$ be the gradient of the loss function with respect to the weights and biases at iteration $t$, and $\nabla \mathcal{L}_0$ be the gradient of the loss function at an initial point. If $\max|\nabla \mathcal{L}_t| \le a \cdot \text{GradientTolerance}$, where $a = \max\left(1, \min|\mathcal{L}_t|, \max|\nabla \mathcal{L}_0|\right)$, then the training process terminates.

Example: `'GradientTolerance',1e-5`

Data Types: `single` | `double`
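The gradient test can be written out directly. In this sketch all numbers are hypothetical stand-ins for quantities the solver tracks, and it reads $\mathrm{min}|{ℒ}_{t}|$ as the absolute value of the scalar loss at iteration t; that reading is an assumption of the sketch.

```matlab
% Hypothetical values standing in for quantities tracked by the solver
gradTol = 1e-6;                 % GradientTolerance
grad0   = [0.8; -2.5; 1.1];     % gradient at the initial point
lossT   = 0.40;                 % loss at iteration t
gradT   = [1e-7; -3e-7; 2e-7];  % gradient at iteration t

% Scale factor a = max(1, |L_t|, max|grad L_0|); assumption: min|L_t| read as |L_t|
a = max([1, abs(lossT), max(abs(grad0))]);      % = 2.5 here
% Stop when the largest gradient component is small relative to a
stopTraining = max(abs(gradT)) <= a * gradTol;  % 3e-7 <= 2.5e-6, so true
```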

Loss tolerance, specified as a nonnegative scalar.

If the value of the loss function at some iteration is smaller than `LossTolerance`, then the training process terminates.

Example: `'LossTolerance',1e-8`

Data Types: `single` | `double`

Step size tolerance, specified as a nonnegative scalar.

If the step size at some iteration is smaller than `StepTolerance`, then the training process terminates.

Example: `'StepTolerance',1e-4`

Data Types: `single` | `double`
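As a sketch, assuming the `fisheriris` sample data, the convergence-control arguments described above can be combined in one call. Because `fitcnet` returns a model even when no tolerance is met, it is worth inspecting `Mdl.ConvergenceInfo` afterward; the individual fields of that structure are not listed here.

```matlab
load fisheriris
rng(0)                                   % repeatable weight initialization
Mdl = fitcnet(meas, species, ...
    'IterationLimit', 500, ...           % hard cap on solver iterations
    'GradientTolerance', 1e-5, ...
    'LossTolerance', 1e-8, ...
    'StepTolerance', 1e-4, ...
    'InitialStepSize', 'auto');          % use the automatic initial step size rule
% Training may have ended at IterationLimit; check why it stopped
disp(Mdl.ConvergenceInfo)
```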

Validation data for training convergence detection, specified as a cell array or table.

During the training process, the software periodically estimates the validation loss by using `ValidationData`. If the validation loss is greater than or equal to the minimum validation loss computed so far `ValidationPatience` times in a row, then the software terminates the training.

You can specify `ValidationData` as a table if you use a table `Tbl` of predictor data that contains the response variable. In this case, `ValidationData` must contain the same predictors and response contained in `Tbl`. The software does not apply weights to observations, even if `Tbl` contains a vector of weights. To specify weights, you must specify `ValidationData` as a cell array.

If you specify `ValidationData` as a cell array, then it must have the following format:

• `ValidationData{1}` must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix `X`, then `ValidationData{1}` must be an m-by-p or p-by-m matrix of predictor data that has the same orientation as `X`. The predictor variables in the training data `X` and `ValidationData{1}` must correspond. Similarly, if you use a predictor table `Tbl` of predictor data, then `ValidationData{1}` must be a table containing the same predictor variables contained in `Tbl`. The number of observations in `ValidationData{1}` and the predictor data can vary.

• `ValidationData{2}` must match the data type and format of the response variable, either `Y` or `ResponseVarName`. If `ValidationData{2}` is an array of class labels, then it must have the same number of elements as the number of observations in `ValidationData{1}`. The set of all distinct labels of `ValidationData{2}` must be a subset of all distinct labels of `Y`. If `ValidationData{1}` is a table, then `ValidationData{2}` can be the name of the response variable in the table. If you want to use the same `ResponseVarName` or `formula`, you can specify `ValidationData{2}` as `[]`.

• Optionally, you can specify `ValidationData{3}` as an m-dimensional numeric vector of observation weights or the name of a variable in the table `ValidationData{1}` that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1.

If you specify `ValidationData` and want to display the validation loss at the command line, set `Verbose` to `1`.
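A minimal sketch of the cell-array form, assuming the `fisheriris` sample data and a stratified 80/20 split made with `cvpartition`: the held-out rows keep the same orientation as the training matrix, and the labels come from the same set of classes.

```matlab
load fisheriris
rng(0)
c = cvpartition(species, 'Holdout', 0.2);            % stratified 80/20 split
XTrain = meas(training(c), :);  YTrain = species(training(c));
XVal   = meas(test(c), :);      YVal   = species(test(c));

Mdl = fitcnet(XTrain, YTrain, ...
    'ValidationData', {XVal, YVal}, ...              % {predictors, labels}
    'ValidationFrequency', 1, ...                    % evaluate every iteration
    'ValidationPatience', 10, ...
    'Verbose', 1);                                   % print validation loss
```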

Number of iterations between validation evaluations, specified as a positive integer scalar. A value of 1 means that the software evaluates validation metrics at every iteration.

Note

To use this name-value argument, you must specify `ValidationData`.

Example: `'ValidationFrequency',5`

Data Types: `single` | `double`

Stopping condition for validation evaluations, specified as a nonnegative integer scalar. The training process stops if the validation loss is greater than or equal to the minimum validation loss computed so far, `ValidationPatience` times in a row. You can check the `Mdl.TrainingHistory` table to see the running total of times that the validation loss is greater than or equal to the minimum (`Validation Checks`).

Example: `'ValidationPatience',10`

Data Types: `single` | `double`
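The counting rule can be traced on a short sequence. The validation losses below are made up; the loop mirrors the rule stated above, where a loss equal to the current minimum also counts as a failed check and a new minimum resets the count.

```matlab
valLoss  = [0.90 0.70 0.65 0.66 0.65 0.64 0.70 0.71 0.72];  % hypothetical losses
patience = 3;                                               % ValidationPatience

bestLoss = Inf;   % minimum validation loss so far
checks   = 0;     % consecutive evaluations without a new minimum
stopAt   = 0;     % evaluation index at which training would stop (0 = never)
for k = 1:numel(valLoss)
    if valLoss(k) < bestLoss
        bestLoss = valLoss(k);   % new minimum resets the count
        checks = 0;
    else
        checks = checks + 1;     % >= minimum counts as a failed check
    end
    if checks >= patience
        stopAt = k;
        break
    end
end
% Evaluation 5 ties the minimum 0.65 (count 2), evaluation 6 resets it,
% and evaluations 7 to 9 reach the patience of 3, so stopAt is 9
```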

Other Classification Options

Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.

| Value | Description |
|---|---|
| Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If `fitcnet` uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The `CategoricalPredictors` values do not count the response variable, observation weights variable, or any other variables that the function does not use. |
| Logical vector | A `true` entry means that the corresponding predictor is categorical. The length of the vector is `p`. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in `PredictorNames`. |
| `"all"` | All predictors are categorical. |

By default, if the predictor data is in a table (`Tbl`), `fitcnet` assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (`X`), `fitcnet` assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the `CategoricalPredictors` name-value argument.

For the identified categorical predictors, `fitcnet` creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. For an unordered categorical variable, `fitcnet` creates one dummy variable for each level of the categorical variable. For an ordered categorical variable, `fitcnet` creates one fewer dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

Example: `'CategoricalPredictors','all'`

Data Types: `single` | `double` | `logical` | `char` | `string` | `cell`
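Because matrix predictors are treated as continuous by default, an integer-coded categorical column must be flagged explicitly. In this sketch the fifth column is a hypothetical random code with three levels appended to the `fisheriris` measurements purely for illustration.

```matlab
load fisheriris
rng(0)
site = randi(3, size(meas, 1), 1);   % hypothetical categorical code with levels 1, 2, 3
X = [meas site];                     % column 5 holds the categorical predictor

% Without CategoricalPredictors, column 5 would be treated as continuous
Mdl = fitcnet(X, species, 'CategoricalPredictors', 5);
Mdl.CategoricalPredictors            % index of the categorical column
```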

Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. `ClassNames` must have the same data type as the response variable in `Tbl` or `Y`.

If `ClassNames` is a character array, then each element must correspond to one row of the array.

Use `ClassNames` to:

• Specify the order of the classes during training.

• Specify the order of any input or output argument dimension that corresponds to the class order. For example, use `ClassNames` to specify the order of the dimensions of `Cost` or the column order of classification scores returned by `predict`.

• Select a subset of classes for training. For example, suppose that the set of all distinct class names in `Y` is `["a","b","c"]`. To train the model using observations from classes `"a"` and `"c"` only, specify `"ClassNames",["a","c"]`.

The default value for `ClassNames` is the set of all distinct class names in the response variable in `Tbl` or `Y`.

Example: `"ClassNames",["b","g"]`

Data Types: `categorical` | `char` | `string` | `logical` | `single` | `double` | `cell`
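A sketch using `fisheriris` that does two of the things listed above at once: it trains on a subset of classes (observations labeled `versicolor` are ignored) and fixes the class order, which then determines the column order of the scores returned by `predict`.

```matlab
load fisheriris
rng(0)
% Train on two classes only, in a nonalphabetical order
Mdl = fitcnet(meas, species, 'ClassNames', {'virginica', 'setosa'});
Mdl.ClassNames                         % order follows the ClassNames argument

% Column j of score corresponds to Mdl.ClassNames{j}
[label, score] = predict(Mdl, meas([1 101], :));
```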

Since R2023a

Misclassification cost, specified as a square matrix or structure array.

• If you specify a square matrix `Cost` and the true class of an observation is `i`, then `Cost(i,j)` is the cost of classifying a point into class `j`. That is, rows correspond to the true classes, and columns correspond to the predicted classes. To specify the class order for the corresponding rows and columns of `Cost`, also set the `ClassNames` name-value argument.

• If you specify a structure `S`, then it must have two fields:

• `S.ClassNames`, which contains the class names as a variable of the same data type as `Y`

• `S.ClassificationCosts`, which contains the cost matrix with rows and columns ordered as in `S.ClassNames`

The default value for `Cost` is `ones(K) - eye(K)`, where `K` is the number of distinct classes.

Example: `"Cost",[0 1; 2 0]`

Data Types: `single` | `double` | `struct`
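Both forms of `Cost` can express the same penalties. This sketch assumes the `ionosphere` sample data, whose labels are `'b'` and `'g'`; with rows as true classes, the entry `C(2,1) = 2` makes predicting `'b'` for a true `'g'` twice as costly as the opposite error.

```matlab
load ionosphere                      % X: 351-by-34 predictors, Y: cellstr of 'b' or 'g'
C = [0 1; 2 0];                      % rows: true class, columns: predicted class

% Matrix form: ClassNames fixes the row and column order
Mdl1 = fitcnet(X, Y, 'ClassNames', {'b', 'g'}, 'Cost', C);

% Structure form: the class order travels with the matrix
S = struct('ClassNames', {{'b'; 'g'}}, 'ClassificationCosts', C);
Mdl2 = fitcnet(X, Y, 'Cost', S);
```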

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of `'PredictorNames'` depends on the way you supply the training data.

• If you supply `X` and `Y`, then you can use `'PredictorNames'` to assign names to the predictor variables in `X`.

• The order of the names in `PredictorNames` must correspond to the predictor order in `X`. Assuming that `X` has the default orientation, with observations in rows and predictors in columns, `PredictorNames{1}` is the name of `X(:,1)`, `PredictorNames{2}` is the name of `X(:,2)`, and so on. Also, `size(X,2)` and `numel(PredictorNames)` must be equal.

• By default, `PredictorNames` is `{'x1','x2',...}`.

• If you supply `Tbl`, then you can use `'PredictorNames'` to choose which predictor variables to use in training. That is, `fitcnet` uses only the predictor variables in `PredictorNames` and the response variable during training.

• `PredictorNames` must be a subset of `Tbl.Properties.VariableNames` and cannot include the name of the response variable.

• By default, `PredictorNames` contains the names of all predictor variables.

• A good practice is to specify the predictors for training using either `'PredictorNames'` or `formula`, but not both.

Example: `'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}`

Data Types: `string` | `cell`
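The two behaviors described above look like this in practice. The sketch assumes `fisheriris`: with a matrix, `PredictorNames` labels the columns in order; with a table, it selects which variables become predictors.

```matlab
load fisheriris
names = {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'};

% Matrix input: assign names to the columns of meas, in column order
Mdl = fitcnet(meas, species, 'PredictorNames', names);

% Table input: choose a subset of the table variables as predictors
Tbl = array2table(meas, 'VariableNames', names);
Tbl.Species = species;
Mdl2 = fitcnet(Tbl, 'Species', 'PredictorNames', {'PetalLength', 'PetalWidth'});
```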

Since R2023a

Prior class probabilities, specified as a value in this table.

| Value | Description |
|---|---|
| `"empirical"` | The class prior probabilities are the class relative frequencies in `Y`. |
| `"uniform"` | All class prior probabilities are equal to 1/K, where K is the number of classes. |
| numeric vector | Each element is a class prior probability. Order the elements according to `Mdl.ClassNames` or specify the order using the `ClassNames` name-value argument. The software normalizes the elements to sum to `1`. |
| structure | A structure `S` with two fields: `S.ClassNames` contains the class names as a variable of the same type as `Y`, and `S.ClassProbs` contains a vector of corresponding prior probabilities. The software normalizes the elements to sum to `1`. |

Example: `"Prior",struct("ClassNames",["b","g"],"ClassProbs",1:2)`

Data Types: `single` | `double` | `char` | `string` | `struct`
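The normalization is visible in the trained model. Assuming the `ionosphere` sample data, the structure from the example above (`ClassProbs` of `1:2`) is stored as probabilities 1/3 and 2/3 in the order of `Mdl.ClassNames`.

```matlab
load ionosphere
priorStruct = struct('ClassNames', {{'b'; 'g'}}, 'ClassProbs', 1:2);
Mdl = fitcnet(X, Y, 'Prior', priorStruct);
Mdl.Prior                            % normalized to sum to 1: [1/3 2/3]

MdlU = fitcnet(X, Y, 'Prior', 'uniform');
MdlU.Prior                           % 1/K for each of the K = 2 classes
```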

Response variable name, specified as a character vector or string scalar.

Example: `"ResponseName","response"`

Data Types: `char` | `string`

Score transformation, specified as a character vector, string scalar, or function handle.

This table summarizes the available character vectors and string scalars.

| Value | Description |
|---|---|
| `"doublelogit"` | $1/(1+e^{-2x})$ |
| `"invlogit"` | $\log\left(x/(1-x)\right)$ |
| `"ismax"` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| `"logit"` | $1/(1+e^{-x})$ |
| `"none"` or `"identity"` | $x$ (no transformation) |
| `"sign"` | $-1$ for $x<0$; $0$ for $x=0$; $1$ for $x>0$ |
| `"symmetric"` | $2x-1$ |
| `"symmetricismax"` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to $-1$ |
| `"symmetriclogit"` | $2/(1+e^{-x})-1$ |

For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: `"ScoreTransform","logit"`

Data Types: `char` | `string` | `function_handle`
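Several rows of the table can be written as anonymous functions that follow the handle contract above (matrix in, same-size matrix out). These are illustrative equivalents built from the table formulas, not the toolbox's internal implementations; any of them could be passed as, for example, `'ScoreTransform',symmetric`.

```matlab
% Anonymous-function equivalents of some built-in transforms (illustrative)
doublelogit    = @(x) 1 ./ (1 + exp(-2*x));
invlogit       = @(x) log(x ./ (1 - x));
logit          = @(x) 1 ./ (1 + exp(-x));
symmetric      = @(x) 2*x - 1;
symmetriclogit = @(x) 2 ./ (1 + exp(-x)) - 1;
% Row-wise "ismax": 1 for the largest score in each row, 0 elsewhere
% (ties would produce more than one 1 in a row)
ismaxFcn       = @(S) double(S == max(S, [], 2));

S = [0.2 0.5 0.3; 0.6 0.3 0.1];      % hypothetical scores, one row per observation
T = ismaxFcn(S);                     % [0 1 0; 1 0 0]
```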

Observation weights, specified as a nonnegative numeric vector or the name of a variable in `Tbl`. The software weights each observation in `X` or `Tbl` with the corresponding value in `Weights`. The length of `Weights` must equal the number of observations in `X` or `Tbl`.

If you specify the input data as a table `Tbl`, then `Weights` can be the name of a variable in `Tbl` that contains a numeric vector. In this case, you must specify `Weights` as a character vector or string scalar. For example, if the weights vector `W` is stored as `Tbl.W`, then specify it as `'W'`. Otherwise, the software treats all columns of `Tbl`, including `W`, as predictors or the response variable when training the model.

By default, `Weights` is `ones(n,1)`, where `n` is the number of observations in `X` or `Tbl`.

The software normalizes `Weights` to sum to the value of the prior probability in the respective class.

Data Types: `single` | `double` | `char` | `string`
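The per-class normalization stated above can be sketched by hand. The labels, raw weights, and priors below are hypothetical; within each class the weights keep their relative sizes but are rescaled to sum to that class's prior, so all normalized weights sum to 1.

```matlab
% Hypothetical labels, raw weights, and prior probabilities
y     = {'a'; 'a'; 'b'; 'b'; 'b'};
w     = [1; 3; 2; 2; 4];
prior = [0.4; 0.6];                  % priors for 'a' and 'b', in that order
classes = unique(y);                 % {'a'; 'b'}

wNorm = zeros(size(w));
for k = 1:numel(classes)
    idx = strcmp(y, classes{k});                    % observations in class k
    wNorm(idx) = prior(k) * w(idx) / sum(w(idx));   % class weights now sum to prior(k)
end
% wNorm is [0.1; 0.3; 0.15; 0.15; 0.3]
```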

Note

You cannot use any cross-validation name-value argument together with the `'OptimizeHyperparameters'` name-value argument. You can modify the cross-validation for `'OptimizeHyperparameters'` only by using the `'HyperparameterOptimizationOptions'` name-value argument.

Cross-Validation Options

Flag to train a cross-validated classifier, specified as `'on'` or `'off'`.

If you specify `'on'`, then the software trains a cross-validated classifier with 10 folds.

You can override this cross-validation setting using the `CVPartition`, `Holdout`, `KFold`, or `Leaveout` name-value argument. You can use only one cross-validation name-value argument at a time to create a cross-validated model.

Alternatively, cross-validate later by passing `Mdl` to `crossval`.

Example: `'Crossval','on'`

Data Types: `char` | `string`
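A minimal sketch with `fisheriris`: turning on `CrossVal` returns a partitioned model with 10 folds, and `kfoldLoss` averages the misclassification rate over the held-out folds.

```matlab
load fisheriris
rng(0)                                           % the fold assignment is random
CVMdl = fitcnet(meas, species, 'CrossVal', 'on');
CVMdl.KFold                                      % 10 folds
err = kfoldLoss(CVMdl)                           % average out-of-fold error
```

Passing an already trained `Mdl` to `crossval` produces the same kind of partitioned model.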

Cross-validation partition, specified as a `cvpartition` partition object created by `cvpartition`. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using `cvp = cvpartition(500,'KFold',5)`. Then, you can specify the cross-validated model by using `'CVPartition',cvp`.
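Passing the response to `cvpartition`, rather than a count, stratifies the folds by class. This sketch assumes `fisheriris` and uses `kfoldPredict` to get one out-of-fold prediction per observation.

```matlab
load fisheriris
rng(0)
cvp = cvpartition(species, 'KFold', 5);          % folds stratified by class
CVMdl = fitcnet(meas, species, 'CVPartition', cvp);
labels = kfoldPredict(CVMdl);                    % out-of-fold predicted labels
accuracy = mean(strcmp(labels, species));
```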

Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify `'Holdout',p`, then the software completes these steps:

1. Randomly select and reserve `p*100`% of the data as validation data, and train the model using the rest of the data.

2. Store the compact, trained model in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: `'Holdout',0.1`

Data Types: `double` | `single`
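With `Holdout`, the `Trained` property holds a single compact model, and `kfoldLoss` reports its error on the reserved observations. The sketch assumes `fisheriris`.

```matlab
load fisheriris
rng(0)
CVMdl = fitcnet(meas, species, 'Holdout', 0.1);  % reserve 10% for validation
compactMdl = CVMdl.Trained{1};                   % model trained on the other 90%
holdoutErr = kfoldLoss(CVMdl);                   % error on the held-out 10%
```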

Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify `'KFold',k`, then the software completes these steps:

1. Randomly partition the data into `k` sets.

2. For each set, reserve the set as validation data, and train the model using the other `k` – 1 sets.

3. Store the `k` compact, trained models in a `k`-by-1 cell vector in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: `'KFold',5`

Data Types: `single` | `double`
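The three steps above leave one compact model per fold. In this `fisheriris` sketch, `kfoldLoss` with `'Mode','individual'` returns the error of each fold separately instead of the average.

```matlab
load fisheriris
rng(0)
CVMdl = fitcnet(meas, species, 'KFold', 5);
size(CVMdl.Trained)                                 % 5-by-1 cell of compact models
foldErr = kfoldLoss(CVMdl, 'Mode', 'individual');   % one error per fold
```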

Leave-one-out cross-validation flag, specified as `'on'` or `'off'`. If you specify `'Leaveout','on'`, then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the `NumObservations` property of the model), the software completes these steps:

1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.

2. Store the n compact, trained models in an n-by-1 cell vector in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: `'Leaveout','on'`
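Leave-one-out trains n networks, so this sketch keeps n small by using a hypothetical 30-observation subset of `fisheriris` (10 per class); the `Trained` property then holds 30 compact models.

```matlab
load fisheriris
idx = [1:10, 51:60, 101:110];        % hypothetical subset: 10 observations per class
Xs = meas(idx, :);
Ys = species(idx);
CVMdl = fitcnet(Xs, Ys, 'Leaveout', 'on');
numel(CVMdl.Trained)                 % one model per left-out observation
looErr = kfoldLoss(CVMdl);
```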

Hyperparameter Optimization Options

Parameters to optimize, specified as one of the following:

• `'none'` — Do not optimize.

• `'auto'` — Use `{'Activations','Lambda','LayerSizes','Standardize'}`.

• `'all'` — Optimize all eligible parameters.

• String array or cell array of eligible parameter names.

• Vector of `optimizableVariable` objects, typically the output of `hyperparameters`.

The optimization attempts to minimize the cross-validation loss (error) for `fitcnet` by varying the parameters. For information about cross-validation loss (although in a different context), see Classification Loss. To control the cross-validation type and other aspects of the optimization, use the `HyperparameterOptimizationOptions` name-value argument.

Note

The values of `'OptimizeHyperparameters'` override any values you specify using other name-value arguments. For example, setting `'OptimizeHyperparameters'` to `'auto'` causes `fitcnet` to optimize hyperparameters corresponding to the `'auto'` option and to ignore any specified values for the hyperparameters.

The eligible parameters for `fitcnet` are:

• `Activations`: `fitcnet` optimizes `Activations` over the set `{'relu','tanh','sigmoid','none'}`.

• `Lambda`: `fitcnet` optimizes `Lambda` over continuous values in the range `[1e-5,1e5]/NumObservations`, where the value is chosen uniformly in the log-transformed range.

• `LayerBiasesInitializer`: `fitcnet` optimizes `LayerBiasesInitializer` over the two values `{'zeros','ones'}`.

• `LayerWeightsInitializer`: `fitcnet` optimizes `LayerWeightsInitializer` over the two values `{'glorot','he'}`.

• `LayerSizes`: `fitcnet` optimizes the number of fully connected layers, excluding the final fully connected layer, over the three values `1`, `2`, and `3`. `fitcnet` optimizes the size of each fully connected layer separately over `1` through `300`, sampled on a logarithmic scale.

Note

When you use the `LayerSizes` argument, the iterative display shows the size of each relevant layer. For example, if the current number of fully connected layers is `3`, and the three layers are of sizes `10`, `79`, and `44` respectively, the iterative display shows `LayerSizes` for that iteration as `[10 79 44]`.

Note

To access up to five fully connected layers or a different range of sizes in a layer, use `hyperparameters` to select the optimizable parameters and ranges.

• `Standardize`: `fitcnet` optimizes `Standardize` over the two values `{true,false}`.

Set nondefault parameters by passing a vector of `optimizableVariable` objects that have nondefault values. As an example, this code sets the range of `NumLayers` to `[1 5]` and optimizes `Layer_4_Size` and `Layer_5_Size`:

```matlab
load fisheriris
params = hyperparameters('fitcnet',meas,species);
params(1).Range = [1 5];        % params(1) is NumLayers
params(10).Optimize = true;     % params(10) is Layer_4_Size
params(11).Optimize = true;     % params(11) is Layer_5_Size
```

Pass `params` as the value of `OptimizeHyperparameters`. For an example using nondefault parameters, see Customize Neural Network Classifier Optimization.

By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the `Verbose` field of the `'HyperparameterOptimizationOptions'` name-value argument. To control the plots, set the `ShowPlots` field of the `'HyperparameterOptimizationOptions'` name-value argument.

For an example, see Improve Neural Network Classifier Using OptimizeHyperparameters.

Example: `'OptimizeHyperparameters','auto'`
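A sketch that optimizes only two of the eligible parameters, assuming `fisheriris`. The evaluation budget is kept small and plots and display are turned off through `HyperparameterOptimizationOptions` so that the run stays short; with the default Bayesian optimizer, the results are a `BayesianOptimization` object.

```matlab
load fisheriris
rng(0)                                 % optimization and training are both random
opts = struct('MaxObjectiveEvaluations', 10, 'ShowPlots', false, 'Verbose', 0);
Mdl = fitcnet(meas, species, ...
    'OptimizeHyperparameters', {'Lambda', 'Standardize'}, ...
    'HyperparameterOptimizationOptions', opts);
results = Mdl.HyperparameterOptimizationResults;
bestPoint(results)                     % best Lambda and Standardize found
```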

Options for optimization, specified as a structure. This argument modifies the effect of the `OptimizeHyperparameters` name-value argument. All fields in the structure are optional.

| Field Name | Values | Default |
|---|---|---|
| `Optimizer` | `'bayesopt'`: Use Bayesian optimization. Internally, this setting calls `bayesopt`. `'gridsearch'`: Use grid search with `NumGridDivisions` values per dimension. `'randomsearch'`: Search at random among `MaxObjectiveEvaluations` points. `'gridsearch'` searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command `sortrows(Mdl.HyperparameterOptimizationResults)`. | `'bayesopt'` |
| `AcquisitionFunctionName` | `'expected-improvement-per-second-plus'`, `'expected-improvement'`, `'expected-improvement-plus'`, `'expected-improvement-per-second'`, `'lower-confidence-bound'`, or `'probability-of-improvement'`. Acquisition functions whose names include `per-second` do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include `plus` modify their behavior when they are overexploiting an area. For more details, see Acquisition Function Types. | `'expected-improvement-per-second-plus'` |
| `MaxObjectiveEvaluations` | Maximum number of objective function evaluations. | `30` for `'bayesopt'` and `'randomsearch'`, and the entire grid for `'gridsearch'` |
| `MaxTime` | Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by `tic` and `toc`. The run time can exceed `MaxTime` because `MaxTime` does not interrupt function evaluations. | `Inf` |
| `NumGridDivisions` | For `'gridsearch'`, the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. | `10` |
| `ShowPlots` | Logical value indicating whether to show plots. If `true`, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (`Optimizer` is `'bayesopt'`), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the `BestSoFar (observed)` and `BestSoFar (estim.)` columns of the iterative display, respectively. You can find these values in the properties `ObjectiveMinimumTrace` and `EstimatedObjectiveMinimumTrace` of `Mdl.HyperparameterOptimizationResults`. If the problem includes one or two optimization parameters for Bayesian optimization, then `ShowPlots` also plots a model of the objective function against the parameters. | `true` |
| `SaveIntermediateResults` | Logical value indicating whether to save results when `Optimizer` is `'bayesopt'`. If `true`, this field overwrites a workspace variable named `'BayesoptResults'` at each iteration. The variable is a `BayesianOptimization` object. | `false` |
| `Verbose` | Display at the command line: `0` (no iterative display), `1` (iterative display), or `2` (iterative display with extra information). For details, see the `bayesopt` `Verbose` name-value argument and the example Optimize Classifier Fit Using Bayesian Optimization. | `1` |
| `UseParallel` | Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see Parallel Bayesian Optimization. | `false` |
| `Repartition` | Logical value indicating whether to repartition the cross-validation at every iteration. If this field is `false`, the optimizer uses a single partition for the optimization. The setting `true` usually gives the most robust results because it takes partitioning noise into account. However, for good results, `true` requires at least twice as many function evaluations. | `false` |

Use no more than one of the following three cross-validation fields. If you do not specify any of them, the default is `'Kfold',5`.

| Field Name | Values |
|---|---|
| `CVPartition` | A `cvpartition` object, as created by `cvpartition` |
| `Holdout` | A scalar in the range `(0,1)` representing the holdout fraction |
| `Kfold` | An integer greater than 1 |

Example: `'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)`

Data Types: `struct`
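A sketch combining several fields from the table, assuming `fisheriris`: random search over the `'auto'` parameters with a small budget and 3-fold cross-validation. For `'randomsearch'` (and `'gridsearch'`) the results are returned as a table rather than a `BayesianOptimization` object.

```matlab
load fisheriris
rng(0)
opts = struct('Optimizer', 'randomsearch', ...
              'MaxObjectiveEvaluations', 8, ...   % small budget for a quick run
              'Kfold', 3, ...                     % 3-fold CV for each evaluation
              'ShowPlots', false, ...
              'Verbose', 0);
Mdl = fitcnet(meas, species, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', opts);
T = Mdl.HyperparameterOptimizationResults;       % one row per evaluated point
```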

## Output Arguments

Trained neural network classifier, returned as a `ClassificationNeuralNetwork` or `ClassificationPartitionedModel` object.

If you set any of the name-value arguments `CrossVal`, `CVPartition`, `Holdout`, `KFold`, or `Leaveout`, then `Mdl` is a `ClassificationPartitionedModel` object. Otherwise, `Mdl` is a `ClassificationNeuralNetwork` model.

To reference properties of `Mdl`, use dot notation.

### Neural Network Structure

The default neural network classifier has the following layer structure.

Input — This layer corresponds to the predictor data in `Tbl` or `X`.

First fully connected layer — This layer has 10 outputs by default.

• You can widen the layer or add more fully connected layers to the network by specifying the `LayerSizes` name-value argument.

• You can find the weights and biases for this layer in the `Mdl.LayerWeights{1}` and `Mdl.LayerBiases{1}` properties of `Mdl`, respectively.

ReLU activation function — `fitcnet` applies this activation function to the first fully connected layer.

Final fully connected layer — This layer has K outputs, where K is the number of classes in the response variable.

• You can find the weights and biases for this layer in the `Mdl.LayerWeights{end}` and `Mdl.LayerBiases{end}` properties of `Mdl`, respectively.

Softmax function (for both binary and multiclass classification) — `fitcnet` applies this activation function to the final fully connected layer. The function takes each input $x_i$ and returns the following, where K is the number of classes in the response variable:

$$f\left(x_i\right)=\frac{\exp\left(x_i\right)}{\sum_{j=1}^{K}\exp\left(x_j\right)}.$$

The results correspond to the predicted classification scores (or posterior probabilities).

Output — This layer corresponds to the predicted class labels.

For an example that shows how a neural network classifier with this layer structure returns predictions, see Predict Using Layer Structure of Neural Network Classifier.
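The layer structure above can be reproduced with matrix arithmetic. This sketch assumes the default settings (one ReLU layer of size 10, no standardization, no categorical predictors, no score transform) and the `fisheriris` data; each `LayerWeights{i}` is treated as an outputs-by-inputs matrix, so observations in rows are multiplied by its transpose.

```matlab
load fisheriris
rng(0)
Mdl = fitcnet(meas, species);        % default: one fully connected layer of 10 ReLU units
X = meas(1:5, :);                    % a few observations, one per row

% First fully connected layer followed by ReLU
Z1 = X * Mdl.LayerWeights{1}' + Mdl.LayerBiases{1}';
A1 = max(Z1, 0);
% Final fully connected layer with K = 3 outputs
Z2 = A1 * Mdl.LayerWeights{end}' + Mdl.LayerBiases{end}';
% Row-wise softmax; subtracting the row maximum avoids overflow
E = exp(Z2 - max(Z2, [], 2));
scores = E ./ sum(E, 2);
[~, k] = max(scores, [], 2);
labels = Mdl.ClassNames(k);          % predicted labels from the largest score

[predLabels, predScores] = predict(Mdl, X);   % for comparison
```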

## Tips

• Always try to standardize the numeric predictors (see `Standardize`). Standardization makes predictors insensitive to the scales on which they are measured.

• After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
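Standardization is a single name-value argument. In this `fisheriris` sketch, the trained model records the centering and scaling it applies to each numeric predictor in its `Mu` and `Sigma` properties.

```matlab
load fisheriris
Mdl = fitcnet(meas, species, 'Standardize', true);
Mdl.Mu       % predictor means used for centering
Mdl.Sigma    % predictor standard deviations used for scaling
```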

## Algorithms

### Training Solver

`fitcnet` uses a limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (LBFGS) [3] as its loss function minimization technique, where the software minimizes the cross-entropy loss. The LBFGS solver uses a standard line-search method with an approximation to the Hessian.
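To illustrate the technique, here is a textbook L-BFGS loop (the two-loop recursion described in [3]) on a small convex quadratic that stands in for the cross-entropy loss. It is not `fitcnet`'s implementation: the backtracking Armijo line search, the history size `m`, the scaled-identity initial inverse Hessian, and the stopping threshold are all assumptions of this sketch.

```matlab
% Hypothetical stand-in objective with minimizer xStar and minimum value 0
A = [4 1 0; 1 3 1; 0 1 2];               % symmetric positive definite
xStar = [1; -2; 0.5];
f = @(x) 0.5 * (x - xStar)' * A * (x - xStar);
gradf = @(x) A * (x - xStar);

m = 5;                                   % number of stored (s, y) pairs
x = zeros(3, 1);
S = zeros(3, 0);                         % steps s_k, newest in the last column
Yc = zeros(3, 0);                        % gradient differences y_k
for iter = 1:100
    g = gradf(x);
    if norm(g, Inf) < 1e-8
        break
    end
    % Two-loop recursion: d = -H*g without forming the inverse Hessian H
    q = g;
    k = size(S, 2);
    a = zeros(k, 1);
    rho = zeros(k, 1);
    for i = k:-1:1                       % newest to oldest
        rho(i) = 1 / (Yc(:, i)' * S(:, i));
        a(i) = rho(i) * (S(:, i)' * q);
        q = q - a(i) * Yc(:, i);
    end
    if k > 0
        h0 = (S(:, k)' * Yc(:, k)) / (Yc(:, k)' * Yc(:, k));   % H0 = h0*I
    else
        h0 = 1;
    end
    r = h0 * q;
    for i = 1:k                          % oldest to newest
        bi = rho(i) * (Yc(:, i)' * r);
        r = r + S(:, i) * (a(i) - bi);
    end
    d = -r;
    % Backtracking line search on the Armijo sufficient-decrease condition
    t = 1;
    while f(x + t * d) > f(x) + 1e-4 * t * (g' * d)
        t = t / 2;
    end
    xNew = x + t * d;
    s = xNew - x;
    y = gradf(xNew) - g;
    if s' * y > 1e-12                    % keep only pairs with positive curvature
        S = [S, s];
        Yc = [Yc, y];
        if size(S, 2) > m                % discard the oldest pair
            S(:, 1) = [];
            Yc(:, 1) = [];
        end
    end
    x = xNew;
end
```

The `InitialStepSize` argument described earlier replaces the scaled-identity choice `h0` at the start of training with its own formula.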

### `Cost`, `Prior`, and `Weights`

• If you specify the `Cost`, `Prior`, and `Weights` name-value arguments, the output model object stores the specified values in the `Cost`, `Prior`, and `W` properties, respectively. The `Cost` property stores the user-specified cost matrix as is. The `Prior` and `W` properties store the prior probabilities and observation weights, respectively, after normalization. For details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.

• The software uses the `Cost` property for prediction, but not training. Therefore, `Cost` is not read-only; you can change the property value by using dot notation after creating the trained model.
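Because `Cost` only affects prediction, it can be changed after training. In this sketch, which assumes the `ionosphere` data (class order `'b'`, `'g'`), raising the cost of calling a true `'g'` a `'b'` can only move predictions toward `'g'`.

```matlab
load ionosphere
rng(0)
Mdl = fitcnet(X, Y);                 % default cost: ones(2) - eye(2)
labelsBefore = predict(Mdl, X);

Mdl.Cost = [0 1; 10 0];              % true 'g' predicted as 'b' now costs 10
labelsAfter = predict(Mdl, X);       % no retraining needed
nChanged = sum(~strcmp(labelsBefore, labelsAfter));
```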

## References

[1] Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256. 2010.

[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. 2015.

[3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.

## Version History

Introduced in R2021a