
fitrnet

Train neural network regression model

Since R2021a

Description

Use fitrnet to train a feedforward, fully connected neural network for regression. The first fully connected layer of the neural network has a connection from the network input (predictor data), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix and then adds a bias vector. An activation function follows each fully connected layer, excluding the last. The final fully connected layer produces the network's output, namely predicted response values. For more information, see Neural Network Structure.
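The layer structure described above can be sketched in a few lines of plain MATLAB. This is only an illustration of the forward pass, not the fitrnet implementation: the layer sizes, the random weights, and the variable names (sizes, W, b, yhat) are all hypothetical, and ReLU is assumed as the activation function.

```matlab
% Minimal sketch of a forward pass through a fully connected network.
% All sizes and values are hypothetical and chosen only for illustration.
rng(0)
numPredictors = 4;
layerSizes = [30 10];                    % outputs of the two hidden layers
sizes = [numPredictors layerSizes 1];    % the final layer has one output

% One weight matrix and one bias vector per fully connected layer
numLayers = numel(sizes) - 1;
W = cell(1,numLayers);
b = cell(1,numLayers);
for k = 1:numLayers
    W{k} = randn(sizes(k+1),sizes(k));
    b{k} = zeros(sizes(k+1),1);
end

x = randn(numPredictors,1);              % one observation as a column vector
a = x;
for k = 1:numLayers
    a = W{k}*a + b{k};                   % multiply by weights, then add bias
    if k < numLayers
        a = max(a,0);                    % ReLU after every layer except the last
    end
end
yhat = a                                 % scalar predicted response
```

The first dimension of each weight matrix equals the number of outputs of that layer, so the final weight matrix in this sketch has a single row.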


Mdl = fitrnet(Tbl,ResponseVarName) returns a neural network regression model Mdl trained using the predictors in the table Tbl and the response values in the ResponseVarName table variable.

Mdl = fitrnet(Tbl,formula) returns a neural network regression model trained using the sample data in the table Tbl. The input argument formula is an explanatory model of the response and a subset of the predictor variables in Tbl used to fit Mdl.

Mdl = fitrnet(Tbl,Y) returns a neural network regression model using the predictor variables in the table Tbl and the response values in vector Y.


Mdl = fitrnet(X,Y) returns a neural network regression model trained using the predictors in the matrix X and the response values in vector Y.


Mdl = fitrnet(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can adjust the number of outputs and the activation functions for the fully connected layers by specifying the LayerSizes and Activations name-value arguments.
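As a short sketch of the name-value syntax, the following call requests two fully connected layers with different activation functions. The predictor choice, the layer sizes [20 5], and the activation pair are arbitrary illustrative values, not recommendations. The sketch assumes that Activations accepts one entry per layer in LayerSizes.

```matlab
% Sketch: set layer sizes and per-layer activations with name-value arguments.
% The specific sizes and activations are illustrative assumptions.
load carbig
X = [Acceleration Cylinders Displacement Weight];
R = rmmissing([X MPG]);                  % drop rows with missing values

% Two hidden layers: 20 outputs with tanh, then 5 outputs with ReLU
Mdl = fitrnet(R(:,1:end-1),R(:,end), ...
    "Standardize",true, ...
    "LayerSizes",[20 5], ...
    "Activations",["tanh" "relu"]);

Mdl.LayerSizes                           % layer sizes stored in the model
```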

Examples


Train a neural network regression model, and assess the performance of the model on a test set.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG.

load carbig
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

Remove rows of cars where the table has missing values.

cars = rmmissing(cars);

Categorize the cars based on whether they were made in the USA.

cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");

Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.

rng("default") % For reproducibility of the data partition
c = cvpartition(height(cars),"Holdout",0.20);
trainingIdx = training(c); % Training set indices
carsTrain = cars(trainingIdx,:);
testIdx = test(c); % Test set indices
carsTest = cars(testIdx,:);

Train a neural network regression model by passing the carsTrain training data to the fitrnet function. For better results, standardize the predictor data by setting the Standardize name-value argument to true.

Mdl = fitrnet(carsTrain,"MPG","Standardize",true)
Mdl = 
  RegressionNeuralNetwork
           PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
             ResponseName: 'MPG'
    CategoricalPredictors: 5
        ResponseTransform: 'none'
          NumObservations: 314
               LayerSizes: 10
              Activations: 'relu'
    OutputLayerActivation: 'none'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1x1 struct]
          TrainingHistory: [1000x7 table]


Mdl is a trained RegressionNeuralNetwork model. You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.TrainingHistory to get more information about the training history of the neural network model.

Evaluate the performance of the regression model on the test set by computing the test mean squared error (MSE). Smaller MSE values indicate better performance.

testMSE = loss(Mdl,carsTest,"MPG")
testMSE = 6.8780
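To see what the test MSE measures, you can compute it by hand from the predictions. This sketch assumes that Mdl and carsTest from the preceding steps are still in the workspace and that loss uses its default mean squared error with equal observation weights. The names predictedMPG and manualMSE are introduced here for illustration.

```matlab
% Sketch: compute the test MSE directly from predictions.
% Assumes Mdl and carsTest exist from the preceding steps.
predictedMPG = predict(Mdl,carsTest);
manualMSE = mean((carsTest.MPG - predictedMPG).^2)
```

Under those assumptions, manualMSE should agree with the value returned by loss up to floating-point error.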

Specify the structure of the neural network regression model, including the size of the fully connected layers.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a matrix X containing the predictor variables Acceleration, Cylinders, and so on. Store the response variable MPG in the variable Y.

load carbig
X = [Acceleration Cylinders Displacement Weight];
Y = MPG;

Delete rows of X and Y where either array has missing values.

R = rmmissing([X Y]);
X = R(:,1:end-1);
Y = R(:,end);

Partition the data into training data (XTrain and YTrain) and test data (XTest and YTest). Reserve approximately 20% of the observations for testing, and use the rest of the observations for training.

rng("default") % For reproducibility of the partition
c = cvpartition(length(Y),"Holdout",0.20);
trainingIdx = training(c); % Indices for the training set
XTrain = X(trainingIdx,:);
YTrain = Y(trainingIdx);
testIdx = test(c); % Indices for the test set
XTest = X(testIdx,:);
YTest = Y(testIdx);

Train a neural network regression model. Standardize the predictor data, and specify 30 outputs in the first fully connected layer and 10 outputs in the second fully connected layer. By default, both layers use a rectified linear unit (ReLU) activation function. You can change the activation functions for the fully connected layers by using the Activations name-value argument.

Mdl = fitrnet(XTrain,YTrain,"Standardize",true, ...
    "LayerSizes",[30 10])
Mdl = 
  RegressionNeuralNetwork
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
          NumObservations: 319
               LayerSizes: [30 10]
              Activations: 'relu'
    OutputLayerActivation: 'none'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1x1 struct]
          TrainingHistory: [1000x7 table]


Access the weights and biases for the fully connected layers of the trained model by using the LayerWeights and LayerBiases properties of Mdl. The first two elements of each property correspond to the values for the first two fully connected layers, and the third element corresponds to the values for the final fully connected layer for regression. For example, display the weights and biases for the first fully connected layer.

Mdl.LayerWeights{1}
ans = 30×4

    0.0122    0.0116   -0.0094    0.1174
   -0.4400   -1.5674   -0.1234   -2.2396
    0.3370    0.2628   -1.9752    0.2937
   -2.9872   -3.1024   -0.9050   -1.5978
    0.7721    2.2010    1.3134    0.2364
    0.1718    1.8862   -3.0548   -0.4272
    0.9583   -0.0591   -0.9272   -0.3960
    1.6701   -0.1617   -1.2640    0.7811
   -0.7890   -0.8045    0.2993    1.5391
    0.2053   -2.3423    1.7768    1.1690
      ⋮

Mdl.LayerBiases{1}
ans = 30×1

   -0.4448
   -1.0814
   -0.5026
   -0.9984
    0.2245
   -2.1709
    1.6112
    1.3802
   -1.2855
    0.1969
      ⋮

The final fully connected layer has one output. The number of layer outputs corresponds to the first dimension of the layer weights and layer biases.

size(Mdl.LayerWeights{end})
ans = 1×2

     1    10

size(Mdl.LayerBiases{end})
ans = 1×2

     1     1
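The stored weights and biases can be used to trace how a prediction is formed. The following is a rough sketch, not the internal predict implementation. It assumes that Mdl and XTest from the preceding steps are in the workspace, that all predictors are numeric, that the Mu and Sigma properties hold the standardization constants used during training, and that every layer except the last uses ReLU.

```matlab
% Sketch: reproduce predictions from LayerWeights and LayerBiases.
% Assumes Mdl and XTest exist from the preceding steps (numeric predictors,
% Standardize=true, ReLU activations).
Z = (XTest - Mdl.Mu(:)') ./ Mdl.Sigma(:)';   % standardize as during training
A = Z';                                      % columns are observations
numLayers = numel(Mdl.LayerWeights);
for k = 1:numLayers
    A = Mdl.LayerWeights{k}*A + Mdl.LayerBiases{k};
    if k < numLayers
        A = max(A,0);                        % ReLU on all but the final layer
    end
end
manualPredictions = A';

% Largest discrepancy from the built-in predictions
maxAbsDiff = max(abs(manualPredictions - predict(Mdl,XTest)))
```

If the assumptions hold, maxAbsDiff should be close to zero. A large value suggests that one of the assumptions (for example, about standardization) does not match the trained model.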

To estimate the performance of the trained model, compute the test set mean squared error (MSE) for Mdl. Smaller MSE values indicate better performance.

testMSE = loss(Mdl,XTest,YTest)
testMSE = 16.8576

Compare the predicted test set response values to the true response values. Plot the predicted miles per gallon (MPG) along the vertical axis and the true MPG along the horizontal axis. Points on the reference line indicate correct predictions. A good model produces predictions that are scattered near the line.

testPredictions = predict(Mdl,XTest);
plot(YTest,testPredictions,".")
hold on
plot(YTest,YTest)
hold off
xlabel("True MPG")
ylabel("Predicted MPG")

At each iteration of the training process, compute the validation loss of the neural network. Stop the training process early if the validation loss reaches a reasonable minimum.

Load the patients data set. Create a table from the data set. Each row corresponds to one patient, and each column corresponds to a diagnostic variable. Use the Systolic variable as the response variable, and the rest of the variables as predictors.

load patients
tbl = table(Age,Diastolic,Gender,Height,Smoker,Weight,Systolic);

Separate the data into a training set tblTrain and a validation set tblValidation. The software reserves approximately 30% of the observations for the validation data set and uses the rest of the observations for the training data set.

rng("default") % For reproducibility of the partition
c = cvpartition(size(tbl,1),"Holdout",0.30);
trainingIndices = training(c);
validationIndices = test(c);
tblTrain = tbl(trainingIndices,:);
tblValidation = tbl(validationIndices,:);

Train a neural network regression model by using the training set. Specify the Systolic column of tblTrain as the response variable. Evaluate the model at each iteration by using the validation set. Display the training information at each iteration by using the Verbose name-value argument. By default, the training process ends early if the validation loss is greater than or equal to the minimum validation loss computed so far, six times in a row. To change the number of times the validation loss is allowed to be greater than or equal to the minimum, specify the ValidationPatience name-value argument.

Mdl = fitrnet(tblTrain,"Systolic", ...
    "ValidationData",tblValidation, ...
    "Verbose",1);
|==========================================================================================|
| Iteration  | Train Loss | Gradient   | Step       | Iteration  | Validation | Validation |
|            |            |            |            | Time (sec) | Loss       | Checks     |
|==========================================================================================|
|           1|  516.021993| 3220.880047|    0.644473|    0.023000|  568.289202|           0|
|           2|  313.056754|  229.931405|    0.067026|    0.009587|  304.023695|           0|
|           3|  308.461807|  277.166516|    0.011122|    0.008932|  296.935608|           0|
|           4|  262.492770|  844.627934|    0.143022|    0.002883|  240.559640|           0|
|           5|  169.558740| 1131.714363|    0.336463|    0.004403|  152.531663|           0|
|           6|   89.134368|  362.084104|    0.382677|    0.002484|   83.147478|           0|
|           7|   83.309729|  994.830303|    0.199923|    0.002555|   76.634122|           0|
|           8|   70.731524|  327.637362|    0.041366|    0.001595|   66.421750|           0|
|           9|   66.650091|  124.369963|    0.125232|    0.002874|   65.914063|           0|
|          10|   66.404753|   36.699328|    0.016768|    0.002803|   65.357335|           0|
|==========================================================================================|
| Iteration  | Train Loss | Gradient   | Step       | Iteration  | Validation | Validation |
|            |            |            |            | Time (sec) | Loss       | Checks     |
|==========================================================================================|
|          11|   66.357143|   46.712988|    0.009405|    0.002829|   65.306106|           0|
|          12|   66.268225|   54.079264|    0.007953|    0.004812|   65.234391|           0|
|          13|   65.788550|   99.453225|    0.030942|    0.004654|   64.869708|           0|
|          14|   64.821095|  186.344649|    0.048078|    0.008989|   64.191533|           0|
|          15|   62.353896|  319.273873|    0.107160|    0.007247|   62.618374|           0|
|          16|   57.836593|  447.826470|    0.184985|    0.019623|   60.087065|           0|
|          17|   51.188884|  524.631067|    0.253062|    0.000378|   56.646294|           0|
|          18|   41.755601|  189.072516|    0.318515|    0.002153|   49.046823|           0|
|          19|   37.539854|   78.602559|    0.382284|    0.000381|   44.633562|           0|
|          20|   36.845322|  151.837884|    0.211286|    0.006133|   47.291367|           1|
|==========================================================================================|
| Iteration  | Train Loss | Gradient   | Step       | Iteration  | Validation | Validation |
|            |            |            |            | Time (sec) | Loss       | Checks     |
|==========================================================================================|
|          21|   36.218289|   62.826818|    0.142748|    0.001283|   46.139104|           2|
|          22|   35.776921|   53.606315|    0.215188|    0.000320|   46.170460|           3|
|          23|   35.729085|   24.400342|    0.060096|    0.001292|   45.318023|           4|
|          24|   35.622031|    9.602277|    0.121153|    0.000322|   45.791861|           5|
|          25|   35.573317|   10.735070|    0.126854|    0.000308|   46.062826|           6|
|==========================================================================================|

Create a plot that compares the training mean squared error (MSE) and the validation MSE at each iteration. By default, fitrnet stores the loss information inside the TrainingHistory property of the object Mdl. You can access this information by using dot notation.

iteration = Mdl.TrainingHistory.Iteration;
trainLosses = Mdl.TrainingHistory.TrainingLoss;
valLosses = Mdl.TrainingHistory.ValidationLoss;
plot(iteration,trainLosses,iteration,valLosses)
legend(["Training","Validation"])
xlabel("Iteration")
ylabel("Mean Squared Error")

Check the iteration that corresponds to the minimum validation MSE. The final returned model Mdl is the model trained at this iteration.

[~,minIdx] = min(valLosses);
iteration(minIdx)
ans = 19
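The early-stopping threshold mentioned earlier can be changed with ValidationPatience. As a sketch, the following call allows ten consecutive validation checks instead of the default six. It assumes that tblTrain and tblValidation from the preceding steps are in the workspace. The value 10 and the name MdlPatient are arbitrary choices for illustration.

```matlab
% Sketch: allow more validation checks before stopping early.
% Assumes tblTrain and tblValidation exist from the preceding steps.
rng("default")                           % for reproducibility
MdlPatient = fitrnet(tblTrain,"Systolic", ...
    "ValidationData",tblValidation, ...
    "ValidationPatience",10);

% Number of rows recorded in the training history
height(MdlPatient.TrainingHistory)
```

A larger patience value lets training continue past a temporary rise in validation loss, at the cost of more iterations.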

Assess the cross-validation loss of neural network models with different regularization strengths, and choose the regularization strength corresponding to the best performing model.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG.

load carbig
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

Delete rows of cars where the table has missing values.

cars = rmmissing(cars);

Categorize the cars based on whether they were made in the USA.

cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan", ...
    "Germany","Sweden","Italy","England"],"NotUSA");

Create a cvpartition object for 5-fold cross-validation. cvp partitions the data into five folds, where each fold has roughly the same number of observations. Set the random seed to the default value for reproducibility of the partition.

rng("default")
n = size(cars,1);
cvp = cvpartition(n,"KFold",5);

Compute the cross-validation mean squared error (MSE) for neural network regression models with different regularization strengths. Try regularization strengths on the order of 1/n, where n is the number of observations. Standardize the data before training the neural network models.

1/n
ans = 0.0026
lambda = (0:0.5:5)*1e-3;
cvloss = zeros(length(lambda),1);
for i = 1:length(lambda)
    cvMdl = fitrnet(cars,"MPG","Lambda",lambda(i), ...
        "CVPartition",cvp,"Standardize",true);
    cvloss(i) = kfoldLoss(cvMdl);
end

Plot the results. Find the regularization strength corresponding to the lowest cross-validation MSE.

plot(lambda,cvloss)
xlabel("Regularization Strength")
ylabel("Cross-Validation Loss")

Figure: Cross-Validation Loss (y-axis) plotted against Regularization Strength (x-axis).

[~,idx] = min(cvloss);
bestLambda = lambda(idx)
bestLambda = 0.0045

Train a neural network regression model using the bestLambda regularization strength.

Mdl = fitrnet(cars,"MPG","Lambda",bestLambda, ...
    "Standardize",true)
Mdl = 
  RegressionNeuralNetwork
           PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
             ResponseName: 'MPG'
    CategoricalPredictors: 5
        ResponseTransform: 'none'
          NumObservations: 392
               LayerSizes: 10
              Activations: 'relu'
    OutputLayerActivation: 'none'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1×1 struct]
          TrainingHistory: [761×7 table]



Create a neural network with low error by using the OptimizeHyperparameters argument. This argument causes fitrnet to minimize cross-validation loss over some problem hyperparameters by using Bayesian optimization.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG.

load carbig
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

Delete rows of cars where the table has missing values.

cars = rmmissing(cars);

Categorize the cars based on whether they were made in the USA.

cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");

Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.

rng("default") % For reproducibility of the data partition
c = cvpartition(height(cars),"Holdout",0.20);
trainingIdx = training(c); % Training set indices
carsTrain = cars(trainingIdx,:);
testIdx = test(c); % Test set indices
carsTest = cars(testIdx,:);

Train a regression neural network using the OptimizeHyperparameters argument set to "auto". For reproducibility, set the AcquisitionFunctionName to "expected-improvement-plus" in a HyperparameterOptimizationOptions structure. fitrnet performs Bayesian optimization by default. To use grid search or random search, set the Optimizer field in HyperparameterOptimizationOptions.

rng("default") % For reproducibility
Mdl = fitrnet(carsTrain,"MPG","OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions",struct("AcquisitionFunctionName","expected-improvement-plus"))
|============================================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   |  Activations |  Standardize |       Lambda |            LayerSizes |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    |              |              |              |                       |
|============================================================================================================================================|
|    1 | Best   |       2.223 |       9.231 |       2.223 |       2.223 |         relu |         true |        3.841 | [101  47  15]         |
|    2 | Accept |      3.0797 |      6.4178 |       2.223 |      2.2571 |      sigmoid |        false |   7.5401e-07 | [100  17]             |
|    3 | Best   |      2.1171 |      2.4398 |      2.1171 |      2.1312 |         relu |         true |      0.01569 |  15                   |
|    4 | Accept |      2.5142 |      4.2068 |      2.1171 |      2.1326 |         none |         true |   0.00016461 | [  2 145   8]         |
|    5 | Accept |      3.0246 |     0.26994 |      2.1171 |      2.1172 |         relu |         true |   5.4264e-08 |  1                    |
|    6 | Accept |      2.9859 |     0.77249 |      2.1171 |       2.171 |         relu |         true |       0.1243 | [  5   1]             |
|    7 | Accept |        2.14 |      2.5549 |      2.1171 |      2.1173 |         relu |         true |    0.0082696 |  17                   |
|    8 | Accept |      2.7596 |     0.41156 |      2.1171 |      2.1173 |         relu |         true |       5.8567 |  72                   |
|    9 | Accept |      3.0702 |      6.3673 |      2.1171 |      2.1173 |         relu |         true |   4.4611e-07 | [ 77  24  12]         |
|   10 | Accept |      2.2126 |      1.8954 |      2.1171 |      2.1177 |         relu |         true |   4.1722e-07 |  9                    |
|   11 | Accept |      2.9998 |      16.958 |      2.1171 |      2.1177 |         relu |         true |   0.00088575 | [250  47  63]         |
|   12 | Accept |      3.3504 |      11.524 |      2.1171 |      2.1173 |         relu |         true |   1.2716e-06 | [103  55  15]         |
|   13 | Accept |       2.223 |      1.7197 |      2.1171 |      2.1174 |         relu |         true |    0.0003368 |  10                   |
|   14 | Accept |      6.4098 |     0.22799 |      2.1171 |      2.1301 |         relu |         true |       251.71 | [ 67  34 275]         |
|   15 | Accept |       6.412 |     0.12626 |      2.1171 |      2.1175 |         relu |         true |       298.04 | [ 30  23  10]         |
|   16 | Accept |      2.1882 |      1.6043 |      2.1171 |      2.1176 |         relu |         true |   5.2998e-05 |  6                    |
|   17 | Accept |      2.5141 |     0.46234 |      2.1171 |      2.1176 |         none |         true |    0.0031007 |  4                    |
|   18 | Accept |      2.5139 |      2.2299 |      2.1171 |      2.1176 |         none |         true |      0.07401 | [ 33  16  83]         |
|   19 | Accept |      2.5756 |     0.10945 |      2.1171 |      2.1173 |         none |         true |       1.6796 |  2                    |
|   20 | Best   |      2.0906 |      8.1833 |      2.0906 |      2.0906 |         relu |         true |      0.58373 | [ 13  58  65]         |
|============================================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   |  Activations |  Standardize |       Lambda |            LayerSizes |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    |              |              |              |                       |
|============================================================================================================================================|
|   21 | Accept |      2.4488 |      2.5839 |      2.0906 |       2.091 |         relu |         true |   3.4514e-06 |  26                   |
|   22 | Accept |      2.5142 |      3.9131 |      2.0906 |       2.091 |         none |         true |   3.9367e-06 |  255                  |
|   23 | Accept |      2.5142 |     0.21115 |      2.0906 |      2.0909 |         none |         true |   9.1909e-08 | [ 27  12  14]         |
|   24 | Accept |      6.3852 |     0.32065 |      2.0906 |      2.0908 |         none |         true |       91.409 | [ 27 193  71]         |
|   25 | Accept |      2.5312 |      15.695 |      2.0906 |      2.0908 |      sigmoid |        false |      0.00062 | [165  66]             |
|   26 | Accept |       2.588 |      4.0838 |      2.0906 |      2.0908 |      sigmoid |        false |     0.035987 |  100                  |
|   27 | Accept |      3.9253 |      6.7469 |      2.0906 |      2.0908 |      sigmoid |        false |       3.0045 | [  5 296]             |
|   28 | Accept |      2.1903 |      8.4032 |      2.0906 |      2.0911 |         relu |         true |       1.1661 | [  3 300 232]         |
|   29 | Accept |      2.5142 |      2.5771 |      2.0906 |      2.0912 |         none |         true |    1.636e-06 | [  1 294  27]         |
|   30 | Accept |      2.1336 |      6.7976 |      2.0906 |      2.0911 |         relu |         true |     0.039606 | [  4 299]             |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 138.7963 seconds
Total objective function evaluation time: 129.0442

Best observed feasible point:
    Activations    Standardize    Lambda       LayerSizes  
    ___________    ___________    _______    ______________

       relu           true        0.58373    13    58    65

Observed objective function value = 2.0906
Estimated objective function value = 2.0911
Function evaluation time = 8.1833

Best estimated feasible point (according to models):
    Activations    Standardize    Lambda       LayerSizes  
    ___________    ___________    _______    ______________

       relu           true        0.58373    13    58    65

Estimated objective function value = 2.0911
Estimated function evaluation time = 8.1846

Figure: Min objective vs. Number of function evaluations (x-axis: Function evaluations; y-axis: Min objective), showing the min observed objective and the estimated min objective.

Mdl = 
  RegressionNeuralNetwork
                       PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
                         ResponseName: 'MPG'
                CategoricalPredictors: 5
                    ResponseTransform: 'none'
                      NumObservations: 314
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                           LayerSizes: [13 58 65]
                          Activations: 'relu'
                OutputLayerActivation: 'none'
                               Solver: 'LBFGS'
                      ConvergenceInfo: [1×1 struct]
                      TrainingHistory: [1000×7 table]



Find the mean squared error of the resulting model on the test data set.

testMSE = loss(Mdl,carsTest,"MPG")
testMSE = 7.3273
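The optimization output reports the objective as log(1+loss), so it is not directly comparable to the test MSE. As a small arithmetic sketch, you can invert the transform to recover the cross-validation loss. The sketch assumes that the loss inside the objective is the cross-validation MSE. The values are copied from the output above, and the variable names are introduced here for illustration.

```matlab
% Sketch: convert the best observed objective back to an MSE scale.
bestObjective = 2.0906;                  % observed objective from the output above
cvMSE = exp(bestObjective) - 1           % invert log(1 + loss), roughly 7.09
testMSE = 7.3273;                        % test MSE reported above
```

On this scale, the cross-validation estimate and the test MSE are of similar size.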

Create a neural network with low error by using the OptimizeHyperparameters argument. This argument causes fitrnet to search for hyperparameters that give a model with low cross-validation error. Use the hyperparameters function to specify larger-than-default values for the number of layers used and the layer size range.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG.

load carbig
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

Delete rows of cars where the table has missing values.

cars = rmmissing(cars);

Categorize the cars based on whether they were made in the USA.

cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");

Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.

rng("default") % For reproducibility of the data partition
c = cvpartition(height(cars),"Holdout",0.20);
trainingIdx = training(c); % Training set indices
carsTrain = cars(trainingIdx,:);
testIdx = test(c); % Test set indices
carsTest = cars(testIdx,:);

List the hyperparameters available for this problem of fitting the MPG response.

params = hyperparameters("fitrnet",carsTrain,"MPG");
for ii = 1:length(params)
    disp(ii);disp(params(ii))
end
     1

  optimizableVariable with properties:

         Name: 'NumLayers'
        Range: [1 3]
         Type: 'integer'
    Transform: 'none'
     Optimize: 1

     2

  optimizableVariable with properties:

         Name: 'Activations'
        Range: {'relu'  'tanh'  'sigmoid'  'none'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

     3

  optimizableVariable with properties:

         Name: 'Standardize'
        Range: {'true'  'false'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

     4

  optimizableVariable with properties:

         Name: 'Lambda'
        Range: [3.1847e-08 318.4713]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

     5

  optimizableVariable with properties:

         Name: 'LayerWeightsInitializer'
        Range: {'glorot'  'he'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

     6

  optimizableVariable with properties:

         Name: 'LayerBiasesInitializer'
        Range: {'zeros'  'ones'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

     7

  optimizableVariable with properties:

         Name: 'Layer_1_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

     8

  optimizableVariable with properties:

         Name: 'Layer_2_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

     9

  optimizableVariable with properties:

         Name: 'Layer_3_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

    10

  optimizableVariable with properties:

         Name: 'Layer_4_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0

    11

  optimizableVariable with properties:

         Name: 'Layer_5_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0

To try more layers than the default of 1 through 3, set the range of NumLayers (optimizable variable 1) to its maximum allowable size, [1 5]. Also, set Layer_4_Size and Layer_5_Size (optimizable variables 10 and 11, respectively) to be optimized.

params(1).Range = [1 5];
params(10).Optimize = true;
params(11).Optimize = true;

Set the range of all layer sizes (optimizable variables 7 through 11) to [1 400] instead of the default [1 300].

for ii = 7:11
    params(ii).Range = [1 400];
end

Train a regression neural network using the OptimizeHyperparameters argument set to params. For reproducibility, set the AcquisitionFunctionName to "expected-improvement-plus" in a HyperparameterOptimizationOptions structure. To attempt to get a better solution, set the number of optimization steps to 60 instead of the default 30.

rng("default") % For reproducibility
Mdl = fitrnet(carsTrain,"MPG","OptimizeHyperparameters",params, ...
    "HyperparameterOptimizationOptions", ...
    struct("AcquisitionFunctionName","expected-improvement-plus", ...
    "MaxObjectiveEvaluations",60))
|============================================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   |  Activations |  Standardize |       Lambda |            LayerSizes |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    |              |              |              |                       |
|============================================================================================================================================|
|    1 | Best   |      4.9294 |     0.35241 |      4.9294 |      4.9294 |      sigmoid |        false |       70.242 | [  3  22 223]         |
|    2 | Best   |       2.211 |       3.976 |       2.211 |      2.3191 |         relu |         true |     0.089397 | [  2  95]             |
|    3 | Accept |      2.7225 |      23.043 |       2.211 |      2.2929 |      sigmoid |        false |   2.5899e-07 | [303  60  59]         |
|    4 | Accept |      3.5246 |      4.4994 |       2.211 |      2.2883 |         relu |        false |   5.1748e-05 | [102   5  15   1]     |
|    5 | Accept |      2.2357 |      3.2875 |       2.211 |      2.2164 |         relu |         true |     0.095678 | [  2  68]             |
|    6 | Accept |      3.0174 |      1.3174 |       2.211 |      2.2144 |         relu |         true |    0.0031767 | [  2   1]             |
|    7 | Accept |      2.3385 |     0.64635 |       2.211 |      2.2199 |         relu |         true |     0.043248 |  2                    |
|    8 | Accept |      4.8512 |     0.52613 |       2.211 |      2.2199 |         relu |         true |        3.387 | [  2  23   5   1]     |
|    9 | Accept |      2.4583 |     0.16388 |       2.211 |      2.2236 |         relu |         true |       1.0849 | [  1  10]             |
|   10 | Accept |      3.1863 |      3.8647 |       2.211 |      2.2237 |         relu |         true |     0.061861 | [ 63   1   1 112]     |
|   11 | Accept |      3.8592 |      3.3615 |       2.211 |      2.2235 |         relu |         true |      0.20233 | [  2  45   1   4  59] |
|   12 | Accept |      3.3752 |      3.4719 |       2.211 |      2.2111 |         relu |         true |    1.556e-05 | [  4  18   1 104]     |
|   13 | Accept |      6.4116 |     0.15784 |       2.211 |      2.2198 |         relu |         true |       287.34 | [ 34 196]             |
|   14 | Accept |      2.3537 |     0.21589 |       2.211 |      2.2104 |         relu |         true |       5.3986 | [  2  12]             |
|   15 | Accept |      3.1122 |    0.077105 |       2.211 |      2.2109 |         relu |         true |       7.2543 |  1                    |
|   16 | Accept |      6.4092 |     0.11676 |       2.211 |      2.2142 |         relu |         true |       241.19 | [  1 389]             |
|   17 | Best   |      2.1517 |     0.69926 |      2.1517 |      2.1523 |         relu |         true |      0.24096 |  4                    |
|   18 | Best   |      2.1273 |     0.87139 |      2.1273 |      2.1274 |         relu |         true |      0.11077 |  5                    |
|   19 | Accept |      4.1318 |     0.14418 |      2.1273 |      2.1274 |      sigmoid |        false |   4.8026e-06 | [ 12 106  59]         |
|   20 | Accept |      2.2859 |     0.61843 |      2.1273 |      2.1269 |         relu |         true |       8.3707 | [  2   9   1   5   9] |
|============================================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   |  Activations |  Standardize |       Lambda |            LayerSizes |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    |              |              |              |                       |
|============================================================================================================================================|
|   21 | Accept |      2.1981 |      11.621 |      2.1273 |      2.1265 |         relu |         true |       4.0719 | [203 124   1  62]     |
|   22 | Accept |      4.1318 |       0.117 |      2.1273 |      2.1269 |      sigmoid |        false |   5.7744e-08 | [317   4  60   1]     |
|   23 | Accept |      2.9406 |      2.6457 |      2.1273 |      2.1268 |         relu |         true |       7.2868 | [ 23   7   1 373]     |
|   24 | Accept |      5.4267 |     0.15109 |      2.1273 |      2.1276 |         relu |         true |       3.4444 | [  1 253   1]         |
|   25 | Accept |      3.5359 |      1.7515 |      2.1273 |      2.1276 |         relu |         true |       36.471 | [ 51   3 204  71]     |
|   26 | Accept |      4.1542 |      1.3619 |      2.1273 |      2.1276 |         relu |         true |       1.2334 | [  5   4   1  95]     |
|   27 | Accept |      2.3033 |      15.761 |      2.1273 |      2.1276 |         relu |         true |     0.028889 | [ 42 348]             |
|   28 | Accept |      4.1318 |    0.093199 |      2.1273 |      2.1276 |      sigmoid |        false |   5.9314e-08 | [109   9]             |
|   29 | Accept |      3.0644 |       18.95 |      2.1273 |      2.1276 |      sigmoid |        false |   3.2982e-08 | [388   3 331]         |
|   30 | Accept |      2.8076 |      4.1115 |      2.1273 |      2.1277 |         relu |         true |   0.00077627 |  183                  |
|   31 | Accept |      3.3041 |      3.4421 |      2.1273 |      2.1277 |         relu |         true |   2.1595e-05 |  116                  |
|   32 | Accept |      3.1379 |      11.325 |      2.1273 |      2.1276 |         relu |         true |   2.2732e-05 | [187  41]             |
|   33 | Accept |      3.3071 |      6.2584 |      2.1273 |      2.1277 |         relu |         true |   2.7221e-07 | [120  23]             |
|   34 | Accept |      2.2511 |      4.5188 |      2.1273 |      2.1277 |         relu |         true |       2.6888 | [  2 104 142  60]     |
|   35 | Accept |      2.3491 |      7.7419 |      2.1273 |      2.1277 |         relu |         true |       4.3755 | [  1 322 277  53]     |
|   36 | Accept |      6.3658 |      0.2106 |      2.1273 |       2.129 |         relu |         true |       60.596 | [  4  17  12  47]     |
|   37 | Accept |      2.1727 |      4.9758 |      2.1273 |      2.1291 |         relu |         true |    0.0059602 | [  2 110]             |
|   38 | Accept |      2.5005 |      28.335 |      2.1273 |      2.1288 |         relu |         true |     0.052893 | [252  99 208  55]     |
|   39 | Accept |      2.2474 |       31.45 |      2.1273 |      2.1301 |         relu |         true |        6.086 | [356 136 307  70]     |
|   40 | Best   |      2.0745 |      37.552 |      2.0745 |      2.0746 |         relu |         true |      0.55888 | [288 115 213 120]     |
|============================================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   |  Activations |  Standardize |       Lambda |            LayerSizes |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    |              |              |              |                       |
|============================================================================================================================================|
|   41 | Accept |      2.0896 |      26.315 |      2.0745 |      2.0747 |         relu |         true |      0.98992 | [270  74 258  28]     |
|   42 | Accept |      4.1421 |     0.53203 |      2.0745 |      2.0746 |         relu |         true |       13.353 | [  4 376   1 149]     |
|   43 | Accept |      2.6447 |      5.5647 |      2.0745 |      2.0746 |         relu |         true |     0.026383 | [ 18 118   1  23]     |
|   44 | Accept |      2.4817 |      27.009 |      2.0745 |      2.0747 |         relu |         true |     0.013213 | [389 175]             |
|   45 | Accept |      2.3857 |      6.6975 |      2.0745 |      2.0746 |         relu |         true |    0.0012278 | [  4 386]             |
|   46 | Accept |      2.0888 |      6.0115 |      2.0745 |      2.0746 |         relu |         true |      0.12715 |  354                  |
|   47 | Accept |      4.0279 |       1.866 |      2.0745 |      2.0747 |         relu |         true |       3.1997 | [  8  46   1   7]     |
|   48 | Accept |      2.1107 |      3.9274 |      2.0745 |      2.0747 |         relu |         true |      0.87573 | [ 75   3]             |
|   49 | Accept |      2.8679 |      17.581 |      2.0745 |      2.0747 |         relu |         true |     0.014349 | [382  19   2 217]     |
|   50 | Accept |        2.12 |      31.823 |      2.0745 |      2.0748 |         relu |         true |       1.4981 | [  9 250 205 316]     |
|   51 | Accept |      2.0956 |      9.3003 |      2.0745 |      2.0749 |         relu |         true |      0.50519 | [ 13  25 234]         |
|   52 | Accept |      2.0788 |      20.963 |      2.0745 |      2.0748 |         relu |         true |      0.20245 | [ 30 340  72]         |
|   53 | Accept |      2.0793 |      15.073 |      2.0745 |      2.0749 |         relu |         true |      0.30508 | [230  27 157]         |
|   54 | Best   |      2.0571 |      14.353 |      2.0571 |      2.0572 |         relu |         true |      0.40191 | [ 58  58  83   4]     |
|   55 | Accept |      2.2477 |      5.5372 |      2.0571 |      2.0572 |         relu |         true |     0.056099 | [  8   2 166]         |
|   56 | Accept |      2.2329 |      9.6228 |      2.0571 |      2.0571 |         relu |         true |      0.15146 | [  1  46 169   9]     |
|   57 | Accept |      2.2506 |      11.931 |      2.0571 |      2.0571 |         relu |         true |    0.0068432 | [  2 263  19]         |
|   58 | Accept |      2.2439 |      6.1584 |      2.0571 |      2.0572 |         relu |         true |    0.0037586 | [  2 191   2]         |
|   59 | Accept |      6.3612 |     0.15672 |      2.0571 |      2.0572 |         relu |         true |       56.064 | [ 19   2   2   1   9] |
|   60 | Accept |      2.7839 |      5.6539 |      2.0571 |      2.0573 |         relu |         true |     0.011315 | [ 18  33  49]         |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 60 reached.
Total function evaluations: 60
Total elapsed time: 494.8036 seconds
Total objective function evaluation time: 469.8626

Best observed feasible point:
    Activations    Standardize    Lambda          LayerSizes     
    ___________    ___________    _______    ____________________

       relu           true        0.40191    58    58    83     4

Observed objective function value = 2.0571
Estimated objective function value = 2.0573
Function evaluation time = 14.3527

Best estimated feasible point (according to models):
    Activations    Standardize    Lambda          LayerSizes     
    ___________    ___________    _______    ____________________

       relu           true        0.40191    58    58    83     4

Estimated objective function value = 2.0573
Estimated function evaluation time = 14.3565

[Figure: Min objective vs. Number of function evaluations. x-axis: Function evaluations; y-axis: Min objective; lines: Min observed objective and Estimated min objective.]

Mdl = 
  RegressionNeuralNetwork
                       PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
                         ResponseName: 'MPG'
                CategoricalPredictors: 5
                    ResponseTransform: 'none'
                      NumObservations: 314
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                           LayerSizes: [58 58 83 4]
                          Activations: 'relu'
                OutputLayerActivation: 'none'
                               Solver: 'LBFGS'
                      ConvergenceInfo: [1×1 struct]
                      TrainingHistory: [1000×7 table]


  Properties, Methods

Find the mean squared error of the resulting model on the test data set.

testMSE = loss(Mdl,carsTest,"MPG")
testMSE = 7.1939

Input Arguments

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

  • If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName.

  • If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula.

  • If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal.
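
The three cases above can be sketched as follows. The table, its variable names (x1, x2, x3, y), and the synthetic data are hypothetical and serve only to illustrate the calling syntaxes.

```matlab
% Synthetic data in a table; variable names are illustrative.
rng(0)
n = 100;
x1 = randn(n,1);
x2 = randn(n,1);
x3 = randn(n,1);
y = 2*x1 - x2 + 0.1*randn(n,1);
Tbl = table(x1,x2,x3,y);

% Response stored in Tbl; all remaining variables act as predictors.
Mdl1 = fitrnet(Tbl,"y");

% Response stored in Tbl; only x1 and x2 act as predictors.
Mdl2 = fitrnet(Tbl,"y~x1+x2");

% Response supplied separately from a table that holds predictors only.
Mdl3 = fitrnet(Tbl(:,["x1","x2","x3"]),y);

% Compare which predictors each model uses.
disp(Mdl1.PredictorNames)
disp(Mdl2.PredictorNames)
disp(Mdl3.PredictorNames)
```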

Response variable name, specified as the name of a variable in Tbl. The response variable must be a numeric vector.

You must specify ResponseVarName as a character vector or string scalar. For example, if Tbl stores the response variable Y as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

Data Types: char | string

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y~x1+x2+x3". In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.
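
As a brief illustration of the name check described above, this sketch tests a few hypothetical variable names with isvarname and converts the invalid ones with matlab.lang.makeValidName.

```matlab
% Hypothetical variable names; the first and third are not valid identifiers.
names = {'Model Year','Weight','2ndGear'};

% isvarname returns true only for valid MATLAB identifiers.
isValid = cellfun(@isvarname,names);

% Convert every name to a valid identifier (valid names are left unchanged).
fixedNames = matlab.lang.makeValidName(names);

disp(isValid)
disp(fixedNames)
```

You can assign the converted names to Tbl.Properties.VariableNames before building the formula.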

Data Types: char | string

Response data, specified as a numeric vector. The length of Y must be equal to the number of observations in X or Tbl.

Data Types: single | double

Predictor data used to train the model, specified as a numeric matrix.

By default, the software treats each row of X as one observation, and each column as one predictor.

The length of Y and the number of observations in X must be equal.

To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value argument.

Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time.

Data Types: single | double

Note

The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing values, and removes observations with any of these characteristics:

  • Missing value in the response (for example, Y or ValidationData{2})

  • At least one missing value in a predictor observation (for example, row in X or ValidationData{1})

  • NaN value or 0 weight (for example, value in Weights or ValidationData{3})
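
The effect of this removal can be seen in the NumObservations property. The following sketch uses synthetic data with one missing predictor value and one missing response value; the data and sizes are illustrative only.

```matlab
% Synthetic data with 50 observations.
rng(0)
X = randn(50,2);
Y = X(:,1) + 0.1*randn(50,1);

X(3,2) = NaN;    % missing predictor value in observation 3
Y(10) = NaN;     % missing response value in observation 10

Mdl = fitrnet(X,Y);

% Observations 3 and 10 are expected to be excluded from training.
disp(Mdl.NumObservations)
```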

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: fitrnet(X,Y,'LayerSizes',[10 10],'Activations',["relu","tanh"]) specifies to create a neural network with two fully connected layers, each with 10 outputs. The first layer uses a rectified linear unit (ReLU) activation function, and the second uses a hyperbolic tangent activation function.

Neural Network Options

Sizes of the fully connected layers in the neural network model, specified as a positive integer vector. The ith element of LayerSizes is the number of outputs in the ith fully connected layer of the neural network model.

LayerSizes does not include the size of the final fully connected layer. For more information, see Neural Network Structure.

Example: 'LayerSizes',[100 25 10]

Activation functions for the fully connected layers of the neural network model, specified as a character vector, string scalar, string array, or cell array of character vectors with values from this table.

Value | Description
'relu' | Rectified linear unit (ReLU) function: performs a threshold operation on each element of the input, where any value less than zero is set to zero, that is, $f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$
'tanh' | Hyperbolic tangent (tanh) function: applies the tanh function to each input element
'sigmoid' | Sigmoid function: performs the following operation on each input element: $f(x) = \frac{1}{1 + e^{-x}}$
'none' | Identity function: returns each input element without performing any transformation, that is, $f(x) = x$
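
For reference, the four activation functions can be written as anonymous functions. This is a sketch of the mathematical definitions only, not the fitrnet implementation; the function handle names are illustrative.

```matlab
% Elementwise definitions of the supported activation functions.
relu     = @(x) max(x,0);             % zero for negative inputs
tanhAct  = @(x) tanh(x);              % hyperbolic tangent
sigmoid  = @(x) 1./(1 + exp(-x));     % logistic sigmoid
identity = @(x) x;                    % corresponds to 'none'

x = [-2 0 3];
disp(relu(x))        % [0 0 3]
disp(sigmoid(0))     % 0.5
disp(identity(x))    % [-2 0 3]
```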

  • If you specify one activation function only, then Activations is the activation function for every fully connected layer of the neural network model, excluding the final fully connected layer (see Neural Network Structure).

  • If you specify an array of activation functions, then the ith element of Activations is the activation function for the ith layer of the neural network model.

Example: 'Activations','sigmoid'
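
To illustrate how LayerSizes and Activations combine, this sketch trains a network with two fully connected layers before the final layer, using a different activation function for each. The synthetic data is a placeholder.

```matlab
% Synthetic regression data with four predictors.
rng(0)
X = randn(200,4);
Y = sin(X(:,1)) + X(:,2).^2 + 0.1*randn(200,1);

% First layer: 10 outputs with ReLU. Second layer: 5 outputs with tanh.
Mdl = fitrnet(X,Y,"LayerSizes",[10 5],"Activations",["relu","tanh"]);

disp(Mdl.LayerSizes)

% LayerWeights also holds the weights of the final fully connected layer,
% so it has one more entry than LayerSizes.
disp(numel(Mdl.LayerWeights))
```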

Function to initialize the fully connected layer weights, specified as 'glorot' or 'he'.

Value | Description
'glorot' | Initialize the weights with the Glorot initializer [1] (also known as the Xavier initializer). For each layer, the Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(I+O), where I is the input size and O is the output size for the layer.
'he' | Initialize the weights with the He initializer [2]. For each layer, the He initializer samples from a normal distribution with zero mean and variance 2/I, where I is the input size for the layer.

Example: 'LayerWeightsInitializer','he'
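
The two sampling schemes can be sketched directly from the stated variances. This is an illustration under the definitions above, not the internal fitrnet code; the layer sizes I and O are arbitrary.

```matlab
rng(0)
I = 200;    % input size of the layer (illustrative)
O = 100;    % output size of the layer (illustrative)

% Glorot: Uniform(-b,b) has variance b^2/3, so b = sqrt(6/(I+O))
% gives the variance 2/(I+O).
b = sqrt(6/(I + O));
WGlorot = b*(2*rand(O,I) - 1);

% He: normal distribution with zero mean and variance 2/I.
WHe = sqrt(2/I)*randn(O,I);

% Compare the sample variances with the target variances.
fprintf("Glorot: sample %.5f, target %.5f\n",var(WGlorot(:)),2/(I + O));
fprintf("He:     sample %.5f, target %.5f\n",var(WHe(:)),2/I);
```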

Type of initial fully connected layer biases, specified as 'zeros' or 'ones'.

  • If you specify the value 'zeros', then each fully connected layer has an initial bias of 0.

  • If you specify the value 'ones', then each fully connected layer has an initial bias of 1.

Example: 'LayerBiasesInitializer','ones'

Data Types: char | string

Predictor data observation dimension, specified as 'rows' or 'columns'.

Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

Example: 'ObservationsIn','columns'

Data Types: char | string
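
A minimal sketch of the column-oriented call, assuming a numeric predictor matrix; the synthetic data is a placeholder.

```matlab
% Synthetic data: 200 observations (rows) and 3 predictors (columns).
rng(0)
X = randn(200,3);
Y = X*[1; -2; 0.5] + 0.1*randn(200,1);

% Transpose X so that each column is one observation, and say so.
Xcols = X';
Mdl = fitrnet(Xcols,Y,"ObservationsIn","columns");

disp(Mdl.NumObservations)          % number of observations used for training
disp(numel(Mdl.PredictorNames))    % number of predictors
```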

Regularization term strength, specified as a nonnegative scalar. The software composes the objective function for minimization from the mean squared error (MSE) loss function and the ridge (L2) penalty term.

Example: 'Lambda',1e-4

Data Types: single | double

Flag to standardize the predictor data, specified as a numeric or logical 0 (false) or 1 (true). If you set Standardize to true, then the software centers and scales each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.

Example: 'Standardize',true

Data Types: single | double | logical
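
The centering and scaling operation described above can be sketched by hand for a numeric matrix. This only illustrates the transformation; when you set Standardize to true, fitrnet performs the standardization for you.

```matlab
% Synthetic predictors on very different scales.
rng(0)
X = [1000*randn(100,1) + 5000, 0.01*randn(100,1)];

% Center each column by its mean and scale by its standard deviation.
Z = (X - mean(X))./std(X);

disp(mean(Z))    % approximately 0 for each column
disp(std(Z))     % approximately 1 for each column

% The equivalent request to fitrnet (Y is a placeholder response).
Y = Z(:,1) + 0.1*randn(100,1);
Mdl = fitrnet(X,Y,"Standardize",true);
```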

Convergence Control Options

Verbosity level, specified as 0 or 1. The 'Verbose' name-value argument controls the amount of diagnostic information that fitrnet displays at the command line.

Value | Description
0 | fitrnet does not display diagnostic information.
1 | fitrnet periodically displays diagnostic information.

By default, StoreHistory is set to true and fitrnet stores the diagnostic information inside of Mdl. Use Mdl.TrainingHistory to access the diagnostic information.

Example: 'Verbose',1

Data Types: single | double

Frequency of verbose printing, which is the number of iterations between printing to the command window, specified as a positive integer scalar. A value of 1 indicates to print diagnostic information at every iteration.

Note

To use this name-value argument, set Verbose to 1.

Example: 'VerboseFrequency',5

Data Types: single | double

Flag to store the training history, specified as a numeric or logical 0 (false) or 1 (true). If StoreHistory is set to true, then the software stores diagnostic information inside of Mdl, which you can access by using Mdl.TrainingHistory.

Example: 'StoreHistory',false

Data Types: single | double | logical
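
The verbosity and history options can be combined as in this sketch, which prints diagnostics every 50 iterations and then reads the stored history. The synthetic data is a placeholder.

```matlab
rng(0)
X = randn(100,2);
Y = X(:,1).^2 + 0.1*randn(100,1);

% Print diagnostic information every 50 iterations and keep the history.
Mdl = fitrnet(X,Y,"Verbose",1,"VerboseFrequency",50,"StoreHistory",true);

% The stored diagnostics are available as a table.
history = Mdl.TrainingHistory;
disp(history.Properties.VariableNames)
disp(height(history))
```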

Initial step size, specified as a positive scalar or 'auto'. By default, fitrnet does not use the initial step size to determine the initial Hessian approximation used in training the model (see Training Solver). However, if you specify an initial step size $\|s_0\|_\infty$, then the initial inverse-Hessian approximation is $\frac{\|s_0\|_\infty}{\|\nabla \mathcal{L}_0\|_\infty} I$. $\nabla \mathcal{L}_0$ is the initial gradient vector, and $I$ is the identity matrix.

To have fitrnet determine an initial step size automatically, specify the value as 'auto'. In this case, the function determines the initial step size by using $\|s_0\|_\infty = 0.5\|\eta_0\|_\infty + 0.1$. $s_0$ is the initial step vector, and $\eta_0$ is the vector of unconstrained initial weights and biases.

Example: 'InitialStepSize','auto'

Data Types: single | double | char | string
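
As a worked example of the 'auto' rule, the following sketch evaluates the step size for a small hypothetical vector of initial weights and biases. It assumes that the norm in the rule is the infinity norm (the largest absolute element).

```matlab
% Hypothetical unconstrained initial weights and biases.
eta0 = [0.3 -1.2 0.7];

% Infinity norm of eta0 is the largest absolute element: 1.2.
s0 = 0.5*norm(eta0,Inf) + 0.1;

disp(s0)    % 0.5*1.2 + 0.1 = 0.7
```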

Maximum number of training iterations, specified as a positive integer scalar.

The software returns a trained model regardless of whether the training routine successfully converges. Mdl.ConvergenceInfo contains convergence information.

Example: 'IterationLimit',1e8

Data Types: single | double

Relative gradient tolerance, specified as a nonnegative scalar.

Let $\mathcal{L}_t$ be the loss function at training iteration $t$, $\nabla \mathcal{L}_t$ be the gradient of the loss function with respect to the weights and biases at iteration $t$, and $\nabla \mathcal{L}_0$ be the gradient of the loss function at an initial point. If $\max|\nabla \mathcal{L}_t| \le a \cdot \text{GradientTolerance}$, where $a = \max(1, \min|\mathcal{L}_t|, \max|\nabla \mathcal{L}_0|)$, then the training process terminates.

Example: 'GradientTolerance',1e-5

Data Types: single | double
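
The stopping test can be sketched with hypothetical numbers. The loss and gradient values below are made up for illustration, and the code mirrors the condition stated above rather than the internal solver.

```matlab
gradTol = 1e-6;                 % GradientTolerance
lossT = 0.8;                    % loss at iteration t (hypothetical)
gradT = [2e-7 -5e-7 1e-7];      % gradient at iteration t (hypothetical)
grad0 = [0.9 -3.0 1.5];         % gradient at the initial point (hypothetical)

% Scaling factor a = max(1, min|L_t|, max|grad L_0|); here a = 3.
a = max([1, min(abs(lossT)), max(abs(grad0))]);

% Training terminates when the largest gradient entry is small enough.
stopTraining = max(abs(gradT)) <= a*gradTol;

disp(a)               % 3
disp(stopTraining)    % 1 (true), because 5e-7 <= 3e-6
```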

Loss tolerance, specified as a nonnegative scalar.

If the function loss at some iteration is smaller than LossTolerance, then the training process terminates.

Example: 'LossTolerance',1e-8

Data Types: single | double

Step size tolerance, specified as a nonnegative scalar.

If the step size at some iteration is smaller than StepTolerance, then the training process terminates.

Example: 'StepTolerance',1e-4

Data Types: single | double

Validation data for training convergence detection, specified as a cell array or table.

During the training process, the software periodically estimates the validation loss by using ValidationData. If the validation loss increases more than ValidationPatience times in a row, then the software terminates the training.

You can specify ValidationData as a table if you use a table Tbl of predictor data that contains the response variable. In this case, ValidationData must contain the same predictors and response contained in Tbl. The software does not apply weights to observations, even if Tbl contains a vector of weights. To specify weights, you must specify ValidationData as a cell array.

If you specify ValidationData as a cell array, then it must have the following format:

  • ValidationData{1} must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix X, then ValidationData{1} must be an m-by-p or p-by-m matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. Similarly, if you use a table Tbl of predictor data, then ValidationData{1} must be a table containing the same predictor variables contained in Tbl. The number of observations in ValidationData{1} and the predictor data can vary.

  • ValidationData{2} must match the data type and format of the response variable, either Y or ResponseVarName. If ValidationData{2} is an array of responses, then it must have the same number of elements as the number of observations in ValidationData{1}. If ValidationData{1} is a table, then ValidationData{2} can be the name of the response variable in the table. If you want to use the same ResponseVarName or formula, you can specify ValidationData{2} as [].

  • Optionally, you can specify ValidationData{3} as an m-dimensional numeric vector of observation weights or the name of a variable in the table ValidationData{1} that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1.

If you specify ValidationData and want to display the validation loss at the command line, set Verbose to 1.

Number of iterations between validation evaluations, specified as a positive integer scalar. A value of 1 indicates to evaluate validation metrics at every iteration.

Note

To use this name-value argument, you must specify ValidationData.

Example: 'ValidationFrequency',5

Data Types: single | double

Stopping condition for validation evaluations, specified as a nonnegative integer scalar. Training stops if the validation loss is greater than or equal to the minimum validation loss computed so far, ValidationPatience times in a row. You can check the Mdl.TrainingHistory table to see the running total of times that the validation loss is greater than or equal to the minimum (Validation Checks).

Example: 'ValidationPatience',10

Data Types: single | double
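
The validation options can be used together as in this sketch, which holds out 20% of a synthetic data set and passes it as a cell array. The data and the chosen frequency and patience values are illustrative.

```matlab
rng(0)
X = randn(300,3);
Y = X(:,1) - 2*X(:,2).^2 + 0.1*randn(300,1);

% Reserve 20% of the observations for validation.
c = cvpartition(300,"Holdout",0.2);
XTrain = X(training(c),:);
YTrain = Y(training(c));
XVal = X(test(c),:);
YVal = Y(test(c));

% Evaluate the validation loss every 5 iterations and stop after
% 10 consecutive checks without improvement.
Mdl = fitrnet(XTrain,YTrain,"ValidationData",{XVal,YVal}, ...
    "ValidationFrequency",5,"ValidationPatience",10);

% Convergence details, including why training stopped.
disp(Mdl.ConvergenceInfo)
```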

Other Regression Options

Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.

Value | Description
Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitrnet uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count the response variable, observation weights variable, or any other variables that the function does not use.
Logical vector | A true entry means that the corresponding predictor is categorical. The length of the vector is p.
Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
"all" | All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fitrnet assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitrnet assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

For the identified categorical predictors, fitrnet creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. For an unordered categorical variable, fitrnet creates one dummy variable for each level of the categorical variable. For an ordered categorical variable, fitrnet creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

Example: 'CategoricalPredictors','all'

Data Types: single | double | logical | char | string | cell
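
Because fitrnet treats every column of a matrix X as continuous by default, integer group codes must be flagged explicitly. This sketch marks the second column of a synthetic matrix as categorical; the data and the group variable g are hypothetical.

```matlab
rng(0)
n = 150;
x1 = randn(n,1);
g = randi(3,n,1);                       % group codes 1, 2, or 3
Y = x1 + 2*(g == 2) + 0.1*randn(n,1);

% Flag column 2 of the predictor matrix as categorical.
Mdl = fitrnet([x1 g],Y,"CategoricalPredictors",2);

disp(Mdl.CategoricalPredictors)

% The expanded names include the dummy variables created for column 2.
disp(Mdl.ExpandedPredictorNames)
```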

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.

  • If you supply X and Y, then you can use 'PredictorNames' to assign names to the predictor variables in X.

    • The order of the names in PredictorNames must correspond to the predictor order in X. Assuming that X has the default orientation, with observations in rows and predictors in columns, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.

    • By default, PredictorNames is {'x1','x2',...}.

  • If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitrnet uses only the predictor variables in PredictorNames and the response variable during training.

    • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.

    • By default, PredictorNames contains the names of all predictor variables.

    • A good practice is to specify the predictors for training using either 'PredictorNames' or formula, but not both.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

Data Types: string | cell

Response variable name, specified as a character vector or string scalar.

  • If you supply Y, then you can use ResponseName to specify a name for the response variable.

  • If you supply ResponseVarName or formula, then you cannot use ResponseName.

Example: "ResponseName","response"

Data Types: char | string
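
When you supply X and Y, the two naming arguments can be combined as in this sketch. The names Temp, Pressure, and Yield are hypothetical.

```matlab
rng(0)
X = randn(80,2);
Y = 3*X(:,1) - X(:,2) + 0.1*randn(80,1);

% Name the columns of X in order, and name the response.
Mdl = fitrnet(X,Y,"PredictorNames",["Temp","Pressure"], ...
    "ResponseName","Yield");

disp(Mdl.PredictorNames)
disp(Mdl.ResponseName)
```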

Observation weights, specified as a nonnegative numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors when training the model.

By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.

fitrnet normalizes the weights to sum to 1.

Data Types: single | double | char | string

Note

You cannot use any cross-validation name-value argument together with the 'OptimizeHyperparameters' name-value argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value argument.

Cross-Validation Options

Flag to train a cross-validated model, specified as 'on' or 'off'.

If you specify 'on', then the software trains a cross-validated model with 10 folds.

You can override this cross-validation setting using the CVPartition, Holdout, KFold, or Leaveout name-value argument. You can use only one cross-validation name-value argument at a time to create a cross-validated model.

Alternatively, cross-validate later by passing Mdl to crossval.

Example: 'Crossval','on'

Data Types: char | string

Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.

Fraction of the data used for holdout validation, specified as a scalar value in the range [0,1]. If you specify Holdout=p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Holdout=0.1

Data Types: double | single

Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: KFold=5

Data Types: single | double

Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.

  2. Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Leaveout="on"

Data Types: char | string

Hyperparameter Optimization Options

Parameters to optimize, specified as one of the following:

  • 'none' — Do not optimize.

  • 'auto' — Use {'Activations','Lambda','LayerSizes','Standardize'}.

  • 'all' — Optimize all eligible parameters.

  • String array or cell array of eligible parameter names.

  • Vector of optimizableVariable objects, typically the output of hyperparameters.

The optimization attempts to minimize the cross-validation loss (error) for fitrnet by varying the parameters. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value argument.

Note

The values of OptimizeHyperparameters override any values you specify using other name-value arguments. For example, setting OptimizeHyperparameters to "auto" causes fitrnet to optimize hyperparameters corresponding to the "auto" option and to ignore any specified values for the hyperparameters.

The eligible parameters for fitrnet are:

  • Activations: fitrnet optimizes Activations over the set {'relu','tanh','sigmoid','none'}.

  • Lambda: fitrnet optimizes Lambda over continuous values in the range [1e-5,1e5]/NumObservations, where the value is chosen uniformly in the log-transformed range.

  • LayerBiasesInitializer: fitrnet optimizes LayerBiasesInitializer over the two values {'zeros','ones'}.

  • LayerWeightsInitializer: fitrnet optimizes LayerWeightsInitializer over the two values {'glorot','he'}.

  • LayerSizes: fitrnet optimizes over the three values 1, 2, and 3 fully connected layers, excluding the final fully connected layer. fitrnet optimizes the size of each fully connected layer separately over the range 1 through 300, sampled on a logarithmic scale.

    Note

    When you use the LayerSizes argument, the iterative display shows the size of each relevant layer. For example, if the current number of fully connected layers is 3, and the three layers are of sizes 10, 79, and 44 respectively, the iterative display shows LayerSizes for that iteration as [10 79 44].

    Note

    To access up to five fully connected layers or a different range of sizes in a layer, use hyperparameters to select the optimizable parameters and ranges.

  • Standardize: fitrnet optimizes Standardize over the two values {true,false}.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. As an example, this code sets the range of NumLayers to [1 5] and optimizes Layer_4_Size and Layer_5_Size:

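load carsmall
% Get the default optimizable variables for fitrnet.
params = hyperparameters('fitrnet',[Horsepower,Weight],MPG);
params(1).Range = [1 5];      % NumLayers: allow 1 to 5 fully connected layers
params(10).Optimize = true;   % Layer_4_Size
params(11).Optimize = true;   % Layer_5_Size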

Pass params as the value of OptimizeHyperparameters. For an example, see Custom Hyperparameter Optimization in Neural Network.
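
Because the code above addresses the variables by position, it can help to list the names first. The following sketch prints each optimizableVariable returned by hyperparameters for fitrnet and looks up a variable by name instead of by index. The loop and the variable names (names, idx) are illustrative.

```matlab
load carsmall
params = hyperparameters('fitrnet',[Horsepower,Weight],MPG);

% Collect and print the name and Optimize flag of every variable.
names = strings(numel(params),1);
for i = 1:numel(params)
    names(i) = string(params(i).Name);
    fprintf('%2d  %-26s Optimize=%d\n',i,names(i),params(i).Optimize);
end

% Select a variable by name rather than by a hard-coded position.
idx = find(names == "Layer_4_Size");
params(idx).Optimize = true;
```

Looking variables up by name keeps the code valid even if the order of the returned vector differs from what you expect.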

By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss). To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument.

For an example, see Minimize Cross-Validation Error in Neural Network.

Example: 'OptimizeHyperparameters','auto'
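
As a hedged sketch of a small optimization run, the following code optimizes only Lambda and Standardize. The data set, the budget of 10 evaluations, and the choice of the 'expected-improvement-plus' acquisition function (selected here for reproducibility) are assumptions for illustration, not recommended settings.

```matlab
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));   % illustrative predictors and response

rng("default")
opts = struct('MaxObjectiveEvaluations',10, ...              % small budget, illustration only
    'AcquisitionFunctionName','expected-improvement-plus', ... % avoids per-second timing dependence
    'ShowPlots',false,'Verbose',0);

Mdl = fitrnet(tbl,"MPG", ...
    OptimizeHyperparameters=["Lambda","Standardize"], ...
    HyperparameterOptimizationOptions=opts);

% With the default Bayesian optimizer, the results are a BayesianOptimization object.
best = bestPoint(Mdl.HyperparameterOptimizationResults)
```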

Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional.

Field Name, Values, and Default

Optimizer

    • 'bayesopt': Use Bayesian optimization. Internally, this setting calls bayesopt.

    • 'gridsearch': Use grid search with NumGridDivisions values per dimension.

    • 'randomsearch': Search at random among MaxObjectiveEvaluations points.

  'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

  Default: 'bayesopt'

AcquisitionFunctionName

    • 'expected-improvement-per-second-plus'

    • 'expected-improvement'

    • 'expected-improvement-plus'

    • 'expected-improvement-per-second'

    • 'lower-confidence-bound'

    • 'probability-of-improvement'

  Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see Acquisition Function Types.

  Default: 'expected-improvement-per-second-plus'

MaxObjectiveEvaluations

  Maximum number of objective function evaluations.

  Default: 30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch'

MaxTime

  Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations.

  Default: Inf

NumGridDivisions

  For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables.

  Default: 10

ShowPlots

  Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters.

  Default: true

SaveIntermediateResults

  Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object.

  Default: false

Verbose

  Display at the command line:

    • 0: No iterative display

    • 1: Iterative display

    • 2: Iterative display with extra information

  For details, see the bayesopt Verbose name-value argument and the example Optimize Classifier Fit Using Bayesian Optimization.

  Default: 1

UseParallel

  Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see Parallel Bayesian Optimization.

  Default: false

Repartition

  Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization.

  The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations.

  Default: false

Use no more than one of the following three options.

CVPartition

  A cvpartition object, as created by cvpartition

Holdout

  A scalar in the range (0,1) representing the holdout fraction

Kfold

  An integer greater than 1

  Default for the three cross-validation options: 'Kfold',5 if you do not specify a cross-validation field

Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)

Data Types: struct
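
The following sketch shows how several of these fields combine in one structure. It assumes a grid search over Lambda with five grid values and a 30% holdout split, all chosen only for illustration. Because the optimizer is not 'bayesopt', the stored results are a table, which sortrows puts in grid order.

```matlab
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));   % illustrative predictors and response

rng("default")
opts = struct('Optimizer','gridsearch', ...   % search a fixed grid instead of Bayesian optimization
    'NumGridDivisions',5, ...                 % five candidate values for the one optimized variable
    'Holdout',0.3, ...                        % use holdout validation instead of the default 5-fold
    'ShowPlots',false,'Verbose',0);

Mdl = fitrnet(tbl,"MPG", ...
    OptimizeHyperparameters="Lambda", ...
    HyperparameterOptimizationOptions=opts);

% Grid search evaluates points in random order; sort to recover grid order.
results = sortrows(Mdl.HyperparameterOptimizationResults)
```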

Output Arguments

Trained neural network regression model, returned as a RegressionNeuralNetwork or RegressionPartitionedNeuralNetwork object.

If you set any of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout, then Mdl is a RegressionPartitionedNeuralNetwork object. Otherwise, Mdl is a RegressionNeuralNetwork model.

To reference properties of Mdl, use dot notation.
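
For instance, the sketch below trains one ordinary model and one cross-validated model and reads a few properties with dot notation. The layer sizes [8 4], the 3-fold setting, and the data are illustrative assumptions.

```matlab
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));   % illustrative predictors and response

rng("default")
Mdl = fitrnet(tbl,"MPG",LayerSizes=[8 4]);       % no cross-validation argument
modelClass = class(Mdl)                          % RegressionNeuralNetwork
layerSizes = Mdl.LayerSizes                      % sizes of the hidden fully connected layers
nObs = Mdl.NumObservations                       % number of observations used for training

CVMdl = fitrnet(tbl,"MPG",LayerSizes=[8 4],KFold=3);
cvClass = class(CVMdl)                           % partitioned (cross-validated) model class
nFolds = CVMdl.KFold                             % number of folds
```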

More About

Neural Network Structure

The default neural network regression model has the following layer structure.

Structure: [Figure: default neural network regression model structure, with one customizable fully connected layer with a ReLU activation]

Description:

Input — This layer corresponds to the predictor data in Tbl or X.

First fully connected layer — This layer has 10 outputs by default.

  • You can widen the layer or add more fully connected layers to the network by specifying the LayerSizes name-value argument.

  • You can find the weights and biases for this layer in the Mdl.LayerWeights{1} and Mdl.LayerBiases{1} properties of Mdl, respectively.

ReLU activation function — fitrnet applies this activation function to the first fully connected layer.

  • You can change the activation function by specifying the Activations name-value argument.

Final fully connected layer — This layer has one output.

  • You can find the weights and biases for this layer in the Mdl.LayerWeights{end} and Mdl.LayerBiases{end} properties of Mdl, respectively.

Output — This layer corresponds to the predicted response values.

For an example that shows how a regression neural network model with this layer structure returns predictions, see Predict Using Layer Structure of Regression Neural Network Model.
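
As a minimal sketch of how the default layer structure maps predictors to a prediction, the following code reproduces predict by hand for one observation. It assumes the default settings (one hidden layer, ReLU activation, no standardization) and uses illustrative data and variable names.

```matlab
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));
X = [tbl.Horsepower tbl.Weight];                 % illustrative numeric predictors
Y = tbl.MPG;

rng("default")
Mdl = fitrnet(X,Y);                              % defaults: one hidden layer of 10 outputs, ReLU

x = X(1,:)';                                     % one observation as a column vector

% First fully connected layer: weights times input plus bias.
h = Mdl.LayerWeights{1}*x + Mdl.LayerBiases{1};

% ReLU activation: negative values are set to zero.
a = max(h,0);

% Final fully connected layer: a single output, the predicted response.
yManual = Mdl.LayerWeights{end}*a + Mdl.LayerBiases{end};

yPredict = predict(Mdl,X(1,:));                  % should agree with yManual up to rounding
```

If you train with Standardize set to true, the predictors must also be standardized before the first layer, so this manual computation would need that extra step.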

Tips

  • Always try to standardize the numeric predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured.

  • After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.

Algorithms

Training Solver

fitrnet uses a limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (LBFGS) [3] as its loss function minimization technique, where the software minimizes the mean squared error (MSE). The LBFGS solver uses a standard line-search method with an approximation to the Hessian.
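
As a rough sketch of the quantities involved, the block below writes out the mean squared error and a generic quasi-Newton step of the kind described in [3]. The notation is an assumption introduced here: $f_\theta$ is the network with all weights and biases collected in $\theta$, $n$ is the number of observations, $\alpha_k$ is the step length found by the line search, and $H_k$ is the limited-memory approximation to the inverse Hessian. Observation weights and any regularization term controlled by Lambda are omitted for simplicity.

```latex
\mathrm{MSE}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f_\theta(x_i) \bigr)^2
\qquad
\theta_{k+1} = \theta_k - \alpha_k \, H_k \, \nabla \mathrm{MSE}(\theta_k)
```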

References

[1] Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256. 2010.

[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. 2015.

[3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.

Extended Capabilities

Version History

Introduced in R2021a
