# quantileError

Class: TreeBagger

Quantile loss using bag of regression trees

## Syntax

```
err = quantileError(Mdl,X)
err = quantileError(Mdl,X,ResponseVarName)
err = quantileError(Mdl,X,Y)
err = quantileError(___,Name,Value)
```

## Description


`err = quantileError(Mdl,X)` returns half of the mean absolute deviation (MAD) between the true responses in the table `X` and the predicted medians obtained by applying the bag of regression trees `Mdl` to the predictor data in `X`. `Mdl` must be a `TreeBagger` model object. The response variable in `X` must have the same name as the response variable in the table used to train `Mdl`.


`err = quantileError(Mdl,X,ResponseVarName)` uses the true responses and predictor variables contained in the table `X`. `ResponseVarName` is the name of the response variable, and `Mdl.PredictorNames` contains the names of the predictor variables.


`err = quantileError(Mdl,X,Y)` uses the predictor data in the table or matrix `X` and the true responses in the vector `Y`.


`err = quantileError(___,Name,Value)` uses any of the previous syntaxes and additional options specified by one or more `Name,Value` pair arguments. For example, you can specify the quantile probabilities, the error type, or which trees to include in the quantile regression error estimation.
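For the median, the half-MAD returned by the first syntax equals the mean check (pinball) loss `rho(u,tau) = u.*(tau - (u < 0))` at `tau = 0.5`, because `rho(u,0.5)` is `abs(u)/2`. The MATLAB sketch below checks this identity on hypothetical values without calling `quantileError`. Treating the error at other probabilities as the mean check loss is an assumption based on the usual quantile regression definition; this page does not spell it out.

```matlab
% Hypothetical true responses and predicted medians (illustrative values only)
y    = [18; 15; 22; 30; 26];
yhat = [17; 16; 22; 27; 28];

% Check (pinball) loss: rho(u,tau) = u.*(tau - (u < 0))
rho = @(u,tau) u .* (tau - (u < 0));

u = y - yhat;                         % residuals: [1; -1; 0; 3; -2]
quantileLoss = mean(rho(u,0.5));      % mean check loss at tau = 0.5
halfMAD      = mean(abs(u)) / 2;      % half of the mean absolute deviation

fprintf('check loss = %.4f, half MAD = %.4f\n', quantileLoss, halfMAD)
```

Both quantities are 0.7 for these values.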

## Input Arguments


`Mdl`: Bag of regression trees, specified as a `TreeBagger` model object created by `TreeBagger`. The value of `Mdl.Method` must be `'regression'`.

`X`: Sample data used to estimate quantiles, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable. If you specify `Y`, then the number of rows in `X` must be equal to the length of `Y`.

• For a numeric matrix:

  • The variables that make up the columns of `X` must be in the same order as the predictor variables used to train `Mdl` (stored in `Mdl.PredictorNames`).

  • If you trained `Mdl` using a table (for example, `Tbl`), then `X` can be a numeric matrix if `Tbl` contains only numeric predictor variables. If `Tbl` contains heterogeneous predictor variables (for example, numeric and categorical data types), then `quantileError` throws an error.

  • Specify `Y` for the true responses.

• For a table:

  • `quantileError` does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

  • If you trained `Mdl` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and data types as the variables used to train `Mdl` (stored in `Mdl.PredictorNames`). However, the column order of `X` does not need to match the column order of `Tbl`. `Tbl` and `X` can contain additional variables (response variables, observation weights, and so on).

  • If you trained `Mdl` using a numeric matrix, then the predictor names in `Mdl.PredictorNames` and the corresponding predictor variable names in `X` must be the same. To specify predictor names during training, use the `PredictorNames` name-value pair argument of `TreeBagger`. All predictor variables in `X` must be numeric vectors. `X` can contain additional variables (response variables, observation weights, and so on).

  • If `X` contains a response variable with the same name as the response variable used to train `Mdl`, then you do not have to supply the response variable name or a vector of true responses; by default, `quantileError` uses that variable for the true responses. You can still specify `ResponseVarName` or `Y` for the true responses.

Data Types: `table` | `double` | `single`

`ResponseVarName`: Response variable name, specified as a character vector or string scalar. `ResponseVarName` must be the name of the response variable in the table of sample data `X`.

If the table `X` contains the response variable, and it has the same name as the response variable used to train `Mdl`, then you do not have to specify `ResponseVarName`. `quantileError` uses that variable for the true responses by default.

Data Types: `char` | `string`

`Y`: True responses, specified as a numeric vector. The length of `Y` must equal the number of rows in `X`.

Data Types: `double` | `single`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

`'Mode'`: Ensemble error type, specified as the comma-separated pair consisting of `'Mode'` and a value in this table, where `tau` is the value of `Quantile`.

| Value | Description |
| --- | --- |
| `'cumulative'` | `err` is a `Mdl.NumTrees`-by-`numel(tau)` numeric matrix of cumulative quantile regression errors. `err(j,k)` is the `tau(k)` quantile regression error using the learners in `Mdl.Trees(1:j)` only. |
| `'ensemble'` | `err` is a 1-by-`numel(tau)` numeric vector of cumulative quantile regression errors for the entire ensemble. `err(k)` is the `tau(k)` ensemble quantile regression error. |
| `'individual'` | `err` is a `Mdl.NumTrees`-by-`numel(tau)` numeric matrix of quantile regression errors from individual learners. `err(j,k)` is the `tau(k)` quantile regression error using the learner in `Mdl.Trees(j)` only. |

For `'cumulative'` and `'individual'`, if you use `Trees` or `UseInstanceForTree` to include fewer trees in quantile estimation, then `err` has fewer than `Mdl.NumTrees` rows.

Example: `'Mode','cumulative'`
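The three modes differ only in which trees contribute to each row of `err`. The MATLAB sketch below reproduces the shapes on a toy problem under stated assumptions: the per-tree quantile predictions `Q` are hypothetical, the ensemble prediction is taken as the plain average of per-tree predictions, and the error is the mean check loss. `TreeBagger` uses its own quantile estimation procedure, so the numbers are not what `quantileError` would return.

```matlab
% Toy illustration of the 'individual', 'cumulative', and 'ensemble' layouts.
% ASSUMPTION: the ensemble quantile prediction is the average of hypothetical
% per-tree predictions; TreeBagger uses its own quantile estimator.
rho = @(u,tau) u .* (tau - (u < 0));     % check (pinball) loss

y   = [3; 5; 4; 6];                      % hypothetical true responses (n = 4)
tau = [0.25 0.5 0.75];                   % quantile probabilities
T   = 5;                                 % number of trees
K   = numel(tau);

% Hypothetical per-tree quantile predictions, n-by-T-by-K
Q = zeros(numel(y),T,K);
for k = 1:K
    Q(:,:,k) = y + (tau(k) - 0.5) + 0.1*((1:T) - 3);   % implicit expansion
end

lossAt = @(pred,k) mean(rho(y - pred, tau(k)));   % mean check loss for tau(k)

errIndividual = zeros(T,K);   % like 'Mode','individual': tree j only
errCumulative = zeros(T,K);   % like 'Mode','cumulative': trees 1:j
for k = 1:K
    for j = 1:T
        errIndividual(j,k) = lossAt(Q(:,j,k), k);
        errCumulative(j,k) = lossAt(mean(Q(:,1:j,k),2), k);
    end
end
errEnsemble = errCumulative(end,:);   % like 'Mode','ensemble': all trees, 1-by-K
```

With a subset `trees` passed as `Trees`, the same loops would run over `trees` instead of `1:T`, which is why the row count of `err` drops to `numel(trees)`.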

`'Weights'`: Observation weights, specified as the comma-separated pair consisting of `'Weights'` and a numeric vector of positive values with length equal to `size(X,1)`. `quantileError` uses `Weights` to compute the weighted average of the deviations when estimating the quantile regression error.

By default, `quantileError` attributes a weight of `1` to each observation, which yields an unweighted average of the deviations.
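A minimal MATLAB sketch of the weighted average, assuming that the weighted error is the sum of the weighted check losses divided by the sum of the weights (a standard normalization that this page does not state explicitly). With all weights equal to 1, it reduces to the unweighted mean. The data values are hypothetical.

```matlab
rho = @(u,tau) u .* (tau - (u < 0));   % check (pinball) loss

% Hypothetical responses, predicted medians, and observation weights
y    = [18; 15; 22; 30];
yhat = [17; 16; 22; 27];
w    = [1; 1; 2; 0.5];
tau  = 0.5;

u = y - yhat;                                   % residuals: [1; -1; 0; 3]
weightedErr = sum(w .* rho(u,tau)) / sum(w);    % ASSUMPTION: normalize by sum(w)

wDefault   = ones(size(u));                     % default: weight 1 per observation
defaultErr = sum(wDefault .* rho(u,tau)) / sum(wDefault);
```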

`'Quantile'`: Quantile probability, specified as the comma-separated pair consisting of `'Quantile'` and a numeric vector containing values in the interval [0,1]. `quantileError` returns a quantile regression error for each probability in `Quantile`; column `k` of `err` corresponds to `Quantile(k)`.

Example: `'Quantile',[0 0.25 0.5 0.75 1]`

Data Types: `single` | `double`
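Because each probability in `Quantile` produces its own column of `err`, the check loss can be evaluated for all probabilities at once. This MATLAB sketch assumes hypothetical predicted quantiles `Yq` (one column per probability) and the mean check loss as the error; it relies on implicit expansion (MATLAB R2016b or later). The loss is asymmetric: for `tau = 0.25`, each unit of underprediction costs 0.25 and each unit of overprediction costs 0.75, and the reverse holds for `tau = 0.75`.

```matlab
rho = @(u,tau) u .* (tau - (u < 0));   % check (pinball) loss

y   = [18; 15; 22; 30; 26];             % hypothetical true responses
tau = [0.25 0.5 0.75];                  % value of 'Quantile'

% Hypothetical predicted quantiles: column k is the prediction for tau(k)
Yq = [16 17 19
      14 16 17
      20 22 24
      27 27 30
      24 28 29];

U   = y - Yq;                 % n-by-numel(tau) residuals (implicit expansion)
err = mean(rho(U,tau), 1);    % 1-by-numel(tau): one error per probability
```

For these values, `err` is `[0.5 0.7 0.4]`.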

`'Trees'`: Indices of trees to use in response estimation, specified as the comma-separated pair consisting of `'Trees'` and `'all'` or a numeric vector of positive integers. Indices correspond to the cells of `Mdl.Trees`; each cell therein contains a tree in the ensemble. The maximum value of `Trees` must be less than or equal to the number of trees in the ensemble (`Mdl.NumTrees`).

For `'all'`, `quantileError` uses all trees in the ensemble (that is, the indices `1:Mdl.NumTrees`).

Values other than the default can affect the number of rows in `err`.

Example: `'Trees',[1 10 Mdl.NumTrees]`

Data Types: `char` | `string` | `single` | `double`

`'TreeWeights'`: Weights to attribute to responses from individual trees, specified as the comma-separated pair consisting of `'TreeWeights'` and a numeric vector of `numel(trees)` nonnegative values, where `trees` is the value of `Trees`.

If you specify `'Mode','individual'`, then `quantileError` ignores `TreeWeights`.

Data Types: `single` | `double`

`'UseInstanceForTree'`: Indicators specifying which trees to use to make predictions for each observation, specified as the comma-separated pair consisting of `'UseInstanceForTree'` and `'all'` or an n-by-`Mdl.NumTrees` logical matrix, where n is the number of observations (rows) in `X`. Rows of `UseInstanceForTree` correspond to observations, and columns correspond to learners in `Mdl.Trees`. Specify `'all'` to use all trees for all observations when estimating the quantiles.

If `UseInstanceForTree(j,k)` is `true`, then `quantileError` uses the tree in `Mdl.Trees(k)` when it predicts the response for the observation `X(j,:)`.

You can estimate quantiles using the response data in `Mdl.Y` directly instead of using the predictions from the random forest by specifying a row composed entirely of `false` values. For example, to estimate the quantile for observation `j` using the response data, and to use the predictions from the random forest for all other observations, specify this matrix:

```
% Use all trees for every observation except observation j
UseInstanceForTree = true(size(X,1),Mdl.NumTrees);
UseInstanceForTree(j,:) = false(1,Mdl.NumTrees);
```

Values other than the default can affect the number of rows in `err`. Also, the value of `Trees` affects the value of `UseInstanceForTree`. Suppose that `U` is the value of `UseInstanceForTree`. `quantileError` ignores the columns of `U` that correspond to trees excluded by `Trees`. That is, `quantileError` resets the value of `'UseInstanceForTree'` to `U(:,trees)`, where `trees` is the value of `'Trees'`.

Data Types: `char` | `string` | `logical`
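The column reset described above can be sketched in MATLAB without a trained model. The sizes `n` and `numTrees` are hypothetical stand-ins for `size(X,1)` and `Mdl.NumTrees`, and `trees` is a hypothetical value of `'Trees'`.

```matlab
% Hypothetical sizes; with a real model these would be size(X,1) and Mdl.NumTrees
n = 6;
numTrees = 10;

U = true(n,numTrees);   % use every tree for every observation
j = 2;
U(j,:) = false;         % observation j: estimate from the response data instead

trees = [1 3 5];        % hypothetical value passed as 'Trees'
Ueff = U(:,trees);      % only the columns for the selected trees are kept
```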

## Output Arguments


`err`: Half of the quantile regression error, returned as a numeric scalar or `T`-by-`numel(tau)` matrix, where `tau` is the value of `Quantile`.

`T` depends on the values of `Mode`, `Trees`, `UseInstanceForTree`, and `Quantile`. Suppose that you specify `'Trees',trees` and you use the default value of `'UseInstanceForTree'`.

• For `'Mode','cumulative'`, `err` is a `numel(trees)`-by-`numel(tau)` numeric matrix. `err(j,k)` is the `tau(k)` cumulative quantile regression error using the learners in `Mdl.Trees(trees(1:j))`.

• For `'Mode','ensemble'`, `err` is a `1`-by-`numel(tau)` numeric vector. `err(k)` is the `tau(k)` cumulative quantile regression error using the learners in `Mdl.Trees(trees)`.

• For `'Mode','individual'`, `err` is a `numel(trees)`-by-`numel(tau)` numeric matrix. `err(j,k)` is the `tau(k)` quantile regression error using the learner in `Mdl.Trees(trees(j))`.

## Examples


Load the `carsmall` data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders. Consider `Cylinders` a categorical variable.

```
load carsmall
Cylinders = categorical(Cylinders);
X = table(Displacement,Weight,Cylinders,MPG);
```

Train an ensemble of bagged regression trees using the entire data set. Specify 100 weak learners.

```
rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression');
```

`Mdl` is a `TreeBagger` ensemble.

Perform quantile regression, and estimate half of the MAD for the entire ensemble using the predicted conditional medians.

```
err = quantileError(Mdl,X)
```
```
err = 1.2339
```

Because `X` is a table that contains the response variable under the same name used for training (`MPG`), you do not have to specify the response variable name or data. However, you can specify the response variable name explicitly, as in this syntax.

```
err = quantileError(Mdl,X,'MPG')
```
```
err = 1.2339
```

Load the `carsmall` data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders.

```
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);
```

Randomly split the data into two sets: 75% training and 25% testing. Extract the subset indices.

```
rng(1); % For reproducibility
cvp = cvpartition(size(X,1),'Holdout',0.25);
idxTrn = training(cvp);
idxTest = test(cvp);
```

Train an ensemble of bagged regression trees using the training set. Specify 250 weak learners.

`Mdl = TreeBagger(250,X(idxTrn,:),'MPG','Method','regression');`

Estimate the cumulative 0.25, 0.5, and 0.75 quantile regression errors for the test set. Pass the predictor data as a numeric matrix and the response data as a vector.

```
err = quantileError(Mdl,X{idxTest,1:3},MPG(idxTest),'Quantile',[0.25 0.5 0.75],...
    'Mode','cumulative');
```

`err` is a 250-by-3 matrix of cumulative quantile regression errors. Rows correspond to trees in the ensemble, and columns correspond to quantile probabilities. The errors are cumulative, so row `j` aggregates the predictions of trees 1 through `j`. Although `Mdl` was trained using a table, you can supply a matrix of predictor data instead because all predictor variables in the table are numeric.

Plot the cumulative quantile errors on the same plot.

```
figure;
plot(err);
legend('0.25 quantile error','0.5 quantile error','0.75 quantile error');
ylabel('Quantile error');
xlabel('Tree index');
title('Cumulative Quantile Regression Error')
```

Training using about 60 trees appears to be enough for the first two quartiles, but the third quartile requires about 150 trees.


## Tips

• To tune the number of trees in the ensemble, set `'Mode','cumulative'` and plot the quantile regression errors with respect to tree indices. The maximal number of required trees is the tree index where the quantile regression error appears to level off.

• To investigate the performance of a model when the training sample is small, use `oobQuantileError` instead.
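One way to make the first tip concrete is to pick, for each probability, the first tree index after which the cumulative error stays within a relative tolerance of its final value, and then take the maximum over probabilities. The tolerance rule and the error values below are hypothetical choices for illustration, not a rule that `quantileError` applies; in practice, `errCum` would be the output of `quantileError` with `'Mode','cumulative'`.

```matlab
% Hypothetical cumulative errors: rows are trees, columns are probabilities
errCum = [2.00  2.40
          1.60  2.00
          1.40  1.80
          1.30  1.60
          1.25  1.50
          1.24  1.45
          1.235 1.42
          1.233 1.41
          1.232 1.405
          1.232 1.40];
tol = 0.01;   % hypothetical tolerance: within 1% of the final error

K = size(errCum,2);
needed = zeros(1,K);
for k = 1:K
    final  = errCum(end,k);
    within = abs(errCum(:,k) - final) <= tol*final;
    lastOutside = find(~within, 1, 'last');   % last index still outside the band
    if isempty(lastOutside)
        needed(k) = 1;
    else
        needed(k) = lastOutside + 1;
    end
end
numTreesNeeded = max(needed);   % maximal number of trees over all probabilities
```

For these values, `needed` is `[6 8]`, so about 8 trees would suffice under this heuristic.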
