# predictorImportance

Estimates of predictor importance for regression ensemble

## Syntax

```
imp = predictorImportance(ens)
[imp,ma] = predictorImportance(ens)
```

## Description

`imp = predictorImportance(ens)` computes estimates of predictor importance for `ens` by summing the per-learner estimates over all weak learners in the ensemble. `imp` has one element for each input predictor in the data used to train the ensemble. A high value indicates that the corresponding predictor is important for `ens`.

`[imp,ma] = predictorImportance(ens)` also returns `ma`, a `P`-by-`P` matrix of predictive measures of association for `P` predictors.

## Input Arguments

`ens`: A regression ensemble created by `fitrensemble`, or a compact regression ensemble created by the `compact` method.

## Output Arguments

`imp`: A row vector with the same number of elements as the number of predictors (columns) in `ens.X`. The entries are the estimates of predictor importance, with `0` representing the smallest possible importance.

`ma`: A `P`-by-`P` matrix of predictive measures of association for `P` predictors. Element `ma(I,J)` is the predictive measure of association averaged over surrogate splits on predictor `J` for which predictor `I` is the optimal split predictor. `predictorImportance` averages this predictive measure of association over all trees in the ensemble.

## Examples

Estimate the predictor importance for all predictor variables in the data.

Load the `carsmall` data set.

`load carsmall`

Grow an ensemble of 100 regression trees for `MPG` using `Acceleration`, `Cylinders`, `Displacement`, `Horsepower`, `Model_Year`, and `Weight` as predictors. Specify tree stumps as the weak learners.

```
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
t = templateTree('MaxNumSplits',1);
ens = fitrensemble(X,MPG,'Method','LSBoost','Learners',t);
```

Estimate the predictor importance for all predictor variables.

`imp = predictorImportance(ens)`
```
imp = 1×6

    0.0150         0    0.0066    0.1111    0.0437    0.5181
```

`Weight`, the last predictor, has the most impact on mileage. The second predictor has importance 0, which means that the number of cylinders has no impact on predictions made with `ens`.
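The ranking described above can be made explicit by pairing each importance value with its predictor name. The following is a minimal sketch that reuses the `imp` values printed in this example instead of retraining the ensemble; the variable names `names`, `rankedNames`, and `unused` are illustrative and are not part of the `predictorImportance` interface.

```matlab
% Importance values copied from the example output (column order of X).
imp = [0.0150 0 0.0066 0.1111 0.0437 0.5181];

% Predictor names in the same column order as X (illustrative variable).
names = {'Acceleration','Cylinders','Displacement', ...
         'Horsepower','Model_Year','Weight'};

% Sort from most to least important and reorder the names to match.
[sortedImp, idx] = sort(imp, 'descend');
rankedNames = names(idx);

% Print one predictor per line with its importance estimate.
for k = 1:numel(idx)
    fprintf('%-13s %.4f\n', rankedNames{k}, sortedImp(k));
end

% Predictors with zero importance were never used for a split.
unused = names(imp == 0);
```

With a trained ensemble in the workspace, the literal vector could be replaced by `imp = predictorImportance(ens)`.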

Estimate the predictor importance for all predictor variables in the data when the regression tree ensemble contains surrogate splits.

Load the `carsmall` data set.

`load carsmall`

Grow an ensemble of 100 regression trees for `MPG` using `Acceleration`, `Cylinders`, `Displacement`, `Horsepower`, `Model_Year`, and `Weight` as predictors. Specify tree stumps as the weak learners, and also identify surrogate splits.

```
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
t = templateTree('MaxNumSplits',1,'Surrogate','on');
ens = fitrensemble(X,MPG,'Method','LSBoost','Learners',t);
```

Estimate the predictor importance and predictive measures of association for all predictor variables.

`[imp,ma] = predictorImportance(ens)`
```
imp = 1×6

    0.2141    0.3798    0.4369    0.6498    0.3728    0.5700
```
```
ma = 6×6

    1.0000    0.0098    0.0102    0.0098    0.0033    0.0067
         0    1.0000         0         0         0         0
    0.0056    0.0084    1.0000    0.0078    0.0022    0.0084
    0.3537    0.4769    0.5834    1.0000    0.1612    0.5827
    0.0061    0.0070    0.0063    0.0064    1.0000    0.0056
    0.0154    0.0296    0.0533    0.0447    0.0070    1.0000
```

Compared with the first example, which did not use surrogate splits, `Horsepower` now has the greatest impact on mileage, and `Weight` has the second greatest impact.
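The matrix `ma` can be inspected in the same spirit, for instance to find the pair of distinct predictors with the largest predictive measure of association. The sketch below copies the `ma` values printed in this example and ignores the diagonal, which is always 1; the helper variables `names`, `maOff`, `r`, and `c` are illustrative only.

```matlab
% Predictor names in the column order of X (illustrative variable).
names = {'Acceleration','Cylinders','Displacement', ...
         'Horsepower','Model_Year','Weight'};

% Association matrix copied from the example output.
ma = [1.0000 0.0098 0.0102 0.0098 0.0033 0.0067
      0      1.0000 0      0      0      0
      0.0056 0.0084 1.0000 0.0078 0.0022 0.0084
      0.3537 0.4769 0.5834 1.0000 0.1612 0.5827
      0.0061 0.0070 0.0063 0.0064 1.0000 0.0056
      0.0154 0.0296 0.0533 0.0447 0.0070 1.0000];

% Zero the diagonal so a predictor is not matched with itself.
maOff = ma - diag(diag(ma));

% Locate the largest remaining entry: row r is the optimal split
% predictor, column c is the surrogate split predictor.
[val, k] = max(maOff(:));
[r, c] = ind2sub(size(ma), k);

fprintf('Optimal split on %s, surrogate on %s: %.4f\n', ...
        names{r}, names{c}, val);
```

For these values the largest off-diagonal entry is in row 4, column 3, which pairs optimal splits on `Horsepower` with surrogate splits on `Displacement`.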

Element `ma(i,j)` is the predictive measure of association averaged over surrogate splits on predictor `j` for which predictor `i` is the optimal split predictor. This average is computed by summing positive values of the predictive measure of association over optimal splits on predictor `i` and surrogate splits on predictor `j` and dividing by the total number of optimal splits on predictor `i`, including splits for which the predictive measure of association between predictors `i` and `j` is negative.
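The averaging rule in the preceding paragraph can be illustrated with a small worked calculation. The values below are hypothetical and are not taken from the `carsmall` ensemble: `lambda` holds one predictive measure of association per optimal split on predictor `i`, each measured against a surrogate split on predictor `j`. Treating a split that has no surrogate on `j` as contributing `0` is an assumption of this sketch, and the further averaging over all trees in the ensemble is not shown.

```matlab
% Hypothetical association values, one per optimal split on predictor i,
% each measured against a surrogate split on predictor j.
lambda = [0.5 -0.25 0.25 0];

% Sum only the positive values, but divide by the total number of
% optimal splits on predictor i, including the split with a negative value.
ma_ij = sum(lambda(lambda > 0)) / numel(lambda);

fprintf('ma(i,j) = %.4f\n', ma_ij);
```

Here the positive values sum to 0.75 and there are four optimal splits, so the result is 0.1875. Dividing by the count of positive values alone would give 0.375 instead, which is why the denominator matters.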