# surrogateAssociation

Mean predictive measure of association for surrogate splits in a regression tree

## Syntax

```
ma = surrogateAssociation(tree)
ma = surrogateAssociation(tree,N)
```

## Description

`ma = surrogateAssociation(tree)` returns a matrix of predictive measures of association for the predictors in `tree`.

`ma = surrogateAssociation(tree,N)` returns a matrix of predictive measures of association averaged over the nodes in vector `N`.

## Input Arguments

`tree`: A regression tree constructed with `fitrtree`, or a compact regression tree constructed with `compact`.

`N`: Vector of node numbers in `tree`.

## Output Arguments

`ma`: `ma = surrogateAssociation(tree)` returns a `P`-by-`P` matrix, where `P` is the number of predictors in `tree`. `ma(i,j)` is the predictive measure of association between the optimal split on variable `i` and a surrogate split on variable `j`. For more details, see Algorithms.

`ma = surrogateAssociation(tree,N)` returns a `P`-by-`P` matrix representing the predictive measure of association between variables, averaged over the nodes in the vector `N`. `N` contains node numbers from `1` to `tree.NumNodes`.

## Examples

Load the `carsmall` data set. Specify `Displacement`, `Horsepower`, and `Weight` as predictor variables.

```
load carsmall
X = [Displacement Horsepower Weight];
```

Grow a regression tree using `MPG` as the response. Specify that the tree use surrogate splits for missing values.

`tree = fitrtree(X,MPG,'surrogate','on');`

Find the mean predictive measure of association between the predictor variables.

`ma = surrogateAssociation(tree)`
```
ma = 3×3

    1.0000    0.2167    0.5083
    0.4521    1.0000    0.3769
    0.2540    0.2659    1.0000
```

Find the mean predictive measure of association averaged over the odd-numbered nodes in `tree`.

```
N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)
```
```
ma = 3×3

    1.0000    0.1250    0.6875
    0.5632    1.0000    0.5861
    0.3333    0.3148    1.0000
```

## Algorithms

Element `ma(i,j)` is the predictive measure of association averaged over surrogate splits on predictor `j` for which predictor `i` is the optimal split predictor. This average is computed by summing positive values of the predictive measure of association over optimal splits on predictor `i` and surrogate splits on predictor `j` and dividing by the total number of optimal splits on predictor `i`, including splits for which the predictive measure of association between predictors `i` and `j` is negative.
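
The averaging rule described above can be written compactly. The notation here is introduced only for illustration and is not part of the function's interface: $T_i$ denotes the set of nodes under consideration (all nodes of `tree`, or only those listed in `N`) whose optimal split is on predictor `i`, and $\lambda_t(i,j)$ denotes the predictive measure of association at node $t$ between that optimal split and the surrogate split on predictor `j`.

```latex
\mathrm{ma}(i,j) \;=\; \frac{1}{|T_i|} \sum_{t \in T_i} \max\bigl(\lambda_t(i,j),\, 0\bigr)
```

As a hypothetical illustration (the numbers are invented, not taken from `carsmall`), suppose predictor `i` is the optimal split predictor at three nodes, and the associations for surrogate splits on predictor `j` at those nodes are 0.6, 0.3, and -0.2. The negative value contributes nothing to the sum but still counts in the denominator, so `ma(i,j)` = (0.6 + 0.3)/3 = 0.3.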