collintest

Belsley collinearity diagnostics

Syntax

[sValue,condIdx,VarDecomp] = collintest(X)

VarDecompTbl = collintest(Tbl)

[___] = collintest(___,Name=Value)

collintest(ax,Plot="on",___)

[___,h] = collintest(___,Plot="on")

Description

[sValue,condIdx,VarDecomp] = collintest(X) displays, at the command window, Belsley collinearity diagnostics for assessing the strength and sources of collinearity among variables in the input matrix of time series data. The function also returns the singular values in decreasing order, condition indices, and variance decomposition proportions.

VarDecompTbl = collintest(Tbl) displays the Belsley collinearity diagnostics on all the variables of the input table or timetable. The function also returns a table containing variables for the singular values and condition indices, and variables for the variance-decomposition proportions associated with each time series.

To compute collinearity diagnostics for only a subset of the variables, use the DataVariables name-value argument.

[___] = collintest(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. collintest returns the output argument combination for the corresponding input arguments. For example, collintest(Tbl,Plot="on",Display="off",DataVariables=1:5) plots the Belsley collinearity diagnostics for the first 5 variables of the table Tbl to a figure instead of displaying them at the command window.

collintest(ax,Plot="on",___) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes.

[___,h] = collintest(___,Plot="on") plots the diagnostics of the input series and additionally returns handles to plotted graphics objects h. Use elements of h to modify properties of the plot after you create it.

Examples

Compute Belsley Collinearity Diagnostics on Matrix of Data

Display collinearity diagnostics for multiple time series using the default options of collintest. Input the time series data as a numeric matrix.

Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data.

load Data_Canada

Display the series names, and then display the Belsley collinearity diagnostics at the command window. Return the singular values, condition indices, and variance-decomposition proportions.

series'

ans = 5x1 cell
    {'(INF_C) Inflation rate (CPI-based)'         }
    {'(INF_G) Inflation rate (GDP deflator-based)'}
    {'(INT_S) Interest rate (short-term)'         }
    {'(INT_M) Interest rate (medium-term)'        }
    {'(INT_L) Interest rate (long-term)'          }

[sValue,condIdx,VarDecomp] = collintest(Data);

Variance Decomposition

 sValue  condIdx   Var1    Var2    Var3    Var4    Var5  
---------------------------------------------------------
 2.1748    1      0.0012  0.0018  0.0003  0.0000  0.0001 
 0.4789   4.5413  0.0261  0.0806  0.0035  0.0006  0.0012 
 0.1602  13.5795  0.3386  0.3802  0.0811  0.0011  0.0137 
 0.1211  17.9617  0.6138  0.5276  0.1918  0.0004  0.0193 
 0.0248  87.8245  0.0202  0.0099  0.7233  0.9979  0.9658

Only the last row in the display has a condition index larger than the default tolerance, 30. In this row, the last three variables (in the last three columns) have variance-decomposition proportions exceeding the default tolerance, 0.5. These results suggest that the short-, medium-, and long-term interest rates exhibit multicollinearity.

collintest organizes the outputs in the display table.

sValue

sValue = 5×1

    2.1748
    0.4789
    0.1602
    0.1211
    0.0248

condIdx

condIdx = 5×1

    1.0000
    4.5413
   13.5795
   17.9617
   87.8245

VarDecomp

VarDecomp = 5×5

    0.0012    0.0018    0.0003    0.0000    0.0001
    0.0261    0.0806    0.0035    0.0006    0.0012
    0.3386    0.3802    0.0811    0.0011    0.0137
    0.6138    0.5276    0.1918    0.0004    0.0193
    0.0202    0.0099    0.7233    0.9979    0.9658
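The interpretation given for the display can also be applied to the returned outputs programmatically. The following is a minimal sketch, not part of collintest: it hardcodes the rounded values shown above so that it runs on its own, and the names tolIdx, tolProp, critRows, and involved are illustrative.

```matlab
% Diagnostics copied from the display above (rounded to four decimals).
condIdx = [1.0000; 4.5413; 13.5795; 17.9617; 87.8245];
VarDecomp = [0.0012 0.0018 0.0003 0.0000 0.0001
             0.0261 0.0806 0.0035 0.0006 0.0012
             0.3386 0.3802 0.0811 0.0011 0.0137
             0.6138 0.5276 0.1918 0.0004 0.0193
             0.0202 0.0099 0.7233 0.9979 0.9658];

tolIdx = 30;    % default condition index tolerance
tolProp = 0.5;  % default variance-decomposition proportion tolerance

critRows = find(condIdx > tolIdx);   % rows signaling a near dependency
for r = critRows'
    involved = find(VarDecomp(r,:) > tolProp);
    if numel(involved) >= 2   % a dependency needs at least two variables
        fprintf("Row %d (condIdx %.4f): variables %s\n", ...
            r,condIdx(r),mat2str(involved));
    end
end
```

With these values, the loop flags only row 5 and variables 3, 4, and 5, which are the three interest rate series.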

Compute Belsley Collinearity Diagnostics on Table Variables

Display and return collinearity diagnostics for multiple time series, which are variables in a table, using default options.

Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable.

load Data_Canada
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
TT.Observations = [];

Display the Belsley collinearity diagnostics, using all default options.

VarDecompTbl = collintest(TT)

Variance Decomposition

 sValue  condIdx   INF_C   INF_G   INT_S   INT_M   INT_L 
---------------------------------------------------------
 2.1748    1      0.0012  0.0018  0.0003  0.0000  0.0001 
 0.4789   4.5413  0.0261  0.0806  0.0035  0.0006  0.0012 
 0.1602  13.5795  0.3386  0.3802  0.0811  0.0011  0.0137 
 0.1211  17.9617  0.6138  0.5276  0.1918  0.0004  0.0193 
 0.0248  87.8245  0.0202  0.0099  0.7233  0.9979  0.9658

VarDecompTbl=5×7 table
     sValue     condIdx      INF_C        INF_G        INT_S         INT_M         INT_L   
    ________    _______    _________    _________    __________    __________    __________

      2.1748         1     0.0012446    0.0017784    0.00033202    4.2326e-05    8.0328e-05
     0.47889    4.5413        0.0261     0.080594     0.0034869    0.00057749      0.001159
     0.16015    13.579       0.33864      0.38021      0.081126     0.0011166      0.013662
     0.12108    17.962       0.61384      0.52756       0.19176    0.00035545      0.019308
    0.024763    87.825      0.020173    0.0098575       0.72329       0.99791       0.96579

collintest returns collinearity diagnostics in the table VarDecompTbl, where variables correspond to the singular values, condition indices, and variance-decomposition proportions of each variable in the data (sValue, condIdx, and VarDecomp). The command window display and output table have a similar form.

By default, collintest computes collinearity diagnostics for all variables in the input table. To select a subset of variables from an input table, set the DataVariables option.

Extract the variance-decomposition proportions from the output table.

varnames = DataTable.Properties.VariableNames;
VarDecomp = VarDecompTbl(:,varnames)

VarDecomp=5×5 table
      INF_C        INF_G        INT_S         INT_M         INT_L   
    _________    _________    __________    __________    __________

    0.0012446    0.0017784    0.00033202    4.2326e-05    8.0328e-05
       0.0261     0.080594     0.0034869    0.00057749      0.001159
      0.33864      0.38021      0.081126     0.0011166      0.013662
      0.61384      0.52756       0.19176    0.00035545      0.019308
     0.020173    0.0098575       0.72329       0.99791       0.96579

Plot Belsley Collinearity Diagnostics

Plot collinearity diagnostics for all time series in a table.

Load data of Canadian inflation and interest rates Data_Canada.mat.

load Data_Canada

Plot the Belsley collinearity diagnostics for all series.

collintest(DataTable,Plot="on");

Variance Decomposition

 sValue  condIdx   INF_C   INF_G   INT_S   INT_M   INT_L 
---------------------------------------------------------
 2.1748    1      0.0012  0.0018  0.0003  0.0000  0.0001 
 0.4789   4.5413  0.0261  0.0806  0.0035  0.0006  0.0012 
 0.1602  13.5795  0.3386  0.3802  0.0811  0.0011  0.0137 
 0.1211  17.9617  0.6138  0.5276  0.1918  0.0004  0.0193 
 0.0248  87.8245  0.0202  0.0099  0.7233  0.9979  0.9658

The plot corresponds to the values in the last row of the variance-decomposition proportions, which are the only proportions with a condition index larger than the default tolerance of 30. The interest rate series have variance-decomposition proportions exceeding the default tolerance of 0.5 (red markers in the plot).

Plot Belsley Collinearity Diagnostics for Selected Variables and Intercept

Compute collinearity diagnostics for selected time series and an intercept.

Load the credit default data set Data_CreditDefaults.mat. The table DataTable contains the default rate of investment-grade corporate bonds series (IGD, the response variable) and several predictor variables.

load Data_CreditDefaults

Consider a multiple regression model for the default rate that includes an intercept term.

Include a variable in the table of data that represents the intercept in the design matrix (that is, a column of ones). Place the intercept variable at the beginning of the table.

Const = ones(height(DataTable),1);
DataTable = addvars(DataTable,Const,Before=1);

Create a variable that contains all predictor variable names.

varnames = DataTable.Properties.VariableNames;
prednames = varnames(varnames ~= "IGD");

Graph a correlation plot of all predictor variables except for the intercept dummy variable.

figure
corrplot(DataTable,DataVariables=prednames(2:end), ...
    TestR="on");

The predictor BBB is moderately linearly associated with the other predictors, while all other predictors appear unassociated with each other.

Plot the Belsley collinearity diagnostics of the predictor variables. Adjust the following options for the collinearity diagnostics:

Set the condition index tolerance to 10.
Set the variance-decomposition proportion tolerance to 0.5.

figure
collintest(DataTable,Plot="on",DataVariables=prednames, ...
    TolIdx=10,TolProp=0.5);

Variance Decomposition

 sValue  condIdx   Const    AGE     BBB     CPF     SPR  
---------------------------------------------------------
 2.0605    1      0.0015  0.0024  0.0020  0.0140  0.0025 
 0.8008   2.5730  0.0016  0.0025  0.0004  0.8220  0.0023 
 0.2563   8.0400  0.0037  0.3208  0.0105  0.0004  0.3781 
 0.1710  12.0464  0.2596  0.0950  0.8287  0.1463  0.0001 
 0.1343  15.3405  0.7335  0.5793  0.1585  0.0173  0.6170

The row associated with condition index 12 (row 4) has one predictor (BBB) with a proportion above the tolerance 0.5, but collinearity requires two or more predictors for a dependency.

The row associated with condition index 15.3 (row 5) shows a weak dependence involving AGE, SPR, and the intercept, which the correlation plot does not expose.

Input Arguments

`X` — Time series data
numeric matrix

Time series data, specified as a numObs-by-numVars numeric matrix. Each column of X corresponds to a variable, and each row corresponds to an observation.

Data Types: double

`Tbl` — Time series data
table | timetable

Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation.

Specify numVars variables to include in the diagnostics computations by using the DataVariables argument. The selected variables must be numeric.

`ax` — Axes on which to plot
`Axes` object

Axes on which to plot, specified as an Axes object.

By default, collintest plots to the current axes (gca).

Note

To specify a model containing an intercept, include a variable (column) of ones in the time series data.
collintest scales all variables to unit length before computing diagnostics; do not center the variables in the data.
Impute or remove all missing observations (indicated by NaN entries) in the input data before passing the set to collintest.
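The preparation steps in this note can be sketched as follows. The data in Xraw is hypothetical, and the variable names are illustrative. The final scaling line only illustrates what collintest does internally, so you do not need to apply it before calling the function.

```matlab
% Hypothetical raw data: 6 observations of 2 variables, one missing value.
Xraw = [1.2 10; 0.8 12; NaN 11; 1.5 9; 1.1 13; 0.9 10];

% Remove observations (rows) that contain any NaN.
keep = ~any(isnan(Xraw),2);
Xclean = Xraw(keep,:);

% Prepend a column of ones to represent the intercept. Do not center.
X = [ones(size(Xclean,1),1) Xclean];

% Illustration only: scale each column to unit length, as collintest
% does before computing the diagnostics.
Xs = X./sqrt(sum(X.^2,1));
```

You can then pass X to collintest and name the intercept column by using the VarNames argument.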

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: collintest(Tbl,Plot="on",Display="off",DataVariables=1:5) plots the Belsley collinearity diagnostics for the first 5 variables of the table Tbl to a figure instead of displaying them at the command window.

`VarNames` — Unique variable names used in displays and plots of results
string vector | character vector | cell vector of strings | cell vector of character vectors

Unique variable names used in displays and plots of the results, specified as a string vector or cell vector of character vectors of length numVars. VarNames(j) specifies the name to use for variable X(:,j) or DataVariables(j).

If an intercept term is present, VarNames must include the intercept term (e.g., include the name "Const").

The software truncates all variable names to the first five characters.

If the input time series data is a matrix X, the default is {'var1','var2',...}.
If the input time series data is a table or timetable Tbl, the default is Tbl.Properties.VariableNames.

Example: VarNames=["Const" "AGE" "BBB"]

Data Types: char | cell | string

`Display` — Flag for command window display of results
`"on"` (default) | `"off"` | character vector

Flag for a command window display of results, specified as a value in this table.

Value	Description
`"on"`	`collintest` displays all outputs in tabular form to the command window.
`"off"`	`collintest` does not display the results to the command window.

Example: Display="off"

Data Types: char | string

`Plot` — Flag for plotting results
`"off"` (default) | `"on"` | character vector

Flag for plotting results to a figure, specified as a value in this table.

Value	Description
`"on"`	`collintest` plots critical rows of the output `VarDecomp`, specifically, rows with condition indices above the input tolerance `TolIdx`. If a group of at least two variables in a critical row have variance-decomposition proportions above the input tolerance `TolProp`, the group is identified with red markers.
`"off"`	`collintest` does not plot results to a figure.

Example: Plot="on"

Data Types: char | string

`TolIdx` — Condition index tolerance
`30` (default) | numeric scalar of at least 1

Condition index tolerance, specified as a scalar value of at least 1.

collintest uses TolIdx to decide which indices are large enough to infer a near dependency in the data. TolIdx is used only when the Plot argument is "on".

Example: TolIdx=25

Data Types: double

`TolProp` — Variance-decomposition proportion tolerance
`0.5` (default) | numeric scalar in [0,1]

Variance-decomposition proportion tolerance, specified as a numeric scalar in the interval [0,1].

collintest uses TolProp to decide which variables are involved in any near dependency. TolProp is used only when the Plot argument is "on".

Example: TolProp=0.4

Data Types: double

`DataVariables` — Variables in `Tbl`
all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector

Variables in Tbl for which collintest computes Belsley collinearity diagnostics, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.

Example: DataVariables=["GDP" "CPI"]

Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables.

Data Types: double | logical | char | cell | string

Output Arguments

`sValue` — Singular values
numeric vector

Singular values of the scaled design matrix composed of the specified time series variables, returned as a numeric vector with elements in descending order. collintest returns sValue when you supply the input X.

`condIdx` — Condition indices
numeric vector

Condition indices, returned as a numeric vector with elements in ascending order.

All condition indices have value between 1 and the condition number of the scaled design matrix of the specified time series variables. collintest returns condIdx when you supply the input X.

Large indices identify near dependencies among the specified variables. The size of the indices is a measure of how near dependencies are to collinearity.

`VarDecomp` — Variance-decomposition proportions
numeric matrix

Variance-decomposition proportions, returned as a numVars-by-numVars numeric matrix.

Large proportions, combined with a large condition index, identify groups of variables involved in near dependencies. collintest returns VarDecomp when you supply the input X.

The size of the proportions is a measure of how badly the regression is degraded by the dependency.

`VarDecompTbl` — Collinearity diagnostics summary
table

Collinearity diagnostics summary, returned as a table with variables for the outputs sValue, condIdx, and VarDecomp. collintest returns VarDecompTbl when you supply the input Tbl. The value of the VarNames argument determines the variable names of the columns of VarDecomp.

`h` — Handles to plotted graphics objects
graphics array

Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.

collintest plots only when you set Plot="on".

More About

Belsley Collinearity Diagnostics

Belsley collinearity diagnostics assess the strength and sources of collinearity among variables in a multiple linear regression model.

To assess collinearity, the software computes singular values of the scaled variable matrix, X, and then converts them to condition indices. The condition indices identify the number and strength of any near dependencies between variables in the variable matrix. The software decomposes the variance of the ordinary least squares (OLS) estimates of the regression coefficients in terms of the singular values to identify variables involved in each near dependency, and the extent to which the dependencies degrade the regression.

Condition Indices

The condition indices (condIdx) for a scaled matrix X identify the number and strength of any near dependencies in X.

For scaled matrix X with p columns and singular values (sValue) $S_{1} \geq S_{2} \geq \dots \geq S_{p}$ , the condition indices of the columns of X are $S_{1} / S_{j}$ (sValue(1)/sValue(j)), where j = 1,...,p.

All condition indices are bounded between one and the condition number.
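As a minimal sketch of this definition, the following code computes condition indices directly from the singular values of a column-scaled matrix. The data matrix is hypothetical, and the code uses only base MATLAB functions rather than collintest itself.

```matlab
% Hypothetical data: intercept plus two nearly proportional columns.
t = (1:8)';
X = [ones(8,1) t 2*t+0.01*(-1).^t];

Xs = X./sqrt(sum(X.^2,1));   % scale columns to unit length
s = svd(Xs);                 % singular values, in descending order
condIdx = s(1)./s;           % S_1/S_j for j = 1,...,p
```

The first index is always 1, and the last index equals the condition number of the scaled matrix.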

Condition Number

The condition number of a scaled matrix X is an overall diagnostic for detecting collinearity.

For scaled matrix X with p columns and singular values (sValue) $S_{1} \geq S_{2} \geq \dots \geq S_{p}$ , the condition number is $S_{1} / S_{p}$ (sValue(1)/sValue(end)).

The condition number achieves its lower bound of one when the columns of scaled X are orthonormal. The condition number rises as variates exhibit greater dependency.

A limitation of the condition number as a diagnostic is that it fails to provide specifics on the strength and sources of any near dependencies.
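The two extremes described above can be illustrated with a short sketch. Both matrices are hypothetical. The first has orthonormal columns obtained from a QR factorization, and the second has a column that is almost a multiple of the other.

```matlab
% Orthonormal columns: Q from a QR factorization of a hypothetical matrix.
[Q,~] = qr([1 2; 3 4; 5 6; 7 9],0);
sQ = svd(Q);
condOrth = sQ(1)/sQ(end);          % equals 1 up to rounding error

% Nearly dependent columns: the second is almost twice the first.
A = [1 2.001; 2 4.000; 3 5.999; 4 8.001];
As = A./sqrt(sum(A.^2,1));         % scale columns to unit length
sA = svd(As);
condDep = sA(1)/sA(end);           % much larger than 1
```

The single number condDep signals a problem but does not say which columns are involved, which is the gap that the condition indices and variance-decomposition proportions fill.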

Multiple Linear Regression Model

A multiple linear regression model is a model of the form $Y = X \beta + \varepsilon$, where Y is a vector of responses, X is a design matrix of regression variables, β is a vector of regression coefficients, and ε is a vector of errors.

Singular Values

The singular values (sValue) of a scaled matrix X are the diagonal elements of the matrix S in the singular value decomposition $X = U S V^{'}$.

In descending order, the singular values of the scaled matrix X with p columns are $S_{1} \geq S_{2} \geq \dots \geq S_{p}$ .

Variance-Decomposition Proportions

Variance-decomposition proportions identify groups of variates involved in near dependencies, and the extent to which the dependencies degrade the regression.

From the singular value decomposition $U S V^{'}$ of scaled design matrix X (with p columns), define the following quantities:

V is the matrix of orthonormal eigenvectors of $X^{'} X$ .
The singular values (sValue) $S_{1} \geq S_{2} \geq \dots \geq S_{p}$ are the ordered diagonal elements of the matrix S.

The variance of the OLS estimate of multiple linear regression coefficient i, β_i, is proportional to the sum

$V(i,1)^{2} / S_{1}^{2} + V(i,2)^{2} / S_{2}^{2} + \dots + V(i,p)^{2} / S_{p}^{2},$

where $V (i, j)$ denotes element (i,j) of V.

Variance-decomposition proportion (i,j) (VarDecomp) is the proportion of term j in the sum relative to the entire sum, j = 1,...,p.

The terms $S_{j}^{2}$ are the eigenvalues of scaled $X^{'} X$ . Thus, large variance-decomposition proportions correspond to small eigenvalues of $X^{'} X$ , a common diagnostic for collinearity. The singular value decomposition provides a more direct, numerically stable view of the eigensystem of scaled $X^{'} X$ .
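The definition above can be sketched in a few lines of base MATLAB. The data matrix is hypothetical, and the final transpose assumes that VarDecomp is arranged as in the displays in the examples, with one row per singular value and one column per variable. This is an illustration of the formula, not the collintest source code.

```matlab
% Hypothetical data: intercept plus two nearly proportional columns.
t = (1:8)';
X = [ones(8,1) t 2*t+0.01*(-1).^t];

Xs = X./sqrt(sum(X.^2,1));      % scale columns to unit length
[~,S,V] = svd(Xs,0);            % economy-size SVD, Xs = U*S*V'
s = diag(S);                    % singular values, descending

terms = (V.^2)./(s'.^2);        % terms(i,j) = V(i,j)^2/S_j^2
props = terms./sum(terms,2);    % normalize each coefficient's terms

% Transpose so that rows index singular values and columns index
% variables (assumed layout, matching the displays in the examples).
VarDecomp = props';
condIdx = s(1)./s;
```

In this sketch, each column of VarDecomp sums to 1, and the last row assigns nearly all of the variance of the second and third coefficients to the smallest singular value, which exposes the near dependency between those two columns.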

Tips

For purposes of collinearity diagnostics, Belsley [1] shows that column scaling of the design matrix composed of the input time series data is always desirable. However, he also shows that centering the data in X is undesirable. For models with an intercept, centering the data in X hides the role of the constant term in any near dependency and yields misleading diagnostics.
Tolerances for identifying large condition indices and variance-decomposition proportions are comparable to critical values in standard hypothesis tests. Experience determines the most useful tolerance, but experiments suggest the collintest defaults are good starting points [1].
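The effect of centering can be seen in a small sketch with a hypothetical predictor whose mean is large relative to its variation. The helper functions unitLen and condNum are illustrative names defined here, not toolbox functions.

```matlab
% Hypothetical predictor with a large mean relative to its variation.
x = 100 + linspace(-1,1,11)';
c = ones(11,1);                      % intercept column

unitLen = @(A) A./sqrt(sum(A.^2,1)); % scale columns to unit length
condNum = @(A) max(svd(A))/min(svd(A));

condRaw = condNum(unitLen([c x]));              % large, roughly 316
condCentered = condNum(unitLen([c x-mean(x)])); % 1 up to rounding
```

The uncentered data reveals a strong near dependency between the predictor and the intercept. After centering, the two columns are orthogonal, so the same dependency no longer appears in the diagnostics even though it still affects the regression with the original data.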

References

[1] Belsley, D. A., E. Kuh, and R. E. Welsch. Regression Diagnostics. New York, NY: John Wiley & Sons, Inc., 1980.

[2] Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T. C. Lee. The Theory and Practice of Econometrics. New York, NY: John Wiley & Sons, Inc., 1985.

Version History

Introduced in R2012a

R2022a: `collintest` returns a results table when you supply a table of data

If you supply a table of time series data Tbl, collintest returns a table containing variables for the singular values sValue and condition indices condIdx, and variables for the variance-decomposition proportions VarDecomp associated with each time series, from which collinearity is diagnosed.

Before R2022a, collintest returned sValue, condIdx, and VarDecomp in separate positions of the output when you supplied a table of input data.

Starting in R2022a, if you supply a table of input data, update your code to return all collinearity diagnostic outputs in the first output position. The second optional output is the graphics object h.

[VarDecompTbl,h] = collintest(Tbl,Name=Value)

collintest issues an error if you request more outputs.

Also, access results by using table indexing. For more details, see Access Data in Tables.
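As a minimal sketch of that table indexing, the following code recovers the three separate outputs from a results table. To keep the snippet self-contained, VarDecompTbl here is a stand-in built by hand from the first two rows and variables of the timetable example display. In practice, it would be the output of collintest.

```matlab
% Stand-in for the table output, built from rounded values in the
% timetable example display (first two rows, first two series).
VarDecompTbl = table([2.1748; 0.4789],[1; 4.5413], ...
    [0.0012; 0.0261],[0.0018; 0.0806], ...
    'VariableNames',{'sValue','condIdx','INF_C','INF_G'});

sValue = VarDecompTbl.sValue;          % numeric column vector
condIdx = VarDecompTbl.condIdx;        % numeric column vector
VarDecomp = VarDecompTbl{:,3:end};     % numeric matrix of proportions
```

Curly-brace indexing returns a numeric matrix, whereas parenthesis indexing, as in VarDecompTbl(:,3:end), returns a table.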
