# Tobit

Create `Tobit` model object for exposure at default

## Description

Create and analyze a `Tobit` model object to calculate the exposure at default (EAD) using this workflow:

1. Use `fitEADModel` to create a `Tobit` model object.
2. Use `predict` to predict the EAD.
3. Use `modelDiscrimination` to return AUROC and ROC data. You can plot the results using `modelDiscriminationPlot`.
4. Use `modelAccuracy` to return the R-squared, RMSE, correlation, and sample mean error of predicted and observed EAD data. You can plot the results using `modelAccuracyPlot`.

## Creation

### Description

`TobitEADModel = fitEADModel(data,ModelType)` creates a `Tobit` EAD model object.

`TobitEADModel = fitEADModel(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax. The optional name-value pair arguments set the model object properties. For example,

```
eadModel = fitEADModel(EADData,ModelType,'PredictorVars',{'UtilizationRate','Age','Marriage'},'ConversionMeasure',"ccf",'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD')
```

creates an `eadModel` object using a `Tobit` model type.

### Input Arguments

`data` — Data for exposure at default

table

Data for exposure at default, specified as a table.

**Data Types:** `table`

`ModelType` — Model type

string with value `"Tobit"` | character vector with value `'Tobit'`

Model type, specified as a string with the value of `"Tobit"` or a character vector with the value of `'Tobit'`.

**Data Types:** `char` | `string`

**`Tobit` Name-Value Pair Arguments**

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

**Example:**

```
eadModel = fitEADModel(EADData,ModelType,'PredictorVars',{'UtilizationRate','Age','Marriage'},'ConversionMeasure',"ccf",'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD')
```

`ModelID` — User-defined model ID

`"Tobit"` (default) | string | character vector

User-defined model ID, specified as the comma-separated pair consisting of `'ModelID'` and a string or character vector. The software uses the `ModelID` text to format outputs, so it is expected to be short.

**Data Types:** `string` | `char`

`Description` — User-defined description for model

`""` (default) | string | character vector

User-defined description for model, specified as the comma-separated pair consisting of `'Description'` and a string or character vector.

**Data Types:** `string` | `char`

`PredictorVars` — Predictor variables

all columns of `data` except for `ResponseVar` (default) | string array | cell array of character vectors

Predictor variables, specified as the comma-separated pair consisting of `'PredictorVars'` and a string array or cell array of character vectors. `PredictorVars` indicates which columns in the `data` input contain the predictor information. By default, `PredictorVars` is set to all the columns in the `data` input except for `ResponseVar`.

**Data Types:** `string` | `cell`

`ResponseVar` — Response variable

last column of `data` (default) | string | character vector

Response variable, specified as the comma-separated pair consisting of `'ResponseVar'` and a string or character vector. The response variable contains the EAD data and must be a numeric variable. EAD values are exposure amounts, not fractions between `0` and `1`; for example, the `EAD` column of the sample `EADData` table contains values such as `44740`. The model converts the EAD values to the scale selected by `ConversionMeasure` before fitting. By default, `ResponseVar` is set to the last column.

**Data Types:** `string` | `char`

`LimitVar` — Limit variable

last column of `data` (default) | string | character vector

Limit variable, specified as the comma-separated pair consisting of `'LimitVar'` and a string or character vector. `LimitVar` indicates which column in `data` contains the limit amount. `LimitVar` is required when `ConversionMeasure` is `'ccf'` or `'lcf'`.

**Data Types:** `string` | `char`

`DrawnVar` — Drawn variable

last column of `data` (default) | string | character vector

Drawn variable, specified as the comma-separated pair consisting of `'DrawnVar'` and a string or character vector. `DrawnVar` indicates which column in `data` contains the drawn amount. `DrawnVar` is required when `ConversionMeasure` is `'ccf'`.

**Data Types:** `string` | `char`

`ConversionMeasure` — Conversion measure for EAD response values

`"ccf"` (default) | character vector with value of `'ccf'` or `'lcf'` | string with value of `"ccf"` or `"lcf"`

Conversion measure for the EAD response values, specified as the comma-separated pair consisting of `'ConversionMeasure'` and a character vector or string.

- `"ccf"` — Credit conversion factor (CCF) is the portion of the undrawn amount that will be converted into credit. The undrawn amount is the limit minus the drawn amount. The EAD is therefore the drawn amount plus the CCF times the undrawn amount (`EAD = Drawn + CCF*(Limit - Drawn)`).
- `"lcf"` — Limit conversion factor (LCF) is a fraction of the limit representing the total exposure. The EAD is then defined as the LCF times the limit (`EAD = LCF*Limit`).

**Data Types:** `string` | `char`
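As a minimal sketch of the two conversion formulas above, the following script computes the CCF and LCF implied by one observed EAD value and then maps each back to the EAD scale. The numbers are illustrative only and are not taken from `EADData`; the variable names are hypothetical.

```matlab
% Illustrative values only (not taken from EADData).
Limit = 100000;   % credit limit
Drawn = 40000;    % amount already drawn
EAD   = 70000;    % observed exposure at default

% Conversion measures implied by the observed EAD.
CCF = (EAD - Drawn) / (Limit - Drawn);   % fraction of the undrawn amount
LCF = EAD / Limit;                       % fraction of the limit

% Map each conversion measure back to the EAD scale.
EADfromCCF = Drawn + CCF * (Limit - Drawn);
EADfromLCF = LCF * Limit;

fprintf('CCF = %.2f, LCF = %.2f\n', CCF, LCF);   % prints CCF = 0.50, LCF = 0.70
```

Note that the CCF is undefined when `Limit` equals `Drawn`, because the undrawn amount in the denominator is zero.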

`CensoringSide` — Censoring side

`"both"` (default) | character vector with value of `'left'`, `'right'`, or `'both'` | string with value of `"left"`, `"right"`, or `"both"`

Censoring side, specified as the comma-separated pair consisting of `'CensoringSide'` and a character vector or string. `CensoringSide` indicates whether the desired Tobit model is left-censored, right-censored, or censored on both sides.

**Data Types:** `string` | `char`

`LeftLimit` — Left-censoring limit

`0` (default) | numeric between `0` and `1`

Left-censoring limit, specified as the comma-separated pair consisting of `'LeftLimit'` and a scalar numeric between `0` and `1`.

**Data Types:** `double`

`RightLimit` — Right-censoring limit

`1` (default) | numeric between `0` and `1`

Right-censoring limit, specified as the comma-separated pair consisting of `'RightLimit'` and a scalar numeric between `0` and `1`.

**Data Types:** `double`

`SolverOptions` — `optimoptions` object

object

Options for fitting, specified as the comma-separated pair consisting of `'SolverOptions'` and an `optimoptions` object that is created using `optimoptions` from Optimization Toolbox™. The defaults for the `optimoptions` object are:

- `"Display"` — `"none"`
- `"Algorithm"` — `"sqp"`
- `"MaxFunctionEvaluations"` — `500` × Number of model coefficients
- `"MaxIterations"`

The number of Tobit model coefficients is determined at run time; it depends on the number of predictors and the number of categories in the categorical predictors.

**Data Types:** `object`
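The following sketch shows one way the `'SolverOptions'` argument might be used. It assumes that the solver is `fmincon`, which the `"sqp"` default suggests but which this page does not state, and the iteration limits are hypothetical values chosen for illustration. Running it requires Risk Management Toolbox and Optimization Toolbox.

```matlab
% Assumption: the "sqp" default algorithm suggests fmincon as the solver.
% The limits below are hypothetical, not recommended values.
options = optimoptions('fmincon', ...
    'Display', 'iter', ...
    'Algorithm', 'sqp', ...
    'MaxFunctionEvaluations', 2000, ...
    'MaxIterations', 1000);

load EADData.mat
eadModel = fitEADModel(EADData, "Tobit", ...
    'PredictorVars', {'UtilizationRate','Age','Marriage'}, ...
    'ConversionMeasure', "lcf", ...
    'LimitVar', 'Limit', ...
    'ResponseVar', 'EAD', ...
    'SolverOptions', options);
```

Setting `'Display'` to `'iter'` prints the solver progress, which can help when a fit stops at an iteration or evaluation limit.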

## Properties

`ModelID` — User-defined model ID

`"Tobit"` (default) | string

User-defined model ID, returned as a string.

**Data Types:** `string`

`Description` — User-defined description

`""` (default) | string

User-defined description, returned as a string.

**Data Types:** `string`

`UnderlyingModel` — Underlying statistical model

Tobit model object

This property is read-only.

Underlying statistical model, returned as a Tobit model object. In the example on this page, `disp(eadModel)` shows the property as `[1x1 risk.internal.credit.TobitModel]`, and displaying it prints the Tobit regression formula and the estimated coefficients, including `(Sigma)`.

**Data Types:** `object`

`PredictorVars` — Predictor variables

all columns of `data` except for the `ResponseVar` (default) | string array

Predictor variables, returned as a string array.

**Data Types:** `string`

`ResponseVar` — Response variable

last column of `data` (default) | string

Response variable, returned as a string.

**Data Types:** `string`

`LimitVar` — Limit variable

`[]` (default) | string

Limit variable, returned as a string.

**Data Types:** `string`

`DrawnVar` — Drawn variable

`[]` (default) | string

Drawn variable, returned as a string.

**Data Types:** `string`

`ConversionMeasure` — Conversion measure for EAD response values

`"ccf"` (default) | string with value of `"ccf"` or `"lcf"`

Conversion measure for the EAD response values, returned as a string.

**Data Types:** `string`

`CensoringSide` — Censoring side

`"both"` (default) | string with value of `"left"`, `"right"`, or `"both"`

This property is read-only.

Censoring side, returned as a string.

**Data Types:** `string`

`LeftLimit` — Left-censoring limit

`0` (default) | numeric between `0` and `1`

This property is read-only.

Left-censoring limit, returned as a scalar numeric between `0` and `1`.

**Data Types:** `double`

`RightLimit` — Right-censoring limit

`1` (default) | numeric between `0` and `1`

This property is read-only.

Right-censoring limit, returned as a scalar numeric between `0` and `1`.

**Data Types:** `double`

## Object Functions

| Function | Description |
|---|---|
| `predict` | Predict exposure at default |
| `modelDiscrimination` | Compute AUROC and ROC data |
| `modelDiscriminationPlot` | Plot ROC curve |
| `modelAccuracy` | Compute R-squared, RMSE, correlation, and sample mean error of predicted and observed EADs |
| `modelAccuracyPlot` | Scatter plot of predicted and observed EADs |

## Examples

### Create Tobit EAD Model

This example shows how to use `fitEADModel` to create a `Tobit` model for exposure at default (EAD).

**Load EAD Data**

Load the EAD data.

```
load EADData.mat
head(EADData)
```

`ans=`*8×6 table*

| UtilizationRate | Age | Marriage | Limit | Drawn | EAD |
|---|---|---|---|---|---|
| 0.24359 | 25 | not married | 44776 | 10907 | 44740 |
| 0.96946 | 44 | not married | 2.1405e+05 | 2.0751e+05 | 40678 |
| 0 | 40 | married | 1.6581e+05 | 0 | 1.6567e+05 |
| 0.53242 | 38 | not married | 1.7375e+05 | 92506 | 1593.5 |
| 0.2583 | 30 | not married | 26258 | 6782.5 | 54.175 |
| 0.17039 | 54 | married | 1.7357e+05 | 29575 | 576.69 |
| 0.18586 | 27 | not married | 19590 | 3641 | 998.49 |
| 0.85372 | 42 | not married | 2.0712e+05 | 1.7682e+05 | 1.6454e+05 |

    rng('default');
    NumObs = height(EADData);
    c = cvpartition(NumObs,'HoldOut',0.4);
    TrainingInd = training(c);
    TestInd = test(c);

**Select Model Type**

Select a model type of `Tobit` or `Regression`.

`ModelType = "Tobit";`

**Select Conversion Measure**

Select a conversion measure for the EAD response values.

`ConversionMeasure = "LCF";`

**Create Tobit EAD Model**

Use `fitEADModel` to create a `Tobit` model using the `EADData`.

    eadModel = fitEADModel(EADData,ModelType,'PredictorVars',{'UtilizationRate','Age','Marriage'}, ...
        'ConversionMeasure',ConversionMeasure,'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD');
    disp(eadModel);

      Tobit with properties:

            CensoringSide: "both"
                LeftLimit: 0
               RightLimit: 1
                  ModelID: "Tobit"
              Description: ""
          UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
            PredictorVars: ["UtilizationRate"    "Age"    "Marriage"]
              ResponseVar: "EAD"
                 LimitVar: "Limit"
                 DrawnVar: "Drawn"
        ConversionMeasure: "lcf"

Display the underlying model. The underlying model's response variable is the transformation of the EAD response data. Use the `'LimitVar'` and `'DrawnVar'` name-value arguments to modify the transformation.

    disp(eadModel.UnderlyingModel);

    Tobit regression model:
         EAD_lcf = max(0,min(Y*,1))
         Y* ~ 1 + UtilizationRate + Age + Marriage

    Estimated coefficients:
                                 Estimate         SE         tStat      pValue
                                __________    __________    ________    ________
        (Intercept)                0.22735      0.025005      9.0922           0
        UtilizationRate            0.47364      0.016531      28.652           0
        Age                     -0.0013929    0.00061479     -2.2657    0.023517
        Marriage_not married     -0.006888       0.01208    -0.57022     0.56856
        (Sigma)                    0.36419      0.003878      93.913           0

    Number of observations: 4378
    Number of left-censored observations: 0
    Number of uncensored observations: 4377
    Number of right-censored observations: 1
    Log-likelihood: -1791.06

**Predict EAD**

EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can specify the `predict` function with different options for the `'ModelLevel'` name-value argument.

    predictedEAD = predict(eadModel,EADData(TestInd,:),'ModelLevel','ead');
    predictedConversion = predict(eadModel,EADData(TestInd,:),'ModelLevel','ConversionMeasure');

**Validate EAD Model**

For model validation, use `modelDiscrimination`, `modelDiscriminationPlot`, `modelAccuracy`, and `modelAccuracyPlot`.

Use `modelDiscrimination` and then `modelDiscriminationPlot` to plot the ROC curve.

    ModelLevel = "ConversionMeasure";
    [DiscMeasure1,DiscData1] = modelDiscrimination(eadModel,EADData(TestInd,:),'ModelLevel',ModelLevel);
    modelDiscriminationPlot(eadModel,EADData(TestInd,:),'ModelLevel',ModelLevel,'SegmentBy','Marriage');

Use `modelAccuracy` and then `modelAccuracyPlot` to show a scatter plot of the predictions.

    YData = "Observed";
    [AccMeasure1,AccData1] = modelAccuracy(eadModel,EADData(TestInd,:),'ModelLevel',ModelLevel);
    modelAccuracyPlot(eadModel,EADData(TestInd,:),'ModelLevel',ModelLevel,'YData',YData);

Plot overlaid histograms of the observed and predicted values. Because `ModelLevel` is `"ConversionMeasure"`, the values are on the conversion measure scale.

    figure;
    histogram(AccData1.Observed);
    hold on;
    histogram(AccData1.("Predicted_" + ModelType));
    legend('Observed','Predicted');

## More About

### Exposure at Default Tobit Models

The exposure at default (EAD) Tobit models fit a Tobit model to EAD data.

Tobit models are “censored” regression models. Tobit models assume that the response variable can be observed only within certain limits, and no value outside the limits can be observed. Using `ModelLevel`, you can set the Tobit model level to `EAD`, `CCF`, or `LCF` conversion measures. The `EAD` model level does not have any range, the `CCF` conversion measure has a range of `-Inf` to `1`, and the `LCF` conversion measure has a range of `0` to `1`. A distribution of response values where there is a high frequency of observations at the limits is consistent with the model assumptions.

The Tobit model combines the following two formulas:

$$\begin{array}{l}Y=\mathrm{min}\left\{\mathrm{max}\left\{L,{Y}^{*}\right\},R\right\}\\ {Y}^{*}={\beta}_{0}+{\beta}_{1}{X}_{1}+\mathrm{...}+{\beta}_{p}{X}_{p}+\sigma \epsilon =X\beta +\sigma \epsilon \end{array}$$

where

- *Y* is the observed response variable, the observed EAD data for an EAD model.
- *L* is the left limit, the lower bound for the response values, typically `0` for EAD models.
- *R* is the right limit, the upper bound for the response values, typically `1` for EAD models.
- $Y^{*}$ is a latent, unobserved variable.
- $\beta_j$ is the coefficient of the *j*th predictor (or the intercept for *j* = `0`).
- σ is the standard deviation of the error term.
- ε is the error term, assumed to follow a standard normal distribution.

The first formula above is written using `min` and `max` operators and is equivalent to

$$Y=\begin{cases}L & \text{if } {Y}^{*}\le L\\ {Y}^{*} & \text{if } L<{Y}^{*}<R\\ R & \text{if } {Y}^{*}\ge R\end{cases}$$

The standard deviation of the error is explicitly indicated in the formulas.
Unlike traditional regression least-squares estimation, where the standard
deviation of the error can be inferred from the residuals, for Tobit models the
estimation is via maximum likelihood and the standard deviation needs to be
handled explicitly during the estimation. If there are *p*
predictor variables, the Tobit model estimates *p*+2
coefficients, namely, one coefficient for each predictor, plus an intercept,
plus a standard deviation.

Three censoring side options are supported in the Tobit EAD models with the
`CensoringSide`

name-value argument:

- `'both'` — This option is the default option, with censoring on both sides. The estimation uses left and right limits.
- `'left'` — The left-censored version of the model has no right limit (or *R* = ∞). The relationship between *Y* and $Y^{*}$ is *Y* = `max`{*L*, $Y^{*}$}.
- `'right'` — The right-censored version of the model has no left limit (or *L* = -∞). The relationship between *Y* and $Y^{*}$ is *Y* = `min`{$Y^{*}$, *R*}.
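The three censoring relationships can be sketched in base MATLAB as follows. The latent values in `Ystar` are hypothetical and chosen only to show values below, inside, and above the limits.

```matlab
L = 0;  R = 1;                        % left and right limits
Ystar = [-0.3 0 0.25 0.8 1 1.4];      % hypothetical latent values

Yboth  = min(max(L, Ystar), R);       % 'both':  censored at L and R
Yleft  = max(L, Ystar);               % 'left':  no right limit
Yright = min(Ystar, R);               % 'right': no left limit

disp(Yboth)    % 0  0  0.25  0.8  1  1
```

Only the values outside the limits change; values strictly between *L* and *R* are observed as they are.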

The parameters of the Tobit model are estimated using maximum likelihood. For
observation *i* = 1,…,*n*, the likelihood
function is

$$LF(\beta ,\sigma |{X}_{i},{Y}_{i})=\begin{cases}\Phi (L;{X}_{i}\beta ,\sigma ) & \text{if } {Y}_{i}\le L\\ \varphi ({Y}_{i};{X}_{i}\beta ,\sigma ) & \text{if } L<{Y}_{i}<R\\ 1-\Phi (R;{X}_{i}\beta ,\sigma ) & \text{if } {Y}_{i}\ge R\end{cases}$$

where

- Φ(*x*; *m*, *s*) is the cumulative normal distribution with mean *m* and standard deviation *s*.
- φ(*x*; *m*, *s*) is the normal density function with mean *m* and standard deviation *s*.

This likelihood function is for models censored on both sides. For
left-censored models, the right limit has no effect, and the likelihood function
has two cases only (*R* = ∞); likewise for right-censored
models (*L* = -∞).

The log-likelihood function is the sum of the logarithm of the likelihood functions for individual observations

$$LLF(\beta ,\sigma |X,Y)={\displaystyle \sum _{i=1}^{n}\mathrm{log}(LF(}\beta ,\sigma |{X}_{i},{Y}_{i}))$$

The parameters are estimated by maximizing the log-likelihood function. The only constraint is that the σ parameter must be positive.
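As a rough sketch of the log-likelihood for a model censored on both sides, the following script evaluates it at one trial parameter point. It is not the toolbox implementation: the data, `beta`, and `sigma` are hypothetical, and the normal CDF and density are written with base MATLAB functions so that no toolbox is needed.

```matlab
% Standard normal CDF and density using base MATLAB functions only.
Phi = @(z) 0.5 * erfc(-z ./ sqrt(2));
phi = @(z) exp(-0.5 * z.^2) ./ sqrt(2*pi);

% Hypothetical data: intercept column plus one predictor.
X = [1 0.1; 1 0.5; 1 0.9; 1 1.3];
Y = [0; 0.4; 0.8; 1];
L = 0;  R = 1;

beta  = [0.05; 0.75];   % trial coefficients (not fitted values)
sigma = 0.3;            % trial standard deviation, must be positive

mu = X * beta;                    % X_i * beta for each observation
isLeft  = Y <= L;                 % left-censored observations
isRight = Y >= R;                 % right-censored observations
isMid   = ~isLeft & ~isRight;     % uncensored observations

% Log of the likelihood for each observation, case by case.
logLF = zeros(size(Y));
logLF(isLeft)  = log(Phi((L - mu(isLeft)) ./ sigma));
logLF(isMid)   = log(phi((Y(isMid) - mu(isMid)) ./ sigma) ./ sigma);
logLF(isRight) = log(1 - Phi((R - mu(isRight)) ./ sigma));

LLF = sum(logLF);                 % log-likelihood at (beta, sigma)
```

An optimizer would then search over `beta` and `sigma` for the largest `LLF`, keeping `sigma` positive.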

To predict an EAD value, Tobit EAD models return the unconditional expected value of the response, given the predictor values

$$EA{D}_{i}^{pred}=E\left[{Y}_{i}|{X}_{i}\right]$$

The expression for the expected value can be separated into the cases

$$\begin{array}{l}E\left[Y\right]=E\left[Y|Y=L\right]P(Y=L)\\ +E\left[Y|L<Y<R\right]P(L<Y<R)\\ +E\left[Y|Y=R\right]P(Y=R)\end{array}$$

Using the previous expression and the properties of the (truncated) normal distribution, it follows that

$$E\left[{Y}_{i}|{X}_{i}\right]=\Phi ({a}_{i})L+(\Phi ({b}_{i})-\Phi ({a}_{i}))({X}_{i}\beta +\sigma {\lambda}_{i})+(1-\Phi ({b}_{i}))R$$

where

$${a}_{i}=\frac{L-{X}_{i}\beta}{\sigma},\quad {b}_{i}=\frac{R-{X}_{i}\beta}{\sigma},\quad \text{and}\quad {\lambda}_{i}=\frac{\varphi ({a}_{i})-\varphi ({b}_{i})}{\Phi ({b}_{i})-\Phi ({a}_{i})}$$

This expression applies to the models censored on both sides. For models
censored on one side only, the corresponding expressions can be derived from
here. For example, for left-censored models, let the *R* limit
in the expression above go to infinity, and the resulting expression is

$$E\left[{Y}_{i}|{X}_{i}\right]=\Phi ({a}_{i})L+(1-\Phi ({a}_{i}))\left({X}_{i}\beta \text{+}\sigma \text{}\frac{\varphi ({a}_{i})}{1-\Phi ({a}_{i})}\right)$$

Similarly, for right-censored models, the *L* limit is
decreased to minus infinity to get

$$E\left[{Y}_{i}|{X}_{i}\right]=\Phi ({b}_{i})\left({X}_{i}\beta -\sigma \text{}\frac{\varphi ({b}_{i})}{\Phi ({b}_{i})}\right)+(1-\Phi ({b}_{i}))R$$
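The expected value for the model censored on both sides can be sketched as follows, taking Φ and φ with a single argument to be the standard normal CDF and density. The linear predictor values in `Xb` and the value of `sigma` are hypothetical, and this is an illustration of the formula rather than the toolbox code.

```matlab
% Standard normal CDF and density using base MATLAB functions only.
Phi = @(z) 0.5 * erfc(-z ./ sqrt(2));
phi = @(z) exp(-0.5 * z.^2) ./ sqrt(2*pi);

L = 0;  R = 1;
sigma = 0.36;                 % hypothetical standard deviation
Xb = [-0.5; 0.5; 1.5];        % hypothetical values of X_i * beta

a = (L - Xb) ./ sigma;
b = (R - Xb) ./ sigma;
lambda = (phi(a) - phi(b)) ./ (Phi(b) - Phi(a));

% Left-censored part + uncensored part + right-censored part.
EY = Phi(a).*L + (Phi(b) - Phi(a)).*(Xb + sigma.*lambda) + (1 - Phi(b)).*R;
disp(EY)
```

The predictions stay within the limits even when `Xb` lies outside them, and the middle value is `0.5` because `Xb = 0.5` is symmetric between `L = 0` and `R = 1`. When `Xb` is very far from both limits, `Phi(b) - Phi(a)` can underflow to zero, so a production implementation would need to guard that division.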



**Introduced in R2021b**
