This example shows how to fit different types of models to loss given default (LGD) data. This example demonstrates the following approaches:
For all of these approaches, this example shows:
How to fit a model using training data where the LGD is a function of other variables or predictors.
How to predict on testing data.
The Model Comparison section contains a detailed comparison that includes visualizations and several prediction error metrics for of all models in this example.
LGD is one of the main parameters for credit risk analysis. Although there are different approaches to estimate credit loss reserves and credit capital, common methodologies require the estimation of probabilities of default (PD), loss given default (LGD), and exposure at default (EAD). The reserves and capital requirements are computed using formulas or simulations that use these parameters. For example, the loss reserves are usually estimated as the expected loss (EL), given by the following formula:
.
Practitioners have decades of experience modeling and forecasting PDs. However, the modeling of LGD (and also EAD) started much later. One reason is the relative scarcity of LGD data compared to PD data. Credit default data (for example, missed payments) is easier to collect and more readily available than are the losses ultimately incurred in the event of a default. When an account is moved to the recovery stage, the information can be transferred to a different system, loans can get consolidated, the recovery process may take a long time, and multiple costs are incurred during the process, some which are hard to track in detail. However, banks have stepped up their efforts to collect data that can be used for LGD modeling, in part due to regulations that require the estimation of these risk parameters, and the modeling of LGD (and EAD) is now widespread in industry.
This example uses simulated LGD data, but the workflow has been applied to real data sets to fit LGD models, predict LGD values, and compare models. The focus of this example is not to suggest a particular approach, but to show how these different models can be fit, how the models are used to predict LGD values, and how to compare the models. This example is also a starting point for variations and extensions of these models; for example, you may want to use more advanced classification and regression tools as part of a two-stage model.
The three predictors in this example are loan specific. However, you can use the approaches described in this example with data sets that include multiple predictors and even macroeconomic variables. Also, you can use models that include macroeconomic predictors for stress testing or lifetime LGD modeling to support regulatory requirements such as CCAR, IFRS 9, and CECL.
The data set in this example is simulated data that captures common features of LGD data. For example, a common feature is the distribution of LGD values, which has high frequencies at 0 (full recovery), and also many observations at 1 (no recovery at all). Another characteristic of LGD data is a significant amount of "noise" or "unexplained" data. You can visualize this "noise" in scatter plots of the response against the predictors, where the dots do not seem to follow a clear trend, and yet some underlying relationships can be detected. Also, it is common to get significant prediction errors for LGD models. Empirical studies show that LGD models have high prediction errors in general. For example, in [4] the authors report R-squared values ranging from 4% to 43% for a range of models across different portfolios. In this example, all approaches get R-squared values just under 10%. Moreover, finding useful predictors in practice may require important insights into the lending environment of a specific portfolio, for example, knowledge of the legal framework and the collection process [2]. The simulated data set includes only three predictors and these are variables frequently found in LGD models, namely, the loan-to-value ratio, the age of the loan, and whether the borrower lives in the property or if the borrower bought it for investment purposes.
Data preparation for LGD modeling is beyond the scope of this example. This example assumes the data has been previously prepared, since the focus of the example is on how to fit LGD models and how to use them for prediction. Data preparation for LGD modeling requires a significant amount of work in practice. Data preparation requires consolidation of account information, pulling data from multiple data sources, accounting for different costs and discount rates, and screening predictors [1] [2].
Load the data set from the LGDData.mat
file. The data set is stored in the data
table. It contains the three predictors and the LGD variable, which is the response variable.
Here is a preview of the data and the descriptions of the data set and the variables.
load('LGDData.mat')
disp(head(data))
LTV Age Type LGD _______ _______ ___________ _________ 0.89101 0.39716 residential 0.032659 0.70176 2.0939 residential 0.43564 0.72078 2.7948 residential 0.0064766 0.37013 1.237 residential 0.007947 0.36492 2.5818 residential 0 0.796 1.5957 residential 0.14572 0.60203 1.1599 residential 0.025688 0.92005 0.50253 investment 0.063182
disp(data.Properties.Description)
Loss given default (LGD) data. This is a simulated data set.
disp([data.Properties.VariableNames' data.Properties.VariableDescriptions'])
{'LTV' } {'Loan-to-Value (LTV) ratio at...'} {'Age' } {'Age of the loan in years at ...'} {'Type'} {'Type of property, either res...'} {'LGD' } {'Loss given default' }
LGD data commonly has values of 0
(no losses, full recovery) or 1
(no recovery at all). The distribution of values in between 0
and 1
takes different shapes depending on the portfolio type and other characteristics.
histogram(data.LGD) title('LGD Distribution') ylabel('Frequency') xlabel('Observed LGD')
Explore the relationships between the predictors and the response. The Spearman correlation between the selected predictor and the LGD is displayed first. The Spearman correlation is one of the rank order statistics commonly used for LGD modeling [5].
SelectedPredictor ='LTV'; fprintf('Spearman correlation between %s and LGD: %g',SelectedPredictor,corr(double(data.(SelectedPredictor)),data.LGD,'Type','Spearman'))
Spearman correlation between LTV and LGD: 0.271204
if isnumeric(data.(SelectedPredictor)) scatter(data.(SelectedPredictor),data.LGD) X = [ones(height(data),1) data.(SelectedPredictor)]; b = X\data.LGD; y = X*b; hold on plot(data.(SelectedPredictor),y) ylim([0 1]) hold off xlabel(SelectedPredictor) ylabel('LGD') end
For numeric predictors, there is a scatter plot of the LGD against the selected predictor values, with a superimposed linear fit. There is a significant amount of noise in the data, with points scattered all over the plot. This is a common situation for LGD data modeling. The density of the dots is sometimes different in different areas of the plot, suggesting relationships. The slope of the linear fit and the Spearman correlation give more information about the relationship between the selected predictor and the response.
Visually assessing the density of the points in a scatter plot might not be a reliable approach to understand the distribution of the data. To better understand the distribution of LGD values for different levels of a selected predictor, create a box plot.
% Choose the number of discretization levels for numeric predictors NumLevels =3; if isnumeric(data.(SelectedPredictor)) PredictorEdges = linspace(min(data.(SelectedPredictor)),max(data.(SelectedPredictor)),NumLevels+1); PredictorDiscretized = discretize(data.(SelectedPredictor),PredictorEdges,'Categorical',string(PredictorEdges(2:end))); boxplot(data.LGD,PredictorDiscretized) xlabel([SelectedPredictor ' Discretized']) ylabel('LGD') else boxplot(data.LGD,data.(SelectedPredictor)) xlabel(SelectedPredictor) ylabel('LGD') end
For categorical data, the box plot is straightforward since a small number of levels are already given. For numeric data, you can discretize the data first and then generate the box plot. Different box sizes and heights show that the distribution of LGD values changes for different predictor levels. A monotonic trend in the median (red horizontal line in the center of the boxes) shows a potential linear relationship between the predictor and the LGD (though possibly a mild relationship, due to the wide distributions).
The basic approach to predict LGD is to simply use the mean of the LGD data. Although this is a straightforward approach, easy to understand and use, the downside is that the mean is a constant value and this approach sheds no light on the sensitivity of LGD to other risk factors. In particular, the predictors in the data set are ignored.
To introduce sensitivity to predictors, the mean LGD values can be estimated over different groups or segments of the data, where the groups are defined using ranges of the predictor values. This approach is still a relatively straightforward approach, yet it can noticeably reduce the prediction error compared to a single mean LGD value for all observations.
To start, separate the data set into training and testing data. The same training and testing data sets are used for all approaches in this example.
NumObs = height(data); % Reset the random stream state, for reproducibility % Comment this line out to generate different data partitions each time the example is run rng('default'); c = cvpartition(NumObs,'HoldOut',0.4); TrainingInd = training(c); TestInd = test(c);
In this example, the groups are defined using the three predictors. LTV
is discretized into low and high levels. Age
is discretized into young and old loans. Type
already has two levels, namely, residential and investment. The groups are all the combinations of these values (for example, low LTV, young loan, residential, and so on).
The number of levels and the specific cut off points are for illustration purposes only, you can base other discretizations on different criteria. Also, using all predictors for the discretization may not be ideal when the data set contains many predictors. In some cases, using a single predictor, or a couple of predictors, may be enough to find useful groups with distinct mean LGD values. When the data includes macro information, the grouping may include a macro variable; for example, the mean LGD value should be different over recessions vs. economic expansions.
Compute the mean LGD over the eight data groups using the training data.
% Discretize LTV LTVEdges = [0 0.5 max(data.LTV)]; data.LTVDiscretized = discretize(data.LTV,LTVEdges,'Categorical',{'low','high'}); % Discretize Age AgeEdges = [0 2 max(data.Age)]; data.AgeDiscretized = discretize(data.Age,AgeEdges,'Categorical',{'young','old'}); % Find group means on training data gs = groupsummary(data(TrainingInd,:),{'LTVDiscretized','AgeDiscretized','Type'},'mean','LGD'); disp(gs)
LTVDiscretized AgeDiscretized Type GroupCount mean_LGD ______________ ______________ ___________ __________ ________ low young residential 163 0.12166 low young investment 26 0.087331 low old residential 175 0.021776 low old investment 23 0.16379 high young residential 1134 0.16489 high young investment 257 0.25977 high old residential 265 0.066068 high old investment 50 0.11779
For prediction, the test data is mapped into the 8 groups, and then the corresponding group mean is set as the predicted LGD value.
LGDGroupTest = (data.LTVDiscretized(TestInd)=='high')*4 +... (data.AgeDiscretized(TestInd)=='old')*2 +... (data.Type(TestInd)=='investment') + 1; LGDPredictedByGroupMeans = gs.mean_LGD(LGDGroupTest);
Store the observed LGD and the predicted LGD in a new table dataLGDPredicted
. This table stores predicted LGD values for all other approaches in the example.
dataLGDPredicted = table; dataLGDPredicted.Observed = data.LGD(TestInd); dataLGDPredicted.GroupMeans = LGDPredictedByGroupMeans; disp(head(dataLGDPredicted))
Observed GroupMeans _________ __________ 0.0064766 0.066068 0.007947 0.12166 0.063182 0.25977 0 0.066068 0.10904 0.16489 0 0.16489 0.89463 0.16489 0 0.021776
The Model Comparison section has a more detailed comparison of all models that includes visualizations and prediction error metrics.
A natural approach is to use a regression model to explicitly model a relationship between the LGD and some predictors. LGD data, however, is bounded in the unit interval, whereas the response variable for linear regression models is, in theory, unbounded.
To apply simple linear regression approaches, the LGD data can be transformed. A common transformation is the logit function, which leads to the following regression model:
LGD values of 0
or 1
cause the logit function to take infinite values, so the LGD data is typically truncated before applying the transformation.
data.LGDTruncated = data.LGD; data.LGDTruncated(data.LGD==0) = 0.00001; data.LGDTruncated(data.LGD==1) = 0.99999; data.LGDLogit = log(data.LGDTruncated./(1-data.LGDTruncated));
Below is the histogram of the transformed LGD data that uses the logit function. The range of values spans positive and negative values, which is consistent with the linear regression requirements. The distribution still shows significant mass probability points at the ends of the distribution.
histogram(data.LGDLogit)
title('Logit Transformation of Truncated LGD Data')
Other transformations are suggested in the literature [1]. For example, instead of the logit function, the truncated LGD values can be mapped with the inverse standard normal distribution (similar to a probit model).
Fit the model using the training data.
mdlRegression = fitlm(data(TrainingInd,:),'LGDLogit ~ 1 + LTV + Age + Type');
disp(mdlRegression)
Linear regression model: LGDLogit ~ 1 + LTV + Age + Type Estimated Coefficients: Estimate SE tStat pValue ________ ________ _______ __________ (Intercept) -4.7549 0.36041 -13.193 3.0997e-38 LTV 2.8565 0.41777 6.8377 1.0531e-11 Age -1.5397 0.085716 -17.963 3.3172e-67 Type_investment 1.4358 0.2475 5.8012 7.587e-09 Number of observations: 2093, Error degrees of freedom: 2089 Root Mean Squared Error: 4.24 R-squared: 0.206, Adjusted R-Squared: 0.205 F-statistic vs. constant model: 181, p-value = 2.42e-104
The model coefficients match the findings in the exploratory data analysis, with a positive coefficient for LTV
, a negative coefficient for Age
, and a positive coefficient for investment properties in the Type
variable.
The model predictions are on the transformed space, therefore the inverse logit transformation (also known as the logistic, or sigmoid function) must be applied to the model prediction to get a final predicted value for LGD.
LogitLGDPredicted = predict(mdlRegression,data(TestInd,:)); dataLGDPredicted.Regression = 1./(1+exp(-LogitLGDPredicted)); disp(head(dataLGDPredicted))
Observed GroupMeans Regression _________ __________ __________ 0.0064766 0.066068 0.00091169 0.007947 0.12166 0.0036758 0.063182 0.25977 0.18774 0 0.066068 0.0010877 0.10904 0.16489 0.011213 0 0.16489 0.041992 0.89463 0.16489 0.052947 0 0.021776 3.7188e-06
The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error metrics. In particular, the histogram of the predicted LGD values shows that the regression model predicts many LGD values near zero, even though the high probability near zero was not explicitly modeled.
In a beta regression model for LGD, the model does not directly predict a single LGD value, it predicts an entire distribution of LGDs (given the predictor values). From that distribution, a value must be determined to predict a single LGD value for a loan, typically the mean of that distribution.
Technically, given the predictor values and model coefficients and , you can:
Compute values for the parameters (mean) and (sometimes called the "sample size") of the beta distribution with the following formulas:
Compute values for and , the typical parameters of the beta distribution, with these formulas:
Evaluate the density function of the corresponding beta distribution for a given level of , where is the gamma function; see [1] for details:
For fitting the model, once the density function is evaluated, you can update the likelihood function and find the optimal coefficients with a maximum likelihood approach. See the Local Functions section where the maximum likelihood function hLogLikelihoodBeta
is defined.
For prediction, once the model coefficients are fit, a prediction can be made, typically using the mean of the distribution, that is, the parameter, as the predicted LGD value.
Fit the beta regression model using training data with maximum likelihood. The maximization of the hLogLikelihoodBeta
function is performed with the unconstrained solver fminunc
from Optimization Toolbox™.
% Convert Type to numeric binary TypeDummy = dummyvar(data.Type); data.Type01 = TypeDummy(:,2); ColumnNames = {'Intercept' 'LTV' 'Age' 'Type'}; NumCols = length(ColumnNames); % Initial guess x0 = 0.1*ones(2*NumCols,1); % Predictors Matrix and LGD, training % Use truncated LGD to avoid numerical issues at the boundaries XTrain = [ones(sum(TrainingInd),1) data.LTV(TrainingInd) data.Age(TrainingInd) data.Type01(TrainingInd)]; LGDTrain = data.LGDTruncated(TrainingInd); % Minimize negative likelihood objFunctionBeta = @(x)(-hLogLikelihoodBeta(x,XTrain,LGDTrain)); [Estimate,~,~,~,~,Hessian] = fminunc(objFunctionBeta,x0);
Computing finite-difference Hessian using objective function. Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance.
ParameterNamesBeta = [strcat('Mu_',ColumnNames) strcat('Nu_',ColumnNames)]; ModelParametersBeta = array2table(Estimate,'RowNames',ParameterNamesBeta); ModelParametersBeta.StdError = sqrt(diag(inv(Hessian))); ModelParametersBeta.DF = size(XTrain,1)*ones(length(Estimate),1); ModelParametersBeta.T = Estimate./ModelParametersBeta.StdError; ModelParametersBeta.PValue = 2*(1-tcdf(abs(ModelParametersBeta.T),ModelParametersBeta.DF)); ModelParametersBeta.ConfidenceInterval = Estimate+ModelParametersBeta.StdError*[-1.96 1.96]; disp(ModelParametersBeta)
Estimate StdError DF T PValue ConfidenceInterval ________ ________ ____ _______ __________ ____________________ Mu_Intercept -1.3772 0.13201 2093 -10.433 0 -1.636 -1.1185 Mu_LTV 0.6027 0.15087 2093 3.9947 6.7017e-05 0.30698 0.89841 Mu_Age -0.47464 0.040264 2093 -11.788 0 -0.55356 -0.39572 Mu_Type 0.4537 0.085143 2093 5.3287 1.0954e-07 0.28682 0.62058 Nu_Intercept -0.16338 0.12591 2093 -1.2976 0.19457 -0.41017 0.083406 Nu_LTV 0.055904 0.14719 2093 0.37981 0.70412 -0.23259 0.3444 Nu_Age 0.22888 0.040335 2093 5.6743 1.5858e-08 0.14982 0.30793 Nu_Type -0.14101 0.078154 2093 -1.8042 0.071342 -0.29419 0.012176
For prediction, recall that the beta regression links the predictors to an entire beta distribution. For example, suppose that a loan has an LTV
of 0.7
and an Age of 1.1
years, and it is an investment property. The beta regression model gives us a prediction for the and parameters for this loan, and the model predicts that for this loan the range of possible LGD values follows the corresponding beta distribution.
XExample = [1 0.7 1.1 1]; MuExample = 1/(1+exp(-XExample*Estimate(1:NumCols))); NuExample = exp(XExample*Estimate(NumCols+1:end)); AlphaExample = MuExample*NuExample; BetaExample = (1-MuExample)*NuExample; xDomain = 0.01:0.01:0.99; pBeta = betapdf(xDomain,AlphaExample,BetaExample); plot(xDomain,pBeta) title('Predicted Distribution, Single Loan') xlabel('Possible LGD') ylabel('Predicted Density')
The shape of the distribution has the U-shaped pattern of the data. However, this is a predicted distribution of LGD values for a single loan. This distribution can be very useful for simulation purposes. However, to predict an LGD value for this loan, a method is required that goes from an entire distribution to a single value.
One way to predict would be to randomly draw a value from the previous distribution. This would tend to give predicted values towards the ends of the unit interval, and the overall shape of the distribution for a data set would match the U-shaped patter of observed LGD values. However, even if the shape of the distribution looked correct, a random draw from the distribution does not work well for prediction purposes. Two points with the same predictor values would have two different predicted LGDs, which is counterintuitive. Moreover, the prediction error at the observation level could be large, since many loans with small observed LGDs could get random predictions of large LGDs, and vice versa.
To reduce the prediction error at the individual level, the expected value of the beta distribution is typically used to predict. The distribution of predicted values with this approach does not have the expected U-shaped pattern because the mean value tends to be away from the boundaries of the unit interval. However, by using the mean of the beta distribution, all observations with the same predictor values get the same predicted LGDs. Moreover, the mean may not be close to values that are on the ends of the distribution, but the average error might be smaller compared to the random draws from the previous approach.
Predict using the mean of the beta distribution. Remember that the expected value of the distribution is the parameter, so the mean value prediction is straightforward.
XTest = [ones(sum(TestInd),1) data.LTV(TestInd) data.Age(TestInd) data.Type01(TestInd)]; MuTest = 1./(1+exp(-XTest*Estimate(1:NumCols))); dataLGDPredicted.Beta = MuTest; disp(head(dataLGDPredicted))
Observed GroupMeans Regression Beta _________ __________ __________ ________ 0.0064766 0.066068 0.00091169 0.093695 0.007947 0.12166 0.0036758 0.14915 0.063182 0.25977 0.18774 0.35262 0 0.066068 0.0010877 0.096434 0.10904 0.16489 0.011213 0.18858 0 0.16489 0.041992 0.2595 0.89463 0.16489 0.052947 0.26767 0 0.021776 3.7188e-06 0.021315
The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error metrics. In particular, the histogram of the predicted LGD values shows that the beta regression approach does not produce a U-shaped distribution. However, this approach does have good performance under the other metrics reported.
Tobit or censored regression is designed for models where the response is bounded. The idea is that there is an underlying (latent) linear model but that the observed response values, in this case the LGD values, are truncated. For this example, model the 0
lower bound (left censored) using the model formula
with:
The model parameters are all the s and the standard deviation of the error . The parameters are estimated using maximum likelihood. See the Local Functions section, where the maximum likelihood function hLogLikelihoodTobitLeftCensored
is defined. The formulas for the left and right censored version of the model are also included at the end of the Local Functions section.
Fit the Tobit regression model using training data and maximum likelihood. The minimization of the hLogLikelihoodTobitLeftCensored
function is performed with the constrained solver fmincon
from Optimization Toolbox™. The only constraint is on the standard deviation parameter such that .
x0 = 0.1*ones(NumCols+1,1);
lb = -Inf*ones(NumCols+1,1);
lb(end) = 1e-6; % Sigma must be positive
objFunctionTobit = @(x)(-hLogLikelihoodTobitLeftCensored(x,XTrain,data.LGD(TrainingInd)));
[Estimate,~,~,~,~,~,Hessian] = fmincon(objFunctionTobit,x0,[],[],[],[],lb);
Local minimum possible. Constraints satisfied. fmincon stopped because the size of the current step is less than the value of the step size tolerance and constraints are satisfied to within the value of the constraint tolerance.
Estimate = Estimate(:); ModelParametersTobit = array2table(Estimate,... "RowNames",[ColumnNames {'Sigma'}],"VariableNames",{'Estimate'}); ModelParametersTobit.StdError = sqrt(diag(inv(Hessian))); ModelParametersTobit.DF = size(XTrain,1)*ones(length(Estimate),1); ModelParametersTobit.T = Estimate./ModelParametersTobit.StdError; ModelParametersTobit.PValue = 2*(1-tcdf(abs(ModelParametersTobit.T),ModelParametersTobit.DF)); ModelParametersTobit.ConfidenceInterval = Estimate+ModelParametersTobit.StdError*[-1.96 1.96]; disp(ModelParametersTobit)
Estimate StdError DF T PValue ConfidenceInterval _________ _________ ____ _______ __________ _____________________ Intercept 0.057338 0.0085801 2093 6.6826 2.9994e-11 0.040521 0.074154 LTV 0.20031 0.041117 2093 4.8718 1.1889e-06 0.11972 0.28091 Age -0.094065 0.0058637 2093 -16.042 0 -0.10556 -0.082572 Type 0.10072 0.015867 2093 6.3479 2.6682e-10 0.069621 0.13182 Sigma 0.28835 0.0082123 2093 35.112 0 0.27225 0.30444
Perform prediction with test data. For prediction, one could think of applying the Tobit formula directly, setting the noise term . However, because the noise term is inside the max operator, this would not match the actual expected value of the LGD under this model: . When the expectation is taken, a different closed-form formula is obtained. In fact, there are two possibilities for the expected values commonly used to predict Tobit models, one is conditional, and one is unconditional. The hPredictTobitLeftCensored
function implements the application of the direct formula , as well as, the conditional and unconditional expectations (using the optional third input). By default, the function performs prediction with the unconditional expectation. See the Local Functions section for details.
dataLGDPredicted.Tobit = hPredictTobitLeftCensored(Estimate,XTest); disp(head(dataLGDPredicted))
Observed GroupMeans Regression Beta Tobit _________ __________ __________ ________ ________ 0.0064766 0.066068 0.00091169 0.093695 0.087026 0.007947 0.12166 0.0036758 0.14915 0.12275 0.063182 0.25977 0.18774 0.35262 0.31806 0 0.066068 0.0010877 0.096434 0.092504 0.10904 0.16489 0.011213 0.18858 0.16539 0 0.16489 0.041992 0.2595 0.22152 0.89463 0.16489 0.052947 0.26767 0.23471 0 0.021776 3.7188e-06 0.021315 0.010055
The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error with different metrics. As with the beta regression, the histogram of the predicted LGD values for the Tobit model does not have a U-shaped distribution, but it ranks well compared to other models.
Two-stage LGD models separate the case with no losses (LGD equal to 0
) from the cases with actual losses (LGD greater than 0
) and build two models. The stage 1 model is a classification model to predict whether the loan will have positive LGD. The stage 2 model a regression-type model to predict the actual LGD when the LGD is expected to be positive. The prediction is the expected value of the two combined models, which is the product of the probability of having a loss (stage 1 prediction) times the expected LGD value (stage 2 prediction).
In this example, a logistic regression model is used for the stage 1. Stage two is a regression on a logit transformation of the positive LGD data. More sophisticated models can be used for stage 1 and stage 2 models, see for example [4] or [6]. Also, another extension is to explicitly handle the LGD = 1
boundary. The stage 1 model would produce probabilities of observing an LGD of 0
, an LGD of 1
, and an LGD value strictly between 0
and 1
. The stage 2 model would predict LGD values when the LGD is expected to be strictly between 0
and 1
.
Fit the stage 1 model using the training data. The response variable is an indicator with a value of 1
if the observed LGD is positive, and 0
if the observed LGD is zero.
IndLGDPositive = data.LGD>0; data.LGDPositive = IndLGDPositive; disp(head(data))
LTV Age Type LGD LTVDiscretized AgeDiscretized LGDTruncated LGDLogit Type01 LGDPositive _______ _______ ___________ _________ ______________ ______________ ____________ ________ ______ ___________ 0.89101 0.39716 residential 0.032659 high young 0.032659 -3.3884 0 true 0.70176 2.0939 residential 0.43564 high old 0.43564 -0.25887 0 true 0.72078 2.7948 residential 0.0064766 high old 0.0064766 -5.0331 0 true 0.37013 1.237 residential 0.007947 low young 0.007947 -4.827 0 true 0.36492 2.5818 residential 0 low old 1e-05 -11.513 0 false 0.796 1.5957 residential 0.14572 high young 0.14572 -1.7686 0 true 0.60203 1.1599 residential 0.025688 high young 0.025688 -3.6357 0 true 0.92005 0.50253 investment 0.063182 high young 0.063182 -2.6965 1 true
mdl1 = fitglm(data(TrainingInd,:),"LGDPositive ~ 1 + LTV + Age + Type",'Distribution',"binomial"); disp(mdl1)
Generalized linear regression model: logit(LGDPositive) ~ 1 + LTV + Age + Type Distribution = Binomial Estimated Coefficients: Estimate SE tStat pValue ________ ________ _______ __________ (Intercept) 1.3157 0.21193 6.2083 5.3551e-10 LTV 1.3159 0.25328 5.1954 2.0433e-07 Age -0.79597 0.053607 -14.848 7.1323e-50 Type_investment 0.66784 0.17019 3.9241 8.7051e-05 2093 observations, 2089 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 404, p-value = 2.68e-87
An ROC plot for the stage 1 model is commonly reported using test data.
PredictedProbLGDPositive = predict(mdl1,data(TestInd,:));
[x,y,~,AUC] = perfcurve(data.LGDPositive(TestInd),PredictedProbLGDPositive,1);
plot(x,y)
title(sprintf('ROC Stage 1 Model (AUROC: %g)',AUC))
Fit the stage 2 model using only the training data with a positive LGD. This is the same type of model used earlier in the Regression section, however, this time it is fitted using only observations from the training data with positive LGDs.
dataLGDPositive = data(TrainingInd&IndLGDPositive,{'LTV','Age','Type','LGDTruncated'}); dataLGDPositive.LGDTransform = log(dataLGDPositive.LGDTruncated./(1-dataLGDPositive.LGDTruncated)); mdl2 = fitlm(dataLGDPositive,"LGDTransform ~ 1 + LTV + Age + Type"); disp(mdl2)
Linear regression model: LGDTransform ~ 1 + LTV + Age + Type Estimated Coefficients: Estimate SE tStat pValue ________ ________ _______ __________ (Intercept) -2.9083 0.27538 -10.561 3.2039e-25 LTV 1.3883 0.31838 4.3604 1.384e-05 Age -0.46116 0.081939 -5.6281 2.1608e-08 Type_investment 0.78223 0.18096 4.3226 1.6407e-05 Number of observations: 1546, Error degrees of freedom: 1542 Root Mean Squared Error: 2.8 R-squared: 0.0521, Adjusted R-Squared: 0.0503 F-statistic vs. constant model: 28.3, p-value = 8.73e-18
Perform prediction on the test data using the combined models for stage 1 and stage 2. The predicted LGD is the product of the probability of observing a positive LGD from the stage 1 model times the expected LGD value predicted by the stage 2 model.
% Predict in the transformed space and apply inverse logit to recover the % LGD prediction for stage 2 PredictedLGDPositive = predict(mdl2,data(TestInd,:)); PredictedLGDPositive = 1./(1+exp(-PredictedLGDPositive)); % PredictedProbLGDPositive is computed before the ROC curve above % Final LGD prediction is the product of stage 1 and stage 2 predictions dataLGDPredicted.TwoStage = PredictedProbLGDPositive.*PredictedLGDPositive; disp(head(dataLGDPredicted))
Observed GroupMeans Regression Beta Tobit TwoStage _________ __________ __________ ________ ________ _________ 0.0064766 0.066068 0.00091169 0.093695 0.087026 0.020038 0.007947 0.12166 0.0036758 0.14915 0.12275 0.034025 0.063182 0.25977 0.18774 0.35262 0.31806 0.2388 0 0.066068 0.0010877 0.096434 0.092504 0.022818 0.10904 0.16489 0.011213 0.18858 0.16539 0.060072 0 0.16489 0.041992 0.2595 0.22152 0.097685 0.89463 0.16489 0.052947 0.26767 0.23471 0.11142 0 0.021776 3.7188e-06 0.021315 0.010055 0.0003689
The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error metrics. This approach also ranks well compared to other models and the histogram of the predicted LGD values shows high frequencies near 0
.
To evaluate the performance of LGD models, different metrics are commonly used. One metric is the R-squared of the linear fit regressing the observed LGD values on the predicted values. A second metric is some correlation or rank order statistic; this example uses the Spearman correlation. For prediction error, root mean squared error (RMSE) is a common metric. Also, a simple metric sometimes reported is the difference between the mean LGD value in the training data and the mean LGD value of the predictions. These four metrics are reported below, sorted by decreasing R-squared values.
ModelNames = dataLGDPredicted.Properties.VariableNames(2:end); % Remove 'Observed' NumModels = length(ModelNames); SampleMeanError = zeros(NumModels,1); RSquared = zeros(NumModels,1); Spearman = zeros(NumModels,1); RMSE = zeros(NumModels,1); lmAll = struct; meanLGDTest = mean(dataLGDPredicted.Observed); for ii=1:NumModels % R-squared, and store linear model fit for visualization section Formula = ['Observed ~ 1 + ' ModelNames{ii}]; lmAll.(ModelNames{ii}) = fitlm(dataLGDPredicted,Formula); RSquared(ii) = lmAll.(ModelNames{ii}).Rsquared.Ordinary; % Spearman correlation Spearman(ii) = corr(dataLGDPredicted.Observed,dataLGDPredicted.(ModelNames{ii}),'type','Spearman'); % Root mean square error RMSE(ii) = sqrt(mean((dataLGDPredicted.Observed-dataLGDPredicted.(ModelNames{ii})).^2)); % Sample mean error SampleMeanError(ii) = mean(dataLGDPredicted.(ModelNames{ii}))-meanLGDTest; end PerformanceMetrics = table(RSquared,Spearman,RMSE,SampleMeanError,'RowNames',ModelNames); PerformanceMetrics = sortrows(PerformanceMetrics,'RSquared','descend'); disp(PerformanceMetrics)
RSquared Spearman RMSE SampleMeanError ________ ________ _______ _______________ TwoStage 0.090814 0.41987 0.24197 -0.060619 Tobit 0.085498 0.42224 0.23685 0.032766 Beta 0.080804 0.41557 0.24112 0.052396 Regression 0.070867 0.42152 0.25988 -0.10759 GroupMeans 0.041622 0.33807 0.2406 0.0078124
For the particular training vs. test partition used in this example, the two-stage model has the highest R-squared, although for other partitions, Tobit has the highest R-squared value. Even though the group means approach does not have a high R-squared value, it typically has the smallest sample mean error (mean of predicted LGD values minus mean LGD in the test data). The group means are also competitive for the RMSE metric.
Report the model performance one approach at a time, including visualizations. Display the metrics for the selected model.
ModelSelected =
"TwoStage";
disp(PerformanceMetrics(ModelSelected,:))
RSquared Spearman RMSE SampleMeanError ________ ________ _______ _______________ TwoStage 0.090814 0.41987 0.24197 -0.060619
Plot the regression fit (observed LGD vs. predicted LGD), which is a common visual tool to assess the model performance. The R-squared reported above is the R-squared of this regression. The plot shows a significant amount of error for all models. A good predictive model would have the points located mostly along the diagonal, and not be scattered all over the unit square. However, the metrics above do show some differences in predictive performance for different models that can be important in practice.
plot(lmAll.(ModelSelected)) xlim([0 1]) ylim([0 1])
Compare the histograms of the predicted and observed LGD values. For some models, the distribution of predicted values shows high frequencies near zero, similar to the U shape of the observed LGD distribution. However, matching the shape of the distribution does not mean high accuracy at the level of individual predictions; some models show better prediction error even though their histogram does not have a U shape.
LGDEdges = 0:0.1:1; % Ten bins to better show the distribution shape y1 = histcounts(dataLGDPredicted.(ModelSelected),LGDEdges); y2 = histcounts(dataLGDPredicted.Observed,LGDEdges); bar((LGDEdges(1:end-1)+LGDEdges(2:end))/2,[y1; y2]) title(strcat(ModelSelected,' Model')) ylabel('Frequency') xlabel('LGD') legend('Predicted','Observed') grid on
Show the box plot of the observed LGD values for different ranges of predicted LGD values. A coarser discretization (five bins only) smooths some noise out and better summarizes the underlying relationship. Ideally, the median (red horizontal line in the middle) should have a monotonic trend and be clearly different from one level to the next. Tall boxes also mean that there is a significant amount of error around the predicted values, which in some cases may be due to very few observations in that level. For a good predictive model, the boxes should be short and be located near the diagonal as you move from one level to the next.
LGDEdges = linspace(min(dataLGDPredicted.(ModelSelected)),max(dataLGDPredicted.(ModelSelected)),6); % Five bins LGDDiscretized = discretize(dataLGDPredicted.(ModelSelected),LGDEdges,'Categorical',string(LGDEdges(2:end))); boxplot(dataLGDPredicted.Observed,LGDDiscretized) ylim([0 1]) title(strcat(ModelSelected,' Model')) xlabel('Predicted LGD, Discretized') ylabel('Observed LGD')
This example shows multiple approaches for LGD modeling and prediction. The workflow in this example can be adapted to further analyze the models discussed here or to implement and validate other modeling approaches. This example can be extended to perform a more thorough comparison of LGD models (see for example [3] and [4]).
The example can also be extended to perform a cross-validation analysis to either benchmark alternative models or to fine-tune hyperparameters. For example, better cut off points for the group means could be selected using cross-validation, or alternative transformations of the LGD response values (logit, probit) could be benchmarked to select the one with the best performance. This example can also be a starting point to perform a backtesting analysis using out-of-time data; see for example [5].
[1] Baesens, B., D. Rosch, and H. Scheule. Credit Risk Analytics. Wiley, 2016.
[2] Johnston Ross, E., and L. Shibut. "What Drives Loss Given Default? Evidence from Commercial Real Estate Loans at Failed Banks." Federal Deposit Insurance Corporation, Center for Financial Research, Working Paper 2015-03, March 2015.
[3] Li, P., X. Zhang, and X. Zhao. "Modeling Loss Given Default. "Federal Deposit Insurance Corporation, Center for Financial Research, Working Paper 2018-03, July 2018.
[4] Loterman, G., I. Brown, D. Martens, C. Mues, and B. Baesens. "Benchmarking Regression Algorithms for Loss Given Default Modeling." International Journal of Forecasting. Vol. 28, No.1, pp. 161–170, 2012.
[5] Loterman, G., M. Debruyne, K. Vanden Branden, T. Van Gestel, and C. Mues. "A Proposed Framework for Backtesting Loss Given Default Models." Journal of Risk Model Validation. Vol. 8, No. 1, pp. 69-90, March 2014.
[6] Tanoue, Y., and S. Yamashita. "Loss Given Default Estimation: A Two-Stage Model with Classification Tree-Based Boosting and Support Vector Logistic Regression." Journal of Risk. Vol. 21 No. 4, pp. 19-37, 2019.
Beta Log Likelihood
The log-likelihood function for a beta regression model is
,
where:
and
The predictor matrix and the vector of observed values come from training data with observations. The density function for the beta distribution is given by ( is the Gamma function)
Given the data, the log-likelihood function is a function of the coefficient parameters and , even though the formulas above do not make this dependency explicitly to simplify the notation. The log-likelihood function is maximized by varying the and parameters. The distribution parameters and > 0, and and are intermediate transformations required to evaluate the log-likelihood function. For more information, see for example [1].
function f = hLogLikelihoodBeta(Params, X, y) nCols = size(X,2); b = Params(1:nCols); c = Params(nCols+1:end); % Linear predictors yMu = X*b; yNu = X*c; mu = 1 ./ (1 + exp(-yMu)); nu = exp(yNu); % Transform to standard parameterization alpha = mu .* nu; beta = (1-mu) .* nu; % Negative log-likelihood likelihood = betapdf(y,alpha,beta); f = sum(log(likelihood)); end
Tobit log likelihood
The log-likelihood function is defined as
where:
In other words, for the non-censored cases (LGD > 0), the normal density function (normpdf
) gives the likelihood, whereas for the censored cases (LGD = 0, or more generally LGD <= 0), it is the cumulative distribution function (normcdf
) that defines the likelihood. For more information, see for example [1].
function llf = hLogLikelihoodTobitLeftCensored(Params,X,LGD) Params = Params(:); beta = Params(1:end-1); sigma = Params(end); Y = X*beta; LeftCensored = LGD==0; lf = zeros(size(Y)); lf(~LeftCensored) = normpdf(LGD(~LeftCensored),Y(~LeftCensored),sigma); lf(LeftCensored) = normcdf(0,Y(LeftCensored),sigma); llf = sum(log(lf)); end
Tobit prediction
The definition of the Tobit model has two cases: when the LGD is zero, and when it is positive. The unconditional expected value is then
The expectation in the first term is zero. Using the normality assumption on the error term, the second term can be manipulated to show that the unconditional expected value of the LGD is given by
Sometimes, the conditional expected value (conditional on LGD > 0) is used. The formula is almost the same, except for the probability factor.
The helper function hPredictTobitLeftCensored
allows you to choose between unconditional or conditional expected values. It also supports the computation of the LGD value using the basic Tobit formula .
function lgdPredicted = hPredictTobitLeftCensored(Params,X,Type) if nargin < 3 Type = 'unconditional'; end Params = Params(:); beta = Params(1:end-1); if strcmpi(Type,'formula') lgdPredicted = max(0,X*beta); else sigma = Params(end); YStd = X*beta/sigma; phiYStd = normpdf(YStd); PhiYStd = normcdf(YStd); if strcmpi(Type,'unconditional') lgdPredicted = PhiYStd.*(X*beta+sigma*(phiYStd./PhiYStd)); elseif strcmpi(Type,'conditional') lgdPredicted = X*beta+sigma*(phiYStd./PhiYStd); end end end
Tobit model with both the left and right boundaries
The likelihood function has a third case. In general, if the left boundary value is L (0
in this example) and the right boundary value is R (1
in this example), the likelihood function is given by:
For prediction, the derivations of the expected LGD value must consider the possibility that the LGD is on the right boundary. It can be shown that the unconditional expected value can be computed as follows:
where:
The conditional expectation is .