Panel data regression comparison

Question

Nick on 25 Mar 2022

https://ch.mathworks.com/matlabcentral/answers/1680749-panel-data-regression-comparison

Answered: Gabo on 27 Aug 2024

I have a very large panel dataset and would like to apply a number of simple machine learning techniques (logistic regression, decision trees, bagged trees).

While preparing, I came across fitglm and fitLifetimePDModel, the latter of which is meant to handle panel data. I was trying to understand how (or whether) it differs from fitglm, because when I run the code below, the results are exactly the same. Is that right?

Why is that? For example, with fitglm I'm not telling the program that each customer can have more than one data point.

Thank you

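load RetailCreditPanelData.mat   % panel table "data": one row per ID and year on books (YOB)
% Lifetime PD model (logistic), told which columns hold the customer ID and the age
pdModel_1 = fitLifetimePDModel(data,"Logistic", 'AgeVar','YOB', 'IDVar','ID', 'LoanVars','ScoreGroup','ResponseVar','Default');
disp(pdModel_1.Model)   % underlying fitted logistic regression
% Plain logistic regression with the same predictors; no panel information is given
pdModel_2 = fitglm(data,'Default ~ 1 + ScoreGroup + YOB', 'Distribution','binomial', 'link', 'logit');
disp(pdModel_2)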


Answer 1

Sai Pavan on 20 Oct 2023

https://ch.mathworks.com/matlabcentral/answers/1680749-panel-data-regression-comparison#answer_1337191

Hi Nick,

I understand that you are trying to learn the difference between the “fitglm” and “fitLifetimePDModel” functions and want to know why they produce the same results.

The “fitLifetimePDModel” function is specifically designed for lifetime probability-of-default (PD) models on panel data, where each customer has multiple data points over time; you identify the customer with 'IDVar' and the time on books with 'AgeVar'.
On the other hand, “fitglm” is a general function for fitting generalized linear models (including logistic and Poisson regression); it treats each row as an independent observation and does not use any panel structure.
The results are exactly the same because, for the "Logistic" model type, both “fitLifetimePDModel” and “fitglm” fit a logistic regression with the same link function (logit) and distribution (binomial). In your “fitglm” call, you explicitly specified a formula (Default ~ 1 + ScoreGroup + YOB) that matches the one “fitLifetimePDModel” builds from 'AgeVar' and 'LoanVars'. Therefore, the resulting models are identical.
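
A quick way to check this is to compare the predicted PDs of the two models row by row, which does not depend on how each function orders its coefficient names. The sketch below assumes the RetailCreditPanelData.mat sample data from the question (Risk Management Toolbox plus Statistics and Machine Learning Toolbox); the names rowsPerID, pd1, pd2 and maxDiff are illustrative only. It also counts the rows per ID to show the panel structure that fitglm ignores.

```matlab
% Load the panel table "data" used in the question
load RetailCreditPanelData.mat

% Panel structure: each ID appears in several rows (one per year on books, YOB)
rowsPerID = groupcounts(data, 'ID');
fprintf('%d IDs, at most %d rows per ID\n', height(rowsPerID), max(rowsPerID.GroupCount));

% Lifetime PD model (logistic) and a plain logistic GLM with the same predictors
pdModel_1 = fitLifetimePDModel(data, "Logistic", 'AgeVar', 'YOB', 'IDVar', 'ID', ...
    'LoanVars', 'ScoreGroup', 'ResponseVar', 'Default');
pdModel_2 = fitglm(data, 'Default ~ 1 + ScoreGroup + YOB', ...
    'Distribution', 'binomial', 'Link', 'logit');

% Compare predicted (conditional) PDs row by row
pd1 = predict(pdModel_1, data);
pd2 = predict(pdModel_2, data);
maxDiff = max(abs(pd1 - pd2));
fprintf('Max absolute difference in predicted PD: %g\n', maxDiff);
```

If the two fits agree, maxDiff should be at round-off level.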

Please refer to the documentation of the “fitglm” and “fitLifetimePDModel” functions to learn more.

Hope it helps.

Regards,

Sai Pavan


Answer 2

Gabo on 27 Aug 2024

https://ch.mathworks.com/matlabcentral/answers/1680749-panel-data-regression-comparison#answer_1505934

Hi Nick,

You (and Sai) are correct that the model coefficients you get using fitglm and fitLifetimePDModel with the 'logistic' option are the same. For 'logistic' and 'probit', fitLifetimePDModel calls fitglm under the hood; however, the model you get is a "wrapper", if you will, that offers additional functionality (described below). The fitLifetimePDModel function does use the panel data structure to estimate the time interval between consecutive rows, which is very important for the 'Cox' lifetime PD model but not as important for 'logistic' and 'probit'.
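
To illustrate the point above, the sketch below fits the "Probit" lifetime model next to a fitglm probit regression with the same predictors, and then fits the "Cox" lifetime model, where the ID/age structure matters. It assumes the same sample data as the question; the variable names are illustrative, and only the default options of each call are used.

```matlab
load RetailCreditPanelData.mat

% Probit lifetime PD model vs. fitglm with a probit link and the same predictors
pdProbit = fitLifetimePDModel(data, "Probit", 'AgeVar', 'YOB', 'IDVar', 'ID', ...
    'LoanVars', 'ScoreGroup', 'ResponseVar', 'Default');
glmProbit = fitglm(data, 'Default ~ 1 + ScoreGroup + YOB', ...
    'Distribution', 'binomial', 'Link', 'probit');
probitDiff = max(abs(predict(pdProbit, data) - predict(glmProbit, data)));
fprintf('Probit: max absolute PD difference = %g\n', probitDiff);

% Cox lifetime PD model: here the ID/age panel structure is used to determine
% the time interval between consecutive rows of each ID
pdCox = fitLifetimePDModel(data, "Cox", 'AgeVar', 'YOB', 'IDVar', 'ID', ...
    'LoanVars', 'ScoreGroup', 'ResponseVar', 'Default');
pdCoxPred = predict(pdCox, data);
```

Unlike the probit case, the Cox predictions have no direct fitglm counterpart to compare against.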

Lifetime PD models have the predict method (to predict conditional PDs), but also the predictLifetime method (to predict cumulative, marginal, and survival probabilities) and the validation functions modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot. There is a lot of information in the documentation, but this page may be a good starting point: https://www.mathworks.com/help/risk/overview-of-lifetime-probability-of-default.html.
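
A minimal sketch of these methods on the logistic model from the question follows. It assumes the same sample data; choosing the first ID in the table and grouping calibration by 'YOB' are arbitrary illustrative choices, and the argument lists should be checked against the documentation of your release.

```matlab
load RetailCreditPanelData.mat
pdModel = fitLifetimePDModel(data, "Logistic", 'AgeVar', 'YOB', 'IDVar', 'ID', ...
    'LoanVars', 'ScoreGroup', 'ResponseVar', 'Default');

% Conditional (one-period) PDs for every row
condPD = predict(pdModel, data);

% Lifetime probabilities for one ID (the first ID in the table, for illustration);
% the default ProbabilityType is cumulative
oneID = data(data.ID == data.ID(1), :);
cumPD = predictLifetime(pdModel, oneID);
survProb = predictLifetime(pdModel, oneID, 'ProbabilityType', 'survival');

% Validation: discrimination (AUROC) and calibration grouped by YOB
DiscMeasure = modelDiscrimination(pdModel, data);
CalMeasure = modelCalibration(pdModel, data, 'YOB');
modelDiscriminationPlot(pdModel, data);
modelCalibrationPlot(pdModel, data, 'YOB');
```

In practice you would run the validation functions on a hold-out sample rather than on the training data.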

Now, there is also the fitglme function, for generalized linear mixed-effects models. Take a look at this example in the documentation and search for 'fitglme': https://www.mathworks.com/help/risk/stress-testing-retail-credit-default-probabilities-using-panel-data-1.html. It includes a discussion on training a mixed-effects model with the same data, along with the syntax to do it. That model would take the panel data structure into account as well.
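
As a rough sketch (not the exact model from the linked example, which may also use macroeconomic variables), a logistic mixed-effects model with the question's predictors and a random intercept per customer could look like this. Converting ID to categorical and the names dataME, glmeModel and beta are assumptions for illustration; on the full data set this fit may take a while.

```matlab
load RetailCreditPanelData.mat

% Copy of the data with ID as a categorical grouping variable
dataME = data;
dataME.ID = categorical(dataME.ID);

% Logistic GLMM: same fixed effects as the question, plus a random intercept
% per ID to account for repeated rows from the same customer
glmeModel = fitglme(dataME, 'Default ~ 1 + ScoreGroup + YOB + (1|ID)', ...
    'Distribution', 'binomial', 'Link', 'logit');

% Fixed-effect estimates, comparable to the fitglm coefficients
beta = fixedEffects(glmeModel);
disp(glmeModel)
```

Note that predict on a mixed-effects model includes the random effects by default; for customers not seen in training, the 'Conditional', false option returns population-level predictions.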

For any model you train without fitLifetimePDModel (for example, decision trees, bagged trees, or mixed-effects models), if you're interested in the lifetime prediction or the discrimination/calibration capabilities of lifetime PD models, you can train your model and then wrap it as a "custom" lifetime PD model using customLifetimePDModel; see, for example: https://www.mathworks.com/help/risk/create-custom-pd-model-for-decision-tree-using-function-handle.html.
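
For the tree-based models mentioned in the question, a minimal sketch of the training step is shown below, using the posterior probability of class 1 as the PD. It treats the panel rows as independent (like fitglm), and the predictor list, the 50 bagging cycles, and the variable names are illustrative assumptions. The wrapping step with customLifetimePDModel is shown in the linked example and is not reproduced here.

```matlab
load RetailCreditPanelData.mat

predictors = {'ScoreGroup', 'YOB'};

% Single classification tree on the pooled panel rows (rows treated as independent)
treeMdl = fitctree(data, 'Default', 'PredictorNames', predictors);

% Bagged classification trees
bagMdl = fitcensemble(data, 'Default', 'Method', 'Bag', ...
    'NumLearningCycles', 50, 'PredictorNames', predictors);

% Use the posterior probability of class 1 (default) as the PD
[~, scoreTree] = predict(treeMdl, data);
[~, scoreBag] = predict(bagMdl, data);
pdTree = scoreTree(:, treeMdl.ClassNames == 1);
pdBag = scoreBag(:, bagMdl.ClassNames == 1);
```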

Hope this helps,

Gabo
