Loan Default Modeling: Reframed and Remodeled
- Alice Wong
- Mar 29, 2022
- 2 min read
Regulators often define loan default as non-payment within a 90-day window, so modeling a binary outcome (repaid within the window or not) is a reasonable starting point. In reality, though, a borrower could repay after the 90-day window, and many lenders (secretly) hope that borrowers will be late in repayment so they can charge more interest.
Since repayment can happen at any time, from the near to the very distant future, and since later repayments can be worth more to the lender, it's important to frame loan default modeling more as loan repayment modeling. Even more specifically, loan repayments (note the plural) modeling, as you'll see in 3) below. Here are a few types of models, borrowed (coincidentally) from the field of biostatistics, that you can consider for this.
1) Regular Cox-PH survival analysis
The most intuitive one here, where you're answering the question:
at a given time t, what is the probability that the borrower will repay by time t1, where t1 could be any point in time after time t? Cox-PH answers it by modeling the hazard, i.e. the rate of repayment among loans still unpaid at each point in time, as a function of borrower characteristics. This is more flexible than logistic regression, as you don't have to predefine a window of time in which repayment occurs.
(See Link 1 in comments for the basics of Cox-PH survival analysis.)
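To make the question above concrete, here is a minimal sketch. Note that it does not fit a Cox-PH model: it uses the simpler, covariate-free Kaplan-Meier estimator to estimate the chance that a loan still unpaid at day t is repaid by day t1. The loan records and all function names are made up for illustration.

```python
def kaplan_meier(durations, repaid):
    """Return [(time, S(time))], where S is the estimated probability
    that a loan is still unpaid just after `time`."""
    # Distinct times at which at least one repayment was observed.
    event_times = sorted({t for t, e in zip(durations, repaid) if e})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, repaid) if d == t and e)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def prob_repay_between(curve, t, t1):
    """P(repaid by t1 | still unpaid at t) = 1 - S(t1) / S(t)."""
    return 1.0 - survival_at(curve, t1) / survival_at(curve, t)


# Hypothetical loans: days until repayment, or days observed so far
# for loans that are still unpaid (repaid=False, i.e. censored).
durations = [20, 25, 30, 30, 45, 60, 95, 120, 150, 150]
repaid = [True, True, True, True, True, False, True, True, False, False]

curve = kaplan_meier(durations, repaid)
p = prob_repay_between(curve, t=90, t1=130)
print(f"P(repaid by day 130 | unpaid at day 90) = {p:.2f}")
```

Loans that are still unpaid when you pull the data are kept as censored observations instead of being labeled defaults, which is the main thing a fixed 90-day binary label throws away.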
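1a) Because most borrowers are likely to repay before the deadline, the time to repayment will probably look lognormal: repayments peak early and then tail off. That shape is unusual in survival analysis, but it can still be modeled with the parametric survival routines in standard statistical packages.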
1b) A cure model can be used if you expect your survival curve to have a long flat tail that levels off above zero, i.e. if you have a lot of bad actors or fraudsters who'll never repay. See Link 2.
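The two shapes in 1a and 1b can be written down directly. The sketch below is an illustration under assumed parameters, not a fitted model: it combines a lognormal time to repayment with a mixture cure fraction, so the "still unpaid" curve drops early and then flattens out above zero. The values of `mu`, `sigma` and `never_repay` are hypothetical.

```python
import math


def lognormal_unpaid(t, mu, sigma):
    """P(still unpaid at day t) when log(days to repay) ~ Normal(mu, sigma)."""
    z = (math.log(t) - mu) / sigma
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cure_unpaid(t, mu, sigma, never_repay):
    """Mixture cure model: a fraction `never_repay` stays unpaid forever,
    everyone else follows the lognormal curve."""
    return never_repay + (1.0 - never_repay) * lognormal_unpaid(t, mu, sigma)


# Hypothetical parameters: median repayment at 30 days among those who
# do repay, and 10% of borrowers who never repay.
mu, sigma, never_repay = math.log(30), 0.6, 0.10

for day in (30, 90, 365, 3650):
    print(day, round(cure_unpaid(day, mu, sigma, never_repay), 3))
```

The curve never falls below `never_repay`, which is the long flat tail a cure model is designed to capture.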
2) Accelerated failure time model
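An accelerated failure time (AFT) model is a type of survival analysis model whose outcome variable is simply the (log of the) time to event. Unlike Cox-PH, which is semi-parametric, an AFT model is usually fully parametric: you choose a distribution for the time to event. It doesn't rely on the proportional hazards assumption, which makes it a useful alternative when that assumption doesn't hold, and its coefficients read directly as stretching or shrinking the time to repayment. You'll probably see a lognormal distribution of time to repayment, since there'll be an early peak before the deadline.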
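Here is a minimal sketch of a lognormal AFT model on simulated data. It makes one big simplifying assumption: every loan is observed until it is repaid (no censoring), in which case the lognormal AFT fit reduces to least squares on log(time). With censored loans you would need a maximum likelihood fit from a survival package instead. The `autopay` covariate and all parameter values are hypothetical.

```python
import math
import random

rng = random.Random(0)

# Simulate hypothetical loans. TRUE_EFFECT < 0 means borrowers on autopay
# repay faster: their repayment times are scaled by exp(TRUE_EFFECT).
TRUE_INTERCEPT, TRUE_EFFECT, TRUE_SIGMA = math.log(40), -0.5, 0.5
autopay = [rng.randint(0, 1) for _ in range(2000)]
log_days = [TRUE_INTERCEPT + TRUE_EFFECT * x + rng.gauss(0, TRUE_SIGMA)
            for x in autopay]

# Lognormal AFT: log(T) = b0 + b1 * x + sigma * eps, eps ~ Normal(0, 1).
# With no censoring, maximum likelihood for b0 and b1 is least squares.
n = len(autopay)
mean_x = sum(autopay) / n
mean_y = sum(log_days) / n
sxx = sum((x - mean_x) ** 2 for x in autopay)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(autopay, log_days))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# exp(b1) is the acceleration factor: the multiplier on repayment time.
acceleration = math.exp(b1)
median_days_no_autopay = math.exp(b0)
median_days_autopay = math.exp(b0 + b1)
print(f"acceleration factor: {acceleration:.2f}")
print(f"median days to repay: {median_days_no_autopay:.1f} without autopay, "
      f"{median_days_autopay:.1f} with autopay")
```

An acceleration factor below 1 means the covariate shortens the time to repayment, which is often easier to explain to a business audience than a hazard ratio.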
3) Mixed effects models
The question of repayment in full may not even be the most pertinent question. Borrowers often repay fractions of their loan at different times. Hence, what you might really want to model here is the percentage of their total loan they will pay in, say, each month after they take out the loan. This is very similar to predicting sales over time, and you can do that using a longitudinal mixed effects model. A mixed effects model suits this type of problem better than a time series model like ARIMA because of the shape of the data in this industry: many borrowers, each with a short trajectory measured at the monthly or, at the most granular, weekly level. ARIMA is typically applied the other way around, to far fewer subjects' trajectories with many more, finer-grained time points.
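As a rough sketch of the idea, the code below fits the simplest mixed effects model, a fixed time trend plus a random intercept per loan, to simulated monthly repayment percentages. It assumes balanced data (every loan observed for the same 12 months) and uses simple method-of-moments estimates; real portfolios are unbalanced and are normally fit by (restricted) maximum likelihood in a statistics package. All names and parameter values are hypothetical.

```python
import random

rng = random.Random(1)
N_LOANS, MONTHS = 300, list(range(1, 13))

# Simulate hypothetical data in long format: one row per loan per month.
# y = percent of the loan repaid in that month.
TRUE_BASE, TRUE_SLOPE, SD_LOAN, SD_NOISE = 9.0, -0.2, 1.5, 1.0
rows = []
for loan in range(N_LOANS):
    loan_effect = rng.gauss(0, SD_LOAN)  # random intercept u_i
    for m in MONTHS:
        y = TRUE_BASE + TRUE_SLOPE * m + loan_effect + rng.gauss(0, SD_NOISE)
        rows.append((loan, m, y))

n = len(MONTHS)
mean_m = sum(MONTHS) / n
grand_mean = sum(y for _, _, y in rows) / len(rows)

# Fixed effect of time. Every loan shares the same months, so the
# pooled least-squares slope is also the within-loan slope.
smm = sum((m - mean_m) ** 2 for _, m, _ in rows)
smy = sum((m - mean_m) * (y - grand_mean) for _, m, y in rows)
slope = smy / smm

# Average monthly repayment percentage per loan.
loan_mean = [0.0] * N_LOANS
for loan, _, y in rows:
    loan_mean[loan] += y / n

# Residual (within-loan) variance after removing loan means and the trend.
resid_ss = sum((y - loan_mean[loan] - slope * (m - mean_m)) ** 2
               for loan, m, y in rows)
var_noise = resid_ss / (N_LOANS * (n - 1) - 1)

# Between-loan variance: Var(loan mean) = var_loan + var_noise / n.
var_means = sum((lm - grand_mean) ** 2 for lm in loan_mean) / (N_LOANS - 1)
var_loan = max(0.0, var_means - var_noise / n)

# Predicted random intercepts: each loan's deviation, shrunk toward zero.
shrink = var_loan / (var_loan + var_noise / n)
loan_effects = [shrink * (lm - grand_mean) for lm in loan_mean]

print(f"monthly trend: {slope:.3f} percentage points per month")
print(f"between-loan SD: {var_loan ** 0.5:.2f}, noise SD: {var_noise ** 0.5:.2f}")
```

The random intercept is what lets you forecast an individual borrower's future monthly repayments: a borrower who has paid more than average so far is predicted to keep doing so, with the prediction shrunk toward the portfolio average when there is little history.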