Wednesday, December 07, 2016

The Trouble with DTI as an Underwriting Variable--and as an Overlay

Access to mortgage credit continues to be a problem.  Laurie Goodman at the Urban Institute shows that, under normal circumstances (say those of the pre-2002 period), we would expect to see 1 million more mortgage originations per year in the market than we are seeing. I suspect an important reason for this is the primacy of Debt-to-Income (DTI) as an underwriting variable.

There are two issues here.  First, while DTI is a predictor of mortgage default, it is a fairly weak one.  The reason is that it tends to be measured badly, for a variety of reasons.  For instance, suppose someone applying for a loan has both salary income and non-salary income.  If the salary income alone is sufficient to obtain a mortgage, both the borrower and the lender have incentives not to report the more difficult-to-document non-salary income.  The borrower's income will thus be understated, the DTI will be overstated, and the variable will be measured with error.  A number of other examples work the same way.
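A minimal worked example of the understatement problem, using made-up numbers (the payment and income figures below are hypothetical, not drawn from the Freddie Mac data).  DTI is monthly debt payments divided by monthly income, so leaving real but awkward-to-document income off the application mechanically raises the measured ratio:

```python
def dti_percent(monthly_debt_payments: float, monthly_income: float) -> float:
    """Back-end debt-to-income ratio, expressed in percent."""
    return 100.0 * monthly_debt_payments / monthly_income


# Hypothetical borrower; all figures are illustrative.
debt_payments = 1_600.0   # monthly debt service, including the new mortgage
salary = 4_000.0          # documented salary income
other_income = 1_000.0    # non-salary income left off the application

reported_dti = dti_percent(debt_payments, salary)              # what the lender records
true_dti = dti_percent(debt_payments, salary + other_income)   # what the borrower can actually carry

print(f"reported DTI: {reported_dti:.1f}%")
print(f"true DTI:     {true_dti:.1f}%")
```

Salary alone keeps the reported ratio (40 percent) under a 45 percent limit, so neither party has a reason to document the rest, yet the recorded DTI overstates the true one (32 percent) by 8 points.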

Let's get more specific.  Below are results from a linear probability model of default, estimated on the performance of all fixed-rate mortgages purchased by Freddie Mac in the first quarter of 2004.  This is a good year to pick, because it is rich in high-DTI loans, and because its loans went through a (ahem) difficult period.  The dependent variable is the probability of not defaulting, so positive coefficients mean lower default risk.

Variable           COEF          SE    T-STAT
FICO >= 620    .1324914    .0039244     33.76
FICO >= 680    .1259424    .0021756     57.89
FICO >= 740    .0600775    .0020249     29.67
FICO >= 790   -.0030439    .0036585     -0.83
CLTV >= 60    -.0336153    .0025297    -13.29
CLTV >= 80    -.0375928    .0021508    -17.48
CLTV >= 90    -.0155193    .0029713     -5.22
CLTV >= 95    -.0261145    .0035061     -7.45
DTI           -.0013991     .000069    -20.26
Broker        -.0439482    .0308106     -1.43
Corresp.      -.0128272    .0277559     -0.46
Other         -.0295511    .0277441     -1.07
Cash-out      -.0520243    .0023775    -21.88
Refi no cash  -.0364152    .0021331    -17.07
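To see how the table translates into predictions, here is a small Python sketch that applies the reported coefficients to two hypothetical applicants.  It reads the FICO and CLTV rows as cumulative step dummies (a 700 FICO gets both the >= 620 and >= 680 terms), which is what the labels suggest, and it assumes purchase loans and retail origination are the omitted base categories.  Because the intercept was not reported, only differences between applicants are meaningful.

```python
# Coefficients from the table (dependent variable = 1 if the loan never went 90 days late).
# The intercept was not reported, so only differences between profiles are meaningful.
COEF = {
    "fico_ge_620": 0.1324914,
    "fico_ge_680": 0.1259424,
    "fico_ge_740": 0.0600775,
    "fico_ge_790": -0.0030439,
    "cltv_ge_60": -0.0336153,
    "cltv_ge_80": -0.0375928,
    "cltv_ge_90": -0.0155193,
    "cltv_ge_95": -0.0261145,
    "dti": -0.0013991,
    "broker": -0.0439482,
    "correspondent": -0.0128272,
    "other_channel": -0.0295511,
    "cash_out": -0.0520243,
    "refi_no_cash": -0.0364152,
}


def features(fico, cltv, dti, purpose="purchase", channel="retail"):
    """Step-dummy encoding mirroring the table rows.

    Assumption: purchase loans and retail origination are the omitted base categories.
    """
    return {
        "fico_ge_620": float(fico >= 620),
        "fico_ge_680": float(fico >= 680),
        "fico_ge_740": float(fico >= 740),
        "fico_ge_790": float(fico >= 790),
        "cltv_ge_60": float(cltv >= 60),
        "cltv_ge_80": float(cltv >= 80),
        "cltv_ge_90": float(cltv >= 90),
        "cltv_ge_95": float(cltv >= 95),
        "dti": float(dti),
        "broker": float(channel == "broker"),
        "correspondent": float(channel == "correspondent"),
        "other_channel": float(channel == "other"),
        "cash_out": float(purpose == "cash_out"),
        "refi_no_cash": float(purpose == "refi_no_cash"),
    }


def linear_index(x):
    """Sum of coefficient * feature, excluding the unreported intercept."""
    return sum(COEF[name] * value for name, value in x.items())


# Two hypothetical applicants with identical FICO and CLTV.
high_dti_purchase = features(fico=700, cltv=85, dti=46)
low_dti_cash_out = features(fico=700, cltv=85, dti=40, purpose="cash_out")

gap = linear_index(high_dti_purchase) - linear_index(low_dti_cash_out)
print(f"difference in predicted P(no default): {gap:+.4f}")
```

Under these assumptions the purchase borrower at 46 percent DTI comes out about 4.4 percentage points more likely not to default than the cash-out borrower at 40 percent DTI, even though a 45 percent DTI limit would turn away the first and not the second.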

Default is defined as ever being 90 days late.  I tried adding a quadratic term in DTI, but its coefficient was not significantly different from zero.  The estimation sample contains 166,585 randomly chosen observations; I held out the remaining 114,583 observations for out-of-sample prediction (which will come later).  The default rate is 14.34 percent in the estimation sample and 14.31 percent in the holdout sample, so Stata's random number generator did its job properly.  For those who care, the R^2 is .12.
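The estimation/holdout design can be sketched in Python with NumPy.  The post used Stata; this is a translation of the general idea, not the author's code, run on simulated data (the DTI-to-default relationship below is invented) purely to show the mechanics: a random split, an OLS linear probability model with an intercept, and a check that default rates match across the two samples.

```python
import numpy as np


def split_estimation_holdout(n_obs, n_estimation, seed=0):
    """Randomly assign n_estimation rows to estimation; the rest form the holdout."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    return perm[:n_estimation], perm[n_estimation:]


def fit_lpm(X, y):
    """OLS linear probability model with an intercept; returns (coefficients, R^2)."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


# Simulated data sized like the 2004Q1 sample (166,585 + 114,583 loans).
n_est, n_hold = 166_585, 114_583
rng = np.random.default_rng(1)
dti = rng.uniform(10, 60, size=n_est + n_hold)
p_default = np.clip(0.05 + 0.002 * dti, 0.0, 1.0)   # made-up relationship
defaulted = rng.random(n_est + n_hold) < p_default
y = (~defaulted).astype(float)                      # 1 = never 90 days late

est, hold = split_estimation_holdout(n_est + n_hold, n_est)
beta, r2 = fit_lpm(dti[est].reshape(-1, 1), y[est])

print(f"default rate, estimation: {defaulted[est].mean():.4f}")
print(f"default rate, holdout:    {defaulted[hold].mean():.4f}")
print(f"DTI coefficient: {beta[1]:.5f}, R^2: {r2:.3f}")
```

With a real extract the DTI column would be joined by the FICO, CLTV, channel and purpose dummies from the table; the split and the fit stay the same.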

Note that while DTI is statistically significant, it is not a particularly important predictor of default.  To place this in context, a cash-out refinance is 5.2 percentage points more likely to default than a purchase-money loan, while a 10 percentage point increase in DTI raises the probability of default by only about 1.4 percentage points.  One can look at the other coefficients to see the point more broadly.
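The comparison is simple arithmetic on the reported coefficients (the intercept is not needed, since these are differences in the predicted probability of not defaulting):

```latex
\begin{aligned}
\Delta \Pr(\text{no default} \mid \Delta \mathrm{DTI} = 10) &= 10 \times (-0.0013991) \approx -0.014 \\
\Delta \Pr(\text{no default} \mid \text{cash-out vs. purchase}) &= -0.0520243 \approx -0.052 \\
\Delta \mathrm{DTI} \text{ with the same effect as cash-out} &= \frac{0.0520243}{0.0013991} \approx 37.2 \text{ points}
\end{aligned}
```

In other words, in this model choosing a cash-out refinance carries about as much additional default risk as raising DTI by roughly 37 points.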

But while this is an issue, it is not a big one.  It is certainly reasonable to include DTI in a scoring model, weighted by its contribution to a regression.  The problem arises when we look at overlays.

The Consumer Financial Protection Bureau has deemed mortgages with DTIs above 43 percent not to be "qualified."  This means lenders making these loans do not have a safe harbor for proving that the loans meet an ability-to-repay standard.  Fannie and Freddie are for now exempt from this rule, but they have generally not been willing to purchase loans with DTIs in excess of 45 percent.  This means that no matter the loan applicant's score from a regression model predicting default, if her DTI is above 45 percent, she will not get a loan.

This is not only analytically incoherent; it also means that high-quality borrowers are failing to get loans, and that the mix of loans being originated is worse in quality than it otherwise would be.  That's because a well-specified regression does a better job than a heuristic such as a DTI limit at sorting out the borrowers most likely to default.

To make the point, I use my holdout sample to compare the default rate observed under the DTI cut-off rule with the default rate under a rule that ranks borrowers by predicted default likelihood.  Under the DTI rule, we would have made loans to 91,185 borrowers within the holdout sample and observed a default rate of 14.0 percent.  Under the regression-based rule, making loans to slightly more borrowers (91,194; I am having trouble hitting the 91,185 number exactly), we observe a default rate of 10.0 percent.  One could obviously loosen the regression rule, give more borrowers access to credit, and still have better loan performance.
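Here is a hedged Python sketch of this kind of comparison.  It is not the author's code: the data are simulated (the FICO and DTI distributions and the default model are invented), `score` stands in for the regression's fitted probability of not defaulting, and the function names are hypothetical.  The DTI rule approves everyone at or below 45 percent; the model rule approves the same number of borrowers, taking those with the highest scores.

```python
import numpy as np


def dti_rule(dti, limit=45.0):
    """Approve everyone at or below the DTI limit, regardless of score."""
    return np.flatnonzero(dti <= limit)


def model_rule(score, n_approve):
    """Approve the n_approve borrowers with the highest predicted P(no default)."""
    order = np.argsort(-score, kind="stable")
    return order[:n_approve]


# Simulated holdout sample the size of the post's (114,583 loans).
# The FICO/DTI distributions and the default model are invented for illustration.
rng = np.random.default_rng(7)
n = 114_583
fico = rng.normal(720, 50, n)
dti = rng.uniform(10, 60, n)
p_default = np.clip(0.70 - 0.0008 * fico + 0.001 * dti, 0.01, 0.99)
defaulted = rng.random(n) < p_default
score = 1.0 - p_default  # stands in for the regression's fitted values

by_dti = dti_rule(dti)
by_model = model_rule(score, len(by_dti))  # same number of approvals

for name, idx in [("DTI cut-off", by_dti), ("model ranking", by_model)]:
    print(f"{name:14s} approved={len(idx):6d} default rate={defaulted[idx].mean():.3f}")
```

The simulated rates will not match the 14.0 and 10.0 percent above; only the direction of the comparison is the point.  Taking the top k of a sort hits the target approval count exactly, whereas a cut-off on fitted values can admit a few extra borrowers when several share the same score.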

Let's do one more exercise and impose the DTI rule on top of the regression rule used above.  The number of borrowers getting loans drops to 73,133 (a decline of about 20 percent), while the default rate drops by .7 percent relative to the model alone.  That means an awful lot of borrowers are rejected in exchange for a modest improvement in default performance.  If one instead used the model alone to reduce the number of approved loans by 20 percent, one would improve default performance by 1.4 percent relative to the 10 percent baseline.  In short, whether the goal is access to credit or loan performance (or, ideally, both), regression-based underwriting works far better than DTI overlays.
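The overlay exercise, together with the comparison that cuts the model's own approvals to the same count, can be sketched in Python as well.  This is a hypothetical illustration on simulated data (distributions and default model invented; `score` stands in for the fitted probability of not defaulting), assuming that "approve" means taking the highest-scoring borrowers first.

```python
import numpy as np


def overlay_vs_tighter_model(score, dti, n_model, limit=45.0):
    """Return (model alone, model + DTI overlay, model alone cut to the overlay's count)."""
    order = np.argsort(-score, kind="stable")            # best predicted P(no default) first
    model_only = order[:n_model]
    with_overlay = model_only[dti[model_only] <= limit]  # DTI overlay applied on top
    tighter_model = order[:len(with_overlay)]            # same count, chosen by score alone
    return model_only, with_overlay, tighter_model


# Simulated data; distributions and the default model are invented for illustration.
rng = np.random.default_rng(11)
n = 114_583
fico = rng.normal(720, 50, n)
dti = rng.uniform(10, 60, n)
p_default = np.clip(0.70 - 0.0008 * fico + 0.001 * dti, 0.01, 0.99)
defaulted = rng.random(n) < p_default
score = 1.0 - p_default  # stands in for the regression's fitted values

n_model = int((dti <= 45.0).sum())  # match the DTI rule's approval count
sets = overlay_vs_tighter_model(score, dti, n_model)
for name, idx in zip(["model alone", "model + DTI overlay", "model, same count"], sets):
    print(f"{name:20s} approved={len(idx):6d} default rate={defaulted[idx].mean():.3f}")
```

The design choice is that both restricted rules approve exactly the same number of borrowers, so any difference in default rates comes from who is selected, not how many.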

(I am happy to send code and results to anyone interested.)

Update: if you want output files, please write directly to me at  To obtain the dataset, which is freely available, you need to register with Freddie Mac at the link referenced above.