Subject: Economics
In this example we will consider data from a consumer-to-consumer (C2C) lending market, in which borrowers can post loan listings and lenders can invest in these loans. From the lender's perspective, it would be useful to get a sense of how likely a loan is to default, given some information on it. Data are available (see the file "loandatamodified.csv") on approximately 5,500 loans, including whether each loan ultimately failed (defaulted) or was current. In addition to this information, we also have data on the following 4 features:
a) The amount of the loan in $
b) The age of the loan in months
c) The borrower rate (the interest rate that the borrower pays the lender)
d) The borrower's credit rating (which is either "good" or "low"). Since this is a categorical feature, we convert it into a dummy (numerical) feature which takes value 1 for low and 0 for good. (Note: the original data has many credit rating categories that range from AA, B, C all the way down to High Risk. For simplicity, I clubbed these ratings into only two categories, either "good" or "low").
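The dummy coding described in (d) can be sketched as follows. This is only a minimal illustration: the helper name `encode_rating` and the sample list `ratings` are hypothetical, and the sketch assumes the ratings have already been grouped into the two labels "good" and "low" (the grouping of the original categories is not given here).

```python
# Mapping taken from the description above: 1 for "low", 0 for "good".
RATING_TO_DUMMY = {"good": 0, "low": 1}


def encode_rating(rating):
    """Return the dummy value for a two-level credit rating label."""
    key = rating.strip().lower()
    if key not in RATING_TO_DUMMY:
        raise ValueError(f"unexpected rating: {rating!r}")
    return RATING_TO_DUMMY[key]


# Hypothetical sample values, not rows from loandatamodified.csv.
ratings = ["good", "low", "low", "good"]
rating_low = [encode_rating(r) for r in ratings]
print(rating_low)
```

The resulting 0/1 column plays the role of the RatingLow feature in the model.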
A logistic model is trained on this data, with "fail/default" being the positive case and "loan current" being the negative case. The following model is obtained:
Coefficients:
                 Estimate
Intercept      -7.707e+00
Amount          3.171e-05
Age             3.624e-01
Borrower.rate   1.302e+01
RatingLow       7.887e-01
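To see how these estimates turn into a predicted default probability, here is a minimal sketch that plugs the coefficients reported above into the logistic function p = 1 / (1 + exp(-z)), where z is the intercept plus the sum of each coefficient times its feature value. The function name `default_probability` and the example loan are hypothetical. The sketch also assumes the borrower rate is entered as a decimal fraction (for example 0.15 for 15%), which is not stated in the text.

```python
import math

# Estimates copied from the coefficient table above.
COEF = {
    "intercept": -7.707e+00,
    "amount": 3.171e-05,
    "age": 3.624e-01,
    "borrower_rate": 1.302e+01,
    "rating_low": 7.887e-01,
}


def linear_predictor(amount, age_months, borrower_rate, rating_low):
    """Log-odds of default: intercept plus the weighted features."""
    return (
        COEF["intercept"]
        + COEF["amount"] * amount
        + COEF["age"] * age_months
        + COEF["borrower_rate"] * borrower_rate  # assumed decimal, e.g. 0.15
        + COEF["rating_low"] * rating_low        # 1 for "low", 0 for "good"
    )


def default_probability(amount, age_months, borrower_rate, rating_low):
    """Predicted probability of the positive case (fail/default)."""
    z = linear_predictor(amount, age_months, borrower_rate, rating_low)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical loan: $5,000, 6 months old, 15% rate, low credit rating.
p_low = default_probability(5000, 6, 0.15, 1)
# Same loan with a good credit rating, for comparison.
p_good = default_probability(5000, 6, 0.15, 0)
print(f"low rating:  {p_low:.4f}")
print(f"good rating: {p_good:.4f}")
```

Because the model is linear in the log-odds, switching RatingLow from 0 to 1 multiplies the odds of default by exp(0.7887), roughly 2.2, whatever the other feature values are.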