Loan Prediction
Predict whether loans acquired by Fannie Mae will go into foreclosure. Fannie Mae acquires loans from other lenders as a way of inducing them to lend more, and it releases data on the loans it has acquired and their subsequent performance.
Installation
Download the data
- Clone this repo to your computer.
- Get into the folder using `cd loan-prediction`.
- Run `mkdir data`.
- Switch into the `data` directory using `cd data`.
- Download the data files from Fannie Mae into the `data` directory.
    - The data is available from Fannie Mae's website.
    - You'll need to register with Fannie Mae to download the data.
    - It's recommended to download all the data from 2012 Q1 to present.
- Extract all of the `.zip` files you downloaded.
    - On OSX, you can run `find ./ -name \*.zip -exec unzip {} \;`.
    - At the end, you should have a bunch of text files called `Acquisition_YQX.txt` and `Performance_YQX.txt`, where `Y` is a year and `X` is a number from `1` to `4`.
- Remove all the zip files by running `rm *.zip`.
- Switch back into the `loan-prediction` directory using `cd ..`.
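If the `find` command is not available on your system, the extraction step above can be sketched in Python instead. This is a minimal sketch, not part of the repo: the helper names `extract_archives` and `list_quarterly_files` are hypothetical, and the pattern assumes that `Y` in the file names is a four-digit year.

```python
import re
import zipfile
from pathlib import Path

# Assumption: Y is a four-digit year, e.g. Acquisition_2012Q1.txt.
NAME_PATTERN = re.compile(r"^(Acquisition|Performance)_(\d{4})Q([1-4])\.txt$")


def extract_archives(data_dir):
    """Extract every .zip file found under data_dir into data_dir itself."""
    extracted = []
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
    return extracted


def list_quarterly_files(data_dir):
    """Return (kind, year, quarter) for each file matching the naming scheme."""
    found = []
    for path in sorted(Path(data_dir).glob("*.txt")):
        match = NAME_PATTERN.match(path.name)
        if match:
            kind, year, quarter = match.groups()
            found.append((kind, int(year), int(quarter)))
    return found
```

Calling `extract_archives("data")` from the `loan-prediction` directory and then `list_quarterly_files("data")` gives a quick way to check that every quarter you downloaded has both an `Acquisition` and a `Performance` file.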
Install the requirements
- Install the requirements using `pip install -r requirements.txt`.
    - Make sure you use Python 3.
    - You may want to use a virtual environment for this.
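To confirm both points before installing, a short check like the following can help. It is a sketch that is not part of the repo, and the function names are hypothetical. It relies on the fact that inside a `venv`-style environment `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def running_python3():
    """True when the current interpreter is Python 3 or newer."""
    return sys.version_info.major >= 3


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style virtual environment.

    Inside a venv, sys.prefix points at the environment while sys.base_prefix
    still points at the base installation, so the two differ.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python 3:", running_python3())
print("Virtual environment:", in_virtual_environment())
```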
Usage
- Run `mkdir processed` to create a directory for our processed datasets.
- Run `python assemble.py` to combine the `Acquisition` and `Performance` datasets.
    - This will create `Acquisition.txt` and `Performance.txt` in the `processed` folder.
- Run `python annotate.py`.
    - This will create training data from `Acquisition.txt` and `Performance.txt`.
    - It will add a file called `train.csv` to the `processed` folder.
- Run `python predict.py`.
    - This will run cross validation across the training set, and print the accuracy score.
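To make the last step more concrete, here is a minimal sketch of what cross validation followed by an accuracy score means. It is not the code in `predict.py`: it uses only the standard library, the function names are hypothetical, the labels are made up, and the "model" is a placeholder that ignores all features and always predicts the most common label in the training folds.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=1):
    """Shuffle row indices and split them into n_folds roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def majority_class(labels):
    """Placeholder 'model': always predict the most common training label."""
    return max(set(labels), key=labels.count)


def cross_validated_accuracy(labels, n_folds=3, seed=1):
    """Predict each fold from the remaining folds, then score all predictions."""
    folds = k_fold_indices(len(labels), n_folds, seed)
    predictions = [None] * len(labels)
    for held_out in folds:
        held_out_set = set(held_out)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out_set]
        guess = majority_class(train_labels)
        for i in held_out:
            predictions[i] = guess
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up labels: True means the loan was foreclosed on.
labels = [False] * 9 + [True]
print(cross_validated_accuracy(labels))  # prints 0.9
```

Every row is predicted exactly once, by a model that never saw that row during training. A real version would fit a classifier on the feature columns of `train.csv` in place of `majority_class`.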
Extending this
If you want to extend this work, here are a few places to start:
- Generate more features in `annotate.py`.
- Switch algorithms in `predict.py`.
- Add in a way to make predictions on future data.
- Try to predict whether a bank should have issued the loan.
    - Remove any columns from `train` that the bank wouldn't have known at the time of issuing the loan.
        - Some columns are known when Fannie Mae bought the loan, but not before.
    - Make predictions.
- See if you can predict columns other than `foreclosure_status`.
    - Can you predict how much the property will be worth at sale time?
- Explore the nuances between performance updates.
    - Can you predict how many times the borrower will be late on payments?
    - Can you map out the typical loan lifecycle?
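For the idea of predicting whether a bank should have issued the loan, the column-removal step could look roughly like the sketch below. The helper `drop_columns` is hypothetical, and apart from `foreclosure_status` the column names in the sample are invented for illustration; check the real header of `train.csv` to decide which columns the bank could not have known.

```python
import csv
import io


def drop_columns(rows, columns_to_drop):
    """Return copies of the row dicts without the listed columns."""
    return [
        {name: value for name, value in row.items() if name not in columns_to_drop}
        for row in rows
    ]


# Invented sample data; performance_count stands in for any column
# that only becomes known after the loan has been issued.
sample = io.StringIO(
    "id,credit_score,performance_count,foreclosure_status\n"
    "1,700,12,False\n"
    "2,640,3,True\n"
)
rows = list(csv.DictReader(sample))
filtered = drop_columns(rows, {"performance_count"})
print(list(filtered[0].keys()))  # prints ['id', 'credit_score', 'foreclosure_status']
```

The target column stays in place so that the remaining columns can still be used to train and evaluate a model.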