Credit-Card-Application-Fraud-Detection-using-Supervised-machine-learning-models
The provided dataset contained credit card applications, some of which were application (identity) fraud cases. The problem was supervised because each record carried a fraud label indicating whether the application was fraudulent. The dataset also contained several fields identifying the applicant, such as SSN, address, and phone number. In total it had 1,000,000 records and 10 data fields. We first described and visualized each of the 10 fields and handled all frivolous values (placeholder entries that carry no real information). We then created 634 candidate variables and used feature selection to reduce them to 30. Finally, we trained several machine learning algorithms, both linear and nonlinear, to predict fraudulent application records.
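The first two steps can be illustrated with a minimal sketch. It is not the project's actual code: the field names (`record_id`, `date`, `ssn`, `phone`), the placeholder values in `FRIVOLOUS`, the sample records, and the choice of "velocity" counts as candidate variables are all assumptions made for illustration. The idea is that a frivolous value shared by many unrelated applicants would otherwise link their records together, so it is replaced with a token unique to each record before any counting is done.

```python
from collections import defaultdict, deque
from datetime import date, timedelta

# Hypothetical placeholder values; the real dataset's frivolous values
# are not listed in this README.
FRIVOLOUS = {"ssn": {"999999999"}, "phone": {"9999999999"}}


def neutralize_frivolous(records):
    """Replace placeholder values with a per-record token so they never link records."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy, so the input is left untouched
        for field, bad_values in FRIVOLOUS.items():
            if rec[field] in bad_values:
                rec[field] = f"unknown-{rec['record_id']}"
        cleaned.append(rec)
    return cleaned


def velocity(records, field, window_days):
    """Count, per record, the earlier applications sharing `field` within the window.

    `records` must already be sorted by date in ascending order.
    """
    window = timedelta(days=window_days)
    recent = defaultdict(deque)  # field value -> dates still inside the window
    counts = []
    for rec in records:
        seen = recent[rec[field]]
        while seen and rec["date"] - seen[0] > window:
            seen.popleft()  # drop applications older than the window
        counts.append(len(seen))
        seen.append(rec["date"])
    return counts


# Made-up sample records, sorted by date.
records = [
    {"record_id": 1, "date": date(2016, 1, 1), "ssn": "123456789", "phone": "5551230000"},
    {"record_id": 2, "date": date(2016, 1, 2), "ssn": "123456789", "phone": "9999999999"},
    {"record_id": 3, "date": date(2016, 1, 3), "ssn": "999999999", "phone": "9999999999"},
    {"record_id": 4, "date": date(2016, 1, 5), "ssn": "123456789", "phone": "5551230000"},
    {"record_id": 5, "date": date(2016, 1, 20), "ssn": "999999999", "phone": "5551230000"},
]

cleaned = neutralize_frivolous(records)
ssn_3d = velocity(cleaned, "ssn", 3)      # [0, 1, 0, 1, 0]
phone_7d = velocity(cleaned, "phone", 7)  # [0, 0, 0, 1, 0]
raw_phone_7d = velocity(records, "phone", 7)  # [0, 0, 1, 1, 0]
```

Without the cleaning step, record 3 would appear linked to record 2 only because both carry the same placeholder phone number, which is the kind of false signal the treatment is meant to remove. Repeating a count like this across several fields, field combinations, and window lengths is one way a small set of raw fields can grow into hundreds of candidate variables.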