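"Magic Mirror Cup" Risk Control Algorithm Competition: a risk-control model for PPDai (拍拍贷), scoring close to the champion

"Magic Mirror Cup" Risk Control Algorithm Competition

PPDai's "Magic Mirror risk control system" evaluates each user's current credit status across an average of 400 data dimensions and gives every borrower a credit score for their current state. On that basis, combined with the information on each newly issued loan listing, it predicts the probability that the listing becomes overdue within 6 months. This gives investors a key basis for their decisions and promotes healthy, efficient internet finance. PPDai is opening up rich, real historical data for the first time and invites you to compete against the "Magic Mirror risk control system": using machine learning, can you design a default-prediction algorithm with better predictive accuracy and computational performance?

My result: on the stage-one dataset (without using the stage-two dataset) I got an AUC (the official metric) of 0.794587, close to the competition's winning score. Because the competition has ended and submissions are closed, this result is not strictly comparable, but it is largely in the same range.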

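1. Approach

1.1 Data cleaning

  • Drop columns with a large share of missing values, e.g. more than 20% NaN.
  • Drop rows with a large share of missing values, keeping the number of dropped rows under 1% of the total.
  • Fill the remaining missing values: inspect value_counts to judge whether each variable is continuous or discrete, then fill NaN with the mean (continuous) or the most frequent value (discrete). Judging by inspection, rather than by whether the dtype is object, matches the actual data more closely.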
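
A minimal sketch of the three cleaning steps above, not the notebook's actual code. The helper name clean_missing, its parameters, and the toy columns are hypothetical. The row step here simply drops the rows with the most NaN up to the 1% cap, and discrete_cols stands in for the columns judged discrete by inspection; every other column is assumed to be numeric.

```python
import numpy as np
import pandas as pd


def clean_missing(df, col_thresh=0.20, max_row_drop=0.01, discrete_cols=()):
    """Drop sparse columns, drop the sparsest rows (capped), then fill NaN."""
    # 1. Drop columns whose NaN share exceeds col_thresh.
    df = df.loc[:, df.isna().mean() <= col_thresh]

    # 2. Drop the rows with the most NaN, never more than max_row_drop of all rows.
    n_drop = int(len(df) * max_row_drop)
    if n_drop > 0:
        nan_per_row = df.isna().sum(axis=1)
        worst = nan_per_row[nan_per_row > 0].nlargest(n_drop).index
        df = df.drop(index=worst)

    # 3. Fill what is left: mode for hand-picked discrete columns, mean otherwise.
    df = df.copy()
    for col in df.columns[df.isna().any().to_numpy()]:
        if col in discrete_cols:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
        else:
            df[col] = df[col].fillna(df[col].mean())
    return df


# Toy data: one column that is entirely missing, two with a few gaps.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "amount": rng.normal(size=200),
    "city_code": rng.integers(0, 3, size=200).astype(float),
    "mostly_missing": np.nan,
})
demo.loc[:4, "amount"] = np.nan
demo.loc[10:12, "city_code"] = np.nan

cleaned = clean_missing(demo, discrete_cols=("city_code",))
print(cleaned.shape, cleaned.isna().sum().sum())
```

Passing the discrete columns explicitly mirrors the point above: the choice between mode and mean comes from looking at the data, not from the dtype.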

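1.2 Feature classification

  • For every feature, if its most frequent value accounts for more than a threshold (50%) of the column, convert the column to a binary one. For example, [0,1,2,0,0,0,4,0,3] becomes [0,1,1,0,0,0,1,0,1].
  • Split the remaining features by dtype into 2 groups: numerical and categorical.
  • Numerical features with no more than 10 unique values are also treated as categorical.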
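
The rules above can be sketched roughly as follows. The function names binarize_dominant and split_feature_types are hypothetical, and mapping the dominant value to 0 and everything else to 1 is an assumption taken from the example in the first bullet.

```python
import pandas as pd


def binarize_dominant(series, threshold=0.5):
    """If one value covers more than `threshold` of the column, map it to 0 and the rest to 1."""
    freq = series.value_counts(normalize=True)  # sorted, most frequent first
    if freq.iloc[0] > threshold:
        return (series != freq.index[0]).astype(int)
    return series


def split_feature_types(df, max_unique=10):
    """Split columns into numerical and categorical lists by dtype and cardinality."""
    numerical, categorical = [], []
    for col in df.columns:
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        if is_numeric and df[col].nunique() > max_unique:
            numerical.append(col)
        else:
            # Non-numeric columns and low-cardinality numeric columns.
            categorical.append(col)
    return numerical, categorical


example = pd.Series([0, 1, 2, 0, 0, 0, 4, 0, 3])
print(binarize_dominant(example).tolist())  # [0, 1, 1, 0, 0, 0, 1, 0, 1]
```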

1.3 Outlier removal

  • For every numerical feature, plot its distribution under each target value with a stripplot (with jitter). This is similar to a boxplot but makes large-valued outliers easier to find.
import pandas as pd
import seaborn as sns
# Long format: one row per (target, feature name, value); one panel per feature
melt = pd.melt(train_master, id_vars=['target'], value_vars=list(numerical_features))
g = sns.FacetGrid(data=melt, col="variable", col_wrap=4, sharex=False, sharey=False)
g.map(sns.stripplot, 'target', 'value', jitter=True, palette="muted")
  • Plot the density of every numerical feature. The plots show that taking the logarithm brings all of them closer to a normal distribution.
for f in numerical_features_log:
    train_master[f + '_log'] = np.log1p(train_master[f])
  • After the log transform, some extremely small outliers can be removed as well.
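
How the small outliers are removed is not spelled out above. One possible sketch, assuming a per-feature lower bound read off the density plot by eye: the helper drop_low_outliers, the toy column, and the bound of 1.0 are all hypothetical.

```python
import numpy as np
import pandas as pd


def drop_low_outliers(df, lower_bounds):
    """Keep rows whose log-scaled values are at or above each per-column bound."""
    keep = pd.Series(True, index=df.index)
    for col, bound in lower_bounds.items():
        keep &= df[col] >= bound
    return df[keep]


demo = pd.DataFrame({"amount": [0.0, 120.0, 150.0, 180.0, 200.0, 5000.0]})
demo["amount_log"] = np.log1p(demo["amount"])  # log1p(x) = log(1 + x), defined at 0

# Hypothetical cut-off: treat log1p values below 1.0 as outliers.
trimmed = drop_low_outliers(demo, {"amount_log": 1.0})
print(trimmed["amount"].tolist())
```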

1.4 Feature Engineering

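The other 2 datasets

train_loginfo: group by Idx and extract the record count, the number of distinct LogInfo1 values, the number of active dates, and the date span.

train_userinfo: group by Idx and extract the record count, the number of distinct UserupdateInfo1 values, the number of distinct UserupdateInfo1/UserupdateInfo2 values, and the date span, plus the count of each UserupdateInfo1/UserupdateInfo2 value.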
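
A rough sketch of these per-Idx aggregations on toy frames. The column name log_date, the output feature names, and all values are hypothetical stand-ins. Only Idx, LogInfo1, and UserupdateInfo1 come from the description above.

```python
import pandas as pd

# Toy stand-in for train_loginfo; `log_date` is a hypothetical name for the date column.
log_info = pd.DataFrame({
    "Idx": [1, 1, 1, 2, 2],
    "LogInfo1": [107, 107, -4, 1, 1],
    "log_date": pd.to_datetime(
        ["2014-01-01", "2014-01-01", "2014-01-05", "2014-02-10", "2014-02-11"]
    ),
})

log_features = log_info.groupby("Idx").agg(
    log_count=("LogInfo1", "size"),                # number of records
    log_info1_nunique=("LogInfo1", "nunique"),     # distinct LogInfo1 values
    active_days=("log_date", "nunique"),           # distinct dates with activity
    date_span_days=("log_date", lambda d: (d.max() - d.min()).days),
)

# Toy stand-in for train_userinfo: count of each UserupdateInfo1 value per Idx.
user_update = pd.DataFrame({
    "Idx": [1, 1, 2],
    "UserupdateInfo1": ["QQ", "Phone", "QQ"],
})
update_counts = pd.crosstab(
    user_update["Idx"], user_update["UserupdateInfo1"]
).add_prefix("UserupdateInfo1_")

print(log_features)
print(update_counts)
```

Both results are indexed by Idx, so they can be joined back onto the master table on that key.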

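Parsing dates

Use the arrow library to parse each date into year, month, day, week, day of week, and early/mid/late month. One-hot encode these fields before feeding them to the model.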
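
A small sketch of the date breakdown. It uses the standard library's datetime instead of arrow to stay dependency-free, so it is not the notebook's code. The function name date_parts, the ISO week numbering, and the 10/20-day cut-offs for early/mid/late month are assumptions.

```python
from datetime import date


def date_parts(day):
    """Break a date into the calendar features listed above."""
    if day.day <= 10:
        month_part = "early"
    elif day.day <= 20:
        month_part = "mid"
    else:
        month_part = "late"
    return {
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "week": day.isocalendar()[1],  # ISO week number
        "weekday": day.isoweekday(),   # 1 = Monday ... 7 = Sunday
        "month_part": month_part,
    }


parts = date_parts(date(2014, 3, 5))
print(parts)
```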

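New features

  • at_home: on the guess that UserInfo_2 and UserInfo_8 are the user's current place of residence and registered household location, this feature indicates whether the user is living in their hometown.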
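
One way at_home could be derived, as a sketch only. The toy values are made up, and the helper normalize_city (trimming whitespace and a trailing "市" suffix so that differently written city names still match) is a hypothetical assumption about the data format.

```python
import pandas as pd


def normalize_city(name):
    """Trim whitespace and a trailing '市' (city) suffix; missing values pass through."""
    if pd.isna(name):
        return name
    name = str(name).strip()
    return name[:-1] if name.endswith("市") and len(name) > 1 else name


users = pd.DataFrame({
    "UserInfo_2": ["深圳", "成都", None],
    "UserInfo_8": ["深圳市", "重庆", "成都"],
})

same_city = users["UserInfo_2"].map(normalize_city) == users["UserInfo_8"].map(normalize_city)
users["at_home"] = same_city.astype(int)  # missing values compare unequal, giving 0
print(users)
```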

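1.5 Preparation before training

Specify the one-hot encoding features

Do not let get_dummies infer which columns to encode here. By default pandas only picks columns with object (or category) dtype, but some non-object features are categorical in meaning and need one-hot encoding too.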

train_master_ = pd.get_dummies(train_master_, columns=finally_dummy_columns)
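
A toy illustration of the difference, with made-up column names: the integer-coded weekday column is left untouched by the automatic selection and is only encoded when listed explicitly.

```python
import pandas as pd

frame = pd.DataFrame({
    "city": ["a", "b", "a"],       # object dtype: picked up automatically
    "weekday": [1, 2, 1],          # integer dtype but categorical in meaning
    "amount": [10.5, 3.2, 7.7],    # truly numerical, should stay as is
})

auto = pd.get_dummies(frame)                                   # encodes only `city`
explicit = pd.get_dummies(frame, columns=["city", "weekday"])  # encodes both

print(list(auto.columns))
print(list(explicit.columns))
```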

Standardization

X_train = StandardScaler().fit_transform(X_train)
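
If the same scaling has to be applied to held-out data later, one option is to keep the fitted scaler around rather than discarding it. A sketch on toy arrays; X_test and the values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_test = np.array([[2.0, 800.0]])

scaler = StandardScaler().fit(X_train)     # learn mean and std from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```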

1.6 Training and evaluation

Cross Validation

Use StratifiedKFold so that every fold keeps a reasonable target distribution, with shuffle enabled so the splits are random.

cv = StratifiedKFold(n_splits=3, shuffle=True)

AUC evaluation

auc = cross_val_score(estimator, X_train, y_train, scoring='roc_auc', cv=cv).mean()
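
An end-to-end sketch of this evaluation on synthetic data, since the competition data is not included here. The dataset, the LogisticRegression estimator, and the fixed random_state are assumptions for illustration. Putting the scaler inside a pipeline is one way to have it re-fitted on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the competition data (about 10% positives).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)

estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # fixed seed for repeatability
auc = cross_val_score(estimator, X, y, scoring="roc_auc", cv=cv).mean()
print(round(auc, 4))
```

Without a random_state, shuffle=True gives different folds on every run, so scores from separate runs are not directly comparable.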

Models

  • XGBClassifier
  • RidgeClassifier
  • LogisticRegression
  • AdaBoostClassifier
  • VotingClassifier combining the 4 models above, for ensembling
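
A sketch of such an ensemble on synthetic data, not the notebook's configuration. GradientBoostingClassifier is swapped in for XGBClassifier to avoid the extra dependency, and all hyperparameters are defaults or placeholders. Hard voting is used because RidgeClassifier does not provide predict_proba. Hard voting only returns class labels, so the sketch is scored by accuracy; how the original ensemble was scored for AUC is not stated above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("gbdt", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBClassifier
        ("ridge", RidgeClassifier()),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("ada", AdaBoostClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote on predicted labels
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(round(accuracy, 3))
```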
