guoday/Tencent2018_Lookalike_Rank7th

Code for the 2018 Tencent College Algorithm Contest (Lookalike); the solution ranked 7th on the online leaderboard.

Environment and dependencies:
- python3
- TensorFlow >=1.3
- scikit-learn
- Ubuntu 16.04, 128 GB RAM, 8 GPUs (running all of run.sh in parallel needs 5 GPUs; with fewer, edit run.sh and schedule GPU usage yourself)
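Feature engineering:
- Basic features: 31 ID features
- Length features: the length of each multi-valued feature, 11 features in total (e.g. the number of interests)
- Count features: the total count and the positive/negative sample counts of uid, plus the positive/negative sample counts of uid crossed with every ad attribute, 25 features in total (e.g. the positive/negative sample counts of the uid and advertiserId cross ID)
- Total: 31+11+25=67 features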
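
The length and count features above can be sketched in a few lines of plain Python. This is only an illustration, not the code in src/extract_features.py: the helper names `length_feature` and `count_features` are hypothetical, the toy rows are made up, and it assumes multi-valued fields are space-separated strings and labels are 0/1.

```python
from collections import defaultdict


def length_feature(multi_value_field):
    """Number of values in a space-separated multi-valued field (assumed format)."""
    return len(multi_value_field.split()) if multi_value_field else 0


def count_features(rows, keys):
    """Count negative and positive samples for each combination of `keys`.

    rows: iterable of dicts with a binary 'label' (assumed 1 = positive, 0 = negative).
    Returns {key_tuple: [n_negative, n_positive]}.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[k] for k in keys)
        counts[key][row["label"]] += 1
    return dict(counts)


# Toy data: field names follow the README, the values are invented.
rows = [
    {"uid": 1, "advertiserId": 10, "interest": "3 7 9", "label": 1},
    {"uid": 1, "advertiserId": 10, "interest": "3 7 9", "label": 0},
    {"uid": 1, "advertiserId": 20, "interest": "3 7 9", "label": 0},
    {"uid": 2, "advertiserId": 10, "interest": "", "label": 0},
]

interest_lengths = [length_feature(r["interest"]) for r in rows]
uid_counts = count_features(rows, ["uid"])                     # per-uid counts
cross_counts = count_features(rows, ["uid", "advertiserId"])   # uid x advertiserId counts
```

Label-based counts computed on the same rows a model is trained on leak the target, so in practice they are usually computed out-of-fold; this README does not say how the repository handles that.
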
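Model structure and parameters:

Two models were used in this contest, nffm and xdeepfm. Their scores are:

| Model | Final-round data | Final-round + preliminary data | Ensemble |
| --- | --- | --- | --- |
| nffm (single model) | 0.764 | 0.770 | 0.7734 |
| xdeepfm (single model) | 0.763 | 0.766 | 0.770 |
| nffm+xdeepfm | 0.768 | - | 0.7740 |
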
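nffm: a variant of NFM (https://arxiv.org/pdf/1708.05027v1.pdf) that uses FFM in place of the Bi-Interaction step and only crosses user features with ad features. Parameters:
- Embedding dimension: 16
- Optimizer, learning rate, batch size: adam, 0.00002, 4096
- MLP hidden units and activation functions: [128,128,128,1], ['relu','relu','relu','linear']
- Multi-valued features use mean pooling
- epoch=1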
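
A minimal sketch of the two ideas in the nffm description: mean pooling of a multi-valued feature, and field-aware interactions restricted to user-ad pairs. It is written in plain Python for readability and is not the repository's TensorFlow code. The function names, the field name `age`, and the choice to return one element-wise product per pair (rather than summing or concatenating them) are assumptions.

```python
import random

EMBED_DIM = 16  # embedding dimension from the parameter list


def mean_pool(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def user_ad_interactions(user_emb, ad_emb):
    """Field-aware pairwise interactions between user fields and ad fields only.

    user_emb[u][a]: embedding of user field u dedicated to ad field a.
    ad_emb[a][u]:   embedding of ad field a dedicated to user field u.
    Returns one element-wise product vector per (user field, ad field) pair.
    """
    products = []
    for u, per_ad in user_emb.items():
        for a, vec_u in per_ad.items():
            vec_a = ad_emb[a][u]
            products.append([x * y for x, y in zip(vec_u, vec_a)])
    return products


def rand_vec(rng):
    """Random stand-in for a learned embedding vector."""
    return [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]


rng = random.Random(0)
user_fields = ["age", "interest"]   # "age" is a hypothetical single-valued field
ad_fields = ["advertiserId"]

user_emb = {
    "age": {a: rand_vec(rng) for a in ad_fields},
    # "interest" is multi-valued: pool the embeddings of its 3 values into one vector.
    "interest": {a: mean_pool([rand_vec(rng) for _ in range(3)]) for a in ad_fields},
}
ad_emb = {a: {u: rand_vec(rng) for u in user_fields} for a in ad_fields}

pairs = user_ad_interactions(user_emb, ad_emb)  # 2 user fields x 1 ad field
```

In a full model the resulting vectors would be fed to the MLP; user-user and ad-ad pairs are never formed, which is what keeps the interaction count small.
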
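xdeepfm: https://arxiv.org/pdf/1803.05170.pdf. Parameters:
- Embedding dimension: 16
- Optimizer, learning rate, batch size: adam, 0.00002, 4096
- MLP hidden units and activation functions: [400,400,400,1], ['relu','relu','relu','linear']
- Convolution kernel sizes and activation function: [128,128,128], identity
- epoch=1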
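
For orientation, here is a small sketch of one Compressed Interaction Network (CIN) layer as defined in the xDeepFM paper, assuming that the "convolution kernels" in the list above refer to the CIN feature maps and that "identity" means no nonlinearity is applied to them. The function names are illustrative and the code is plain Python, not the repository's TensorFlow implementation.

```python
def cin_layer(x_prev, x0, weights):
    """One CIN layer with identity activation.

    x_prev: list of H_prev vectors (previous layer's feature maps), each of length D.
    x0:     list of m field embeddings, each of length D.
    weights[h][i][j]: scalar weight of feature map h for the pair (x_prev[i], x0[j]).
    Returns len(weights) new feature maps, each of length D:
        out[h] = sum_i sum_j weights[h][i][j] * (x_prev[i] * x0[j])   (element-wise)
    """
    dim = len(x0[0])
    out = []
    for w_h in weights:
        feature_map = [0.0] * dim
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                w = w_h[i][j]
                for d in range(dim):
                    feature_map[d] += w * xi[d] * xj[d]
        out.append(feature_map)
    return out


def sum_pool(feature_maps):
    """Sum each feature map over the embedding dimension (one scalar per map)."""
    return [sum(fm) for fm in feature_maps]


# Tiny example: m = 2 fields, D = 2, one feature map with all weights set to 1.
x0 = [[1.0, 2.0], [3.0, 4.0]]
ones = [[[1.0, 1.0], [1.0, 1.0]]]
layer1 = cin_layer(x0, x0, ones)
pooled = sum_pool(layer1)
```

With the sizes listed above, three such layers would be stacked with 128 feature maps each, and the pooled outputs of the layers concatenated before the output unit.
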
To reach the online score of 0.7748, set the nffm MLP hidden units to 300 and to 512, train five models for each setting, and ensemble them.

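Data preprocessing:
- Put the preliminary-round data in data/preliminary_data and the final-round data in data/final_data
- python3 -u src/extract_features.py (if you are not using the preliminary-round data, set preliminary_path=None when calling the pre_data function)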
Training a model:

 python3 -u src/train.py gpu_id chunk_id chunk_num model_name sub_name

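gpu_id: index of the GPU used for training (e.g. 0 for the first card)
chunk_num: number of chunks the data is split into; e.g. chunk_num=5 splits the data into five parts, ABCDE
chunk_id: index of the first chunk to train on; e.g. with chunk_id=2 the training order is CDEAB (this makes ensembling easier)
model_name: nffm/xdeepfm
sub_name: file name of the prediction output, saved in the result_sub directory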
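
The chunk rotation described for chunk_id and chunk_num can be reproduced with modular arithmetic. The function name `chunk_order` is illustrative and is not taken from src/train.py.

```python
import string


def chunk_order(chunk_id, chunk_num):
    """Chunk indices in training order: start at chunk_id and wrap around."""
    return [(chunk_id + k) % chunk_num for k in range(chunk_num)]


order = chunk_order(2, 5)
labels = "".join(string.ascii_uppercase[i] for i in order)
print(labels)  # CDEAB
```

Training one model per starting chunk (chunk_id = 0 to chunk_num - 1) gives each model a different data order with the single epoch, which is presumably why the README calls this convenient for ensembling.
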

ensemble

  python3 src/combine_nffm.py
  python3 src/combine_xdeepfm.py
  python3 src/combine.py
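
This README does not describe what the combine scripts do internally. As a rough sketch only, and assuming a plain arithmetic mean over the predictions of several runs (the actual scripts may weight or transform the scores differently), blending could look like this. The function name and the toy scores are hypothetical.

```python
def average_predictions(prediction_lists):
    """Element-wise mean of several equally long lists of predicted scores."""
    if not prediction_lists:
        raise ValueError("need at least one list of predictions")
    n_rows = len(prediction_lists[0])
    if any(len(p) != n_rows for p in prediction_lists):
        raise ValueError("all prediction lists must have the same length")
    n_models = len(prediction_lists)
    return [sum(scores) / n_models for scores in zip(*prediction_lists)]


# Two toy runs, each scoring the same two test rows.
runs = [[0.10, 0.80], [0.20, 0.60]]
blended = average_predictions(runs)
```
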

pipeline

  bash run.sh

Tencent2020_Rank1st

Code for the 2020 Tencent College Algorithm Contest; the online result ranked 1st.

Tencent2019_Preliminary_Rank1st

Code for the 2019 Tencent College Algorithm Contest; the online result ranked 1st in the preliminary round.

CCF-BDCI-Sentiment-Analysis-Baseline

The code for CCF-BDCI-Sentiment-Analysis-Baseline

ctrNet-tool

A tool for CTR prediction, including FM, FFM, NFFM and so on.

ICME2019-CTR

The Code for ICME2019 Grand Challenge: Short Video Understanding (Single Model Ranks 6th)

Dialog-to-Action

The code for the 2018 NeurIPS paper "Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base"

PaiPaiDai2018_rank16

ESIM model with language model

BDC_2018

Preliminary rank_13

Question-Generation-VAE

The code for the EMNLP2018 paper "Question Generation from SQL Queries Improves Neural Semantic Parsing"

ESIM

ESIM model implemented in TensorFlow