# Large Batch Training for CTR Prediction (CowClip)
LargeBatchCTR trains CTR prediction models with large batch sizes (up to ~128K). The framework is based on DeepCTR. You can run the code on a single V100 GPU to experience the fast training speed.

This repo implements the adaptive Column-wise Clipping (CowClip) method from the paper "CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU".
## Get Started
First, download the dataset to the `data` folder. Then use `data_utils.py` to preprocess the data for training:

```bash
python data_utils.py --dataset criteo_kaggle --split rand
```
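`data_utils.py` handles the preprocessing; as a rough illustration of what a typical Criteo pipeline looks like (an assumption about the steps, not the repo's exact code):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Criteo format: column 0 is the label, columns 1-13 are dense features,
# columns 14-39 are categorical features.
df = pd.read_csv("data/criteo_kaggle/train.txt", sep="\t", header=None)
dense_cols, sparse_cols = list(range(1, 14)), list(range(14, 40))

# Common recipe: log-transform the dense features, label-encode the sparse ones.
df[dense_cols] = np.log1p(df[dense_cols].fillna(0).clip(lower=0))
for c in sparse_cols:
    df[c] = LabelEncoder().fit_transform(df[c].fillna("-1").astype(str))
```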
Then, use `train.py` to train the network:
```bash
# Criteo (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM

# Avazu (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM
```
For large-batch training with CowClip, run the following:
```bash
# Criteo (8K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 8192 --l2 8e-05 --lr 22.6274e-4

# Criteo (128K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 131072 --l2 128e-05 --lr 90.5096e-4

# Avazu (64K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-4 --bs 65536 --l2 64e-05 --lr 8e-4
```
## CowClip Quick Look
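At its core, CowClip clips the gradient of each embedding row (each ID) column-wise: the clipping threshold is proportional to that row's weight norm (coefficient r, `--clip`), lower-bounded by ζ (`--bound`), and scaled by the ID's in-batch frequency. Below is a minimal TensorFlow sketch of the idea, using a hypothetical `cowclip` helper and dense gradients for simplicity; the repo's actual implementation may differ:

```python
import tensorflow as tf

def cowclip(w, g, counts, r=1.0, zeta=1e-5):
    # w, g: [vocab_size, embed_dim] embedding table and its gradient
    # counts: [vocab_size] number of times each ID appears in the batch
    w_norm = tf.norm(w, axis=-1)                        # per-ID weight norm
    clip_t = tf.maximum(r * w_norm, zeta) * tf.cast(counts, w.dtype)
    g_norm = tf.norm(g, axis=-1)
    scale = tf.minimum(1.0, clip_t / (g_norm + 1e-12))  # shrink only oversized rows
    return g * scale[:, tf.newaxis]
```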
## Dataset List

- Criteo Kaggle: download `train.txt` into `data/criteo_kaggle/`
- Avazu: download `train` into `data/avazu/`
## Hyperparameters

The command-line hyperparameters have the following meanings:

Parameter | Meaning |
---|---|
`--bs` | batch size |
`--lr_embed` | learning rate for the embedding layer |
`--lr` | learning rate for the dense weights |
`--l2` | L2-regularization weight λ |
`--clip` | CowClip coefficient r |
`--bound` | CowClip bound ζ |
`--warmup` | number of epochs to warm up the dense weights (see the sketch below) |
`--init_stddev` | standard deviation for weight initialization |
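Since `--warmup` counts epochs, here is a minimal warmup sketch for the dense-weight learning rate, assuming a plain linear ramp (a hypothetical schedule; the repo's actual schedule may differ):

```python
def dense_lr(step, steps_per_epoch, warmup_epochs, peak_lr):
    # Linearly ramp the dense-weight learning rate from 0 to peak_lr over
    # the first `warmup_epochs` epochs, then hold it constant.
    warmup_steps = max(1, warmup_epochs * steps_per_epoch)
    return peak_lr * min(1.0, (step + 1) / warmup_steps)
```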
The hyperparameters that need to be scaled with the batch size are listed below. For the Criteo dataset:
bs | lr | l2 | ζ | DeepFM AUC (%) | Time (min) |
---|---|---|---|---|---|
1K | 8e-4 | 1e-5 | 1e-5 | 80.86 | 768 |
2K | 11.31e-4 | 2e-5 | 1e-5 | 80.93 | 390 |
4K | 16e-4 | 4e-5 | 1e-5 | 80.97 | 204 |
8K | 22.62e-4 | 8e-5 | 1e-5 | 80.97 | 102 |
16K | 32e-4 | 16e-5 | 1e-5 | 80.94 | 48 |
32K | 45.25e-4 | 32e-5 | 1e-5 | 80.95 | 27 |
64K | 64e-4 | 64e-5 | 1e-5 | 80.96 | 15 |
128K | 90.50e-4 | 128e-5 | 1e-5 | 80.90 | 9 |
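The Criteo rows follow a simple scaling rule: when the batch size grows by a factor k, `lr` scales by √k and `l2` scales linearly with k, while ζ stays fixed. A quick check of that observation (inferred from the table above, not quoted from the repo):

```python
import math

def scaled_hparams(bs, base_bs=1024, base_lr=8e-4, base_l2=1e-5):
    # Scaling rule read off the table: lr ~ sqrt(k), l2 ~ k, with k = bs / base_bs.
    k = bs / base_bs
    return base_lr * math.sqrt(k), base_l2 * k

print(scaled_hparams(8192))    # ≈ (22.6e-4, 8e-5), matching the 8K row
print(scaled_hparams(131072))  # ≈ (90.5e-4, 128e-5), matching the 128K row
```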
For the Avazu dataset:
bs | lr | l2 | ζ | DeepFM AUC (%) | Time (min) |
---|---|---|---|---|---|
1K | 1e-4 | 1e-5 | 1e-3 | 78.83 | 210 |
2K | 1.41e-4 | 2e-5 | 1e-3 | 78.82 | 108 |
4K | 2e-4 | 4e-5 | 1e-4 | 78.90 | 54 |
8K | 2.83e-4 | 8e-5 | 1e-4 | 79.06 | 30 |
16K | 4e-4 | 16e-5 | 1e-4 | 79.01 | 17 |
32K | 5.66e-4 | 32e-5 | 1e-4 | 78.82 | 10 |
64K | 8e-4 | 64e-5 | 1e-4 | 78.82 | 6.7 |
128K | 16e-4 | 96e-5 | 1e-4 | 78.80 | 4.8 |
## Model List

Model | Paper |
---|---|
Wide & Deep | [DLRS 2016] Wide & Deep Learning for Recommender Systems |
DeepFM | [IJCAI 2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction |
Deep & Cross Network | [ADKDD 2017] Deep & Cross Network for Ad Click Predictions |
DCN V2 | [arXiv 2020] DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems |
## Requirements

- TensorFlow 2.4.0
- TensorFlow Addons

```bash
pip install -r requirements.txt
```
## Citation

```bibtex
@article{zheng2022cowclip,
  title={{CowClip}: Reducing {CTR} Prediction Model Training Time from 12 hours to 10 minutes on 1 {GPU}},
  author={Zheng, Zangwei and Xu, Pengtai and Zou, Xuan and Tang, Da and Li, Zhen and Xi, Chenguang and Wu, Peng and Zou, Leqi and Zhu, Yijie and Chen, Ming and Ding, Xiangzhuo and Xue, Fuzhao and Qing, Ziheng and Cheng, Youlong and You, Yang},
  journal={arXiv preprint arXiv:2204.06240},
  year={2022}
}
```