  • Stars: 232
  • Rank: 167,571 (Top 4%)
  • Language: Python
  • License: MIT License
  • Created about 4 years ago
  • Updated over 1 year ago


Repository Details

Implementations of several news recommendation models.

News Recommendation

The repository currently includes the following models.

Models in published papers

Model | Full name | Paper
NRMS | Neural News Recommendation with Multi-Head Self-Attention | https://www.aclweb.org/anthology/D19-1671/
NAML | Neural News Recommendation with Attentive Multi-View Learning | https://arxiv.org/abs/1907.05576
LSTUR | Neural News Recommendation with Long- and Short-term User Representations | https://www.aclweb.org/anthology/P19-1033.pdf
DKN | Deep Knowledge-Aware Network for News Recommendation | https://dl.acm.org/doi/abs/10.1145/3178876.3186175
Hi-Fi Ark | Deep User Representation via High-Fidelity Archive Network | https://www.ijcai.org/Proceedings/2019/424
TANR | Neural News Recommendation with Topic-Aware News Representation | https://www.aclweb.org/anthology/P19-1110.pdf

Experimental models

Model | Description
Exp1 | NRMS + (Sub)category + Ensemble + Positional embedding

Get started

Basic setup.

git clone https://github.com/yusanshi/NewsRecommendation
cd NewsRecommendation
pip3 install -r requirements.txt

Download and preprocess the data.

mkdir data && cd data
# Download GloVe pre-trained word embedding
wget https://nlp.stanford.edu/data/glove.840B.300d.zip
sudo apt install unzip
unzip glove.840B.300d.zip -d glove
rm glove.840B.300d.zip

# Download MIND dataset
# By downloading the dataset, you agree to the [Microsoft Research License Terms](https://go.microsoft.com/fwlink/?LinkID=206977). For more detail about the dataset, see https://msnews.github.io/.

# Uncomment the following lines to use the MIND Large dataset (Note MIND Large test set doesn't have labels, see #11)
# wget https://mind201910small.blob.core.windows.net/release/MINDlarge_train.zip https://mind201910small.blob.core.windows.net/release/MINDlarge_dev.zip https://mind201910small.blob.core.windows.net/release/MINDlarge_test.zip
# unzip MINDlarge_train.zip -d train
# unzip MINDlarge_dev.zip -d val
# unzip MINDlarge_test.zip -d test
# rm MINDlarge_*.zip

# The following lines use the MIND Small dataset (comment them out if you use MIND Large instead). Note MIND Small doesn't have a test set, so we just copy the validation set as the test set :)
wget https://mind201910small.blob.core.windows.net/release/MINDsmall_train.zip https://mind201910small.blob.core.windows.net/release/MINDsmall_dev.zip
unzip MINDsmall_train.zip -d train
unzip MINDsmall_dev.zip -d val
cp -r val test # MIND Small has no test set :)
rm MINDsmall_*.zip

# Preprocess data into appropriate format
cd ..
python3 src/data_preprocess.py
# Remember to update the `num_*` values in `src/config.py` to match the output of `src/data_preprocess.py`
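
For orientation, here is a minimal sketch of reading one row of MIND's `behaviors.tsv`, the kind of input the preprocessing step works from. It is not the code in `src/data_preprocess.py`; the `Impression` type and `parse_behavior_line` function are hypothetical names, and the sample row is made up. It relies only on the published MIND layout: tab-separated impression ID, user ID, time, click history, and candidates written as `newsID-label`.

```python
from typing import List, NamedTuple, Tuple


class Impression(NamedTuple):
    """One row of behaviors.tsv (hypothetical container, for illustration)."""
    impression_id: str
    user_id: str
    time: str
    history: List[str]                 # news IDs the user clicked earlier
    candidates: List[Tuple[str, int]]  # (news ID, 1 if clicked else 0)


def parse_behavior_line(line: str) -> Impression:
    """Split a tab-separated behaviors.tsv row into typed fields."""
    impression_id, user_id, time, history, impressions = (
        line.rstrip("\n").split("\t")
    )
    candidates = []
    for item in impressions.split():
        # Candidates look like "N55689-1"; split on the last hyphen only.
        news_id, label = item.rsplit("-", 1)
        candidates.append((news_id, int(label)))
    return Impression(impression_id, user_id, time, history.split(), candidates)


# Made-up sample row in the MIND layout.
row = "1\tU13740\t11/11/2019 9:05:58 AM\tN55189 N42782\tN55689-1 N35729-0"
parsed = parse_behavior_line(row)
print(parsed.history)
print(parsed.candidates)
```

The unlabeled MIND Large test split lists candidates without the `-label` suffix, so this sketch would need a separate branch for it.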

Modify src/config.py to select the target model. The configuration file is organized into a general part, which applies to all models, and model-specific parts, which only some models have.

vim src/config.py
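
As a rough illustration of that layout, the sketch below keeps the shared settings in a base class and adds a subclass per model. Every class name, attribute, and value here is an assumption made for illustration and may not match the real `src/config.py`. The only names taken from this README are the `MODEL_NAME` environment variable and the `num_*` placeholders.

```python
import os


class BaseConfig:
    """General part: settings shared by all models (illustrative values)."""
    num_epochs = 2
    batch_size = 128
    learning_rate = 1e-4
    word_embedding_dim = 300  # matches the 300-d GloVe vectors downloaded above
    # Placeholders: replace with the numbers printed by src/data_preprocess.py.
    num_words = 1
    num_categories = 1


class NRMSConfig(BaseConfig):
    """Model-specific part for NRMS (hypothetical attribute)."""
    num_attention_heads = 15


class NAMLConfig(BaseConfig):
    """Model-specific part for NAML (hypothetical attributes)."""
    num_filters = 300
    window_size = 3


CONFIGS = {"NRMS": NRMSConfig, "NAML": NAMLConfig}


def get_config(name: str) -> type:
    """Return the config class for a model name, failing loudly if unknown."""
    try:
        return CONFIGS[name]
    except KeyError:
        raise ValueError(f"Unknown model: {name!r}") from None


# Select the target model from the environment, defaulting to NRMS.
config = get_config(os.environ.get("MODEL_NAME", "NRMS"))
print(config.__name__, config.batch_size)
```

Subclassing keeps one copy of the shared values while letting a model add or override only what it needs.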

Run.

# Train and save checkpoint into `checkpoint/{model_name}/` directory
python3 src/train.py
# Load latest checkpoint and evaluate on the test set
python3 src/evaluate.py
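
One way "load latest checkpoint" could work is sketched below. The `ckpt-<step>.pth` file naming and the `latest_checkpoint` helper are assumptions for illustration, not the repository's actual scheme. The sketch compares step numbers as integers, because sorting the names as strings would rank `ckpt-500.pth` above `ckpt-10000.pth`.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None."""
    directory = Path(directory)
    if not directory.is_dir():
        return None
    best_step, best_path = -1, None
    for path in directory.iterdir():
        # Hypothetical naming scheme: ckpt-<step>.pth
        match = re.fullmatch(r"ckpt-(\d+)\.pth", path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for step in (500, 1000, 10000):
        (Path(tmp) / f"ckpt-{step}.pth").touch()
    demo_name = latest_checkpoint(tmp).name
print(demo_name)
```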

You can visualize metrics with TensorBoard.

tensorboard --logdir=runs

# or, for a specific model
tensorboard --logdir=runs/{model_name}

Tip: set the REMARK environment variable to give a run a more meaningful name in TensorBoard. For example, REMARK=num-filters-300-window-size-5 python3 src/train.py.
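
A minimal sketch of how a training script might fold `REMARK` into its log directory follows. The `run_name` helper and the timestamp format are assumptions; only the `runs/{model_name}` layout and the `REMARK` variable come from this README.

```python
import os
from datetime import datetime


def run_name(model_name: str, env=os.environ, now=None) -> str:
    """Build a TensorBoard log directory, appending REMARK when it is set."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%dT%H-%M-%S")
    remark = env.get("REMARK")
    suffix = f"-{remark}" if remark else ""
    return f"runs/{model_name}/{stamp}{suffix}"


# The resulting path could be handed to a TensorBoard summary writer.
print(run_name("NRMS"))
```

Passing `env` and `now` as parameters keeps the helper easy to test without touching the real environment or clock.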

Results

Model | AUC | MRR | nDCG@5 | nDCG@10 | Remark
NRMS
NAML
LSTUR
DKN
Hi-Fi Ark
TANR
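
For readers unfamiliar with the column headers, the sketch below computes AUC, MRR, and nDCG@k for a single impression in plain Python. It follows the definitions commonly used for MIND evaluation and is not taken from `src/evaluate.py`; the function names and the sample labels and scores are made up. Each impression is assumed to contain at least one clicked and one non-clicked item, and a full evaluation would average these values over all impressions.

```python
import math


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ranked_labels(labels, scores):
    """Labels reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr(labels, scores):
    """Mean reciprocal rank over all clicked items in the impression."""
    ranked = ranked_labels(labels, scores)
    return sum(l / rank for rank, l in enumerate(ranked, start=1)) / sum(ranked)


def dcg_at_k(ranked, k):
    """Discounted cumulative gain of the top-k labels."""
    return sum(
        (2 ** l - 1) / math.log2(rank + 1)
        for rank, l in enumerate(ranked[:k], start=1)
    )


def ndcg_at_k(labels, scores, k):
    """DCG@k normalized by the DCG@k of a perfect ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_labels(labels, scores), k) / ideal


# Made-up impression: two clicked and two non-clicked candidates.
labels = [0, 1, 0, 1]
scores = [0.1, 0.9, 0.8, 0.3]
print(auc(labels, scores), mrr(labels, scores), ndcg_at_k(labels, scores, 5))
```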

Checkpoints: https://drive.google.com/open?id=TODO

You can verify the results by simply downloading them and running MODEL_NAME=XXXX python3 src/evaluate.py.

Credits
