DKN
This repository is the implementation of DKN (arXiv):
DKN: Deep Knowledge-Aware Network for News Recommendation
Hongwei Wang, Fuzheng Zhang, Xing Xie, Minyi Guo
The Web Conference 2018 (WWW 2018)
DKN is a deep knowledge-aware network that takes advantage of knowledge graph representation in news recommendation. The main components of DKN are a KCNN module and an attention module:
- The KCNN module learns jointly from semantic-level and knowledge-level representations of news. Its multiple channels and the alignment of words and entities enable KCNN to combine information from heterogeneous sources.
- The attention module models the different impacts of a user's diverse historical interests on the current candidate news.
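The idea behind the attention module can be illustrated with a minimal sketch: each clicked news item gets a weight reflecting its relevance to the candidate, and the user representation is the weighted sum of the clicked items. The sketch assumes a plain dot product as the relevance score, which is a stand-in chosen for brevity and not the scoring function implemented in src/. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def user_embedding(clicked, candidate):
    """Pool clicked-news vectors, weighted by relevance to the candidate.

    Assumption: relevance is a dot product (illustrative only).
    Returns the attention weights and the pooled user vector.
    """
    weights = softmax([dot(h, candidate) for h in clicked])
    dim = len(candidate)
    pooled = [sum(w * h[i] for w, h in zip(weights, clicked)) for i in range(dim)]
    return weights, pooled


# Toy 2-dimensional embeddings (made up for illustration).
clicked = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
candidate = [1.0, 0.0]
weights, pooled = user_embedding(clicked, candidate)
print(weights, pooled)
```

Because the weights depend on the candidate, the same click history yields a different user vector for each candidate news item.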
Files in the folder
- data/
  - kg/
    - Fast-TransX: an efficient implementation of TransE and its extended models for Knowledge Graph Embedding (from https://github.com/thunlp/Fast-TransX);
    - kg.txt: knowledge graph file;
    - kg_preprocess.py: pre-process the knowledge graph and output knowledge embedding files for DKN;
    - prepare_data_for_transx.py: generate the required input files for Fast-TransX;
  - news/
    - news_preprocess.py: pre-process the news dataset;
    - raw_test.txt: raw test data file;
    - raw_train.txt: raw train data file;
- src/: implementations of DKN.
Note: Due to the privacy policies of Bing News and file size limits on GitHub, the released raw dataset and knowledge graph in this repository are only small samples of the original ones reported in the paper.
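Before running anything, it can help to confirm that a checkout matches the layout listed in this section. The following is a minimal sketch. The helper name missing_paths and the EXPECTED_PATHS list are hypothetical and simply mirror the file list in this README.

```python
import os

# Paths taken from the file list in this README, relative to the repository root.
EXPECTED_PATHS = [
    "data/kg/Fast-TransX",
    "data/kg/kg.txt",
    "data/kg/kg_preprocess.py",
    "data/kg/prepare_data_for_transx.py",
    "data/news/news_preprocess.py",
    "data/news/raw_test.txt",
    "data/news/raw_train.txt",
    "src",
]


def missing_paths(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    return [
        rel
        for rel in EXPECTED_PATHS
        if not os.path.exists(os.path.join(repo_root, *rel.split("/")))
    ]


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    for rel in missing_paths("."):
        print("missing:", rel)
```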
Format of input files
- raw_train.txt and raw_test.txt:
user_id[TAB]news_title[TAB]label[TAB]entity_info
for each line, where news_title is a list of words w1 w2 ... wn, and entity_info is a list of pairs of entity id and entity name: entity_id_1:entity_name;entity_id_2:entity_name...
- kg.txt:
head[TAB]relation[TAB]tail
for each line, where head and tail are entity ids and relation is the relation id.
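The two formats above can be read with a few lines of Python. This is a minimal sketch, not the parsing code used by news_preprocess.py or kg_preprocess.py. The function names and the sample lines are hypothetical, and all fields are kept as strings because the README does not state their types.

```python
def parse_news_line(line):
    """Split one line of raw_train.txt / raw_test.txt into its four fields."""
    user_id, title, label, entity_info = line.rstrip("\n").split("\t")
    words = title.split()  # news_title is a space-separated list of words
    entities = []
    for pair in entity_info.split(";"):
        if pair:  # skip empty pieces, e.g. from a trailing ';'
            # Split on the first ':' only, so entity names may contain colons.
            entity_id, _, entity_name = pair.partition(":")
            entities.append((entity_id, entity_name))
    return user_id, words, label, entities


def parse_kg_line(line):
    """Split one line of kg.txt into a (head, relation, tail) triple."""
    head, relation, tail = line.rstrip("\n").split("\t")
    return head, relation, tail


# Made-up sample lines that follow the documented layout.
news_sample = "user_1\tw1 w2 w3\t1\t10:entity a;20:entity b\n"
kg_sample = "10\t3\t20\n"

print(parse_news_line(news_sample))
print(parse_kg_line(kg_sample))
```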
Required packages
The code has been tested under Python 3.6.5 with the following packages installed (along with their dependencies):
- tensorflow-gpu == 1.4.0
- numpy == 1.14.5
- scikit-learn == 0.19.1
- pandas == 0.23.0
- gensim == 3.5.0
Running the code
$ cd data/news
$ python news_preprocess.py
$ cd ../kg
$ python prepare_data_for_transx.py
$ cd Fast-TransX/transE/  # note: you can also choose other KGE methods
$ g++ transE.cpp -o transE -pthread -O3 -march=native
$ ./transE
$ cd ../..
$ python kg_preprocess.py
$ cd ../../src
$ python main.py  # note: use -h to check optional arguments
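The same sequence can be driven from a single Python script, which avoids tracking the current directory by hand. This is a sketch under the assumption that it is launched from the repository root. The names STEPS and run_pipeline are hypothetical, and the commands are copied from the steps above. By default it only prints the plan; pass dry_run=False to execute it.

```python
import os
import subprocess

# (working directory relative to the repository root, command) for each step.
STEPS = [
    ("data/news", ["python", "news_preprocess.py"]),
    ("data/kg", ["python", "prepare_data_for_transx.py"]),
    ("data/kg/Fast-TransX/transE",
     ["g++", "transE.cpp", "-o", "transE", "-pthread", "-O3", "-march=native"]),
    ("data/kg/Fast-TransX/transE", ["./transE"]),
    ("data/kg", ["python", "kg_preprocess.py"]),
    ("src", ["python", "main.py"]),
]


def run_pipeline(repo_root, dry_run=True):
    """Print each step and, unless dry_run is set, run it in its directory."""
    planned = []
    for rel_dir, cmd in STEPS:
        cwd = os.path.join(repo_root, *rel_dir.split("/"))
        planned.append((cwd, cmd))
        print("[{}] {}".format(cwd, " ".join(cmd)))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned


if __name__ == "__main__":
    run_pipeline(".", dry_run=True)
```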