

CRSLab


Paper | Docs | Chinese version

CRSLab is an open-source toolkit for building Conversational Recommender Systems (CRS). It is built on Python and PyTorch. CRSLab has the following highlights:

  • Comprehensive benchmark models and datasets: We have integrated 6 commonly used datasets and 18 models, including graph neural network and pre-training models such as R-GCN, BERT and GPT-2. We have preprocessed these datasets to support these models and released them for download.
  • Extensive and standard evaluation protocols: We support a series of widely-adopted evaluation protocols for testing and comparing different CRS.
  • General and extensible structure: We design a general and extensible structure to unify various conversational recommendation datasets and models, in which we integrate various built-in interfaces and functions for quick development.
  • Easy to get started: We provide simple yet flexible configuration so that new researchers can get started with our library quickly.
  • Human-machine interaction interfaces: We provide flexible human-machine interaction interfaces for researchers to conduct qualitative analysis.

Figure 1: The overall framework of CRSLab

Installation

CRSLab works with the following operating systems:

  • Linux
  • Windows 10
  • macOS

CRSLab requires Python version 3.7 or later.

CRSLab requires torch version 1.8. To use CRSLab with a GPU, ensure that the CUDA or CUDAToolkit version is 10.2 or later. Use the combinations shown in this link to ensure that PyTorch Geometric works correctly.

Install PyTorch

Use the PyTorch local installation or previous-versions installation commands to install PyTorch. For example, on Linux and Windows 10:

# CUDA 10.2
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch

# CUDA 11.1
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

# CPU Only
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cpuonly -c pytorch

If you want to use CRSLab with GPU, make sure the following command prints True after installation:

$ python -c "import torch; print(torch.cuda.is_available())"
>>> True

Install PyTorch Geometric

Ensure that at least PyTorch 1.8.0 is installed:

$ python -c "import torch; print(torch.__version__)"
>>> 1.8.0

Find the CUDA version PyTorch was installed with:

$ python -c "import torch; print(torch.version.cuda)"
>>> 11.1
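
The three one-line checks above can be combined into a single script. The following is a minimal sketch; the function name `check_environment` and the shape of the returned dictionary are assumptions for illustration, not part of CRSLab. It only imports torch if the package is present, so it also runs on a machine where PyTorch has not been installed yet.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 7)):
    """Collect the facts the installation steps above ask you to verify.

    Hypothetical helper, not part of CRSLab.
    """
    report = {
        # CRSLab requires Python 3.7 or later.
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when the package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        # None for CPU-only builds of PyTorch.
        report["cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```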

For Linux:

Install the relevant packages:

conda install pyg -c pyg

For others:

Check PyG installation documents to install the relevant packages.

Install CRSLab

You can install from pip:

pip install crslab

OR install from source:

git clone https://github.com/RUCAIBox/CRSLab && cd CRSLab
pip install -e .

Quick-Start

With the source code, you can use the provided script to try out the library. It runs on CPU by default:

python run_crslab.py --config config/crs/kgsf/redial.yaml

The system will run data preprocessing, then train, validate and test each model in turn. Finally, it reports the evaluation results of the specified models.

If you want to save pre-processed datasets and training results of models, you can use the following command:

python run_crslab.py --config config/crs/kgsf/redial.yaml --save_data --save_system

In summary, run_crslab.py accepts the following arguments:

  • --config or -c: relative path to the configuration file (YAML).
  • --gpu or -g: GPU id(s) to use; multiple GPUs are supported. Defaults to CPU (-1).
  • --save_data or -sd: save pre-processed dataset.
  • --restore_data or -rd: restore pre-processed dataset from file.
  • --save_system or -ss: save trained system.
  • --restore_system or -rs: restore trained system from file.
  • --debug or -d: use validation dataset to debug your system.
  • --interact or -i: interact with your system instead of training.
  • --tensorboard or -tb: enable tensorboard to monitor train performance.
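
To make the argument list concrete, here is a minimal sketch of how such a command-line interface could be declared with Python's argparse. It is not the actual contents of run_crslab.py: the function name `build_parser`, the help strings, and the choice to treat `--gpu` as a string are assumptions. Only the flag names and the CPU default of -1 come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(description="Sketch of a CRSLab-style CLI")
    parser.add_argument("-c", "--config", type=str,
                        help="relative path to the YAML configuration file")
    # Kept as a string so that several ids can be passed, e.g. "0,1" (assumption).
    parser.add_argument("-g", "--gpu", type=str, default="-1",
                        help="GPU id(s) to use; -1 means CPU")
    # The remaining options are plain on/off switches.
    parser.add_argument("-sd", "--save_data", action="store_true")
    parser.add_argument("-rd", "--restore_data", action="store_true")
    parser.add_argument("-ss", "--save_system", action="store_true")
    parser.add_argument("-rs", "--restore_system", action="store_true")
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("-i", "--interact", action="store_true")
    parser.add_argument("-tb", "--tensorboard", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--config", "config/crs/kgsf/redial.yaml", "--save_data", "-ss"]
    )
    print(args)
```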

Models

In CRSLab, we unify the task description of conversational recommendation into three sub-tasks: recommendation (recommending user-preferred items), conversation (generating proper responses) and policy (selecting proper interactive actions). The recommendation and conversation sub-tasks are the core of a CRS and have been studied in most prior work. The policy sub-task is required by more recent work, in which the CRS interacts with users through a purposeful strategy. In this first release, we have implemented 18 models in four categories: CRS model, Recommendation model, Conversation model and Policy model.

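Category | Model | Graph Neural Network? | Pre-training Model?
CRS Model | ReDial | × | ×
CRS Model | KBRD | √ | ×
CRS Model | KGSF | √ | ×
CRS Model | TG-ReDial | × | √
CRS Model | INSPIRED | × | √
Recommendation model | Popularity | × | ×
Recommendation model | GRU4Rec | × | ×
Recommendation model | SASRec | × | ×
Recommendation model | TextCNN | × | ×
Recommendation model | R-GCN | √ | ×
Recommendation model | BERT | × | √
Conversation model | HERD | × | ×
Conversation model | Transformer | × | ×
Conversation model | GPT-2 | × | √
Policy model | PMI | × | ×
Policy model | MGCG | × | ×
Policy model | Conv-BERT | × | √
Policy model | Topic-BERT | × | √
Policy model | Profile-BERT | × | √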

Among them, the CRS models integrate a recommendation model and a conversation model so that each improves the other, while the other models each address a single task.

For the recommendation, conversation and policy models, we have implemented the following commonly used automatic evaluation metrics:

Category | Metrics
Recommendation Metrics | Hit@{1, 10, 50}, MRR@{1, 10, 50}, NDCG@{1, 10, 50}
Conversation Metrics | PPL, BLEU-{1, 2, 3, 4}, Embedding Average/Extreme/Greedy, Distinct-{1, 2, 3, 4}
Policy Metrics | Accuracy, Hit@{1, 3, 5}
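
The ranking metrics listed above can be sketched for the common case of a single ground-truth item per recommendation turn. This is a minimal illustration and not CRSLab's own evaluator: the function names are hypothetical, and the single-target setting is an assumption made to keep the formulas short.

```python
import math


def _rank_within_k(ranked_items, target, k):
    """Return the 0-based position of target in the top-k list, or None."""
    top_k = ranked_items[:k]
    return top_k.index(target) if target in top_k else None


def hit_at_k(ranked_items, target, k):
    """1.0 if the target appears in the top k, else 0.0."""
    return 1.0 if _rank_within_k(ranked_items, target, k) is not None else 0.0


def mrr_at_k(ranked_items, target, k):
    """Reciprocal of the target's 1-based rank, or 0.0 if outside the top k."""
    rank = _rank_within_k(ranked_items, target, k)
    return 1.0 / (rank + 1) if rank is not None else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG with one relevant item: the ideal DCG is 1, so NDCG equals DCG."""
    rank = _rank_within_k(ranked_items, target, k)
    return 1.0 / math.log2(rank + 2) if rank is not None else 0.0


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d"]
    for k in (1, 3):
        print(k, hit_at_k(ranking, "c", k), mrr_at_k(ranking, "c", k),
              ndcg_at_k(ranking, "c", k))
```

Under this single-target assumption, Hit@1, MRR@1 and NDCG@1 coincide, because all three equal 1 exactly when the target is ranked first. Corpus-level scores are the averages of these per-turn values.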

Datasets

We have collected and preprocessed 6 commonly used human-annotated datasets and matched each dataset with suitable knowledge graphs (KGs), as shown below:

Dataset | Dialogs | Utterances | Domains | Task Definition | Entity KG | Word KG
ReDial | 10,006 | 182,150 | Movie | -- | DBpedia | ConceptNet
TG-ReDial | 10,000 | 129,392 | Movie | Topic Guide | CN-DBpedia | HowNet
GoRecDial | 9,125 | 170,904 | Movie | Action Choice | DBpedia | ConceptNet
DuRecDial | 10,200 | 156,000 | Movie, Music | Goal Plan | CN-DBpedia | HowNet
INSPIRED | 1,001 | 35,811 | Movie | Social Strategy | DBpedia | ConceptNet
OpenDialKG | 13,802 | 91,209 | Movie, Book | Path Generate | DBpedia | ConceptNet

Performance

We have trained and tested the integrated models on the TG-ReDial dataset, which is split into training, validation and test sets using a ratio of 8:1:1. For each conversation, we start from the first utterance and use our model to generate reply utterances or recommendations turn by turn. We perform the evaluation on the three sub-tasks.
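
An 8:1:1 split of this kind could be produced as in the following sketch. The shuffling, the fixed seed, and the function name `split_8_1_1` are assumptions for illustration; the text above does not say how the split was actually drawn.

```python
import random


def split_8_1_1(conversations, seed=0):
    """Shuffle and split a list of conversations into train/valid/test (8:1:1).

    Hypothetical helper, not part of CRSLab.
    """
    items = list(conversations)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = split_8_1_1(range(10000))
    print(len(train), len(valid), len(test))
```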

Recommendation Task

Model Hit@1 Hit@10 Hit@50 MRR@1 MRR@10 MRR@50 NDCG@1 NDCG@10 NDCG@50
SASRec 0.000446 0.00134 0.0160 0.000446 0.000576 0.00114 0.000445 0.00075 0.00380
TextCNN 0.00267 0.0103 0.0236 0.00267 0.00434 0.00493 0.00267 0.00570 0.00860
BERT 0.00722 0.00490 0.0281 0.00722 0.0106 0.0124 0.00490 0.0147 0.0239
KBRD 0.00401 0.0254 0.0588 0.00401 0.00891 0.0103 0.00401 0.0127 0.0198
KGSF 0.00535 0.0285 0.0771 0.00535 0.0114 0.0135 0.00535 0.0154 0.0259
TG-ReDial 0.00793 0.0251 0.0524 0.00793 0.0122 0.0134 0.00793 0.0152 0.0211

Conversation Task

Model BLEU@1 BLEU@2 BLEU@3 BLEU@4 Dist@1 Dist@2 Dist@3 Dist@4 Average Extreme Greedy PPL
HERD 0.120 0.0141 0.00136 0.000350 0.181 0.369 0.847 1.30 0.697 0.382 0.639 472
Transformer 0.266 0.0440 0.0145 0.00651 0.324 0.837 2.02 3.06 0.879 0.438 0.680 30.9
GPT-2 0.0858 0.0119 0.00377 0.0110 2.35 4.62 8.84 12.5 0.763 0.297 0.583 9.26
KBRD 0.267 0.0458 0.0134 0.00579 0.469 1.50 3.40 4.90 0.863 0.398 0.710 52.5
KGSF 0.383 0.115 0.0444 0.0200 0.340 0.910 3.50 6.20 0.888 0.477 0.767 50.1
TG-ReDial 0.125 0.0204 0.00354 0.000803 0.881 1.75 7.00 12.0 0.810 0.332 0.598 7.41
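
PPL in the table above stands for perplexity, where lower is better. As a reminder of the standard definition (the exponential of the average negative log-likelihood per token), a minimal sketch follows. The function name and the use of natural-log token probabilities as input are assumptions for illustration, not CRSLab's implementation.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of the reference."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that gives every reference token probability 0.25
    # is as uncertain as a uniform choice among 4 tokens.
    print(perplexity([math.log(0.25)] * 4))
```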

Policy Task

Model Hit@1 Hit@10 Hit@50 MRR@1 MRR@10 MRR@50 NDCG@1 NDCG@10 NDCG@50
MGCG 0.591 0.818 0.883 0.591 0.680 0.683 0.591 0.712 0.729
Conv-BERT 0.597 0.814 0.881 0.597 0.684 0.687 0.597 0.716 0.731
Topic-BERT 0.598 0.828 0.885 0.598 0.690 0.693 0.598 0.724 0.737
TG-ReDial 0.600 0.830 0.893 0.600 0.693 0.696 0.600 0.727 0.741

The above results were obtained with CRSLab in preliminary experiments. These algorithms were implemented and tuned based on our own understanding and experience, so they may not have reached their optimal performance. If you obtain a better result for a specific algorithm, please let us know. We will update this table after the results are verified.

Releases

Releases Date Features
v0.1.1 1 / 4 / 2021 Basic CRSLab
v0.1.2 3 / 28 / 2021 CRSLab

Contributions

Please let us know if you encounter a bug or have any suggestions by filing an issue.

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker and submitted through pull requests (PRs).

We thank @shubaoyu and @ToheartZhang for their contributions through PRs.

Citing

If you find CRSLab useful for your research or development, please cite our Paper:

@article{crslab,
    title={CRSLab: An Open-Source Toolkit for Building Conversational Recommender System},
    author={Kun Zhou and Xiaolei Wang and Yuanhang Zhou and Chenzhan Shang and Yuan Cheng and Wayne Xin Zhao and Yaliang Li and Ji-Rong Wen},
    year={2021},
    journal={arXiv preprint arXiv:2101.00939}
}

Team

CRSLab is developed and maintained by the AI Box group at RUC.

License

CRSLab is released under the MIT License.
