  • Stars: 108
  • Rank 319,353 (Top 7 %)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated about 1 year ago

Repository Details

An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)

OpenDialog

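A test interface is now available: search for the WeChat official account OpenDialog to try it.

OpenDialog is built on the PyTorch-based transformers library. It provides a series of transformer-based Chinese open-domain dialogue (chit-chat) models, gathers existing data resources, and keeps adding datasets for Chinese dialogue systems, with the aim of building an open-source Chinese chit-chat dialogue platform.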

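Latest updates:

  • 2020.8.20: finished the interface for the LCCC-GPT-Large generative open-domain pre-trained model; run the command below to start the corresponding service

    ./run_flask.sh lccc <gpu_id>
  • 2020.10.26: finished a batch of bi-encoder retrieval dialogue models (bert-bi-encoder, polyencoder, etc.)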

  • ...
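
For readers unfamiliar with bi-encoder retrieval, the scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenDialog's implementation: the context and each candidate response are assumed to have been encoded independently into fixed-size vectors already (the encoders are omitted), the toy vectors are made up, and the function name `rank_candidates` is hypothetical. Candidates are ranked by their dot product with the context vector.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_candidates(
    context_vec: Sequence[float],
    candidate_vecs: List[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs sorted by descending score."""
    scores = [(i, dot(context_vec, c)) for i, c in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for encoder outputs (hypothetical values).
context = [1.0, 0.0, 2.0]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
ranking = rank_candidates(context, candidates)
print(ranking)
```

Because the candidate vectors do not depend on the context, they can be computed once and cached, which is the usual reason for preferring a bi-encoder in the retrieval stage.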

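Tutorial

1. Project structure and file overview

Core files and directories of OpenDialog:

  • data: datasets, configuration files, vocabularies, word vectors, and dataset processing scripts
  • models: dialogue models
  • metrics: evaluation metrics
  • multiview: multi-view reranking model, which reranks the retrieved candidate responses
  • ckpt: stores trained models
  • rest: stores tensorboard logs and the result files generated in the test stage
  • utils: utility functions
  • dataloader.py: dataset loading script
  • main.py: main entry script
  • header.py: the packages that need to be imported
  • eval.py: evaluation script that calls the metrics in metrics to score the result files generated in rest
  • run.sh: batch run script
  • run_flask.sh: loads a model and starts the service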

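2. Environment setup

  1. Base system environment: Linux/Ubuntu-16.04+, Python 3.6+, GPU (default 1080 Ti)

  2. Install the Python dependencies

    pip install -r requirements.txt

  3. Install ElasticSearch

    Retrieval-based dialogue systems first use Elasticsearch for coarse filtering. To support Chinese word segmentation in this coarse retrieval stage, a Chinese tokenizer also needs to be downloaded and installed.

  4. Install mongodb

    Once the service is started, mongodb is used to store the conversation history and other necessary data.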
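
The Elasticsearch coarse-filtering step could look roughly like the sketch below, which only builds the JSON body of a `match` query and does not contact a server. The field name `utterance`, the helper name `build_recall_query`, and the `ik_smart` analyzer (provided by the separately installed IK analysis plugin) are assumptions for illustration; the source does not say which Chinese tokenizer or index schema OpenDialog uses.

```python
import json
from typing import Any, Dict


def build_recall_query(
    utterance: str,
    size: int = 10,
    field: str = "utterance",    # hypothetical field name
    analyzer: str = "ik_smart",  # assumes the IK analysis plugin is installed
) -> Dict[str, Any]:
    """Build a match-query body that recalls the top `size` candidates."""
    if size <= 0:
        raise ValueError("size must be positive")
    return {
        "size": size,
        "query": {
            "match": {
                field: {"query": utterance, "analyzer": analyzer}
            }
        },
    }


body = build_recall_query("今天天气怎么样", size=5)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

A body like this would be sent to the `_search` endpoint of the index that holds the candidate responses, and the returned hits would then be handed to a finer ranking model.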

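3. Data preparation

  1. Baidu Cloud link for the datasets: https://pan.baidu.com/s/1xJibJmOOCGIzmJVC6CZ39Q; extraction code: vmua
  2. Put each data file in the corresponding subdirectory under data; the word-vector files chinese_w2v.txt and english_w2v.bin go directly under data.
  3. For data details and preprocessing, see data/README.md
  4. Available datasets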

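5. Model training

  • Training supports multi-GPU parallelism: just specify multiple GPU ids in <gpu_ids>, for example 0,1,2,3
  • The dataset name is the same as the name of its directory under data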

| Model | CMD | Type | Details | Refer | Pre-train Model |
| bertretrieval | ./run.sh train <dataset> bertretrieval <gpu_ids> | retrieval | BERT-based fine-ranking model (fine-tuning) | Paper | |
| gpt2 | ./run.sh train <dataset> gpt2 <gpu_ids> | generative | GPT2 generative dialogue model | Code | |
| gpt2gan | ./run.sh train <dataset> gpt2gan <gpu_ids> | generative | GAN-based dialogue model; the generator is GPT2 and the discriminator is a BERT binary classifier | Paper | |
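
To launch training from Python rather than the shell, the command pattern listed above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of OpenDialog: it only reproduces the `./run.sh train <dataset> <model> <gpu_ids>` pattern and joins multiple GPU ids with commas. The dataset name `my_dataset` is a placeholder.

```python
from typing import List, Sequence

# Model names taken from the commands listed above.
KNOWN_MODELS = ("bertretrieval", "gpt2", "gpt2gan")


def build_train_command(
    dataset: str, model: str, gpu_ids: Sequence[int]
) -> List[str]:
    """Assemble the argument list for ./run.sh in train mode."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # Multiple GPUs are passed as one comma-separated argument, e.g. 0,1,2,3.
    gpu_arg = ",".join(str(g) for g in gpu_ids)
    return ["./run.sh", "train", dataset, model, gpu_arg]


cmd = build_train_command("my_dataset", "gpt2", [0, 1, 2, 3])
print(" ".join(cmd))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without shell quoting concerns.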

6. Experimental results

7. Start the flask service

  1. Start the flask service

    ./run_flask.sh <model_name> <gpu_id>
    
  2. Call the interface

    • WeChat official account
    • postman
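
A request like the one postman would send can also be built from Python. The sketch below is only an illustration under loud assumptions: the source does not document the service's routes or payload, so the `/chat` route, the port 8080, and the `user`/`message` JSON fields are all hypothetical and must be replaced with whatever the running service actually exposes.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, user_id: str, message: str
) -> urllib.request.Request:
    """Build a JSON POST request for one user utterance."""
    payload = {"user": user_id, "message": message}  # hypothetical schema
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",  # hypothetical route
        data=data,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


req = build_chat_request("http://localhost:8080", "user-1", "你好")
print(req.get_method(), req.full_url)
# Sending requires a running service:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(json.loads(resp.read().decode("utf-8")))
```

Keeping request construction separate from sending makes it possible to inspect the payload before pointing it at a live service.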

More Repositories

  1. Copyisallyouneed (Python, 181 stars): [ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM
  2. MultiTurnDialogZoo (Python, 162 stars): Multi-turn dialogue baselines written in PyTorch
  3. science-llm (Python, 120 stars): A large-scale language model for the scientific domain, trained on the redpajama arXiv split
  4. SimpleReDial-v1 (Python, 38 stars): The source code of the DR-BERT model and baselines
  5. RUBER-and-Bert-RUBER (Python, 29 stars): Implementation of RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
  6. Rep-Dropout (Python, 27 stars): [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
  7. MomentumDecoding (Python, 19 stars): Momentum Decoding: Open-ended Text Generation as Graph Exploration
  8. EDA-NLP-Chinese (Python, 16 stars): Easy Data Augmentation for NLP on Chinese
  9. PONE (Jupyter Notebook, 13 stars)
  10. GPT2Dialog (Python, 11 stars): English or Chinese GPT2Dialog model from GPT2-chitchat
  11. Study (Jupyter Notebook, 10 stars): Good good study, day day ugly
  12. Primary_Explainable_Factual_Consistency_Evaluation_Model (Python, 10 stars): A simple demo of an explainable factual consistency evaluation model, optimizing InternLM-7B by QLoRA
  13. WhenToSpeak (Jupyter Notebook, 9 stars): The code of our paper When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation
  14. BIT-PSO (Python, 6 stars): PSO Algorithm for solving the JSP problem
  15. Transformer-Dialog (Python, 6 stars): PyTorch Transformer Dialogue Model
  16. EasyNLP (Python, 5 stars)
  17. EvidenceRetrievalLeaderboard (5 stars): The leaderboard for the evidence retrieval task
  18. General-Zero (Python, 5 stars): The AlphaZero for the WTN-EinStein Chess
  19. FeedbackPreference (Python, 4 stars): This is the repo for our proposed Feedback Preference corpus
  20. SurveyFactory (4 stars): All the surveys I made: save the idea, help the newbie, review for myself
  21. DeepLearning-Course (Jupyter Notebook, 3 stars): DeepLearning Course (RL, RNN, CNN, TextGen)
  22. BITNLP (Python, 2 stars): NLP project
  23. housechat (Python, 2 stars): https://www.datafountain.cn/competitions/474
  24. CCompilerInPython (HTML, 2 stars): The Simple C Compiler in Python
  25. HashRetrieval (Python, 1 star): Learning to Hash for Coarse Retrieval in Open-Domain Dialog Systems
  26. DataHammer-Training-Room (Shell, 1 star): Machine Learning Practice for students getting in touch with Group DataHammer
  27. Paper-shredder (Python, 1 star): Paper, paper, and paper
  28. ubuntu-v2 (Jupyter Notebook, 1 star)