@thu-coai
  • Stars
    star
    7,694
  • Global Org. Rank 3,004 (Top 1.0 %)
  • Registered over 5 years ago
  • Most used languages
    Python
    95.3 %
    JavaScript
    1.6 %
    C++
    1.6 %
    Cuda
    1.6 %
  • Location 🇨🇳 China
  • Country Total Rank 899
  • Country Ranking
    Python
    110
    Cuda
    200
    C++
    9,956

Top repositories

1

CDial-GPT

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
Python
1,678
star
2

Safety-Prompts

Chinese safety prompts for evaluating and improving the safety of LLMs.
649
star
3

CrossWOZ

A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
Python
580
star
4

KdConv

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
Python
455
star
5

ConvLab-2

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems
Python
442
star
6

CharacterGLM-6B

CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
Python
302
star
7

EVA

EVA: Large-scale Pre-trained Chit-Chat Models
Python
299
star
8

BPO

Python
245
star
9

ccm

This project is a TensorFlow implementation of our work, CCM (Commonsense Conversational Model).
Python
218
star
10

ecm

This project is a TensorFlow implementation of our work, ECM (Emotional Chatting Machine).
Python
216
star
11

NLG_book

Introduction to the book "Modern Natural Language Generation" (《现代自然语言生成》)
214
star
12

Emotional-Support-Conversation

Data and codes for ACL 2021 paper: Towards Emotional Support Dialog Systems
Python
205
star
13

PaperForONLG

Paper list for open-ended language generation
180
star
14

COLDataset

The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection
168
star
15

cotk

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Python
128
star
16

PsyQA

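A Chinese mental health support question-answering dataset with rich annotations of support strategies. It can be used to generate long counseling texts that make use of those strategies.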
117
star
17

DA-Transformer

Official Implementation for the ICML2022 paper "Directed Acyclic Transformer for Non-Autoregressive Machine Translation"
Python
114
star
18

PPT

Official Code for "PPT: Pre-trained Prompt Tuning for Few-shot Learning". ACL 2022
Python
104
star
19

CommonsenseStoryGen

Implementation for paper "A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation"
Python
102
star
20

PICL

Code for ACL2023 paper: Pre-Training to Learn in Context
Python
101
star
21

CritiqueLLM

96
star
22

SafetyBench

Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety.
Python
95
star
23

tatk

Task-oriented dialog system toolkits
Python
84
star
24

SentiLARE

Codes for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)
Python
78
star
25

THUOOP

Tsinghua University Object-Oriented Programming course: course materials and Q&A
76
star
26

OPD

OPD: Chinese Open-Domain Pre-trained Dialogue Model
Python
73
star
27

JointGT

Codes for our paper "JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs" (ACL 2021 Findings)
Python
70
star
28

LOT-LongLM

Python
67
star
29

UNION

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Python
56
star
30

ShieldLM

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Python
54
star
31

OpenMEVA

Benchmark for evaluating open-ended generation
Python
41
star
32

HINT

Python
35
star
33

CTRLEval

Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)
Python
31
star
34

CPT4DST

Official code for "Continual Prompt Tuning for Dialog State Tracking" (ACL 2022).
Python
28
star
35

seq2seq-pytorch-bert

Python
26
star
36

DiaSafety

This repo is for the paper: On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
Python
22
star
37

LAUG

Language Understanding Augmentation Toolkit for Robustness Testing
Python
19
star
38

TaiLr

ICLR2023 - Tailoring Language Generation Models under Total Variation Distance
Python
19
star
39

Targeted-Data-Extraction

Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"
Python
17
star
40

AugESC

Official repository for the Findings of ACL 2023 paper "AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation"
15
star
41

ConPer

Official Code for NAACL 2022 paper: "Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation"
Python
14
star
42

NAST

Codes for "NAST: A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer" (ACL 2021 findings)
Python
14
star
43

CDConv

Data and codes for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations"
Python
13
star
44

MoralStory

Python
13
star
45

AutoCAD

Official Code for EMNLP 2022 findings paper: "AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning"
Python
8
star
46

grounded-minimal-edit

Code for EMNLP 2021 paper "Transferable Persona-Grounded Dialogues via Grounded Minimal Edits"
Python
8
star
47

hred-tensorflow

Python
7
star
48

Implicit-Toxicity

Official Code for EMNLP 2023 paper: "Unveiling the Implicit Toxicity in Large Language Models"
Python
7
star
49

UDIT

Official Code for EMNLP2022 Paper: "Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization"
Python
7
star
50

earl

This project is a TensorFlow implementation of our work, EARL.
Python
6
star
51

EssayCommentGen

Python
6
star
52

Reverse_Generation

Python
5
star
53

LaMemo

NAACL2022 - LaMemo: Language Modeling with Look-Ahead Memory
Python
5
star
54

Re3Dial

Official Code for EMNLP 2023 paper: "Re3Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training"
Python
5
star
55

MoralDial

The official implementation of the paper: MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
Python
4
star
56

ERIC

Code for the AAAI 2023 paper "Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework"
Python
4
star
57

DAG-Search

The beam search algorithm for DA-Transformer
C++
4
star
58

cotk_docs

Documentation for the cotk package. Refer to: https://github.com/thu-coai/cotk
Python
4
star
59

seqGAN-tensorflow

Python
4
star
60

JailbreakDefense_GoalPriority

4
star
61

lightseq-nat

A Modified Version of LightSeq for Non-Autoregressive Transformer
Cuda
3
star
62

seq2seq-pytorch

Python
3
star
63

transformerLM-pytorch

Python
2
star
64

GPT2LM-pytorch

Python
2
star
65

cotk_dashboard

Dashboard for cotk
JavaScript
2
star
66

ConvLab-2_docs

2
star
67

CVAE-tensorflow

Python
2
star
68

SelfCont

Code for the paper "Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation"
Python
2
star
69

GRULM-pytorch

Python
1
star
70

LM-tensorflow

Python
1
star
71

SST-pytorch

Python
1
star
72

cotk-test-CVAE

Python
1
star
73

tatk_docs

Documentation for the TaTK platform.
1
star
74

seq2seq-tensorflow

Python
1
star
75

VAE-tensorflow

Python
1
star
76

cotk_data

1
star