Corpora for evaluating NLU services (like API.ai, RASA, Microsoft LUIS, ...)


This project is a collection of three corpora that can be used to evaluate chatbots and other conversational interfaces. Two of the corpora were extracted from StackExchange and one from a Telegram chatbot.

If you use the data in a publication, please let us know and cite our SIGdial 2017 paper:

@InProceedings{braun-EtAl:2017:SIGDIAL,
  author    = {Braun, Daniel  and  Hernandez-Mendez, Adrian  and  Matthes, Florian  and  Langen, Manfred},
  title     = {Evaluating Natural Language Understanding Services for Conversational Question Answering Systems},
  booktitle = {Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue},
  month     = {August},
  year      = {2017},
  address   = {Saarbrücken, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {174--185},
  url       = {http://www.aclweb.org/anthology/W17-3622}
}

Errata

There is an error in Table 5 of the paper: in the "true +" column, the overall sum should be 573, not 820. Accordingly, the overall precision, recall, and F-score are 0.92, 0.85, and 0.88.

[The error originated in the Excel evaluation sheet: the overall total of "true +" (573) was stored as the "true +" count for the chatbot corpus. Adding the results for the other two corpora (77 and 170) to it yields 820.]
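
The arithmetic behind the erratum can be checked in a few lines. This is a minimal sketch using only the numbers stated above; the implied chatbot count (573 minus the other two corpora) is derived here, not taken from the paper, and the F-score is recomputed from the already rounded precision and recall, so it is a consistency check rather than an exact reproduction.

```python
# Check the arithmetic behind the Table 5 erratum.
other_corpora_tp = [77, 170]   # "true +" of the two StackExchange corpora
correct_total_tp = 573         # the real overall sum of "true +"

# The sheet stored the overall total in the chatbot row, so summing
# all rows counted the other two corpora a second time.
erroneous_total = correct_total_tp + sum(other_corpora_tp)
print(erroneous_total)  # 820

# Chatbot "true +" implied once the double counting is undone (derived value).
chatbot_tp = correct_total_tp - sum(other_corpora_tp)
print(chatbot_tp)  # 326


def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f_score(0.92, 0.85), 2))  # 0.88
```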

License

All three corpora are released under the CC BY-SA 3.0 license.

Content

Ask Ubuntu Corpus

162 questions and answers from https://askubuntu.com.

Five intents (MakeUpdate, SetupPrinter, ShutdownComputer, SoftwareRecommendation, None) and three entity types (Printer, Software, Version).

Web Applications Corpus

89 questions and answers from https://webapps.stackexchange.com.

Eight intents (ChangePassword, DeleteAccount, DownloadVideo, ExportData, FilterSpam, FindAlternative, SyncAccounts, None) and three entity types (WebService, OS, Browser).

Chatbot Corpus

206 questions from a Telegram chatbot for public transport in Munich.

Two intents (Departure Time, Find Connection) and five entity types (StationStart, StationDest, Criterion, Vehicle, Line).
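
The label inventories listed above can be kept in a small lookup table, for example to flag labels returned by an NLU service that do not belong to the corpus being evaluated. This is a sketch only: the corpus keys and the helper `unknown_labels` are hypothetical, and the chatbot intents are spelled without spaces (`DepartureTime`, `FindConnection`) to match the other intent names, which is an assumption about how they appear in the data files.

```python
# Intent and entity inventories transcribed from the corpus descriptions.
# Corpus keys are hypothetical; chatbot intent spelling is an assumption.
CORPORA = {
    "AskUbuntu": {
        "intents": {"MakeUpdate", "SetupPrinter", "ShutdownComputer",
                    "SoftwareRecommendation", "None"},
        "entities": {"Printer", "Software", "Version"},
    },
    "WebApplications": {
        "intents": {"ChangePassword", "DeleteAccount", "DownloadVideo",
                    "ExportData", "FilterSpam", "FindAlternative",
                    "SyncAccounts", "None"},
        "entities": {"WebService", "OS", "Browser"},
    },
    "Chatbot": {
        "intents": {"DepartureTime", "FindConnection"},
        "entities": {"StationStart", "StationDest", "Criterion",
                     "Vehicle", "Line"},
    },
}


def unknown_labels(corpus: str, intent: str, entity_types: list[str]) -> list[str]:
    """Return the labels that are not part of the given corpus's inventory."""
    inventory = CORPORA[corpus]
    problems = []
    if intent not in inventory["intents"]:
        problems.append(f"intent:{intent}")
    problems.extend(f"entity:{e}" for e in entity_types
                    if e not in inventory["entities"])
    return problems


print(unknown_labels("Chatbot", "FindConnection", ["StationStart", "Printer"]))
# ['entity:Printer']
```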

Evaluation Scripts

Python scripts for automated evaluation are provided in the separate NLU-Evaluation-Scripts repository.
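
As a rough illustration of micro-averaged scoring (summing true positives, false positives, and false negatives across sentences before computing precision, recall, and F-score), the sketch below compares gold and predicted labels of each sentence as sets. It is not the method of the evaluation scripts: the set-based counting scheme, the `(kind, value)` label encoding, and the names `Counts`, `count_sentence`, and `total` are all assumptions made for this example.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    fn: int = 0

    def add(self, other: "Counts") -> "Counts":
        return Counts(self.tp + other.tp, self.fp + other.fp, self.fn + other.fn)

    def scores(self) -> tuple[float, float, float]:
        """Precision, recall, and F1 from the accumulated counts."""
        precision = self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0
        recall = self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1


def count_sentence(gold: set[tuple[str, str]],
                   predicted: set[tuple[str, str]]) -> Counts:
    """Compare one sentence's labels as sets of (kind, value) pairs,
    e.g. ("intent", "FindConnection") or ("StationDest", "Marienplatz")."""
    return Counts(tp=len(gold & predicted),
                  fp=len(predicted - gold),
                  fn=len(gold - predicted))


def total(counts: Iterable[Counts]) -> Counts:
    """Micro-average: sum each sentence's (or corpus's) counts exactly once."""
    result = Counts()
    for c in counts:
        result = result.add(c)
    return result


if __name__ == "__main__":
    # Made-up example sentence from the public-transport domain.
    gold = {("intent", "FindConnection"), ("StationStart", "Garching"),
            ("StationDest", "Marienplatz")}
    pred = {("intent", "FindConnection"), ("StationStart", "Garching"),
            ("Vehicle", "U6")}
    overall = total([count_sentence(gold, pred)])
    print(overall)  # Counts(tp=2, fp=1, fn=1)
    print(overall.scores())
```

Summing per-corpus counts through a single function like `total` also avoids the kind of double counting described in the errata, where an overall total ended up in a per-corpus row.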

Contact Information

If you have any questions, please contact:

Daniel Braun (Technical University of Munich) [email protected]
