• Stars: 131
• Rank: 274,476 (Top 6%)
• Language: Python
• Created almost 6 years ago
• Updated over 1 year ago


Repository Details

The ATIS (Airline Travel Information System) Dataset


This repository contains the ATIS dataset in Python pickle format and in Rasa NLU JSON format (https://rasa.com/docs/nlu/dataformat/#json-format), together with a loading script and sample code.

Data samples

Raw format

   0:         flight: BOS i want to fly from boston at 838 am and arrive in denver at 1110 in the morning EOS
                              BOS                                        O
                                i                                        O
                             want                                        O
                               to                                        O
                              fly                                        O
                             from                                        O
                           boston                      B-fromloc.city_name
                               at                                        O
                              838                       B-depart_time.time
                               am                       I-depart_time.time
                              and                                        O
                           arrive                                        O
                               in                                        O
                           denver                        B-toloc.city_name
                               at                                        O
                             1110                       B-arrive_time.time
                               in                                        O
                              the                                        O
                          morning              B-arrive_time.period_of_day
                              EOS                                        O
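
The B-/I-/O labels in the sample above follow the usual BIO scheme: a B- tag opens an entity, a matching I- tag continues it, and O marks a token outside any entity. As a minimal sketch (the function name `bio_to_spans` is hypothetical and not part of this repository), the tagged tokens can be grouped into entity spans like this:

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open span, if there is one.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_type:
            # Continuation of the open span.
            current_tokens.append(token)
            continue
        flush()
        if prefix in ("B", "I"):
            # A B- tag (or a stray I- tag) starts a new span.
            current_type, current_tokens = label, [token]
    flush()
    return spans


# Tokens and tags copied from the raw-format sample above.
tokens = ("BOS i want to fly from boston at 838 am and arrive in denver "
          "at 1110 in the morning EOS").split()
tags = ["O", "O", "O", "O", "O", "O", "B-fromloc.city_name", "O",
        "B-depart_time.time", "I-depart_time.time", "O", "O", "O",
        "B-toloc.city_name", "O", "B-arrive_time.time", "O", "O",
        "B-arrive_time.period_of_day", "O"]

spans = bio_to_spans(tokens, tags)
for entity_type, text in spans:
    print(f"{entity_type}: {text}")
```

Treating a stray I- tag as the start of a new span is a lenient design choice; a stricter parser could raise an error instead.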

Rasa NLU JSON format

{
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "i would like to find a flight from charlotte to las vegas that makes a stop in st. louis",
                "intent": "flight",
                "entities": [
                    {
                        "start": 35,
                        "end": 44,
                        "value": "charlotte",
                        "entity": "fromloc.city_name"
                    },
                    {
                        "start": 48,
                        "end": 57,
                        "value": "las vegas",
                        "entity": "toloc.city_name"
                    },
                    {
                        "start": 79,
                        "end": 88,
                        "value": "st. louis",
                        "entity": "stoploc.city_name"
                    }
                ]
            },
            ...
        ]
    }
}
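
In this format, `start` and `end` are character offsets into `text`, with `end` exclusive, so `text[start:end]` should reproduce `value`. The following sketch checks that property on the example above; the helper name `check_entity_offsets` is an assumption made for illustration and is not part of the repository.

```python
import json


def check_entity_offsets(rasa_data: dict) -> list:
    """Return (text, entity) pairs whose offsets do not reproduce `value`."""
    mismatches = []
    for example in rasa_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            if text[entity["start"]:entity["end"]] != entity["value"]:
                mismatches.append((text, entity))
    return mismatches


# The single example shown above, without the elided remainder of the file.
sample = json.loads("""
{
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "i would like to find a flight from charlotte to las vegas that makes a stop in st. louis",
                "intent": "flight",
                "entities": [
                    {"start": 35, "end": 44, "value": "charlotte", "entity": "fromloc.city_name"},
                    {"start": 48, "end": 57, "value": "las vegas", "entity": "toloc.city_name"},
                    {"start": 79, "end": 88, "value": "st. louis", "entity": "stoploc.city_name"}
                ]
            }
        ]
    }
}
""")

print(check_entity_offsets(sample))  # an empty list means every offset is consistent
```

To run the same check on a downloaded file, load it with `json.load` and pass the resulting dictionary to the function.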

Data statistics

Samples                        Vocabulary size   Entities   Intents
4978 (training) + 893 (test)   943               129        26
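
Counts of this kind can be recomputed from data in the Rasa NLU JSON layout. The sketch below is only an approximation: it assumes whitespace tokenization and counts distinct entity names, whereas the README does not say how its vocabulary and entity figures were derived, so the results on the real files may differ. The function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(rasa_data: dict) -> dict:
    """Count examples, distinct whitespace tokens, intents and entity names."""
    examples = rasa_data["rasa_nlu_data"]["common_examples"]
    vocabulary = set()
    intents = Counter()
    entity_names = Counter()
    for example in examples:
        vocabulary.update(example["text"].split())  # assumption: whitespace tokens
        intents[example["intent"]] += 1
        for entity in example.get("entities", []):
            entity_names[entity["entity"]] += 1
    return {
        "examples": len(examples),
        "vocabulary": len(vocabulary),
        "intents": len(intents),
        "entity_names": len(entity_names),
    }


# One example taken from the Rasa NLU JSON sample in this README.
data = {"rasa_nlu_data": {"common_examples": [{
    "text": ("i would like to find a flight from charlotte to las vegas "
             "that makes a stop in st. louis"),
    "intent": "flight",
    "entities": [
        {"start": 35, "end": 44, "value": "charlotte", "entity": "fromloc.city_name"},
        {"start": 48, "end": 57, "value": "las vegas", "entity": "toloc.city_name"},
        {"start": 79, "end": 88, "value": "st. louis", "entity": "stoploc.city_name"},
    ],
}]}}

stats = summarize(data)
print(stats)
```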

Sample code

summary_data.py contains code that reads the raw data; use it as a reference for loading the data from the raw files.

Download

Data format       Training set     Test set
Python 3 pickle   atis.train.pkl   atis.test.pkl
Rasa NLU JSON     train.json       test.json
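
The internal layout of the pickle files is not described in this README, so the sketch below makes no assumption about it: it only loads a file and reports the top-level type and, for a dictionary or sequence, its keys or item types. The helper name `describe_pickle` is hypothetical, and the file paths assume the downloads were saved to the current directory. Only unpickle files from a source you trust, because `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return (top-level type name, short detail)."""
    with Path(path).open("rb") as stream:
        obj = pickle.load(stream)
    if isinstance(obj, dict):
        detail = sorted(str(key) for key in obj)  # dictionary keys
    elif isinstance(obj, (list, tuple)):
        detail = [type(item).__name__ for item in obj]  # item types
    else:
        detail = None
    return type(obj).__name__, detail


# Assumed local paths; files that are missing are skipped.
for name in ("atis.train.pkl", "atis.test.pkl"):
    if Path(name).exists():
        print(name, describe_pickle(name))
```

For the actual field names and how to iterate over the samples, summary_data.py remains the authoritative reference.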

Credit

Related projects
