• This repository has been archived on 03/Jun/2022
  • Stars: 223
  • Rank 177,451 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created about 6 years ago
  • Updated almost 6 years ago

Repository Details

QANet + DuReader: Chinese machine reading comprehension

QANet_dureader

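I was part of a team that placed in the top 11 of the mrc2018 machine reading comprehension competition, where we used BiDAF. This repository tries QANet on the DuReader dataset instead.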

QANet

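  • Residual blocks: residual connections are used so that the network can be made deeper.
  • Self-attention: self-attention is computed with Google's own multi-head attention.
  • Stronger positional information: QANet reinforces positional information by adding a timing signal in every convolution block; see layers / residual_block / add_timing_signal_ld.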
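
The timing signal mentioned in the last bullet can be illustrated with a small sketch. It assumes the widely used sinusoidal formulation (sines and cosines over geometrically spaced timescales); the repository's own function may differ in detail, and the names `timing_signal_1d` and `add_timing_signal` below are illustrative rather than taken from the code base. Plain Python lists stand in for tensors so the example has no dependencies.

```python
import math


def timing_signal_1d(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """Return a length x channels nested list of sinusoidal position values."""
    num_timescales = channels // 2
    # Timescales form a geometric progression from min_timescale to max_timescale.
    log_increment = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = [
        min_timescale * math.exp(-i * log_increment) for i in range(num_timescales)
    ]
    signal = []
    for pos in range(length):
        sines = [math.sin(pos * w) for w in inv_timescales]
        cosines = [math.cos(pos * w) for w in inv_timescales]
        row = sines + cosines
        # Pad with a zero when the channel count is odd.
        row += [0.0] * (channels - len(row))
        signal.append(row)
    return signal


def add_timing_signal(x):
    """Add the timing signal elementwise to x, a length x channels nested list."""
    length, channels = len(x), len(x[0])
    signal = timing_signal_1d(length, channels)
    return [
        [value + s for value, s in zip(row, signal_row)]
        for row, signal_row in zip(x, signal)
    ]
```

Because the signal depends only on position and channel index, it can be added again at the start of every convolution block without introducing new parameters, which is presumably why it suits the "every block" usage described above.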

A small improvement

The original position information (position embedding) is added to the attention computation in the decoder layer.

Model

requirements

tensorflow 1.6+
jieba

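Corpus preprocessing

Preprocessing builds the vocabulary and loads pretrained word vectors. The model supports the pretrained vectors from Chinese-Word-Vectors: download them and set their path in cli.py.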

python3 cli.py --prepro
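
As a rough illustration of what this preprocessing step involves, the sketch below builds a vocabulary from already-tokenized text and fills an embedding matrix from vectors stored in the word2vec text format (an optional "count dim" header line, then one "word v1 ... vd" line per word). It is not the repository's implementation: the function names, the `<pad>` and `<unk>` tokens, and the in-memory sample file are all assumptions made for the example, and tokenization (done with jieba in this project) is assumed to have happened already.

```python
import io
import random
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(tokenized_texts, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    vocab = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_count)
    return {tok: idx for idx, tok in enumerate(vocab)}


def load_text_vectors(handle):
    """Parse word2vec text format into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            continue  # skip the header or blank lines (assumes dim >= 2)
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def build_embedding_matrix(token2id, vectors, dim, seed=0):
    """Use pretrained rows where available, small random rows otherwise."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in token2id]  # the PAD row stays all zeros
    for tok, idx in token2id.items():
        if tok in vectors:
            matrix[idx] = list(vectors[tok])
        elif tok != PAD:
            matrix[idx] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return matrix


# Tiny in-memory stand-ins for a tokenized corpus and a vector file.
texts = [["机器", "阅读", "理解"], ["阅读", "数据"]]
token2id = build_vocab(texts)
sample_file = io.StringIO("2 3\n阅读 0.1 0.2 0.3\n理解 0.4 0.5 0.6\n")
vectors = load_text_vectors(sample_file)
embeddings = build_embedding_matrix(token2id, vectors, dim=3)
```

Words missing from the pretrained file get small random vectors here so they can still be trained; other initializations would work equally well.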

Training

python3 cli.py --train [arguments]

Alternatively, run the provided bash wrapper script:

bash train.sh
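
For readers unfamiliar with flag-driven entry points such as `cli.py --prepro` and `cli.py --train`, here is a minimal argparse sketch of that pattern. Only the `--prepro` and `--train` flags come from this README; `--epochs`, `--batch_size`, their defaults, and the returned mode strings are hypothetical placeholders, not the repository's actual arguments.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="QANet on DuReader (illustrative)")
    parser.add_argument("--prepro", action="store_true",
                        help="build the vocabulary and load word vectors")
    parser.add_argument("--train", action="store_true",
                        help="train the model")
    # Hypothetical training arguments, included only to show the pattern.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser.parse_args(argv)


def main(argv=None):
    """Dispatch on the selected mode and return its name."""
    args = parse_args(argv)
    if args.prepro:
        return "prepro"
    if args.train:
        return "train"
    return "noop"
```

A wrapper script like train.sh would then typically just call the entry point with a fixed set of such arguments.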

Training on the full dataset

Download the full dataset first; a bash script is provided for this:

bash data/download_dureader.sh
