• Stars: 569
• Rank: 75,408 (Top 2%)
• Language: Jupyter Notebook
• License: Apache License 2.0
• Created: almost 5 years ago
• Updated: almost 5 years ago


Repository Details

XLNet-Pytorch (arXiv:1906.08237)

A simple XLNet implementation with a PyTorch wrapper.

You can see how the XLNet architecture works in pre-training with a small batch size (=1) example.

Usage

$ git clone https://github.com/graykode/xlnet-Pytorch && cd xlnet-Pytorch

# To use the subword tokenizer (huggingface's pretrained BERT tokenizer)
$ pip install pytorch_pretrained_bert

$ python main.py --data ./data.txt --tokenizer bert-base-uncased \
   --seq_len 512 --reuse_len 256 --perm_size 256 \
   --bi_data True --mask_alpha 6 --mask_beta 1 \
   --num_predict 85 --mem_len 384 --num_epoch 100

You can also run the code easily in Google Colab.

  • Hyperparameters for pretraining, as reported in the paper.

Options
  • --data (String): Path to a .txt file to train on. Multiline text is fine; one file becomes one batch tensor. Default: data.txt

  • --tokenizer (String): Uses huggingface/pytorch-pretrained-BERT's tokenizer as the subword tokenizer (to be replaced with SentencePiece later). Choose from bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased. Default: bert-base-uncased

  • --seq_len (Integer): Sequence length. Default: 512

  • --reuse_len (Integer): Number of tokens that can be reused as memory. May be half of seq_len. Default: 256

  • --perm_size (Integer): Length of the longest permutation. May be set equal to reuse_len. Default: 256

  • --bi_data (Boolean): Whether to create bidirectional data. If bi_data is True, the batch size should be an even number. Default: False

  • --mask_alpha (Integer): How many tokens form a group. Default: 6

  • --mask_beta (Integer): How many tokens to mask within each group. Default: 1

  • --num_predict (Integer): Number of tokens to predict; this corresponds to partial prediction in the paper. Default: 85

  • --mem_len (Integer): Number of steps to cache in the Transformer-XL architecture. Default: 384

  • --num_epoch (Integer): Number of epochs. Default: 100
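
For illustration, below is a minimal argparse sketch of how these options could be wired up; the flag names and defaults simply mirror the list above, and the repository's actual main.py may parse them differently.

import argparse

def str2bool(v):
    # Parse "True"/"False" strings into booleans for --bi_data.
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(description="XLNet-Pytorch pretraining options (sketch)")
parser.add_argument("--data", type=str, default="data.txt")
parser.add_argument("--tokenizer", type=str, default="bert-base-uncased",
                    choices=["bert-base-uncased", "bert-large-uncased",
                             "bert-base-cased", "bert-large-cased"])
parser.add_argument("--seq_len", type=int, default=512)
parser.add_argument("--reuse_len", type=int, default=256)
parser.add_argument("--perm_size", type=int, default=256)
parser.add_argument("--bi_data", type=str2bool, default=False)
parser.add_argument("--mask_alpha", type=int, default=6)
parser.add_argument("--mask_beta", type=int, default=1)
parser.add_argument("--num_predict", type=int, default=85)
parser.add_argument("--mem_len", type=int, default=384)
parser.add_argument("--num_epoch", type=int, default=100)

args = parser.parse_args()
print(args)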

What is XLNet?

XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context.

Model   MNLI   QNLI   QQP    RTE    SST-2   MRPC   CoLA   STS-B
BERT    86.6   92.3   91.3   70.4   93.2    88.0   60.6   90.0
XLNet   89.8   93.9   91.8   83.8   95.6    89.2   63.6   91.8
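
Because XLNet uses Transformer-XL's segment-level recurrence as its backbone (exposed here through --mem_len), the sketch below illustrates, under my own simplifying assumptions, how hidden states could be cached as memory between segments; it is not the repository's actual implementation.

import torch

def update_mems(prev_mems, hidden_states, mem_len=384):
    # For each layer, append the current segment's hidden states to the
    # cached memory and keep only the last mem_len steps.
    # prev_mems / hidden_states: lists of [seq_len, batch, d_model] tensors.
    new_mems = []
    with torch.no_grad():  # gradients never flow through the memory
        for prev, cur in zip(prev_mems, hidden_states):
            cat = cur if prev is None else torch.cat([prev, cur], dim=0)
            new_mems.append(cat[-mem_len:].detach())
    return new_mems

At each layer, attention keys and values are then computed over the concatenation of the cached memory and the current segment, while queries come only from the current segment.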

Keyword in XLNet

  1. How does XLNet benefit from both auto-regressive and auto-encoding models?

    • Auto-Regressive Model
    • Auto-Encoding Model
  2. Permutation Language Modeling with Partial Prediction

    • Permutation Language Modeling

    • Partial Prediction

  3. Two-Stream Self-Attention with Target-Aware Representation

    • Two-Stream Self-Attention

    • Target-Aware Representation
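
To make the permutation language modeling idea concrete, here is a small self-contained sketch (my own illustration, not code from this repository): it samples a factorization order and builds the content-stream attention mask, under which position i may attend to position j only if j comes no later than i in the sampled order.

import torch

def perm_attention_mask(seq_len, num_predict):
    # Sample a random factorization order z over positions 0..seq_len-1.
    z = torch.randperm(seq_len)
    # rank[p] = index of position p within the order z.
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[z] = torch.arange(seq_len)
    # Content stream: i may attend to j iff rank[j] <= rank[i]
    # (the query stream would use a strict inequality, excluding self).
    content_mask = rank.unsqueeze(1) >= rank.unsqueeze(0)
    # Partial prediction: only the last num_predict positions in the
    # order are used as prediction targets.
    targets = z[-num_predict:]
    return z, content_mask, targets

z, mask, targets = perm_attention_mask(seq_len=8, num_predict=3)
print("order:", z.tolist())
print("targets:", targets.tolist())
print(mask.int())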

Author

  • Because the original repository is licensed under Apache 2.0, this repository is subject to the same license.
  • Tae Hwan Jung (Jeff Jung) @graykode, Kyung Hee Univ. CE (undergraduate).
  • Author Email : [email protected]

More Repositories

1. nlp-tutorial - Natural Language Processing Tutorial for Deep Learning Researchers (Jupyter Notebook, 13,597 stars)
2. nlp-roadmap - ROADMAP (mind map) and KEYWORD for students interested in learning NLP (3,160 stars)
3. distribution-is-all-you-need - The basic probability distribution tutorial for Deep Learning Researchers (Python, 1,596 stars)
4. gpt-2-Pytorch - Simple text generator with OpenAI GPT-2 PyTorch implementation (Python, 927 stars)
5. commit-autosuggestions - A tool that automatically recommends commit messages using AI (Python, 381 stars)
6. toeicbert - TOEIC (Test of English for International Communication) question solving using the pytorch-pretrained-BERT model (Python, 115 stars)
7. modelsummary - All model summaries in PyTorch, similar to `model.summary()` in Keras (Python, 84 stars)
8. matorage - A tensor (multidimensional matrix) object storage manager for deep learning frameworks (PyTorch, TensorFlow V2, Keras) (Python, 72 stars)
9. KorQuAD-beginner - Guide to uploading to the KorQuAD leaderboard (EM 68.947 / F1 88.468) with a model that uses only BERT-multilingual (single) (Python, 41 stars)
10. aws-kubeflow - A guideline for basic use and installation of Kubeflow on AWS (Jupyter Notebook, 37 stars)
11. vision-tutorial - Computer Vision Tutorial for Deep Learning Researchers (Python, 33 stars)
12. DeepLearning-Study - Repository for the deep learning study group at Kyung Hee University (Python, 27 stars)
13. DAC - Deep Adaptive Image Clustering paper implementation (Jupyter Notebook, 25 stars)
14. horovod-ansible - Create a Horovod cluster easily using Ansible (HCL, 22 stars)
15. aws-kubeadm-terraform - Create a Kubernetes cluster on AWS in 3 minutes by just typing 'terraform apply' (HCL, 16 stars)
16. kubernetes-glusterfs-aws - File system clustering with GlusterFS in a Kubernetes environment on AWS (Shell, 13 stars)
17. mlm-pipeline - A cloud architecture that preprocesses data for masked language models (MLM) (Python, 10 stars)
18. linux0.11-kernel-code-review - Line-by-line review of the old Linux kernel source (ver 0.11) for an OS lecture (C, 9 stars)
19. khuthon2018 - Restaurant analysis using deep learning, built for Khuthon 2018 (hackathon) (JavaScript, 8 stars)
20. projects - MY PROJECT LIST AT A GLANCE 🌈🚀🦄 (7 stars)
21. graykode.github.io - graykode's blog (Shell, 4 stars)
22. ALGORITHM-MASTER - I LOVE ALGORITHM (C++, 4 stars)
23. nlpblock - Use abstraction-level blocks for NLP with PyTorch (Python, 3 stars)
24. intellij-foundry (Kotlin, 3 stars)
25. ml-kubernetes-tutorial - A very basic tutorial for anyone interested in machine learning serving with Docker, Kubernetes, and Kubeflow (3 stars)
26. mnist-flow - A repository only for solving the AI Engineer Party (Python, 3 stars)
27. modelaverage - tf-keras utility to average the weights of the same model (Python, 3 stars)
28. nonce-python - 2019 GitHub seminar at D.COM (HTML, 2 stars)
29. nlp-advance - Simple paper implementation code for models after "Attention Is All You Need" (Transformer) (2 stars)
30. graykode (1 star)
31. ohora (Jupyter Notebook, 1 star)