This repository was archived on 27 May 2022.


PORORO: Platform Of neuRal mOdels for natuRal language prOcessing



pororo performs natural language processing and speech-related tasks.

You can solve a wide range of subtasks in natural language and speech processing simply by passing the task name.
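Passing a task name works like a lookup in a registry of model factories, where one task can be reached under several names (the task list shown under Usage contains, for instance, 'tokenize', 'tokenise' and 'tok'). The following is a minimal, hypothetical sketch of that pattern, not pororo's actual implementation: `register`, `make_task` and the whitespace "tokenizer" are invented for illustration.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: canonical task name -> factory that builds a callable for a language
_FACTORIES: Dict[str, Callable[[str], Callable[[str], object]]] = {}
# Hypothetical alias table: any accepted name -> canonical task name
_ALIASES: Dict[str, str] = {}


def register(canonical: str, aliases: Iterable[str] = ()):
    """Register a factory under a canonical task name and its aliases."""
    def decorator(factory):
        _FACTORIES[canonical] = factory
        for name in (canonical, *aliases):
            _ALIASES[name] = canonical
        return factory
    return decorator


@register("tokenization", aliases=("tokenize", "tokenise", "tok"))
def build_tokenizer(lang: str):
    # Toy stand-in for a real model: whitespace splitting, regardless of lang
    return lambda text: text.split()


def make_task(task: str, lang: str = "en"):
    """Resolve a (case-insensitive) alias to its canonical task and build the callable."""
    canonical = _ALIASES.get(task.lower())
    if canonical is None:
        raise KeyError(f"Unknown task {task!r}; available: {sorted(_ALIASES)}")
    return _FACTORIES[canonical](lang)


tok = make_task("tok", lang="en")
print(tok("solve a task by passing its name"))
```

Keeping aliases in a separate table means new spellings can be accepted without touching the factories themselves.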


Installation

  • pororo is based on torch 1.6 (CUDA 10.1) and Python 3.6 or later

  • You can install the package with the command below:

pip install pororo
  • Alternatively, install it from a local clone in editable mode:
git clone https://github.com/kakaobrain/pororo.git
cd pororo
pip install -e .
  • To install the libraries required by specific tasks beyond the common modules, please refer to INSTALL.md

  • To use Automatic Speech Recognition, wav2letter must be installed separately. To install it, run asr-install.sh:

bash asr-install.sh
  • To use Speech Synthesis, run tts-install.sh:
bash tts-install.sh
  • Speech Synthesis samples can be found here
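Before running the install commands, it can help to confirm that the environment matches the stated requirements (torch 1.6 with CUDA 10.1, Python 3.6 or later). The following is a minimal sketch; `check_environment` and `parse_version` are hypothetical helpers, and reading "torch 1.6" as "any 1.6.x release" is an assumption.

```python
import sys
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract leading numeric components, e.g. "1.6.0+cu101" -> (1, 6, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(torch_version: Optional[str],
                      python_version: Tuple[int, int] = tuple(sys.version_info[:2])) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if python_version < (3, 6):
        problems.append(f"Python {python_version} is older than 3.6")
    if torch_version is None:
        problems.append("torch is not installed")
    elif parse_version(torch_version)[:2] != (1, 6):
        # Assumption: "torch 1.6" is read as a pin to the 1.6.x series
        problems.append(f"torch {torch_version} is not a 1.6.x release")
    return problems


try:
    import torch
    installed = str(torch.__version__)
except ImportError:
    installed = None

for problem in check_environment(installed):
    print("warning:", problem)
```

To compare the CUDA build as well, `torch.version.cuda` reports the CUDA version torch was compiled against (for example "10.1"), or None for CPU-only builds.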

Usage

  • pororo can be used as follows:
  • First, import the Pororo class with the following snippet:
>>> from pororo import Pororo
  • After the import, you can list the tasks currently supported by pororo with the following commands:
>>> from pororo import Pororo
>>> Pororo.available_tasks()
"Available tasks are ['mrc', 'rc', 'qa', 'question_answering', 'machine_reading_comprehension', 'reading_comprehension', 'sentiment', 'sentiment_analysis', 'nli', 'natural_language_inference', 'inference', 'fill', 'fill_in_blank', 'fib', 'para', 'pi', 'cse', 'contextual_subword_embedding', 'similarity', 'sts', 'semantic_textual_similarity', 'sentence_similarity', 'sentvec', 'sentence_embedding', 'sentence_vector', 'se', 'inflection', 'morphological_inflection', 'g2p', 'grapheme_to_phoneme', 'grapheme_to_phoneme_conversion', 'w2v', 'wordvec', 'word2vec', 'word_vector', 'word_embedding', 'tokenize', 'tokenise', 'tokenization', 'tokenisation', 'tok', 'segmentation', 'seg', 'mt', 'machine_translation', 'translation', 'pos', 'tag', 'pos_tagging', 'tagging', 'const', 'constituency', 'constituency_parsing', 'cp', 'pg', 'collocation', 'collocate', 'col', 'word_translation', 'wt', 'summarization', 'summarisation', 'text_summarization', 'text_summarisation', 'summary', 'gec', 'review', 'review_scoring', 'lemmatization', 'lemmatisation', 'lemma', 'ner', 'named_entity_recognition', 'entity_recognition', 'zero-topic', 'dp', 'dep_parse', 'caption', 'captioning', 'asr', 'speech_recognition', 'st', 'speech_translation', 'ocr', 'srl', 'semantic_role_labeling', 'p2g', 'aes', 'essay', 'qg', 'question_generation', 'age_suitability']"
  • To check which models are available for a given task, pass the task name to available_models:
>>> from pororo import Pororo
>>> Pororo.available_models("collocation")
'Available models for collocation are ([lang]: ko, [model]: kollocate), ([lang]: en, [model]: collocate.en), ([lang]: ja, [model]: collocate.ja), ([lang]: zh, [model]: collocate.zh)'
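Judging from the quoted output above, `available_tasks()` and `available_models()` return human-readable strings rather than Python lists. If you need the names programmatically, a small parser can recover them. The sketch below assumes the exact message formats shown above; `parse_task_list` and `models_by_lang` are hypothetical helpers, not part of pororo, and may break if the message wording changes.

```python
import ast
import re
from collections import defaultdict
from typing import Dict, List

# Matches one "([lang]: xx, [model]: name)" group, as in the collocation output above
_PAIR = re.compile(r"\(\[lang\]: (?P<lang>[^,]+), \[model\]: (?P<model>[^)]+)\)")


def parse_task_list(message: str) -> List[str]:
    """Turn "Available tasks are [...]" into a list of task names."""
    start = message.index("[")
    end = message.rindex("]") + 1
    # literal_eval only accepts Python literals, so no arbitrary code is run
    return ast.literal_eval(message[start:end])


def models_by_lang(message: str) -> Dict[str, List[str]]:
    """Group the models in an "Available models for ..." message by language."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for match in _PAIR.finditer(message):
        grouped[match.group("lang")].append(match.group("model"))
    return dict(grouped)


# Copies of the messages shown above (the task list is shortened)
tasks = parse_task_list("Available tasks are ['mrc', 'rc', 'qa', 'ner', 'mt', 'translation']")
models = models_by_lang(
    "Available models for collocation are ([lang]: ko, [model]: kollocate), "
    "([lang]: en, [model]: collocate.en), ([lang]: ja, [model]: collocate.ja), "
    "([lang]: zh, [model]: collocate.zh)"
)
print("ner" in tasks, models["ko"])
```

Grouping into lists rather than a flat dict keeps every entry if a language ever lists more than one model for the same task.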
  • To perform a specific task, pass the task name as the task argument and the language code as the lang argument:
>>> from pororo import Pororo
>>> ner = Pororo(task="ner", lang="en")
  • Once the object is constructed, call it with the input text:
>>> ner("Michael Jeffrey Jordan (born February 17, 1963) is an American businessman and former professional basketball player.")
[('Michael Jeffrey Jordan', 'PERSON'), ('(', 'O'), ('born', 'O'), ('February 17, 1963)', 'DATE'), ('is', 'O'), ('an', 'O'), ('American', 'NORP'), ('businessman', 'O'), ('and', 'O'), ('former', 'O'), ('professional', 'O'), ('basketball', 'O'), ('player', 'O'), ('.', 'O')]
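The result is a list of (text, label) pairs, where 'O' marks text outside any entity. To collect just the entities, you can filter and group the pairs. A minimal sketch; `entities_by_label` is a hypothetical helper operating on the output format shown above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def entities_by_label(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Drop 'O' (non-entity) pairs and group entity spans by their label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for span, label in tagged:
        if label != "O":
            grouped[label].append(span)
    return dict(grouped)


# A subset of the English output shown above
tagged = [('Michael Jeffrey Jordan', 'PERSON'), ('(', 'O'), ('born', 'O'),
          ('February 17, 1963)', 'DATE'), ('is', 'O'), ('an', 'O'),
          ('American', 'NORP'), ('businessman', 'O')]
print(entities_by_label(tagged))
```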
  • If a task supports multiple languages, you can change the lang argument to use models trained on different languages.
>>> ner = Pororo(task="ner", lang="ko")
>>> ner("마이클 제프리 조던(영어: Michael Jeffrey Jordan, 1963년 2월 17일 ~ )은 미국의 은퇴한 농구 선수이다.")
[('마이클 제프리 조던', 'PERSON'), ('(', 'O'), ('영어', 'CIVILIZATION'), (':', 'O'), (' ', 'O'), ('Michael Jeffrey Jordan', 'PERSON'), (',', 'O'), (' ', 'O'), ('1963년 2월 17일 ~', 'DATE'), (' ', 'O'), (')은', 'O'), (' ', 'O'), ('미국', 'LOCATION'), ('의', 'O'), (' ', 'O'), ('은퇴한', 'O'), (' ', 'O'), ('농구 선수', 'CIVILIZATION'), ('이다.', 'O')]
>>> ner = Pororo(task="ner", lang="ja")
>>> ner("マイケル・ジェフリー・ジョーダンは、アメリカ合衆国の元バスケットボール選手")
[('マイケル・ジェフリー・ジョーダン', 'PERSON'), ('は', 'O'), ('、アメリカ合衆国', 'O'), ('の', 'O'), ('元', 'O'), ('バスケットボール', 'O'), ('選手', 'O')]
>>> ner = Pororo(task="ner", lang="zh")
>>> ner("麥可·傑佛瑞·喬丹是美國退役NBA職業籃球運動員,也是一名商人,現任夏洛特黃蜂董事長及主要股東")
[('麥可·傑佛瑞·喬丹', 'PERSON'), ('是', 'O'), ('美國', 'GPE'), ('退', 'O'), ('役', 'O'), ('nba', 'ORG'), ('職', 'O'), ('業', 'O'), ('籃', 'O'), ('球', 'O'), ('運', 'O'), ('動', 'O'), ('員', 'O'), (',', 'O'), ('也', 'O'), ('是', 'O'), ('一', 'O'), ('名', 'O'), ('商', 'O'), ('人', 'O'), (',', 'O'), ('現', 'O'), ('任', 'O'), ('夏洛特黃蜂', 'ORG'), ('董', 'O'), ('事', 'O'), ('長', 'O'), ('及', 'O'), ('主', 'O'), ('要', 'O'), ('股', 'O'), ('東', 'O')]
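As the Chinese example shows, non-entity text may come back one character per pair. Merging runs of consecutive 'O' pairs makes the output easier to read. A minimal sketch; `merge_outside` is a hypothetical helper, and the right join separator is an assumption that depends on the language.

```python
from typing import List, Tuple


def merge_outside(tagged: List[Tuple[str, str]], sep: str = "") -> List[Tuple[str, str]]:
    """Merge runs of consecutive 'O' pairs; entity pairs are left untouched."""
    merged: List[Tuple[str, str]] = []
    for span, label in tagged:
        if label == "O" and merged and merged[-1][1] == "O":
            merged[-1] = (merged[-1][0] + sep + span, "O")
        else:
            merged.append((span, label))
    return merged


# The first pairs of the Chinese output shown above
zh = [('麥可·傑佛瑞·喬丹', 'PERSON'), ('是', 'O'), ('美國', 'GPE'),
      ('退', 'O'), ('役', 'O'), ('nba', 'ORG'), ('職', 'O'), ('業', 'O')]
print(merge_outside(zh))
```

For the English output, whose tokens carry no surrounding whitespace, pass `sep=" "`; the Korean output already includes explicit `' '` pairs, so the default `sep=""` preserves its spacing.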
  • If the task supports multiple models, you can change the model argument to use another model.
>>> from pororo import Pororo
>>> mt = Pororo(task="mt", lang="multi", model="transformer.large.multi.mtpg")
>>> fast_mt = Pororo(task="mt", lang="multi", model="transformer.large.multi.fast.mtpg")
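The name of the second model suggests it is tuned for speed; rather than rely on that, you can time both objects on your own inputs. A minimal sketch; `mean_latency` is a hypothetical helper, and the stand-in `str.upper` keeps it runnable without pororo installed.

```python
import time
from typing import Callable, Sequence


def mean_latency(fn: Callable[[str], object], inputs: Sequence[str], repeats: int = 3) -> float:
    """Average wall-clock seconds per call of fn over inputs, after one warm-up call."""
    if not inputs or repeats < 1:
        raise ValueError("need at least one input and one repeat")
    fn(inputs[0])  # warm-up, excluded from timing (first calls can be slower)
    start = time.perf_counter()
    for _ in range(repeats):
        for text in inputs:
            fn(text)
    return (time.perf_counter() - start) / (repeats * len(inputs))


# Stand-in callable so the sketch runs without pororo
print(mean_latency(str.upper, ["hello", "world"]) >= 0.0)
```

With pororo installed, you would wrap each object in a lambda, for example `mean_latency(lambda s: mt(s, src="ko", tgt="en"), sentences)`. The `src` and `tgt` keyword arguments are an assumption here; check the machine translation documentation for the exact call signature.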

Documentation

For more detailed information, see the full documentation.

If you have any questions or requests, please open an issue.


Citation

If you use this library in any project or research, please cite our code:

@misc{pororo,
  author       = {Heo, Hoon and Ko, Hyunwoong and Kim, Soohwan and
                  Han, Gunsoo and Park, Jiwoo and Park, Kyubyong},
  title        = {PORORO: Platform Of neuRal mOdels for natuRal language prOcessing},
  howpublished = {\url{https://github.com/kakaobrain/pororo}},
  year         = {2021},
}

Contributors

Hoon Heo, Hyunwoong Ko, Soohwan Kim, Gunsoo Han, Jiwoo Park and Kyubyong Park


License

The PORORO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 Kakao Brain Corp. https://www.kakaobrain.com All Rights Reserved.
