  • Stars: 1,481
  • Rank: 31,647 (Top 0.7 %)
  • Language: Python
  • License: MIT License
  • Created over 7 years ago
  • Updated almost 2 years ago

Repository Details

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo

anaGo is a Python library for sequence labeling (NER, POS tagging, ...), implemented in Keras.

anaGo can solve sequence labeling tasks such as named entity recognition (NER), part-of-speech tagging (POS tagging), semantic role labeling (SRL) and so on. Unlike traditional sequence labeling solvers, anaGo doesn't require you to define any language-dependent features, so it can easily be used for any language.

As an example of anaGo, the following image shows named entity recognition in English:

[Image: anaGo demo, English NER]

Get Started

In anaGo, the simplest type of model is the Sequence model. It provides the essential methods: fit, score, analyze and save/load. For more complex use cases, use the lower-level anaGo modules such as models, preprocessing and so on.

Here is the data loader:

>>> from anago.utils import load_data_and_labels

>>> x_train, y_train = load_data_and_labels('train.txt')
>>> x_test, y_test = load_data_and_labels('test.txt')
>>> x_train[0]
['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
>>> y_train[0]
['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']
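The loader returns parallel lists of tokens and BIO tags, one pair of lists per sentence. The README does not describe the layout of train.txt, so the following is only a minimal sketch under the assumption of a CoNLL-style file: one token per line, the tag in the last whitespace-separated column, and a blank line between sentences. The function name parse_conll is hypothetical and is not part of anaGo.

```python
def parse_conll(text):
    """Split CoNLL-style text into parallel lists of tokens and tags.

    Assumed layout (not confirmed by the anaGo README): one token per line,
    tag in the last whitespace-separated column, blank line between sentences.
    """
    sentences, labels = [], []
    words, tags = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if words:
                sentences.append(words)
                labels.append(tags)
                words, tags = [], []
            continue
        columns = line.split()
        words.append(columns[0])
        tags.append(columns[-1])
    if words:
        # Flush the last sentence when the text has no trailing blank line.
        sentences.append(words)
        labels.append(tags)
    return sentences, labels


sample = "EU\tB-ORG\nrejects\tO\nGerman\tB-MISC\n\nPeter\tB-PER\nBlackburn\tI-PER\n"
x, y = parse_conll(sample)
print(x[0])  # tokens of the first sentence
print(y[1])  # tags of the second sentence
```

Keeping tokens and tags in separate, index-aligned lists matches the shape of x_train and y_train shown above.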

You can now iterate on your training data in batches:

>>> import anago

>>> model = anago.Sequence()
>>> model.fit(x_train, y_train, epochs=15)
Epoch 1/15
541/541 [==============================] - 166s 307ms/step - loss: 12.9774
...

Evaluate your performance in one line:

>>> model.score(x_test, y_test)
0.802  # f1-micro score
# For better performance, use pre-trained word embeddings.
# anaGo's best f1-micro score so far is 90.94.
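To make the reported number more concrete, here is a rough sketch of how a micro-averaged F1 over entities can be computed from BIO tags. It assumes the score is entity-level (a predicted entity counts only if its type and both boundaries match the gold entity); the README does not say how anaGo computes it internally. The helper names get_entities and f1_micro are hypothetical.

```python
def get_entities(tags):
    """Return a set of (type, start, end) spans from one BIO sequence (end exclusive)."""
    entities = set()
    start, etype = None, None
    # The extra "O" acts as a sentinel that closes an entity ending the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            entities.add((etype, start, i))
            start, etype = None, None
        # A stray I- tag with no open entity is treated leniently as a new entity.
        if prefix == "B" or (prefix == "I" and etype is None):
            start, etype = i, label
    return entities


def f1_micro(y_true, y_pred):
    """Micro-averaged F1: pool entity spans from all sentences, then score once."""
    true_spans, pred_spans = set(), set()
    for idx, (t, p) in enumerate(zip(y_true, y_pred)):
        true_spans |= {(idx,) + e for e in get_entities(t)}
        pred_spans |= {(idx,) + e for e in get_entities(p)}
    correct = len(true_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(true_spans)
    return 2 * precision * recall / (precision + recall)


gold = [["B-ORG", "O", "B-MISC", "O"]]
pred = [["B-ORG", "O", "O", "O"]]
score = f1_micro(gold, pred)  # precision 1.0, recall 0.5
print(round(score, 3))
```

Pooling spans across sentences before computing precision and recall is what makes the average "micro": every entity has equal weight, regardless of its type or sentence.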

Or tag new text:

>>> text = 'President Obama is speaking at the White House.'
>>> model.analyze(text)
{
    "words": [
        "President",
        "Obama",
        "is",
        "speaking",
        "at",
        "the",
        "White",
        "House."
    ],
    "entities": [
        {
            "beginOffset": 1,
            "endOffset": 2,
            "score": 1,
            "text": "Obama",
            "type": "PER"
        },
        {
            "beginOffset": 6,
            "endOffset": 8,
            "score": 1,
            "text": "White House.",
            "type": "LOC"
        }
    ]
}
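In the output above, beginOffset and endOffset are token indices with an exclusive end, and the text is split on whitespace (which is why "House." keeps its period). The sketch below shows one way such an entity list could be assembled from tokens and BIO tags. The tag sequence is a hypothetical prediction chosen to match the output above, build_entities is not an anaGo function, and the score field is omitted because it would come from the model.

```python
def build_entities(words, tags):
    """Group BIO tags into entity dicts with token offsets (end exclusive)."""
    entities = []
    start = None
    # The extra "O" is a sentinel that closes an entity at the end of the text.
    for i, tag in enumerate(list(tags) + ["O"]):
        if start is not None and not tag.startswith("I-"):
            entities.append({
                "beginOffset": start,
                "endOffset": i,
                "text": " ".join(words[start:i]),
                "type": tags[start][2:],  # strip the "B-" prefix
            })
            start = None
        if tag.startswith("B-"):
            start = i
        # Simplification: an I- tag extends any open entity; a stray I- is ignored.
    return entities


text = "President Obama is speaking at the White House."
words = text.split()
tags = ["O", "B-PER", "O", "O", "O", "O", "B-LOC", "I-LOC"]  # hypothetical prediction
entities = build_entities(words, tags)
for entity in entities:
    print(entity)
```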

To download a pre-trained model, call the download function:

>>> from anago.utils import download

>>> url = 'https://s3-ap-northeast-1.amazonaws.com/dev.tech-sketch.jp/chakki/public/conll2003_en.zip'
>>> weights, params, preprocessor = download(url)
>>> model = anago.Sequence.load(weights, params, preprocessor)
>>> model.score(x_test, y_test)
0.909446369856927

If you want to use ELMo for better performance (f1: 92.22), you can use ELModel and ELMoTransformer:

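# Imports (module paths assumed from the anaGo package layout).
from anago.models import ELModel
from anago.preprocessing import ELMoTransformer
from anago.trainer import Trainer

# Transforming datasets.
p = ELMoTransformer()
p.fit(x_train, y_train)

# Building a model.
model = ELModel(...)
model, loss = model.build()
model.compile(loss=loss, optimizer='adam')

# Training the model.
trainer = Trainer(model, preprocessor=p)
trainer.train(x_train, y_train, x_test, y_test)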

For further details, see anago/examples/elmo_example.py.

Feature Support

anaGo supports the following features:

  • Model Training
  • Model Evaluation
  • Tagging Text
  • Custom Model Support
  • Downloading pre-trained model
  • GPU Support
  • Character feature
  • CRF Support
  • Custom Callback Support
  • 💥 (new) ELMo

anaGo officially supports Python 3.4–3.6.

Installation

To install anaGo, simply use pip:

$ pip install anago

or install from the repository:

$ git clone https://github.com/Hironsan/anago.git
$ cd anago
$ python setup.py install

Documentation

(coming soon)

Reference

This library is based on the following papers:
