  • Stars: 185
  • Rank: 208,271 (Top 5%)
  • Language: Shell
  • Created over 4 years ago
  • Updated over 4 years ago

Repository Details

xfspell — the Transformer Spell Checker

This is a Transformer-based English spell checker trained on 7M+ generated parallel sentences.

Usage

  • Clone this repository
  • Create a virtual environment (e.g., python3 -m venv .pyenv)
  • Install requirements (e.g., pip install -r requirements.txt)
  • Download the pretrained model and extract its contents (tar zxvf model7m.tar.gz)
  • Run:
$ echo "tisimptant too spll chck ths dcment." \
    | python src/tokenize.py \
    | fairseq-interactive model7m/ \
    --path model7m/checkpoint_best.pt \
    --source-lang fr --target-lang en --beam 10 \
   | python src/format_fairseq_output.py
It's important to spell check this document.
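
The command above can be wrapped in a small shell function so that it reads any text from standard input. The following is a minimal sketch and is not part of the repository: the function name xfspell_check and the MODEL_DIR and BEAM variables are assumptions introduced here, while the scripts, flags, and checkpoint path are the ones from the Run step. It assumes you are in the repository root with the virtual environment active.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline from the "Run" step.
# xfspell_check, MODEL_DIR, and BEAM are names assumed for this sketch;
# they are not defined by the repository.

# Directory holding the extracted model; defaults to the one used above.
MODEL_DIR="${MODEL_DIR:-model7m}"
# Beam width passed to fairseq-interactive; defaults to the value used above.
BEAM="${BEAM:-10}"

# Read text from stdin and write the pipeline's output to stdout.
xfspell_check() {
    python src/tokenize.py \
        | fairseq-interactive "$MODEL_DIR/" \
            --path "$MODEL_DIR/checkpoint_best.pt" \
            --source-lang fr --target-lang en --beam "$BEAM" \
        | python src/format_fairseq_output.py
}
```

After sourcing this file, piping text into xfspell_check (for example, echo "tisimptant too spll chck ths dcment." | xfspell_check) runs the same three-stage pipeline without retyping it.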

Fun examples

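The spell checker can use long-range context. In the examples below, it corrects "war" to "was" or "were" depending on whether the distant subject ("book" or "books") is singular or plural: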

$ echo "The book Tom and Jerry put on the yellow desk yesterday war about NLP." \
    | python src/tokenize.py \
    | fairseq-interactive model7m/ \
    --path model7m/checkpoint_best.pt \
    --source-lang fr --target-lang en \
   | python src/format_fairseq_output.py
The book Tom and Jerry put on the yellow desk yesterday was about NLP.
$ echo "The books Tom and Jerry put on the yellow desk yesterday war about NLP." \
    | python src/tokenize.py \
    | fairseq-interactive model7m/ \
    --path model7m/checkpoint_best.pt \
    --source-lang fr --target-lang en \
   | python src/format_fairseq_output.py
The books Tom and Jerry put on the yellow desk yesterday were about NLP.

How it's built

See "The Unreasonable Effectiveness of the Transformer Spell Checker".

More Repositories

  1. 100-nlp-papers (3,722 stars): 100 Must-Read NLP Papers
  2. github-typo-corpus (Python, 480 stars): GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
  3. realworldnlp (Python, 328 stars): Example code for "Real-World Natural Language Processing"
  4. nanigonet (Python, 70 stars): NanigoNet: Language detector for code-mixed input supporting 150+19 human+programming languages using deep neural networks
  5. cc-kedict (Python, 41 stars): Creative Commons Korean-English Dictionary
  6. zmifanva (Python, 37 stars): Lojban ↔ English Machine Translation Engine
  7. nltk (Python, 22 stars): NLTK Japanese related files
  8. enja.kdict.org (HTML, 15 stars): The world's fastest online dictionary
  9. camxes.js (JavaScript, 15 stars): Lojban Parser written in JavaScript. Based on camxes.
  10. cll-ja (XSLT, 15 stars): Japanese summary translation of "The Complete Lojban Language"
  11. paper-reviews (10 stars)
  12. awesome-japanese-nlp (6 stars): 📖 A curated list of resources for Japanese Natural Language Processing (NLP)
  13. nlproc-cookbook (Python, 6 stars)
  14. chinese-nlp (Python, 4 stars): mhagiwara's Chinese (language) related files
  15. LojbanDictionary (Swift, 3 stars)
  16. mhagiwara.github.io (HTML, 3 stars): Masato Hagiwara's user pages
  17. deepnlp-kata (HTML, 3 stars): Deep NLP Kata - Practice Exercises for Deep Learning and Natural Language Processing
  18. universalscripts (Jupyter Notebook, 2 stars): Parametrized Universal Scripts: generation model trained from all the scripts in the world
  19. runway-distilgpt (Python, 2 stars): DistilGPT2 model for Runway ML
  20. runway-e2e-tts (Python, 2 stars): Real-time text-to-speech using ParallelWaveGAN
  21. nes-music-with-transformer (CSS, 1 star)
  22. www.aimlbooks.com (HTML, 1 star)
  23. www.realworldnlpbook.com (HTML, 1 star)
  24. englishforhackers.com (HTML, 1 star)
  25. fcg.sharedtask.org (HTML, 1 star): Feedback Comment Generation Shared Task
  26. aiml-dict-ja (Python, 1 star): AIML-dict-ja: an open-source glossary of AI (artificial intelligence) and ML (machine learning) terms
  27. szdict (Python, 1 star): Creative Commons Chinese-English Dictionary of Tech Terms