  • Stars: 132
  • Rank: 274,205 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 7 years ago
  • Updated over 1 year ago

Repository Details

Some recipes of natural language pre-processing

Natural Language Pre-processing

This repository contains recipes for natural language pre-processing.

The recipes are as follows:

  • Data cleaner
  • Word normalization
  • Stopwords remover
  • Tokenizer
  • Word Vector
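
As a rough illustration of how the first four recipes might fit together, here is a minimal sketch using only the Python standard library. The function names (`clean_text`, `normalize`, `tokenize`, `remove_stopwords`, `preprocess`) are hypothetical and are not taken from this repository's code. The regex tokenizer is an assumption that only suits whitespace-delimited text; Japanese text such as the livedoor news corpus would need a morphological analyzer instead.

```python
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")        # HTML/XML tags
URL_RE = re.compile(r"https?://\S+")   # bare URLs
NUM_RE = re.compile(r"\d+")            # runs of digits


def clean_text(text):
    """Data cleaner: drop tags and URLs, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Word normalization: unify character widths, lowercase, replace numbers with 0."""
    text = unicodedata.normalize("NFKC", text).lower()
    return NUM_RE.sub("0", text)


def tokenize(text):
    """Tokenizer (assumption): split on non-word characters."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens, stopwords):
    """Stopwords remover: dictionary-based filtering."""
    return [t for t in tokens if t not in stopwords]


def preprocess(text, stopwords=frozenset()):
    """Run cleaning, normalization, tokenization and stopword removal in order."""
    return remove_stopwords(tokenize(normalize(clean_text(text))), stopwords)


if __name__ == "__main__":
    sample = "<p>Visit https://example.com for ＮＬＰ recipes in 2024</p>"
    print(preprocess(sample, stopwords={"for", "in"}))
```

Cleaning runs before normalization so that markup and URLs are gone before any characters are rewritten. The resulting token lists could then be passed to a word-vector step.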

Install

To install the required modules, run:

$ pip install -r requirements.txt

Setup

First, download the livedoor news corpus and extract it. To download the corpus, run the following commands:

$ cd src/data
$ python make_dataset.py

Now you are ready for classification!

Start Jupyter Notebook:

$ jupyter notebook

Then open and run notebooks/document_classification.ipynb.

Good NLP Life!

Licence

MIT

Author

Hironsan

More Repositories

1

BossSensor

Hide screen when boss is approaching.
Python
6,197
star
2

awesome-embedding-models

A curated list of awesome embedding models tutorials, projects and communities.
Jupyter Notebook
1,740
star
3

anago

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
Python
1,481
star
4

bertsearch

Elasticsearch with BERT for advanced document search.
Python
894
star
5

HotPepperGourmetDialogue

Restaurant Search System through Dialogue in Japanese.
Python
271
star
6

HateSonar

Hate Speech Detection Library for Python.
Jupyter Notebook
187
star
7

asari

Japanese sentiment analyzer implemented in Python.
Python
143
star
8

ja.text8

Japanese text8 corpus for word embedding.
Python
108
star
9

keras-crf-layer

Implementation of CRF layer in Keras.
Python
74
star
10

IOB2Corpus

Japanese IOB2 tagged corpus for Named Entity Recognition.
60
star
11

neraug

A text augmentation tool for named entity recognition.
Python
53
star
12

WikipediaQA

HTML
46
star
13

google-vision-sampler

Code examples for Google Vision API.
46
star
14

tensorflow-nlp-examples

TensorFlow Examples for Natural Language Processing
Python
32
star
15

awesome-text-classification

Text classification meets word embeddings.
Python
30
star
16

google-natural-language-sampler

Code examples for Google Natural Language API.
13
star
17

sentiment-analysis-toolbox

Sentiment analysis toolbox for all NLPer.
Jupyter Notebook
11
star
18

wiki-article-dataset

Wikipedia article dataset
Jupyter Notebook
11
star
19

kintone-handson

Python
10
star
20

japanese-news-crawler

A fully automated Japanese news crawler built on top of the Scrapy framework
Python
8
star
21

protext

Python library for processing Japanese text.
Python
8
star
22

ChatDeTornado

CSS
5
star
23

TatsujinDaifugo

A service for getting closer to becoming a master programmer by building a computer Daifugo client
JavaScript
5
star
24

PyFaceRecognizer

Python
4
star
25

uecda-pyclient

Standard UECda Client written in Python.
Python
3
star
26

CourseraMachineLearning

Python
3
star
27

Internship

Python
2
star
28

spacy-hearst

Hearst patterns, for finding hyponyms, written in Python and spaCy.
Python
1
star
29

Hironsan

1
star