Hironsan/natural-language-preprocessings

Language
Python
License
MIT License

Hironsan


Some recipes for natural language pre-processing

Natural Language Pre-processing

This repository contains some recipes for natural language pre-processing.

The recipes are as follows:

Data cleaner
Word normalization
Stopwords remover
Tokenizer
Word Vector
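The first four recipes can be chained into a single pipeline. The sketch below is a minimal illustration under stated assumptions, not the repository's own code: the function names and the tiny `STOPWORDS` set are hypothetical, and tokenizing is done with a regular expression that splits on non-word characters. That works for space-separated text; Japanese text such as the livedoor corpus has no spaces between words and would normally be tokenized with a morphological analyzer such as MeCab instead.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop word list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def clean_text(text: str) -> str:
    """Data cleaner: strip HTML tags and URLs, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def normalize(text: str) -> str:
    """Word normalization: NFKC-normalize, lowercase, map every digit to 0."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "ＡＢＣ" -> "ABC"
    text = text.lower()
    return re.sub(r"\d", "0", text)


def tokenize(text: str) -> list[str]:
    """Tokenizer: a regex stand-in (assumption) for a real tokenizer."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Stopwords remover: drop tokens that appear in STOPWORDS."""
    return [t for t in tokens if t not in STOPWORDS]


def preprocess(text: str) -> list[str]:
    """Run cleaner -> normalizer -> tokenizer -> stopword remover."""
    return remove_stopwords(tokenize(normalize(clean_text(text))))


if __name__ == "__main__":
    sample = "<p>The price of ＡＢＣ is 1,200 yen: https://example.com</p>"
    print(preprocess(sample))  # -> ['price', 'abc', '0', '000', 'yen']
```

Each step takes and returns plain strings or token lists, so individual recipes can be reordered or swapped out (for example, replacing `tokenize` with a morphological analyzer) without touching the others.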

Install

To install the required modules, run:

$ pip install -r requirements.txt

Setup

First, download and extract the livedoor news corpus. To download the corpus, run the following commands:

$ cd src/data
$ python make_dataset.py
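Once the corpus is extracted, the articles can be loaded as (text, category) pairs for classification. This is a hedged sketch: it assumes the corpus keeps its usual layout of one directory per category containing `<category>-<id>.txt` files whose first two lines are the article URL and a timestamp, and the function name `load_livedoor` is hypothetical, not something `make_dataset.py` is documented to provide.

```python
from pathlib import Path


def load_livedoor(root: str) -> list[tuple[str, str]]:
    """Return (document, category) pairs from an extracted livedoor corpus.

    Assumed layout: root/<category>/<category>-<id>.txt, where each article
    file starts with a URL line and a timestamp line, followed by the title
    and body. Per-category LICENSE.txt files and top-level files are skipped.
    """
    pairs = []
    for category_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(category_dir.glob("*.txt")):
            if path.name == "LICENSE.txt":
                continue
            lines = path.read_text(encoding="utf-8").splitlines()
            body = "\n".join(lines[2:])  # drop the URL and timestamp lines
            pairs.append((body, category_dir.name))
    return pairs
```

For example, if the corpus were extracted to a directory named `text` (an assumed path), `load_livedoor("text")` would return one pair per article, with the category directory name as the label.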

Now you are ready for classification!

Start Jupyter Notebook:

$ jupyter notebook

Then open and run notebooks/document_classification.ipynb.
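The model used in the notebook is not described in this README. Purely as an illustration of document classification on pre-processed tokens, the sketch below swaps in a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only; the class name `NaiveBayes` and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)        # number of documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.n_docs = len(docs)
        return self

    def predict(self, tokens: list[str]) -> str:
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.n_docs)  # log prior P(class)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train = [["goal", "match", "team"], ["phone", "app", "update"], ["team", "win"]]
    labels = ["sports", "it", "sports"]
    clf = NaiveBayes().fit(train, labels)
    print(clf.predict(["team", "goal"]))  # -> sports
```

Scores are summed in log space so that long documents do not underflow to zero; the token lists fed to `fit` and `predict` would normally come from the pre-processing recipes.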

Good NLP Life!

License

MIT License

Author

Hironsan
