Category prediction model
This repo contains an AllenNLP model that predicts the categories of named entities from their mentions.
Data
Fake data
You can generate some fake data using this Notebook
Real data (Work in progress)
Filtered OneShotWikilinks dataset with manually selected categories.
Data preparation steps
- Create the category graph (build_category_graph.ipynb)
  - Produces: category_graph.pkl
- Obtain the list of Person articles from the DBpedia ontology (obtain_people_articles.ipynb)
  - Requires: dbpedia_2016-10.owl
  - Produces: people_categories.json
- Build the mapping from articles to people categories (generate_full_people_categories.ipynb)
  - Requires: people_categories.json, category_graph.pkl, projects/categories_prediction/manual_categories.gsheet
- Filter mentions for people (filter_mentions.ipynb)
  - Requires: people_all_categories.json
  - Produces: people_mentions.tsv
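The dependencies between the preparation steps above can be written down and checked in a few lines. This is only a sketch built from the file names listed in this README: the `STEPS` table and the helper names are illustrative, not part of the repo, and the output of generate_full_people_categories.ipynb is left empty because it is not stated here.

```python
# Data preparation steps as listed above: notebook -> files it requires/produces.
# Only files named in this README are included; nothing is inferred.
STEPS = {
    "build_category_graph.ipynb": {
        "requires": [],
        "produces": ["category_graph.pkl"],
    },
    "obtain_people_articles.ipynb": {
        "requires": ["dbpedia_2016-10.owl"],
        "produces": ["people_categories.json"],
    },
    "generate_full_people_categories.ipynb": {
        "requires": [
            "people_categories.json",
            "category_graph.pkl",
            "projects/categories_prediction/manual_categories.gsheet",
        ],
        "produces": [],  # output file is not stated in this README
    },
    "filter_mentions.ipynb": {
        "requires": ["people_all_categories.json"],
        "produces": ["people_mentions.tsv"],
    },
}


def external_inputs(steps):
    """Return required files that no listed step produces, sorted by name."""
    produced = {f for step in steps.values() for f in step["produces"]}
    required = {f for step in steps.values() for f in step["requires"]}
    return sorted(required - produced)


def run_order(steps):
    """Order notebooks so each runs after the steps that produce its inputs."""
    producer = {f: name for name, s in steps.items() for f in s["produces"]}
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for f in steps[name]["requires"]:
            if f in producer:
                visit(producer[f])
        order.append(name)

    for name in steps:
        visit(name)
    return order


if __name__ == "__main__":
    print("Run order:", run_order(STEPS))
    print("Must exist beforehand:", external_inputs(STEPS))
```

`external_inputs` lists every file that has to be in place before the notebooks run, which is a quick way to spot a missing download or an undocumented intermediate file.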
Split the data into 10 chunks without breaking lines (prefix the command with ! to run it from a notebook cell):
split -n l/10 --verbose ../data/fake_data_train.tsv ../data/fake_data_train.tsv_
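Where GNU split is not available, the same kind of chunking can be approximated in Python. This is a minimal sketch, not the repo's code: `split_by_lines` and `split_suffixes` are hypothetical helpers, and the chunks are balanced by line count, whereas `split -n l/10` balances by byte size, so the exact boundaries may differ.

```python
import string
from itertools import product
from pathlib import Path


def split_suffixes(n):
    """Yield the first n two-letter suffixes in split's order: aa, ab, ..., az, ba, ..."""
    letters = string.ascii_lowercase
    for i, pair in enumerate(product(letters, repeat=2)):
        if i >= n:
            return
        yield "".join(pair)


def split_by_lines(path, n_chunks=10):
    """Write n_chunks files named <path>_aa, <path>_ab, ... holding whole lines.

    Assumption: chunks are balanced by line count; GNU split -n l/N balances
    by byte size instead, so chunk boundaries can differ.
    """
    if not 1 <= n_chunks <= 26 ** 2:
        raise ValueError("n_chunks must fit into two-letter suffixes")
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    base, extra = divmod(len(lines), n_chunks)
    outputs, start = [], 0
    for i, suffix in enumerate(split_suffixes(n_chunks)):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        out = path.with_name(path.name + "_" + suffix)
        out.write_text("".join(lines[start:start + size]), encoding="utf-8")
        outputs.append(out)
        start += size
    return outputs
```

For example, `split_by_lines("../data/fake_data_train.tsv")` would write `fake_data_train.tsv_aa` through `fake_data_train.tsv_aj` next to the input file.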
Install
pip install -r requirements.txt
Run
Train
# Rebuild the vocabulary from scratch
rm -rf ./data/vocabulary ; allennlp make-vocab -s ./data/ allen_conf_vocab.json --include-package category_prediction
# Train on the device set in allen_conf.json
allennlp train -f -s data/stats allen_conf.json --include-package category_prediction
# Train on GPU 0 by overriding the trainer config
allennlp train -f -s data/stats allen_conf.json --include-package category_prediction -o '{"trainer": {"cuda_device": 0}}'
Continue training with different params
rm -rf data/stats2/ # Clear new serialization dir
allennlp fine-tune -s data/stats2/ -c allen_conf.json -m ./data/stats/model.tar.gz --include-package category_prediction -o '{"trainer": {"cuda_device": 0}, "iterator": {"base_iterator": {"batch_size": 64}}}'
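Hand-written JSON in the `-o` argument is easy to get wrong once the overrides are nested. One option is to build the string with `json.dumps`, as in the sketch below. `fine_tune_command` is a hypothetical helper, not part of this repo; the flags and paths are the ones from the command above.

```python
import json
import shlex


def fine_tune_command(model_archive, config, serialization_dir, overrides):
    """Build the argument list for the `allennlp fine-tune` call shown above."""
    return [
        "allennlp", "fine-tune",
        "-s", serialization_dir,
        "-c", config,
        "-m", model_archive,
        "--include-package", "category_prediction",
        "-o", json.dumps(overrides),  # serialize the overrides as valid JSON
    ]


overrides = {
    "trainer": {"cuda_device": 0},
    "iterator": {"base_iterator": {"batch_size": 64}},
}
cmd = fine_tune_command(
    "./data/stats/model.tar.gz", "allen_conf.json", "data/stats2/", overrides
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` as is, which avoids shell quoting altogether.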
Validate
allennlp evaluate ./data/stats/model.tar.gz ./data/fake_data_test.tsv --include-package category_prediction
Server
Debug
MODEL=./data/trained_models/6th_augmented/model.tar.gz python run_server.py
Prod
gunicorn -c gunicorn_config.py wsgi:application
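The file passed with `-c` is a plain Python module that gunicorn reads for its settings. The contents of this repo's gunicorn_config.py are not shown here, so the sketch below is only an example of what such a file might contain: every value, and the `WORKERS` environment variable, is an assumption.

```python
# gunicorn_config.py (hypothetical contents; the repo's actual file may differ)
import multiprocessing
import os

# Address to listen on; 8000 matches the port published in the Docker section.
bind = "0.0.0.0:8000"

# Assumption: each worker loads its own copy of the model, so keep the count low.
workers = int(os.environ.get("WORKERS", min(2, multiprocessing.cpu_count())))

# Seconds a worker may stay silent before gunicorn restarts it;
# loading a model archive can be slow.
timeout = 120
```

`bind`, `workers` and `timeout` are standard gunicorn settings; any setting can also be overridden on the command line.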
Docker
Build
cd docker
docker build --tag mention .
Run, mounting the host pyenv into the container
docker run --restart unless-stopped -v $HOME:$HOME -p 8000:8000 \
-v $HOME/.pyenv:/root/.pyenv \
-e ENV_PATH=$HOME/virtualenv/path \
-e APP_PATH=$HOME/project/root/path mention
GCE related notes
Fix 100% GPU utilization (enable persistence mode)
sudo nvidia-smi -pm 1