Deep Text Classifier
Implementation of the document classification model described in "Hierarchical Attention Networks for Document Classification" (Yang et al., 2016).
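The paper's model applies the same attention pooling step at two levels: over the word vectors of each sentence to get a sentence vector, then over the sentence vectors to get a document vector. The block below is a minimal pure-Python sketch of that one pooling step, not the code in this repository. The function name attention_pool and its argument names are made up for illustration, and the recurrent encoders that produce the hidden vectors are left out.

```python
import math


def attention_pool(hidden, W, b, context):
    """Collapse a sequence of hidden vectors into one vector using attention.

    hidden:  list of T vectors, each of length d (e.g. encoder outputs)
    W, b:    projection weights (d_a rows of length d) and bias (length d_a)
    context: context vector of length d_a (a learned parameter in a real model)
    """
    # u_t = tanh(W h_t + b): project every hidden vector
    projected = [
        [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
         for row, bias in zip(W, b)]
        for h in hidden
    ]
    # score_t = u_t . context: similarity of each position to the context vector
    scores = [sum(u * c for u, c in zip(u_t, context)) for u_t in projected]
    # softmax over positions, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # pooled = sum_t weight_t * h_t
    dim = len(hidden[0])
    pooled = [sum(a * h[i] for a, h in zip(weights, hidden)) for i in range(dim)]
    return pooled, weights


# Toy input: two 2-dimensional "word" vectors and an identity projection.
hidden = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
pooled, weights = attention_pool(hidden, W, b, context=[1.0, 0.0])
```

With this context vector the first position gets the larger weight, so the pooled vector leans towards the first input. Calling the function once per sentence and then once more on the resulting sentence vectors gives the hierarchical structure.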
How to run
- Create a virtual environment, activate it, and install requirements:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
- Download the English model for spaCy:
python -m spacy download en
- Get the Yelp review dataset and extract it in this directory, then prepare the data and train the model:
python3 yelp_prepare.py dataset/review.json
python3 worker.py --mode=train --device=/gpu:0 --batch-size=30
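For orientation, the Yelp review file stores one JSON object per line, with the review body under "text" and the rating under "stars". The sketch below shows one way to turn such lines into (text, label) pairs. It is not taken from yelp_prepare.py; the helper name iter_reviews and the zero-based label mapping are assumptions, and the spaCy sentence and word splitting is not shown.

```python
import io
import json


def iter_reviews(lines):
    """Yield (text, label) pairs from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: map 1..5 stars to class ids 0..4
        yield record["text"], int(record["stars"]) - 1


# Two made-up records standing in for dataset/review.json
sample = io.StringIO(
    '{"stars": 5, "text": "Great food. Will come back."}\n'
    '{"stars": 1, "text": "Cold and slow."}\n'
)
pairs = list(iter_reviews(sample))
```

To read the real file, pass an open file handle instead of the in-memory sample, for example open("dataset/review.json", encoding="utf-8").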
Results
I get 65% accuracy on a dev set (16% of the data) after 3 epochs. The paper reports 71% on Yelp'15. No systematic hyperparameter optimization was performed.
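To make the numbers above concrete, here is a hedged sketch of holding out 16% of the examples as a dev set and computing accuracy on it. The function names, the shuffling, and the seed are assumptions for illustration; the split used in this repository may be made differently.

```python
import random


def split_train_dev(examples, dev_fraction=0.16, seed=0):
    """Shuffle with a fixed seed and hold out dev_fraction of the examples."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


train, dev = split_train_dev(range(100))
score = accuracy([4, 0, 2, 3], [4, 0, 1, 3])
```

Here 100 examples yield 84 for training and 16 for the dev set, and three correct labels out of four give an accuracy of 0.75.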