  • Stars: 158
  • Rank: 235,762 (Top 5%)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated about 1 year ago

Repository Details

PyTorch implementation of a deep CNN encoder + LSTM decoder with attention for image-to-LaTeX

Im2Latex

Deep CNN encoder + LSTM decoder with attention for image-to-LaTeX: a PyTorch implementation of the model architecture used by Seq2Seq for LaTeX generation.
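To make the "decoder with attention" part concrete, here is a minimal sketch of a single attention step in plain Python. It is not taken from this repository: the function name `attention_step` is hypothetical, and dot-product scoring is assumed only for brevity (the model's actual scoring function is not described here). At each decoding step the decoder state is compared against every encoder feature vector, the scores are normalized with a softmax, and the weighted sum becomes the context vector.

```python
import math


def attention_step(decoder_state, encoder_features):
    """Return (weights, context) for one decoding step.

    decoder_state: list of floats (current decoder hidden state)
    encoder_features: list of feature vectors, one per image location
    """
    # Dot-product relevance scores (an assumed scoring function).
    scores = [
        sum(h * f for h, f in zip(decoder_state, feat))
        for feat in encoder_features
    ]
    # Numerically stable softmax over the image locations.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder features.
    dim = len(encoder_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, encoder_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three image locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_step([2.0, 0.0], features)
```

In a real model the same computation would run on batched tensors, and the context vector would be fed to the LSTM together with the previous token embedding.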

Sample results from this implementation

[Image: sample_result]

Experimental results on the IM2LATEX-100K test dataset

BLEU-4    Edit Distance    Exact Match
40.80     44.23            0.27
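For readers unfamiliar with two of the metrics above, the following sketch shows one common way to compute edit distance and exact match over tokenized formulas. It is an illustration under assumptions, not the code in `evaluate.py`: the function names are hypothetical, and the table does not say how the reported scores are scaled or normalized.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def exact_match_rate(refs, hyps):
    """Fraction of predictions whose token sequence equals the reference."""
    if not refs:
        return 0.0
    matches = sum(1 for r, h in zip(refs, hyps) if r == h)
    return matches / len(refs)


# Toy example with whitespace-tokenized LaTeX.
refs = [r"\frac { a } { b }".split(), "x ^ { 2 }".split()]
hyps = [r"\frac { a } { c }".split(), "x ^ { 2 }".split()]
distance = edit_distance(refs[0], hyps[0])
rate = exact_match_rate(refs, hyps)
```

Here the first pair differs by a single substituted token, and one of the two predictions matches its reference exactly.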

Getting Started

Install dependencies:

pip install -r requirement.txt

Download the dataset for training:

cd data
wget http://lstm.seas.harvard.edu/latex/data/im2latex_validate_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/im2latex_train_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/im2latex_test_filter.lst
wget http://lstm.seas.harvard.edu/latex/data/formula_images_processed.tar.gz
wget http://lstm.seas.harvard.edu/latex/data/im2latex_formulas.norm.lst
tar -zxvf formula_images_processed.tar.gz
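Before preprocessing, it can help to confirm that every download finished. The sketch below checks for the five files fetched above; the helper name `missing_dataset_files` is hypothetical, and it assumes the files live directly in the `data` directory used by the commands.

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_FILES = [
    "im2latex_validate_filter.lst",
    "im2latex_train_filter.lst",
    "im2latex_test_filter.lst",
    "im2latex_formulas.norm.lst",
    "formula_images_processed.tar.gz",
]


def missing_dataset_files(data_dir):
    """Return the expected dataset files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# An empty list means all downloads are in place.
missing = missing_dataset_files("data")
```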

Preprocess:

python preprocess.py

Build the vocabulary:

python build_vocab.py
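As a rough picture of what a vocabulary-building step usually does, the sketch below maps each LaTeX token to an integer id. It is not the content of `build_vocab.py`: the special-token names, the `min_count` parameter, and the assumption that normalized formulas are whitespace-separated tokens are all illustrative.

```python
from collections import Counter

# Hypothetical special tokens: padding, start, end, unknown.
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_vocab(formulas, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + kept)}


def encode(formula, vocab):
    """Convert a formula to ids, sending unseen tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in formula.split()]


formulas = ["x ^ { 2 }", "x + y"]
vocab = build_vocab(formulas)
ids = encode("x + z", vocab)  # "z" is not in the vocabulary
```

Raising `min_count` would push rare tokens into `<unk>`, which trades coverage for a smaller output layer.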

Train:

python train.py \
      --data_path=[data dir] \
      --save_dir=[the dir for saving ckpts] \
      --dropout=0.2 --add_position_features \
      --epoches=25 --max_len=150
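The flags in the command above mix value options with one boolean switch. The sketch below mirrors them with `argparse` to show how such a command line would be parsed. It is not copied from `train.py`: the defaults are placeholders, and only the flag names come from the command shown.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the flags in the training command."""
    parser = argparse.ArgumentParser(description="Sketch of the training flags")
    parser.add_argument("--data_path", type=str, required=True)
    parser.add_argument("--save_dir", type=str, required=True)
    parser.add_argument("--dropout", type=float, default=0.0)      # placeholder default
    parser.add_argument("--add_position_features", action="store_true")
    parser.add_argument("--epoches", type=int, default=1)          # placeholder default
    parser.add_argument("--max_len", type=int, default=150)        # placeholder default
    return parser


# Parse the same arguments as the command above, with example directories.
args = build_parser().parse_args([
    "--data_path=./data",
    "--save_dir=./ckpts",
    "--dropout=0.2",
    "--add_position_features",
    "--epoches=25",
    "--max_len=150",
])
```

Note that `--add_position_features` takes no value: its presence alone turns the option on.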

Evaluate:

python evaluate.py --split=test \
     --model_path=[the path to model] \
     --data_path=[data dir] \
     --batch_size=32 \
     --ref_path=[the file to store reference] \
     --result_path=[the file to store decoding result]

Features