AxCell: Automatic Extraction of Results from Machine Learning Papers
This repository is the official implementation of AxCell: Automatic Extraction of Results from Machine Learning Papers.
Requirements
To create a conda environment named axcell and install the requirements, run:
conda env create -f environment.yml
Additionally, axcell requires docker (runnable without sudo). Run scripts/pull_docker_images.sh to download the necessary images.
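Before pulling the images, it can help to confirm that docker really is usable without sudo. The following is a minimal sketch of such a check and is not part of axcell: the function name docker_usable_without_sudo and the 10-second timeout are assumptions made for illustration. It relies only on the Python standard library and on the docker info command, which typically exits with a non-zero status when the daemon is unreachable or the current user lacks permission on the docker socket.

```python
import shutil
import subprocess


def docker_usable_without_sudo(timeout: float = 10.0) -> bool:
    """Return True if the current user can reach the Docker daemon.

    Hypothetical helper for illustration; not part of the axcell package.
    """
    # The docker CLI has to be on PATH before anything else can work.
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` contacts the daemon. It typically exits non-zero
        # when the daemon is down or the user cannot access the socket.
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Treat launch failures and hangs as "not usable".
        return False
    return result.returncode == 0
```

If the function returns False, either docker is not installed or the current user still needs sudo to run it; scripts/pull_docker_images.sh would likely fail in both cases.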
Datasets
We publish the following datasets:
See the datasets notebook for an example of how to load the published datasets. The extraction notebook shows how to use axcell to extract text and tables from papers.
Evaluation
See the evaluation notebook for a full example of how to evaluate AxCell on the PWCLeaderboards dataset.
Training
- pre-training language model on the ArxivPapers dataset
- table type classifier and table segmentation on the SegmentedResults dataset
Pre-trained Models
You can download pre-trained models here:
- axcell: an archive containing the taxonomy, abbreviations, table type classifier and table segmentation model. See the results-extraction notebook for an example of how to load and run the models.
- language model: a ULMFiT language model pre-trained on the ArxivPapers dataset.
Results
AxCell achieves the following performance:
Dataset | Macro F1 | Micro F1 |
---|---|---|
PWC Leaderboards | 21.1 | 28.7 |
NLP-TDMS | 19.7 | 25.8 |
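To illustrate why the Macro F1 and Micro F1 columns can differ, here is a small sketch that uses the common definitions of the two averages: micro F1 pools the counts over all groups before computing F1, while macro F1 averages the per-group F1 scores. The grouping unit (for example, one group per paper), the function names and the counts below are assumptions made for illustration; the evaluation notebook defines the protocol actually used for the numbers in the table.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one group
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(groups: Dict[str, Counts]) -> float:
    """Pool the counts over all groups, then compute a single F1."""
    tp = sum(c[0] for c in groups.values())
    fp = sum(c[1] for c in groups.values())
    fn = sum(c[2] for c in groups.values())
    return f1(tp, fp, fn)


def macro_f1(groups: Dict[str, Counts]) -> float:
    """Compute F1 per group, then take the unweighted mean."""
    if not groups:
        return 0.0
    return sum(f1(*c) for c in groups.values()) / len(groups)


# Hypothetical counts for two groups; not taken from any dataset.
example = {"paper_a": (8, 2, 2), "paper_b": (1, 3, 4)}
micro = micro_f1(example)
macro = macro_f1(example)
```

In this made-up example the larger group dominates the micro average, whereas the macro average weights both groups equally, so a poorly scored small group pulls the macro score down further.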
License
AxCell is released under the Apache 2.0 license.
Citation
The pipeline is described in the following paper:
@article{axcell,
title={AxCell: Automatic Extraction of Results from Machine Learning Papers},
author={Marcin Kardas and Piotr Czapla and Pontus Stenetorp and Sebastian Ruder and Sebastian Riedel and Ross Taylor and Robert Stojnic},
year={2020},
journal={arXiv preprint arXiv:2004.14356}
}