Automatic SOTA (state-of-the-art) extraction
Aggregate public SOTA tables that are shared under free licences.
Download the scraped data or run the scrapers yourself to get the latest data.
In the future, we are planning to automate the process of extracting tasks, datasets and results from papers.
Getting the data
The data is kept in the data directory. All data is shared under the CC-BY-SA-4 licence.
The data has been parsed into a consistent JSON format, described below.
JSON format description
The format consists of five primary data types: Task, Dataset, Sota, SotaRow and Link.
A valid JSON file is a list of Task objects. You can see examples in the data/tasks folder.
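As a rough illustration of the overall shape, the sketch below builds a tiny document with one task and one dataset and round-trips it through Python's json module. Every name, URL and number in it is an invented placeholder rather than an entry from the data directory, and the date format and the use of strings for metric values are assumptions.

```python
import json

# Hypothetical example data: all names, URLs, dates and scores are placeholders.
# The date format and string-valued metrics are assumptions, not part of the spec.
example = [
    {
        "task": "Example task",
        "description": "A *made-up* task used only to show the layout.",
        "subtasks": [],
        "datasets": [
            {
                "dataset": "Example dataset",
                "description": "A made-up dataset.",
                "subdatasets": [],
                "dataset_links": [
                    {"title": "Download page", "url": "https://example.com/data"}
                ],
                "dataset_citations": [],
                "sota": {
                    "metrics": ["Accuracy"],
                    "rows": [
                        {
                            "model_name": "Example model",
                            "paper_title": "An example paper",
                            "paper_url": "https://example.com/paper",
                            "paper_date": "2018-01-01",
                            "code_links": [],
                            "model_links": [],
                            "metrics": {"Accuracy": "90.0"},
                        }
                    ],
                },
            }
        ],
        "source_link": {"title": "Example source", "url": "https://example.com"},
    }
]

# Serialise and parse again: the top level of a valid file is a list of tasks.
text = json.dumps(example, indent=2)
tasks = json.loads(text)
print(tasks[0]["datasets"][0]["sota"]["metrics"])
```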
Task
A Task consists of the following fields:
task - name of the task (string)
description - short description of the task, in markdown (string)
subtasks - a list of zero or more Task objects that are children of this task (list)
datasets - a list of zero or more Dataset objects on which the task is evaluated (list)
source_link - an optional Link object pointing to the original source of the task
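Because tasks can nest through subtasks, reading a file usually means walking a tree. A minimal sketch of such a walk, assuming the tasks have already been parsed into Python dictionaries; the helper name iter_tasks and the sample data are hypothetical:

```python
from typing import Any, Dict, Iterator, List, Tuple

Task = Dict[str, Any]


def iter_tasks(tasks: List[Task], depth: int = 0) -> Iterator[Tuple[int, Task]]:
    """Yield (depth, task) pairs for every task and subtask, parents first."""
    for task in tasks:
        yield depth, task
        # "subtasks" holds zero or more child Task objects.
        yield from iter_tasks(task.get("subtasks", []), depth + 1)


# Placeholder data, not taken from the data directory.
sample = [
    {
        "task": "Parent task",
        "subtasks": [{"task": "Child task", "subtasks": [], "datasets": []}],
        "datasets": [],
    },
    {"task": "Another task", "subtasks": [], "datasets": []},
]

# Print an indented outline of the task tree.
for depth, task in iter_tasks(sample):
    print("  " * depth + task["task"])
```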
Dataset
A Dataset consists of the following fields:
dataset - name of the dataset (string)
description - a short description in markdown (string)
subdatasets - zero or more children Dataset objects (e.g. dataset subsets or dataset partitions) (list)
dataset_links - zero or more Link objects, representing the links to the dataset download page or any other relevant external pages (list)
dataset_citations - zero or more Link objects, representing the papers that are the primary citations for the dataset
sota - the Sota object representing the state-of-the-art table on this dataset
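Datasets nest in the same way as tasks, via subdatasets. The sketch below flattens a list of Dataset dictionaries into (path, dataset) pairs so that each subset can be reported together with its parents. The function name iter_datasets and the sample entries are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

Dataset = Dict[str, Any]


def iter_datasets(
    datasets: List[Dataset], parents: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], Dataset]]:
    """Yield (path, dataset) pairs, where path lists the names from the root down."""
    for dataset in datasets:
        path = parents + (dataset["dataset"],)
        yield path, dataset
        # "subdatasets" holds zero or more child Dataset objects.
        yield from iter_datasets(dataset.get("subdatasets", []), path)


# Placeholder data, not taken from the data directory.
sample_datasets = [
    {
        "dataset": "Example corpus",
        "subdatasets": [
            {"dataset": "dev split", "subdatasets": []},
            {"dataset": "test split", "subdatasets": []},
        ],
    }
]

for path, _ in iter_datasets(sample_datasets):
    print(" / ".join(path))
```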
Link
A Link object describes a URL, and has these two fields:
title - title of the link, i.e. anchor text (string)
url - target URL (string)
Sota
A Sota object represents one state-of-the-art table, with these fields:
metrics - a list of metric names used to evaluate the methods (list of strings)
rows - a list of SotaRow objects, one per row of the SOTA table (list)
SotaRow
A SotaRow object represents one line of the SOTA table, and has these fields:
model_name - name of the model evaluated (string)
paper_title - primary paper's title (string)
paper_url - primary paper's URL (string)
paper_date - publication date of the paper, if available (string)
code_links - a list of zero or more Link objects, with links to relevant code implementations (list)
model_links - a list of zero or more Link objects, with links to relevant pretrained model files (list)
metrics - a dictionary of values, where the keys are strings from the parent Sota.metrics list and the values are the measured performance (dictionary)
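To show how a Sota table and its rows fit together, here is a hedged sketch that checks each row's metric keys against the parent table and picks the best row for one metric. It assumes the keys of each row's metrics dictionary are the names listed in the parent Sota object's metrics field, and that values may be numbers or numeric strings; values that cannot be parsed are skipped. The helper names and sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Sota = Dict[str, Any]
SotaRow = Dict[str, Any]


def unknown_metric_keys(sota: Sota) -> List[Tuple[str, str]]:
    """Return (model_name, key) pairs whose key is not declared in sota["metrics"]."""
    allowed = set(sota["metrics"])
    return [
        (row["model_name"], key)
        for row in sota["rows"]
        for key in row["metrics"]
        if key not in allowed
    ]


def best_row(sota: Sota, metric: str, higher_is_better: bool = True) -> Optional[SotaRow]:
    """Return the row with the best parseable value for `metric`, or None."""
    scored = []
    for row in sota["rows"]:
        raw = row["metrics"].get(metric)
        try:
            value = float(raw)  # assumption: values are numbers or numeric strings
        except (TypeError, ValueError):
            continue  # missing or non-numeric value: skip this row
        scored.append((value, row))
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[0])[1]


# Placeholder table, not real results.
sample_sota = {
    "metrics": ["Accuracy"],
    "rows": [
        {"model_name": "Model A", "metrics": {"Accuracy": "91.2"}},
        {"model_name": "Model B", "metrics": {"Accuracy": 88.5}},
        {"model_name": "Model C", "metrics": {"F1": "70"}},
    ],
}

print(unknown_metric_keys(sample_sota))
print(best_row(sample_sota, "Accuracy")["model_name"])
```

Whether higher or lower is better depends on the metric, which the format does not record, so the caller has to supply it here.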
Running the scrapers
Installation
Requires Python 3.6+.
pip install -r requirements.txt
NLP-progress
NLP-progress is a hand-annotated collection of SOTA results from NLP tasks.
The scraper is part of the NLP-progress project.
Licence: MIT
EFF
EFF has annotated a set of SOTA results on a small number of tasks, and produced this great report.
To convert the current content run:
python -m scrapers.eff
Licence: CC-BY-SA-4
SQuAD
The Stanford Question Answering Dataset is an active project for evaluating the question answering task using a hidden test set.
To scrape the current content run:
python -m scrapers.squad
Licence: CC-BY-SA-4
RedditSota
The RedditSota repository lists the best method for a variety of tasks across all of ML.
To scrape the current content run:
python -m scrapers.redditsota
Licence: Apache-2
SNLI
The Stanford Natural Language Inference (SNLI) Corpus is an active project for Natural Language Inference.
To scrape the current content run:
python -m scrapers.snli
Licence: CC-BY-SA
Cityscapes
Cityscapes is a benchmark for semantic segmentation.
To scrape the current content run:
python -m scrapers.cityscapes
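To refresh everything in one go, the individual commands above could be wrapped in a small driver. This is only a sketch: the module names are the ones listed in this section, while the helper names scraper_command and run_all are hypothetical and not part of the repository.

```python
import subprocess
import sys
from typing import Dict, List, Sequence

# Module names as given by the commands in this section.
SCRAPER_MODULES = (
    "scrapers.eff",
    "scrapers.squad",
    "scrapers.redditsota",
    "scrapers.snli",
    "scrapers.cityscapes",
)


def scraper_command(module: str) -> List[str]:
    """Build the argv for `python -m <module>` using the current interpreter."""
    return [sys.executable, "-m", module]


def run_all(modules: Sequence[str] = SCRAPER_MODULES) -> Dict[str, int]:
    """Run each scraper in turn and return its exit code, keyed by module name."""
    results = {}
    for module in modules:
        completed = subprocess.run(scraper_command(module))
        results[module] = completed.returncode
    return results
```

Calling run_all() from the repository root, after installing the requirements, would run the scrapers one after another; a non-zero value in the returned dictionary marks a scraper that failed.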
Evaluating the SOTA extraction performance
In the future, this repository will also contain the automatic SOTA extraction pipeline. The aim is to automatically extract tasks, datasets and results from papers.
To evaluate the current prediction performance for all tasks:
python -m extractor.eval_all
The most current report can be seen here: eval_all_report.csv.