

Extracting scientific claims from biomedical abstracts (powered by AllenNLP)

Claim Extraction for Scientific Publications

Detecting claims in scientific publications using a discourse model and transfer learning. Models are trained using the AllenNLP library.

Installing as a package

You can install the package with pip, which lets you use the discourse classes from your own Python module:

pip install git+https://github.com/titipata/detecting-scientific-claim.git

You can then use them as follows:

import discourse
predictor = discourse.DiscourseCRFClassifierPredictor()

Training discourse model

To train a discourse model on the PubMed RCT dataset with AllenNLP, run:

allennlp train experiments/pubmed_rct.json -s output --include-package discourse

pubmed_rct.json points directly to the data on Amazon S3, so you do not need to download the data locally. To run on CPU, change cuda_device to -1 in pubmed_rct.json. More experiments are available in the experiments folder.

Note that you must remove any existing output folder before running.
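If you prefer not to edit pubmed_rct.json by hand, the two manual steps above (switching to CPU and clearing the output folder) can be scripted. This is a minimal sketch, assuming the config is plain JSON and that cuda_device sits under the "trainer" key, as in a standard AllenNLP config. The helper name make_cpu_config and the output file name pubmed_rct_cpu.json are hypothetical.

```python
import json
import shutil
from pathlib import Path


def make_cpu_config(config_path, cpu_config_path, serialization_dir="output"):
    """Write a CPU copy of an AllenNLP JSON config and clear the old output folder.

    Hypothetical helper. Assumes the config is plain JSON and that the GPU
    setting lives at config["trainer"]["cuda_device"] (usual AllenNLP layout).
    """
    config = json.loads(Path(config_path).read_text())
    # cuda_device = -1 makes the AllenNLP trainer run on CPU
    config.setdefault("trainer", {})["cuda_device"] = -1
    Path(cpu_config_path).write_text(json.dumps(config, indent=2))

    # The README says the output folder must be removed before training,
    # so delete any folder left over from a previous run
    output = Path(serialization_dir)
    if output.exists():
        shutil.rmtree(output)
    return config
```

For example, call make_cpu_config("experiments/pubmed_rct.json", "experiments/pubmed_rct_cpu.json") and then run allennlp train experiments/pubmed_rct_cpu.json -s output --include-package discourse. Writing a separate copy leaves the original experiment file untouched.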

Predicting discourse

We trained a bidirectional LSTM model on structured abstracts from PubMed to predict the discourse probabilities (RESULTS, METHODS, CONCLUSIONS, BACKGROUND, OBJECTIVE) of a given sentence. You can download the trained model from Amazon S3:

wget https://s3-us-west-2.amazonaws.com/pubmed-rct/model.tar.gz # or model_crf.tar.gz for pretrained model with CRF layer

Then run the web service for discourse prediction as follows:

bash web_service.sh
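The model scores each sentence against the five discourse labels listed above, and a single label is usually obtained by picking the most probable class. A minimal sketch, assuming you already have the probabilities as a list in the order given by DISCOURSE_LABELS. That order is an assumption here; the real order comes from the trained model's label vocabulary. The function name top_discourse_label is hypothetical.

```python
# Assumed label order; the trained model's vocabulary defines the real one.
DISCOURSE_LABELS = ("RESULTS", "METHODS", "CONCLUSIONS", "BACKGROUND", "OBJECTIVE")


def top_discourse_label(probabilities, labels=DISCOURSE_LABELS):
    """Return (label, probability) for the most likely discourse class."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per discourse label")
    # index of the largest probability (argmax)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]
```

For example, top_discourse_label([0.05, 0.70, 0.10, 0.10, 0.05]) returns ("METHODS", 0.7) under the assumed label order.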

To test the trained model on the provided examples in fixtures.json, run the following to predict labels:

allennlp predict model.tar.gz \
    pubmed-rct/PubMed_200k_RCT/fixtures.json \
    --include-package discourse \
    --predictor discourse_predictor

or run the following for the model with a CRF layer:

allennlp predict model_crf.tar.gz \
    pubmed-rct/PubMed_200k_RCT/fixtures_crf.json \
    --include-package discourse \
    --predictor discourse_crf_predictor
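To run these commands on your own text, put the inputs in a file with one JSON object per line, which is the layout allennlp predict reads. This is a sketch under assumptions: the field names "sentence" (single-sentence predictor) and "sentences" (CRF predictor over a whole abstract) are hypothetical, so check fixtures.json and fixtures_crf.json for the keys the predictors actually expect. The example sentences are made up, and write_jsonl is a hypothetical helper.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line, the input format allennlp predict reads."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Made-up example inputs; the keys "sentence" and "sentences" are assumptions.
my_sentences = [
    "We randomly assigned patients to treatment or placebo.",
    "Treatment reduced symptoms compared with placebo.",
]
sentence_records = [{"sentence": s} for s in my_sentences]  # discourse_predictor
abstract_records = [{"sentences": my_sentences}]            # discourse_crf_predictor
```

For example, write_jsonl(sentence_records, "my_inputs.json") and then pass my_inputs.json in place of the fixtures path in the allennlp predict command.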

To evaluate the discourse model, run:

allennlp evaluate model.tar.gz \
  https://s3-us-west-2.amazonaws.com/pubmed-rct/test.json \
  --include-package discourse

Predicting claim (web service)

We use transfer learning with fine-tuning to train the claim extraction model from the pre-trained discourse model. A schematic of the training is shown below.
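One common way to implement this kind of transfer, sketched here under assumptions and not necessarily how this repository does it, is to copy every pre-trained parameter whose name and size match the new model while leaving the output layer freshly initialized, since claim labels differ from discourse labels; all weights are then fine-tuned on the claim data. Parameters are plain name-to-list dictionaries to keep the example self-contained, and the "classifier." prefix for the output layer and the function name init_from_pretrained are hypothetical. In PyTorch the same idea is usually written by filtering a state_dict and passing it to load_state_dict(..., strict=False).

```python
def init_from_pretrained(pretrained, target, skip_prefixes=("classifier.",)):
    """Initialize a new model's parameters from a pre-trained model.

    `pretrained` and `target` map parameter names to flat lists of weights.
    Parameters that start with a prefix in `skip_prefixes` (the output layer,
    hypothetically named "classifier.") keep the target's fresh values, as do
    parameters whose size differs. Returns a new dict; inputs are not modified.
    """
    merged = dict(target)
    for name, weights in pretrained.items():
        if name.startswith(tuple(skip_prefixes)):
            continue  # task-specific head: do not transfer
        if name in target and len(target[name]) == len(weights):
            merged[name] = list(weights)  # reuse the pre-trained weights
    return merged
```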

You can run the demo web application to detect claims as follows

export FLASK_APP=main.py
flask run --host=0.0.0.0 # this will serve at port 5000

The interface looks something like this:

The output looks something like the following (highlighted sentences are claims; the tag after each sentence is its predicted discourse label):

Expertly annotated dataset

We release a dataset of 1,500 annotated abstracts containing 11,702 sentences (2,276 annotated as claim sentences), sampled from 110 biomedical journals. The annotations are hosted on Amazon S3 and can be found at the URLs given in the repository. The final labels are the majority vote of three experts.
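Because each final label is the majority vote of three experts, a sentence counts as a claim when at least two annotators marked it as one; in the released data this gives 2,276 claim sentences out of 11,702, roughly 19%. A minimal sketch of that aggregation rule, assuming each annotator's vote is stored as a label such as 1 (claim) or 0 (not a claim). The encoding and the function name majority_label are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label given by most annotators, e.g. [1, 0, 1] -> 1.

    With three annotators and two labels there is always a strict majority;
    a tie (e.g. an even number of votes, or more than two labels) raises
    instead of guessing.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        raise ValueError(f"no majority among votes {votes!r}")
    return counts[0][0]
```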

Requirements

Citing the repository

You can cite our paper, available on arXiv, as:

Achakulvisut, Titipat, Chandra Bhagavatula, Daniel Acuna, and Konrad Kording. "Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning." arXiv preprint arXiv:1907.00962 (2019).

or using BibTeX

@article{achakulvisut2019claim,
  title={Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning},
  author={Achakulvisut, Titipat and Bhagavatula, Chandra and Acuna, Daniel and Kording, Konrad},
  journal={arXiv preprint arXiv:1907.00962},
  year={2019}
}

Acknowledgement

This project was done at the Allen Institute for Artificial Intelligence and the Konrad Kording lab at the University of Pennsylvania.
