  • Stars: 155
  • Rank: 240,864 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created: over 7 years ago
  • Updated: over 1 year ago

Repository Details

A Python biomedical relation extraction package that uses a supervised approach (i.e. needs training data).

Kindred

Kindred is a Python 3 package for relation extraction in biomedical texts. Given some training data, it can build a model to identify relations between entities (e.g. drugs and genes) in a sentence.

Installation

You can install "kindred" via pip from PyPI:

pip install kindred

Kindred relies on the spaCy toolkit for parsing. After installing kindred (which also installs spaCy), you will need to install a spaCy language model. For instance, the command below installs the English language model:

python -m spacy download en_core_web_sm

Usage

Check out the tutorial, which walks through a simple use case of extracting capital cities from text. More details and the full documentation are available on Read the Docs.

BioNLP Shared Task Example

import kindred

# Load the SeeDev corpus
trainCorpus = kindred.bionlpst.load('2016-SeeDev-binary-train')
devCorpus = kindred.bionlpst.load('2016-SeeDev-binary-dev')

# Create a copy of the dev corpus to make predictions on
predictionCorpus = devCorpus.clone()
predictionCorpus.removeRelations()

# Create a relation classifier, train it and make predictions
classifier = kindred.RelationClassifier()
classifier.train(trainCorpus)
classifier.predict(predictionCorpus)

# Get the F1 score of the predicted relations
f1score = kindred.evaluate(devCorpus, predictionCorpus, metric='f1score')
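
To make the reported metric concrete, here is a minimal sketch of how an F1 score can be computed from gold and predicted relations. It is an illustration of the metric only, not Kindred's implementation: the `f1_score` helper and the `(type, subject, object)` tuples are hypothetical, and relations are assumed to count as correct only on an exact match.

```python
def f1_score(gold, predicted):
    """Return (precision, recall, F1) for two collections of relation tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical relations, written as (type, subject, object) tuples
gold = {
    ("treats", "erlotinib", "NSCLC"),
    ("treats", "imatinib", "CML"),
    ("causes", "smoking", "NSCLC"),
}
predicted = {
    ("treats", "erlotinib", "NSCLC"),
    ("treats", "imatinib", "NSCLC"),
}

precision, recall, f1 = f1_score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one of the two predictions is correct and one of the three gold relations is recovered, so precision is 1/2, recall is 1/3 and F1 is their harmonic mean.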

PubAnnotation Example

import kindred
corpus = kindred.pubannotation.load('bionlp-st-gro-2013-development')

PubTator Example

import kindred
corpus = kindred.pubtator.load([19894120,19894121])

Input Formats

Kindred can load several formats, including BioNLP Shared Task, JSON, BioC XML and a simple tag format. Check out the file format documentation for example data and code.
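
As a rough illustration of what a simple tag format can carry, the sketch below pulls entities and relations out of an annotated sentence using only the standard library. The tag layout (entity tags with an `id` attribute, and a self-closing `relation` tag with `type`, `subj` and `obj`) is an assumption here, and `parse_simple_tag` is a hypothetical helper rather than part of Kindred; consult the file format documentation for the exact syntax, and use Kindred's own loaders for real data.

```python
import re

# Assumed layout: <TYPE id="N">text</TYPE> for entities
ENTITY = re.compile(r'<(?P<type>\w+) id="(?P<id>\w+)">(?P<text>.*?)</(?P=type)>')
# Assumed layout: <relation type="T" subj="N" obj="M" /> for relations
RELATION = re.compile(
    r'<relation type="(?P<type>\w+)" subj="(?P<subj>\w+)" obj="(?P<obj>\w+)"\s*/>'
)


def parse_simple_tag(tagged):
    """Return ({entity_id: (type, text)}, [(relation_type, subj_id, obj_id)])."""
    entities = {m["id"]: (m["type"], m["text"]) for m in ENTITY.finditer(tagged)}
    relations = [(m["type"], m["subj"], m["obj"]) for m in RELATION.finditer(tagged)]
    return entities, relations


sample = (
    '<drug id="1">Erlotinib</drug> is a common treatment for '
    '<cancer id="2">NSCLC</cancer>. '
    '<relation type="treats" subj="1" obj="2" />'
)

entities, relations = parse_simple_tag(sample)
print(entities)
print(relations)
```

Each relation refers to its entities by id, so the sentence text, the entity spans and the relation can all live in a single string.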

Citing

It would be wonderful if you could cite the associated paper for this package if you use it in any academic research.

@article{lever2017painless,
   title={Painless {R}elation {E}xtraction with {K}indred},
   author={Lever, Jake and Jones, Steven},
   journal={BioNLP 2017},
   pages={176--183},
   year={2017}
}

Contributing

Contributions are very welcome.

License

Distributed under the terms of the MIT license, "kindred" is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.
