

BERN2: an advanced neural biomedical named entity recognition and normalization tool

BERN2

We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves on the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. This repository provides a way to host your own BERN2 server. The public BERN2 instance currently runs on a server with a 64-core CPU, 512 GB of memory, and a 12 GB GPU. See our paper for more details.

***** Try BERN2 at http://bern2.korea.ac.kr *****

Updates

  • [Apr 26, 2023] The BERN2 server will be shut down due to an internal server issue and is expected to be restored by 2023-04-26 12:00 PM (UTC-0). Sorry for the inconvenience.
  • [Feb 13, 2023] The BERN2 server is currently down due to an internal server issue and is expected to be restored by 2023-02-19 12:00 PM (UTC-0). Sorry for the inconvenience.
  • [Jun 26, 2022] We updated our resource file (resources_v1.1.b.tar.gz) to address the issue regarding CRF++. (issue #17).
  • [Apr 14, 2022] We updated our resource file (resources_v1.1.a.tar.gz) to address the issue where BERN2 is not working on Windows (issue #4).
  • [Apr 14, 2022] We increased the API limit of our web service from '100 requests per 100 seconds' to '300 requests per 100 seconds' per user.
  • [Mar 18, 2022] On the web service, we set an API limit of 100 requests per 100 seconds per user. For bulk requests, we highly recommend using the local installation.
  • [Mar 17, 2022] BERN2 v1.1 has been released. Please see the release page for more information on what's new in this version.
  • [Feb 15, 2022] Bioregistry is used to standardize prefixes for normalized entity identifiers.
      Old                 New
      MESH:D009369        mesh:D009369
      OMIM:608627         mim:608627
      CL_0000021          CL:0000021
      CVCL_J260           cellosaurus:CVCL_J260
      NCBI:txid10095      NCBITaxon:10095
      EntrezGene:10533    NCBIGene:10533
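
If you have annotations produced before this change, the mapping above can be applied in code. The following is a minimal sketch that covers only the six patterns listed in the table; the function name `to_bioregistry` is hypothetical and is not part of BERN2, and any identifier that matches none of the patterns is returned unchanged.

```python
import re

# Legacy prefix -> new prefix, for the simple "swap the prefix" cases in the table.
_PREFIX_MAP = {
    "MESH": "mesh",
    "OMIM": "mim",
    "EntrezGene": "NCBIGene",
}


def to_bioregistry(identifier: str) -> str:
    """Convert a legacy identifier to the new form (hypothetical helper)."""
    # Cell Ontology: CL_0000021 -> CL:0000021
    if identifier.startswith("CL_"):
        return "CL:" + identifier[len("CL_"):]
    # Cellosaurus: CVCL_J260 -> cellosaurus:CVCL_J260
    if identifier.startswith("CVCL_"):
        return "cellosaurus:" + identifier
    # NCBI Taxonomy: NCBI:txid10095 -> NCBITaxon:10095
    match = re.fullmatch(r"NCBI:txid(\d+)", identifier)
    if match:
        return "NCBITaxon:" + match.group(1)
    # Simple prefix swaps (MESH, OMIM, EntrezGene).
    prefix, sep, local_id = identifier.partition(":")
    if sep and prefix in _PREFIX_MAP:
        return f"{_PREFIX_MAP[prefix]}:{local_id}"
    # Unknown or already-converted identifiers pass through unchanged.
    return identifier


if __name__ == "__main__":
    for old in ["MESH:D009369", "CL_0000021", "NCBI:txid10095"]:
        print(old, "->", to_bioregistry(old))
```

Because converted identifiers pass through unchanged, the helper can be run more than once over the same data without altering it further.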

Installing BERN2

You first need to install BERN2 and its dependencies.

# Install torch with conda (please check your CUDA version)
conda create -n bern2 python=3.7
conda activate bern2
conda install pytorch==1.9.0 cudatoolkit=10.2 -c pytorch
conda install faiss-gpu libfaiss-avx2 -c conda-forge

# Check if cuda is available
python -c "import torch;print(torch.cuda.is_available())"

# Install BERN2
git clone git@github.com:dmis-lab/BERN2.git
cd BERN2
pip install -r requirements.txt
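
As an optional sanity check after installation, a short script can report which of the expected packages are not importable in the active environment. This is a sketch, not part of BERN2: the helper name `missing_modules` is hypothetical, and the import names `torch` and `faiss` are assumed to correspond to the conda packages installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the pytorch and faiss-gpu conda packages.
    missing = missing_modules(["torch", "faiss"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it runs quickly and does not initialize CUDA.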

(Optional) If you want to use mongodb as a caching database, you need to install and run it.

# https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/#install-mongodb-community-edition-using-deb-packages
sudo systemctl start mongod
sudo systemctl status mongod

Then, you need to download the resources (e.g., external modules and dictionaries) required to run BERN2. Note that you will need 70 GB of free disk space. The resource file is also available from Google Drive.

wget http://nlp.dmis.korea.edu/projects/bern2-sung-et-al-2022/resources_v1.1.b.tar.gz
# Verify the download before extracting; the md5sum should be 'c0db4e303d1ccf6bf56b42eda2fe05d0'
md5sum resources_v1.1.b.tar.gz
tar -zxvf resources_v1.1.b.tar.gz
rm -f resources_v1.1.b.tar.gz

# (For Linux Users) install CRF 
cd resources/GNormPlusJava
tar -zxvf CRF++-0.58.tar.gz
mv CRF++-0.58 CRF
cd CRF
./configure --prefix="$HOME"
make
make install
cd ../../..

# (For Windows Users) install CRF 
cd resources/GNormPlusJava
unzip CRF++-0.58.zip
mv CRF++-0.58 CRF
cd ../..
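
On systems without `md5sum`, the checksum of the resource archive can be verified from Python instead. The sketch below reads the file in chunks so that the large archive is never loaded into memory at once; the function names are hypothetical, and the expected digest is the one given in the instructions above.

```python
import hashlib

# Expected MD5 of resources_v1.1.b.tar.gz, as stated in the instructions above.
EXPECTED_MD5 = "c0db4e303d1ccf6bf56b42eda2fe05d0"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected digest."""
    return md5_of_file(path) == expected
```

For example, `verify_archive("resources_v1.1.b.tar.gz")` should return `True` for an intact download; run it before deleting the archive.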

Running BERN2

Running BERN2 on a GPU requires at least 63.5 GB of RAM and 5.05 GB of GPU memory. The following commands run BERN2.

export CUDA_VISIBLE_DEVICES=0
cd scripts

# For Linux
bash run_bern2.sh

# For Windows
bash run_bern2_windows.sh

(Optional) To restart BERN2, you need to run the following commands.

export CUDA_VISIBLE_DEVICES=0
cd scripts
bash stop_bern2.sh
bash run_bern2.sh

Using BERN2

After successfully running BERN2 in your local environment, you can access it via its RESTful API. If you want to use BERN2 without installing it locally, you can use the web service at http://bern2.korea.ac.kr instead.

Plain Text as Input

import requests

def query_plain(text, url="http://localhost:8888/plain"):
    return requests.post(url, json={'text': text}).json()

if __name__ == '__main__':
    text = "Autophagy maintains tumour growth through circulating arginine."
    print(query_plain(text))
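
If the `requests` package is not available, the same call can be made with the standard library. This is a sketch under the assumption that the server accepts the same JSON body as in the example above; the function names and the 60-second default timeout are illustrative choices, not part of BERN2.

```python
import json
import urllib.request


def build_plain_request(text, url="http://localhost:8888/plain"):
    """Build a JSON POST request equivalent to the requests-based example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_plain_stdlib(text, url="http://localhost:8888/plain", timeout=60):
    """Send the request and decode the JSON response (timeout is an assumed default)."""
    request = build_plain_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Setting an explicit timeout keeps a client from hanging indefinitely if the local server is still loading its models.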

PubMed ID (PMID) as Input

import requests

def query_pmid(pmids, url="http://localhost:8888/pubmed"):
    return requests.get(url + "/" + ",".join(pmids)).json()

if __name__ == '__main__':
    pmids = ["30429607", "29446767"]
    print(query_pmid(pmids))
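
When the functions above are pointed at the public web service rather than a local server, the per-user limit of 300 requests per 100 seconds noted in the Updates applies. A client-side throttle is one way to stay under it. The class below is a sketch of a sliding-window limiter; its name and interface are hypothetical, and the `clock` and `sleep` parameters exist only so that it can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=300, period=100.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check.
            self._sleep(self.period - (now - self._calls[0]))
```

Create one limiter per process and call `limiter.wait()` immediately before each request. For bulk workloads, the local installation remains the recommended route.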

Annotations

wget http://nlp.dmis.korea.edu/projects/bern2-sung-et-al-2022/annotation_v1.1.tar.gz

NER and normalization results for more than 33.4 million PubMed articles (pubmed22n0001 ~ pubmed22n1114 (2021.12.12)), generated by BERN2 v1.1 (compressed, 22 GB). The data provided by BERN2 is post-processed and may differ from the most current/accurate data available from the U.S. National Library of Medicine (NLM).

Citation

@article{sung2022bern2,
    title={BERN2: an advanced neural biomedical named entity recognition and normalization tool},
    author={Sung, Mujeen and Jeong, Minbyul and Choi, Yonghwa and Kim, Donghyeon and Lee, Jinhyuk and Kang, Jaewoo},
    year={2022},
    eprint={2201.02080},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Contact Information

For help or issues using BERN2, please submit a GitHub issue. Please contact Mujeen Sung (mujeensung (at) korea.ac.kr), or Minbyul Jeong (minbyuljeong (at) korea.ac.kr) for communication related to BERN2.
