Repository Details

A neural named entity recognition and multi-type normalization tool for biomedical text mining

BERN

BERN is a BioBERT-based multi-type NER tool that also supports normalization of extracted entities. This repository contains the official implementation of BERN. You can use BERN at https://bern.korea.ac.kr, or host your own server by following the instructions below. Please refer to our paper (Kim et al., IEEE Access 2019) for more details. This project was developed by the DMIS Laboratory at Korea University.

[Updates]

***** Check out BERN2, an improved version of BERN with much faster and more accurate inference! *****

Fixed our gene normalizer in response to issues raised between 2020-03-12 and 2020-03-13

  1. Download gnormplus-normalization_19.jar at this URL and place the file under the normalization/resources/normalizers/gene directory, overwriting the existing file.
  2. Stop normalizers by running stop_normalizers.sh
  3. Start the normalizers by running load_dicts.sh

Done - Server down due to air conditioning problems in our server room 2019-10-10 - 2019-10-11 7:55 AM (UTC-0)

Fixed our disease normalizer for the 2019-08-02, 2019-08-10 and 2019-08-19 issues

  1. Download disease_normalizer_19.jar at this URL and place the file under the normalization/resources/normalizers/disease directory.
  2. Stop normalizers by running stop_normalizers.sh and restart the normalizers by running load_dicts.sh

Done - Server check 2019-07-18 8:20 AM - 1:30 PM (UTC-0)

Figure: Overview of BERN.

The instructions below describe how to host your own BERN server. Please refer to https://bern.korea.ac.kr for the RESTful Web service of BERN.

Requirements

Note that you will need at least 66 GB of free disk space and 32 GB or more RAM.
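Before installing, it can help to confirm that the host meets these figures. The sketch below is a minimal pre-flight check and is not part of BERN: the function names are illustrative, the home directory is only assumed to be the install location, and the RAM check relies on `os.sysconf` names that are available on Linux (it returns `None` elsewhere).

```python
import os
import shutil

GB = 10 ** 9  # decimal gigabytes, matching the "GB" figures quoted above


def free_disk_gb(path):
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GB


def total_ram_gb():
    """Total physical memory in GB, or None if the platform does not report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        return None


def meets_requirements(path, min_disk_gb=66, min_ram_gb=32):
    """Return (disk_ok, ram_ok); ram_ok is None when RAM cannot be determined."""
    disk_ok = free_disk_gb(path) >= min_disk_gb
    ram = total_ram_gb()
    ram_ok = None if ram is None else ram >= min_ram_gb
    return disk_ok, ram_ok


if __name__ == "__main__":
    # Assumption: BERN will be cloned under the home directory (~/bern).
    disk_ok, ram_ok = meets_requirements(os.path.expanduser("~"))
    print("disk ok:", disk_ok, "| RAM ok:", ram_ok)
```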

Installation

  • Clone this repo
cd
git clone https://github.com/dmis-lab/bern.git
  • Install Python packages
cd ~/bern
pip3 install -r requirements.txt --user
  • Install GNormPlus & run GNormPlusServer.jar
cd ~/bern
wget https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/download/GNormPlus/GNormPlusJava.zip
unzip GNormPlusJava.zip

cd GNormPlusJava
wget -O ./crfpp-0.58.tar.gz https://drive.google.com/uc?id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ
tar xvfz crfpp-0.58.tar.gz
cp -rf CRF++-0.58/* CRF
cd CRF
sh ./configure
make
sudo make install

cd ..
chmod 764 Ab3P
# chmod 764 CRF/crf_test

# Set FocusSpecies to 9606 (Human)
sed -i 's/= All/= 9606/g' setup.txt; echo "FocusSpecies: from All to 9606 (Human)"
sh Installation.sh

rm -r CRF++-0.58
rm crfpp-0.58.tar.gz

# Download GNormPlusServer.jar
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1g-JlhqeDIlZX5YFk8Y27_M8BXUXcQRSX' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1g-JlhqeDIlZX5YFk8Y27_M8BXUXcQRSX" -O GNormPlusServer.jar && rm -rf /tmp/cookies.txt

# Start GNormPlusServer
nohup java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895 >> ~/bern/logs/nohup_gnormplus.out 2>&1 &
  • Install tmVar2 & run tmVar2Server.jar
cd ~/bern
wget ftp://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/tmVar2/tmVarJava.zip
unzip tmVarJava.zip

cd tmVarJava
wget -O ./crfpp-0.58.tar.gz https://drive.google.com/uc?id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ
tar xvfz crfpp-0.58.tar.gz
cp -rf CRF++-0.58/* CRF
cd CRF
sh ./configure
make
sudo make install

cd ..
chmod 764 CRF/crf_test

sh Installation.sh

rm -r CRF++-0.58
rm crfpp-0.58.tar.gz

# Download tmVar2Server.jar
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kQYzLHLFLsU9qKpRRGjXkIYmaYK6bPJm' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1kQYzLHLFLsU9qKpRRGjXkIYmaYK6bPJm" -O tmVar2Server.jar && rm -rf /tmp/cookies.txt

# Download dependencies
wget https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.20.0/sqlite-jdbc-3.20.0.jar
wget https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2.jar

# Start tmVar2Server
nohup java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896 >> ~/bern/logs/nohup_tmvar.out 2>&1 &
  • Download normalization resources and pre-trained BioBERT NER models
cd ~/bern/scripts
sh download_norm.sh
sh download_biobert_ner_models.sh
  • Run named entity normalizers
cd ..
sh load_dicts.sh
  • Run BERN server
# Check your GPU number(s)
echo $CUDA_VISIBLE_DEVICES

# Set your GPU number(s)
export CUDA_VISIBLE_DEVICES=0

# Run BERN
# Please check gnormplus_home directory and tmvar2_home directory.
nohup python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896 >> logs/nohup_BERN.out 2>&1 &

# Print logs
tail -F logs/nohup_BERN.out
  • Usage
    • PMID(s) (HTTP GET)
      • http://<YOUR_SERVER_ADDRESS>:8888/?pmid=<a PMID or comma-separated PMIDs>&format=<json or pubtator>
      • Example: http://<YOUR_SERVER_ADDRESS>:8888/?pmid=30429607&format=json&indent=true
      • Example: http://<YOUR_SERVER_ADDRESS>:8888/?pmid=30429607&format=pubtator
      • Example: http://<YOUR_SERVER_ADDRESS>:8888/?pmid=30429607,29446767&format=json&indent=true
    • Raw text (HTTP POST)
      • POST Address: http://<YOUR_SERVER_ADDRESS>:8888
      • Set the key and value of the request body as follows:
      import requests
      import json
      body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
      response = requests.post('http://<YOUR_SERVER_ADDRESS>:8888', data=body_data)
      result_dict = response.json()
      print(result_dict)
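
The PMID (HTTP GET) requests listed above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_pmid_url` and `fetch_annotations` are illustrative and not part of BERN, and `http://localhost:8888` stands in for `<YOUR_SERVER_ADDRESS>`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_pmid_url(server, pmids, fmt="json", indent=False):
    """Build the GET URL for one or more PMIDs."""
    if fmt not in ("json", "pubtator"):
        raise ValueError("format must be 'json' or 'pubtator'")
    query = {"pmid": ",".join(str(p) for p in pmids), "format": fmt}
    if indent:
        query["indent"] = "true"
    # safe="," keeps the comma between PMIDs unescaped, as in the examples above.
    return f"{server.rstrip('/')}/?{urlencode(query, safe=',')}"


def fetch_annotations(server, pmids, timeout=60):
    """Fetch JSON annotations; this sends a request to your BERN server."""
    with urlopen(build_pmid_url(server, pmids), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder address: replace with your own server.
print(build_pmid_url("http://localhost:8888", [30429607, 29446767], indent=True))
```

Building the URL separately from sending it makes it easy to inspect the request before pointing it at a running server.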
      

Result

See a result example in JSON (PMID:29446767)
[
    {
        "denotations": [
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 0,
                    "end": 13
                }
            },
            {
                "id": [
                    "MIM:171834",
                    "HGNC:8975",
                    "Ensembl:ENSG00000121879",
                    "BERN:324295302"
                ],
                "obj": "gene",
                "span": {
                    "begin": 53,
                    "end": 58
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 133,
                    "end": 146
                }
            },
            {
                "id": [
                    "MESH:D014652",
                    "BERN:256572101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 158,
                    "end": 174
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 193,
                    "end": 231
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 234,
                    "end": 288
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 589,
                    "end": 593
                }
            },
            {
                "id": [
                    "MIM:171834",
                    "HGNC:8975",
                    "Ensembl:ENSG00000121879",
                    "BERN:324295302"
                ],
                "obj": "gene",
                "span": {
                    "begin": 748,
                    "end": 758
                }
            },
            {
                "id": [
                    "CUI-less"
                ],
                "mutationType": "ProteinMutation",
                "normalizedName": "p.F83S;CorrespondingGene:5290",
                "obj": "mutation",
                "span": {
                    "begin": 857,
                    "end": 866
                }
            },
            {
                "id": [
                    "BERN:257523801"
                ],
                "obj": "disease",
                "span": {
                    "begin": 906,
                    "end": 928
                }
            },
            {
                "id": [
                    "CUI-less"
                ],
                "obj": "gene",
                "span": {
                    "begin": 1009,
                    "end": 1024
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 1043,
                    "end": 1047
                }
            }
        ],
        "elapsed_time": {
            "ner": 0.611,
            "normalization": 0.218,
            "tmtool": 1.281,
            "total": 2.111
        },
        "project": "BERN",
        "sourcedb": "PubMed",
        "sourceid": "29446767",
        "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome.",
        "timestamp": "Thu Jul 04 06:15:27 +0000 2019"
    }
]
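
In the example above, the `end` offset of each span appears to be inclusive: the first span (0 to 13) covers the 14 characters of "CLAPO syndrome", and the span 53 to 58 covers "PIK3CA". A minimal sketch for recovering the mention strings under that assumption follows; the helper name `extract_mentions` is illustrative and not part of BERN, and the sample is a shortened copy of the result above.

```python
def extract_mentions(doc):
    """Return (type, mention text, ids) for every denotation in one result."""
    text = doc["text"]
    mentions = []
    for denotation in doc["denotations"]:
        begin = denotation["span"]["begin"]
        end = denotation["span"]["end"]
        # Assumption drawn from the example: `end` is the index of the last
        # character, so the Python slice needs end + 1.
        mentions.append((denotation["obj"], text[begin:end + 1], denotation["id"]))
    return mentions


# Shortened copy of the result above (first two denotations, truncated text).
sample = {
    "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations",
    "denotations": [
        {"id": ["MESH:C567763", "BERN:262813101"], "obj": "disease",
         "span": {"begin": 0, "end": 13}},
        {"id": ["MIM:171834", "HGNC:8975", "Ensembl:ENSG00000121879",
                "BERN:324295302"], "obj": "gene",
         "span": {"begin": 53, "end": 58}},
    ],
}

for entity_type, mention, ids in extract_mentions(sample):
    print(entity_type, "|", mention, "|", ", ".join(ids))
```

If your server returns exclusive end offsets instead, drop the `+ 1`; checking one or two spans against the returned text settles which convention applies.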

Restart

# Start GNormPlusServer
cd ~/bern/GNormPlusJava
nohup java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895 >> ~/bern/logs/nohup_gnormplus.out 2>&1 &

# Start tmVar2Server
cd ~/bern/tmVarJava
nohup java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896 >> ~/bern/logs/nohup_tmvar.out 2>&1 &

# Start normalizers
cd ~/bern/
sh load_dicts.sh

# Check your GPU number(s)
echo $CUDA_VISIBLE_DEVICES

# Set your GPU number(s)
export CUDA_VISIBLE_DEVICES=0

# Run BERN
nohup python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896 >> logs/nohup_BERN.out 2>&1 &

# Print logs
tail -F logs/nohup_BERN.out
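
After a restart, it may be useful to confirm that each process is accepting connections before sending requests. The sketch below assumes that every service listens for TCP connections on the port passed on its command line above (18895, 18896 and 8888) and that the check runs on the same machine; the function names are illustrative. A bare connect-and-close may appear in the services' logs.

```python
import socket

# Ports taken from the commands above; the service labels are illustrative.
SERVICES = {"GNormPlusServer": 18895, "tmVar2Server": 18896, "BERN": 8888}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(host="127.0.0.1", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name:16s} {'listening' if up else 'not reachable'}")
```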

Troubleshooting

Monitoring

  • List processes (every 5s)
watch -n 5 "ps auxww | egrep 'python|java|node' | grep -v grep"
  • Periodic HTTPS GET checker

    • Permission setting
    chmod +x scripts/bern_checker.sh
    
    • crontab (every 30 min)
    crontab -e
    */30 * * * * /home/<YOUR_ACCOUNT>/bern/scripts/bern_checker.sh >> /home/<YOUR_ACCOUNT>/bern/logs/bern_checker.out 2>&1
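
The contents of scripts/bern_checker.sh are not reproduced here. As a rough illustration of what a periodic GET check can look like, the following Python sketch sends one request and formats a log line; it is not the repository's script, the function names are hypothetical, and the URL you pass in is whatever address your BERN server uses.

```python
import time
import urllib.error
import urllib.request


def check_http(url, timeout=30):
    """Return (ok, detail) for a single GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No answer at all: connection refused, timeout, DNS failure, ...
        return False, f"unreachable: {err}"


def log_line(url, timeout=30):
    """Format one timestamped line suitable for appending to a log file."""
    ok, detail = check_http(url, timeout)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} {'OK' if ok else 'FAIL'} {url} ({detail})"
```

A script that prints `log_line(...)` for a PMID query URL could be scheduled from crontab in the same way as the entry above.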
    

Bug report

Add a new issue to https://github.com/dmis-lab/bern/issues

Contact

[email protected]

Citation

  • Please cite the following two papers if you use BERN in your work.
@article{kim2019neural,
  title={A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining},
  author={Kim, Donghyeon and Lee, Jinhyuk and So, Chan Ho and Jeon, Hwisang and Jeong, Minbyul and Choi, Yonghwa and Yoon, Wonjin and Sung, Mujeen and Kang, Jaewoo},
  journal={IEEE Access},
  volume={7},
  pages={73729--73740},
  year={2019},
  publisher={IEEE}
}

@article{10.1093/bioinformatics/btz682,
    author = {Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo},
    title = "{BioBERT: a pre-trained biomedical language representation model for biomedical text mining}",
    journal = {Bioinformatics},
    year = {2019},
    month = {09},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz682},
    url = {https://doi.org/10.1093/bioinformatics/btz682},
}
