  • Stars: 136
  • Rank 267,670 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated 6 months ago


Repository Details

BioPhi is an open-source antibody design platform. It features methods for automated antibody humanization (Sapiens), humanness evaluation (OASis) and an interface for computer-assisted antibody sequence design.





Learn more about BioPhi, Sapiens and OASis in our publication:

David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022) BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203

The university-hosted BioPhi web server is available at: http://biophi.dichlab.org

For more information about the Sapiens antibody language model, see the Sapiens repository

The data and notebooks supporting the analysis are found in the BioPhi-2021-publication repository

Intro video

BioPhi Intro Video

Contributing

BioPhi is an open and extensible platform, and contributions are welcome.

If you have ideas about what to improve or which tools could be integrated, please submit any feature requests using the Issues tab.

Running BioPhi on your machine

If you don't want to use the public BioPhi server, you can run BioPhi on your own machine.

1. Download OASis database

To run BioPhi with OASis humanness evaluation locally, you will need to download and unzip the OASis database file (22GB uncompressed).

# Download database file
wget https://zenodo.org/record/5164685/files/OASis_9mers_v1.db.gz
# Unzip
gunzip OASis_9mers_v1.db.gz
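On systems without `gunzip`, the unzip step can be approximated with the Python standard library. This is a minimal sketch, not part of BioPhi: the helper names `has_free_space` and `gunzip_file` are hypothetical, and the 22 GB default comes from the size quoted above.

```python
import gzip
import shutil
from pathlib import Path


def has_free_space(directory, required_gb=22.0):
    """Return True if `directory` has at least `required_gb` GB of free space."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= required_gb * 1024 ** 3


def gunzip_file(src, dst=None, chunk_size=1024 * 1024):
    """Decompress the .gz file `src` to `dst` in chunks, without loading it into memory.

    If `dst` is omitted, the trailing ".gz" is dropped from `src`.
    Unlike `gunzip`, the compressed file is left in place.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```

For example, `gunzip_file("OASis_9mers_v1.db.gz")` would write `OASis_9mers_v1.db` next to the archive. Because the archive is kept, the directory needs room for both files.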

2. Install BioPhi using Conda

You can install BioPhi using Conda or one of the alternatives (Miniconda, Miniforge).

Install BioPhi using:

# Recommended: Create a separate BioPhi environment
conda create -n biophi python=3.9
conda activate biophi

# Install BioPhi 
# Using Bioconda and Conda-Forge channels
conda install biophi -c bioconda -c conda-forge --override-channels

If the Conda installation fails, you can try running BioPhi with Docker instead. See "Run BioPhi using provided Docker image" below.

3. Run simplified server

# Set up path to OASis database (downloaded and unzipped)
export OASIS_DB_PATH=/path/to/downloaded/OASis_9mers_v1.db

# Run simplified BioPhi server (not for live deployment!)
biophi web

Note: This is simplified usage for local use only. See the "Deploying your own BioPhi server" section below to learn about deploying BioPhi properly on a server.
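A common stumbling block is starting the server with `OASIS_DB_PATH` unset or pointing at the still-compressed file. The check below is a minimal sketch of how to verify the variable first. The helper name `check_oasis_db_path` is hypothetical and is not part of the BioPhi package.

```python
import os
from pathlib import Path


def check_oasis_db_path(env=None):
    """Return the path stored in OASIS_DB_PATH, or raise RuntimeError with a readable message."""
    env = os.environ if env is None else env
    value = env.get("OASIS_DB_PATH")
    if not value:
        raise RuntimeError("OASIS_DB_PATH is not set")
    path = Path(value)
    if path.suffix == ".gz":
        raise RuntimeError(f"{path} still looks compressed; unzip it first")
    if not path.is_file():
        raise RuntimeError(f"{path} does not exist or is not a file")
    return path
```

Calling `check_oasis_db_path()` in the same shell session, before `biophi web`, reports a misconfigured path up front.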

Run BioPhi using provided Docker image

First, download OASis DB as described above.

Then, run a simplified BioPhi server using the provided Docker image:

docker run \
    -v /your/absolute/path/to/oasis/directory/:/data \
    -e OASIS_DB_PATH=/data/OASis_9mers_v1.db \
    -p 5000:5000 \
    quay.io/biocontainers/biophi:1.0.5--pyhdfd78af_0 \
    biophi web --host 0.0.0.0

The application will be accessible at localhost:5000.

Note: This is simplified usage for local use only. See the "Deploying your own BioPhi server" section below to learn about deploying BioPhi properly on a server.
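If you launch the container from a script, it can help to build the argument list in code rather than by string concatenation. The following is a sketch only: `docker_run_args` is a hypothetical helper, and every flag, the image tag and the container port are copied from the command above.

```python
import shlex
from pathlib import Path

# Image tag copied from the docker run command above.
IMAGE = "quay.io/biocontainers/biophi:1.0.5--pyhdfd78af_0"


def docker_run_args(oasis_dir, db_name="OASis_9mers_v1.db", host_port=5000):
    """Build the `docker run` argument list for a host directory that holds the OASis DB."""
    host_dir = Path(oasis_dir)
    # With -v, a host part that is not an absolute path is treated as a named volume,
    # so a relative directory would silently mount something else.
    if not host_dir.is_absolute():
        raise ValueError("oasis_dir must be an absolute path")
    return [
        "docker", "run",
        "-v", f"{host_dir}:/data",
        "-e", f"OASIS_DB_PATH=/data/{db_name}",
        "-p", f"{host_port}:5000",
        IMAGE,
        "biophi", "web", "--host", "0.0.0.0",
    ]
```

`shlex.join(docker_run_args("/srv/oasis"))` gives a copy-pasteable command line, and the list can be passed directly to `subprocess.run`. The directory `/srv/oasis` is only a placeholder.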

BioPhi command-line interface

BioPhi also provides a command-line interface that enables bulk processing.

# Get humanized FASTA
# Expected input: Both chains of each antibody should have the same ID
#                 with an optional _VL/_VH or _HC/_LC suffix
biophi sapiens mabs.fa --fasta-only --output humanized.fa

# Run full humanization & humanness evaluation pipeline
biophi sapiens mabs.fa \
    --oasis-db path/to/downloaded/OASis_9mers_v1.db \
    --output humanized/

# Get the Sapiens probability matrix (score of each residue at each position)
biophi sapiens mabs.fa --scores-only --output scores.csv

# Get mean Sapiens score (one score for each sequence)
biophi sapiens mabs.fa --mean-score-only --output scores.csv

# Get OASis humanness evaluation
biophi oasis mabs.fa \
    --oasis-db path/to/downloaded/OASis_9mers_v1.db \
    --output oasis.xlsx
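Because the commands above expect both chains of an antibody to share one ID, it can be worth checking a FASTA file before a bulk run. The sketch below groups record IDs after stripping the suffixes mentioned above and lists antibodies that do not have exactly two records. It is an illustration only: the function names are hypothetical, and the matching rule is an assumption rather than a description of BioPhi's own parser.

```python
import re
from collections import defaultdict

# Suffixes named in the input description above (assumed to appear at the end of the ID).
CHAIN_SUFFIX = re.compile(r"_(VH|VL|HC|LC)$")


def read_fasta_ids(lines):
    """Yield the ID (first whitespace-delimited token) of each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                yield header.split()[0]


def group_chains(ids):
    """Map each antibody ID (suffix removed) to the record IDs that share it."""
    groups = defaultdict(list)
    for record_id in ids:
        groups[CHAIN_SUFFIX.sub("", record_id)].append(record_id)
    return dict(groups)


def unpaired(groups):
    """Return the antibody IDs that do not have exactly two records."""
    return sorted(name for name, members in groups.items() if len(members) != 2)


# Usage example with an in-memory FASTA; mab2 has only one chain.
example = [">mab1_VH", "EVQL", ">mab1_VL", "DIQM", ">mab2_HC", "QVQL"]
print(unpaired(group_chains(read_fasta_ids(example))))  # prints ['mab2']
```

To check a real file, pass an open file handle, for example `read_fasta_ids(open("mabs.fa"))`. A flagged ID is only a hint that a record may be mislabeled.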

Development

BioPhi is composed of three services that need to be running at the same time:

  • web: Flask web server that handles both the frontend and the backend of the web application
  • celery: Asynchronous worker service(s) that process long-running tasks
  • redis: In-memory database for storing celery queue tasks and results
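When one of the three services is down, a quick reachability check can narrow down which one. The following sketch tests whether a TCP port accepts connections. The ports are assumptions: 6379 is the Redis default and 5000 is the web port used elsewhere in this README. Celery workers do not listen on a port, so they are not covered.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed defaults; adjust to match your own configuration.
SERVICES = {"redis": ("localhost", 6379), "web": ("localhost", 5000)}


def service_status(services=SERVICES):
    """Return a dict mapping each service name to whether its port is reachable."""
    return {name: port_is_open(host, port) for name, (host, port) in services.items()}
```

Calling `service_status()` while the development setup is running should report both entries as `True`. A `False` for `redis` usually means the Redis server has not been started.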

Run BioPhi dev server through Docker Compose

Running through Docker Compose is easiest in terms of setup, but web server autoreload is not supported, so you will have to restart the services after each code update.


1. Install Docker

See https://docs.docker.com/get-docker/

2. Clone this repository

Download or clone this repository using:

git clone https://github.com/Merck/BioPhi.git

3. Download OASis DB

Download the OASis database as described above and put it in a local data/ directory inside the project folder.

4. Build all images using Docker Compose

# Open BioPhi directory
cd BioPhi    
# Build docker image using Makefile
make docker-build
# or directly using
docker-compose build

5. Run all services using Docker Compose

# Run using Makefile
make docker-run
# or directly using
docker-compose up

The application will be accessible at localhost:5000.

To build and run in a single step, you can use:

# Build and run using Makefile
make docker-build docker-run
# or directly using
docker-compose up --build

6. Handle code updates

After your code is updated, you will need to stop the services, run build and start again. See the next section for info on running locally with flask auto-reload.

Run BioPhi dev server using Conda

Running each service locally using Conda will enable flask auto-reload, which is useful if you are going back and forth between your IDE and the browser.


1. Install Conda

Install Conda or one of the alternatives (Miniconda, Miniforge)

2. Install Redis server

Install and run Redis server. On Mac, you can install Redis using Brew.

3. Clone this repository

Download or clone this repository using:

git clone https://github.com/Merck/BioPhi.git

4. Download OASis DB

Download OASis database as described above.

5. Setup environment

# Open BioPhi directory
cd BioPhi    
# Install dependencies using the provided Makefile
make env
# Or directly using
conda env create -n biophi -f environment.yml
conda activate biophi
pip install -e . --no-deps

6. Run all services

You will have to run each service in a separate terminal (Use Cmd+T to open a new tab):

# Run Redis server (this depends on your installation, the server might already be running)
redis-server

# In a separate terminal, run celery worker queue
export OASIS_DB_PATH=/path/to/OASis_9mers_v1.db
make celery

# In a separate terminal, run flask web server
export OASIS_DB_PATH=/path/to/OASis_9mers_v1.db
make web

See the provided

7. Handle code updates

After your code is updated, the flask web service should refresh automatically. However, the celery service needs to be stopped and started manually, so you will need to do that if you update code that is executed from the workers.

Deploying your own BioPhi server

You can deploy your own internal BioPhi server. You will need to run the three separate services - the flask web server, the celery worker and the redis database.

The details will depend on your platform and your cloud provider. The easiest option is to deploy with Docker Compose using the provided docker-compose.yml file.

For 🐧 Ubuntu deployment, feel free to copy the deployment configs used on the public university server: lich-uct/biophi.dichlab.org

Acknowledgements

BioPhi is based on antibody repertoires from the Observed Antibody Space:

Kovaltsuk, A., Leem, J., Kelm, S., Snowden, J., Deane, C. M., & Krawczyk, K. (2018). Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. The Journal of Immunology, 201(8), 2502–2509. https://doi.org/10.4049/jimmunol.1800708

Antibody numbering is performed using ANARCI:

Dunbar, J., & Deane, C. M. (2016). ANARCI: Antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300. https://doi.org/10.1093/bioinformatics/btv552
