  • Language
    C++
  • License
    MIT License


Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs

odgi


optimized dynamic genome/graph implementation (odgi)

odgi provides an efficient and succinct dynamic DNA sequence graph model, as well as a host of algorithms that allow the use of such graphs in bioinformatic analyses.

Careful encoding of graph entities allows odgi to efficiently compute and transform pangenomes with minimal overhead. odgi implements a dynamic data structure that leverages multi-core CPUs and can be updated on the fly.

The edges and path steps are recorded as deltas between the current node id and the target node id, where the node id corresponds to the rank in the global array of nodes. Graphs built from biological data sets tend to have a local partial order and, when sorted, the deltas tend to be small. This allows them to be compressed with a variable-length integer representation, resulting in a small in-memory footprint at the cost of packing and unpacking.

The RAM and computational savings are substantial. In partially ordered regions of the graph, most deltas will require only a single byte.
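
As an illustration of the delta idea, here is a minimal C++ sketch. It is not taken from odgi's source. It assumes that each signed delta is first mapped to an unsigned integer with ZigZag encoding and then written as a LEB128-style varint, which carries 7 payload bits per byte. Under that scheme, any delta between -64 and 63 fits in a single byte, whereas a raw 64-bit id takes 8 bytes. The function names (encode_targets, decode_targets and the helpers) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: ZigZag maps signed deltas to unsigned values
// (0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...) so small magnitudes stay small.
inline uint64_t zigzag_encode(int64_t v) {
    return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

inline int64_t zigzag_decode(uint64_t u) {
    return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1);
}

// LEB128-style varint: 7 payload bits per byte, high bit set on all but the last byte.
void put_varint(uint64_t x, std::vector<uint8_t>& out) {
    while (x >= 0x80) {
        out.push_back(static_cast<uint8_t>(x | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<uint8_t>(x));
}

// Reads one varint starting at `pos` and advances `pos` past it.
uint64_t get_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t x = 0;
    for (int shift = 0;; shift += 7) {
        assert(shift < 64 && pos < in.size() && "truncated or overlong varint");
        uint8_t b = in[pos++];
        x |= static_cast<uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return x;
    }
}

// Hypothetical helper: store the targets of node `from` (ids = ranks) as deltas.
std::vector<uint8_t> encode_targets(uint64_t from, const std::vector<uint64_t>& targets) {
    std::vector<uint8_t> buf;
    for (uint64_t t : targets) {
        int64_t delta = static_cast<int64_t>(t) - static_cast<int64_t>(from);
        put_varint(zigzag_encode(delta), buf);
    }
    return buf;
}

// Inverse of encode_targets: rebuild absolute target ids from the packed deltas.
std::vector<uint64_t> decode_targets(uint64_t from, const std::vector<uint8_t>& buf) {
    std::vector<uint64_t> targets;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        int64_t delta = zigzag_decode(get_varint(buf, pos));
        targets.push_back(static_cast<uint64_t>(static_cast<int64_t>(from) + delta));
    }
    return targets;
}
```

For example, a node at rank 1000 with targets 1001, 999 and 1050 packs into 3 bytes. A distant target such as 5000 (delta 4000) needs 2 bytes. In a sorted graph most edges point to nearby ranks, so most entries fall into the single-byte case. The cost is the decode loop that runs on every access.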

installation

building from source

odgi requires GCC (g++) version 9.3 or higher. You can check your version via:

gcc --version
g++ --version

odgi pulls in a host of source repositories as dependencies. It may be necessary to install several system-level libraries to build odgi. On Ubuntu 20.04, these can be installed using apt:

sudo apt install build-essential cmake python3-distutils python3-dev libjemalloc-dev

After installing the required dependencies, clone the odgi git repository recursively (it contains many submodules) and build with:

git clone --recursive https://github.com/pangenome/odgi.git
cd odgi
cmake -H. -Bbuild && cmake --build build -- -j 3

To build a static executable, use:

cmake -DBUILD_STATIC=1 -H. -Bbuild && cmake --build build -- -j 3

To unset this build behavior and get a dynamic binary again, set this flag to 0 or delete and regenerate your build directory. Static builds are unlikely to be supported on OSX and require the appropriate static libraries on Linux.

For more information on optimisations, debugging and GNU Guix builds, see INSTALL.md and CMakeLists.txt.

Notes for distribution

If you need to avoid machine-specific optimizations, use the CMAKE_BUILD_TYPE=Generic build type:

cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Generic && cmake --build build -- -j 3

Notes on dependencies

On Arch Linux, the jemalloc dependency can be installed with:

sudo pacman -S jemalloc     # arch linux

Bioconda

odgi recipes for Bioconda are available at https://bioconda.github.io/recipes/odgi/README.html. To install the latest version using Conda, run:

conda install -c bioconda odgi

Docker

To simplify installation and versioning, we have an automated GitHub Action that pushes the current Docker build to Docker Hub. To use it, pull the Docker image:

docker pull pangenome/odgi

Then, you can run odgi with:

docker run pangenome/odgi

Guix

An alternative way to manage odgi's dependencies is the GNU Guix package manager. We use Guix to develop, test and deploy odgi on our systems. For more information, see INSTALL.md.

documentation

odgi includes a variety of tools for analyzing and manipulating large pangenome graphs. Read the full documentation at https://odgi.readthedocs.io/.

multiqc

Since v1.11, MultiQC has an ODGI module. This module only works with output from odgi stats. For more details, see the documentation at odgi.readthedocs.io/multiqc.

Citation

Andrea Guarracino*, Simon Heumos*, Sven Nahnsen, Pjotr Prins, Erik Garrison. ODGI: understanding pangenome graphs, Bioinformatics, 2022. https://doi.org/10.1093/bioinformatics/btac308
*Shared first authorship

funding sources

odgi has been funded through a variety of mechanisms, including a Wellcome Sanger PhD fellowship and diverse NIH and NSF grants (listed in our paper), as well as funding from the State of Tennessee. Of particular note is the contribution of NLnet to the development of a differential privacy model, "privvg", which supported significant maintenance and development effort in the odgi toolkit.

tests

Unit tests from vg have been ported here and are used to validate the behavior of the algorithm. They can be run via odgi test, which is invoked by:

ctest .

API

odgi::graph_t is a MutablePathDeletableHandleGraph in the hierarchical handle graph API model for generic variation graphs. As such, it is possible to add, delete, and modify nodes, edges, and paths through the graph. Wherever possible, destructive operations on the graph maintain path validity.
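
To make the path-validity guarantee concrete, the following C++ toy shows one destructive operation, splitting a node in two. It rewires edges and path steps so that every path still spells the same sequence afterwards. This is a hypothetical, forward-strand-only model: ToyGraph and divide_node are illustrative names, not odgi::graph_t or the handle graph API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: forward strand only, no handles, not odgi::graph_t.
struct ToyGraph {
    std::map<uint64_t, std::string> seq;                 // node id -> sequence
    std::set<std::pair<uint64_t, uint64_t>> edges;       // directed edges (from, to)
    std::map<std::string, std::vector<uint64_t>> paths;  // path name -> node ids in order
    uint64_t next_id = 1;

    uint64_t create_node(const std::string& s) {
        seq[next_id] = s;
        return next_id++;
    }

    void create_edge(uint64_t from, uint64_t to) { edges.insert({from, to}); }

    // The sequence a path spells is the concatenation of its nodes' sequences.
    std::string path_sequence(const std::string& name) const {
        std::string out;
        for (uint64_t id : paths.at(name)) out += seq.at(id);
        return out;
    }

    // Destructive edit: split node `id` at `offset` into head (keeps id) and tail (new id).
    // Edges and paths are rewired so every path spells the same sequence as before.
    uint64_t divide_node(uint64_t id, std::size_t offset) {
        assert(offset > 0 && offset < seq.at(id).size());
        uint64_t tail = create_node(seq.at(id).substr(offset));
        seq.at(id).resize(offset);

        // Outgoing edges of the original node now leave from the tail piece.
        std::set<std::pair<uint64_t, uint64_t>> rewired;
        for (auto e : edges) {
            if (e.first == id) e.first = tail;
            rewired.insert(e);
        }
        edges = std::move(rewired);
        create_edge(id, tail);

        // Each path step through `id` becomes two consecutive steps: id, tail.
        for (auto& entry : paths) {
            std::vector<uint64_t> updated;
            for (uint64_t step : entry.second) {
                updated.push_back(step);
                if (step == id) updated.push_back(tail);
            }
            entry.second = std::move(updated);
        }
        return tail;
    }
};
```

A bidirected handle graph must also rewire reverse-strand traversals, which this toy ignores.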

versioning

Each time odgi is built, the current version is inferred via git describe --always --tags. Assuming version.cpp is up to date, odgi version prints not only the current tagged version but also its release codename.

new release (developers only)

  • Create a new release on GitHub
    • Choose a tag: v0.X.Y
    • Fill in the Release title: ODGI v0.X.Y - Miao
    • Fill in the Describe this release section
    • Tick This is a pre-release
    • Click Publish release
  • Produce a buildable source tarball, containing code for odgi and all submodules, and upload it to the release.
    • Execute the following instructions:
    mkdir source-tarball
    cd source-tarball
    git clone --recursive https://github.com/pangenome/odgi
    cd odgi
    git fetch --tags origin
    LATEST_TAG="$(git describe --tags "$(git rev-list --tags --max-count=1)")"
    git checkout "${LATEST_TAG}"
    git submodule update --init --recursive
    mkdir include
    bash scripts/generate_git_version.sh include
    sed -i 's/execute_process(COMMAND bash/#execute_process(COMMAND bash/g' CMakeLists.txt
    rm -Rf .git
    find deps -name ".git" -exec rm -Rf "{}" \;
    cd ..
    mv odgi "odgi-${LATEST_TAG}"
    tar -czf "odgi-${LATEST_TAG}.tar.gz" "odgi-${LATEST_TAG}"
    rm -Rf "odgi-${LATEST_TAG}"
    • Open the (pre-)release created earlier
    • Upload the odgi-v0.X.Y.tar.gz file
    • Remove the tick on This is a pre-release
    • Click Publish release (this will trigger the update on bioconda)

presentations

@AndreaGuarracino and @subwaystation presented odgi at the German Bioinformatics Conference 2021: ODGI - scalable tools for pangenome graphs.

name

odgi is a play on the Italian word "oggi" (/ˈɔd.dʒi/), which means "today". As of 2019, a standard refrain in genomics is that genome graphs will be useful in x years. But, if we make them efficient and scalable, they will be useful today.
