• Stars
    star
    246
  • Rank 164,726 (Top 4 %)
  • Language
    C++
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Classifier for metagenomic sequences

Centrifuge

Classifier for metagenomic sequences

[Centrifuge] is a novel microbial classification engine that enables rapid, accurate and sensitive labeling of reads and quantification of species on desktop computers. The system uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.7 GB for all complete bacterial and viral genomes plus the human genome) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers

The Centrifuge hompage is http://www.ccb.jhu.edu/software/centrifuge

The Centrifuge paper is available at https://genome.cshlp.org/content/26/12/1721

The Centrifuge poster is available at http://www.ccb.jhu.edu/people/infphilo/data/Centrifuge-poster.pdf

For more details on installing and running Centrifuge, look at MANUAL

Quick guide

Installation from source

git clone https://github.com/DaehwanKimLab/centrifuge
cd centrifuge
make
sudo make install prefix=/usr/local

Building indexes

We provide several indexes on the Centrifuge homepage at http://www.ccb.jhu.edu/software/centrifuge. Centrifuge needs sequence and taxonomy files, as well as sequence ID to taxonomy ID mapping. See the MANUAL files for details. We provide a Makefile that simplifies the building of several standard and custom indices

cd indices
make p+h+v                   # bacterial, human, and viral genomes [~12G]
make p_compressed            # bacterial genomes compressed at the species level [~4.2G]
make p_compressed+h+v        # combination of the two above [~8G]