Language: C
License: GNU General Publi...

Remote protein homology detection suite.

HH-suite3 for sensitive sequence searching

(C) Johannes Soeding, Markus Meier, Martin Steinegger, Milot Mirdita, Michael Remmert, Andreas Hauser, Andreas Biegert


The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).

Documentation

We provide an extensive user guide with many usage examples, frequently asked questions and guides to build your own databases.

Installation

HH-suite3 can be installed from a statically compiled binary, via conda, or via Docker. It requires a 64-bit system (check with uname -a | grep x86_64). On AMD/Intel CPUs it requires at least support for the SSE2 instruction set (check by running cat /proc/cpuinfo | grep sse2 on Linux or sysctl -a | grep machdep.cpu.features | grep SSE2 on macOS). The AVX2 build is roughly twice as fast as the SSE2 build. HH-suite3 also runs on Linux systems with ARM64 and PPC64LE CPUs. Precompiled binaries for all supported systems are available at mmseqs.com/hhsuite.

```shell
# install via conda
conda install -c conda-forge -c bioconda hhsuite
# pull the Docker image
docker pull soedinglab/hh-suite
# static SSE2 build
wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-SSE2-Linux.tar.gz
tar xvfz hhsuite-3.3.0-SSE2-Linux.tar.gz
export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
# static AVX2 build
wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-AVX2-Linux.tar.gz
tar xvfz hhsuite-3.3.0-AVX2-Linux.tar.gz
export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
```
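The choice between the SSE2 and AVX2 archives can be automated. The sketch below is an illustration, not part of HH-suite: the function names pick_hhsuite_build and hhsuite_url are hypothetical, it assumes a Linux x86_64 host whose /proc/cpuinfo contains a flags line, and it only reuses the v3.3.0 download URLs from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HH-suite): choose the static Linux build
# that matches a CPU flags line such as the one in /proc/cpuinfo.
pick_hhsuite_build() {
  case " $1 " in
    *" avx2 "*) echo "AVX2" ;;  # faster build; requires AVX2
    *" sse2 "*) echo "SSE2" ;;  # minimum requirement on AMD/Intel CPUs
    *) return 1 ;;              # neither flag: no matching x86 build
  esac
}

# Hypothetical helper: release URL following the v3.3.0 naming used above.
hhsuite_url() {
  echo "https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-$1-Linux.tar.gz"
}

# Example (Linux, x86_64):
#   build=$(pick_hhsuite_build "$(grep -m1 '^flags' /proc/cpuinfo)") &&
#     wget "$(hhsuite_url "$build")" &&
#     tar xvfz "hhsuite-3.3.0-$build-Linux.tar.gz"
```

Preferring AVX2 whenever the flag is present follows from the speed difference noted above. ARM64 and PPC64LE systems are not covered by this sketch and need the binaries from mmseqs.com/hhsuite.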

Only the self-compiled HH-suite3 version includes MPI support, since MPI configuration is specific to the local environment.

Available Databases

List of available databases for HH-suite3:

Also check out the databases (COG/ECOG/CD/...) maintained by the MPI Bioinformatics Toolkit [pub].

Compilation

To compile from source, you will need a recent C/C++ compiler (at least GCC 4.8 or Clang 3.6) and CMake 2.8.12 or later.

To download the source code and compile HH-suite, run the following commands:

```shell
git clone https://github.com/soedinglab/hh-suite.git
mkdir -p hh-suite/build && cd hh-suite/build
cmake -DCMAKE_INSTALL_PREFIX=. ..
make -j 4 && make install
export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
```

To compile HH-suite3 on macOS, first install GCC from Homebrew (brew install gcc). The default macOS Clang compiler does not support OpenMP, so a build made with it can only use a single thread. Then replace the cmake call above with the following one, adjusting the -10 suffix to the GCC major version that Homebrew installed:

```shell
CC="$(brew --prefix)/bin/gcc-10" CXX="$(brew --prefix)/bin/g++-10" cmake -DCMAKE_INSTALL_PREFIX=. ..
```
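Because Homebrew names the compiler after its major version (gcc-13, gcc-14, ...), the hard-coded gcc-10 may not exist on a current system. A minimal sketch, assuming Homebrew places versioned gcc-N and g++-N links in $(brew --prefix)/bin; the helper newest_gcc_version is hypothetical and not part of HH-suite:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HH-suite): print the highest N for which
# an executable gcc-N exists in the given directory, e.g. "$(brew --prefix)/bin".
newest_gcc_version() {
  local dir="$1" best="" f v
  for f in "$dir"/gcc-[0-9]*; do
    [ -x "$f" ] || continue                   # also skips the literal pattern if nothing matched
    v="${f##*/gcc-}"                          # ".../bin/gcc-14" -> "14"
    case "$v" in *[!0-9]*) continue ;; esac   # ignore names such as gcc-14.2
    if [ -z "$best" ] || [ "$v" -gt "$best" ]; then best="$v"; fi
  done
  [ -n "$best" ] && echo "$best"              # non-zero status if no gcc-N was found
}

# Example (macOS with Homebrew GCC):
#   v=$(newest_gcc_version "$(brew --prefix)/bin") &&
#     CC="$(brew --prefix)/bin/gcc-$v" CXX="$(brew --prefix)/bin/g++-$v" \
#     cmake -DCMAKE_INSTALL_PREFIX=. ..
```

The comparison is numeric, so gcc-14 is preferred over gcc-9 even though "9" sorts after "14" as a string.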

Usage

To run a single search iteration of HHblits, use the following command:

```shell
hhblits -i <input-file> -o <result-file> -n 1 -d <database-basename>
```
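The -d option takes a database basename rather than a single file. As a pre-flight check, the sketch below verifies that the expected files exist before calling hhblits. The function check_hhdb is hypothetical, and the layout it checks (ffdata/ffindex pairs named <basename>_a3m, <basename>_hhm and <basename>_cs219) is an assumption about a standard HH-suite3 database; adjust the list if your database differs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of HH-suite). Assumes a database with
# basename DB consists of DB_a3m, DB_hhm and DB_cs219 ffdata/ffindex pairs.
check_hhdb() {
  local db="$1" missing=0 part ext
  for part in a3m hhm cs219; do
    for ext in ffdata ffindex; do
      if [ ! -e "${db}_${part}.${ext}" ]; then
        echo "missing: ${db}_${part}.${ext}" >&2
        missing=1
      fi
    done
  done
  return "$missing"   # 0 only if every expected file exists
}

# Example (mydb is a placeholder basename):
#   check_hhdb databases/mydb &&
#     hhblits -i query.fasta -o query.hhr -n 1 -d databases/mydb
```

Reporting every missing file, rather than stopping at the first, makes an incomplete download easier to diagnose.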

To generate an alignment of homologous sequences in A3M format, add the -oa3m option:

```shell
hhblits -i <input-file> -o <result-file> -oa3m <result-alignment> -d <database-basename>
```
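Searching many query files follows the same pattern. A minimal sketch that uses only the -i, -o, -oa3m, -n and -d options shown above; the wrapper run_hhblits_batch and the HHBLITS override variable are hypothetical, and each input is assumed to have a file extension (q1.fasta produces q1.hhr and q1.a3m):

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper (not part of HH-suite): run hhblits once per
# query file, writing <name>.hhr and <name>.a3m next to each input.
# Set HHBLITS=echo for a dry run that only prints the arguments.
run_hhblits_batch() {
  local db="$1" iters="$2" query base
  shift 2
  for query in "$@"; do
    base="${query%.*}"   # q1.fasta -> q1 (assumes an extension is present)
    "${HHBLITS:-hhblits}" -i "$query" -o "$base.hhr" -oa3m "$base.a3m" \
      -n "$iters" -d "$db" || return 1   # stop at the first failed search
  done
}

# Example:
#   run_hhblits_batch <database-basename> 2 queries/*.fasta
```

Stopping at the first failure avoids silently producing partial results; drop the "|| return 1" if independent queries should continue after an error.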

A detailed list of options for HHblits is available by running HHblits with the -h parameter.

Reference

Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger S J, and Söding J (2019) HH-suite3 for fast remote homology detection and deep protein annotation, BMC Bioinformatics, 473. doi: 10.1186/s12859-019-3019-7
