HH-suite3 for sensitive sequence searching
(C) Johannes Soeding, Markus Meier, Martin Steinegger, Milot Mirdita, Michael Remmert, Andreas Hauser, Andreas Biegert
The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
Documentation
We provide an extensive user guide with many usage examples, frequently asked questions and guides to build your own databases.
Installation
HH-suite3 can be installed with conda or Docker, or by downloading a statically compiled version. It requires a 64-bit system (check with uname -a | grep x86_64). On AMD/Intel CPUs it requires at least support for the SSE2 instruction set (check by executing cat /proc/cpuinfo | grep sse2 on Linux or sysctl -a | grep machdep.cpu.features | grep SSE2 on macOS). On CPUs that support AVX2, HH-suite3 runs roughly 2x faster than with SSE2. HH-suite3 also works on Linux systems with ARM64 and PPC64LE CPUs. Precompiled binaries for all supported systems can be found at mmseqs.com/hhsuite.
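The CPU checks above can be combined into one small function that reports which static build a Linux x86_64 machine can run. This is a minimal sketch, assuming the CPU flags are listed in /proc/cpuinfo (so it does not cover macOS, where the sysctl command applies); the function name hhsuite_simd_level is hypothetical and not part of HH-suite3.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with HH-suite3 (x86_64 Linux only).
# Prints the fastest static build the CPU supports: AVX2, SSE2, or "none".
# $1 optionally names a cpuinfo-style file; it defaults to /proc/cpuinfo.
hhsuite_simd_level() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  if grep -qw avx2 "$cpuinfo"; then
    echo AVX2       # the faster build, so it is checked first
  elif grep -qw sse2 "$cpuinfo"; then
    echo SSE2       # minimum requirement on AMD/Intel CPUs
  else
    echo none
    return 1
  fi
}
```

Running hhsuite_simd_level without arguments prints AVX2, SSE2 or none for the current machine.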
# install via conda
conda install -c conda-forge -c bioconda hhsuite
# pull the Docker image
docker pull soedinglab/hh-suite
# static SSE2 build
wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-SSE2-Linux.tar.gz; tar xvfz hhsuite-3.3.0-SSE2-Linux.tar.gz; export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
# static AVX2 build
wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-AVX2-Linux.tar.gz; tar xvfz hhsuite-3.3.0-AVX2-Linux.tar.gz; export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
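The two static-build commands differ only in the SIMD suffix of the archive name. The sketch below wraps them in hypothetical helper functions (hhsuite_release_url, hhsuite_install_static) that are not part of HH-suite3; it assumes other releases follow the same URL pattern as the v3.3.0 links above, which should be checked on the GitHub releases page before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with HH-suite3.

# Print the static-build URL for a version (e.g. 3.3.0) and SIMD level (AVX2 or SSE2).
# Assumption: the URL pattern of the v3.3.0 links holds for the requested version.
hhsuite_release_url() {
  local version="$1" simd="$2"
  echo "https://github.com/soedinglab/hh-suite/releases/download/v${version}/hhsuite-${version}-${simd}-Linux.tar.gz"
}

# Download and unpack a static build into directory $3, then put its
# bin/ and scripts/ directories on PATH for the current shell.
hhsuite_install_static() {
  local version="$1" simd="$2" dest="$3"
  mkdir -p "$dest" && dest="$(cd "$dest" && pwd)" || return 1   # make dest absolute
  wget -qO- "$(hhsuite_release_url "$version" "$simd")" | tar xz -C "$dest" || return 1
  export PATH="$dest/bin:$dest/scripts:$PATH"
}
```

For example, hhsuite_install_static 3.3.0 AVX2 "$HOME/hhsuite" performs the same steps as the AVX2 command above, but unpacks into a fixed directory instead of the current one.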
Available Databases
List of available databases for HH-suite3:
- Uniclust30 [pub]
- BFD (2.5 billion protein sequences, mostly environmental) [pub]
- Pfam/SCOP/PDB70/dbCAN
Also check out the databases (COG/ECOG/CD/...) maintained by the MPI Bioinformatics Toolkit [pub].
Compilation
To compile from source, you will need a recent C/C++ compiler (at least GCC 4.8 or Clang 3.6) and CMake 2.8.12 or later.
To download the source code and compile HH-suite3, execute the following commands:
git clone https://github.com/soedinglab/hh-suite.git
mkdir -p hh-suite/build && cd hh-suite/build
cmake -DCMAKE_INSTALL_PREFIX=. ..
make -j 4 && make install
export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
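The build commands above run make with a fixed 4 jobs. A sketch that instead runs one make job per online CPU core, assuming getconf _NPROCESSORS_ONLN is available (it is on common Linux and macOS systems); the function names ncpus and hhsuite_build_from_source are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with HH-suite3.

# Print the number of online CPU cores; fall back to 4 (the job count used above)
# if getconf cannot report it.
ncpus() {
  getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4
}

# Same steps as the commands above, but with one make job per core.
# Like the original commands, it leaves the shell inside hh-suite/build.
hhsuite_build_from_source() {
  git clone https://github.com/soedinglab/hh-suite.git || return 1
  mkdir -p hh-suite/build && cd hh-suite/build || return 1
  cmake -DCMAKE_INSTALL_PREFIX=. .. || return 1
  make -j "$(ncpus)" && make install || return 1
  export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
}
```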
When compiling on macOS, install the gcc compiler from Homebrew. The default macOS clang compiler does not support OpenMP, so HH-suite3 would only be able to use a single thread. Then replace the cmake call above with the following one:
CC="$(brew --prefix)/bin/gcc-10" CXX="$(brew --prefix)/bin/g++-10" cmake -DCMAKE_INSTALL_PREFIX=. ..
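The cmake call above hardcodes gcc-10, while Homebrew may have installed a different major version. The hypothetical function below (not part of HH-suite3) prints the highest N for which a gcc-N executable exists in Homebrew's bin directory; it assumes Homebrew's usual gcc-N/g++-N naming.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with HH-suite3.
# Print the highest major version N such that <bindir>/gcc-N exists.
# <bindir> defaults to Homebrew's bin directory; brew is only called if $1 is empty.
latest_brew_gcc_version() {
  local bindir="${1:-$(brew --prefix)/bin}"
  local best=0 f n
  for f in "$bindir"/gcc-[0-9]*; do
    [ -e "$f" ] || continue               # pattern matched no files
    n="${f##*/gcc-}"                      # e.g. /opt/homebrew/bin/gcc-14 -> 14
    case "$n" in ''|*[!0-9]*) continue ;; esac   # skip names like gcc-14.2
    if [ "$n" -gt "$best" ]; then
      best="$n"
    fi
  done
  [ "$best" -gt 0 ] || return 1           # no gcc-N found
  echo "$best"
}
```

With v="$(latest_brew_gcc_version)", the cmake call above would use CC="$(brew --prefix)/bin/gcc-$v" and CXX="$(brew --prefix)/bin/g++-$v".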
Usage
To perform a single search iteration with HHblits, run:
hhblits -i <input-file> -o <result-file> -n 1 -d <database-basename>
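The -d option takes a database basename rather than a single file. A minimal check to run before searching, assuming the usual HH-suite3 database layout of ffindex/ffdata pairs named <basename>_a3m, <basename>_hhm and <basename>_cs219 (adjust the list for databases that ship a different set of files); check_hhblits_db is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with HH-suite3.
# Returns 0 if <basename>_{a3m,hhm,cs219}.{ffdata,ffindex} all exist and are
# non-empty; otherwise reports each missing file on stderr and returns 1.
# Assumption: the database uses the a3m/hhm/cs219 layout; edit "parts" if not.
check_hhblits_db() {
  local base="$1" missing=0 part ext f
  local parts="a3m hhm cs219"
  for part in $parts; do
    for ext in ffdata ffindex; do
      f="${base}_${part}.${ext}"
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, check_hhblits_db <database-basename> && hhblits -i <input-file> -o <result-file> -n 1 -d <database-basename> only starts the search if all six files are present and non-empty.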
For generating an alignment of homologous sequences:
hhblits -i <input-file> -o <result-file> -oa3m <result-alignment> -d <database-basename>
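To run the alignment command above for many queries, a loop over a directory is enough. The sketch below is hypothetical (run_hhblits_batch and result_prefix are not HH-suite3 tools) and uses only the -i, -o, -oa3m and -d options shown above. It names the outputs <name>.hhr and <name>.a3m, following the common .hhr convention for HHblits result files, and insists on a separate output directory so that .a3m queries are not overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with HH-suite3.

# Print <outdir>/<query file name without its extension>.
result_prefix() {
  local name
  name="$(basename "$1")"
  echo "${2%/}/${name%.*}"
}

# Run hhblits on every *.fasta and *.a3m file in <query-dir> against
# <database-basename>, writing <name>.hhr and <name>.a3m into <output-dir>.
run_hhblits_batch() {
  local qdir="$1" db="$2" outdir="$3" q p
  mkdir -p "$outdir" || return 1
  # Refuse to write into the query directory, where <name>.a3m could overwrite a query.
  if [ "$(cd "$qdir" && pwd)" = "$(cd "$outdir" && pwd)" ]; then
    echo "output directory must differ from the query directory" >&2
    return 1
  fi
  for q in "$qdir"/*.fasta "$qdir"/*.a3m; do
    [ -e "$q" ] || continue          # skip a pattern that matched no files
    p="$(result_prefix "$q" "$outdir")"
    hhblits -i "$q" -o "$p.hhr" -oa3m "$p.a3m" -d "$db" || return 1
  done
}
```

For example, run_hhblits_batch queries <database-basename> results searches every query in queries/ and leaves the .hhr and .a3m files in results/.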
A detailed list of HHblits options is available by running HHblits with the -h parameter.
Reference
Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger S J, and Söding J (2019) HH-suite3 for fast remote homology detection and deep protein annotation, BMC Bioinformatics, 473. doi: 10.1186/s12859-019-3019-7