A Hidden Markov Model Application and C++ Library for Rapid and Flexible Development of HMMs

#StochHMM - A Flexible hidden Markov model application and C++ library.


#Introduction

StochHMM is a free, open source C++ library and application that implements HMMs from simple text files. It implements the traditional HMM algorithms and provides additional flexibility, which is achieved by allowing researchers to integrate additional data sources and applications into the HMM framework.

For documentation on model syntax and designing a model, see the GitHub wiki.

###http://www.github.com/KorfLab/StochHMM/wiki

Update: Comparison between StochHMM, Mamot, R HMM, and HMMoc

###Download version 0.36: https://github.com/KorfLab/StochHMM/archive/master.zip

##Integrating Data

Here are a few of the ways that StochHMM allows users to integrate additional data sources:

  1. Multiple Emission States
  2. Weighting or Explicitly Defining State paths on a sequence
  3. Linking States Emissions/Transitions to external user-defined functions

##Multiple Emission States

StochHMM allows the user to provide multiple sequences, which are then handled by the emissions. These sequences can be REAL numbers or discrete characters/words. Each state can have many emissions (discrete or continuous). Discrete emissions can be independent of each other or joint distributions. Continuous emissions can be treated in two ways: 1) as raw probabilities, which are integrated without transformation; or 2) as values to be plugged into a univariate probability distribution function (PDF), or into a multivariate PDF in the case of multiple REAL sequences.

Each state's emissions are user-defined, so one state may have emissions from two different sequences, while another may have only a single emission from a single sequence.
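
As a rough illustration of the idea above, the following sketch shows how one state might score a position using two sequences at once: a discrete symbol looked up in a table, and a REAL value passed through a univariate normal PDF. Everything here (`TwoTrackState`, `logNormalPdf`, the choice of a normal distribution, working in log space) is an assumption made for the example and is not StochHMM's API.

```cpp
#include <cmath>
#include <limits>
#include <map>

// Hypothetical illustration, not StochHMM's API.
// Log-density of a univariate normal distribution evaluated at x.
double logNormalPdf(double x, double mean, double sd) {
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / sd;
    return -0.5 * z * z - std::log(sd) - 0.5 * std::log(2.0 * kPi);
}

// A state with one discrete emission (characters of a first sequence)
// and one continuous emission (REAL values of a second sequence).
struct TwoTrackState {
    std::map<char, double> discreteLogProb;  // log P(symbol | state)
    double mean;                             // parameters of the PDF
    double sd;

    // Independent emissions multiply, so their logarithms add.
    double logEmission(char symbol, double value) const {
        const auto it = discreteLogProb.find(symbol);
        if (it == discreteLogProb.end()) {
            // Symbol not emitted by this state.
            return -std::numeric_limits<double>::infinity();
        }
        return it->second + logNormalPdf(value, mean, sd);
    }
};
```

A state that reads only one of the two sequences would simply drop the other term from the sum.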

##Weighting or Explicitly Defining State paths to follow on a sequence.

Often, we have some prior knowledge about the sequence. If this is the case, we may want to integrate that knowledge into the model without redesigning or retraining it (a time-consuming endeavor). StochHMM allows the user to explicitly define a state path (by name of state or by category of state). In addition, StochHMM allows the user to weight a state's path (by name of state or by a user-defined category of state). This allows the user to restrict the predicted path or to weight it by their prior knowledge.

##Linking States Emissions or Transitions to external user-defined functions

StochHMM allows a state's emissions or transitions to be linked to external user-defined functions. When such a transition or emission is evaluated, the function is called and can provide an emission. This may provide one way of addressing a weakness of HMMs, namely that they do not handle long-range dependencies, but we see it rather as a way to link together existing utilities or functions that provide additional information to the decoding algorithms. In this way, we can link divergent datasets or functions within the HMM trellis in order to arrive at a better prediction.
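
A minimal sketch of the linking idea, assuming a callback that receives the whole sequence and the current position: the decoder calls the function whenever the emission is evaluated, and falls back to the ordinary table value when no function is attached. The names `EmissionFunction`, `LinkedEmission` and `tataAwareLogProb`, and the scores returned, are made up for this example; they are not StochHMM's interface.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical illustration, not StochHMM's API.
// The function sees the whole sequence, so it may inspect positions
// far away from the one currently being evaluated.
using EmissionFunction =
    std::function<double(const std::string& seq, std::size_t pos)>;

struct LinkedEmission {
    double tableLogProb;        // ordinary log-probability from the model
    EmissionFunction external;  // optional user-defined function

    // If a function is linked, it provides the emission score.
    double logProb(const std::string& seq, std::size_t pos) const {
        return external ? external(seq, pos) : tableLogProb;
    }
};

// Example user function: score a position highly only if a complete
// "TATA" motif starts within the 10 bases upstream of it.
double tataAwareLogProb(const std::string& seq, std::size_t pos) {
    const std::size_t start = pos >= 10 ? pos - 10 : 0;
    const std::size_t hit = seq.find("TATA", start);
    const bool upstream = hit != std::string::npos && hit + 4 <= pos;
    return upstream ? 0.0 : -2.0;  // arbitrary example scores
}
```

Any existing utility that can be wrapped in such a function, for example a motif scanner or a lookup into an external dataset, could be attached in the same way.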


#Features

##Brief list of features implemented in StochHMM:

  • General settings within Hidden Markov Models
    1. User-defined HMM model via simple human readable text file
    2. User-defined Alphabet
    3. User-defined Ambiguous Characters
  • States
    1. Emissions
      • Multiple emission states (Discrete / Continuous)
      • Independent (Single or Multiple Discrete)
      • Joint Distribution (Multiple Discrete)
      • Univariate PDF (Single Sequence - Continuous)
      • Multivariate PDF (Multiple Sequence - Continuous)
      • Linkable to user-defined function
    2. Transitions
      • Standard Transitions
      • Lexical Transitions (Single or multiple emission)
      • (Preliminary Support) Explicit Duration Transitions
      • Linkable to user-defined functions
  • Decoding
    1. Traditional Decoding Algorithms
      • Forward/Backward/Posterior
      • Viterbi
      • N-best Viterbi
    2. Stochastic Sampling Decoding Algorithms
      • Stochastic Forward
      • Stochastic Viterbi
      • Stochastic Posterior
  • Decoding Traceback Path output formats
    • State Path Index
    • State Path Label
    • GFF
    • Hit Table (Stochastic Algorithms)
    • Posterior Probability Table

#Developers

##Korf Lab

Korf Lab, Genome Center, University of California, Davis

##For suggestions or support:


#Code Documentation

Documentation for the C++ code can be found at StochHMM Doxygen Documentation.

Documentation on the model files can be found at StochHMM GitHub Wiki.


#References:

  1. Schroeder, D.I., Blair, J.D., Lott, P., Yu, H.O., Hong, D., Crary, F., Ashwood, P., Walker, C., Korf, I., Robinson, W.P., LaSalle, J.M. The human placenta methylome. PNAS 110(15):6037-6042 (2013).

  2. Lott, P., Dunaway, K., Yu, K., Korf, I. StochHMM: A Flexible Hidden Markov Model Framework for Rapid Development of HMMs. Poster presented at: Genome Informatics, 2012 Sep 6-9, Cambridge, UK.

  3. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I. & Chédin, F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell 45, 814–825 (2012).

  4. Schroeder, D. I., Lott, P., Korf, I. & LaSalle, J. M. Large-scale methylation domains mark a functional subset of neuronally expressed genes. Genome Res 21, 1583–1591 (2011).


#Installation

To compile StochHMM from the Unix command line (Linux, Mac OS X):

 $ ./configure
 $ make

The compiled application ./stochhmm will be located in the project's root folder, and the static library will be in the src/ folder.

To compile StochHMM in Xcode (Mac OS X only):

  1. Open StochHMM.xcodeproj in the Xcode directory.
  2. Select Debug or Release within the StochHMM scheme.
  3. Select Run.

The compiled target will be accessible from Xcode.


#Examples

To run the examples,

$ cd bin/
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -label
$ stochhmm -model ../examples/3_16Eddy.hmm -seq ../examples/3_16Eddy.fa -viterbi -gff
$ stochhmm -model ../examples/3_16Eddy.hmm -seq ../examples/3_17Eddy.fa -posterior
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic viterbi -rep 10 -label
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic posterior -rep 10 -label
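
The last two commands ask for stochastic decoding rather than a single best path. The exact behaviour of the `-stochastic` and `-rep` options is documented in the wiki, not here, so the following is only a generic sketch of the underlying idea: sampling state paths from a forward matrix, with each path drawn in proportion to its probability given the sequence. The function names and the use of `std::mt19937` are assumptions for this example and not StochHMM's implementation.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical illustration, not StochHMM's implementation.
using Matrix = std::vector<std::vector<double>>;

// fwd[t][k] = P(symbols 0..t, state k at position t), plain probabilities.
Matrix forwardMatrix(const std::vector<double>& init,
                     const Matrix& trans,   // trans[j][k]: j -> k
                     const Matrix& emit) {  // emit[t][k]
    const std::size_t T = emit.size();
    const std::size_t K = init.size();
    Matrix fwd(T, std::vector<double>(K, 0.0));
    if (T == 0) return fwd;
    for (std::size_t k = 0; k < K; ++k) fwd[0][k] = init[k] * emit[0][k];
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t k = 0; k < K; ++k) {
            double sum = 0.0;
            for (std::size_t j = 0; j < K; ++j)
                sum += fwd[t - 1][j] * trans[j][k];
            fwd[t][k] = sum * emit[t][k];
        }
    }
    return fwd;
}

// Draws one state path by walking the forward matrix from the last
// position to the first. Assumes the sequence has non-zero probability
// under the model, so every set of sampling weights has a positive sum.
std::vector<int> samplePath(const Matrix& fwd,
                            const Matrix& trans,
                            std::mt19937& rng) {
    const std::size_t T = fwd.size();
    std::vector<int> path(T, 0);
    if (T == 0) return path;

    // Last state: proportional to the final forward column.
    std::discrete_distribution<int> last(fwd[T - 1].begin(), fwd[T - 1].end());
    path[T - 1] = last(rng);

    // Earlier states: proportional to fwd[t-1][j] * trans[j][chosen next state].
    for (std::size_t t = T - 1; t > 0; --t) {
        const std::size_t K = fwd[t - 1].size();
        std::vector<double> weight(K);
        for (std::size_t j = 0; j < K; ++j)
            weight[j] = fwd[t - 1][j] * trans[j][path[t]];
        std::discrete_distribution<int> prev(weight.begin(), weight.end());
        path[t - 1] = prev(rng);
    }
    return path;
}
```

Calling `samplePath` repeatedly with the same forward matrix yields a set of alternative paths, whose frequencies indicate how strongly the model supports each one.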

#License Information

The MIT License (MIT)

Copyright (c) 2007-2012 Paul Lott, Ian Korf, Korf Lab, Genome Center, UC Davis, Davis, CA. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
