  • Stars: 145
  • Rank: 254,144 (Top 6 %)
  • Language: Jupyter Notebook
  • License: GNU General Publi...
  • Created about 8 years ago
  • Updated almost 2 years ago

Repository Details

Gibbs sampler for the Hierarchical Latent Dirichlet Allocation topic model

Hierarchical Latent Dirichlet Allocation

Note: this repository should only be used for educational purposes. For production use, I'd recommend https://github.com/bab2min/tomotopy, which is more production-ready.


Hierarchical Latent Dirichlet Allocation (hLDA) addresses the problem of learning topic hierarchies from data. The model relies on a non-parametric prior called the nested Chinese restaurant process, which allows for arbitrarily large branching factors and readily accommodates growing data collections. The hLDA model combines this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation.

Hierarchical Topic Models and the Nested Chinese Restaurant Process

The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies
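To make the nested Chinese restaurant process prior more concrete, here is a minimal sketch of how a single document might draw a root-to-leaf path through a fixed-depth tree. It is an illustration only, not code from this repository: the `Node` class, the `sample_path` function, and the parameter name `gamma` (the concentration parameter controlling how often new branches are created) are hypothetical names chosen for this example.

```python
import random


class Node:
    """One node (topic) in the nCRP tree. Hypothetical helper for this sketch."""

    def __init__(self):
        self.customers = 0  # number of documents whose path passes through this node
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Draw a path of `depth` nodes from the nCRP prior and update the counts.

    At each node, an existing child is chosen with probability proportional
    to the number of documents that already went through it, and a brand-new
    child is created with probability proportional to `gamma`.
    """
    path = [root]
    root.customers += 1
    node = root
    for _ in range(depth - 1):
        # One weight per existing child, plus a final weight for a new child.
        weights = [child.customers for child in node.children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):
            node.children.append(Node())
        node = node.children[idx]
        node.customers += 1
        path.append(node)
    return path


rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print("children of the root:", len(root.children))
```

Because the number of children per node is never fixed in advance, the branching factor can grow as more documents arrive, which is the property the description above refers to.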

Implementation

  • hlda/sampler.py is the Gibbs sampler for hLDA inference. It is based on the Mallet implementation, which uses a fixed depth for the nCRP tree.
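With a fixed tree depth, every document's path contains the same number of topics, so one step of a Gibbs sampler can reassign each word to one of those levels. The sketch below shows what such a level-sampling step could look like. It is an assumption-laden illustration, not the code in hlda/sampler.py: the function names, the argument layout, and the use of `alpha` (smoothing over levels) and `eta` (smoothing over words) as symmetric Dirichlet parameters are all hypothetical choices for this example.

```python
import random


def level_weights(word, doc_level_counts, topic_word_counts, topic_totals,
                  alpha, eta, vocab_size):
    """Unnormalised conditional weights for assigning `word` to each level.

    doc_level_counts[l]  : words of this document currently at level l
                           (with the word being resampled already removed)
    topic_word_counts[l] : dict mapping word -> count in the topic at level l
    topic_totals[l]      : total number of words in the topic at level l
    """
    weights = []
    for level in range(len(doc_level_counts)):
        # How much this document already uses the level.
        doc_term = alpha + doc_level_counts[level]
        # How well the topic at this level explains the word.
        word_term = (eta + topic_word_counts[level].get(word, 0)) / (
            eta * vocab_size + topic_totals[level]
        )
        weights.append(doc_term * word_term)
    return weights


def sample_level(word, doc_level_counts, topic_word_counts, topic_totals,
                 alpha, eta, vocab_size, rng):
    """Draw a level index in proportion to the weights above."""
    weights = level_weights(word, doc_level_counts, topic_word_counts,
                            topic_totals, alpha, eta, vocab_size)
    return rng.choices(range(len(weights)), weights=weights)[0]


# Toy state for a three-level path: the word "goal" is common in the leaf topic.
doc_level_counts = [5, 2, 10]
topic_word_counts = [{"the": 40}, {"sport": 12}, {"goal": 8}]
topic_totals = [50, 20, 15]
level = sample_level("goal", doc_level_counts, topic_word_counts, topic_totals,
                     alpha=1.0, eta=0.1, vocab_size=100, rng=random.Random(0))
print("sampled level:", level)
```

A full sampler would also resample each document's path through the tree; that step is omitted here to keep the example short.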

Installation

  • Simply use pip install hlda to install the package.
  • An example notebook that infers the hierarchical topics on the BBC Insight corpus can be found in notebooks/bbc_test.ipynb.