Discover joewandy/hlda Open Source project by Joe Wandy (@joewandy)

Hierarchical Latent Dirichlet Allocation

Note: this repository should be used for educational purposes only. For production use, I'd recommend https://github.com/bab2min/tomotopy, which is more production-ready.

Hierarchical Latent Dirichlet Allocation (hLDA) addresses the problem of learning topic hierarchies from data. The model relies on a non-parametric prior called the nested Chinese restaurant process, which allows for arbitrarily large branching factors and readily accommodates growing data collections. The hLDA model combines this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation.
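To make the nested Chinese restaurant process more concrete, here is a minimal sketch of drawing document paths from the prior alone (no words involved). It is an illustration, not code from this repository: the function names `sample_crp_child` and `sample_ncrp_paths`, the concentration parameter `gamma`, and the fixed `depth` are assumptions made for the example. At every node, a document follows an existing child with probability proportional to the number of earlier documents that chose it, or opens a new child with probability proportional to `gamma`.

```python
import random
from collections import defaultdict


def sample_crp_child(counts, gamma, rng):
    """Return the index of an existing child (weight = its count),
    or len(counts) to signal a brand-new child (weight = gamma)."""
    r = rng.random() * (sum(counts) + gamma)
    for i, c in enumerate(counts):
        r -= c
        if r < 0:
            return i
    return len(counts)


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a depth-limited nCRP.

    A path is a tuple of child indices; the root is implicit, so each
    path has depth - 1 entries.
    """
    rng = random.Random(seed)
    # Maps a node (identified by its path tuple) to per-child document counts.
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()  # start at the root
        for _ in range(depth - 1):
            counts = child_counts[node]
            k = sample_crp_child(counts, gamma, rng)
            if k == len(counts):
                counts.append(0)  # open a new branch
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for doc_id, path in enumerate(sample_ncrp_paths(10, depth=3, gamma=1.0)):
        print(doc_id, path)
```

Because new branches can be opened at any node, the branching factor is not fixed in advance; larger values of `gamma` tend to produce wider trees.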

Hierarchical Topic Models and the Nested Chinese Restaurant Process

The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies

Implementation

hlda/sampler.py contains the Gibbs sampler for hLDA inference. It is based on the Mallet implementation, which uses a fixed depth for the nCRP tree.
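With a fixed tree depth, one step of a collapsed Gibbs sampler is to reassign a single word to a level along its document's current path. The sketch below shows one common way to weight the levels; it is not copied from hlda/sampler.py, and the function name `sample_level`, the dictionary layout of each topic, and the hyperparameters `alpha` (per-document level smoothing) and `eta` (per-topic word smoothing) are assumptions made for illustration. The counts passed in are expected to exclude the word currently being resampled.

```python
import random


def sample_level(word, path_topics, doc_level_counts, alpha, eta, vocab_size, rng):
    """Sample a level for `word` along a fixed-depth path.

    path_topics: one dict per level, {"word_counts": {word: n}, "total": N}
    doc_level_counts: number of this document's words currently at each level
    """
    weights = []
    for level, topic in enumerate(path_topics):
        # How much this document already uses the level.
        doc_term = alpha + doc_level_counts[level]
        # How well the topic at this level explains the word.
        word_count = topic["word_counts"].get(word, 0)
        word_term = (eta + word_count) / (eta * vocab_size + topic["total"])
        weights.append(doc_term * word_term)

    # Draw a level with probability proportional to its weight.
    r = rng.random() * sum(weights)
    for level, w in enumerate(weights):
        r -= w
        if r < 0:
            return level
    return len(weights) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    topics = [
        {"word_counts": {"the": 90}, "total": 100},    # root: general words
        {"word_counts": {"sport": 40}, "total": 100},  # middle level
        {"word_counts": {"tennis": 80}, "total": 100}, # leaf: specific words
    ]
    print(sample_level("tennis", topics, [5, 3, 2], 1.0, 0.1, 1000, rng))
```

A full sampler would alternate this step with resampling each document's path through the tree.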

Installation

Simply use pip install hlda to install the package.
An example notebook that infers the hierarchical topics on the BBC Insight corpus can be found in notebooks/bbc_test.ipynb.
