

Dynamic Topic Models and the Document Influence Model

This implements topics that change over time (Dynamic Topic Models) and a model of how individual documents predict that change.

This code is the result of work by David M. Blei and Sean M. Gerrish.

(C) Copyright 2006, David M. Blei

(C) Copyright 2011, Sean M. Gerrish

It includes software corresponding to models described in the following papers:

[1] D. Blei and J. Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

[2] S. Gerrish and D. Blei. A Language-based Approach to Measuring Scholarly Impact. In Proceedings of the 27th International Conference on Machine Learning, 2010.

These files are part of DIM.

DIM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

DIM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA


A. COMPILING

You will need to have several libraries installed to compile this package. One of these is gsl-devel. Depending on your package manager, you may be able to install this with one of the following commands:

sudo aptitude install libgsl0-dev   # Ubuntu 10.04
sudo zypper install gsl-devel       # OpenSUSE 11.2
sudo yum install gsl-devel          # CentOS 5.5

You can make the main program by changing your working directory to dtm/ and typing:

make

This software has been compiled on Ubuntu 10.04, OpenSUSE 11.2, and CentOS 5.5. Depending on your environment, you may need to install additional libraries.

B. RUNNING

Once everything is compiled, you can run this software by typing the command "./main <flags>", where <flags> is a list of command-line options. An example command and a description of the input and output files are given in dtm/sample.sh. You can see all command-line options by typing

./main --help

(although we suggest you start out with the example in dtm/sample.sh).

C. SUPPORT and QUESTIONS

This software is provided as-is, without any warranty or support whatsoever. If you have any questions about running this software, you can post your question to the topic-models mailing list at [email protected]. You are welcome to submit modifications or bug fixes of this software to the authors, although not all submissions may be posted.

D. USAGE

This program takes as input a collection of text documents and produces as output a list of topics over time, a description of each document as a mixture of these topics, and (possibly) a measure of how "influential" each document is, based on its language.

We have provided an example dataset, instructions for formatting input data and processing output files, and example command lines for running this software in the file dtm/sample.sh.
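As dtm/sample.sh is the authoritative description of the input files, the following is only a hedged sketch of how a corpus might be prepared for this kind of tool. It assumes an LDA-C-style sparse format (one line per document: the number of unique terms, then index:count pairs) plus a companion file giving the number of time slices and the number of documents in each slice. The file names (foo-mult.dat, foo-seq.dat) and the toy documents are hypothetical; consult dtm/sample.sh for the exact format this package expects.

```python
# Sketch: write a toy corpus in an LDA-C-style sparse format plus a
# time-slice file. File names and documents here are hypothetical;
# see dtm/sample.sh for the format this package actually expects.

from collections import Counter

# Toy corpus: two time slices, two tokenized documents each.
slices = [
    [["topic", "model", "topic"], ["dynamic", "model"]],
    [["dynamic", "topic", "dynamic"], ["model", "model", "topic"]],
]

# Build a vocabulary mapping each word to an integer index.
vocab = {}
for docs in slices:
    for doc in docs:
        for w in doc:
            vocab.setdefault(w, len(vocab))

# foo-mult.dat: one line per document, ordered by time slice:
#   <num_unique_terms> <index>:<count> <index>:<count> ...
mult_lines = []
for docs in slices:
    for doc in docs:
        counts = Counter(vocab[w] for w in doc)
        pairs = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
        mult_lines.append(f"{len(counts)} {pairs}")

# foo-seq.dat: the number of time slices, then the number of
# documents in each slice.
seq_lines = [str(len(slices))] + [str(len(docs)) for docs in slices]

with open("foo-mult.dat", "w") as f:
    f.write("\n".join(mult_lines) + "\n")
with open("foo-seq.dat", "w") as f:
    f.write("\n".join(seq_lines) + "\n")

print(mult_lines[0])  # e.g. "2 0:2 1:1"
```

Keeping the documents sorted by time slice matters here: the sparse file carries no timestamps itself, so the slice file is the only thing tying each document to its epoch.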

E. CHANGES

Changes in this version include:

  • Changed the default top_obs_var flag to 0.5 (from -1.0).
  • Used more iterations and a tighter convergence criterion in each document's E-step.
  • Initialized random topics to be a bit more "flat".

More Repositories

 1. edward: A probabilistic programming language in TensorFlow. Deep generative models, variational inference. (Jupyter Notebook, 4,832 stars)
 2. onlineldavb: Online variational Bayes for latent Dirichlet allocation (LDA). (Python, 300 stars)
 3. lda-c: A C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. (C, 166 stars)
 4. hdp: Hierarchical Dirichlet processes. Topic models where the data determine the number of topics. Implements Gibbs sampling. (C++, 150 stars)
 5. ctr: Collaborative modeling for recommendation. Implements variational inference for collaborative topic models, which recommend items to users based on item content and other users' ratings. (C++, 147 stars)
 6. online-hdp: Online inference for the hierarchical Dirichlet process. Fits hierarchical Dirichlet process topic models to massive data; the algorithm determines the number of topics. (Python, 144 stars)
 7. causal-text-embeddings: Software and data for "Using Text Embeddings for Causal Inference". (Python, 122 stars)
 8. deconfounder_tutorial (Jupyter Notebook, 87 stars)
 9. hlda: Hierarchical latent Dirichlet allocation, a topic model that finds a hierarchy of topics whose structure is determined by the data. (JavaScript, 77 stars)
10. publications: The PDF and LaTeX for each paper (and sometimes the code and data used to generate the figures). (TeX, 73 stars)
11. class-slda: Supervised topic models with a categorical response. (C++, 64 stars)
12. variational-smc: Reference implementation of variational sequential Monte Carlo, proposed in Naesseth et al., "Variational Sequential Monte Carlo" (2018). (Python, 63 stars)
13. deep-exponential-families: Deep exponential families (DEFs). (C++, 56 stars)
14. DynamicPoissonFactorization: Dynamic Poisson factorization (dPF), which captures users' changing interests and the evolution of items over time from user-item ratings. (C++, 49 stars)
15. turbotopics: Turbo topics find significant multiword phrases in topics. (Python, 46 stars)
16. ars-reparameterization: Source code for Naesseth et al., "Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms" (2017). (Jupyter Notebook, 38 stars)
17. zero-inflated-embedding: Code for the ICML paper "Zero-Inflated Exponential Family Embedding". (Python, 28 stars)
18. context-selection-embedding: Context selection for embedding models. (Python, 27 stars)
19. ctm-c: Variational inference for the correlated topic model. (C, 21 stars)
20. deconfounder_public (Jupyter Notebook, 18 stars)
21. treeffuser: An easy-to-use package for probabilistic prediction on tabular data with tree-based diffusion models. (Jupyter Notebook, 11 stars)
22. factorial-network-models: Discussion of Durante et al. for JSM 2017; includes a factorial network model generalization. (Jupyter Notebook, 9 stars)
23. markovian-score-climbing (Python, 8 stars)
24. diln: The discrete infinite logistic normal, a Bayesian nonparametric topic model that finds correlated topics. (C, 6 stars)
25. poisson-influence-factorization (Jupyter Notebook, 4 stars)
26. Riken_tutorial (Jupyter Notebook, 4 stars)