PyDPM is a Python library for constructing Deep Probabilistic Models (DPMs). It provides efficient distribution sampling functions on GPU and includes implementations of popular existing DPMs.
Documentation | Paper [Arxiv] | Tutorials | Benchmarks | Examples |
News
Install
The current version of PyDPM can be installed on either Windows or Linux from PyPI.
$ pip install pydpm
On Windows, we recommend installing Visual Studio 2019 as the compiler, together with the CUDA 11.5 toolkit. On Linux, we recommend installing the latest version of the CUDA toolkit.
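After installation, a quick check such as the one below can confirm that the package is visible to the current Python environment. This is only a sketch: `installed_version` is a hypothetical helper, not part of PyDPM, and it assumes the distribution name matches the `pydpm` name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="pydpm"):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; not part of the PyDPM API.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"pydpm version: {version}" if version else "pydpm is not installed")
```

The check reads package metadata only, so it does not import PyDPM or touch the GPU.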
Overview
The PyDPM framework can be roughly split into four modules, namely Sampler, Model, Evaluation, and Example, described as follows:
- The Sampler module includes both the basic Distribution Sampler and the more sophisticated Model Sampler, which together cover the sampling requirements of DPMs built on either CPU or GPU;
- The Model module contains a wide variety of classical and popular DPMs, which can be called directly as Python APIs;
- The Evaluation module provides a DataLoader sub-module to process data samples in various forms, such as images, text, and graphs, and a Metric sub-module to comprehensively evaluate DPMs after training;
- The Example module provides, for each DPM included in the Model module, a corresponding code demo with a detailed explanation in the official docs.
The workflow of applying PyDPM to downstream tasks can be split into four steps as follows:
- Device deployment: PyDPM can run on a platform with either CPU or GPU;
- Mechanisms of model training or testing include either or both of Gibbs sampling and back propagation, implemented by pydpm.sampler and PyTorch respectively;
- Model categories in PyDPM mainly include Bayesian Probabilistic Models, Deep-Learning Probabilistic Models, and Hybrid Probabilistic Models;
- Applications of DPMs include Natural Language Processing (NLP), Graph Neural Networks (GNN), and Recommendation Systems (RS), among others.
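The device-deployment step can be sketched as a small fallback routine. The helper `pick_device` is hypothetical and not part of PyDPM. It assumes PyTorch is the way to query CUDA availability, and that `'cpu'` is accepted wherever the usage example in this README passes `'gpu'`.

```python
def pick_device(prefer_gpu=True):
    """Return 'gpu' when CUDA is usable and preferred, otherwise 'cpu'.

    Hypothetical helper; the 'gpu'/'cpu' strings mirror the README usage example.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"selected device: {device}")
```

The returned string could then be passed as the `device` argument when constructing a model or sampler, assuming both accept the same values.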
Model List
The Model module in PyDPM includes a wide variety of popular DPMs, which can be roughly split into three categories: Bayesian Probabilistic Models, Deep-Learning Probabilistic Models, and Hybrid Probabilistic Models.
Bayesian Probabilistic Models
      Probabilistic Model Name       | Abbreviation |    Paper Link    |
---|---|---|
Latent Dirichlet Allocation | LDA | Blei et al., 2003 |
Poisson Factor Analysis | PFA | Zhou et al., 2012 |
Poisson Gamma Belief Network | PGBN | Zhou et al., 2015 |
Convolutional Poisson Factor Analysis | CPFA | Wang et al., 2019 |
Convolutional Poisson Gamma Belief Network | CPGBN | Wang et al., 2019 |
Poisson Gamma Dynamical Systems | PGDS | Zhou et al., 2016 |
Deep Poisson Gamma Dynamical Systems | DPGDS | Guo et al., 2018 |
Dirichlet Belief Networks | DirBN | Zhao et al., 2018 |
Deep Poisson Factor Analysis | DPFA | Gan et al., 2015 |
Word Embeddings Deep Topic Model | WEDTM | Zhao et al., 2018 |
Multimodal Poisson Gamma Belief Network | MPGBN | Wang et al., 2018 |
Graph Poisson Gamma Belief Network | GPGBN | Wang et al., 2020 |
Deep-Learning Probabilistic Models
      Probabilistic Model Name       | Abbreviation |    Paper Link    |
---|---|---|
Restricted Boltzmann Machines | RBM | Hinton et al., 2010 |
Variational Autoencoder | VAE | Kingma et al., 2014 |
Generative Adversarial Network | GAN | Goodfellow et al., 2014 |
Normalizing Flow | NF | Dinh et al., 2017 |
Denoising Diffusion Probabilistic Models | DDPM | Ho et al., 2020 |
Hybrid Probabilistic Models
      Probabilistic Model Name       | Abbreviation |    Paper Link    |
---|---|---|
Weibull Hybrid Autoencoding Inference | WHAI | Zhang et al., 2018 |
Weibull Graph Attention Autoencoder | WGAAE | Wang et al., 2020 |
Recurrent Gamma Belief Network | rGBN | Guo et al., 2020 |
Multimodal Weibull Variational Autoencoder | MWVAE | Wang et al., 2020 |
Sawtooth Embedding Topic Model | SawETM | Duan et al., 2021 |
TopicNet | TopicNet | Duan et al., 2021 |
Deep Coupling Embedding Topic Model | dc-ETM | Li et al., 2022 |
Topic Taxonomy Mining with Hyperbolic Embedding | HyperMiner | Xu et al., 2022 |
Knowledge Graph Embedding Topic Model | KG-ETM | Wang et al., 2022 |
Variational Edge Partition Model | VEPM | He et al., 2022 |
Generative Text Convolutional Neural Network | GTCNN | Wang et al., 2022 |
Deep Probabilistic Models planned to be built
      Probabilistic Model Name       | Abbreviation |    Paper Link    |
---|---|---|
Nouveau Variational Autoencoder | NVAE | Vahdat et al., 2020 |
flow-based Variational Autoencoder | f-VAE | Su et al., 2018 |
Conditional Variational Autoencoder | CVAE | Sohn et al., 2015 |
Deep Convolutional Generative Adversarial Networks | DCGAN | Radford et al., 2016 |
Wasserstein Generative Adversarial Networks | WGAN | Arjovsky et al., 2017 |
Score-Based Generative Models | SGM | Bortoli et al., 2022 |
Poisson Flow Generative Models | PFGM | Xu et al., 2022 |
Stable Diffusion | LDM | Rombach et al., 2022 |
Denoising Diffusion Implicit Models | DDIM | Song et al., 2022 |
Vector Quantized Diffusion | VQ-Diffusion | Tang et al., 2023 |
Usage
Example: a few lines of code to quickly construct and evaluate a 3-layer Bayesian model, PGBN, on GPU.
from pydpm.model import PGBN
from pydpm.metric import ACC
# create the model and deploy it on gpu or cpu
model = PGBN([128, 64, 32], device='gpu')
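# initialize the model with the training data, then train it
model.initial(train_data)
train_local_params = model.train(train_data, iter_all=100)
# obtain local parameters (Theta) for the training and test sets from the trained model
train_local_params = model.test(train_data, iter_all=100)
test_local_params = model.test(test_data, iter_all=100)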
# evaluate the model with classification accuracy
# the demo accuracy can achieve 0.8549
results = ACC(train_local_params.Theta[0], test_local_params.Theta[0], train_label, test_label, 'SVM')
# save the model after training
model.save()
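The example above assumes that `train_data`, `test_data`, `train_label`, and `test_label` already exist. For a quick dry run without a real corpus, synthetic counts can be drawn from a one-layer Poisson factor model, as in the sketch below. Everything here is an assumption made for illustration: `make_synthetic_counts` is a hypothetical helper, and the vocabulary-by-documents layout of the count matrix should be checked against the official docs before feeding it to a PyDPM model.

```python
import numpy as np


def make_synthetic_counts(vocab_size=200, n_docs=500, n_topics=10,
                          n_classes=5, seed=0):
    """Draw a count matrix from a one-layer Poisson factor model.

    Returns (counts, labels): counts has shape (vocab_size, n_docs) and
    labels has shape (n_docs,). Hypothetical helper, not part of PyDPM.
    """
    rng = np.random.default_rng(seed)
    # Each column of phi is one topic: a distribution over the vocabulary.
    phi = rng.dirichlet(np.full(vocab_size, 0.1), size=n_topics).T
    # Every document gets a class label; each class favours a subset of topics.
    labels = rng.integers(0, n_classes, size=n_docs)
    class_shape = 0.1 + 2.0 * (rng.random((n_classes, n_topics)) < 0.3)
    # Gamma-distributed topic weights, one column per document.
    theta = rng.gamma(shape=class_shape[labels].T, scale=10.0)
    # Observed counts are Poisson with rate phi @ theta.
    counts = rng.poisson(phi @ theta)
    return counts, labels


counts, labels = make_synthetic_counts()
# Hold out the last 100 documents as a test split.
train_data, test_data = counts[:, :400], counts[:, 400:]
train_label, test_label = labels[:400], labels[400:]
print(train_data.shape, test_data.shape)
```

Because the labels influence which topics are active, a classifier trained on inferred topic weights has some signal to pick up, though the resulting accuracy says nothing about performance on real data.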
Example: a few lines of code to quickly deploy the distribution sampler of PyDPM on GPU.
import numpy as np
from pydpm.sampler import Basic_Sampler
sampler = Basic_Sampler('gpu')
a = sampler.gamma(np.ones(100)*5, 1, times=10)
b = sampler.gamma(np.ones([100, 100])*5, 1, times=10)
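To sanity-check sampler output, it can help to have a CPU reference drawn with NumPy for the same parameters. The sketch below is such a reference, not a description of PyDPM internals. It assumes that the second argument in the example above is the scale and that `times` means the number of draws per parameter, which is laid out here as a trailing axis. Both assumptions should be verified against the sampler documentation.

```python
import numpy as np

rng = np.random.default_rng(0)

shape_1d = np.ones(100) * 5
shape_2d = np.ones([100, 100]) * 5
times = 10  # assumed meaning: number of draws per parameter

# Add a trailing axis so each parameter is sampled `times` times.
a_ref = rng.gamma(shape_1d[..., None], 1.0, size=shape_1d.shape + (times,))
b_ref = rng.gamma(shape_2d[..., None], 1.0, size=shape_2d.shape + (times,))

print(a_ref.shape)  # (100, 10)
print(b_ref.shape)  # (100, 100, 10)
# A Gamma(shape=5, scale=1) variable has mean 5, so this should be close to 5.
print(b_ref.mean())
```

Comparing summary statistics such as the mean and variance against this reference is usually more robust than comparing individual draws, since the random streams differ.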
Compare
Compare the distribution sampling efficiency of PyDPM with NumPy:
Compare the distribution sampling efficiency of PyDPM with TensorFlow and PyTorch:
Compare the distribution sampling efficiency of PyDPM with CuPy and PyCUDA (used by PyDPM v1.0):
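A comparison of this kind can be reproduced locally with a small timing harness. The sketch below times only a NumPy baseline. `time_sampler` is a hypothetical helper, and it makes no attempt to account for host-device transfers or asynchronous execution, which may matter when timing GPU samplers.

```python
import time

import numpy as np


def time_sampler(sample_fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of sample_fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sample_fn()
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
shape = np.ones([1000, 1000]) * 5

# NumPy baseline: one gamma draw per entry of a 1000 x 1000 parameter matrix.
numpy_seconds = time_sampler(lambda: rng.gamma(shape, 1.0))
print(f"numpy gamma, 1000x1000: {numpy_seconds:.4f} s")
```

Any other sampler can be timed the same way by wrapping its call in a zero-argument function and passing it to `time_sampler`. Taking the best of several runs reduces the influence of one-off warm-up costs.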
Contact
License: Apache License Version 2.0
Contact: Chaojie Wang [email protected], Wei Zhao [email protected], Xinyang Liu [email protected], Bufeng Ge [email protected], Jiawen Wu [email protected]
Copyright (c), 2020, Chaojie Wang, Wei Zhao, Xinyang Liu, Jiawen Wu, Jie Ren, Yewen Li, Hao Zhang, Bo Chen and Mingyuan Zhou