
pydpm's Introduction




A Python library focused on constructing Deep Probabilistic Models (DPMs). PyDPM not only provides efficient distribution sampling functions on GPU, but also includes implementations of existing popular DPMs.

Documentation | Paper [arXiv] | Tutorials | Benchmarks | Examples

News

🔥A new version that does not depend on PyCUDA has been released.

🔥An abundance of professional learning materials on Deep Generative Models from Prof. Ermon's group at Stanford University. (CS236 - Fall 2021)

🔥A tutorial on DPMs has been uploaded by Prof. Wilker Aziz (University of Amsterdam).

Install

The current version of PyDPM can be installed on either Windows or Linux from PyPI.

$ pip install pydpm

On Windows, we recommend installing Visual Studio 2019 as the compiler, together with the CUDA 11.5 toolkit; on Linux, we recommend installing the latest version of the CUDA toolkit.
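Before installing, it is worth confirming that the CUDA compiler is visible on your PATH, since a missing toolkit is the usual cause of the `nvcc: not found` error reported in the Issues section below:

$ nvcc --version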

The environment used for testing has been released to make it easy to reproduce our results.

$ conda env create -f enviroment.yaml

Overview

The framework of the PyDPM library can be roughly split into four parts, namely the Sampler, Model, Evaluation, and Example modules, illustrated as follows:

  1. The Sampler module includes both a basic Distribution Sampler and a more sophisticated Model Sampler, which together cover the sampling requirements of DPMs on either CPU or GPU;
  2. The Model module contains a wide variety of classical and popular DPMs, which can be called directly as Python APIs;
  3. The Evaluation module provides a DataLoader sub-module to process data samples in various forms (images, text, graphs, etc.) and a Metric sub-module to comprehensively evaluate DPMs after training;
  4. The Example module provides, for each DPM included in the Model module, a corresponding code demo with a detailed explanation in the official docs (a minimal import sketch follows this list).
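For orientation, the programmatic modules above map directly onto the import paths used in the Usage examples later in this README; the following minimal sketch only restates those imports and adds comments:

from pydpm.sampler import Basic_Sampler  # Sampler module: distribution sampling on CPU or GPU
from pydpm.model import PGBN             # Model module: DPMs exposed as Python classes (PGBN as an example)
from pydpm.metric import ACC             # Evaluation module: metrics for evaluating trained DPMs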

The workflow of applying PyDPM to downstream tasks can be split into four steps, as follows:

  1. Device deployment: PyDPM can be deployed on a platform with either CPU or GPU (see the sketch after this list);
  2. Training and testing mechanisms: models use Gibbs sampling, back propagation, or both, implemented with pydpm.sampler and PyTorch respectively;
  3. Model categories: PyDPM mainly includes Bayesian Probabilistic Models, Deep-Learning Probabilistic Models, and Hybrid Probabilistic Models;
  4. Applications: DPMs have been applied to Natural Language Processing (NLP), Graph Neural Networks (GNNs), Recommendation Systems (RS), etc.
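As a minimal sketch of step 1 (device deployment), both entry points shown in the Usage section take the target platform at construction time. Only 'gpu' appears in the README examples; passing 'cpu' instead is assumed here to select the CPU implementation, per the module description above:

from pydpm.model import PGBN
from pydpm.sampler import Basic_Sampler

# choose the platform for both the sampler and the model ('gpu' here; presumably 'cpu' for a CPU-only run)
sampler = Basic_Sampler('gpu')
model = PGBN([128, 64, 32], device='gpu')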

Model List

The Model module in PyDPM includes a wide variety of popular DPMs, which can be roughly split into three categories: Bayesian Probabilistic Models, Deep-Learning Probabilistic Models, and Hybrid Probabilistic Models.

Bayesian Probabilistic Models

Probabilistic Model Name | Abbreviation | Paper Link
Latent Dirichlet Allocation | LDA | Blei et al., 2003
Poisson Factor Analysis | PFA | Zhou et al., 2012
Poisson Gamma Belief Network | PGBN | Zhou et al., 2015
Convolutional Poisson Factor Analysis | CPFA | Wang et al., 2019
Convolutional Poisson Gamma Belief Network | CPGBN | Wang et al., 2019
Factor Analysis | FA |
Gaussian Mixture Model | GMM |
Poisson Gamma Dynamical Systems | PGDS | Zhou et al., 2016
Deep Poisson Gamma Dynamical Systems | DPGDS | Guo et al., 2018
Dirichlet Belief Networks | DirBN | Zhao et al., 2018
Deep Poisson Factor Analysis | DPFA | Gan et al., 2015
Word Embeddings Deep Topic Model | WEDTM | Zhao et al., 2018
Multimodal Poisson Gamma Belief Network | MPGBN | Wang et al., 2018
Graph Poisson Gamma Belief Network | GPGBN | Wang et al., 2020

Deep-Learning Probabilistic Models

Probabilistic Model Name | Abbreviation | Paper Link
Restricted Boltzmann Machines | RBM | Hinton et al., 2010
Variational Autoencoder | VAE | Kingma et al., 2014
Generative Adversarial Network | GAN | Goodfellow et al., 2014
Density estimation using Real NVP | RealNVP (2d) | Dinh et al., 2017
Denoising Diffusion Probabilistic Models | DDPM | Ho et al., 2020
Density estimation using Real NVP | RealNVP (image) | Dinh et al., 2018
Conditional Variational Autoencoder | CVAE | Sohn et al., 2015
Deep Convolutional Generative Adversarial Networks | DCGAN | Radford et al., 2016
Wasserstein Generative Adversarial Networks | WGAN | Arjovsky et al., 2017
Information Maximizing Generative Adversarial Nets | InfoGAN | Chen et al., 2016

Hybrid Probabilistic Models

Probabilistic Model Name | Abbreviation | Paper Link
Weibull Hybrid Autoencoding Inference | WHAI | Zhang et al., 2018
Weibull Graph Attention Autoencoder | WGAAE | Wang et al., 2020
Recurrent Gamma Belief Network | rGBN | Guo et al., 2020
Multimodal Weibull Variational Autoencoder | MWVAE | Wang et al., 2020
Sawtooth Embedding Topic Model | SawETM | Duan et al., 2021
TopicNet | TopicNet | Duan et al., 2021
Deep Coupling Embedding Topic Model | dc-ETM | Li et al., 2022
Topic Taxonomy Mining with Hyperbolic Embedding | HyperMiner | Xu et al., 2022
Knowledge Graph Embedding Topic Model | KG-ETM | Wang et al., 2022
Variational Edge Partition Model | VEPM | He et al., 2022
Generative Text Convolutional Neural Network | GTCNN | Wang et al., 2022

Deep Probabilistic Models planned to be built

🔥You are welcome to suggest classical or novel Deep Probabilistic Models for us to include.

Probabilistic Model Name | Abbreviation | Paper Link
Nouveau Variational Autoencoder | NVAE | Vahdat et al., 2020
flow-based Variational Autoencoder | f-VAE | Su et al., 2018
Score-Based Generative Models | SGM | Bortoli et al., 2022
Poisson Flow Generative Models | PFGM | Xu et al., 2022
Stable Diffusion | LDM | Rombach et al., 2022
Denoising Diffusion Implicit Models | DDIM | Song et al., 2022
Vector Quantized Diffusion | VQ-Diffusion | Tang et al., 2023
Vector Quantized Variational Autoencoder | VQ-VAE | van den Oord et al., 2017
Conditional Generative Adversarial Nets | cGAN | Mirza et al., 2014
Information Maximizing Variational Autoencoders | InfoVAE | Zhao et al., 2017
Generative Flow | Glow | Kingma et al., 2018
Structured Denoising Diffusion Models in Discrete State-Spaces | D3PM | Austin et al., 2021

Usage

Example: a few lines of code to quickly construct and evaluate a 3-layer Bayesian model (PGBN) on GPU.

from pydpm.model import PGBN
from pydpm.metric import ACC

# create the model and deploy it on gpu or cpu
# (train_data/test_data and the corresponding labels are assumed to be loaded beforehand)
model = PGBN([128, 64, 32], device='gpu')
model.initial(train_data)
train_local_params = model.train(train_data, iter_all=100)
# re-run inference on the training data to collect its final local parameters (Theta)
train_local_params = model.test(train_data, iter_all=100)
test_local_params = model.test(test_data, iter_all=100)

# evaluate the model with classification accuracy
# the demo accuracy can achieve 0.8549
results = ACC(train_local_params.Theta[0], test_local_params.Theta[0], train_label, test_label, 'SVM')

# save the model after training
model.save()

Example: a few lines of code to quickly deploy the distribution sampler of PyDPM on GPU.

import numpy as np

from pydpm.sampler import Basic_Sampler

sampler = Basic_Sampler('gpu')
a = sampler.gamma(np.ones(100)*5, 1, times=10)
b = sampler.gamma(np.ones([100, 100])*5, 1, times=10)

Compare

Comparing the distribution sampling efficiency of PyDPM with NumPy:

Comparing the distribution sampling efficiency of PyDPM with TensorFlow and PyTorch:

Comparing the distribution sampling efficiency of PyDPM with CuPy and PyCUDA (used by PyDPM v1.0):
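The comparison figures themselves are not reproduced here. The sketch below shows one way such a benchmark could be run locally against NumPy; it is only an illustration: timings depend on hardware, the first GPU call may include compilation overhead, and interpreting the `times` argument as the number of repeated draws is an assumption based on the sampler example above.

import time

import numpy as np

from pydpm.sampler import Basic_Sampler

shape = np.ones([1000, 1000]) * 5   # matrix of Gamma shape parameters
scale = 1
repeats = 10

# PyDPM sampler on GPU (times=repeats assumed to mean repeated draws)
sampler = Basic_Sampler('gpu')
start = time.time()
_ = sampler.gamma(shape, scale, times=repeats)
print('pydpm gamma: %.4f s' % (time.time() - start))

# NumPy baseline on CPU, drawing the same number of samples
start = time.time()
for _ in range(repeats):
    _ = np.random.gamma(shape, scale)
print('numpy gamma: %.4f s' % (time.time() - start))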

Contact

License: Apache License Version 2.0

Contact: Chaojie Wang [email protected], Wei Zhao [email protected], Xinyang Liu [email protected], Bufeng Ge [email protected], Jiawen Wu [email protected]

Copyright (c), 2020, Chaojie Wang, Wei Zhao, Xinyang Liu, Jiawen Wu, Jie Ren, Yewen Li, Hao Zhang, Bo Chen and Mingyuan Zhou

pydpm's People

Contributors

bochengroup, chaojiewang94, dustone-mu, xd-wjw, xinyangatk


pydpm's Issues

sampler_kernel_win.cu FileNotFoundError: Could not find module site-packages '...\pydpm\_sampler\_compact\sampler_kernel.dll' (or one of its dependencies). Try using the full path with constructor syntax.

How to fix the error below?
ptxas fatal : Unresolved extern function '_Z3powfi'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\envs\mmdet\lib\site-packages\pydpm\_model\_pgbn.py", line 52, in __init__
    self._sampler = Basic_Sampler(self._model_setting.device)
  File "D:\Anaconda3\envs\mmdet\lib\site-packages\pydpm\_sampler\_basic_sampler.py", line 37, in __init__
    self._gpu_sampler_initial()
  File "D:\Anaconda3\envs\mmdet\lib\site-packages\pydpm\_sampler\_basic_sampler.py", line 63, in _gpu_sampler_initial
    sampler = distribution_sampler_gpu(self.system_type)
  File "D:\Anaconda3\envs\mmdet\lib\site-packages\pydpm\_sampler\_distribution_sampler_gpu.py", line 56, in __init__
    dll = ctypes.cdll.LoadLibrary(compact_path)
  File "D:\Anaconda3\envs\mmdet\lib\ctypes\__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "D:\Anaconda3\envs\mmdet\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Anaconda3\envs\mmdet\lib\site-packages\pydpm\_sampler\_compact\sampler_kernel.dll' (or one of its dependencies). Try using the full path with constructor syntax.

when running
from pydpm._model import PGBN
model = PGBN([128,64,32], device='gpu')
Thanks.

Definition of metrics

Would you please provide the definitions of the metrics used for evaluation in your repository?

Thanks.

HelloWorld

First!
When will the Windows version be released?

compile sampler library remotely on AWS

Thanks for the previous Q&A about recompilation. I still wonder how to compile the sampler library files manually on my server; I want to deploy this project on AWS.

/bin/sh: 1: nvcc: not found

I got an error when I ran the sampler on my Ubuntu PC:
'''
/bin/sh: 1: nvcc: not found
  File "/PyDPM4.0.1/pydpm/sampler/distribution_sampler_gpu.py", line 99, in __init__
    dll = ctypes.cdll.LoadLibrary(compact_path)
  File "/anaconda3/envs/pydpm/lib/python3.6/ctypes/__init__.py", line 426, in LoadLibrary
    return self._dlltype(name)
  File "anaconda3/envs/pydpm/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /PyDPM4.0.1/pydpm/sampler/_compact/distribution_sampler.so: cannot open shared object file: No such file or directory
'''
I'm new to this; how can I deal with it?
Thank you

Some recent work about topic models

I found some interesting work published at recent top conferences. Could you include these projects in this library for convenience?

The work list:
Alleviating “Posterior Collapse” in Deep Topic Models via Policy Gradient, NeurIPS 2022
HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding, NeurIPS 2022
Knowledge-Aware Bayesian Deep Topic Model, NeurIPS 2022

how could I use the sampler module to build my own PRNGs?

Thanks for your contributions to the CUDA-based sampler module. I want to build PRNGs for some new distributions with this sampler, and I found that there are different files in sampler/_compact. I would like to know where and how to edit them to add a new PRNG.

Requests for adding new diffusion models

It is great to see you start to add more variants of diffusion models. I wonder if you have a plan to include more recent popular diffusion models in your project, such as:

Denoising Diffusion Implicit Models. NIPS 2022
Improved Vector Quantized Diffusion Models. CVPR 2023

These two are widely treated as baselines in today's research.

Thanks

About metrics of GAN

I see that other models have some good metric APIs, but I can't find a suitable metric to evaluate the quality of Generative Adversarial Networks. Would you please add an evaluation metric for this model? I think it would greatly improve the flexibility of the applications.
Thanks.

Failed installation of pycuda

I get errors while installing the pycuda dependency:
ERROR: Could not build wheels for pycuda, which is required to install pyproject.toml-based projects
from pycuda._driver import * # noqa
ImportError: DLL load failed:
Those errors occurred during installation with pip. I have tried installing pydpm on two different devices, but it doesn't work on either :(
The environment of my Windows PC is listed below:
python 3.6
cuda 10.2
win 10

how to recompile this project?

There are some problems with the sampler, and I suspect that these errors are caused by the compiled files. I want to recompile those CUDA files, so which files are needed for recompilation and how can I do that? Thanks :)

Some questions about dataset used in your demo

Hi

Thanks for your efforts on deep generative models. However, I find that the datasets used in your demos are not public datasets. Could you replace the datasets in your code with those in torchvision/torchtext, etc.? That would make it more convenient for us to extend the models in your library.

Thanks

About Gaussian Process

Thanks for your efforts in summarizing Bayesian models.

I am wondering if you would like to include the widely used Gaussian Process in your library, because some recent studies by my colleagues have difficulties in speeding up the sampling efficiency; the previous implementation of the Gaussian Process on CPU is too slow for us.

Thanks

Some questions about the diffusion model

I'm surprised to see that the Diffusion Model has been added to your repository of Bayesian models. But it is a little complicated to understand the code of the Diffusion Model; is there any tutorial to guide me in generating images with the trained model? It would be nice if there were an example.
Thanks

Some additional suggestions about Normalizing Flow model

I'm glad that this repository has collected many generative models, like the diffusion model, variational autoencoder, Generative Adversarial Networks and so on. But I didn't find a flow-based model; would you like to add a model and demo of Normalizing Flows?

Thanks for your effort in building this repository of generative models.

Warning: shape -36.502342 <= 0 in threads idx: 8258 [thread:(2125698480, 0), block:(66, 0)]

DPGDS error
I met this warning when running the DPGDS demo:
Training Stage: epoch 0 takes 4.90 seconds. Likelihood: -0.425
Training Stage: epoch 1 takes 5.11 seconds. Likelihood: -0.367
Training Stage: epoch 2 takes 4.83 seconds. Likelihood: -0.373
Training Stage: epoch 3 takes 4.65 seconds. Likelihood: -0.376
Training Stage: epoch 4 takes 4.54 seconds. Likelihood: -0.378
Training Stage: epoch 5 takes 4.49 seconds. Likelihood: -0.377
Training Stage: epoch 6 takes 4.44 seconds. Likelihood: -0.377
Training Stage: epoch 7 takes 4.41 seconds. Likelihood: -0.377
Training Stage: epoch 8 takes 4.36 seconds. Likelihood: -0.374
Training Stage: epoch 9 takes 4.37 seconds. Likelihood: -0.369
Training Stage: epoch 10 takes 4.36 seconds. Likelihood: -0.367
Training Stage: epoch 11 takes 4.35 seconds. Likelihood: -0.363
Training Stage: epoch 12 takes 4.28 seconds. Likelihood: -0.360
Training Stage: epoch 13 takes 4.26 seconds. Likelihood: -0.357
Training Stage: epoch 14 takes 4.25 seconds. Likelihood: -0.349
Training Stage: epoch 15 takes 4.25 seconds. Likelihood: -0.343
Training Stage: epoch 16 takes 4.24 seconds. Likelihood: -0.334
Training Stage: epoch 17 takes 4.28 seconds. Likelihood: -0.326
Training Stage: epoch 18 takes 4.25 seconds. Likelihood: -0.315
Training Stage: epoch 19 takes 4.23 seconds. Likelihood: -0.306
Training Stage: epoch 20 takes 4.20 seconds. Likelihood: -0.294
Warning: shape -36.502342 <= 0 in threads idx: 8258 [thread:(2125698480, 0), block:(66, 0)]
Warning: shape -0.142884 <= 0 in threads idx: 8260 [thread:(2125698480, 0), block:(68, 0)]
Warning: shape -2.206542 <= 0 in threads idx: 8262 [thread:(2125698480, 0), block:(70, 0)]
Warning: shape -0.727849 <= 0 in threads idx: 8264 [thread:(2125698480, 0), block:(72, 0)]
Warning: shape -0.010437 <= 0 in threads idx: 8265 [thread:(2125698480, 0), block:(73, 0)]
Warning: shape -0.446480 <= 0 in threads idx: 8266 [thread:(2125698480, 0), block:(74, 0)]
Warning: shape -0.286718 <= 0 in threads idx: 8267 [thread:(2125698480, 0), block:(75, 0)]

Can you add the diffusion model?

At present, the diffusion model is popular, especially DDPM, and it also belongs to the family of deep probabilistic generative models.

A small bug

Dear author, in PyDPM/pydpm/model/deep_learning_pm/vae.py, the sample and forward methods of class VAE have some bugs (decoder is wrong, whereas vae_decoder is right), and they have now been checked.
BuFeng Ge

Issues about SawETM and WHAI

An email from Hjelkrem Tan, a PhD student at the University of Oslo:

``
Dear Mr. Chaojie Wang,

I am a PhD student at the University of Oslo, Norway. I would very much like to use your PyDPM library in my research, as I have read several papers from you and your colleagues on topic models. Would you be able to answer some questions I have about the implementations in your GitHub repo?

I cannot find any implementation of SawETM in the PyDPM library. Are you planning to release this with PyDPM?
In pydpm/model/hybrid_pm/whai.py it seems that the implementation of the encoder does not include the stochastic downward part from the WHAI paper. Is this intentional or something you will change later?

I hope that you can clarify this for me. Thank you for your time!

With best regards,

Martine Hjelkrem Tan

PhD student, University of Oslo

Digital Signal Processing and Image Analysis Group
''
