newbeeer / poisson_flow

Code for NeurIPS 2022 Paper, "Poisson Flow Generative Models" (PFGM)

License: Apache License 2.0

Python 93.86% C++ 0.64% Cuda 5.23% Shell 0.26%

poisson_flow's Issues

UnboundLocalError: local variable 'eval_ds' referenced before assignment

Hi, Yilun, I have another question.
When I was sampling from the bedroom_ddpmpp model, an error occurred:

  File "/home/xjtu/code/Poisson_flow/main.py", line 60, in <module>
    app.run(main)
  File "/home/xjtu/anaconda3/envs/sde/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/xjtu/anaconda3/envs/sde/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/xjtu/code/Poisson_flow/main.py", line 54, in main
    run_lib.evaluate(FLAGS.config, FLAGS.workdir, FLAGS.eval_folder)
  File "/home/xjtu/code/Poisson_flow/run_lib.py", line 345, in evaluate
    eval_iter = iter(eval_ds)  # pytype: disable=wrong-arg-types
UnboundLocalError: local variable 'eval_ds' referenced before assignment

I found that the cause is line 124 of datasets.py, shown below:

dataset_builder = tfds.builder(f'lsun/{config.data.category}', data_dir='/scratch/ylxu/tensorflow_datasets')

So I wonder why the config for the 'LSUN' dataset is different from CIFAR10 or CELEBA (i.e. it sets a data_dir, which I could not find).
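
One way to avoid a machine-specific path like the one above is to let the dataset directory come from the config or the environment. The sketch below is only an illustration: `resolve_data_dir` and its `config_value` argument are hypothetical names, not part of the repository, and it assumes that returning `None` lets the dataset loader fall back to its own default location. `TFDS_DATA_DIR` is the environment variable TensorFlow Datasets reads for its default data directory.

```python
import os


def resolve_data_dir(config_value=None, env_var="TFDS_DATA_DIR"):
    """Pick a dataset directory without hard-coding one machine's path.

    Hypothetical helper (not from the repository). Order of preference:
    explicit config value, then the environment variable, then None.
    """
    for candidate in (config_value, os.environ.get(env_var)):
        if candidate:
            # Expand "~" so paths like "~/tensorflow_datasets" work.
            return os.path.expanduser(candidate)
    # None is meant to be passed through so the loader uses its own default.
    return None
```

The result could then be passed as the `data_dir` argument in the call quoted above, in place of the `/scratch/ylxu/...` literal.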

Error running after environment is configured on the server

Hi, I am using Ubuntu 20.04. On a server with Python 3.8 and CUDA 11.6, I ran this command:
python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode train --workdir poisson_ddpmpp
The following problems occurred. Could you please take a look? Thank you.

WARNING:tensorflow:From /root/miniconda3/envs/myconda/lib/python3.8/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

WARNING:tensorflow:From /root/miniconda3/envs/myconda/lib/python3.8/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

1 Physical GPUs, 1 Logical GPUs
I0129 10:36:33.654870 140394431960256 xla_bridge.py:356] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
I0129 10:36:33.660157 140394431960256 xla_bridge.py:356] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0129 10:36:33.661039 140394431960256 xla_bridge.py:356] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0129 10:36:33.661505 140394431960256 xla_bridge.py:356] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
2023-01-29 10:36:33.680217: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host

A question about predicting "z"

Hello!
I have a question: why do we need the neural network to learn "z"? Both the direction and the magnitude of "z" can be obtained directly.

best wishes,
Huanran Chen

Steps I had to take in order to run this project on my machine locally

My environment already has Cuda 11.8 installed.

1. Clone the repository locally
2. Create a new conda environment with python==3.9.13
3. Modify requirements.txt to remove the requirement for absl==0.0
~~4. Modify requirements.txt to remove the +cu116 suffixes for the torch and torchvision packages~~ This change is part of why I was facing issues
5. pip install -r requirements.txt
6. Install tensorflow 2.10.0 over the version specified in the requirements.txt in order to support Tensorflow Probability
7. Install ninja to load C++ extensions
8. Downloaded the zip cifar10_ddpmpp.zip from the linked google drive and extracted the checkpoint_500000.pth model file to the directory ./workdir/checkpoints
9. pip install "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

When running the following command:
python main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir ./workdir

I get this console output, and no files are generated:

2022-10-27 11:04:29.246149: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 11:04:29.348970: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 11:04:29.766714: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-10-27 11:04:29.766792: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-10-27 11:04:29.766821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

WARNING:tensorflow:From /home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

/home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
2022-10-27 11:05:10.565954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:966] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-10-27 11:05:10.688136: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-10-27 11:05:10.688179: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I1027 11:05:10.730823 140021456623424 xla_bridge.py:345] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1027 11:05:10.965448 140021456623424 xla_bridge.py:345] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
I1027 11:05:10.965839 140021456623424 xla_bridge.py:345] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1027 11:05:10.966645 140021456623424 dataset_info.py:439] Load dataset info from /home/<username>/tensorflow_datasets/cifar10/3.0.2
W1027 11:05:10.968783 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
W1027 11:05:10.968874 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
I1027 11:05:10.969661 140021456623424 dataset_builder.py:369] Reusing dataset cifar10 (/home/<username>/tensorflow_datasets/cifar10/3.0.2)
I1027 11:05:10.969773 140021456623424 logging_logger.py:44] Constructing tf.data.Dataset cifar10 for split train, from /home/<username>/tensorflow_datasets/cifar10/3.0.2
2022-10-27 11:05:10.972231: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
W1027 11:05:11.132379 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
W1027 11:05:11.132514 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
I1027 11:05:11.132633 140021456623424 dataset_builder.py:369] Reusing dataset cifar10 (/home/<username>/tensorflow_datasets/cifar10/3.0.2)
I1027 11:05:11.132725 140021456623424 logging_logger.py:44] Constructing tf.data.Dataset cifar10 for split test, from /home/<username>/tensorflow_datasets/cifar10/3.0.2
--- sampling eps: 0.001
I1027 11:05:12.034308 140021456623424 resolver.py:105] Using /tmp/tfhub_modules to cache modules.
I1027 11:05:13.515388 140021456623424 run_lib.py:308] begin checkpoint: 8
./workdir/checkpoints/checkpoint_400000.pth does not exist
./workdir/checkpoints/checkpoint_450000.pth does not exist
loading from  ./workdir/checkpoints/checkpoint_500000.pth
./workdir/checkpoints/checkpoint_550000.pth does not exist
./workdir/checkpoints/checkpoint_600000.pth does not exist
./workdir/checkpoints/checkpoint_650000.pth does not exist
./workdir/checkpoints/checkpoint_700000.pth does not exist
./workdir/checkpoints/checkpoint_750000.pth does not exist
./workdir/checkpoints/checkpoint_800000.pth does not exist
./workdir/checkpoints/checkpoint_850000.pth does not exist
./workdir/checkpoints/checkpoint_900000.pth does not exist
./workdir/checkpoints/checkpoint_950000.pth does not exist
./workdir/checkpoints/checkpoint_1000000.pth does not exist
./workdir/checkpoints/checkpoint_1050000.pth does not exist
./workdir/checkpoints/checkpoint_1100000.pth does not exist
./workdir/checkpoints/checkpoint_1150000.pth does not exist
./workdir/checkpoints/checkpoint_1200000.pth does not exist
./workdir/checkpoints/checkpoint_1250000.pth does not exist
./workdir/checkpoints/checkpoint_1300000.pth does not exist
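
The log suggests that the evaluation loop walks a range of checkpoint indices and maps each one to a file name: with `begin checkpoint: 8` the first file tried is `checkpoint_400000.pth`, which is consistent with one checkpoint every 50,000 steps. A minimal sketch of that mapping follows. The spacing and the end index of 26 are inferred from the log above, not read from the config, and `list_checkpoints` is a hypothetical helper rather than a function in the repository.

```python
import os


def list_checkpoints(checkpoint_dir, begin_ckpt=8, end_ckpt=26, steps_per_ckpt=50_000):
    """Return (path, exists) pairs for each checkpoint index in the range."""
    results = []
    for ckpt in range(begin_ckpt, end_ckpt + 1):
        # Index 8 maps to checkpoint_400000.pth under the assumed spacing.
        path = os.path.join(checkpoint_dir, f"checkpoint_{ckpt * steps_per_ckpt}.pth")
        results.append((path, os.path.exists(path)))
    return results


if __name__ == "__main__":
    for path, exists in list_checkpoints("./workdir/checkpoints"):
        print(path, "found" if exists else "does not exist")
```

Running this against the work directory shows at a glance which of the expected files are present, independently of the model code.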

How to set hyperparameters like gamma, sigma_end, etc.?

Hi, could you please provide the code for calculating the average l2 norm on CIFAR-10? Thanks!

ImportError: This version of TensorFlow Probability requires TensorFlow version >= 2.10; Detected an installation of version 2.9.0. Please upgrade TensorFlow to proceed.

Hi, I made sure to use Python 3.9.2 as you do and installed the packages from requirements.txt, but I still get this error. If I update TensorFlow, it raises other errors. May I ask why?
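
The check in the error message compares version numbers numerically, so 2.9.0 is older than 2.10 even though it sorts later as a string. A small sketch of that comparison is below. It uses hypothetical helpers (`version_tuple`, `satisfies_minimum`) and assumes plain dotted numeric versions with no `rc` or `dev` suffixes.

```python
def version_tuple(version):
    """Turn a dotted version such as '2.9.0' into a tuple of ints: (2, 9, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    print(satisfies_minimum("2.9.0", "2.10"))   # False
    print(satisfies_minimum("2.10.0", "2.10"))  # True
```

A plain string comparison would get this wrong, because "2.9.0" >= "2.10" is true character by character.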

ImportError: cannot import name 'ddpm' from 'models'

CalledProcessError

Hi, Yilun, when I execute the following sampling command:
python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir poisson/cifar10_ddpmpp --config.eval.enable_sampling --config.eval.save_images --config.eval.batch_size 100
a CalledProcessError is raised:

2022-10-27 16:39:03.643006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 16:39:03.846988: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 16:39:04.484731: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484847: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484860: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/xjtu/anaconda3/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
Traceback (most recent call last):
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/home/xjtu/anaconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xjtu/code/Poisson_flow/main.py", line 18, in <module>
    import run_lib
  File "/home/xjtu/code/Poisson_flow/run_lib.py", line 30, in <module>
    from models import ncsnv2, ncsnpp
  File "/home/xjtu/code/Poisson_flow/models/ncsnpp.py", line 18, in <module>
    from . import utils, layers, layerspp, normalization
  File "/home/xjtu/code/Poisson_flow/models/layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "/home/xjtu/code/Poisson_flow/models/up_or_down_sampling.py", line 10, in <module>
    from op import upfirdn2d
  File "/home/xjtu/code/Poisson_flow/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/xjtu/code/Poisson_flow/op/fused_act.py", line 11, in <module>
    fused = load(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o 
FAILED: fused_bias_act_kernel.cuda.o 
:/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o 
/bin/sh: 1: :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc: not found
ninja: build stopped: subcommand failed.

Do you have any idea?
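
The `nvcc: not found` line shows the compiler path being built as `:/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc`, which looks like a PATH-style list (with a leading colon and a duplicated entry) used where a single directory was expected. PyTorch's extension builder joins the CUDA root, typically taken from the `CUDA_HOME` environment variable, with `bin/nvcc`. The sketch below is a hypothetical check (`cuda_home_problems` is not from any library) that flags such a value. Whether `CUDA_HOME` is the variable at fault on this machine is an assumption.

```python
import os


def cuda_home_problems(value, list_separator=":"):
    """Return a list of human-readable problems with a CUDA_HOME-style value.

    list_separator is the POSIX path-list separator; a CUDA root should be
    one directory, so it should not appear in the value at all.
    """
    if not value:
        return ["value is empty or unset"]
    problems = []
    if list_separator in value:
        problems.append("looks like a path list; expected a single directory")
    nvcc = os.path.join(value, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append(f"no nvcc at {nvcc}")
    return problems


if __name__ == "__main__":
    print(cuda_home_problems(os.environ.get("CUDA_HOME")))
```

If the check reports a path list, setting the variable to one directory (for example the single CUDA root seen in the log) and rerunning would be the first thing to try.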

Request for a Talk on Poisson Flow Generative Models

Hello!

I'm extremely interested in Poisson Flow Generative Models (PFGM) and their improved versions. I would love to gain insights into the thought process behind designing such models. I was wondering if there are any plans to give a talk on PFGMs in the near future?

best wishes

upfirdn2d_op = load( "upfirdn2d", sources=[ os.path.join(module_path, "upfirdn2d.cpp"), os.path.join(module_path, "upfirdn2d_kernel.cu"), ], )

Hi, when I run "python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode train --workdir poisson_ddpmpp", the program keeps running but prints nothing to the terminal. I debugged it and found that it is stuck at this line of code:

upfirdn2d_op = load(
    "upfirdn2d",
    sources=[
        os.path.join(module_path, "upfirdn2d.cpp"),
        os.path.join(module_path, "upfirdn2d_kernel.cu"),
    ],
)

Could you please tell me what I should do?
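
A silent hang at `load(...)` is often reported when an earlier, interrupted build left a lock file in PyTorch's extension build directory, so the new process waits for a build that never finishes. That cause is an assumption here; nothing in this issue confirms it. The sketch below looks for files named `lock` under the extensions directory. The default location `~/.cache/torch_extensions` and the `TORCH_EXTENSIONS_DIR` override are assumed to match the PyTorch version in use, and `find_lock_files` is a hypothetical helper.

```python
import os


def find_lock_files(root):
    """Walk root and return the sorted paths of all files named 'lock'."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "lock" in filenames:
            found.append(os.path.join(dirpath, "lock"))
    return sorted(found)


if __name__ == "__main__":
    default_root = os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions")
    root = os.environ.get("TORCH_EXTENSIONS_DIR", default_root)
    for path in find_lock_files(root):
        print("possible stale lock:", path)
```

Passing `verbose=True` to `load` also prints the build steps, which helps tell a slow first compilation apart from a process that is only waiting.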

Hyperparameters: l1 norm or l2 norm?

Hi, I have a question that confuses me.
Is the 'Average norm of data' in hyper-parameters.py the l1 norm or the l2 norm?
In the paper, B1.1, Ep(x)∣∣x∣∣^2 ≈ 900, and ∣∣x∣∣ looks like the notation for the l1 norm, but when I calculated it on the CIFAR-10 training set (50,000 images) using the l2 norm, the result was 28.9^2.
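
For reference, ∣∣x∣∣ written without a subscript conventionally denotes the l2 (Euclidean) norm, and 28.9^2 ≈ 835 is of the same order as the quoted 900. A minimal pure-Python sketch of averaging the l2 norm over flattened samples follows. `average_l2_norm` is a hypothetical helper, not code from the repository, and the pixel scaling (for example [0, 1]) is an assumption that changes the result, so it should match whatever preprocessing the training code applies.

```python
import math


def average_l2_norm(samples):
    """Mean Euclidean norm over an iterable of flat numeric sequences."""
    total, count = 0.0, 0
    for x in samples:
        total += math.sqrt(sum(v * v for v in x))
        count += 1
    if count == 0:
        raise ValueError("no samples given")
    return total / count


if __name__ == "__main__":
    toy = [[3.0, 4.0], [0.0, 0.0]]  # norms 5.0 and 0.0
    print(average_l2_norm(toy))     # 2.5
    print(round(28.9 ** 2, 2))      # 835.21
```

For image data, each sample would be the flattened pixel array of one image after the same scaling the model sees.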

Could not find a version that satisfies the requirement absl==0.0

I created a new conda environment in wsl2 in order to set up the repo, but I get this error when I attempt to install requirements.txt. I see that absl_py==1.2.0 is already a requirement; is this extra requirement a mistake?

Question about the inference stage

Hi,

I was able to get the code to run and train on CIFAR-10. However, I am struggling to understand how to run inference with the model.

Let's say I have a trained model and random noise of shape (64, 3, 32, 32) (B, C, H, W). How would I use the existing model and restore_checkpoint ('from utils import restore_checkpoint') to create 64 new images?

Feel free to refer to any function in the repo.

Thanks,
Michael
