newbeeer / poisson_flow
Code for NeurIPS 2022 Paper, "Poisson Flow Generative Models" (PFGM)
License: Apache License 2.0
Hi Yilun, I have another question.
When I sampled from the bedroom_ddpmpp model, the following error occurred:
File "/home/xjtu/code/Poisson_flow/main.py", line 60, in <module>
app.run(main)
File "/home/xjtu/anaconda3/envs/sde/lib/python3.9/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/xjtu/anaconda3/envs/sde/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/home/xjtu/code/Poisson_flow/main.py", line 54, in main
run_lib.evaluate(FLAGS.config, FLAGS.workdir, FLAGS.eval_folder)
File "/home/xjtu/code/Poisson_flow/run_lib.py", line 345, in evaluate
eval_iter = iter(eval_ds) # pytype: disable=wrong-arg-types
UnboundLocalError: local variable 'eval_ds' referenced before assignment
I found that the cause is line 124 of datasets.py, which reads:
dataset_builder = tfds.builder(f'lsun/{config.data.category}', data_dir='/scratch/ylxu/tensorflow_datasets')
So I wonder why the config for the LSUN dataset is different from that of CIFAR10 or CELEBA (i.e., it has a hard-coded data_dir that I could not find).
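One way to avoid a machine-specific path like the one quoted above is to resolve the dataset directory at run time. The following is only a sketch under stated assumptions, not the repository's code: `resolve_data_dir` and its arguments are hypothetical names, and the fallbacks (the `TFDS_DATA_DIR` environment variable, then `~/tensorflow_datasets`) are assumptions about a typical TensorFlow Datasets setup.

```python
import os
from pathlib import Path


def resolve_data_dir(configured=None, env_var="TFDS_DATA_DIR"):
    """Pick a dataset directory: explicit value, then environment, then home default.

    Hypothetical helper for illustration; it is not part of the PFGM repository.
    """
    candidate = configured or os.environ.get(env_var)
    if not candidate:
        # Assumed default location; adjust to wherever the LSUN data actually lives.
        candidate = Path.home() / "tensorflow_datasets"
    return Path(candidate).expanduser()


if __name__ == "__main__":
    # The result could be passed as data_dir=str(...) when building the dataset.
    print(resolve_data_dir())
```

With something like this, the LSUN branch would no longer depend on a directory that exists only on the author's machine.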
Hi, I am using Ubuntu 20.04, and on a server with Python 3.8 and CUDA 11.6 I ran this command:
python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode train --workdir poisson_ddpmpp
The following problems occurred. Could you please check them? Thank you.
WARNING:tensorflow:From /root/miniconda3/envs/myconda/lib/python3.8/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
WARNING:tensorflow:From /root/miniconda3/envs/myconda/lib/python3.8/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
1 Physical GPUs, 1 Logical GPUs
I0129 10:36:33.654870 140394431960256 xla_bridge.py:356] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0129 10:36:33.660157 140394431960256 xla_bridge.py:356] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0129 10:36:33.661039 140394431960256 xla_bridge.py:356] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0129 10:36:33.661505 140394431960256 xla_bridge.py:356] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
2023-01-29 10:36:33.680217: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host
My environment already has CUDA 11.8 installed.
python==3.9.13
absl==0.0
+cu116 suffixes for the torch and torchvision packages
pip install -r requirements.txt
cifar10_ddpmpp.zip from the linked Google Drive, and extracted the checkpoint_500000.pth model file to the directory ./workdir/checkpoints
pip install "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
When running the following command:
python main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir ./workdir
I get this console output, and no files are generated:
2022-10-27 11:04:29.246149: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 11:04:29.348970: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 11:04:29.766714: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-10-27 11:04:29.766792: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-10-27 11:04:29.766821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
WARNING:tensorflow:From /home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
/home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
2022-10-27 11:05:10.565954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:966] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-10-27 11:05:10.688136: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-10-27 11:05:10.688179: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I1027 11:05:10.730823 140021456623424 xla_bridge.py:345] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1027 11:05:10.965448 140021456623424 xla_bridge.py:345] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
I1027 11:05:10.965839 140021456623424 xla_bridge.py:345] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1027 11:05:10.966645 140021456623424 dataset_info.py:439] Load dataset info from /home/<username>/tensorflow_datasets/cifar10/3.0.2
W1027 11:05:10.968783 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
W1027 11:05:10.968874 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
I1027 11:05:10.969661 140021456623424 dataset_builder.py:369] Reusing dataset cifar10 (/home/<username>/tensorflow_datasets/cifar10/3.0.2)
I1027 11:05:10.969773 140021456623424 logging_logger.py:44] Constructing tf.data.Dataset cifar10 for split train, from /home/<username>/tensorflow_datasets/cifar10/3.0.2
2022-10-27 11:05:10.972231: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
W1027 11:05:11.132379 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
W1027 11:05:11.132514 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
I1027 11:05:11.132633 140021456623424 dataset_builder.py:369] Reusing dataset cifar10 (/home/<username>/tensorflow_datasets/cifar10/3.0.2)
I1027 11:05:11.132725 140021456623424 logging_logger.py:44] Constructing tf.data.Dataset cifar10 for split test, from /home/<username>/tensorflow_datasets/cifar10/3.0.2
--- sampling eps: 0.001
I1027 11:05:12.034308 140021456623424 resolver.py:105] Using /tmp/tfhub_modules to cache modules.
I1027 11:05:13.515388 140021456623424 run_lib.py:308] begin checkpoint: 8
./workdir/checkpoints/checkpoint_400000.pth does not exist
./workdir/checkpoints/checkpoint_450000.pth does not exist
loading from ./workdir/checkpoints/checkpoint_500000.pth
./workdir/checkpoints/checkpoint_550000.pth does not exist
./workdir/checkpoints/checkpoint_600000.pth does not exist
./workdir/checkpoints/checkpoint_650000.pth does not exist
./workdir/checkpoints/checkpoint_700000.pth does not exist
./workdir/checkpoints/checkpoint_750000.pth does not exist
./workdir/checkpoints/checkpoint_800000.pth does not exist
./workdir/checkpoints/checkpoint_850000.pth does not exist
./workdir/checkpoints/checkpoint_900000.pth does not exist
./workdir/checkpoints/checkpoint_950000.pth does not exist
./workdir/checkpoints/checkpoint_1000000.pth does not exist
./workdir/checkpoints/checkpoint_1050000.pth does not exist
./workdir/checkpoints/checkpoint_1100000.pth does not exist
./workdir/checkpoints/checkpoint_1150000.pth does not exist
./workdir/checkpoints/checkpoint_1200000.pth does not exist
./workdir/checkpoints/checkpoint_1250000.pth does not exist
./workdir/checkpoints/checkpoint_1300000.pth does not exist
Hi, could you please provide the code for calculating the average L2 norm on CIFAR-10? Thanks!
Hi, I made sure to use Python 3.9.2 as you do and installed the packages from requirements.txt, but I still get this error. If I update TensorFlow, it raises other errors. May I ask why?
Hi Yilun, when I execute the following sampling command
python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir poisson/cifar10_ddpmpp --config.eval.enable_sampling --config.eval.save_images --config.eval.batch_size 100
a CalledProcessError is raised:
2022-10-27 16:39:03.643006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 16:39:03.846988: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 16:39:04.484731: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484847: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484860: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/xjtu/anaconda3/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
Traceback (most recent call last):
File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
subprocess.run(
File "/home/xjtu/anaconda3/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/xjtu/code/Poisson_flow/main.py", line 18, in <module>
import run_lib
File "/home/xjtu/code/Poisson_flow/run_lib.py", line 30, in <module>
from models import ncsnv2, ncsnpp
File "/home/xjtu/code/Poisson_flow/models/ncsnpp.py", line 18, in <module>
from . import utils, layers, layerspp, normalization
File "/home/xjtu/code/Poisson_flow/models/layerspp.py", line 20, in <module>
from . import up_or_down_sampling
File "/home/xjtu/code/Poisson_flow/models/up_or_down_sampling.py", line 10, in <module>
from op import upfirdn2d
File "/home/xjtu/code/Poisson_flow/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/home/xjtu/code/Poisson_flow/op/fused_act.py", line 11, in <module>
fused = load(
File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
return _jit_compile(
File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
:/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/bin/sh: 1: :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc: not found
ninja: build stopped: subcommand failed.
Do you have any idea?
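The failing command path `:/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc` suggests that the CUDA root (which PyTorch's extension builder typically takes from the `CUDA_HOME` environment variable) holds a PATH-style list instead of a single directory. This is a guess from the log, not a confirmed diagnosis. A minimal sketch of a sanity check follows; `check_cuda_home` is a hypothetical helper name.

```python
import os


def check_cuda_home(value, sep=os.pathsep):
    """Return a list of problems found in a CUDA_HOME-style value (empty if none)."""
    if not value:
        return ["CUDA_HOME is not set"]
    problems = []
    if sep in value:
        # A value such as ":/usr/local/cuda-11.0:/usr/local/cuda-11.0" lands here.
        problems.append("contains a path separator; expected a single directory")
    else:
        nvcc = os.path.join(value, "bin", "nvcc")
        if not os.path.isfile(nvcc):
            problems.append(f"nvcc not found at {nvcc}")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_home(os.environ.get("CUDA_HOME")):
        print("CUDA_HOME problem:", problem)
```

If the check reports a separator, setting the variable to one directory (for example /usr/local/cuda-11.0, assuming that is where the toolkit is installed) before rerunning would be the thing to try.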
Hello!
I'm extremely interested in Poisson Flow Generative Models (PFGM) and their improved versions. I would love to gain insights into the thought process behind designing such models. I was wondering if there are any plans to give a talk on PFGMs in the near future?
Best wishes
Hi, when I run "python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode train --workdir poisson_ddpmpp", the program keeps running, but there is no output in the terminal. I debugged it and found that execution stays stuck at this call:
upfirdn2d_op = load(
    "upfirdn2d",
    sources=[
        os.path.join(module_path, "upfirdn2d.cpp"),
        os.path.join(module_path, "upfirdn2d_kernel.cu"),
    ],
)
Could you please tell me what I should do?
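A JIT extension build that never returns is sometimes caused by a lock file left behind by an earlier, interrupted build. The sketch below lists such candidates. It rests on two assumptions that may not match your PyTorch version: that the extension cache is `TORCH_EXTENSIONS_DIR` if set, else `~/.cache/torch_extensions`, and that the lock file is literally named `lock`. `find_lock_files` is a hypothetical helper name.

```python
import os
from pathlib import Path


def find_lock_files(root):
    """Return sorted paths of files named 'lock' anywhere under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("lock") if p.is_file())


if __name__ == "__main__":
    # Assumed cache location; override with TORCH_EXTENSIONS_DIR if you use one.
    cache = os.environ.get("TORCH_EXTENSIONS_DIR") or Path.home() / ".cache" / "torch_extensions"
    for lock in find_lock_files(cache):
        print("possible stale lock:", lock)
```

The script only prints paths and deletes nothing. Whether a listed file is actually stale is for you to judge, for instance by checking that no other build is running.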
Hi, I have a question about something that confuses me.
Is the 'Average norm of data' in hyper-parameters.py the L1 norm or the L2 norm?
In B1.1 of the paper I found Ep(x)∣∣x∣∣^2 ≈ 900. The notation ∣∣x∣∣ looks like the L1 norm to me, but when I calculated it on the CIFAR-10 training set (50,000 images) using the L2 norm, the result was 28.9^2.
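For reference, 28.9^2 = 835.21. The mean of the squared norm and the square of the mean norm also differ in general, since E∣∣x∣∣^2 ≥ (E∣∣x∣∣)^2. A minimal NumPy sketch for computing both quantities follows. It is not the authors' code: `average_norms` is a hypothetical helper, and the pixel scaling of the input (for example [0, 1] versus [0, 255]) is an assumption you need to match to the training pipeline.

```python
import numpy as np


def average_norms(images):
    """Return (mean L2 norm, mean squared L2 norm) over a batch of images."""
    flat = np.asarray(images, dtype=np.float64).reshape(len(images), -1)
    l2 = np.linalg.norm(flat, axis=1)  # one L2 norm per flattened image
    return l2.mean(), (l2 ** 2).mean()


if __name__ == "__main__":
    # Placeholder data with CIFAR-10's per-image shape; replace with real images.
    rng = np.random.default_rng(0)
    fake = rng.uniform(0.0, 1.0, size=(8, 32, 32, 3))
    mean_l2, mean_sq = average_norms(fake)
    print(mean_l2, mean_sq)
```

Comparing the two returned values on the real training set shows which one the 'Average norm of data' setting corresponds to.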
I created a new conda environment in WSL2 in order to set up the repo, but I get this error when I attempt to install from requirements.txt. I see that absl_py==1.2.0 is already a requirement; is this extra requirement a mistake?
Hi,
I was able to get the code to run and train on CIFAR-10. However, I am struggling to understand how to run inference with the model.
Let's say I have a trained model and random noise of shape (64, 3, 32, 32) (B, C, H, W). How would I use the existing model and restore_checkpoint (from utils import restore_checkpoint) to create 64 new images?
Feel free to refer to any function in the repo.
Thanks,
Michael