poisson_flow's People

Contributors

newbeeer

poisson_flow's Issues

UnboundLocalError: local variable 'eval_ds' referenced before assignment

Hi Yilun, I have another question.
When I sample from the bedroom_ddpmpp model, the following error occurs:

  File "/home/xjtu/code/Poisson_flow/main.py", line 60, in <module>
    app.run(main)
  File "/home/xjtu/anaconda3/envs/sde/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/xjtu/anaconda3/envs/sde/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/xjtu/code/Poisson_flow/main.py", line 54, in main
    run_lib.evaluate(FLAGS.config, FLAGS.workdir, FLAGS.eval_folder)
  File "/home/xjtu/code/Poisson_flow/run_lib.py", line 345, in evaluate
    eval_iter = iter(eval_ds)  # pytype: disable=wrong-arg-types
UnboundLocalError: local variable 'eval_ds' referenced before assignment

I found that the cause is line 124 of datasets.py:

dataset_builder = tfds.builder(f'lsun/{config.data.category}', data_dir='/scratch/ylxu/tensorflow_datasets')

So I wonder why the configuration of the LSUN dataset differs from that of CIFAR10 or CELEBA, i.e. why it has a hard-coded data_dir that I cannot find on my machine.
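For readers hitting the same traceback, here is a minimal sketch of how an UnboundLocalError of this kind generally arises: a local variable is assigned only in some branches and then used unconditionally. The function names and branch conditions are hypothetical stand-ins, not the repository's actual code in run_lib.py.

```python
def evaluate(dataset_name):
    # Hypothetical stand-in for dataset branching; NOT the repo's code.
    if dataset_name == "CIFAR10":
        eval_ds = ["cifar10 batch"]
    elif dataset_name == "CELEBA":
        eval_ds = ["celeba batch"]
    # No branch assigns eval_ds for any other name, so this line raises
    # UnboundLocalError when dataset_name is, for example, "LSUN".
    return iter(eval_ds)


def evaluate_checked(dataset_name):
    # Same lookup, but an unknown dataset fails early with a clear message.
    loaders = {"CIFAR10": ["cifar10 batch"], "CELEBA": ["celeba batch"]}
    if dataset_name not in loaders:
        raise ValueError(f"no evaluation dataset configured for {dataset_name!r}")
    return iter(loaders[dataset_name])
```

If the real cause is the data_dir argument, pointing it at a directory where the LSUN data actually exists on your machine would be the thing to try; that is an assumption based only on the line quoted above.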

Error running after environment is configured on the server

Hi, I am using Ubuntu 20.04 on a server with Python 3.8 and CUDA 11.6, and I ran this command:
python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode train --workdir poisson_ddpmpp
The following output appeared. Could you please take a look? Thank you.

WARNING:tensorflow:From /root/miniconda3/envs/myconda/lib/python3.8/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

WARNING:tensorflow:From /root/miniconda3/envs/myconda/lib/python3.8/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

1 Physical GPUs, 1 Logical GPUs
I0129 10:36:33.654870 140394431960256 xla_bridge.py:356] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
I0129 10:36:33.660157 140394431960256 xla_bridge.py:356] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0129 10:36:33.661039 140394431960256 xla_bridge.py:356] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0129 10:36:33.661505 140394431960256 xla_bridge.py:356] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
2023-01-29 10:36:33.680217: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host 

A question about predicting "z"

Hello!
I have a question: why do we need the neural network to learn "z"? Both the direction and the magnitude of "z" can be obtained directly.

best wishes,
Huanran Chen

Steps I had to take in order to run this project on my machine locally

My environment already has Cuda 11.8 installed.

  1. Clone the repository locally
  2. Create a new conda environment with python==3.9.13
  3. Modify requirements.txt to remove the requirement for absl==0.0
  4. Modify requirements.txt to remove the +cu116 suffixes for the torch and torchvision packages (this change is part of why I was facing issues)
  5. pip install -r requirements.txt
  6. Install tensorflow 2.10.0 over the version specified in requirements.txt in order to support TensorFlow Probability
  7. Install ninja to load C++ extensions
  8. Download cifar10_ddpmpp.zip from the linked Google Drive and extract the checkpoint_500000.pth model file to the directory ./workdir/checkpoints
  9. pip install "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

When running the following command:
python main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir ./workdir

I get this console output, and no files are generated:

2022-10-27 11:04:29.246149: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 11:04:29.348970: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 11:04:29.766714: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-10-27 11:04:29.766792: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-10-27 11:04:29.766821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

WARNING:tensorflow:From /home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

/home/<username>/miniconda3/envs/poisson/lib/python3.9/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
2022-10-27 11:05:10.565954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:966] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-10-27 11:05:10.688136: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-10-27 11:05:10.688179: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I1027 11:05:10.730823 140021456623424 xla_bridge.py:345] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1027 11:05:10.965448 140021456623424 xla_bridge.py:345] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
I1027 11:05:10.965839 140021456623424 xla_bridge.py:345] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1027 11:05:10.966645 140021456623424 dataset_info.py:439] Load dataset info from /home/<username>/tensorflow_datasets/cifar10/3.0.2
W1027 11:05:10.968783 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
W1027 11:05:10.968874 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
I1027 11:05:10.969661 140021456623424 dataset_builder.py:369] Reusing dataset cifar10 (/home/<username>/tensorflow_datasets/cifar10/3.0.2)
I1027 11:05:10.969773 140021456623424 logging_logger.py:44] Constructing tf.data.Dataset cifar10 for split train, from /home/<username>/tensorflow_datasets/cifar10/3.0.2
2022-10-27 11:05:10.972231: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
W1027 11:05:11.132379 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
W1027 11:05:11.132514 140021456623424 options.py:588] options.experimental_threading is deprecated. Use options.threading instead.
I1027 11:05:11.132633 140021456623424 dataset_builder.py:369] Reusing dataset cifar10 (/home/<username>/tensorflow_datasets/cifar10/3.0.2)
I1027 11:05:11.132725 140021456623424 logging_logger.py:44] Constructing tf.data.Dataset cifar10 for split test, from /home/<username>/tensorflow_datasets/cifar10/3.0.2
--- sampling eps: 0.001
I1027 11:05:12.034308 140021456623424 resolver.py:105] Using /tmp/tfhub_modules to cache modules.
I1027 11:05:13.515388 140021456623424 run_lib.py:308] begin checkpoint: 8
./workdir/checkpoints/checkpoint_400000.pth does not exist
./workdir/checkpoints/checkpoint_450000.pth does not exist
loading from  ./workdir/checkpoints/checkpoint_500000.pth
./workdir/checkpoints/checkpoint_550000.pth does not exist
./workdir/checkpoints/checkpoint_600000.pth does not exist
./workdir/checkpoints/checkpoint_650000.pth does not exist
./workdir/checkpoints/checkpoint_700000.pth does not exist
./workdir/checkpoints/checkpoint_750000.pth does not exist
./workdir/checkpoints/checkpoint_800000.pth does not exist
./workdir/checkpoints/checkpoint_850000.pth does not exist
./workdir/checkpoints/checkpoint_900000.pth does not exist
./workdir/checkpoints/checkpoint_950000.pth does not exist
./workdir/checkpoints/checkpoint_1000000.pth does not exist
./workdir/checkpoints/checkpoint_1050000.pth does not exist
./workdir/checkpoints/checkpoint_1100000.pth does not exist
./workdir/checkpoints/checkpoint_1150000.pth does not exist
./workdir/checkpoints/checkpoint_1200000.pth does not exist
./workdir/checkpoints/checkpoint_1250000.pth does not exist
./workdir/checkpoints/checkpoint_1300000.pth does not exist
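
One detail in the log above is the PyTorch UserWarning: the installed build lists sm_37 sm_50 sm_60 sm_70, while the RTX 3080 Ti reports sm_86. This may be connected to removing the +cu116 suffixes in the steps above. The sketch below shows the kind of comparison involved. The helper names are hypothetical and the rule is a simplification, not PyTorch's own check. With PyTorch installed, the two inputs would come from torch.cuda.get_device_capability() and torch.cuda.get_arch_list().

```python
def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for entries that are not plain sm_XY."""
    prefix, _, digits = arch.partition("_")
    if prefix != "sm" or not digits.isdigit() or len(digits) < 2:
        return None
    return int(digits[:-1]), int(digits[-1])


def build_supports_device(device_capability, arch_list):
    """Simplified rule: a binary built for sm_XY can run on a device X.Z with Z >= Y."""
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue  # skip entries such as 'compute_70'
        major, minor = parsed
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Values copied from the warning in the log above.
arch_list = "sm_37 sm_50 sm_60 sm_70".split()
print(build_supports_device((8, 6), arch_list))  # prints False
```

If the check fails on your machine, installing a PyTorch build whose CUDA version covers your GPU (see the URL in the warning) is the usual remedy.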

CalledProcessError

Hi Yilun, when I execute the following sampling command:
python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir poisson/cifar10_ddpmpp --config.eval.enable_sampling --config.eval.save_images --config.eval.batch_size 100
a CalledProcessError is raised:

2022-10-27 16:39:03.643006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 16:39:03.846988: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 16:39:04.484731: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484847: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484860: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/xjtu/anaconda3/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
Traceback (most recent call last):
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/home/xjtu/anaconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xjtu/code/Poisson_flow/main.py", line 18, in <module>
    import run_lib
  File "/home/xjtu/code/Poisson_flow/run_lib.py", line 30, in <module>
    from models import ncsnv2, ncsnpp
  File "/home/xjtu/code/Poisson_flow/models/ncsnpp.py", line 18, in <module>
    from . import utils, layers, layerspp, normalization
  File "/home/xjtu/code/Poisson_flow/models/layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "/home/xjtu/code/Poisson_flow/models/up_or_down_sampling.py", line 10, in <module>
    from op import upfirdn2d
  File "/home/xjtu/code/Poisson_flow/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/xjtu/code/Poisson_flow/op/fused_act.py", line 11, in <module>
    fused = load(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o 
FAILED: fused_bias_act_kernel.cuda.o 
:/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o 
/bin/sh: 1: :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc: not found
ninja: build stopped: subcommand failed.

Do you have any idea?
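
A note on the log above: the shell reports that ":/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc" was not found. PyTorch builds the nvcc path by joining the CUDA home directory with bin/nvcc, so this suggests the CUDA_HOME (or CUDA_PATH) environment variable holds a colon-separated list rather than a single directory. That reading is an assumption drawn from the log, not something confirmed in this thread. The helper below is a hypothetical sketch for checking such a value on a POSIX system.

```python
import os


def diagnose_cuda_home(cuda_home):
    """Return a list of human-readable problems with a CUDA_HOME-style value."""
    problems = []
    if not cuda_home:
        problems.append("value is empty or unset")
        return problems
    # A CUDA home must be one directory; ':' is the PATH-list separator on POSIX.
    if ":" in cuda_home:
        problems.append("value contains the ':' separator; expected a single directory")
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append(f"nvcc not found at {nvcc}")
    return problems


# Value reconstructed from the failing command in the log above.
for problem in diagnose_cuda_home(":/usr/local/cuda-11.0:/usr/local/cuda-11.0"):
    print(problem)
```

On your own machine you could pass os.environ.get("CUDA_HOME", "") instead. If the value turns out to be a list, exporting a single directory such as /usr/local/cuda-11.0 before running main.py would be the thing to try.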

Request for a Talk on Poisson Flow Generative Models

Hello!

I'm extremely interested in Poisson Flow Generative Models (PFGM) and their improved versions. I would love to gain insights into the thought process behind designing such models. I was wondering if there are any plans to give a talk on PFGMs in the near future?

best wishes

upfirdn2d_op = load( "upfirdn2d", sources=[ os.path.join(module_path, "upfirdn2d.cpp"), os.path.join(module_path, "upfirdn2d_kernel.cu"), ], )

Hi, when I run "python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode train --workdir poisson_ddpmpp", the program keeps running but prints nothing to the terminal. I debugged it and found that execution is stuck at this call:

upfirdn2d_op = load(
    "upfirdn2d",
    sources=[
        os.path.join(module_path, "upfirdn2d.cpp"),
        os.path.join(module_path, "upfirdn2d_kernel.cu"),
    ],
)

Could you please tell me what I should do?
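
Two things may be worth checking here, both offered as assumptions rather than a confirmed diagnosis. First, the initial JIT compilation of a CUDA extension can take several minutes with no output. Second, torch.utils.cpp_extension.load coordinates builds through a file named lock in the extension's build directory, and a lock left behind by an interrupted run can make later calls wait. The sketch below lists such files. The default cache location and the helper names are assumptions, and TORCH_EXTENSIONS_DIR overrides the location if it is set.

```python
import os


def default_extensions_dir():
    # Assumed default cache location on Linux; TORCH_EXTENSIONS_DIR takes priority.
    return os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        os.path.expanduser("~/.cache/torch_extensions"),
    )


def find_lock_files(root):
    """Return sorted paths of files named 'lock' anywhere under root."""
    locks = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "lock" in filenames:
            locks.append(os.path.join(dirpath, "lock"))
    return sorted(locks)


for path in find_lock_files(default_extensions_dir()):
    print("possible stale lock:", path)
```

Only delete a listed lock file when no other process is currently building the extension. Passing verbose=True to load also shows the build steps as they run.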

Hyperparameters: l1 norm or l2 norm?

Hi, I have a question about something that confuses me.
Is the 'Average norm of data' in hyper-parameters.py the l1 norm or the l2 norm?
In B1.1 of the paper, E_{p(x)}‖x‖^2 ≈ 900. I read ‖x‖ as the notation for the l1 norm, but when I compute it on the CIFAR-10 training set (50,000 images) with the l2 norm, the result is 28.9^2.
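
To make the comparison concrete, here is a small pure-Python sketch of the two norms and of the average squared l2 norm over a set of samples. It assumes the common convention that an unsubscripted ‖x‖ denotes the l2 norm, which the authors have not confirmed in this thread. The toy vectors are illustrative, not CIFAR-10 data.

```python
import math


def l1_norm(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def l2_norm(x):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))


def mean_squared_l2_norm(samples):
    """Empirical estimate of E[||x||^2] with the l2 norm."""
    return sum(l2_norm(x) ** 2 for x in samples) / len(samples)


toy = [[3.0, -4.0], [0.0, 0.0]]
print(l1_norm(toy[0]))            # prints 7.0
print(l2_norm(toy[0]))            # prints 5.0
print(mean_squared_l2_norm(toy))  # prints 12.5
```

For scale, 28.9^2 is about 835, so the l2 value reported above is in the same range as the paper's 900, though not equal to it.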

Question with inferencing stage

Hi,

I was able to get the code to run and train on CIFAR-10. However, I am struggling to understand how to run inference with the model.

Let's say I have a trained model and random noise of shape (64, 3, 32, 32) (B, C, H, W). How would I use the existing model and restore_checkpoint (from utils import restore_checkpoint) to create 64 new images?

Feel free to refer to any function in the repo.

Thanks,
Michael
