
svd-temporal-controlnet's People

Contributors

blakeone, ciarastrawberry, eltociear


svd-temporal-controlnet's Issues

some other conditions

hi @CiaraStrawberry,
Thank you for open-sourcing such great work. Currently it only supports depth maps; I'm not sure whether it supports other conditions such as pose. Could I also train my own model on pose or other conditions?
Looking forward to your reply!

Torch not compiled with CUDA enabled

(venv) C:\svd-temporal-controlnet>python run_inference.py
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.1.2+cpu)
    Python  3.10.11 (you have 3.10.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
layers per block is 2
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 25.25it/s]
Traceback (most recent call last):
  File "C:\svd-temporal-controlnet\run_inference.py", line 274, in <module>
    video_frames = pipeline(validation_image, validation_control_images[:14], decode_chunk_size=8,num_frames=14,motion_bucket_id=100,controlnet_cond_scale=1.0).frames
  File "C:\svd-temporal-controlnet\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\svd-temporal-controlnet\pipeline\pipeline_stable_video_diffusion_controlnet.py", line 441, in __call__
    image_embeddings = self._encode_image(image, device, num_videos_per_prompt, do_classifier_free_guidance)
  File "C:\svd-temporal-controlnet\pipeline\pipeline_stable_video_diffusion_controlnet.py", line 155, in _encode_image
    image = image.to(device=device, dtype=dtype)
  File "C:\svd-temporal-controlnet\venv\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Also, can you tell me where the downloaded models are stored?
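For reference, this AssertionError means a CPU-only PyTorch build is installed (note the "2.1.2+cpu" in the xFormers warning above). A quick check, with an install command that assumes a CUDA 12.1 setup is appropriate for your machine:

# Check which PyTorch build is installed and whether CUDA is usable.
import torch

print(torch.__version__)          # "2.1.2+cpu" indicates a CPU-only build
print(torch.cuda.is_available())  # False on CPU-only builds

# If a "+cpu" build is reported, reinstall a CUDA wheel, e.g. for CUDA 12.1:
#   pip uninstall torch
#   pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121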

Great work. Curious about more details on training.

Hi,

Kudos on the great work on ControlSVD. I tested it on some of my own cases and the results look very appealing. Do you plan to release more details about training, e.g., the size of the dataset you used, hyper-parameters, etc.? It would also be very helpful if you could release a sample of the training data. Looking forward to your response.

Regards
[attached image: combined_frames_20240227-042221]

How to call this control node?

Hello, I started learning SVD not long ago and installed your plug-in, but I couldn't find the corresponding node configuration in ComfyUI. How can I bring up this control node in ComfyUI?
Thank you.

Sampling of sigma

Hi, it seems that a simple log-normal distribution should be used to sample sigma.

def rand_cosine_interpolated(shape, image_d, noise_d_low, noise_d_high, sigma_data=1., min_value=1e-3, max_value=1e3, device='cpu', dtype=torch.float32):

Replace it with:

import torch

def rand_log_normal(shape, loc=0., scale=1., device='cpu', dtype=torch.float32):
    """Draws samples from a log-normal distribution."""
    # Sample uniformly in (0, 1), nudged away from the endpoints for numerical stability.
    u = torch.rand(shape, dtype=dtype, device=device) * (1 - 2e-7) + 1e-7
    # Invert the normal CDF, then exponentiate to obtain log-normal samples.
    return torch.distributions.Normal(loc, scale).icdf(u).exp()

Relevant discussions can be found here.
pixeli99/SVD_Xtend#21
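For context, a usage sketch of the replacement sampler; the loc/scale values below follow the EDM-style log-normal setting (P_mean=0.7, P_std=1.6) often quoted for SVD fine-tuning, which is an assumption rather than a value taken from this repo:

# Sample per-example noise levels for a batch of 4 training examples.
# loc=0.7, scale=1.6 are assumed EDM-style values, not repo defaults.
sigmas = rand_log_normal([4], loc=0.7, scale=1.6)
print(sigmas)  # e.g. tensor([1.83, 0.95, 4.12, 2.07]); values vary per run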

Size of output

Hi, the pipeline takes height and width as parameters, but I always get 256×256 output from run_inference.
[attached image: temp_2_20231225-022759]

  • merry xmas
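A likely workaround for the 256×256 output, assuming this pipeline forwards height and width like the stock StableVideoDiffusionPipeline, is to pass them explicitly in the run_inference.py call; 576x1024 below is SVD's native resolution, used here as an assumption about what the checkpoint expects:

# Hypothetical variant of the run_inference.py call with explicit size.
video_frames = pipeline(
    validation_image,
    validation_control_images[:14],
    decode_chunk_size=8,
    num_frames=14,
    motion_bucket_id=100,
    controlnet_cond_scale=1.0,
    height=576,   # SVD's native height (assumed appropriate here)
    width=1024,   # SVD's native width
).frames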

Discussion with CFG

For inference in the paper https://arxiv.org/pdf/2211.09800.pdf, C_I and C_T seem to be two conditions, so there should be three inference passes and two guidance scales. You still use only C_T, not C_I, as the condition for inference. Have you tried the same approach that InstructPix2Pix uses?
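For reference, InstructPix2Pix combines the two conditions with three model evaluations and two guidance scales; a minimal sketch of that combination (variable names here are illustrative, not from this repo):

# InstructPix2Pix-style dual classifier-free guidance:
#   eps_uncond = eps(z, no image, no text)
#   eps_img    = eps(z, c_I, no text)
#   eps_full   = eps(z, c_I, c_T)
def combine_dual_cfg(eps_uncond, eps_img, eps_full, s_image=1.5, s_text=7.5):
    return (eps_uncond
            + s_image * (eps_img - eps_uncond)
            + s_text * (eps_full - eps_img))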

What is the motion value?

Hi, thanks for your great work!
I'm trying to adapt your code for my work. However, I don't know what the 'motion value' means; could you explain its usage?
Also, why can the motion value be converted into "add_time_ids"?
Thank you!
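For context: in the stock diffusers StableVideoDiffusionPipeline, motion_bucket_id is one of three scalar conditions (fps, motion_bucket_id, noise_aug_strength) that the UNet receives as added time embeddings; a higher value asks the model for more motion. A simplified sketch of the idea, not this repo's exact code:

import torch

# The three scalars are stacked into "add_time_ids" and later passed through
# sinusoidal embeddings into the UNet's added time embedding.
def get_add_time_ids(fps, motion_bucket_id, noise_aug_strength, batch_size):
    add_time_ids = torch.tensor([[fps, motion_bucket_id, noise_aug_strength]])
    return add_time_ids.repeat(batch_size, 1)  # shape: (batch_size, 3)

print(get_add_time_ids(fps=7, motion_bucket_id=127, noise_aug_strength=0.02, batch_size=2))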

Inquiry on VRAM and training time requirement

Hi, and thank you for this excellent open-source project!

Could you provide details on the GPU requirements for training, specifically regarding VRAM usage and expected training duration?

Thanks for your assistance!

Code in repo not working, outdated?

Oops, it looks like the code in this repo has gotten out of sync with your working copy...

Here are the issues that I've encountered so far: the motion_bucket_ids were on the wrong device, resulting in a runtime error; EulerDiscreteScheduler was being imported improperly, resulting in another runtime error; and the number of frames still seems to be hard-coded to 16 somewhere...
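For anyone hitting the same device error, a hedged sketch of the kind of one-line fix involved (the variable names are illustrative, not necessarily the repo's exact ones):

# Move the motion bucket tensor onto the same device as the latents before
# the UNet consumes it; mismatched devices raise a RuntimeError.
motion_bucket_ids = motion_bucket_ids.to(latents.device)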

I can try to PR my minimal fixes in but I wonder if it might be simpler and more comprehensive to upload the new, working code files that you're training with?

Either way thanks for sharing your project, it looks really exciting! :)

"exp_vml_cpu" not implemented for 'Half'

Hello, author. Thank you for the great open-source project. I encountered the following bug while training the code. Do you have any suggestions?

File "train_svd.py", line 1425, in <module> main() File "train_svd.py", line 1172, in main encoder_hidden_states = encode_image( File "train_svd.py", line 1031, in encode_image pixel_values = _resize_with_antialiasing(pixel_values, (224, 224)) File "train_svd.py", line 253, in _resize_with_antialiasing input = _gaussian_blur2d(input, ks, sigmas) File "train_svd.py", line 333, in _gaussian_blur2d kernel_x = _gaussian(kx, sigma[:, 1].view(bs, 1)) File "train_svd.py", line 320, in _gaussian gauss = torch.exp(-x.pow(2.0) / (2 * sigma.pow(2.0))) RuntimeError: "exp_vml_cpu" not implemented for 'Half'

What is the purpose of the motion values?

Thank you to the author for providing such a great open-source code!

I tried to train this code with OpenPose as the condition, but I can't understand what the motion value represents. What characteristic of the video frames does it reflect, and how should I process a video to obtain this motion value?

I have seen similar questions in the closed issues (link).

Please feel free to enlighten me, thank you.
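If it helps while waiting for an answer: one simple proxy, offered purely as an assumption about what "average motion" might mean here, is the mean absolute pixel difference between consecutive frames:

import numpy as np

def average_motion(frames):
    # `frames` is a sequence of same-shaped frames (e.g. uint8 HxWxC arrays).
    # This is a hypothetical motion proxy, not the repo's actual definition.
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])  # per-pixel change between frames
    return float(diffs.mean())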

sv3d

Thank you for your nice work.
Stability-AI recently released SV3D. https://sv3d.github.io/
Can your ControlNet be used directly on SV3D? Or do you have plans to develop training code?
Thanks!

reference materials

Thank you very much for your work. I still don't know much about the SVD scheduler. Can you recommend some reference materials?

More explanations for Motion value

Thank you to the author for providing such a great open-source code!

When I try to train the code, I don't quite understand what specific values are stored in "average_motion.txt". Could you provide more explanation to help me run the training code successfully?

diffusers==0.28.0 is not suitable for controlnet?

ImportError: cannot import name 'FromOriginalControlnetMixin' from 'diffusers.loaders'
When I use diffusers==0.28.0 I encounter this error, but when I downgrade to 0.27.0 it works. How can I use the ControlNet with diffusers==0.28.0?
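As a workaround sketch: newer diffusers releases folded the ControlNet single-file mixin into a generic model mixin, so a guarded import like the one below may help. The replacement name and module path are assumptions about diffusers >= 0.28; verify them against your installed version:

# Guarded import: diffusers < 0.28 exposes FromOriginalControlnetMixin, while
# newer releases are assumed to provide a generic single-file model mixin.
try:
    from diffusers.loaders import FromOriginalControlnetMixin
except ImportError:
    from diffusers.loaders.single_file_model import (  # assumed >= 0.28 location
        FromOriginalModelMixin as FromOriginalControlnetMixin,
    )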
