When running the example code, I keep getting the following error (see below). Do you

Error in Sample(): Expected scalar type float but found double

lucidrains commented on May 27, 2024

Error in Sample(): Expected scalar type float but found double

from phenaki-pytorch.

Comments (7)

gmegh commented on May 27, 2024 1

Yes, updating pytorch solved the issue, thanks!

Have you successfully trained the model?

gmegh commented on May 27, 2024 1

I had 1.10, I believe.

Regarding training, how can videos of different shapes be fed into the model? I tried adding a video with frames of size (300, 620), and it didn't work because the model expects the video to have dimension image_size.
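Until rectangular input is supported, one possible workaround is to resize every frame to a square of side image_size before passing the video to the model. The following is a minimal sketch, not part of phenaki-pytorch: `resize_video` is a hypothetical helper, it assumes the (batch, channels, frames, height, width) layout used in the README example, and it distorts the aspect ratio rather than cropping or padding.

```python
import torch
import torch.nn.functional as F

def resize_video(video: torch.Tensor, image_size: int) -> torch.Tensor:
    """Resize each frame of a (batch, channels, frames, height, width) video
    to (image_size, image_size). Hypothetical helper, not a library function."""
    b, c, f, h, w = video.shape
    # Fold the frame axis into the batch axis so each frame is a 2D image.
    frames = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    # Bilinear resize of every frame; the aspect ratio is not preserved.
    frames = F.interpolate(
        frames, size=(image_size, image_size), mode='bilinear', align_corners=False
    )
    # Restore the original (batch, channels, frames, height, width) layout.
    return frames.reshape(b, f, c, image_size, image_size).permute(0, 2, 1, 3, 4).contiguous()

video = torch.randn(1, 3, 17, 300, 620)   # rectangular frames, as in the question
resized = resize_video(video, 256)
print(resized.shape)                       # torch.Size([1, 3, 17, 256, 256])
```

If preserving the aspect ratio matters, a center crop or padding step before the resize may be a better fit than plain interpolation.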

lucidrains commented on May 27, 2024 1

@gmegh oh that is recent, maybe i should downgrade and fix the root issue

supporting rectangular sized video is actually possible with this architecture! let me put it in my todos

are you a phd student at Stanford?

lucidrains commented on May 27, 2024

@gmegh Hi Guillem! Could you paste your full script? Also, you won't see anything but noise if you don't train the model on a big corpus of images and video, if you were expecting something different

gmegh commented on May 27, 2024

Sure! Yeah, I understood that I would just get noise without training, but I was first trying to run it without training to see the output, and then train on that.

This is the full script (see below). I just copied the setup code you had in the README. Thanks for the help!

import torch
import sys
import os

from phenaki_pytorch import Phenaki, CViViT, MaskGit, MaskGitTrainWrapper, TokenCritic, CriticTrainer, make_video

maskgit = MaskGit(
    num_tokens = 5000,
    max_seq_len = 1024,
    dim = 512,
    dim_context = 768,
    depth = 6,
)

cvivit = CViViT(
    dim = 512,
    codebook_size = 5000,
    image_size = 256,
    patch_size = 32,
    temporal_patch_size = 2,
    spatial_depth = 4,
    temporal_depth = 4,
    dim_head = 64,
    heads = 8
)

phenaki = Phenaki(
    cvivit = cvivit,
    maskgit = maskgit
).cuda()

videos = torch.randn(3, 3, 17, 256, 256).cuda() # (batch, channels, frames, height, width)

texts = [
    'a whale breaching from afar',
    'young girl blowing out candles on her birthday cake',
    'fireworks with blue and green sparkles'
]

loss = phenaki(videos, texts)
loss.backward()

# do the above for many steps, then ...

video = phenaki.sample(text = 'a squirrel examines an acorn', num_frames = 17, cond_scale = 5.) # (1, 3, 17, 256, 256)

lucidrains commented on May 27, 2024

@gmegh oh strange, it runs for me, what version of pytorch are you on?
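When reporting a "float but found double" error, it can help to print the installed PyTorch version and check whether any tensor in the model is not float32. The sketch below is a generic diagnostic, not the fix identified in this thread (the root cause was not pinned down here): `non_float32_tensors` is a hypothetical helper, and the `scale` buffer is a stand-in added only to simulate a stray double tensor.

```python
import torch
from torch import nn

print(torch.__version__)  # the version string to include in a bug report

def non_float32_tensors(module: nn.Module) -> dict:
    """Map names of floating-point parameters and buffers that are not
    float32 to their dtype. Hypothetical helper for illustration."""
    found = {}
    named = list(module.named_parameters()) + list(module.named_buffers())
    for name, tensor in named:
        if tensor.is_floating_point() and tensor.dtype != torch.float32:
            found[name] = tensor.dtype
    return found

layer = nn.Linear(4, 2)
# Stand-in for a stray float64 tensor inside a larger model.
layer.register_buffer('scale', torch.ones(2, dtype=torch.float64))

before = non_float32_tensors(layer)
print(before)             # {'scale': torch.float64}

layer = layer.float()     # casts floating-point parameters and buffers to float32
after = non_float32_tensors(layer)
print(after)              # {}
```

The same helper could be called on the `phenaki` module from the script above, although a clean result would not rule out a double tensor being created inside `sample` itself.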

lucidrains commented on May 27, 2024

@gmegh oh great! which version were you on before?

no not yet, but should be in a state ready for training soon
