
Comments (3)

alexanderswerdlow commented on September 14, 2024

Same question here!

from ml-4m.

garjania commented on September 14, 2024

Hi @shaibagon @alexanderswerdlow

Regarding your script: the tokenizer was trained on inputs normalized with IMAGENET_INCEPTION_MEAN and IMAGENET_INCEPTION_STD. For correct tokenization and reconstruction, use these two values instead of the standard ImageNet ones when normalizing and denormalizing.
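
To make the difference concrete, here is a minimal sketch in plain Python (no torch) that compares the two sets of statistics on a single pixel. The Inception values of 0.5 per channel are the ones timm defines under the same names; that fourm.utils uses identical values is an assumption, and the two helper functions are only illustrative.

```python
# Standard ImageNet statistics (as used by torchvision and timm).
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
# Inception-style statistics as defined in timm; assumed to match fourm.utils.
IMAGENET_INCEPTION_MEAN = (0.5, 0.5, 0.5)
IMAGENET_INCEPTION_STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb, mean, std):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean, std):
    """Invert normalize_pixel: value * std + mean for each channel."""
    return tuple(c * s + m for c, m, s in zip(rgb, mean, std))


pixel = (0.0, 0.5, 1.0)  # one RGB pixel with channel values in [0, 1]

inception = normalize_pixel(pixel, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD)
imagenet = normalize_pixel(pixel, IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD)
roundtrip = denormalize_pixel(inception, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD)

print("inception:", inception)  # values stay within [-1, 1]
print("imagenet: ", imagenet)   # values fall outside [-1, 1]
print("roundtrip:", roundtrip)  # recovers the original pixel
```

With the Inception statistics, inputs in [0, 1] land in [-1, 1]. The ImageNet statistics push the same pixel outside that range, so the tokenizer would see inputs scaled differently from its training data.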

Note that the tokenizer only supports resolutions between 224 and 448, and it might not work outside this range. You also need to pass the image size to the decoder: since the RGB tokenizer uses a diffusion decoder, it needs the image size to sample the initial noise at the correct resolution. Overall, the script should look like this:

from fourm.vq.vqvae import DiVAE
from fourm.utils import denormalize, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD
from torchvision.transforms import Normalize

tok = DiVAE.from_pretrained('EPFL-VILAB/4M_tokenizers_rgb_16k_224-448').cuda()
normalize = Normalize(mean=IMAGENET_INCEPTION_MEAN, std=IMAGENET_INCEPTION_STD)

# encode (rgb_b3hw is the B x 3 x H x W image batch from your script)
_, _, tokens = tok.encode(normalize(rgb_b3hw).cuda())

# decode: the diffusion decoder needs the image size to sample the initial noise
image_size = rgb_b3hw.shape[-1]
rgb_b3hw = tok.decode_tokens(tokens, image_size=image_size)
# undo the Inception normalization to get back to the original value range
rgb_b3hw = denormalize(rgb_b3hw, mean=IMAGENET_INCEPTION_MEAN, std=IMAGENET_INCEPTION_STD)
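
If you want to catch unsupported inputs before they reach the tokenizer, a small guard like the one below may help. It is only a sketch: check_image_size is a hypothetical helper, not part of fourm, and it assumes square inputs because the script above passes a single integer as image_size. The 224 to 448 bounds are the ones mentioned above.

```python
MIN_SIZE, MAX_SIZE = 224, 448  # supported resolution range stated above


def check_image_size(shape):
    """Return the side length for a square B x 3 x H x W batch shape.

    Hypothetical helper (not part of fourm). Raises ValueError for shapes that
    are not B x 3 x H x W, are not square (an assumption of this sketch), or
    fall outside the supported resolution range.
    """
    if len(shape) != 4 or shape[1] != 3:
        raise ValueError(f"expected a B x 3 x H x W batch, got shape {tuple(shape)}")
    height, width = shape[-2], shape[-1]
    if height != width:
        raise ValueError(f"expected a square image, got {height} x {width}")
    if not MIN_SIZE <= height <= MAX_SIZE:
        raise ValueError(
            f"resolution {height} is outside the supported range "
            f"{MIN_SIZE} to {MAX_SIZE}"
        )
    return height


# With a torch tensor this would be called as check_image_size(rgb_b3hw.shape).
image_size = check_image_size((1, 3, 224, 224))
print(image_size)
```

The returned value can then be passed as image_size to decode_tokens in place of rgb_b3hw.shape[-1].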

Also note that, by default, the diffusion decoder uses 1000 timesteps to decode the tokens, which is unnecessary at inference time. You can decode in 50 steps, which is faster, by passing the timesteps argument:

tok.decode_tokens(tokens, image_size=image_size, timesteps=50)

Hope this helps.


shaibagon commented on September 14, 2024

@garjania - works like a charm!

Using 50 diffusion steps:
[image]

Using the full 1000 steps:
[image]

As you said, running the diffusion for 1000 steps does not make much of a difference.

