Comments (3)
Same question here!
from ml-4m.
Hi @shaibagon @alexanderswerdlow
Regarding your script: the tokenizer was trained with inputs normalized using the IMAGENET_INCEPTION_MEAN and IMAGENET_INCEPTION_STD parameters, so for correct tokenization/reconstruction you should use these two values instead of the standard ImageNet ones when normalizing and denormalizing.
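To see why matching the constants matters, here is a minimal pure-Python sketch of the round trip. The scalar values below are assumptions for illustration (the timm convention puts IMAGENET_INCEPTION_MEAN and IMAGENET_INCEPTION_STD at 0.5 per channel; 0.485/0.229 are the standard ImageNet stats for the red channel), not values read from fourm:

```python
# Illustration: mismatched normalize/denormalize constants corrupt pixel values.
INCEPTION_MEAN, INCEPTION_STD = 0.5, 0.5    # timm Inception convention (assumed)
IMAGENET_MEAN, IMAGENET_STD = 0.485, 0.229  # standard ImageNet stats, red channel

def normalize(x, mean, std):
    return (x - mean) / std

def denormalize(x, mean, std):
    return x * std + mean

pixel = 0.8
# Same stats on both sides: the original value comes back.
roundtrip_ok = denormalize(normalize(pixel, INCEPTION_MEAN, INCEPTION_STD),
                           INCEPTION_MEAN, INCEPTION_STD)
# Normalize with Inception stats but denormalize with ImageNet stats:
# the value is shifted and squashed (≈0.62 instead of 0.8).
roundtrip_bad = denormalize(normalize(pixel, INCEPTION_MEAN, INCEPTION_STD),
                            IMAGENET_MEAN, IMAGENET_STD)
print(roundtrip_ok, roundtrip_bad)
```

The same mismatch applied to a whole image shows up as washed-out or color-shifted reconstructions.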
Note that the tokenizer only supports resolutions between 224 and 448, and it may not work for resolutions outside this range. Also, you need to pass the image size to the decoder: since the RGB tokenizer uses a diffusion decoder, it needs the image size to sample the initial noise at the correct resolution. Overall, the script should look like this:
```python
from fourm.vq.vqvae import DiVAE
from fourm.utils import denormalize, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD
from torchvision.transforms import Normalize

tok = DiVAE.from_pretrained('EPFL-VILAB/4M_tokenizers_rgb_16k_224-448').cuda()
normalize = Normalize(mean=IMAGENET_INCEPTION_MEAN, std=IMAGENET_INCEPTION_STD)

# encode
_, _, tokens = tok.encode(normalize(rgb_b3hw).cuda())

# decode
image_size = rgb_b3hw.shape[-1]
rgb_b3hw = tok.decode_tokens(tokens, image_size=image_size)
rgb_b3hw = denormalize(rgb_b3hw, mean=IMAGENET_INCEPTION_MEAN, std=IMAGENET_INCEPTION_STD)
```
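Since out-of-range inputs can fail in non-obvious ways, it may help to validate the resolution before encoding. A minimal sketch: the check_resolution helper below is hypothetical (not part of fourm), and assumes both sides must lie in the 224-448 range stated above:

```python
# Hypothetical guard (not part of fourm): the tokenizer supports 224-448 px.
MIN_RES, MAX_RES = 224, 448

def check_resolution(h, w):
    """Raise if either side of the input falls outside the supported range."""
    for side in (h, w):
        if not MIN_RES <= side <= MAX_RES:
            raise ValueError(
                f"side {side} outside supported range [{MIN_RES}, {MAX_RES}]")
    return h, w

check_resolution(256, 384)  # fine: both sides in range
try:
    check_resolution(512, 512)  # too large: raises ValueError
except ValueError as e:
    print(e)
```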
Another note: by default, the diffusion decoder uses 1000 timesteps to decode the tokens, which is unnecessary during inference. You can make decoding faster by reducing this to 50 steps via the timesteps argument:

```python
tok.decode_tokens(tokens, image_size=image_size, timesteps=50)
```
Hope this helps.
@garjania - works like a charm!
Using 50 diffusion steps: [image]
Using full 1000 steps: [image]
As you said - diffusion for 1000 steps does not make much of a difference.
Related Issues (18)
- Fine-tune using LoRA
- Question on Token Masking in 4M Implementation HOT 1
- What are the minimum requirements to run an inference? HOT 1
- Training details of RGB tokenizer
- [Errno 2] No such file or directory: './fourm/utils/hmr2_utils/model_cfg.pkl' HOT 1
- Is it possible to prompt 4m
- CUDA? Are you kidding me?
- Examples of non-generative usage (and some additional discussion) HOT 11
- Input masks for generation - Potential small bug.
- Depth tokenizer
- Typo for tokenizer_path arg
- Object Detection with Caption HOT 2
- Example of generating image pixels from ImageBind modality HOT 1
- VRAM Requirements and Multi-GPU Inference Support
- how to convert the trained FM pth model file to safetensors format?
- CLIPScore moved in latest torchmetrics v1.4.0.post0
- What's the best way to use Color palette and another image to condition outputs? HOT 2