showlab / visorgpt
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
License: MIT License
I really appreciate your impressive work!
How is the KL divergence in Table 4 calculated? Is it the average of the KL values computed for each category, or is it computed across all categories as a whole? Also, for COCO, am I right that only 6,400 generated samples are used?
Thanks in advance for your kind response.
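To make the two readings of the question concrete, here is a minimal sketch of both aggregation schemes. It does not claim to reproduce how Table 4 was computed; the histogram binning, the `eps` smoothing, and the function names are all assumptions made for illustration, and the inputs are toy values in [0, 1).

```python
import math

def histogram(values, bins=8, lo=0.0, hi=1.0):
    """Normalized histogram of scalar values (assumed binning scheme)."""
    counts = [0] * bins
    for v in values:
        idx = min(max(int((v - lo) / (hi - lo) * bins), 0), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def per_category_mean_kl(real, generated, bins=8):
    """Reading 1: compute KL per category, then average over categories."""
    shared = sorted(set(real) & set(generated))
    if not shared:
        raise ValueError("no category appears in both inputs")
    scores = [
        kl_divergence(histogram(real[c], bins), histogram(generated[c], bins))
        for c in shared
    ]
    return sum(scores) / len(scores)

def pooled_kl(real, generated, bins=8):
    """Reading 2: pool all categories together, then compute a single KL."""
    real_all = [v for vs in real.values() for v in vs]
    gen_all = [v for vs in generated.values() for v in vs]
    return kl_divergence(histogram(real_all, bins), histogram(gen_all, bins))

# Toy data: the two categories are swapped between real and generated.
real = {"apple": [0.1], "cake": [0.9]}
generated = {"apple": [0.9], "cake": [0.1]}
mean_score = per_category_mean_kl(real, generated)
pooled_score = pooled_kl(real, generated)
```

In this toy case the pooled histograms are identical, so the pooled value is close to zero, while the per-category average is large. That is why the distinction matters when comparing numbers against the table.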
I want to know whether VisorGPT can generate objects of very different sizes in the same image, for example one large building with many small windows.
Hi, I followed the steps you provided and trained for 200,000 steps. When I ran inference, the generated_sentence.txt I got was different from the output sequence shown in the paper. When I write "box; multiple instances; medium; 4; 0; apple, apple, cake, knife;" in beginning.txt, I get "[CLS] box; multiple instances; medium; 4; 0; apple, apple, cake, knife; [ ] 176 ymin 188 xmax 236 ymax 426 ] [SEP] banana xmin 112 ymin 181 xmax 167 ymax 429 ] [SEP] ##r xmin 138 ymin 189 xmax 180 ymax 427 ] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] cell phone xmin 83 ymin 197 xmax 143 ymax 448 ] [SEP] [SEP] [SEP] 94 ymin 202 xmax 139 ymax 422 ] [SEP] [SEP] [SEP] [SEP] [SEP] [ SEP] xmin 144 ymin 182 xmax 230 ymax 420 ] [SEP] [SEP] [SEP] 185 ] [SEP] [SEP] [ xmin [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] xmin . ...". What does [SEP] mean here?
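When inspecting a generated sequence like the one quoted above, it can help to separate the well-formed boxes from the malformed fragments. The sketch below is only an illustration: it assumes each complete instance has the form `name xmin N ymin N xmax N ymax N` with a lowercase category name, which is inferred from the quoted output rather than from the repository's code, and `parse_boxes` is a hypothetical helper name.

```python
import re

# Assumed instance format: "<name> xmin N ymin N xmax N ymax N".
# The lookbehind skips word-piece fragments such as "##r".
BOX_PATTERN = re.compile(
    r"(?<![#\w])([a-z]+(?:\s[a-z]+)*)\s+xmin\s+(\d+)\s+ymin\s+(\d+)"
    r"\s+xmax\s+(\d+)\s+ymax\s+(\d+)"
)

def parse_boxes(sequence):
    """Return (name, xmin, ymin, xmax, ymax) for each well-formed instance."""
    boxes = []
    for match in BOX_PATTERN.finditer(sequence):
        name = match.group(1)
        xmin, ymin, xmax, ymax = (int(g) for g in match.groups()[1:])
        # Keep only boxes with positive width and height.
        if xmin < xmax and ymin < ymax:
            boxes.append((name, xmin, ymin, xmax, ymax))
    return boxes

# Excerpt of the generated text quoted in the question.
sample = (
    "[ ] 176 ymin 188 xmax 236 ymax 426 ] [SEP] "
    "banana xmin 112 ymin 181 xmax 167 ymax 429 ] [SEP] "
    "##r xmin 138 ymin 189 xmax 180 ymax 427 ] [SEP] [SEP] "
    "cell phone xmin 83 ymin 197 xmax 143 ymax 448 ]"
)
boxes = parse_boxes(sample)
```

On this excerpt only the `banana` and `cell phone` instances survive; the fragment without a name and the `##r` fragment are skipped, which gives a quick count of how much of the output is usable.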
Thanks for sharing your excellent work!
When I ran the training command, I got the following error:
Loading extension module utils...
Traceback (most recent call last):
File "pretrain.py", line 121, in
main()
File "pretrain.py", line 117, in main
trainer.train_and_validate(args)
File "/mnt/data-1/data/jiagang.zhu/VisorGPT/train/tencentpretrain/trainer.py", line 56, in train_and_validate
worker(args.local_rank, None, args, model_for_training, model_for_dataloader)
File "/mnt/data-1/data/jiagang.zhu/VisorGPT/train/tencentpretrain/trainer.py", line 593, in worker
model_for_training, optimizer, _, scheduler = deepspeed.initialize(
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 336, in init
self._configure_optimizer(optimizer, model_parameters)
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1284, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1533, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer(
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 165, in init
util_ops = UtilsBuilder().load()
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/mnt/data-1/data/jiagang.zhu/miniconda3/envs/visorgpt/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 556, in module_from_spec
File "", line 1166, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /home/user001/.cache/torch_extensions/py38_cu117/utils/utils.so: cannot open shared object file: No such file or directory
Have you encountered this problem before? Thank you.
Hello,
While loading GLIGEN, I got the following error. Are these the right weights?
File "gradio_demo.py", line 35, in <module>
g_config, g_grounding_tokenizer_input = build_gligen_model(ckpt=gligen_model_path)
File "/home/VisorGPT/demo/GLIGEN/gligen/gligen_inference_box.py", line 229, in build_gligen_model
model, autoencoder, text_encoder, diffusion, config = load_ckpt(ckpt)
File "/home/VisorGPT/demo/GLIGEN/gligen/gligen_inference_box.py", line 99, in load_ckpt
text_encoder.load_state_dict( saved_ckpt["text_encoder"] )
File "/opt/conda/envs/visorgpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FrozenCLIPEmbedder:
Unexpected key(s) in state_dict: "transformer.text_model.embeddings.position_ids".
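The message says the checkpoint contains one entry that the instantiated model does not register. A commonly used workaround for this kind of mismatch, which is often reported when the installed library version differs from the one used to save the checkpoint, is to drop the extra key before loading, or to pass `strict=False` to PyTorch's `load_state_dict`. The sketch below shows the first option on a plain dictionary; `drop_keys` is a hypothetical helper and the dictionary values are stand-ins, not real weights.

```python
def drop_keys(state_dict, unexpected_keys):
    """Return a copy of state_dict without the listed keys."""
    unexpected = set(unexpected_keys)
    return {k: v for k, v in state_dict.items() if k not in unexpected}

# Stand-in for saved_ckpt["text_encoder"]; real values would be tensors.
saved_text_encoder = {
    "transformer.text_model.embeddings.position_ids": [0, 1, 2],
    "transformer.text_model.embeddings.token_embedding.weight": [[0.0]],
}

cleaned = drop_keys(
    saved_text_encoder,
    ["transformer.text_model.embeddings.position_ids"],
)

# In the loading code one would then call, for example:
#   text_encoder.load_state_dict(cleaned)
```

Dropping only the key named in the error keeps strict checking for every other parameter, so a genuinely wrong checkpoint would still be reported.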
Hi Sierkinhane,
Very nice work. Could you provide the original training data file so that we can understand how your data is organized, and explain how to process it into visorgpt_dagger_train_seq.bin?
Thanks.
Thank you for your excellent work. While trying to reproduce the training process, I encountered the following problem. Have you encountered it before?
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/embeddings/word_embedding.py", line 27, in forward
emb = self.embedding(src)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/embeddings/embedding.py", line 27, in forward
emb = embedding(src, seg)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/models/model.py", line 33, in forward
emb = self.embedding(src, seg)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/trainer.py", line 160, in forward_propagation
loss_info = model(src, tgt, seg)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/trainer.py", line 110, in train
loss = self.forward_propagation(batch, model)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/trainer.py", line 638, in worker
trainer.train(args, gpu_id, rank, train_loader, model_for_training, optimizer, scheduler)
File "/storage/zhaoliuqing/code/VisorGPT/train/tencentpretrain/trainer.py", line 56, in train_and_validate
worker(args.local_rank, None, args, model_for_training, model_for_dataloader)
File "/storage/zhaoliuqing/code/VisorGPT/train/pretrain.py", line 117, in main
trainer.train_and_validate(args)
File "/storage/zhaoliuqing/code/VisorGPT/train/pretrain.py", line 121, in
main()
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Looking forward to your reply!