gwang-kim / diffusionclip
[CVPR 2022] Official PyTorch Implementation for DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models
License: Other
Thank you for your great work.
I am trying to fine-tune my own model.
I am using the CelebA-HQ dataset and trying to train models that change a face into a Neanderthal or Pixar style, but I get bad results.
My configuration for Neanderthal fine-tuning:
mode = "clip_finetune_eff"  # can't run with clip_finetune: GPU memory error
exp = './runs/finetune_Neanderthal'
edit_attr = "neanderthal"
n_train_img = 50
n_test_img = 10
n_iter = 50
t_0 = 500
n_inv_step = 40
n_train_step = 6
n_test_step = 40
lr_clip_finetune = 8e-6
id_loss_w = 0
l1_loss_w = 1
The results I get:
After 5 epochs (train):
After 8 epochs:
After 10 epochs:
The same happens for Pixar (with edit_attr = "pixar"):
After 5 epochs:
After 10 epochs:
After 15 epochs:
What am I doing wrong?
Thank you.
Hi, would you be interested in adding DiffusionCLIP to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community.
Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook
Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP
github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore
Here are guides for adding Spaces, models, and datasets to your org:
How to add a Space: https://huggingface.co/blog/gradio-spaces
How to add models: https://huggingface.co/docs/hub/adding-a-model
How to upload a dataset: https://huggingface.co/docs/datasets/upload_dataset.html
Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.
Thank you for releasing your work. I really appreciate your work!
I think it may be useful for my project, and I would like to know which software license this repo is released under. If possible, I'd like to reuse or modify your code for my own purposes.
Dear author,
The command "python main.py --edit_one_image" lets me see the edited face, but I also want to see the reconstructed face without any editing. How can I do that?
Thanks!
Are there any checkpoints for each attribute edit? For example, could we get a smiling.pth for testing custom images?
Thanks for your work. I have a question: I found that your dataset is in .pth format. I plan to make a dataset for my own work, but I don't know how to create one in .pth format. Could you please suggest a solution? Thank you!
Can I fine-tune DiffusionCLIP for 512*512 images on a 24GB GPU (RTX 4090)?
Hello dear authors, I have a small question about the model choice and parameter optimization.
I set "Human->Zombie". Here are the results:
a. ViT-B/16: just one epoch achieves a good result, as follows.
b. ViT-L/14@336px: with this model the results look strange. I don't know whether different parameters should be set to fine-tune the diffusion model. (Left to right: epochs 1-5.)
Where is human_face/curly_hair_t401.pth? I saw it in paths_config.py, but I couldn't find the file.
Hi,
This work is amazing!
I am trying to reproduce the bedroom editing results you have shown in the paper.
I can achieve them when I use the pretrained fine-tuned model provided on Google Drive.
However, when I try to fine-tune the model myself, I get very poor results.
This is the command I used for fine-tuning the model. Could you tell me if some of the hyperparameters are set incorrectly here?
python main.py --clip_finetune \
--config bedroom.yml \
--exp ./test_runs/bedrooms_full \
--edit_attr "bedroom_princess" \
--do_train 1 \
--do_test 1 \
--n_train_img 50 \
--n_test_img 10 \
--n_iter 5 \
--t_0 500 \
--n_inv_step 40 \
--n_train_step 6 \
--n_test_step 40 \
--lr_clip_finetune 8e-6 \
--id_loss_w 0 \
--l1_loss_w 1
-Gaurav
Hello author, does the CLIP loss in the paper cause the weights of the CLIP backbone to be fine-tuned as well?
Hello,
Even though you have DataParallel inside your code, it won't work properly because your implementation only supports bs_train=bs_test=1.
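To illustrate the point about batch size, here is a minimal sketch of how a batch is divided along its first dimension when it is scattered across several GPUs. It is a simplified model that mirrors chunk-style splitting and does not call PyTorch; the function name chunk_sizes is hypothetical.

```python
import math


def chunk_sizes(batch_size, n_gpus):
    """Sizes of the per-device chunks when a batch is split along dim 0.

    Simplified model of chunk-style splitting: each chunk holds
    ceil(batch_size / n_gpus) items, except possibly the last one.
    """
    if batch_size <= 0 or n_gpus <= 0:
        return []
    per_chunk = math.ceil(batch_size / n_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


# With a batch of one item there is only one chunk, so only one device
# receives work no matter how many GPUs are available.
print(chunk_sizes(1, 4))
print(chunk_sizes(8, 4))
```

Under this model, bs_train=bs_test=1 leaves every GPU but one idle, which matches the behavior described above.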
Hi, your work is amazing and inspiring. I am following your great work and trying to train the diffusion model on another dataset. Could you help with the training code? Thank you very much.
Is there any way to fine-tune in Colab, maybe by reducing the batch size or something similar?
If I change n_train_step to 2 during training and train for longer, can I get acceptable results? My 12 GB of VRAM cannot run the code with n_train_step=6.
Hi
I want to create my own .ckpt file trained on the CUB dataset. Can you please guide me on how to do that?
Thanks
Thanks for your great work! I find it amazing that the reconstruction error is so small in DiffusionCLIP. Where is the reconstruction code for Table 1? Many thanks.
Looking forward to your reply.
lmdb.Error: data/celeba_hq/LMDB_train: No such file or directory
I want to train a new effect on CelebA_HQ_dataset, thank you.
hi,
I got an error when running.
Downloading: "https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/celeba_hq.ckpt" to C:\Users\xx/.cache\torch\hub\checkpoints\celeba_hq.ckpt
ERROR - main.py - 2023-04-22 20:48:44,842 - Traceback (most recent call last):
File "main.py", line 212, in main
runner.clip_finetune()
File "E:\DL\generation\diffusionCLIP\DiffClip\diffusionclip.py", line 82, in clip_finetune
init_ckpt = torch.hub.load_state_dict_from_url(url, map_location=self.device)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\site-packages\torch\hub.py", line 731, in load_state_dict_from_url
download_url_to_file(url, cached_file, hash_prefix, progress=progress)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\site-packages\torch\hub.py", line 597, in download_url_to_file
u = urlopen(req)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
The pretrained models aren't downloaded automatically by the code. Could you share them on Google Drive?
Thanks.
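For context on the traceback above: torch.hub.load_state_dict_from_url only downloads when the file is not already in its cache directory, so a checkpoint obtained by other means and placed at that location would skip the failing request. The sketch below computes where that cache file is expected to be, using only the standard library. The helper names are hypothetical, and the default locations follow the documented TORCH_HOME and XDG_CACHE_HOME conventions.

```python
import os
from urllib.parse import urlparse


def expected_cache_path(url, hub_dir=None):
    """Return the path where torch.hub would look for the file behind `url`."""
    if hub_dir is None:
        # Default hub directory: $TORCH_HOME/hub, falling back to
        # $XDG_CACHE_HOME/torch/hub, then ~/.cache/torch/hub.
        torch_home = os.path.expanduser(
            os.getenv(
                "TORCH_HOME",
                os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "torch"),
            )
        )
        hub_dir = os.path.join(torch_home, "hub")
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(hub_dir, "checkpoints", filename)


def checkpoint_is_cached(url, hub_dir=None):
    """True if a file already exists at the expected cache location."""
    return os.path.isfile(expected_cache_path(url, hub_dir))


url = "https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/celeba_hq.ckpt"
print(expected_cache_path(url))
print(checkpoint_is_cached(url))
```

This only reports the cache location; it does not provide a replacement source for the checkpoints.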
Hello.
Thank you for your code. I am executing clip_finetune() on CelebA-HQ-256x256 and monitoring my GPU VRAM usage. I notice a gradual increase in VRAM while the latents are being precomputed from the given dataset. Is this normal? Also, which part of the code is responsible for this behavior? I thought VRAM usage would reach its peak at the beginning and remain steady throughout the training procedure. Given this behavior, a large enough n_precomp_img will eventually lead to a memory overflow, which is definitely not desired.
Thanks in advance.
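As a rough way to size the concern, the sketch below estimates how much memory the stored tensors would occupy if every precomputed image kept a fixed number of float32 tensors resident. The count of three tensors per image is an assumption made for illustration, not something confirmed by the code, and the estimate ignores activations and allocator overhead.

```python
def tensor_bytes(channels, height, width, bytes_per_value=4):
    """Size in bytes of one dense tensor (float32 by default)."""
    return channels * height * width * bytes_per_value


def precompute_footprint_mib(n_images, tensors_per_image=3, channels=3, size=256):
    """Estimated memory, in MiB, held by the stored tensors.

    tensors_per_image=3 is an assumed value for illustration only.
    """
    total = n_images * tensors_per_image * tensor_bytes(channels, size, size)
    return total / (1024 ** 2)


# Growth is linear in the number of precomputed images.
for n in (100, 1000, 10000):
    print(n, precompute_footprint_mib(n))
```

Under these assumptions one 256x256 RGB tensor takes 0.75 MiB, so the stored data grows by about 2.25 MiB per image. This does not identify which part of the code causes the growth.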
https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/bedroom.ckpt
https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/church_outdoor.ckpt
https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/celeba_hq.ckpt
The three links listed above are not valid any more.
Error Message: HTTP Error 403: Forbidden
Could you please provide a new link? Appreciate it!
Hi,
I am trying to use DiffusionCLIP for a different problem on different datasets. I was wondering if you have the training script available. Also, I tried to open diffusionclip.py, but everything seemed to be on a single line. Is it possible to get the indented version?
Many thanks!
Congratulations on the incredible work. I was playing with it, and it is incredible how it edits faces better than GAN inversion.
I'm disappointed that there aren't many face attributes to test.
Unfortunately, I don't have enough resources to train attributes.
I would really appreciate it if you could add these face attributes:
male --> female
child --> old man
old man --> child
HttpError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pydrive/files.py in FetchMetadata(self, fields, fetch_all)
236 fields=fields)
--> 237 .execute(http=self.http)
238 except errors.HttpError as error:
5 frames
HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v2/files/15Twto21spGLwiby7_yGO_xquFXtfJIAo?fields=alternateLink%2CappDataContents%2CcanComment%2CcanReadRevisions%2Ccopyable%2CcreatedDate%2CdefaultOpenWithLink%2Cdescription%2CdownloadUrl%2Ceditable%2CembedLink%2Cetag%2CexplicitlyTrashed%2CexportLinks%2CfileExtension%2CfileSize%2CfolderColorRgb%2CfullFileExtension%2CheadRevisionId%2CiconLink%2Cid%2CimageMediaMetadata%2CindexableText%2CisAppAuthorized%2Ckind%2Clabels%2ClastModifyingUser%2ClastModifyingUserName%2ClastViewedByMeDate%2CmarkedViewedByMeDate%2Cmd5Checksum%2CmimeType%2CmodifiedByMeDate%2CmodifiedDate%2CopenWithLinks%2CoriginalFilename%2CownedByMe%2CownerNames%2Cowners%2Cparents%2Cpermissions%2Cproperties%2CquotaBytesUsed%2CselfLink%2Cshareable%2Cshared%2CsharedWithMeDate%2CsharingUser%2Cspaces%2Cthumbnail%2CthumbnailLink%2Ctitle%2CuserPermission%2Cversion%2CvideoMediaMetadata%2CwebContentLink%2CwebViewLink%2CwritersCanShare&alt=json returned "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo". Details: "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo">
During handling of the above exception, another exception occurred:
ApiRequestError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pydrive/files.py in FetchMetadata(self, fields, fetch_all)
237 .execute(http=self.http)
238 except errors.HttpError as error:
--> 239 raise ApiRequestError(error)
240 else:
241 self.uploaded = True
ApiRequestError: <HttpError 404 when requesting https://www.googleapis.com/drive/v2/files/15Twto21spGLwiby7_yGO_xquFXtfJIAo?fields=alternateLink%2CappDataContents%2CcanComment%2CcanReadRevisions%2Ccopyable%2CcreatedDate%2CdefaultOpenWithLink%2Cdescription%2CdownloadUrl%2Ceditable%2CembedLink%2Cetag%2CexplicitlyTrashed%2CexportLinks%2CfileExtension%2CfileSize%2CfolderColorRgb%2CfullFileExtension%2CheadRevisionId%2CiconLink%2Cid%2CimageMediaMetadata%2CindexableText%2CisAppAuthorized%2Ckind%2Clabels%2ClastModifyingUser%2ClastModifyingUserName%2ClastViewedByMeDate%2CmarkedViewedByMeDate%2Cmd5Checksum%2CmimeType%2CmodifiedByMeDate%2CmodifiedDate%2CopenWithLinks%2CoriginalFilename%2CownedByMe%2CownerNames%2Cowners%2Cparents%2Cpermissions%2Cproperties%2CquotaBytesUsed%2CselfLink%2Cshareable%2Cshared%2CsharedWithMeDate%2CsharingUser%2Cspaces%2Cthumbnail%2CthumbnailLink%2Ctitle%2CuserPermission%2Cversion%2CvideoMediaMetadata%2CwebContentLink%2CwebViewLink%2CwritersCanShare&alt=json returned "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo". Details: "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo">
Hi,
Could you give more information on the gender dataset you used?
Regards,
Dyah
Could you please provide the code for obtaining the quantitative results reported in Table 1 and Table 3 of the paper?
Hi there, I'm trying to fine-tune your FFHQ model but get this error:
Loading ResNet ArcFace
Prepare identity latent
precomputed/CelebA_HQ_train_t500_nim100_ninv40_pairs.pth
ERROR - main.py - 2022-04-05 01:37:56,717 - Traceback (most recent call last):
File "main.py", line 211, in main
runner.clip_finetune()
File "/content/DiffusionCLIP/diffusionclip.py", line 143, in clip_finetune
train_dataset, test_dataset = get_dataset(self.config.data.dataset, DATASET_PATHS, self.config)
File "/content/DiffusionCLIP/datasets/data_utils.py", line 13, in get_dataset
train_dataset, test_dataset = get_celeba_dataset(dataset_paths['CelebA_HQ'], config)
File "/content/DiffusionCLIP/datasets/CelebA_HQ_dataset.py", line 55, in get_celeba_dataset
train_transform, config.data.image_size)
File "/content/DiffusionCLIP/datasets/CelebA_HQ_dataset.py", line 16, in __init__
meminit=False,
lmdb.Error: /data/DiffusionCLIP/celeba_hq/LMDB_train: No such file or directory
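The lmdb.Error above means the LMDB environment directory was not found at the configured path. The sketch below is one way to check a candidate path before launching training. It assumes the environment uses LMDB's default directory layout, in which the data lives in a file named data.mdb inside the directory; the function name is hypothetical.

```python
import os


def describe_lmdb_dir(path):
    """Report whether `path` looks like an LMDB environment directory.

    Assumes the default layout: a directory that contains data.mdb.
    """
    if not os.path.isdir(path):
        return "missing directory"
    if not os.path.isfile(os.path.join(path, "data.mdb")):
        return "directory exists but has no data.mdb"
    return "ok"


# Paths taken from the error messages reported on this page.
for candidate in ("data/celeba_hq/LMDB_train", "/data/DiffusionCLIP/celeba_hq/LMDB_train"):
    print(candidate, "->", describe_lmdb_dir(candidate))
```

If the dataset is stored elsewhere, the path that the code reads from its dataset path configuration would need to point to that directory.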
Hi,
Thanks for your work. I am trying the pretrained models on a few test images to see what the results look like. I tried tennis_baseball_t500.pth to see how it works. It works well when the tennis ball is well isolated, but not so well when the object is part of a scene. The paper says that fine-tuning needs 30 or so images. Were these images of well-isolated objects? If I replace them with images where the tennis ball is a small part of the image, will the performance improve?
Hello, would it be feasible to use a diffusion model and CLIP for face frontalization? I do not know whether CLIP responds effectively to "a frontal face". I hope you can give us some advice. Looking forward to your reply.
Hi,
Nice work! Which text_dic did you use for ImageNet manipulation? I couldn't find it in text_dic.py. Also, does the definition of text_dic affect the results? If so, how can I define text_dic better? Could you share some of your experience? Thank you.
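For readers unfamiliar with the idea, here is a hypothetical sketch of what a source-to-target text dictionary could look like. The structure is an assumption modeled on the "['Dog'] -> ['Bear']" lines that a training log on this page prints for --edit_attr dog_bear; the names EXAMPLE_TEXT_DIC and lookup_texts are illustrative and are not taken from the repository.

```python
# Hypothetical mapping from an edit attribute to (source texts, target texts).
# Only the dog_bear entry is grounded in a log shown on this page.
EXAMPLE_TEXT_DIC = {
    "dog_bear": (["Dog"], ["Bear"]),
}


def lookup_texts(edit_attr, text_dic=EXAMPLE_TEXT_DIC):
    """Return copies of the source and target text lists for an attribute."""
    if edit_attr not in text_dic:
        raise KeyError(f"no text pair defined for edit_attr={edit_attr!r}")
    src_txts, trg_txts = text_dic[edit_attr]
    return list(src_txts), list(trg_txts)


src, trg = lookup_texts("dog_bear")
print(src, "->", trg)
```

In a layout like this, adding an ImageNet edit would amount to adding one more key with its own source and target texts.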
Hello,
I am trying to fine-tune a pre-trained model on the AFHQ dataset for the dog_bear task using Colab.
I have successfully saved the pre-trained model and set up the dataset.
However, a value error occurs when I try to run the following cell.
!python main.py --clip_finetune_eff \
--config afhq.yml \
--exp ./runs/test \
--edit_attr dog_bear \
--do_train 1 \
--do_test 1 \
--n_train_img 50 \
--n_test_img 10 \
--n_iter 5 \
--t_0 500 \
--n_inv_step 40 \
--n_train_step 6 \
--n_test_step 40 \
--lr_clip_finetune 8e-6 \
--id_loss_w 0 \
--l1_loss_w 1
INFO - main.py - 2024-06-13 17:40:33,558 - Using device: cuda
INFO - main.py - 2024-06-13 17:40:33,559 - Exp instance id = 39862
INFO - main.py - 2024-06-13 17:40:33,559 - Exp comment =
INFO - main.py - 2024-06-13 17:40:33,559 - Config =
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
./runs/test_FT_dog_dog_bear_t500_ninv40_ngen6_id0.0_l11.0_lr8e-06
['Dog']
-> ['Bear']
Improved diffusion Model loaded.
Setting optimizer with lr=8e-06
Loading losses
Prepare identity latent
precomputed/dog_train_t500_nim100_ninv40_pairs.pth
ERROR - main.py - 2024-06-13 17:40:44,029 - Traceback (most recent call last):
File "/content/DiffusionCLIP/main.py", line 213, in main
runner.clip_finetune_eff()
File "/content/DiffusionCLIP/diffusionclip.py", line 423, in clip_finetune_eff
loader_dic = get_dataloader(train_dataset, test_dataset, bs_train=self.args.bs_train,
File "/content/DiffusionCLIP/datasets/data_utils.py", line 23, in get_dataloader
train_loader = DataLoader(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 350, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 143, in __init__
raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0
Could you help me?
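The ValueError above is raised by PyTorch's RandomSampler when the dataset it receives has length 0, which suggests that no training images were found at the configured path. The sketch below is one way to count image files under a folder before training. The extension list and the example folder path are assumptions, not the repository's settings.

```python
import os

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def count_images(root, exts=IMAGE_EXTS):
    """Count files under `root`, recursively, whose names end with `exts`.

    Returns 0 if `root` does not exist, because os.walk yields nothing.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.lower().endswith(exts))
    return total


# Hypothetical location; replace with the path the AFHQ loader is configured to read.
print(count_images("data/afhq/train/dog"))
```

A count of 0 here would be consistent with num_samples=0 in the traceback.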
Hi
Could you add a style mixing feature to your project?
Thanks