gwang-kim / diffusionclip


[CVPR 2022] Official PyTorch Implementation for DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models

License: Other

Python 99.33% Shell 0.67%


diffusionclip's Issues

How do I find the LMDB_train file or the data_root file?

Trying to reproduce paper results

Thank you for your great work.
I am trying to train (fine-tune) my own model.
I am running on the CelebA-HQ dataset and trying to train models that change a face into Neanderthal/Pixar, but I get bad results.

My configuration for Neanderthal fine-tuning:
mode = "clip_finetune_eff"  # can't run with clip_finetune: GPU memory error
exp = './runs/finetune_Neanderthal'
edit_attr = "neanderthal"
n_train_img = 50
n_test_img = 10
n_iter = 50
t_0 = 500
n_inv_step = 40
n_train_step = 6
n_test_step = 40
lr_clip_finetune = 8e-6
id_loss_w = 0
l1_loss_w = 1
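
For readers who want to reproduce this run from a shell, the settings above can be turned into a `main.py` command line. The sketch below is only an illustration: the helper `build_command` is hypothetical, and it assumes every setting maps to a flag of the same name, as the flags in the other commands on this page suggest. A real run would also need options that this list does not mention, such as `--config`.

```python
import shlex

# Settings copied from the configuration listed above.
settings = {
    "exp": "./runs/finetune_Neanderthal",
    "edit_attr": "neanderthal",
    "n_train_img": 50,
    "n_test_img": 10,
    "n_iter": 50,
    "t_0": 500,
    "n_inv_step": 40,
    "n_train_step": 6,
    "n_test_step": 40,
    "lr_clip_finetune": 8e-6,
    "id_loss_w": 0,
    "l1_loss_w": 1,
}


def build_command(mode, settings):
    """Hypothetical helper: return ['python', 'main.py', '--<mode>', '--key', 'value', ...]."""
    argv = ["python", "main.py", f"--{mode}"]
    for key, value in settings.items():
        argv += [f"--{key}", str(value)]
    return argv


command = build_command("clip_finetune_eff", settings)
print(shlex.join(command))  # shell-quoted form of the argument list
```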

The results I get:

After 5 epochs (train):

After 8 epochs:

After 10 epochs:

After 12 epochs:

The same happens for Pixar (with edit_attr = "pixar").
After 5 epochs:

After 10 epochs:

After 15 epochs:

What am I doing wrong?
Thank you.

Add web demo/model to Hugging Face

Hi, would you be interested in adding DiffusionCLIP to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

Here are guides for adding Spaces, models, and datasets to your org:

How to add a Space: https://huggingface.co/blog/gradio-spaces
How to add models: https://huggingface.co/docs/hub/adding-a-model
Uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

About the LICENSE

Thank you for releasing your work. I really appreciate your work!

I think it may be useful for my project, and I would like to know which software license this repo uses. If possible, I'd like to reuse or modify your code for my own purposes.

How to see the reconstructed face?

Dear author,

The command "python main.py --edit_one_image" lets me see the edited face, but I also want to see the reconstructed face without any editing. How can I do that?

Thanks!

Attribute editing checkpoints

Are there checkpoints for each attribute edit? For example, could we get a smiling.pth for testing custom images?

How to make a custom dataset

Thanks for your work. I have a question: I found that your dataset is in .pth format. I plan to make a dataset for my own work, but I do not know how to create one in .pth format. Please tell me a solution. Thank you!

Several questions about the paper

1) Main paper, Equation (1): the noise term should be $\sqrt{1-\alpha_t}\,w$.
2) Supplement, Algorithm 4, line 5: the same problem as in 1).
3) Supplement, Algorithm 1, lines 13-14, and Algorithm 2, lines 7-8: shouldn't these $x_{\tau_s}^{(i)}$ carry a hat?
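
The first point can be checked with a small calculation. The sketch below assumes the usual DDPM-style forward process, x_t = sqrt(alpha_t) * x_0 + c * w, with independent, unit-variance x_0 and w (a normalization assumed here for illustration, not taken from the paper). Under that assumption only c = sqrt(1 - alpha_t) keeps the variance of x_t at 1. The function name `marginal_variance` is hypothetical.

```python
import math


def marginal_variance(alpha_t, noise_coeff):
    """Var[sqrt(alpha_t) * x0 + noise_coeff * w] for independent unit-variance x0 and w."""
    return alpha_t + noise_coeff ** 2


for alpha_t in (0.9, 0.5, 0.1):
    with_sqrt = marginal_variance(alpha_t, math.sqrt(1 - alpha_t))
    without_sqrt = marginal_variance(alpha_t, 1 - alpha_t)
    print(f"alpha_t={alpha_t}: sqrt coeff -> {with_sqrt:.3f}, plain coeff -> {without_sqrt:.3f}")
```

With the square root the variance stays at 1 for every alpha_t; without it the variance shrinks at intermediate noise levels.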

Can I fine-tune DiffusionCLIP for 512*512 images on a 24GB GPU (RTX 4090)?

Different results in ViT-B/16 and ViT-L/14@336px

Hello dear authors, I have a small question about model choice and parameter optimization.
I set "Human->Zombie"; here are the results:
a. ViT-B/16: just one epoch achieves a good result, as follows.

b. ViT-L/14@336px: with this model the results seem strange. I don't know whether different parameters should be set to fine-tune the diffusion model. (Left to right: epochs 1-5.)

where is the `human_face/curly_hair_t401.pth`?

where is the human_face/curly_hair_t401.pth?

I saw it in paths_config.py, but I couldn't find the file.

Finetuning bedroom models

Hi,

This work is amazing!
I am trying to reproduce the bedroom editing results you have shown in the paper.
I can achieve them when I use the pretrained finetuned model provided in google drive.
However when I try to finetune the model myself, I get very poor results.
This is the command I used for fine-tuning the model. Could you tell me if some of the hyperparameters are set incorrectly here?

python main.py --clip_finetune \
            --config bedroom.yml      \
            --exp ./test_runs/bedrooms_full        \
            --edit_attr "bedroom_princess"  \
            --do_train 1             \
            --do_test 1              \
            --n_train_img 50         \
            --n_test_img 10          \
            --n_iter 5               \
            --t_0 500                \
            --n_inv_step 40          \
            --n_train_step 6         \
            --n_test_step 40         \
            --lr_clip_finetune 8e-6  \
            --id_loss_w 0            \
            --l1_loss_w 1

-Gaurav

Are the CLIP weights fine-tuned?

Hello author, does the CLIP loss in the paper cause the weights of the CLIP backbone to be fine-tuned as well?

Current implementation does not support multi-gpu finetuning.

Hello

Even though you have DataParallel inside your code, it won't work properly because your implementation only supports bs_train=bs_test=1.

Can't find the dog face pretrained model

I have a VPN, but I still can't find it. Can anyone help me? Thanks very much.

Request the code to train the Diffusion Model

Hi, your work is amazing and inspiring. I am following it and trying to train the diffusion model on another dataset. Could you help with the training code? Thank you very much.

Roughly how long does one effective training run take on a single RTX 3090/2080?

Is there any way to fine-tune in Colab?

Is there any way to fine-tune in Colab, maybe by reducing the batch size or something similar?

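I have a question and hope someone can answer

If I change n_train_step to 2 during training and train for longer, can I get acceptable results? My 12 GB of VRAM cannot run the code with n_train_step=6.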
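
To see what lowering n_train_step changes, the sketch below lists the timesteps that would be visited if they were evenly spaced between 0 and t_0. That spacing is an assumption made for illustration, and the repository's actual schedule may differ. The helper `evenly_spaced_steps` is hypothetical.

```python
def evenly_spaced_steps(t_0, n_steps):
    """Integer timesteps from 0 to t_0 inclusive, evenly spaced (assumed schedule)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    return [int(i * t_0 / (n_steps - 1)) for i in range(n_steps)]


six_steps = evenly_spaced_steps(500, 6)
two_steps = evenly_spaced_steps(500, 2)
print(six_steps)
print(two_steps)
```

Under this assumption, n_train_step=2 with t_0=500 covers the whole noise range in a single jump, so each step is much coarser than with six steps. Whether longer training compensates for that is not something this sketch can answer.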

Creating my .ckpt files

I want to create my own .ckpt file trained on the CUB dataset. Can you please guide me on how to do that?

Thanks

editing specific input image

Thank you for your work.
Is there an example of how to edit a specific face?
For example, turning a face from right to left, or making an angry face like the example in the paper.

Thanks

Where is the reconstruction code of Table 1?

Thanks for your great work! I find it amazing that the reconstruction error is so small in DiffusionCLIP. Where is the reconstruction code for Table 1? Many thanks.

Looking forward to your reply.

What is LMDB_train?

lmdb.Error: data/celeba_hq/LMDB_train: No such file or directory
I want to train a new effect on CelebA_HQ_dataset, thank you.

celeba_hq.ckpt etc. pretrained models

Hi,
I got an error when running:

Downloading: "https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/celeba_hq.ckpt" to C:\Users\xx/.cache\torch\hub\checkpoints\celeba_hq.ckpt
ERROR - main.py - 2023-04-22 20:48:44,842 - Traceback (most recent call last):
File "main.py", line 212, in main
runner.clip_finetune()
File "E:\DL\generation\diffusionCLIP\DiffClip\diffusionclip.py", line 82, in clip_finetune
init_ckpt = torch.hub.load_state_dict_from_url(url, map_location=self.device)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\site-packages\torch\hub.py", line 731, in load_state_dict_from_url
download_url_to_file(url, cached_file, hash_prefix, progress=progress)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\site-packages\torch\hub.py", line 597, in download_url_to_file
u = urlopen(req)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "D:\ProgramData\anaconda3\envs\diffusionclip\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The pretrained models are not downloaded automatically by the code. Could you share them on Google Drive?

Thanks.
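
Until the hosted files are reachable again, one workaround is to look for a manually obtained checkpoint on disk before attempting any download. The sketch below is a generic pattern, not the repository's code: `resolve_checkpoint` is a hypothetical helper, and the demo uses an empty stand-in file so that it runs without network access.

```python
import os
import tempfile
import urllib.error
import urllib.request


def resolve_checkpoint(local_path, url=None):
    """Return a usable checkpoint path, preferring a file already on disk."""
    if os.path.isfile(local_path):
        return local_path
    if url is None:
        raise FileNotFoundError(f"{local_path} not found and no URL given")
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    try:
        urllib.request.urlretrieve(url, local_path)
    except urllib.error.HTTPError as err:
        # A 403 like the one in the traceback above ends up here.
        raise RuntimeError(
            f"Download failed with HTTP {err.code}; "
            f"place the file at {local_path} manually"
        ) from err
    return local_path


# Demo with a stand-in file; no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "celeba_hq.ckpt")
    open(ckpt, "wb").close()
    found = resolve_checkpoint(ckpt)
    print(found == ckpt)
```

The returned path could then be passed to `torch.load(path, map_location=device)` in place of the `torch.hub.load_state_dict_from_url` call shown in the traceback.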

GPU VRAM load slowly increases

Hello.

Thank you for your code. I am running clip_finetune() on CelebA-HQ-256x256 while monitoring GPU VRAM usage, and I notice a gradual increase in VRAM while the latents are precomputed from the dataset. Is this normal? Which part of the code is responsible for this behavior? I expected VRAM usage to reach its peak at the start and remain steady throughout training. Given this behavior, a large enough n_precomp_img will eventually lead to a memory overflow, which is definitely not desired.

Thanks in advance.
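
A rough estimate shows how a cache of precomputed tensors would grow with n_precomp_img. The sketch assumes that each precomputed image keeps two float32 tensors of shape 3x256x256 on the GPU. The `_pairs.pth` file names in the logs on this page hint at pairs, but both the count and the device placement are assumptions, and the helper names are hypothetical.

```python
def tensor_bytes(channels, height, width, bytes_per_value=4):
    """Size in bytes of one dense tensor (float32 by default)."""
    return channels * height * width * bytes_per_value


def cache_mib(n_images, tensors_per_image=2, size=256):
    """Total size in MiB of the cached tensors for n_images RGB images."""
    per_tensor = tensor_bytes(3, size, size)
    return n_images * tensors_per_image * per_tensor / 2**20


for n in (100, 1000, 10000):
    print(f"n_precomp_img={n}: about {cache_mib(n):.0f} MiB")
```

Under these assumptions the growth is linear, about 1.5 MiB per image, so moving the cached tensors to the CPU as they are produced would keep GPU usage flat during precomputation.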

LSUN pretrained model link not valid

https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/bedroom.ckpt
https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/church_outdoor.ckpt
https://image-editing-test-12345.s3-us-west-2.amazonaws.com/checkpoints/celeba_hq.ckpt

The three links listed above are not valid any more.
Error Message: HTTP Error 403: Forbidden

Could you please provide a new link? Appreciate it!

Training script.

Hi,
I was trying to use DiffusionCLIP for a different problem on different datasets. I was wondering if you have the training script available. Also, I tried to open diffusionclip.py, but everything seemed to be on a single line. Is it possible to get the indented version?

Many thanks!

more face attributes

Congratulations on the incredible work. I was playing with it, and it is incredible how it edits faces better than GAN inversion.
I'm disappointed that there aren't many face attributes to test.
Unfortunately, I don't have enough resources to train attributes.
I would really appreciate it if you could add these face attributes:
male --> female
child --> old man
old man --> child

tanned model error

Downloading checkpoint/human_tanned_t201.pth ...

HttpError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pydrive/files.py in FetchMetadata(self, fields, fetch_all)
236 fields=fields)
--> 237 .execute(http=self.http)
238 except errors.HttpError as error:

5 frames
HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v2/files/15Twto21spGLwiby7_yGO_xquFXtfJIAo?fields=alternateLink%2CappDataContents%2CcanComment%2CcanReadRevisions%2Ccopyable%2CcreatedDate%2CdefaultOpenWithLink%2Cdescription%2CdownloadUrl%2Ceditable%2CembedLink%2Cetag%2CexplicitlyTrashed%2CexportLinks%2CfileExtension%2CfileSize%2CfolderColorRgb%2CfullFileExtension%2CheadRevisionId%2CiconLink%2Cid%2CimageMediaMetadata%2CindexableText%2CisAppAuthorized%2Ckind%2Clabels%2ClastModifyingUser%2ClastModifyingUserName%2ClastViewedByMeDate%2CmarkedViewedByMeDate%2Cmd5Checksum%2CmimeType%2CmodifiedByMeDate%2CmodifiedDate%2CopenWithLinks%2CoriginalFilename%2CownedByMe%2CownerNames%2Cowners%2Cparents%2Cpermissions%2Cproperties%2CquotaBytesUsed%2CselfLink%2Cshareable%2Cshared%2CsharedWithMeDate%2CsharingUser%2Cspaces%2Cthumbnail%2CthumbnailLink%2Ctitle%2CuserPermission%2Cversion%2CvideoMediaMetadata%2CwebContentLink%2CwebViewLink%2CwritersCanShare&alt=json returned "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo". Details: "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo">

During handling of the above exception, another exception occurred:

ApiRequestError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pydrive/files.py in FetchMetadata(self, fields, fetch_all)
237 .execute(http=self.http)
238 except errors.HttpError as error:
--> 239 raise ApiRequestError(error)
240 else:
241 self.uploaded = True

ApiRequestError: <HttpError 404 when requesting https://www.googleapis.com/drive/v2/files/15Twto21spGLwiby7_yGO_xquFXtfJIAo?fields=alternateLink%2CappDataContents%2CcanComment%2CcanReadRevisions%2Ccopyable%2CcreatedDate%2CdefaultOpenWithLink%2Cdescription%2CdownloadUrl%2Ceditable%2CembedLink%2Cetag%2CexplicitlyTrashed%2CexportLinks%2CfileExtension%2CfileSize%2CfolderColorRgb%2CfullFileExtension%2CheadRevisionId%2CiconLink%2Cid%2CimageMediaMetadata%2CindexableText%2CisAppAuthorized%2Ckind%2Clabels%2ClastModifyingUser%2ClastModifyingUserName%2ClastViewedByMeDate%2CmarkedViewedByMeDate%2Cmd5Checksum%2CmimeType%2CmodifiedByMeDate%2CmodifiedDate%2CopenWithLinks%2CoriginalFilename%2CownedByMe%2CownerNames%2Cowners%2Cparents%2Cpermissions%2Cproperties%2CquotaBytesUsed%2CselfLink%2Cshareable%2Cshared%2CsharedWithMeDate%2CsharingUser%2Cspaces%2Cthumbnail%2CthumbnailLink%2Ctitle%2CuserPermission%2Cversion%2CvideoMediaMetadata%2CwebContentLink%2CwebViewLink%2CwritersCanShare&alt=json returned "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo". Details: "File not found: 15Twto21spGLwiby7_yGO_xquFXtfJIAo">

Gender Dataset

Hi,

could you give more information on the gender dataset you used?

regards,
Dyah

Great Work! Few queries...

Could you please provide the code for obtaining the quantitative results reported in Table 1 and Table 3 of the paper?

Error trying to train

Hi there, I'm trying to fine-tune your FFHQ model but get this error:

Loading ResNet ArcFace
Prepare identity latent
precomputed/CelebA_HQ_train_t500_nim100_ninv40_pairs.pth
ERROR - main.py - 2022-04-05 01:37:56,717 - Traceback (most recent call last):
File "main.py", line 211, in main
runner.clip_finetune()
File "/content/DiffusionCLIP/diffusionclip.py", line 143, in clip_finetune
train_dataset, test_dataset = get_dataset(self.config.data.dataset, DATASET_PATHS, self.config)
File "/content/DiffusionCLIP/datasets/data_utils.py", line 13, in get_dataset
train_dataset, test_dataset = get_celeba_dataset(dataset_paths['CelebA_HQ'], config)
File "/content/DiffusionCLIP/datasets/CelebA_HQ_dataset.py", line 55, in get_celeba_dataset
train_transform, config.data.image_size)
File "/content/DiffusionCLIP/datasets/CelebA_HQ_dataset.py", line 16, in __init__
meminit=False,
lmdb.Error: /data/DiffusionCLIP/celeba_hq/LMDB_train: No such file or directory
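
The error means that no LMDB environment exists at the configured path. An LMDB environment is a directory containing `data.mdb` and `lock.mdb`, as in the directory listing in the "finetuning error" issue on this page. A small pre-flight check can tell a wrong path apart from a directory that was never populated. The helper `lmdb_dir_status` below is hypothetical and uses only the standard library.

```python
import os
import tempfile


def lmdb_dir_status(path):
    """Describe whether `path` looks like an LMDB environment directory."""
    if not os.path.isdir(path):
        return "missing directory"
    if not os.path.isfile(os.path.join(path, "data.mdb")):
        return "directory exists but has no data.mdb"
    return "ok"


# Demo on a temporary directory standing in for data/celeba_hq.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = os.path.join(tmp, "LMDB_train")
    missing = lmdb_dir_status(train_dir)
    os.makedirs(train_dir)
    empty = lmdb_dir_status(train_dir)
    open(os.path.join(train_dir, "data.mdb"), "wb").close()
    ready = lmdb_dir_status(train_dir)

print(missing, "|", empty, "|", ready)
```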

Text editing in non-isolated images

Hi,
Thanks for your work. I am trying the pretrained models on a few test images to see what the results look like. I tried tennis_baseball_t500.pth: it works well when the tennis ball is well isolated, but not so well when the object is part of a scene. The paper says fine-tuning needs 30 or so images; were these images well isolated? If I replace them with images where the tennis ball is a small part of the image, will the performance improve?

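Are CLIP text-guided features effective for a frontal face?

Hello, would it be feasible to use a diffusion model and CLIP for face frontalization? I do not know whether CLIP is effective for "a frontal face". I hope you can give us some advice. Looking forward to your reply.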

Text_dic for ImageNet manipulation

Hi,
Nice work! Which text_dic did you use for ImageNet manipulation? I didn't find it in text_dic.py. Also, does the definition of text_dic affect the results? If so, how can I define text_dic better? Could you share some experience? Thank you.

finetuning error

Hello,
I am trying to fine-tune a pre-trained model on the AFHQ dataset for the dog_bear task using Colab.
I have successfully saved the pre-trained model and set up the dataset.

data
└── afhq
    ├── LMDB_test
    │   ├── data.mdb
    │   └── lock.mdb
    ├── LMDB_train
    │   ├── data.mdb
    │   └── lock.mdb
    ├── LMDB_val
    │   ├── data.mdb
    │   └── lock.mdb
    └── raw_images
        ├── test
        │   └── images
        ├── test
        │   └── images
        └── val
            └── images

However, a value error occurs when I try to run the following cell.
!python main.py --clip_finetune_eff \
    --config afhq.yml \
    --exp ./runs/test \
    --edit_attr dog_bear \
    --do_train 1 \
    --do_test 1 \
    --n_train_img 50 \
    --n_test_img 10 \
    --n_iter 5 \
    --t_0 500 \
    --n_inv_step 40 \
    --n_train_step 6 \
    --n_test_step 40 \
    --lr_clip_finetune 8e-6 \
    --id_loss_w 0 \
    --l1_loss_w 1
INFO - main.py - 2024-06-13 17:40:33,558 - Using device: cuda

INFO - main.py - 2024-06-13 17:40:33,559 - Exp instance id = 39862
INFO - main.py - 2024-06-13 17:40:33,559 - Exp comment =
INFO - main.py - 2024-06-13 17:40:33,559 - Config =
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
./runs/test_FT_dog_dog_bear_t500_ninv40_ngen6_id0.0_l11.0_lr8e-06
['Dog']
-> ['Bear']
Improved diffusion Model loaded.
Setting optimizer with lr=8e-06
Loading losses
Prepare identity latent
precomputed/dog_train_t500_nim100_ninv40_pairs.pth
ERROR - main.py - 2024-06-13 17:40:44,029 - Traceback (most recent call last):
File "/content/DiffusionCLIP/main.py", line 213, in main
runner.clip_finetune_eff()
File "/content/DiffusionCLIP/diffusionclip.py", line 423, in clip_finetune_eff
loader_dic = get_dataloader(train_dataset, test_dataset, bs_train=self.args.bs_train,
File "/content/DiffusionCLIP/datasets/data_utils.py", line 23, in get_dataloader
train_loader = DataLoader(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 350, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 143, in __init__
raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0

Could you help me?
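
PyTorch's RandomSampler raises this ValueError when the dataset it is given has length zero, so the training split was seen as empty. One thing worth ruling out is an empty or misnamed image folder. The sketch below counts image files per split under a `raw_images`-style directory. The helper `count_images`, the extension list, and the demo file names are assumptions for illustration, and an empty folder is only one possible cause.

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of image types


def count_images(root):
    """Map each split directory under `root` to the number of image files inside it."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        total = 0
        for _, _, files in os.walk(split_dir):
            total += sum(
                os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS for name in files
            )
        counts[split] = total
    return counts


# Demo on a temporary tree: one image in train, none in val.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "train", "images"))
    os.makedirs(os.path.join(tmp, "val", "images"))
    open(os.path.join(tmp, "train", "images", "dog_0001.png"), "wb").close()
    counts = count_images(tmp)

print(counts)
```

A split that reports zero images here would produce an empty dataset downstream.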

Please add style mixing to your repository

Could you add a style mixing feature to your project?

Thanks
