yuxinwenrick / hard-prompts-made-easy Goto Github PK

View Code? Open in Web Editor NEW

591.0 591.0 54.0 27.78 MB

License: MIT License

Python 100.00%

hard-prompts-made-easy's People

Contributors

Stargazers

Watchers

hard-prompts-made-easy's Issues

Fluency loss

Does this version of the code reflect fluency loss?

Thanks :)

question about '<start_of_text>'

I see the code of prompt_inversion_sd.ipynb.I have a question,why dummy_text is the '<start_of_text>' * prompt_len ,then replace dummy_ids[1:prompt_len+1] with inputs_ids.I feel confused. Why do this?

import pez.open_clip problem? ModuleNotFoundError: No module named 'pez'

I am using a fork of this.. and I am terrible at code, but I do use GPT to help LOL

anyway, when I run, it says "ModuleNotFoundError: No module named 'pez'"

What code can I add to update? thanks!!!

Demo does not work on huggingface

Demo doesnt work on huggingface. can you please fix it? i use it very often. best wishes

Only for 2.1?

Hi, is this only for SD 2.1, can this be used for 1.5? I guess, I can just switch the clip model, right?

Questions around running this to get more usable prompts

This is such an awesome project. Thanks for building this. Trying to figure out how I would go about reverse engineering an intricate photorealistic portrait like this image

If I run this currently I get this:
best cosine sim: 0.4274442791938782
best prompt: beatrice wolfdgers haircreative oirswolivanka

And the images that it outputs are https://share.cleanshot.com/GNsS4hJ9

You mentioned additional steps to figure out the optimal prompt. I don't mind training further if it can reveal counter-intuitive keywords that reveal output that we'd like to get.

Thoughts?

Negative and weighted prompts

First, this is really cool! I'm mostly getting complete gibberish prompts (e.g. aamaaamagranddaughter admire illustrations lmp profile halsey fortnite followart ️ ultimatefangraphics hounews) but they still reliably reproduce some sense of the original illustration.

Second: many tools now allow you to specify weights for parts of the prompts, including negative weights. Do you think it would be possible to generalize this technique to produce prompts with weights? These are still a little bit "hard", in the sense of being comprehensible for humans, but allow a much finer level of precision, particularly with negative weights.

Prompt Optimization without CLIP Loss

Hello authors,
I was wondering the possibility of utilizing stable diffusion MSE loss instead of CLIP loss in PEZ's algorithm. Then, we can optimize the prompt directly through the generator's gradient. What is your take on this?

For SD XL?

Hello @YuxinWenRick , your paper and repo really helped improve my workflow. Thank you!

Meanwhile, I am wondering if I can apply this approach to SD-XL. It uses two text encoders (ViT-bigG and ViT-L). I found both in the official open_clip repo. But I am not sure how to combine them, like in the diffusers inference pipeline.

Can you point me to the right direction? Thanks.

Any extension for ComfyUI yet?

Hello
We have discussed existence of webUIs for stable diffusion extensions here before, but I believe there are none in ComfyUI, am I wrong?
Thanks

reproduce result that only uses soft prompt

Hi,

in the paper it is claimed that "We note that even though Stable Diffusion and CLIP share the same text encoder, soft prompts do not transfer well compared to all hard prompt methods in our evaluation".

How could I reproduce this soft prompt result with you code? I guess I need to directly pass the soft prompt embedding to the Stable diffusion but not sure how to pass this to SD, as SD only support hard prompt as input. Even though the SD accepts prompt embedding as input, the format of this prompt embedding is different from the one you optimized.

Thanks in advance for any guidance.

Thanks

do you plan to create this for sd 1.5 too?

algorithm 1, and the necessity of image encoder

hey, thank you for your great work!

i had a few questions regarding adapting this algorithm to another setup which may not use clip (e.g., imagen or ediffi).

have you experimented with transferring the prompts for image generation on other networks? table 2 does this for sst-2, but i'm not sure if there's any experiments on image generation.
if i wanted to take the algorithm and train for another text encoder, e.g., t5, how would i go about it? are there proxies to a contrastive image-text encoder pair which can be used for gradient reprojection?

thank you

yuxinwenrick / hard-prompts-made-easy Goto Github PK

hard-prompts-made-easy's People

Contributors

Stargazers

Watchers

Forkers

hard-prompts-made-easy's Issues

Recommend Projects

Recommend Topics

Recommend Org