
hard-prompts-made-easy's People

Contributors

bakkot, eltociear, YuxinWenRick



hard-prompts-made-easy's Issues

Fluency loss

Does this version of the code include the fluency loss?

Thanks :)

question about '<start_of_text>'

I was looking at the code in prompt_inversion_sd.ipynb and have a question: why is dummy_text built as '<start_of_text>' * prompt_len, and why is dummy_ids[1:prompt_len+1] then replaced with input_ids? I'm confused about why it's done this way.
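For context, here is my reading of that step as a minimal pure-Python sketch (the SOT/EOT token ids are open_clip's real values; the rest is my assumption, not the repo's exact code). CLIP's text encoder expects a fixed-length id sequence of the form [SOT, tok_1, ..., tok_n, EOT, pad...], so building dummy ids from '<start_of_text>' repeated prompt_len times just reserves prompt_len slots, which are then overwritten with the projected token ids:

```python
SOT, EOT, PAD = 49406, 49407, 0   # open_clip SOT/EOT ids; zero-padding assumed
CONTEXT_LEN = 77                  # CLIP's fixed context length

def build_ids(input_ids, prompt_len):
    """Reserve prompt_len learnable slots between SOT and EOT, then splice in ids."""
    dummy_ids = [SOT] + [SOT] * prompt_len + [EOT]       # placeholder slots
    dummy_ids += [PAD] * (CONTEXT_LEN - len(dummy_ids))  # pad to the context length
    dummy_ids[1:prompt_len + 1] = input_ids              # overwrite placeholders
    return dummy_ids

ids = build_ids([320, 1125, 539], prompt_len=3)
# ids[0] is SOT, ids[1:4] are the optimized token ids, ids[4] is EOT
```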

Only for 2.1?

Hi, is this only for SD 2.1, or can it be used for 1.5 as well? I guess I can just switch the CLIP model, right?
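My understanding (an assumption on my part, though the open_clip identifiers below are real) is that the swap amounts to picking the open_clip (model, pretrained) pair that matches each SD version's text-encoder lineage:

```python
# Assumed mapping (my own table, not from the repo) from SD version to the
# open_clip (model, pretrained) pair matching its text encoder.
CLIP_FOR_SD = {
    "2.1": ("ViT-H-14", "laion2b_s32b_b79k"),  # OpenCLIP ViT-H/14
    "1.5": ("ViT-L-14", "openai"),             # OpenAI ViT-L/14
}

def clip_args(sd_version):
    """Return the (model, pretrained) pair for a given SD version."""
    model, pretrained = CLIP_FOR_SD[sd_version]
    return model, pretrained
```

These would then be the arguments to `open_clip.create_model_and_transforms(model, pretrained=pretrained)`.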

Questions around running this to get more usable prompts

This is such an awesome project. Thanks for building this. I'm trying to figure out how I would go about reverse engineering an intricate photorealistic portrait like this image

If I run this currently I get this:
best cosine sim: 0.4274442791938782
best prompt: beatrice wolfdgers haircreative oirswolivanka

And the images that it outputs are https://share.cleanshot.com/GNsS4hJ9

You mentioned additional steps to figure out the optimal prompt. I don't mind training further if it can surface counter-intuitive keywords that produce the kind of output we'd like to get.

Thoughts?
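One generic strategy sketch (not the authors' recommendation, and `optimize` below is a stand-in, not the repo's real API): since the optimization is stochastic, restarting with different seeds and keeping the prompt with the highest cosine similarity often yields more usable results than a single run.

```python
def best_of_n(optimize, seeds):
    """Run several restarts; keep the (sim, prompt) pair with the highest sim."""
    return max((optimize(s) for s in seeds), key=lambda r: r[0])

# toy stand-in optimizer, purely for illustration
fake_runs = {0: (0.31, "a"), 1: (0.43, "b"), 2: (0.39, "c")}
sim, prompt = best_of_n(lambda s: fake_runs[s], [0, 1, 2])
# keeps the seed-1 result, since 0.43 is the best similarity
```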

Negative and weighted prompts

First, this is really cool! I'm mostly getting complete gibberish prompts (e.g. aamaaamagranddaughter admire illustrations lmp profile halsey fortnite followart ๏ธ ultimatefangraphics hounews) but they still reliably reproduce some sense of the original illustration.

Second: many tools now allow you to specify weights for parts of the prompts, including negative weights. Do you think it would be possible to generalize this technique to produce prompts with weights? These are still a little bit "hard", in the sense of being comprehensible for humans, but allow a much finer level of precision, particularly with negative weights.
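A speculative sketch of the output side of such a generalization: if each token carried a learned weight, negative-weight tokens could be routed to a negative prompt and the rest emitted in the common `(token:weight)` syntax many UIs accept. Everything here is hypothetical and not part of PEZ:

```python
def format_weighted_prompt(tokens, weights):
    """Render tokens with weights; negative-weight tokens go to the negative prompt."""
    positive, negative = [], []
    for tok, w in zip(tokens, weights):
        bucket = positive if w >= 0 else negative
        # unit weight needs no annotation; otherwise emit (token:weight)
        bucket.append(f"({tok}:{abs(w):.2f})" if abs(w) != 1.0 else tok)
    return ", ".join(positive), ", ".join(negative)

pos, neg = format_weighted_prompt(["halsey", "fortnite", "blurry"], [1.0, 1.3, -0.8])
# pos == "halsey, (fortnite:1.30)", neg == "(blurry:0.80)"
```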

Prompt Optimization without CLIP Loss

Hello authors,
I was wondering about the possibility of using the Stable Diffusion MSE (denoising) loss instead of the CLIP loss in PEZ's algorithm. Then we could optimize the prompt directly through the generator's gradient. What is your take on this?
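Conceptually the gradient path does exist. As a toy sketch (a tiny linear layer standing in for the UNet; nothing here is PEZ's or Stable Diffusion's actual code), the denoising MSE propagates a gradient back to the prompt embedding:

```python
import torch

torch.manual_seed(0)
prompt_embed = torch.randn(1, 8, requires_grad=True)  # learnable soft prompt
unet = torch.nn.Linear(8, 8)                          # stand-in for SD's UNet
target_noise = torch.randn(1, 8)                      # the eps in the DDPM loss

pred_noise = unet(prompt_embed)                       # "denoising" prediction
loss = torch.nn.functional.mse_loss(pred_noise, target_noise)
loss.backward()                                       # gradient reaches the prompt
```

The open questions would be cost (backprop through the full UNet per step) and how noisy the gradients are across sampled timesteps.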

For SD XL?

Hello @YuxinWenRick , your paper and repo really helped improve my workflow. Thank you!

Meanwhile, I am wondering if I can apply this approach to SD-XL, which uses two text encoders (ViT-bigG and ViT-L). I found both in the official open_clip repo, but I am not sure how to combine them the way the diffusers inference pipeline does.

Can you point me in the right direction? Thanks.
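Not authoritative, but my reading of the diffusers SDXL pipeline is that it concatenates the two encoders' per-token hidden states along the feature axis (ViT-L's 768 plus ViT-bigG's 1280 gives 2048), and additionally feeds the bigG pooled output in as extra conditioning. A shapes-only sketch with plain lists, no models loaded:

```python
SEQ_LEN = 77                   # CLIP context length
DIM_VIT_L, DIM_BIGG = 768, 1280

def combine_hidden_states(h_l, h_bigg):
    """Concatenate each token's feature vectors from the two encoders."""
    assert len(h_l) == len(h_bigg) == SEQ_LEN
    return [a + b for a, b in zip(h_l, h_bigg)]  # list concat per token

h_l = [[0.0] * DIM_VIT_L for _ in range(SEQ_LEN)]
h_bigg = [[0.0] * DIM_BIGG for _ in range(SEQ_LEN)]
combined = combine_hidden_states(h_l, h_bigg)
# each of the 77 tokens now carries 2048 features
```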

Any extension for ComfyUI yet?

Hello
We have discussed the existence of webUI extensions for Stable Diffusion here before, but I believe there are none for ComfyUI. Am I wrong?
Thanks

reproduce result that only uses soft prompt

Hi,

in the paper it is claimed that "We note that even though Stable Diffusion and CLIP share the same text encoder, soft prompts do not transfer well compared to all hard prompt methods in our evaluation".

How could I reproduce this soft prompt result with your code? I guess I need to pass the soft prompt embedding directly to Stable Diffusion, but I am not sure how, since SD normally only supports hard prompts as input. Even where SD accepts a prompt embedding, the format of that embedding is different from the one you optimize.
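For what it's worth, diffusers pipelines do accept precomputed embeddings via the real `prompt_embeds` argument, so a soft prompt can bypass the tokenizer. The catch (my assumption about the mismatch you mention) is that `prompt_embeds` expects text-encoder *outputs*, so a soft prompt that lives at the encoder *input* must first be spliced into a full 77-slot sequence and forwarded through the encoder. A shapes-only sketch with plain lists standing in for tensors:

```python
SEQ_LEN, HIDDEN = 77, 1024     # SD 2.1 text-encoder output shape (assumed)

def splice_soft_prompt(soft_prompt, base):
    """Place the optimized soft-prompt rows right after position 0 (the SOT slot)."""
    out = [row[:] for row in base]
    out[1:1 + len(soft_prompt)] = [row[:] for row in soft_prompt]
    return out

base = [[0.0] * HIDDEN for _ in range(SEQ_LEN)]   # stand-in embedding sequence
soft = [[1.0] * HIDDEN for _ in range(8)]         # 8 optimized embedding rows
embeds = splice_soft_prompt(soft, base)
# with a real pipeline: run this through the text encoder, then
# pipe(prompt_embeds=...) on the result
```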

Thanks in advance for any guidance.

gpu detection fails

run.py fails to detect the GPU. I have a 3090 Ti with all the current drivers installed, and GPU detection works in everything else.
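A common cause of this symptom (an assumption about this report, not a confirmed diagnosis) is a CPU-only PyTorch wheel rather than a driver problem. This quick check shows what torch itself sees:

```python
import torch

print(torch.__version__)             # a "+cpu" suffix means a CPU-only wheel
print(torch.cuda.is_available())     # False on CPU-only builds, even with a 3090 Ti
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If `is_available()` is False, reinstalling torch from the CUDA index usually fixes it.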

Also, your instructions for activating the venv do not work as written.

[Feature] Can you make a simple UI for this?

Hello,
Could you make a simple UI, maybe in gradio, for this?
This is one of the most impressive tools I know of, and yet only a few people in the world of Stable Diffusion etc. are aware of it.

Btw, does the prompt generation depend on a model, or does it work in a "general" way? I mean, can I select a model and obtain different prompts depending on the model?
(Sorry I am not that AI technical to understand how it works in the details).

Thanks

algorithm 1, and the necessity of image encoder

hey, thank you for your great work!

i had a few questions regarding adapting this algorithm to another setup which may not use clip (e.g., imagen or ediffi).

  1. have you experimented with transferring the prompts for image generation to other networks? table 2 does this for sst-2, but i'm not sure if there are any experiments on image generation.

  2. if i wanted to take the algorithm and train for another text encoder, e.g., t5, how would i go about it? are there proxies for a contrastive image-text encoder pair that could be used for gradient reprojection?

thank you
