hard-prompts-made-easy's People

Contributors

bakkot, eltociear, yuxinwenrick

hard-prompts-made-easy's Issues

Questions around running this to get more usable prompts

This is such an awesome project. Thanks for building this. I'm trying to figure out how to reverse-engineer a prompt for an intricate photorealistic portrait like this image.

If I run this currently I get this:
best cosine sim: 0.4274442791938782
best prompt: beatrice wolfdgers haircreative oirswolivanka

The images it outputs are here: https://share.cleanshot.com/GNsS4hJ9

You mentioned additional steps to find the optimal prompt. I don't mind training further if it surfaces counter-intuitive keywords that produce the output we'd like to get.

Thoughts?

GPU detection fails

run.py fails to detect the GPU. I have a 3090 Ti and all the current drivers installed. GPU detection works in everything else.

Also your instructions for activating the venv do not work as written.

question about '<start_of_text>'

I read the code in prompt_inversion_sd.ipynb and have a question: why is dummy_text set to '<start_of_text>' * prompt_len, with dummy_ids[1:prompt_len+1] then replaced by inputs_ids? I'm confused. Why is it done this way?
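
The pattern described in this question can be sketched in plain Python. This is one plausible reading, not confirmed by the authors: repeating the start token yields a fixed-length template in which the start token, end token, and padding are already in place, so only the prompt slots need to be overwritten. The token ids (49406 and 49407, the usual CLIP BPE start/end ids), the context length of 77, the example prompt ids, and both helper names are assumptions for illustration, not code from the notebook.

```python
# Assumed values: 49406 / 49407 are the usual CLIP BPE start/end token ids,
# 0 is padding, and 77 is the usual CLIP context length.
SOT, EOT, PAD = 49406, 49407, 0
CONTEXT_LEN = 77


def build_dummy_ids(prompt_len, context_len=CONTEXT_LEN):
    """Mimic tokenizing '<start_of_text>' * prompt_len (hypothetical helper).

    Layout: one real start token, prompt_len placeholder start tokens,
    one end token, then padding up to the context length.
    """
    ids = [SOT] + [SOT] * prompt_len + [EOT]
    return ids + [PAD] * (context_len - len(ids))


def fill_prompt(dummy_ids, prompt_ids):
    """Overwrite placeholder slots 1..prompt_len with real prompt ids."""
    filled = list(dummy_ids)
    filled[1:len(prompt_ids) + 1] = prompt_ids
    return filled


prompt_ids = [320, 1125, 539]  # hypothetical ids for a 3-token prompt
dummy_ids = build_dummy_ids(len(prompt_ids))
full_ids = fill_prompt(dummy_ids, prompt_ids)

# The end token stays at index prompt_len + 1 before and after the swap.
eot_position = full_ids.index(EOT)
```

Because the end-of-text token never moves, a CLIP-style text encoder that reads its pooled feature from that position sees the same layout whichever candidate tokens currently occupy the prompt slots.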

Any extension for ComfyUI yet?

Hello
Extensions for Stable Diffusion web UIs have been discussed here before, but I believe none exists for ComfyUI. Am I wrong?
Thanks

Negative and weighted prompts

First, this is really cool! I'm mostly getting complete gibberish prompts (e.g. aamaaamagranddaughter admire illustrations lmp profile halsey fortnite followart ultimatefangraphics hounews) but they still reliably reproduce some sense of the original illustration.

Second: many tools now allow you to specify weights for parts of the prompts, including negative weights. Do you think it would be possible to generalize this technique to produce prompts with weights? These are still a little bit "hard", in the sense of being comprehensible for humans, but allow a much finer level of precision, particularly with negative weights.

For SD XL?

Hello @YuxinWenRick , your paper and repo really helped improve my workflow. Thank you!

Meanwhile, I am wondering whether I can apply this approach to SD-XL. It uses two text encoders (ViT-bigG and ViT-L). I found both in the official open_clip repo, but I am not sure how to combine them the way the diffusers inference pipeline does.

Can you point me in the right direction? Thanks.
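
For context on the combination step, here is a minimal shape-only sketch. It assumes the diffusers SDXL pipeline joins the per-token hidden states of the two encoders along the feature dimension (768 for ViT-L plus 1280 for ViT-bigG, giving 2048). The helper name and the dummy values are hypothetical, and nested lists stand in for tensors. It does not show how to optimize a single hard prompt against both encoders, which the thread leaves open.

```python
def concat_token_embeddings(embeds_a, embeds_b):
    """Concatenate two per-token embedding sequences feature-wise.

    Hypothetical helper: each argument is a list of per-token feature
    vectors, and both must cover the same number of tokens.
    """
    if len(embeds_a) != len(embeds_b):
        raise ValueError("both encoders must produce the same sequence length")
    return [a + b for a, b in zip(embeds_a, embeds_b)]


SEQ_LEN = 77  # assumed CLIP context length

# Dummy stand-ins for the hidden states of the two text encoders.
vit_l_states = [[0.0] * 768 for _ in range(SEQ_LEN)]      # ViT-L width
vit_bigg_states = [[1.0] * 1280 for _ in range(SEQ_LEN)]  # ViT-bigG width

combined = concat_token_embeddings(vit_l_states, vit_bigg_states)
# combined has SEQ_LEN rows, each with 768 + 1280 = 2048 features.
```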

Only for 2.1?

Hi, is this only for SD 2.1, or can it also be used with 1.5? I guess I can just switch the CLIP model, right?

Prompt Optimization without CLIP Loss

Hello authors,
I was wondering about the possibility of using the Stable Diffusion MSE loss instead of the CLIP loss in the PEZ algorithm. We could then optimize the prompt directly through the generator's gradient. What is your take on this?

algorithm 1, and the necessity of image encoder

hey, thank you for your great work!

i had a few questions regarding adapting this algorithm to another setup which may not use clip (e.g., imagen or ediffi).

  1. have you experimented with transferring the prompts for image generation on other networks? table 2 does this for sst-2, but i'm not sure if there are any experiments on image generation.

  2. if i wanted to take the algorithm and train for another text encoder, e.g., t5, how would i go about it? are there proxies to a contrastive image-text encoder pair which can be used for gradient reprojection?

thank you

Fluency loss

Does this version of the code include the fluency loss?

Thanks :)

[Feature] Can you make a simple UI for this?

Hello,
Could you make a simple UI, maybe in gradio, for this?
This is one of the most impressive tools I know of, yet only a few people in the Stable Diffusion community know about it.

Btw, does prompt generation depend on a specific model, or does it work in a "general" way? I mean, can I select a model and obtain different prompts depending on it?
(Sorry, I am not technical enough in AI to understand how it works in detail.)

Thanks
