yuxinwenrick / hard-prompts-made-easy
License: MIT License
This is such an awesome project. Thanks for building this. Trying to figure out how I would go about reverse engineering an intricate photorealistic portrait like this image
If I run this currently I get this:
best cosine sim: 0.4274442791938782
best prompt: beatrice wolfdgers haircreative oirswolivanka
And the images that it outputs are https://share.cleanshot.com/GNsS4hJ9
You mentioned there are additional steps to figure out the optimal prompt. I don't mind training longer if it can surface counter-intuitive keywords that produce the output we're after.
Thoughts?
run.py fails to detect the GPU. I have a 3090 Ti and all the current drivers installed, and GPU detection works for everything else.
Also your instructions for activating the venv do not work as written.
I was reading the code in prompt_inversion_sd.ipynb and have a question: why is dummy_text set to '<start_of_text>' * prompt_len, with dummy_ids[1:prompt_len+1] then replaced by input_ids? I'm confused. Why do it this way?
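My reading of that construction, sketched with illustrative token ids (the real values come from the CLIP tokenizer): each literal '<start_of_text>' tokenizes to exactly one token, so the dummy string yields a well-formed sequence with exactly prompt_len single-token slots at known positions, which can then be overwritten with the learned ids without disturbing the real SOT/EOT framing.

```python
# Toy illustration; SOT/EOT ids match CLIP's tokenizer, PAD is illustrative.
SOT, EOT, PAD = 49406, 49407, 0
prompt_len = 4
context_len = 8

# '<start_of_text>' * prompt_len gives prompt_len single-token placeholders
# between the true start and end tokens:
dummy_ids = [SOT] + [SOT] * prompt_len + [EOT] + [PAD] * (context_len - prompt_len - 2)

# The optimized (learnable) token ids go into slots 1..prompt_len:
input_ids = [101, 102, 103, 104]  # hypothetical learned ids
dummy_ids[1:prompt_len + 1] = input_ids
print(dummy_ids)  # [49406, 101, 102, 103, 104, 49407, 0, 0]
```

Slot 0 stays the genuine start token and slot prompt_len+1 the end token, which is why the replacement range is [1:prompt_len+1] rather than [0:prompt_len].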
Hello
We have discussed the existence of web UIs for Stable Diffusion extensions here before, but I believe there is nothing for ComfyUI. Am I wrong?
Thanks
The demo doesn't work on Hugging Face. Can you please fix it? I use it very often. Best wishes
Hello,
How do I know which model is the best one to choose when trying to match the desired prompt with the models we usually use (Hugging Face, Civitai, ...)?
First, this is really cool! I'm mostly getting complete gibberish prompts (e.g. "aamaaamagranddaughter admire illustrations lmp profile halsey fortnite followart ๏ธ ultimatefangraphics hounews"), but they still reliably reproduce some sense of the original illustration.
Second: many tools now allow you to specify weights for parts of the prompts, including negative weights. Do you think it would be possible to generalize this technique to produce prompts with weights? These are still a little bit "hard", in the sense of being comprehensible for humans, but allow a much finer level of precision, particularly with negative weights.
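One way the weighted-prompt idea could be generalized, sketched with toy data (the pooling and the weight syntax are assumptions, not the repo's method): make a per-token weight vector learnable alongside the token choices, and scale each token's embedding by its weight before pooling, with negative weights pushing away from a concept.

```python
import numpy as np

# Hypothetical sketch of per-token prompt weights.
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))     # 4 prompt tokens, 8-dim toy embeddings
w = np.array([1.0, 1.3, 0.7, -0.5])   # learnable weight per token; the
                                      # negative one acts like a soft negative prompt

# Scale each token embedding by its weight, then pool, analogous to how
# some UIs interpret "(token:1.3)" syntax:
weighted = w[:, None] * emb
pooled = weighted.mean(axis=0)
print(pooled.shape)  # (8,)
```

Since the weights are continuous, they would not need the discrete projection step that the token ids require, so they could be optimized with plain gradient descent.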
Hello @YuxinWenRick , your paper and repo really helped improve my workflow. Thank you!
Meanwhile, I am wondering whether I can apply this approach to SDXL. It uses two text encoders (ViT-bigG and ViT-L), and I found both in the official open_clip repo, but I am not sure how to combine them the way the diffusers inference pipeline does.
Can you point me in the right direction? Thanks.
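For reference, my understanding of what the diffusers SDXL pipeline does with the two encoders, sketched with zero tensors in place of real hidden states (shapes are from the released checkpoints; worth double-checking against diffusers): the per-token hidden states are concatenated along the feature dimension, giving the 2048-dim context the UNet cross-attends to. A plausible PEZ-style adaptation would instead compute a CLIP similarity loss under each encoder separately and sum them.

```python
import numpy as np

# Stand-ins for the two encoders' per-token hidden states:
seq_len = 77
h_vit_l = np.zeros((1, seq_len, 768))      # CLIP ViT-L text encoder
h_vit_bigg = np.zeros((1, seq_len, 1280))  # OpenCLIP ViT-bigG text encoder

# Concatenate along the feature axis, as the SDXL pipeline does:
prompt_embeds = np.concatenate([h_vit_l, h_vit_bigg], axis=-1)
print(prompt_embeds.shape)  # (1, 77, 2048)
```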
Would an Auto1111 web UI extension be possible?
Hi, is this only for SD 2.1, or can it be used with 1.5? I'm guessing I can just switch the CLIP model, right?
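For anyone trying this, my understanding of the relevant pairing (worth verifying against the repo's arguments): SD 2.x was trained against OpenCLIP ViT-H/14, while SD 1.x used OpenAI's CLIP ViT-L/14, so switching the `open_clip` model name and pretrained tag should be the main change.

```python
# (open_clip model name, pretrained tag) per SD version; assumed mapping:
clip_for_sd = {
    "sd-2.1": ("ViT-H-14", "laion2b_s32b_b79k"),
    "sd-1.5": ("ViT-L-14", "openai"),
}

model_name, pretrained = clip_for_sd["sd-1.5"]
print(model_name, pretrained)  # ViT-L-14 openai
# These would then be passed to open_clip.create_model_and_transforms(...)
```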
Hello authors,
I was wondering about the possibility of using Stable Diffusion's MSE loss instead of the CLIP loss in PEZ's algorithm. We could then optimize the prompt directly through the generator's gradients. What is your take on this?
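A toy illustration of that idea, with everything made up (a linear map stands in for the generator; a real diffusion model would need gradients through the sampler, which is far more expensive): optimize a continuous prompt vector by descending the MSE between the generator's output and a target, rather than matching CLIP features.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))  # stand-in "generator": prompt -> image
target = rng.standard_normal(6)  # the image we want to reproduce
p = np.zeros(4)                  # continuous prompt being optimized

def mse(p):
    return float(np.sum((G @ p - target) ** 2))

init_loss = mse(p)
lr = 0.01
for _ in range(2000):
    residual = G @ p - target
    grad = 2 * G.T @ residual    # d/dp of ||G p - target||^2
    p -= lr * grad

print(mse(p) < init_loss)        # loss decreases toward the least-squares optimum
```

In the PEZ setting the continuous p would still need to be projected back to discrete tokens each step; only the loss being descended changes.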
Hey, thank you for your great work!
I had a few questions about adapting this algorithm to another setup that may not use CLIP (e.g., Imagen or eDiff-I).
Have you experimented with transferring the prompts to other image-generation networks? Table 2 does this for SST-2, but I'm not sure whether there are any experiments on image generation.
If I wanted to adapt the algorithm to another text encoder, e.g. T5, how would I go about it? Are there proxies for a contrastive image-text encoder pair that could be used for gradient reprojection?
thank you
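For what it's worth, the projection step itself looks encoder-agnostic; sketched here with a toy vocabulary (shapes and data are made up): snap each continuous prompt vector to its nearest neighbor in the token embedding table. For T5 one would presumably swap in T5's own input-embedding matrix; the open question is what loss replaces the contrastive CLIP score.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_emb = rng.standard_normal((10, 5))   # toy vocabulary: 10 tokens, 5-dim
soft_prompt = rng.standard_normal((3, 5))  # 3 continuous prompt vectors

# Cosine similarity between each soft vector and every vocabulary embedding,
# then take the argmax to get hard token ids:
a = soft_prompt / np.linalg.norm(soft_prompt, axis=1, keepdims=True)
b = vocab_emb / np.linalg.norm(vocab_emb, axis=1, keepdims=True)
token_ids = np.argmax(a @ b.T, axis=1)
print(token_ids.shape)  # (3,)
```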
Does this version of the code include the fluency loss?
Thanks :)
Hello,
Could you make a simple UI for this, maybe in Gradio?
This is one of the most impressive tools I've come across, yet only a few people in the world of Stable Diffusion etc. know about it.
By the way, does the prompt generation depend on a model, or does it work in a "general" way? I mean, can I select a model and obtain different prompts depending on that model?
(Sorry, I am not technical enough in AI to understand how it works in detail.)
Thanks