Comments (7)
Thanks for pointing it out! Interestingly, it's a great paper from colleagues in my lab. I will take a look and try to improve the training process.
from hard-prompts-made-easy.
Yes, either the noise (the paper I linked) or the prompt embedding. In "Null-text Inversion for Editing Real Images using Guided Diffusion Models", they do exactly what you wished for: optimize the null token at every step. Definitely a feasible direction. Thanks for sharing your take.
Hi, sorry for the late response.
I believe it's feasible based on my previous attempts. The main issue, however, is that training doesn't converge and it's hard to select the best prompt: the diffusion time step differs at each optimization step, so selecting the best prompt by loss isn't reliable.
I think it's possible to overcome this by generating an image with the current prompt at each optimization step and then choosing the best prompt based on its distance to the target image. However, this approach could be computationally expensive.
Thank you for bringing this to my attention. I will conduct further experiments and keep you updated.
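The selection idea above (generate with each candidate prompt, keep the one closest to the target) can be sketched in a few lines. This is a toy illustration only: `generate`, the candidate prompts, and the random linear map standing in for the diffusion model are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical stand-in for "generate an image from a prompt embedding";
# a fixed random linear map plays the role of the diffusion model here
W = rng.standard_normal((8, 8))

def generate(prompt_emb):
    return W @ prompt_emb

target_image = rng.standard_normal(8)

# candidate prompts collected along the optimization trajectory
candidates = [rng.standard_normal(8) for _ in range(5)]

# pick the prompt whose generated image lies closest to the target
dists = [np.linalg.norm(generate(c) - target_image) for c in candidates]
best_prompt = candidates[int(np.argmin(dists))]
```

The expensive part in practice is of course the `generate` call, which here is a single matrix product but would be a full sampling run with a real diffusion model.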
It just shows how versatile the algorithm is. In this paper, https://arxiv.org/abs/2302.07121, the authors use the predicted clean image x_0 (rather than x_t) to calculate the loss. Maybe this could be a way around the expensive computation. Thanks for sharing your opinion and for your work.
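For reference, the predicted clean image the paper computes the loss on follows from the standard DDPM forward relation x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps; a minimal numpy sketch (the schedule value `alpha_bar` is chosen arbitrarily for illustration):

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_bar_t):
    # invert the forward-noising relation
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
alpha_bar = 0.7
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# if the noise estimate were exact, x_0 would come back exactly;
# with a real model, eps_pred is the network's estimate instead
x0_hat = predict_x0(x_t, eps, alpha_bar)
```

Because this gives a clean-image estimate at any single timestep, a loss in x_0-space avoids running the full sampling chain per optimization step.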
hey i wanted to add my 2¢ to this, mostly because i'd love to see it work under a more general setting :)
from what i can surmise @josejhlee's suggesting to find a gradient update which pushes the prompt to generate the ground truth image:
model(prompt_embedding) ≈ ground_truth_image
which is found via minimizing the loss wrt the difference between the generated image and the ground truth, mse(model(prompt_embedding), ground_truth_image).
however note the same prompt embedding can generate a different image with a different initial noise, which i deliberately omitted above, i.e., model(prompt_embedding, noise).
therefore, when you're optimizing the above updated objective w/ the noise, you may need to optimize the noise as well, which of course is a more complicated task.
i know of a way to sorta recover the noise used from the generated image by inverting the process, e.g., here: https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
having taken a look at the paper jose referenced, it seems like there are some similarities w/ the above:
"The key idea of backward guidance is to optimize for a clean image that best matches the prompt based on zห0, and linearly translate the guided change back to the noisy image space at step t."
i guess the clever bit is to come up with an equation that optimizes for the best prompt given a noisy image and the next level of noise, where the error model predicts the noise at each step.
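The joint objective sketched in this comment can be illustrated with a toy gradient descent. Everything here is hypothetical: a fixed random linear map plays the role of the differentiable generator `model(prompt_embedding, noise)`, and the gradients are written out by hand for this linear case.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical differentiable "generator": image = W @ embedding + 0.1 * noise
W = rng.standard_normal((8, 8))

def model(prompt_emb, noise):
    return W @ prompt_emb + 0.1 * noise

def mse(a, b):
    return float(np.mean((a - b) ** 2))

target = rng.standard_normal(8)
noise = rng.standard_normal(8)
prompt_emb = np.zeros(8)

loss_before = mse(model(prompt_emb, noise), target)

# jointly descend on the prompt embedding AND the noise, as suggested above
for _ in range(500):
    residual = model(prompt_emb, noise) - target
    prompt_emb -= 0.01 * (W.T @ residual)  # gradient wrt embedding (up to scale)
    noise -= 0.01 * (0.1 * residual)       # gradient wrt noise (up to scale)

loss_after = mse(model(prompt_emb, noise), target)
```

With a real diffusion model the gradients would come from autograd through the sampler, which is exactly what makes the joint optimization expensive.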
Hi @josejhlee, @ozanciga, apologies for my delayed response. I had a busy week.
I have updated the code and added an example of how to optimize the hard prompt through the diffusion model here. While it is still a work in progress, I believe it is effective. The current implementation randomly samples a time step at each optimization step and computes the MSE loss between the reconstructed noise and the ground-truth noise added to the image.
I will keep optimizing the code, and any suggestions and pull requests are welcome!
Note: the current code requires ~20GB of GPU memory and ~10 mins for 1000 steps.
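The loop described above (random timestep per optimization step, noise-prediction MSE) might look roughly like this. All names are hypothetical stand-ins, not the repo's actual implementation: the "noise predictor" is a linear map that ignores x_t and t and exists only to show the plumbing.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)   # DDPM-style noise schedule

x0 = rng.standard_normal(16)           # toy stand-in for the image latents
W = 0.1 * rng.standard_normal((16, 16))

# hypothetical linear "noise predictor" conditioned on the prompt embedding
def eps_model(x_t, t, prompt_emb):
    return W @ prompt_emb

prompt_emb = np.zeros(16)
for step in range(1000):
    t = int(rng.integers(T))                        # random timestep per step
    eps = rng.standard_normal(16)                   # fresh ground-truth noise
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    residual = eps_model(x_t, t, prompt_emb) - eps  # predicted vs. true noise
    prompt_emb -= 0.05 * (W.T @ residual)           # descend on the MSE
```

Because the loss target changes with the sampled timestep and noise, the per-step loss is noisy, which is exactly the prompt-selection difficulty discussed earlier in the thread.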
Thanks for your hard work, the insight from this experiment is well appreciated!