Comments (8)
Hi @falloncandra!
To fully understand your question: are you looking to apply an already learnt boundary to the inversion's latent code, or do you plan to train the boundary on codes obtained from the encoder?
Manipulating each of the 18 latent code entries based on a learnt semantic boundary in the (1, 512) space should work fine (option 2). In fact, this is exactly how we apply the InterFaceGAN editing (based on a learnt (1, 512) boundary).
from encoder4editing.
Training the boundary using the inversion latent codes might not suit your needs.
Although trained for small perturbations, the encoder still yields an 18x512 code that only lies close to the W subspace.
As each of the 18 style code entries corresponds to different semantic attributes, averaging them, or applying only the main code, will change the resulting image.
As an example, here is a visualization of the images produced by the inversion latent code (left), the main w style vector (middle), and the average code over the 18 entries (right).
As can be observed, while the overall geometry of the face is similar, changes in texture (colors) and some mid-level details cause a large change in the output image.
To test this behavior yourself, you can run the following commands from the notebook after obtaining the inversion latents:
```python
comparison_latents = torch.cat([
    latents,                                            # full 18x512 inversion code
    latents[:, 0, :].unsqueeze(1).repeat(1, 18, 1),     # main w entry repeated 18 times
    latents.mean(dim=1).unsqueeze(1).repeat(1, 18, 1),  # mean over the 18 entries, repeated
])
```
and then to generate the comparison image (using the initialized LatentEditor object):
```python
editor._latents_to_image(comparison_latents)
```
In case the attribute preservation of the above method is not sufficient for your needs, you can opt to find the boundary in the 18x512 space (which might be challenging), or alternatively train the e4e encoder yourself to output a single style code (repeated 18 times).
The latter can be achieved with the --progressive_start training flag, set to a training step near the end of training: for example, train the encoder for 250k steps and only start training the deltas (the per-entry perturbations) at step 250k, resulting in encoder checkpoints that effectively use only the main w style vector.
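A hedged sketch of such a training invocation (--progressive_start is the flag named above; the other argument names are assumptions and should be checked against the repo's training script):

```shell
# Sketch only: verify flag names against scripts/train.py before running.
python scripts/train.py \
    --dataset_type ffhq_encode \
    --exp_dir experiments/w_only_encoder \
    --max_steps 250000 \
    --progressive_start 250000   # deltas start training only at the final step
```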
Hope this helps with your experiments.
Hi @falloncandra ,
May I ask why you need to invert your images specifically to a size of (1, 512)? Generally, this can be done, but your reconstruction will be quite poor, as a single 512-dimensional vector is typically not expressive enough.
Hi, thanks for your reply!
For my thesis project, I need to apply the semantic manipulation here (Figure 3, Equation 3) to the latent code of a real image (the result of GAN inversion). However, the method works only for a 2D latent code. Moreover, from the discussion here, the semantics of StyleGAN reside in the W (1, 512) space. In my case, the semantics are more important than the reconstruction quality (I don't mind if the reconstructed image looks quite different from the original, as long as it has the same attributes, e.g. still female, still smiling). Therefore, I think I need to get the latent code in W (1, 512).
I really like your work because it uses PyTorch and the inversion is very fast (roughly 0.5 s per image, compared to other inversion methods that can take up to 8 s). Hence, I really hope I can use the pre-trained model in this repo to obtain the latent codes w and generate new images after manipulating them.
Could you please tell me how to achieve that with your code?
Thank you very much for your help!
Edit:
Hi, after reading your paper more thoroughly, I realised that each of the 18 style vectors comes from the same vector w with small perturbations. Hence, I would like to know which option makes more sense in your opinion:
- learn semantic boundaries in the (1, 18 * 512) space and edit an image by manipulating the reshaped (1, 18 * 512) latent code, or
- learn the semantic boundaries in the (1, 512) space, then edit an image by applying the same manipulation to each of the 18 (1, 512) latent codes?
Any thoughts would be much appreciated! Thanks!
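For intuition on how the two options relate: when the (1, 18 * 512) boundary is just the (1, 512) boundary tiled 18 times, the two edits coincide; a boundary learnt directly in the larger space would in general assign a different direction to each entry. A quick check with random stand-in tensors (shapes are assumptions):

```python
import torch

latents = torch.randn(1, 18, 512)   # stand-in for an inversion code
b = torch.randn(1, 512)
b = b / b.norm()                    # unit (1, 512) direction
alpha = 2.0

# Option 2: apply the same (1, 512) boundary to each of the 18 entries.
edited_per_entry = latents + alpha * b.unsqueeze(0)   # (1, 1, 512) broadcasts

# Option 1 with the boundary tiled 18 times: edit the flattened (1, 18*512) code.
flat = latents.reshape(1, -1) + alpha * b.repeat(1, 18)
edited_flat = flat.reshape(1, 18, 512)

# The two coincide only because the flat boundary is the tiled one.
print(torch.allclose(edited_per_entry, edited_flat))
```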
Hi, thanks for your answer and clarification!
So first, I want to train the boundary on the inversion latent codes of some training images (if using Option 2, I will probably average the 18 style vectors so that each training image is represented by one (1, 512) vector; I think this should work fine because the 18 style vectors originate from the same w (1, 512) vector with small offsets. Do you agree?).
After that, I want to manipulate test images by applying that learnt boundary to the inversion's latent code of the test images.
Do you think training the boundary on the inversion latent codes (as opposed to randomly generated w vectors) will also work fine?
Thank you very much!
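A minimal sketch of this averaging idea (synthetic data; a simple difference-of-means direction stands in for the linear SVM boundary that InterFaceGAN actually trains):

```python
import numpy as np

# Synthetic stand-ins: codes would come from the e4e inversions,
# labels from an attribute classifier (e.g. smiling vs. not smiling).
rng = np.random.default_rng(0)
codes = rng.normal(size=(200, 18, 512)).astype(np.float32)  # (N, 18, 512)
labels = rng.integers(0, 2, size=200)                       # binary attribute

w_avg = codes.mean(axis=1)   # (N, 512): one averaged vector per image

# Difference of class means as a crude linear boundary (InterFaceGAN fits a
# linear SVM instead, but the result has the same (1, 512) shape either way).
direction = w_avg[labels == 1].mean(axis=0) - w_avg[labels == 0].mean(axis=0)
boundary = (direction / np.linalg.norm(direction)).reshape(1, 512)
```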
Hi @omertov!
Thank you very much for your clear answers, examples, and instructions! I will think more about this information. Can't thank you enough!
Hi, I have a problem with e4e. In every piece of code I test, it fails to download the encoder4editing model from https://docs.google.com/uc?export=download&confirm=&id=1cUv_reLE6k3604or78EranS7XzuVMWeO, so I couldn't upload my picture and convert it to latents.pt for the next step. I have also attached an image of the error.
For closure,
I have added a new encoder type which encodes into the W* space (a 512-dimensional vector repeated 18 times), which can be used to search for the InterFaceGAN directions (although it still needs testing).
In case this is still relevant, I would love to hear about your results!
Best,
Omer