Comments (4)
Relevant: rolux/stylegan2encoder#2 (comment) (posted in January 2020)
It took me a while to appreciate the fact that an encoder's output can have high visual quality but bad semantics.
That is exactly the trade-off described in the paper: a good inversion balances i) perception (visual quality: the output looks realistic), ii) distortion (visual quality: the output is close to the input), and iii) editability (semantics).
If you look at the projected face of Angelina Jolie, it looks like a human face (perception), it somewhat resembles Angelina Jolie (distortion), and it should hopefully change as intended if you try to edit it (editability).
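Of these three criteria, distortion is the easiest to quantify directly. A minimal sketch, assuming images are float tensors of shape (N, 3, H, W) normalized to [-1, 1]; the lpips package is real, but the distortion function below is my own illustration, not code from the paper:

import torch
import torch.nn.functional as F
import lpips  # perceptual similarity, https://github.com/richzhang/PerceptualSimilarity

loss_fn = lpips.LPIPS(net='alex')  # LPIPS with an AlexNet backbone

def distortion(source, reconstruction):
    # Lower is better: pixel-wise L2 plus perceptual (LPIPS) distance.
    # Both inputs: float tensors of shape (N, 3, H, W) in [-1, 1].
    return F.mse_loss(reconstruction, source) + loss_fn(source, reconstruction).mean()

Perception and editability are much harder to score automatically, which is partly why they are easy to overlook.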
Closely related: if you want to get an idea of what to expect from projections as implemented:
- in the original StyleGAN2 paper (W, or W(1,*)),
- in its forks (W+, or W(18,*)), which predate encoder4editing,
then check the results shown in the README of my repository: https://github.com/woctezuma/stylegan2-projecting-images
Basically, the more constrained the projection, the higher the distortion, but the better the output behaves when edited; the sketch below shows the shape difference between the two spaces.
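To make the notation concrete, here is a minimal sketch of the two latent-space shapes for a 1024x1024 StyleGAN2 generator (18 style inputs); it only builds tensors, no generator is instantiated:

import torch

w = torch.randn(1, 512)        # W: a single style vector
w_1 = w.repeat(18, 1)          # W(1,*): the same vector fed to all 18 layers
w_plus = torch.randn(18, 512)  # W+ or W(18,*): one independent vector per layer

# W+ has 18 times as many degrees of freedom, so it can match the target
# image more closely (lower distortion), but the code can drift far from
# the distribution that editing directions were computed for.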
With encoder4editing, one has access to a smart way to constrain the projection. Plus, the projection is fast.
from encoder4editing.
It seems like you ran our encoder correctly.
Generally speaking, our pretrained e4e encoder is specifically designed to balance the trade-offs that exist in StyleGAN's latent space (see our paper for further details and examples).
Compared to other inversion methods, we give up some reconstruction accuracy to gain latent codes that are more editable, i.e. that can be better used by existing latent-space manipulation techniques such as StyleFlow.
If exact reconstruction is what you seek, direct optimization will always yield the best results. Alternatively, you can control the trade-off yourself according to your needs.
For example, you can train the encoder to favor reconstruction over editability by not using the latent-codes discriminator or by tuning the progressive training parameters; a sketch of how these terms trade off is given below.
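To make the balance concrete, here is a hedged sketch of an objective with the same structure as the paper's: the loss terms mirror the ones named there, but the variable names and weights are illustrative, not the repository's actual code:

import torch
import torch.nn.functional as F

adv_lambda = 0.1     # weight of the latent-codes discriminator term; 0 disables it
delta_lambda = 2e-4  # weight of the per-layer offset regularizer

def total_loss(x, x_hat, w_codes, disc_logits):
    # Reconstruction term (the full objective also uses LPIPS and an identity loss).
    rec = F.mse_loss(x_hat, x)
    # Editability terms: a non-saturating adversarial loss pushing codes toward
    # the W distribution, and a regularizer keeping the per-layer offsets from
    # the first style vector small. w_codes: (N, 18, 512); disc_logits: (N, 1).
    adv = F.softplus(-disc_logits).mean()
    delta_reg = (w_codes - w_codes[:, :1]).norm(2, dim=-1).mean()
    return rec + adv_lambda * adv + delta_lambda * delta_reg

Setting adv_lambda to 0 (i.e., not using the latent-codes discriminator) removes the pull toward well-behaved codes, so the encoder favors reconstruction.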
from encoder4editing.
Hi @molo32,
Can you provide further details? Have you performed the required face alignment?
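For reference, the alignment step looks roughly like the following in the repository's notebooks (it relies on dlib's 68-landmark model; downloading shape_predictor_68_face_landmarks.dat beforehand is assumed):

import dlib
from utils.alignment import align_face  # from the encoder4editing repository

def run_alignment(image_path):
    # Crop and align the face with the FFHQ dlib-landmark procedure.
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
    aligned_image = align_face(filepath=image_path, predictor=predictor)
    print("Aligned image has shape: {}".format(aligned_image.size))
    return aligned_image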
from encoder4editing.
import time
import numpy as np
import torch
from PIL import Image

# `run_alignment`, `tensor2im`, `net`, `experiment_type`, `EXPERIMENT_ARGS`
# and `resize_dims` come from earlier cells of the e4e inference notebook.

image_path = "/content/8.jpg"
original_image = Image.open(image_path).convert("RGB")

# Align and crop the face, as required by the pretrained encoder.
input_image = run_alignment(image_path)

def run_on_batch(inputs, net):
    # Encode a batch of images and decode them; also return the latent codes.
    images, latents = net(inputs.to("cuda").float(), randomize_noise=False,
                          return_latents=True)
    if experiment_type == 'cars_encode':
        images = images[:, :, 32:224, :]  # crop the vertical padding of car images
    return images, latents

def display_alongside_source_image(result_image, source_image):
    # Show the source image and its reconstruction side by side.
    res = np.concatenate([np.array(source_image.resize(resize_dims)),
                          np.array(result_image.resize(resize_dims))], axis=1)
    return Image.fromarray(res)

input_image.resize(resize_dims)  # in the notebook, this cell displays the resized input

img_transforms = EXPERIMENT_ARGS['transform']
transformed_image = img_transforms(input_image)

with torch.no_grad():
    tic = time.time()
    images, latents = run_on_batch(transformed_image.unsqueeze(0), net)
    result_image, latent = images[0], latents[0]
    toc = time.time()
print('Inference took {:.4f} seconds.'.format(toc - tic))

# Display the inversion next to the source image:
display_alongside_source_image(tensor2im(result_image), input_image)
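A hypothetical follow-up, showing what the recovered (18, 512) latent code is for: edit it with a precomputed latent direction (e.g., from InterFaceGAN) and decode it with the generator wrapped inside net. The file name and the direction itself are assumptions, not part of the notebook:

import torch

age_direction = torch.load('age_direction.pt').cuda()  # assumed precomputed direction, shape (18, 512) or (512,)
edited_latent = latent + 3.0 * age_direction           # step along the direction

with torch.no_grad():
    # net.decoder is the underlying StyleGAN2 generator in the pSp/e4e wrapper.
    edited_image, _ = net.decoder([edited_latent.unsqueeze(0)],
                                  input_is_latent=True, randomize_noise=False)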
from encoder4editing.
Related Issues (20)
- is it possible to fine tune?
- Resume training for cars
- How much time did it take to train on FFHQ?
- Inference with sample size >1 fails
- How to get figure 2
- Error while running inference.py
- Training e4e for 512*256 stylegan
- Regarding finding directions in W+ space
- Is released ffhq e4e model trained by inversion task?
- whether the code is wrong?
- Stuck on Iteration_0
- Model parameters at other resolutions
- How do I train an encoder with a resolution of 256*256 on the FFHQ dataset?
- There is a problem with the pre-training weights
- Can I train an encoder using your e4e frame work on a StyleGAN2-ADA pretrained on my own dataset?
- Can I use my own pretrained StyleGAN2-ADA?
- Is it possible to get more details in the images?
- is it possible to run this code for gender swap operations ?
- Is it possible to train the encoder which is segmap to face?
- Suggestions on training an encoder on FFHQ-256