Comments (3)
Every place where we use CLIP, we use the same weighted combination of the two models, yes. In practice, for many of our results (as you saw in the supp table), we set the weight of one of the models to 0, which effectively means we used just one model.
The ViT-B/32 model has larger patches, so it focuses less on local content and more on global attributes like style. The ViT-B/16 model helps somewhat when you want to improve smaller-scale attributes like shape. There's also a ViT-L/14 model, but it almost always makes the results worse :) You can add it to help improve identity preservation, but you'll probably want to give it a low weight.
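To make the "weighted combination" concrete, here is a minimal sketch (not the repo's actual code; the model names, weights, and scalar losses below are illustrative) of summing per-model CLIP losses with per-model weights, where a weight of 0.0 simply drops that model from the total:

```python
def combined_clip_loss(losses, weights):
    """Weighted sum of per-model losses.

    losses:  {model_name: scalar loss from that CLIP model}
    weights: {model_name: weight}; a weight of 0.0 disables that model,
             which is effectively the same as using only the other model(s).
    """
    return sum(weights[name] * loss for name, loss in losses.items())

# Illustrative stand-in values: lean on ViT-B/32 (global style) with a
# smaller contribution from ViT-B/16 (local shape).
losses = {"ViT-B/32": 0.8, "ViT-B/16": 0.5}
weights = {"ViT-B/32": 1.0, "ViT-B/16": 0.5}
total = combined_clip_loss(losses, weights)  # 0.8 * 1.0 + 0.5 * 0.5 = 1.05
```

The same weighted sum applies wherever a CLIP score is computed, so setting one weight to 0.0 reproduces the single-model results from the supplementary table.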
from stylegan-nada.
The entire pipeline could have been implemented using any of the available CLIP models, or a mix thereof. Setting the weight of the ViT-B/16 CLIP model to 0.0 just means that it did not contribute to any of the loss / layer selection calculations. The other CLIP model (ViT-B/32) would still be used, and you could simply rank the layers according to its output (rather than a weighted sum of the outputs of several CLIP models).
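A rough sketch of that layer-ranking idea (again hypothetical, not the repository's implementation — the score values and function names are made up): each W-space layer gets a weighted score across the CLIP models, and the top-k layers are selected. With one model's weight at 0.0, the ranking degenerates to the other model's output alone:

```python
def select_layers(per_model_scores, weights, k):
    """Rank W-space layers by a weighted sum of per-model scores, keep top-k.

    per_model_scores: {model_name: [score for each layer]}
    weights:          {model_name: weight}
    Returns the indices of the k highest-scoring layers.
    """
    n_layers = len(next(iter(per_model_scores.values())))
    combined = [
        sum(weights[m] * scores[i] for m, scores in per_model_scores.items())
        for i in range(n_layers)
    ]
    return sorted(range(n_layers), key=lambda i: combined[i], reverse=True)[:k]

# Illustrative scores for a toy 3-layer model.
scores = {"ViT-B/32": [0.2, 0.9, 0.5], "ViT-B/16": [0.1, 0.3, 0.8]}
weights = {"ViT-B/32": 1.0, "ViT-B/16": 0.0}  # B/16 weight 0.0: it is ignored
select_layers(scores, weights, 2)  # ranks purely by the ViT-B/32 scores: [1, 2]
```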
The instances where adaptive layer selection is off are the instances where the number of selected layers is the same as the number of W-code inputs for the model (e.g. 18 for the 1024x1024 FFHQ model).
It does make sense to use values other than 1.0 and 0.0. Each CLIP model leads to different visual effects. Using models with smaller patch sizes (16, 14) leads to better identity preservation. Using larger models (32) typically leads to better representation for styles etc. You can use values between 1.0 and 0.0 in order to interpolate between these preferences and decide how much importance you want to place on each.
If you're only using one CLIP model, then you are correct that you may as well just use 1.0 or 0.0 and play with the scale of the loss instead.
from stylegan-nada.
I see, there was a misunderstanding on my part. So for both the global and the directional loss, you use the same two CLIP models (ViT-B/32 and ViT-B/16)? And for both losses you sum the individual CLIP losses from ViT-B/32 and ViT-B/16?
Edit: One last question. Does the big CLIP model focus more on global features and the smaller one more on local features? Or what is the difference?
from stylegan-nada.