
Comments (3)

rinongal commented on July 26, 2024

Every place where we use CLIP, we use the same weighted combination of the two models, yes. In practice, for many of our results (as you saw in the supp table), we set the weight of one of the models to 0, which effectively means we used just one model.

The ViT-B/32 model has a larger patch size, so it focuses less on local content and more on global attributes like style. The ViT-B/16 model helps somewhat when you want to improve smaller-scale attributes like shape. There's also a ViT-L/14 model, but it almost always makes the results worse :) You can add it to help improve identity preservation, but you'll probably want to give it a low weight.
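To make the weighting concrete, here is a minimal sketch of how a weighted combination of CLIP models could enter a directional loss. The weight dictionary, function names, and preprocessing assumptions are illustrative rather than the repository's exact code; images are assumed to already be resized and normalized for CLIP (224×224).

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative per-model weights; setting a weight to 0.0 effectively
# drops that model from every loss computation, as described above.
model_weights = {"ViT-B/32": 1.0, "ViT-B/16": 0.0}
clip_models = {name: clip.load(name, device=device)[0] for name in model_weights}

def directional_loss(model, img_src, img_gen, text_src, text_tgt):
    """1 - cosine similarity between the image edit direction and the
    text edit direction in a single CLIP model's embedding space."""
    tokens = clip.tokenize([text_src, text_tgt]).to(device)
    text_feat = model.encode_text(tokens).float()
    img_feat = model.encode_image(torch.cat([img_src, img_gen])).float()
    text_dir = text_feat[1] - text_feat[0]
    img_dir = img_feat[1] - img_feat[0]
    return 1.0 - F.cosine_similarity(img_dir, text_dir, dim=-1)

def combined_loss(img_src, img_gen, text_src, text_tgt):
    """Weighted sum of the per-model directional losses."""
    total = 0.0
    for name, weight in model_weights.items():
        if weight == 0.0:  # weight 0 => the model contributes nothing
            continue
        total = total + weight * directional_loss(
            clip_models[name], img_src, img_gen, text_src, text_tgt)
    return total
```

With a single nonzero weight, the sum collapses to one model's loss, matching the single-model case discussed above.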

from stylegan-nada.

rinongal commented on July 26, 2024

The entire pipeline could have been implemented using any of the available CLIP models, or a mix thereof. Setting the weight of the ViT-B/16 CLIP model to 0.0 just means that it did not contribute to any of the loss / layer selection calculations. The other CLIP model (ViT-B/32) would still be used, and you could simply rank the layers according to its output (rather than a weighted sum of the outputs of several CLIP models).

The instances where adaptive layer selection is off are the instances where the number of selected layers is the same as the number of W-code inputs for the model (e.g. 18 for the 1024x1024 FFHQ model).
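For intuition, here is a rough sketch of the layer-ranking step, under the assumption that layers are scored by how far a short CLIP-guided latent optimization (driven by the weighted loss above) moves each W+ code; the function name and tensor shapes are hypothetical.

```python
import torch

def select_trainable_layers(w_before, w_after, k):
    """Rank W+ layers by how much a short CLIP-guided latent optimization
    moved them, and return the indices of the k most-changed layers.

    w_before, w_after: [batch, n_layers, 512] W+ codes before / after the
    optimization. If k == n_layers (e.g. 18 for the 1024x1024 FFHQ
    generator), every layer is selected and adaptive selection is
    effectively off.
    """
    per_layer_change = (w_after - w_before).norm(dim=-1).mean(dim=0)  # [n_layers]
    return torch.topk(per_layer_change, k).indices
```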

It does make sense to use values other than 1.0 and 0.0. Each CLIP model leads to different visual effects. Models with smaller patch sizes (16, 14) lead to better identity preservation, while a larger patch size (32) typically gives a better representation of styles. You can use values between 0.0 and 1.0 to interpolate between these preferences and decide how much importance you want to place on each model.

If you're only using one CLIP model, then you are correct that you may as well just use 1.0 or 0.0 and play with the scale of the loss instead.
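As a hypothetical illustration of the last two points (the dictionary keys and the lambda_clip name are placeholders, not the repository's actual option names):

```python
# Fractional weights interpolate between the models' preferences, e.g.
# mostly style (ViT-B/32) with some shape/identity preservation (ViT-B/16):
model_weights = {"ViT-B/32": 0.7, "ViT-B/16": 0.3}

# With a single model, only the product weight * loss_scale matters,
# so a weight of 1.0 plus a tuned overall loss scale is sufficient:
model_weights = {"ViT-B/32": 1.0}
lambda_clip = 2.0  # overall scale of the CLIP loss term
```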

from stylegan-nada.

lebeli commented on July 26, 2024

I see, there was a misunderstanding on my part. So for both the global and the directional loss, you use the same two CLIP models (ViT-B/32 and ViT-B/16)? And for both losses, do you sum the individual CLIP losses from ViT-B/32 and ViT-B/16?

Edit: One last question. Does the big CLIP model focus more on global features and the smaller one more on local features? Or what is the difference?

from stylegan-nada.
