Comments (4)

JunyaoHu commented on May 27, 2024

Hello, @oszilevi, does my answer explain this question clearly? You can use my recently updated repository, which has been stripped of redundant code. 😊

from common_metrics_on_video_quality.

JunyaoHu commented on May 27, 2024

Thank you for your question.

TLDR

For the styleganv implementation of the FVD calculation, the function get_fvd_logits is not needed.

our implementation

Our calculation in calculate_fvd.py has three steps:

  • Step 1: preprocess the video to ensure the proper shape and value range.

videos1 = trans(videos1)
videos2 = trans(videos2)

  • Step 2: use the pre-trained I3D model to extract a feature tensor for each video.

feats1 = get_fvd_feats(videos_clip1, i3d=i3d, device=device)
feats2 = get_fvd_feats(videos_clip2, i3d=i3d, device=device)

  • Step 3: calculate FVD score

fvd_results[clip_timestamp] = frechet_distance(feats1, feats2)
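
For readers unfamiliar with Step 3, here is a minimal sketch of a Fréchet distance between two feature sets. It is not the repo's frechet_distance: the function name frechet_distance_sketch is hypothetical, and it uses only NumPy, taking the trace of the matrix square root from the eigenvalues of the covariance product. It assumes each input is an (N, D) array with N > 1.

```python
import numpy as np

def frechet_distance_sketch(feats1, feats2):
    """Frechet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    sigma1 = np.cov(feats1, rowvar=False)
    sigma2 = np.cov(feats2, rowvar=False)
    # Tr((sigma1 sigma2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    # positive semi-definite inputs. Clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 8))
b = a + 2.0  # same covariance, mean shifted by 2 in each of the 8 dimensions
same = frechet_distance_sketch(a, a)
shifted = frechet_distance_sketch(a, b)
```

With identical covariances the trace terms cancel, so the shifted case reduces to the squared distance between the means.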

about Step 2

Our function get_fvd_feats simply calls the function get_feats.

def get_fvd_feats(videos, i3d, device, bs=10):
    # videos in [0, 1] as torch tensor BCTHW
    # videos = [preprocess_single(video) for video in videos]
    embeddings = get_feats(videos, i3d, device, bs)
    return embeddings

The input of the function get_feats is the video and the output is the I3D embedding. The parameter return_features=True makes the output features rather than logits.

def get_feats(videos, detector, device, bs=10):
    # videos : torch.tensor BCTHW [0, 1]
    detector_kwargs = dict(rescale=False, resize=False, return_features=True)  # Return raw features before the softmax layer.
    feats = np.empty((0, 400))
    with torch.no_grad():
        for i in range((len(videos)-1)//bs + 1):
            batch = torch.stack([preprocess_single(video) for video in videos[i*bs:(i+1)*bs]]).to(device)
            feats = np.vstack([feats, detector(batch, **detector_kwargs).detach().cpu().numpy()])
    return feats
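
The batching pattern in get_feats can be tried without I3D. The sketch below is an illustration only: fake_detector is a hypothetical stand-in that maps a (B, C, T, H, W) batch to a (B, 400) array, and batched_feats is a hypothetical name. It collects the per-batch outputs in a list and concatenates once, and it assumes at least one video.

```python
import numpy as np

def fake_detector(batch):
    # Hypothetical stand-in for I3D: maps (B, C, T, H, W) to (B, 400)
    # by repeating each video's mean pixel value 400 times.
    b = batch.shape[0]
    return np.tile(batch.reshape(b, -1).mean(axis=1, keepdims=True), (1, 400))

def batched_feats(videos, detector, bs=10):
    # (len - 1) // bs + 1 is ceil(len / bs) for len >= 1, so the last,
    # possibly smaller, batch is still processed.
    n_batches = (len(videos) - 1) // bs + 1
    chunks = [detector(videos[i * bs:(i + 1) * bs]) for i in range(n_batches)]
    return np.concatenate(chunks, axis=0)

videos = np.random.default_rng(0).random((23, 3, 4, 8, 8))
feats = batched_feats(videos, fake_detector, bs=10)  # batches of 10, 10 and 3
```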

So there is no need to use the function get_fvd_logits. I did not comment this function out, and I'm sorry if that misled you; the code is redundant.

oszilevi commented on May 27, 2024

yes! Thank u very much 👍🏻

JunyaoHu commented on May 27, 2024

Hello, I updated the repo just now, and it now supports two PyTorch FVD calculation methods (styleganv and videogpt).

As you say, the method that calculates 'the fvd score with the output of the logits layer' is the videogpt implementation.

I must say that the videogpt method is not wrong; it is also correct. Its function name is just not a good one...

Actually, logits are features

In Google's original I3D model, the model structure ends with 'Mixed_5c', 'Logits', and 'Predictions'.

Both models drop the 'Predictions' (softmax) module:

  • i3d_pretrained_400.pt uses the I3D model file in our repo, where the model ends with 'Logits', without 'Predictions' (softmax). So it is right.
  • i3d_torchscript.pt uses the parameter return_features=True. So it is also right.
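
To illustrate the distinction, here is a minimal NumPy sketch. The random values only stand in for a 400-dimensional 'Logits' output (an assumption for illustration; no I3D weights are involved). 'Predictions' is a softmax over those values, so statistics computed on the logits are computed on unnormalized features.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum first for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 400))  # stand-in for the 'Logits' output of two videos
predictions = softmax(logits)       # what a 'Predictions' module would add

# Each row of predictions sums to 1, while the logits are unconstrained.
row_sums = predictions.sum(axis=1)
```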

Finally, in our repo I kept the copied file's original function name and only import it under an alias.

def calculate_fvd(videos1, videos2, device, method='mcvd'):
    if method == 'mcvd':
        from fvd.styleganv.fvd import get_fvd_feats, frechet_distance, load_i3d_pretrained
    elif method == 'videogpt':
        from fvd.videogpt.fvd import load_i3d_pretrained
        from fvd.videogpt.fvd import get_fvd_logits as get_fvd_feats
        from fvd.videogpt.fvd import frechet_distance


I apologize for my earlier incomplete and incorrect answers.
