
dsnet's Introduction

Hi there 👋


dsnet's People

Contributors

li-plus


dsnet's Issues

Video summary not getting dumped for some videos in TVSum & SumMe at higher SR

While running inference on the TVSum and SumMe datasets (using the pretrained SumMe-trained and TVSum-trained models provided by the authors), at higher sampling rates (SR) I get blank output video files for some videos (a 258-byte video file is dumped). It seems that the "pred_summ" variable is all "False".
Does that mean the model produced no summary candidate frames for that particular video at that SR? Also, is reducing the SR the only solution, or can some other variable be changed to enable summary creation? Note that the videos in both datasets are of decent length, i.e. 1-4 minutes long.
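For reference, an empty dump of this kind can be detected before writing the file with a check along these lines (a hedged sketch; "pred_summ" is assumed to be a per-frame boolean NumPy array, as described above):

import numpy as np

pred_summ = np.zeros(1000, dtype=bool)  # stand-in for the model's per-frame summary mask

if not pred_summ.any():
    # no frame was selected, so the written video would be empty (hence the ~258-byte file)
    print('Warning: empty summary - no frames selected at this sampling rate')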

Incompatibility (CUDA and torch-sparse)

Hello guys.

I'm currently running some experiments on DSNet, but every time I set up the network in a different environment I have problems with pip install -r requirements.txt. This happens primarily because of the torch-sparse package, which is not compatible with several CUDA versions. Maybe it would be a good idea to add building PyTorch and torch-sparse from source to the README as a recommendation.

Some useful links:
Building pytorch from source -> https://github.com/pytorch/pytorch
Premade wheels for pip installing pytorch -> https://download.pytorch.org/whl/torch/
Building torch-sparse from source -> https://github.com/rusty1s/pytorch_sparse
Premade wheels for pip installing torch-sparse -> https://pytorch-geometric.com/whl/

Thank you very much for your amazing network.

custom data about key_shot

When I run train.py:

Traceback (most recent call last):
File "train.py", line 73, in
main()
File "train.py", line 62, in main
fscore = trainer(args, split, ckpt_path)
File "/home/yaoyc/DSNet/src/anchor_based/train.py", line 50, in train
gtscore, cps, n_frames, nfps, picks)
File "/home/yaoyc/DSNet/src/helpers/vsumm_helper.py", line 75, in get_keyshot_summ
assert pred.shape == picks.shape
AttributeError: 'str' object has no attribute 'shape'

This error happens, but I don't know what is wrong. I did what the author said in the README.
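For what it's worth, the traceback suggests that the first argument passed to get_keyshot_summ (gtscore here) is a Python string rather than an array. A quick inspection of the custom h5 file might confirm which field is stored as text (a hedged sketch; the file name and key names are assumptions):

import h5py

with h5py.File('custom_dataset.h5', 'r') as f:  # placeholder name for the generated dataset file
    for video in f:
        for key in ('gtscore', 'picks'):
            value = f[video][key][...]
            # numeric fields should come back as NumPy arrays with a shape, not as (byte) strings
            print(video, key, type(value), getattr(value, 'shape', None))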

About Metrics

When evaluating the model, you use the F1 score and treat the entire video summarization task as a sequence labeling task. But when condensing a video, the positive and negative samples are often extremely unbalanced, so does the F1 score really make sense?
In addition, in your code you average over annotators for the TVSum dataset but take the maximum for the other datasets. I want to ask the reason for this.
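For context, the two aggregation conventions mentioned above amount to something like this (a toy sketch with made-up per-annotator scores):

import numpy as np

f1_per_user = np.array([0.42, 0.55, 0.61])  # hypothetical F1 against each annotator's summary

f1_avg = f1_per_user.mean()  # TVSum convention in the code: average over annotators
f1_max = f1_per_user.max()   # other datasets: score against the best-matching annotator
print(f1_avg, f1_max)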

summary video can't be played

When I use infer.py to predict on my own video, the generated summary video cannot be played, but it can be played when predicting on the video provided by the author.

ABOUT SPLITS

Hi, I was looking for the correct splits of the datasets so that I could experiment correctly, but I found these splits https://github.com/ok1zjf/VASNet/tree/master/splits and they differ from the ones you made. They used the same datasets, so I was wondering if you know which are the correct ones. Thanks!

Evaluation Approach for the baseline model

I tried using the baseline model with LSTM on my version of the dataset. I downloaded the videos and loaded the labels using the make_dataset.py script. However, the labels in my dataset don't match the original ones. Despite this, I tested the model on this modified dataset using the average of the user_summary annotations as the evaluation labels. The resulting F-score was about 0.30. Then, I tried using the maximum value instead, which gave better results with an F-score of 0.52.

Later, I tried evaluating the model using the gt_score and converting it to shot summaries, similar to our training approach. After evaluation, I got an average F-score of 0.70. But the F1-score varied a lot.

[image: ckpt3-Lstm]

As you can see in the image, the F1-score keeps changing. My question is whether this way of evaluating is not good, and if the unstable F-score indicates a problem.

issue in inference

python infer.py anchor-based --ckpt-path ../models/custom/checkpoint/custom.yml.0.pt
--source ../custom_data/videos/EE-bNr36nyA.mp4 --save-path ./output.mp4
Loading DSNet model ...
Traceback (most recent call last):
File "infer.py", line 66, in
main()
File "infer.py", line 18, in main
model.load_state_dict(state_dict)
File "/home/mossad/aeye/DSNet/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DSNet:
Unexpected key(s) in state_dict: "fc_ctr.weight", "fc_ctr.bias".
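One way to narrow this down is to inspect the keys stored in the checkpoint (a hedged sketch; the path is the one from the command above). If it contains fc_ctr.* keys, the checkpoint presumably belongs to the anchor-free variant and therefore will not load into the anchor-based model selected on the command line:

import torch

# Load the checkpoint on CPU and list its parameter names
state_dict = torch.load('../models/custom/checkpoint/custom.yml.0.pt', map_location='cpu')
print(sorted(state_dict.keys()))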

How to run the baseline model ?

I want to get the result described as "For the baseline, we removed the interest proposal formulation and only applied a self-attention layer to predict the importance scores." How do I run the baseline model? Looking forward to your reply.

Feature Extraction

In the evaluation phase, you use the features that were already extracted into the h5py file. However, when I run infer.py to summarize the raw video of a TVSum video, the result is completely different from the one obtained with the features in the h5py file, and the prediction on the original video is completely wrong. So I want to ask: is the feature extraction method really the one in src/helpers/video_helper.py, i.e. features extracted with GoogLeNet? Could you provide the method used to extract the features in your h5py files?
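For comparison, a minimal GoogLeNet pool5 extraction looks roughly like this (an assumption based on the 1024-dimensional features in the h5 files, not necessarily the authors' exact pipeline):

import torch
import torchvision

model = torchvision.models.googlenet(pretrained=True)
model.fc = torch.nn.Identity()  # drop the classifier so the forward pass returns the 1024-d pooled features
model.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 224, 224)  # one preprocessed frame (see the preprocessing sketch below)
    feat = model(frame)                  # shape: (1, 1024)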

datasets

I have a question about the features:
Do I need any additional processing of the video frames when using GoogLeNet to extract features, for example normalization, or do I simply resize the original frames and feed them to the network to obtain features?
What should I do if I want to use ResNet for feature extraction? (See the sketch below.)
Thank you!
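A typical ImageNet-style preprocessing, plus a ResNet swap, would look something like this (a hedged sketch; the exact resize/crop values used for the released h5 files are not confirmed here):

import torch
import torchvision
from torchvision import transforms

# Standard ImageNet normalization; frames are commonly resized/cropped to 224x224 before the backbone
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Swapping in ResNet-50: remove the final fc layer to get the 2048-d pooled features
resnet = torchvision.models.resnet50(pretrained=True)
resnet.fc = torch.nn.Identity()
resnet.eval()

Note that ResNet-50 features are 2048-dimensional instead of 1024, so the model's input feature dimension would have to change accordingly.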

[question] How annotations are done

I imagined that the JSON in the custom_data folder would model the total frames as a binary list.
For example, assume we have 10 frames in the video and the most important segments are frames [3 >> 6] and [9 >> 10];
then the annotation would be [0,0,1,1,1,1,0,0,1,1].
In other words, I am confused about this statement in the readme.md file:
"The user summary of a video is a UxN binary matrix, where U denotes the number of annotators and N denotes the number of frames in the original video."
Why replicate frames U times, and what is U?
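To make the shape concrete, here is a toy U x N matrix for the 10-frame example above, assuming U = 2 annotators. Each row is one annotator's independent binary labels, so nothing is replicated; the rows simply differ wherever the annotators disagree:

import numpy as np

user_summary = np.array([
    [0, 0, 1, 1, 1, 1, 0, 0, 1, 1],  # annotator 1: frames 3-6 and 9-10 marked important
    [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],  # annotator 2 (hypothetical): a different opinion
])
print(user_summary.shape)  # (U, N) = (2, 10)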

Validation/Test splits

Hi there,

First, thanks for providing your code and boosting the open-source mentality of video summarization research.
I have a question about the sets of videos used. In this issue you refer to a validation-set f_score, but no validation_keys appear in the splits, and in the source code I don't see a validation set but rather the test set being used as validation. Am I missing something, or is using the test set for picking the best model data leakage?

I think that the proper usage is [1]:

  • Training set: The sample of data used to fit the model.
  • Validation set: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.
  • Test set: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.

To be honest, though, the small number of videos in the video summarization datasets is probably not enough to set aside a separate validation set.
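Still, if one did want to specify a validation set, a rough sketch might look like this (hypothetical video keys and split ratios):

import random

video_keys = ['video_{}'.format(i) for i in range(50)]  # placeholder keys for a 50-video dataset
random.seed(0)
random.shuffle(video_keys)

test_keys = video_keys[:10]    # held out only for the final report
val_keys = video_keys[10:20]   # used only for model selection / early stopping
train_keys = video_keys[20:]   # used for fitting the model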

Thanks in advance.
George

Params in Table 2

Hi,
May I ask how the Params in Table 2 are calculated? I used the following code to compute the parameter count, and the result is significantly different from the 8.53 million in Table 2; I got 4.33 million.

import torch
from thop import profile
from src.anchor_based.dsnet import DSNet

model = DSNet('attention', 1024, 128, [4, 8, 16, 32], 8)
input = torch.randn(1, 1024, 1024)  # dummy input: batch of 1, 1024 frames, 1024-d features
flops, params = profile(model, inputs=(input,))  # thop traces a forward pass to count FLOPs and params
print('flops: {}'.format(flops))
print('params: {}'.format(params))

I would like to know how you calculated it and look forward to receiving your reply. Thank you very much!
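For comparison, a direct count of trainable parameters with plain PyTorch (a hedged sketch; same constructor arguments as in the snippet above) may be a useful cross-check:

from src.anchor_based.dsnet import DSNet

model = DSNet('attention', 1024, 128, [4, 8, 16, 32], 8)
# Sum over every trainable parameter tensor, independent of any forward pass
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('params: {:.2f}M'.format(n_params / 1e6))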

OVP dataset and YouTube dataset not in the same format as TVSum dataset

When I was reproducing your code, I found that the OVP and YouTube datasets are not in the same format as the TVSum and SumMe datasets; they are missing 'change_points', 'n_frames', 'picks', etc., which prevents the model from completing the transfer and augmented settings. How should this be solved?
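A quick way to see which fields are missing is to list the keys stored per video and compare them with the TVSum/SumMe files (a hedged sketch; the file name is a placeholder for whichever OVP/YouTube h5 file you have):

import h5py

with h5py.File('ovp_dataset.h5', 'r') as f:  # placeholder file name
    for video in f:
        # TVSum/SumMe-style files contain 'change_points', 'n_frames', 'picks', etc.
        print(video, sorted(f[video].keys()))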

About the statistics of dataset in Table 1

In Table 1 of the paper, I found that the TVSum, YouTube, and OVP datasets have the SAME duration statistics (Min, Max, Avg). Is that correct?

And how can I get the OVP and YouTube datasets?

How to generate summaries on new videos?

The code can generate video summaries on the TVSum and SumMe datasets. If there are some new videos, how can we generate summaries for them?

I found the data file contains "change_points" and "n_frame_per_seg"; do we have to obtain these annotations before we can generate summaries for new videos?

pre-trained model

Hi,

May I ask why there are five .pt files for each dataset in the pre-trained models? I have tried the model on the default custom video and all five files give quite different results. Which one should I choose? Thanks in advance.

Is the model learning anything meaningful ?

I have run your code for the anchor-free model on canonical TVSum, as instructed in the README file, and I get results similar to the ones reported in the paper. To be precise, I get these numbers:

mean: 0.6160917484037374
split0: 0.624260622317302
split1: 0.5672167705637337
split2: 0.6329998280152374
split3: 0.6279028963330486
split4: 0.6280786247893654

But I have noticed that the best F-score in each split is obtained right at the start of training (after a couple of epochs). This can be observed in the following F-score vs. epochs plot, where each colour corresponds to a separate TVSum split.

[image: F-score vs. epochs for each TVSum split]

As you can see, the best F-scores are not much better than the F-scores obtained with randomly initialized model weights. This makes me wonder whether the model is indeed learning something meaningful. Any thoughts on this?
