li-plus / dsnet

DSNet: A Flexible Detect-to-Summarize Network for Video Summarization

Home Page: https://ieeexplore.ieee.org/document/9275314

License: MIT License

Python 95.94% Shell 4.06%

computer-vision detection machine-learning pytorch video-summarization

dsnet's Issues

Video summary not getting dumped for some videos in TVSum & SumMe at higher SR

While running inference on the TVSum and SumMe datasets (using the pretrained SumMe-trained and TVSum-trained models, respectively, provided by the authors) at higher sampling rates (sr), I get blank output video files for some of the videos (a 258-byte video file is dumped). It seems that the "pred_summ" variable is all "False".
Does that mean the model produces no summary candidate frames for that particular video at that sr? Also, is reducing the sr the only solution, or can another variable be changed to enable summary creation? Note that the videos in both datasets are of decent length, i.e. 1 to 4 minutes long.

30% implementation of this project. Please respond immediately. We have a review on Saturday

Hello sir, I am Santhoshkumar S from Anna University. I am currently in the final year of computer science and engineering. For our final year project, we chose your DSNet project. My question is: what would a 30 percent implementation of this project be? Is the feature extraction code for input video available?

How can I know the original video name for the h5 file

Hello, I want to extract the features and train the network, but I don't know the original video names in eccv16_dataset_tvsum_google_pool5.h5 for the TVSum dataset.

I am getting an error

Incompatibility (CUDA and torch-sparse)

Hello guys.

I'm currently running some experiments on DSNet, but every time I need to set up the network in a different environment I have problems with pip install -r requirements.txt. This happens primarily because of the torch-sparse package, which is not compatible with several CUDA versions. Maybe it's a good idea to recommend in the README building pytorch and torch-sparse from source.

Some useful links:
Building pytorch from source -> https://github.com/pytorch/pytorch
Premade wheels for pip installing pytorch -> https://download.pytorch.org/whl/torch/
Building torch-sparse from source -> https://github.com/rusty1s/pytorch_sparse
Premade wheels for pip installing torch-sparse -> https://pytorch-geometric.com/whl/
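
As a rough illustration of matching a torch-sparse wheel to the installed PyTorch build, the sketch below assembles a wheel index URL and the corresponding pip command. The function names are hypothetical, and the URL layout (`torch-<version>+<cuda tag>.html` under the wheel root linked above) is an assumption that should be checked against the index page before use.

```python
# Hypothetical helpers; the URL layout below is an assumption, not a documented contract.
WHEEL_ROOT = "https://pytorch-geometric.com/whl"


def wheel_index_url(torch_version, cuda_tag, root=WHEEL_ROOT):
    """Build the assumed wheel index URL, e.g. for torch 1.6.0 and CUDA tag cu102."""
    return f"{root}/torch-{torch_version}+{cuda_tag}.html"


def pip_install_command(torch_version, cuda_tag):
    """Return the pip command (as an argument list) pointing at that index."""
    # "-f" is pip's short form of "--find-links".
    return ["pip", "install", "torch-sparse", "-f",
            wheel_index_url(torch_version, cuda_tag)]


command = pip_install_command("1.6.0", "cu102")
print(" ".join(command))
```

In an environment where PyTorch is already installed, `torch.__version__` and `torch.version.cuda` report the values to plug in.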

Thank you very much for your amazing network.

custom data about key_shot

When I run train.py:

Traceback (most recent call last):
  File "train.py", line 73, in <module>
    main()
  File "train.py", line 62, in main
    fscore = trainer(args, split, ckpt_path)
  File "/home/yaoyc/DSNet/src/anchor_based/train.py", line 50, in train
    gtscore, cps, n_frames, nfps, picks)
  File "/home/yaoyc/DSNet/src/helpers/vsumm_helper.py", line 75, in get_keyshot_summ
    assert pred.shape == picks.shape
AttributeError: 'str' object has no attribute 'shape'

This error occurs, but I don't know what is wrong; I followed the instructions in the README.

About Metrics

When measuring the model, you use the F1 score and treat the entire video summarization task as a sequence labeling task. But when condensing a video, the positive and negative samples are often extremely unbalanced. So does the F1 score really make sense?
In addition, your code uses the average for the TVSum dataset but the maximum for the other datasets. What is the reason for this?
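
To make the difference between the two reductions concrete, here is a minimal sketch that scores one predicted summary against several annotator summaries and then either averages or takes the maximum. The function names and the tiny example data are hypothetical; this is not the repository's evaluation code.

```python
def f1_score(pred, target):
    """F1 between two equal-length binary frame-level summaries."""
    if len(pred) != len(target):
        raise ValueError("summaries must have the same number of frames")
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in pred if p)
    recall = overlap / sum(1 for t in target if t)
    return 2 * precision * recall / (precision + recall)


def summary_f1(pred, user_summaries, reduce="avg"):
    """Score pred against every annotator, then reduce with 'avg' or 'max'."""
    scores = [f1_score(pred, user) for user in user_summaries]
    if reduce == "avg":
        return sum(scores) / len(scores)
    if reduce == "max":
        return max(scores)
    raise ValueError(f"unknown reduction: {reduce}")


# Two annotators who disagree completely: one matches the prediction, one does not.
pred = [1, 1, 0, 0]
users = [[1, 1, 0, 0], [0, 0, 1, 1]]
avg_score = summary_f1(pred, users, "avg")
max_score = summary_f1(pred, users, "max")
print(avg_score, max_score)
```

The maximum rewards agreement with any single annotator, while the average penalizes disagreement with the others, so the two numbers are not directly comparable across datasets.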

summary video can't be played

When I use infer.py to predict on my own video, the generated summary video cannot be played, but it plays fine when I predict on the video provided by the author.

pretrained model?

Could you release your pre-trained model?

ABOUT SPLITS

Hi, I was looking for the correct splits of the datasets so that I could run experiments correctly, but I found this split https://github.com/ok1zjf/VASNet/tree/master/splits and it differs from the one you made. They used the same datasets, so I was wondering if you know which one is correct. Thanks!

Evaluation Approach for the baseline model

I tried using the baseline model with LSTM on my version of the dataset. I downloaded the videos and loaded the labels using the make_dataset.py script. However, the labels in my dataset don't match the original ones. Despite this, I tested the model on this modified dataset using the average of the user_summary annotations as the evaluation labels. The resulting F-score was about 0.30. Then, I tried using the maximum value instead, which gave better results with an F-score of 0.52.

Later, I tried evaluating the model using the gt_score and converting it to shot summaries, similar to our training approach. After evaluation, I got an average F-score of 0.70. But the F1-score varied a lot.

As you can see in the image, the F1-score keeps changing. My question is whether this way of evaluating is not good, and if the unstable F-score indicates a problem.

While running the infer.py file for the custom video summarization I am getting an error

Loading DSNet model ...
Preprocessing source video ...

#ERROR
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument

Predicting summary ...
Writing summary video ...

seems that youtube dataset link is not right?

issue in inference

python infer.py anchor-based --ckpt-path ../models/custom/checkpoint/custom.yml.0.pt
--source ../custom_data/videos/EE-bNr36nyA.mp4 --save-path ./output.mp4
Loading DSNet model ...
Traceback (most recent call last):
  File "infer.py", line 66, in <module>
    main()
  File "infer.py", line 18, in main
    model.load_state_dict(state_dict)
  File "/home/mossad/aeye/DSNet/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DSNet:
	Unexpected key(s) in state_dict: "fc_ctr.weight", "fc_ctr.bias".
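
An "Unexpected key(s)" error means the checkpoint holds parameters that the instantiated model does not define. One possible cause, which is an assumption and not confirmed in this thread, is that the checkpoint was trained with a different model variant than the one selected on the command line. The sketch below uses a hypothetical helper to compare the two key sets before loading; the key names in the example are taken from the error message or invented for illustration.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Report which parameter names are missing from, or unexpected in, a checkpoint."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return {
        "missing": sorted(model_keys - ckpt_keys),      # model expects, checkpoint lacks
        "unexpected": sorted(ckpt_keys - model_keys),   # checkpoint has, model lacks
    }


# Illustrative key names only: "fc_ctr.*" comes from the error above,
# "fc_cls.*" and "fc_loc.*" are placeholders for the shared layers.
model_keys = ["fc_cls.weight", "fc_cls.bias", "fc_loc.weight", "fc_loc.bias"]
ckpt_keys = model_keys + ["fc_ctr.weight", "fc_ctr.bias"]
report = diff_state_dict_keys(model_keys, ckpt_keys)
print(report)
```

With PyTorch available, the same helper can be called as `diff_state_dict_keys(model.state_dict().keys(), state_dict.keys())`.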

How to run the baseline model?

I want to reproduce the result described as "For the baseline, we removed the interest proposal formulation and only applied a self-attention layer to predict the importance scores." How do I run the baseline model? Looking forward to your reply.

Feature Extraction

In the evaluation phase, you use the features that have already been extracted into the 'h5py' file. However, when I run 'infer.py' to summarize a raw video from the TVSum dataset, the extracted features are completely different from those in the h5py file, and the prediction from the original video is completely wrong. So I want to ask: is the feature extraction method really the one in 'src/helpers/video_helper.py', using features extracted by GoogLeNet? Could you provide the feature extraction method used for your h5py file?

datasets

I have a question about the features:
Do I need additional processing of video frames when using GoogLeNet to extract video features? For example, normalization and other operations, or can I directly resize the original video frame and use the network to obtain features?
What should I do if I want to use ResNet for feature extraction?
Thank you!

While running the infer.py file for the custom video summarization I am getting an error

Has anyone re-experimented feature extraction from the raw video?

As I said in #12, when I re-extract the features to train the network, the F1 score of the model is only about 0.3. Is this normal? Has anyone re-run feature extraction from the raw video?

[question] How annotations are done

I imagined that the JSON in the custom_data folder would model all frames as a binary list.
For example, assume we have 10 frames in the video and the most important segments are frames 3 to 6 and frames 9 to 10;
then the annotation would be [0,0,1,1,1,1,0,0,1,1].
In other words, I am confused by this statement in the readme.md file:
The user summary of a video is a UxN binary matrix, where U denotes the number of annotators and N denotes the number of frames in the original video
Why replicate the frames U times, and what is U?
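
A small sketch of what a UxN user summary could look like, under the README's definition quoted above: each of the U rows is one annotator's binary selection over the same N frames, so the frames are not replicated but labeled independently by each person. The annotator data and the helper name are made up for illustration.

```python
# Hypothetical example: U = 3 annotators, N = 10 frames.
user_summary = [
    [0, 0, 1, 1, 1, 1, 0, 0, 1, 1],  # annotator 0
    [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],  # annotator 1
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 1],  # annotator 2
]

num_annotators = len(user_summary)      # U
num_frames = len(user_summary[0])       # N


def frame_agreement(summary):
    """Fraction of annotators who selected each frame."""
    return [sum(column) / len(summary) for column in zip(*summary)]


agreement = frame_agreement(user_summary)
print(num_annotators, num_frames, agreement)
```

With a single annotator, the matrix reduces to the 1xN binary list described in the question.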

feature extraction

Could you release your feature extraction code?

Environment Configuration

Can you post a tutorial detailing the environment configuration process?

Validation/Test splits

Hi there,

First, thanks for providing your code and boosting the open-source mentality of video summarization research.
I have a question about the sets of videos used. In this issue you refer to a validation set f_score, but no validation_keys appear in the splits, and in the source code I see no validation set, only the test set used for validation. Am I missing something, or is using the test set to pick the best model data leakage?

I think that the proper usage is:

Training set: The sample of data used to fit the model.
Validation set: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.
Test set: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.

To be honest, though, the small number of videos in the video summarization datasets may not be enough to set aside a validation set.
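
For illustration, a three-way split of video keys could be sketched as follows. The function name, the ratios, the seed, and the "val_keys" field are hypothetical and do not describe the repository's split files.

```python
import random


def split_keys(keys, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle video keys deterministically and cut them into train/val/test."""
    keys = sorted(keys)                 # fixed starting order for reproducibility
    random.Random(seed).shuffle(keys)
    n_test = round(len(keys) * test_ratio)
    n_val = round(len(keys) * val_ratio)
    return {
        "test_keys": keys[:n_test],
        "val_keys": keys[n_test:n_test + n_val],
        "train_keys": keys[n_test + n_val:],
    }


# 50 placeholder video names, e.g. the size of a small summarization dataset.
video_keys = [f"video_{i}" for i in range(1, 51)]
split = split_keys(video_keys)
print({name: len(items) for name, items in split.items()})
```

Model selection would then use only "val_keys", leaving "test_keys" untouched until the final evaluation.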

Thanks in advance.
George

Params in table2

Hi,
May I ask how Params are calculated in Table 2? I used the following code to calculate Params, and the result differs significantly from the 8.53 million reported in Table 2. The result I got was 4.33 million.

import torch
from thop import profile
from src.anchor_based.dsnet import DSNet
model = DSNet('attention', 1024, 128, [4, 8, 16, 32], 8)
input = torch.randn(1, 1024, 1024)
flops, params = profile(model, inputs=(input, ))
print('flops:{}'.format(flops))
print('params:{}'.format(params))

I would like to know how you calculated it and look forward to receiving your reply. Thank you very much!

OVP dataset and YouTube dataset not in the same format as TVSum dataset

When I was reproducing your code, I found that the OVP and YouTube datasets are not in the same format as the TVSum and SumMe datasets: they are missing 'change_points', 'n_frames', 'picks', etc., which prevents the model from running the transfer and augmented settings. How should this be solved?

About the statistics of dataset in Table 1

In Table 1 of the paper, I found that the TVSum, YouTube, and OVP datasets have the SAME duration (Min, Max, Avg). Are they correct?

And how can I get the OVP and YouTube datasets?

How to generate summaries on new videos?

The code can generate video summaries on the TVSum and SumMe datasets. If there are some new videos, how could we generate summaries for them?

I found that the data file contains "change_points" and "n_frame_per_seg". Do we have to obtain these annotations before we can generate summaries for new videos?
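
As a sketch of how the two fields relate, the per-segment frame counts can be derived from the change points once a shot segmentation exists. This assumes each change point is an inclusive [start, end] pair of frame indices and that segments are contiguous; both are assumptions about the data layout, and the helper names are hypothetical.

```python
def frames_per_segment(change_points):
    """Number of frames in each segment, assuming inclusive [start, end] pairs."""
    return [end - start + 1 for start, end in change_points]


def covers_video(change_points, n_frames):
    """Check that segments are contiguous and span frames 0 .. n_frames - 1."""
    expected_start = 0
    for start, end in change_points:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == n_frames


# Made-up segmentation of a 30-frame video into three shots.
change_points = [[0, 9], [10, 24], [25, 29]]
n_frame_per_seg = frames_per_segment(change_points)
print(n_frame_per_seg, covers_video(change_points, 30))
```

Under these assumptions, only the change points need to be produced for a new video; the segment lengths follow from them.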

There is no pywrapknapsack_solver

ortools has been updated, which causes an issue while training on a custom dataset.
Please fix it.
Thank you.
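
The thread does not say which ortools version is installed or how the solver is called, so no ortools call is shown here. As a dependency-free stand-in, the sketch below solves the same kind of 0/1 knapsack problem with plain dynamic programming. It is a swapped-in technique with a hypothetical function name, not the ortools API, and it assumes integer weights (for example, shot lengths in frames).

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming; returns sorted indices of chosen items."""
    n = len(values)
    # best[i][c] = best total value using the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]                      # skip item i-1
            if weight <= c and best[i - 1][c - weight] + value > best[i][c]:
                best[i][c] = best[i - 1][c - weight] + value  # take item i-1
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)


picked = knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
print(picked)
```

The table has (items + 1) x (capacity + 1) entries, so this is only practical when the capacity is a moderate integer.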

pre-trained model

Hi,

May I ask why there are five .pt files for each dataset in the pre-trained models? I tried the models on the default custom video and all five files give quite different results. Which one should I choose? Thanks in advance.

Is the model learning anything meaningful ?

I have run your code for the anchor-free model on canonical TVSum, as instructed in the README file, and I get results similar to those reported in the paper. To be precise, I get these numbers:

mean: 0.6160917484037374
split0: 0.624260622317302
split1: 0.5672167705637337
split2: 0.6329998280152374
split3: 0.6279028963330486
split4: 0.6280786247893654

But I have noticed that the best f-score in each split is obtained right at the start of training (after a couple of epochs). This can be observed in the following f-score vs epochs plot, where each colour corresponds to a separate TVSum split.

As you can see, the best f-scores are not much better than the f-scores obtained with randomly initialized model weights. This makes me wonder if the model is indeed learning something meaningful. Any thoughts on this?
