
dhf1k's Introduction

DHF1K

For performance benchmarking, please send an email directly to '[email protected]'.

===========================================================================

Wenguan Wang, J. Shen, M.-M Cheng and A. Borji,

Revisiting Video Saliency: A Large-scale Benchmark and a New Model,

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 and

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2019

===========================================================================

The code (ACLNet) and dataset (DHF1K with raw gaze records; UCF-sports is newly added!) can be downloaded from:

Google Drive: https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

Baidu pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg

The Hollywood-2 dataset (74.6 GB, including attention maps) can be downloaded from:

Google Drive: https://drive.google.com/drive/folders/1eCNcRSInK7GGNxXF60yeU7LqiRgRKO4r?usp=sharing

Baidu pan: https://pan.baidu.com/s/16BIAuaGEDDbbjylJ8zziuA (code: bt3x)

Since so many people are interested in the training code, I have decided to upload it to the above web disks. Enjoy it.

===========================================================================

Files:

'video': 1000 videos (videoname.AVI)

'annotation/videoname/maps': continuous saliency maps in '.png' format

'annotation/videoname/fixation': binary eye fixation maps in '.png' format

'annotation/videoname/fixation/maps': binary eye fixation maps stored in '.mat' format

'generate_frame.m': used for extracting the frame images from the AVI videos.

Please note that the raw data of individual viewers is stored in 'exportdata_train.rar'.

Please do not change the frame naming convention.
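
For orientation, below is a minimal sketch (not part of the official package) of how one frame's annotations could be loaded in Python. The zero-padded folder name ('annotation/0001/') follows the layout reported in the issues further down; the dataset root, the use of cv2, and the exact paths are illustrative assumptions.

import os
import cv2  # any image library works; cv2 is only an example

root = 'DHF1K'      # hypothetical dataset root
vid = '0001'        # annotation folder corresponding to video 001.AVI
frame = '0001.png'

# continuous saliency map and binary fixation map for one frame
sal_map = cv2.imread(os.path.join(root, 'annotation', vid, 'maps', frame), cv2.IMREAD_GRAYSCALE)
fix_map = cv2.imread(os.path.join(root, 'annotation', vid, 'fixation', frame), cv2.IMREAD_GRAYSCALE)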

===========================================================================

Dataset splitting:

Training set: first 600 videos (001.AVI-600.AVI)

Validation set: 100 videos (601.AVI-700.AVI)

Testing set: 300 videos (701.AVI-1000.AVI)

The annotations for the training and validation sets are released, but the annotations for the test set are held out for benchmarking.
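
For reference, a minimal Python sketch of this split by video index (file names follow the three-digit '%03d.AVI' pattern used by the dataset):

train_videos = ['%03d.AVI' % i for i in range(1, 601)]     # 001.AVI - 600.AVI
val_videos   = ['%03d.AVI' % i for i in range(601, 701)]   # 601.AVI - 700.AVI
test_videos  = ['%03d.AVI' % i for i in range(701, 1001)]  # 701.AVI - 1000.AVI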

===========================================================================

The attribute annotations for all the videos ("DHF1k_attribute-all") have been uploaded.

The reported attribute statistics correspond to this version.

===========================================================================

We have corrected some statistics of our results (baseline training setting (iii)) on the UCF sports dataset. Please see the latest version on arXiv.

===========================================================================

Note that for the Hollywood-2 dataset, we used the split videos (each video contains only one shot) instead of the full videos.

===========================================================================

The raw gaze-record data 'exportdata_train.rar' has been uploaded.

===========================================================================

For the DHF1K dataset, we use the following functions to generate the continuous saliency maps:

[x, y] = find(fixations);

densityMap = make_gauss_masks(y, x, [video_res_y, video_res_x]);

make_gauss_masks.m has been uploaded.

For UCF and Hollywood-2, I directly use the following function:

densityMap = imfilter(fixations,fspecial('gaussian',150,20),'replicate');
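
For readers working in Python, here is a rough equivalent of the UCF/Hollywood post-processing above (a sketch, not the official code): scipy's gaussian_filter stands in for imfilter/fspecial, with sigma=20 matching the MATLAB kernel's standard deviation and mode='nearest' approximating the 'replicate' boundary option; the truncation window differs slightly from the fixed 150x150 kernel.

import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_density(fixations, sigma=20):
    # Blur a binary fixation map (2-D array) into a continuous saliency map.
    density = gaussian_filter(fixations.astype(np.float64), sigma=sigma, mode='nearest')
    if density.max() > 0:
        density /= density.max()  # normalize to [0, 1] before saving as '.png'
    return density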

===========================================================================

Results submission.

Please organize your results in the following format:

yourmethod/videoname/framename.png

Note that the frames and frame names should be generated by 'generate_frame.m'.

Then send your results to '[email protected]'.

You can only submit ONCE per week.

Please first test your model on the validation set or on other video saliency datasets.

The response may take more than one week.

If you want your results listed on our website, please send your name, model name, paper title, a short description of your method, and a link to your project page (if you have one).
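
Before submitting, a quick sanity check along these lines may help (a minimal sketch, not an official tool; the directory arguments are assumptions): it verifies that every frame produced by 'generate_frame.m' has a prediction under 'yourmethod/videoname/framename.png'.

import os

def check_submission(results_dir, frames_dir):
    # Return frames (videoname/framename.png) that lack a corresponding prediction.
    missing = []
    for video in sorted(os.listdir(frames_dir)):
        video_dir = os.path.join(frames_dir, video)
        if not os.path.isdir(video_dir):
            continue
        for frame in sorted(os.listdir(video_dir)):
            if frame.endswith('.png') and not os.path.isfile(os.path.join(results_dir, video, frame)):
                missing.append(os.path.join(video, frame))
    return missing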

===========================================================================

We use

Keras: 2.2.2

TensorFlow: 1.10.0

to implement our model.

===========================================================================

Citation:

@InProceedings{Wang_2018_CVPR,
author = {Wang, Wenguan and Shen, Jianbing and Guo, Fang and Cheng, Ming-Ming and Borji, Ali},
title = {Revisiting Video Saliency: A Large-Scale Benchmark and a New Model},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition},
year = {2018}
}

@ARTICLE{Wang_2019_revisitingVS, 
author={W. {Wang} and J. {Shen} and J. {Xie} and M. {Cheng} and H. {Ling} and A. {Borji}}, 
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
title={Revisiting Video Saliency Prediction in the Deep Learning Era}, 
year={2019}, 
}

If you find our dataset useful, please cite the above papers.

===========================================================================

Code (ACLNet):

You can find the code on Google Drive: https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

===========================================================================

Terms of use:

The dataset and code are licensed under a Creative Commons Attribution 4.0 License.

===========================================================================

Contact Information Email: [email protected]


dhf1k's People

Contributors

wenguanwang


dhf1k's Issues

About ACL.h5

Hi! Many thanks for your great work!

The ACL.h5 file could not be opened when running the program.
Is it possible that this file is corrupt?

Is the audio presented to the viewer during fixation collection?

Hi, thanks for collecting such a valuable dataset!
A couple of things I want to clarify with you:

  1. I noticed the videos come with audio. Could the viewers hear it during data collection?
  2. As for the 1000 video clips, are they the complete clips you downloaded directly from YouTube, or did you randomly cut them from the raw videos?

Many thanks!

Regarding saliency metric (especially CC)

Hi,

First of all, thank you for providing such a nice dataset and code for the evaluation metrics. I would like to evaluate my saliency results using the five metrics the paper discusses.

I found that the linear correlation coefficient (CC) can take positive and negative values in [-1, 1]. So, when you report the CC score, did you take the absolute value of each per-frame CC before averaging over the frames and clips? Your MATLAB code doesn't do that, but taking the absolute value would make sense to me.

Look forward to your reply.
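
For reference, the standard formulation of CC in saliency evaluation is Pearson's linear correlation coefficient between the mean/std-normalized predicted and ground-truth maps. The following is a minimal Python sketch of that common definition, not necessarily the exact MATLAB script shipped with the dataset:

import numpy as np

def cc(pred, gt):
    # Pearson's linear correlation coefficient between two saliency maps.
    pred = (pred - pred.mean()) / (pred.std() + 1e-12)
    gt = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((pred * gt).mean())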

UCF download link?

You mentioned in a previous issue:

Hi, all, the data of Hollywood-2 and UCF have been uploaded.

The code (ACLNet) and dataset (DHF1K with raw gaze records, UCF-sports are new added!) can be downloaded from:

Google disk:https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

Baidu pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg

The Hollywood-2 (74.6G) can be downloaded from:

Google disk:https://drive.google.com/open?id=1vfRKJloNSIczYEOVjB4zMK8r0k4VJuWk

Originally posted by @wenguanwang in #2 (comment)

Is there also a link for the UCF-sports dataset?

Testing setting of Hollywood2 dataset

Did you use all of the fixation points when training and testing on the Hollywood-2 dataset?
Or did you filter out some points (e.g., points at the image edge)?
Also, did you use all 884 test videos when testing on Hollywood-2?
How did you synchronize the fixation points with the videos?

I am asking because I want to replicate the same results on the Hollywood-2 dataset.
Could you provide more detailed information about your setup?
(I divided the videos using shot boundaries, as you mentioned.)

Hollywood2 google link broken

Hi, is it possible to re-upload the Hollywood-2 data, as the Google Drive link is now broken? Maybe upload it to Mega or Dropbox? It is very hard to download from Baidu as a non-Chinese user. Thanks.

Dataset license

Hi,

Thanks for the great work. Could you please provide a license for the dataset?

Attributes for first 700 videos

Many thanks for your great work!
As far as I can see, DHF1k_attribute.xlsx only provides data for the 300 test videos. Could you also provide this kind of attribute data for the first 700 videos?
That would save me a lot of work and would be highly appreciated!

question about testing AUC-shuffled

When using the evaluation code in this package, the AUC-shuffled score is much lower than the one reported in the paper on the UCF dataset. I was wondering whether there is anything wrong with the evaluation code, or whether I missed some important details.

Annotations for only 26 videos

After downloading the dataset, the annotation folder has annotations for only 26 videos. How can I get the annotations for the remaining videos?

questions about the paths and files

Hi, thank you for your dataset and the source code. I want to replicate this work with your code, but I am confused about the paths in config.py. I want to know what kind of data has been used to train the model. In your paper, Revisiting Video Saliency: A Large-scale Benchmark and a New Model, you said that you used the static dataset SALICON to train the attention module, and in your code there are several paths. Could you tell me:

  • which paths are for the video dataset and which is for SALICON? Do you mean that frames_path contains all the frames extracted from the videos, and imgs_path is for the SALICON data?
  • do I need to extract all the frames from the videos myself?

The related code is as follows:

# path of training videos
videos_train_paths = ['D:/code/attention/DHF1K/training/']
# path of validation videos
videos_val_paths = ['D:/code/attention/DHF1K/val/']
videos_test_path = 'D:/code/attention/DHF1K/testing/'

# path of training maps
maps_path = '/maps/'
# path of training fixation maps
fixs_path = '/fixation/maps/'

frames_path = '/images/'

# path of training images
imgs_path = 'D:/code/attention/staticimages/training/'

Thank you.

package version

Which versions of the Python packages did you use for the ACLNet/attentive CNN-LSTM network? Thanks a lot.

discrepancy in exportdata_train and DHF1K fixation maps?

Hi, thanks for the nice dataset.
I want to recreate the fixation maps using the raw gaze records in the exportdata_train folder released for DHF1K.

However, the fixation maps obtained using the record_mapping.m script and the raw data from the exportdata_train folder do not match the ones released in DHF1K.

For example:

  1. 0001.png: the fixation map for the first frame of 001.AVI, copied from annotation/0001/fixation/0001.png (image attached).
  2. 0001_regenerated.png: the fixation map I regenerated using the files from the exportdata_train folder (image attached).

I used the record_mapping.m file after specifying appropriate paths and modifying line 22 and line 24.

Could you please help me understand what I might be missing?

For your reference, here is my copy of the record_mapping.m file:

%This function is used for mapping the fixation record into the corresponding fixation maps.
screen_res_x = 1440;
screen_res_y = 900;

parent_dir = 'GIVE PATH TO PARENT DIRECTORY';

datasetFile1 = 'movie';
datasetFile = 'video';
gazeFile = 'exportdata_train';

videoFiles = dir(fullfile('./', datasetFile));
videoNUM = length(videoFiles)-2;
rate = 30;
  
full_vid_dir = [parent_dir, datasetFile, '/'];

 for videonum = 1:700
        videofolder =  videoFiles(videonum+2).name
        vidObj = VideoReader([full_vid_dir,videofolder]);
        options.infolder = fullfile( './', datasetFile,  videofolder, 'images' );
        % no need to read full video if I can use VideoReader to know
        % dimensions and duration of video
        % Cache all frames in memory
        %[data.frames,names,video_res_y,video_res_x,nframe ]= readAllFrames( options );
        nframe = vidObj.NumberOfFrames;
        video_res_x = vidObj.Width;
        video_res_y = vidObj.Height;
        a=video_res_x/screen_res_x;
        b=(screen_res_y-video_res_y/a)/2;
        all_fixation = zeros(video_res_y,video_res_x,nframe);
        for person = 1:17
            %modified the following line to match the video naming format
            txtloc = fullfile(parent_dir, gazeFile, sprintf('P%02d',person), [sprintf('P%02d_Trail',person), sprintf('%03d.txt',videonum)]);
            if exist(txtloc, 'file')
                %modified the following line to match the txt file format
                [time,model,trialnum,diax, diay, x_screen,y_screen,event]=textread(txtloc,'%f%s%f%f%f%f%f%s','headerlines',1);
                if size(time,1)
                    time = time-time(1);
                    event = cellfun(@(x) x(1), event);
                    for index = 1:nframe
                            eff = find( ((index-1)<rate*time/1000000)&(rate*time/1000000<index)&event=='F'); %framerate = 10;
                            x_stimulus=int32(a*x_screen(eff));
                            y_stimulus=int32(a*(y_screen(eff)-b));
                            t = x_stimulus<=0|x_stimulus>=video_res_x|y_stimulus<=0|y_stimulus>=video_res_y;
                            all_fixation(y_stimulus(~t),x_stimulus(~t),index) = 1;
                    end
                end
            end
        end 
end

Absent annotations

Thank you for providing the dataset.
After I unarchived the annotations, there is data only for the first 700 videos. Is this intended?

modify generate_frame.m to python

import os
import cv2
from tqdm import tqdm

base_dir = '/home/simplew/dataset/sod/DHF1K'

video_dir = os.path.join(base_dir, 'video')

movies = [mov for mov in os.listdir(video_dir) if mov.endswith('.AVI')]
for movie in tqdm(movies):
    image_dir = os.path.join(base_dir, 'annotation', '0' + movie[:-4], 'images')
    os.makedirs(image_dir, exist_ok=True)
    
    # use opencv
    # cap = cv2.VideoCapture(f"{video_dir}/{movie}")
    # numFrames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # for k in range(numFrames):
    #     ret, frame = cap.read()
    #     cv2.imwrite(f"{image_dir}/{k+1:04}.png", frame)
    # cap.release()

    # ubuntu
    command = f"ffmpeg -i {video_dir}/{movie}  {image_dir}/%04d.png"
    os.system(command)

How are ground truth saliency maps generated from recorded fixations?

In your data collection you gather a set of discrete fixation maps (P in the paper). From these, continuous saliency maps (Q in the paper) are generated. I found no details about how this is done; could you elaborate? I would guess that it involves Gaussians centered on the fixation locations; I am interested in the exact parameters, how you combine fixations from different test subjects, and so on.

Thanks again for providing the dataset!

The loss is nan.

Hi, I'm really interested in your work, and I used your training code ('ACL_full') to train on my data. But during training, the loss always becomes NaN after several iterations:
53/100 [==============>...............] - ETA: 59s - loss: nan - time_distributed_15_loss: nan - time_distributed_16_loss: nan

I have tuned the base learning rate from 1e-4 to 1e-12, but the results are the same.

Do you know of any solutions?

And what does the 'imgs_path' ('staticimages') in config.py mean?

Thanks very much!
