Introduction


Figure 1: An illustration of 360° video salient human detection (VSHD). The first row: two random key frames of a 360° panoramic video from SHD360, mapped to a plane by equirectangular (ER) projection. The middle row: a subject observes 360° content by moving their head to control the field of view (FoV) within a range of 360°×180°; the salient human instances with 360° attributes are observed in spherical FoVs at specific rotation angles (e.g., θi, θj). The last row: the corresponding annotations, including per-pixel instance-level ground truth (GT) and general attributes such as MP (multiple persons), DV (distant view) and MB (motion blur).

Salient human detection (SHD) in dynamic 360° immersive videos is of great importance for applications such as robotics and inter-human and human-object interaction in augmented reality. However, 360° video SHD has seldom been discussed in the computer vision community due to a lack of datasets with large-scale omnidirectional videos and rich annotations. To this end, we propose SHD360, the first 360° video SHD dataset, which contains various real-life daily scenes. SHD360 provides six-level hierarchical annotations for 6,268 key frames uniformly sampled from 37,403 omnidirectional video frames at 4K resolution. Specifically, each collected key frame is labeled with a super-class, a sub-class, associated attributes (e.g., geometrical distortion), bounding boxes, and per-pixel object-/instance-level masks. As a result, SHD360 contains a total of 16,238 salient human instances with manually annotated pixel-wise ground truth. Since no method has so far been proposed for 360° image/video SHD, we systematically benchmark 11 representative state-of-the-art salient object detection (SOD) approaches on SHD360 and explore key issues derived from the extensive experimental results. We hope the proposed dataset and benchmark can serve as a good starting point for advancing human-centric research on 360° panoramic data.


Related Works


Figure 2: Summary of widely used salient object detection (SOD) datasets and our SHD360. GT = ground truth. ER Image = equirectangular image. Attr. = attributes. obj. = object-level GT. ins. = instance-level GT. Please note that all the datasets listed above provide pixel-wise annotations.


Dataset: SHD360


Figure 3: Statistics of the proposed SHD360. (a)/(b) The quantity of object-/instance-level per-pixel ground-truth masks for each scene category. (c) Hierarchical labels, including two super-classes (indoor/outdoor) and 41 scene categories. (d)/(e) Attribute statistics: the correlation and frequency of the proposed attributes, respectively. (f) Descriptions of the six proposed attributes associated with each of the scene categories.


Figure 4: Examples of instance-level pixel-wise labels and challenging attributes (please refer to Figure 3 (f) for details) of our SHD360.


360° Geometry-adapted S-measure


Figure 5: A comparison between the traditional S-measure and the proposed 360° geometry-adapted S-measure. The former computes region similarities based on ER blocks, the latter based on cube maps. '+X', '-X', '+Y', '-Y', '+Z' and '-Z' denote cube maps, each covering a FoV of 90°×90°, observed from the right, left, up, down, front and back of a 360° camera, respectively.
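To make the cube-map idea concrete, the following minimal sketch shows how a pixel of an ER frame could be assigned to one of the six cube faces named in Figure 5. It is an illustration only, not the implementation behind the proposed measure. The function names `er_pixel_to_direction` and `cube_face` are hypothetical, and the sketch assumes that the horizontal center of the ER image faces front (+Z) and that the top row corresponds to looking up (+Y).

```python
import math


def er_pixel_to_direction(u, v, width, height):
    """Map an ER pixel (column u, row v) to a unit viewing direction (x, y, z).

    Assumed convention (not taken from the SHD360 code): +X is right,
    +Y is up, +Z is front, and the image center looks toward +Z.
    """
    # Longitude spans [-pi, pi) from the left to the right image border.
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    # Latitude spans (+pi/2, -pi/2) from the top to the bottom image border.
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z


def cube_face(direction):
    """Return the cube face ('+X', '-X', '+Y', '-Y', '+Z', '-Z') hit by a direction.

    The face is the axis with the largest absolute component, which gives
    each face a 90 x 90 degree field of view.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+X" if x > 0 else "-X"
    if ay >= az:
        return "+Y" if y > 0 else "-Y"
    return "+Z" if z > 0 else "-Z"


if __name__ == "__main__":
    W, H = 3840, 1920  # an assumed 2:1 ER frame size
    print(cube_face(er_pixel_to_direction(W // 2, H // 2, W, H)))  # image center
    print(cube_face(er_pixel_to_direction(W // 2, 0, W, H)))       # top row
```

Under these assumptions the image center falls on the front face and the top row on the up face. A region-similarity score could then be computed per face rather than per ER block, which avoids the stretching that ER projection introduces near the poles.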


Benchmark

Overall Results


Figure 6: Performance comparison of 8/2 SOTA SOD/VSOD methods and one 360° SOD method on the three testing sets of SHD360. S = S-measure (α=0.5), S360 = 360° geometry-adapted S-measure, Fβ = mean F-measure (β²=0.3), Eφ = mean E-measure, M = mean absolute error. ↑/↓ denotes that a larger/smaller value is better. The three best results in each column are shown in red, blue and green, respectively.
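As a rough guide to two of the metrics listed in Figure 6, the sketch below computes the mean absolute error M and an F-measure with β²=0.3 for a single saliency map. It is a minimal illustration and not the evaluation toolbox used for the benchmark. The function names, the fixed binarization threshold of 0.5 and the small `eps` term are assumptions, and the averaging convention behind the reported mean F-measure (over thresholds and images) is not reproduced here.

```python
import numpy as np


def mean_absolute_error(pred, gt):
    """Mean absolute error between a saliency map and its ground truth.

    Both inputs are expected to hold values in [0, 1] and share one shape.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3, eps=1e-8):
    """F-measure of a saliency map binarized at a single threshold.

    F = (1 + beta^2) * P * R / (beta^2 * P + R), with precision P and
    recall R computed on the binarized prediction. The threshold is an
    assumption made for this sketch.
    """
    binary = np.asarray(pred, dtype=np.float64) >= threshold
    mask = np.asarray(gt, dtype=np.float64) > 0.5
    true_positives = np.logical_and(binary, mask).sum()
    precision = true_positives / (binary.sum() + eps)
    recall = true_positives / (mask.sum() + eps)
    return float((1.0 + beta_sq) * precision * recall
                 / (beta_sq * precision + recall + eps))


if __name__ == "__main__":
    prediction = [[0.9, 0.2], [0.8, 0.1]]
    ground_truth = [[1, 0], [0, 0]]
    print(mean_absolute_error(prediction, ground_truth))
    print(f_measure(prediction, ground_truth))
```

With β²=0.3 the score weights precision more heavily than recall, which is the usual choice in the SOD literature.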

Attributes-based Results


Figure 7: Attribute-based performance comparison of the 11 baselines on SHD360. ↑/↓ denotes that a larger/smaller value is better. The three best results in each row are shown in red, blue and green, respectively.

Reference

| No. | Year | Pub. | Title | Links |
|-----|------|------|-------|-------|
| 01 | 2019 | IEEE CVPR | Cascaded Partial Decoder for Fast and Accurate Salient Object Detection | Paper/Project |
| 02 | 2019 | IEEE ICCV | Stacked Cross Refinement Network for Edge-Aware Salient Object Detection | Paper/Project |
| 03 | 2020 | AAAI | F3Net: Fusion, Feedback and Focus for Salient Object Detection | Paper/Project |
| 04 | 2020 | IEEE CVPR | Multi-scale Interactive Network for Salient Object Detection | Paper/Project |
| 05 | 2020 | IEEE CVPR | Label Decoupling Framework for Salient Object Detection | Paper/Project |
| 06 | 2020 | ECCV | Highly Efficient Salient Object Detection with 100K Parameters | Paper/Project |
| 07 | 2020 | ECCV | Suppress and Balance: A Simple Gated Network for Salient Object Detection | Paper/Project |
| 08 | 2021 | AAAI | Locate Globally, Segment Locally: A Progressive Architecture With Knowledge Review Network for Salient Object Detection | Paper/Project |
| 09 | 2019 | IEEE ICCV | Semi-Supervised Video Salient Object Detection Using Pseudo-Labels | Paper/Project |
| 10 | 2020 | AAAI | Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection | Paper/Project |
| 11 | 2020 | IEEE SPL | FANet: Features Adaptation Network for 360° Omnidirectional Salient Object Detection | Paper/Project |

Evaluation Toolbox

All quantitative results were computed with the one-key Python toolbox: https://github.com/zzhanghub/eval-co-sod


Downloads

The object-/instance-level ground truth (with default split) and edge maps can be downloaded from OneDrive or Google.

The videos can be downloaded from Google or OneDrive.

To generate video frames, please refer to video_to_frames.py.

To access the raw videos on YouTube, please refer to sequence_links.txt.


Privacy

Please note that the SHD360 dataset does not own the copyright of the images. Access to SHD360 is limited to researchers and educators who wish to use the images for non-commercial research and/or educational purposes.


Contact

Please feel free to drop an e-mail to [email protected] for questions or further discussion.


Citation

@article{zhang2021shd360,
  title={SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos},
  author={Zhang, Yi and Zhang, Lu and Wang, Kang and Hamidouche, Wassim and Deforges, Olivier},
  journal={arXiv preprint arXiv:2105.11578},
  year={2021}
}
