Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention

This is our implementation for the paper:

Bin Jiang, Xin Huang, Chao Yang, and Junsong Yuan. Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention. In Proceedings of the 2019 on International Conference on Multimedia Retrieval (ICMR '19). ACM, 2019, pp. 217-225.

To align the textual query with the candidate video moments, we devise a spatial and language-temporal attention (SLTA) model that adaptively identifies the relevant objects and interactions based on the query.
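For readers new to query-guided attention, the sketch below illustrates the general idea in plain Python: each candidate feature (an object region inside a frame for the spatial step, or a per-frame vector inside a candidate moment for the temporal step) is scored against a query embedding, the scores are normalized with a softmax, and the features are pooled with the resulting weights. This is a minimal sketch under stated assumptions, not the SLTA implementation: it uses scaled dot-product scoring with no learned parameters, and `query_attention` and `spatial_then_temporal` are hypothetical helpers that do not exist in this repository. The paper's actual scoring function, projections, and use of the language features may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_attention(query: Vector, features: List[Vector]) -> Tuple[List[float], Vector]:
    """Hypothetical helper: query-guided soft attention over a set of features.

    Scores each feature by its scaled dot product with the query, normalizes
    the scores with a softmax, and returns (weights, weighted sum of features).
    Assumption: dot-product scoring; SLTA's real scoring function may differ.
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    d = len(query)
    if any(len(f) != d for f in features):
        raise ValueError("features must have the same dimension as the query")
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(d) for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)]
    return weights, pooled


def spatial_then_temporal(query: Vector, frames: List[List[Vector]]) -> Tuple[List[float], Vector]:
    """Hypothetical two-stage pipeline.

    Spatial step: attend over the object features inside each frame.
    Temporal step: attend over the resulting per-frame vectors.
    Returns (temporal weights, moment-level vector).
    """
    frame_vectors = [query_attention(query, objects)[1] for objects in frames]
    return query_attention(query, frame_vectors)


# Toy usage: a 2-D query and two frames holding object features.
query = [1.0, 0.0]
frames = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
temporal_weights, moment_vector = spatial_then_temporal(query, frames)
print(temporal_weights, moment_vector)
```

In the toy usage, the first frame contains an object aligned with the query, so it receives the larger temporal weight. In a trained model the query vector would come from a sentence encoder and the features from a visual backbone; those components are omitted here.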

Please cite our ICMR'19 paper if you use our code. Thanks!

BibTeX:

@inproceedings{jiang2019cross,
 author = {Jiang, Bin and Huang, Xin and Yang, Chao and Yuan, Junsong},
 title = {Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention},
 booktitle = {Proceedings of the 2019 on International Conference on Multimedia Retrieval},
 series = {ICMR '19},
 year = {2019},
 isbn = {978-1-4503-6765-3},
 location = {Ottawa, ON, Canada},
 pages={217--225},
 url = {https://doi.org/10.1145/3323873.3325019},
 doi = {10.1145/3323873.3325019},
 acmid = {3325019},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Spatial Attention, Language-Temporal Attention, Moment Localization, Cross-modal Video Retrieval},
}

Environment Settings

We use the TensorFlow framework.

  • TensorFlow version: 1.7.0
  • Python version: 3.6

Example of running the code:

Run SLTA:

SLTA.ipynb

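During training, the "R@n, IoU=m" values on the test set are printed to the console after each optimization iteration. "R@n, IoU=m" is the fraction of test queries for which at least one of the top-n retrieved moments has a temporal IoU of at least m with the ground-truth moment.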

Output:

IoU=0.1, R@10: 0.59266802444; IoU=0.1, R@5: 0.459703229561; IoU=0.1, R@1: 0.223741635147
IoU=0.3, R@10: 0.41780622636; IoU=0.3, R@5: 0.31859179517; IoU=0.3, R@1: 0.170788478324
IoU=0.5, R@10: 0.262729124236; IoU=0.5, R@5: 0.207739307536; IoU=0.5, R@1: 0.11492580739
IoU=0.7, R@10: 0.149549025313; IoU=0.7, R@5: 0.122490544079; IoU=0.7, R@1: 0.0677916787896
IoU=0.9, R@10: 0.0389874890893; IoU=0.9, R@5: 0.0328775094559; IoU=0.9, R@1: 0.0139656677335
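The "R@n, IoU=m" metric reported above can be sketched as follows. This is a minimal plain-Python sketch, assuming each query has exactly one ground-truth moment and a ranked list of predicted (start, end) moments, best first. `temporal_iou` and `recall_at_n` are hypothetical helpers, not this repository's evaluation code, which may differ in details such as applying non-maximum suppression before ranking or using a strict `>` comparison instead of `>=`.

```python
from typing import Sequence, Tuple

# A moment is a (start, end) pair, e.g. in seconds.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n(
    ranked_predictions: Sequence[Sequence[Segment]],
    ground_truths: Sequence[Segment],
    n: int,
    iou_threshold: float,
) -> float:
    """Fraction of queries whose top-n ranked moments contain at least one
    moment with temporal IoU >= iou_threshold against the ground truth.

    Assumption: `>=` comparison; ranked_predictions[i] belongs to query i.
    """
    if len(ranked_predictions) != len(ground_truths):
        raise ValueError("one ranked list per ground-truth moment is required")
    if not ground_truths:
        return 0.0
    hits = sum(
        1
        for preds, gt in zip(ranked_predictions, ground_truths)
        if any(temporal_iou(p, gt) >= iou_threshold for p in preds[:n])
    )
    return hits / len(ground_truths)


# Toy example: two queries, each with two ranked candidate moments.
preds = [[(0.0, 5.0), (10.0, 20.0)], [(3.0, 6.0), (0.0, 2.0)]]
gts = [(11.0, 19.0), (0.0, 2.0)]
for m in (0.5, 0.9):
    print("; ".join(f"IoU={m}, R@{n}: {recall_at_n(preds, gts, n, m)}" for n in (2, 1)))
# prints:
# IoU=0.5, R@2: 1.0; IoU=0.5, R@1: 0.0
# IoU=0.9, R@2: 0.5; IoU=0.9, R@1: 0.0
```

Raising n or lowering m can only keep or increase the recall, which is why the values in the output above grow from R@1 to R@10 and shrink as the IoU threshold rises.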

Parameter Tuning

All parameters are defined in SLTA.ipynb.

Dataset

We provide three processed datasets: TACoS, Charades-STA, and DiDeMo.

You can download them from Baidu SkyDrive; the password is:

zlpq

Baselines

The comparison methods are available on this website:

https://icmr2019.wixsite.com/slta
