taskweave's Introduction

[CVPR 2024 Accepted] Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection

Arxiv

Introduction

This repository implements TaskWeave (CVPR 2024), a task-driven, top-down framework for joint Moment Retrieval and Highlight Detection. A task-decoupled unit captures task-specific and common representations. To exploit the interaction between the two tasks, an inter-task feedback mechanism transforms the results of one task into guiding masks that assist the other. Unlike existing methods, the model is optimized with a task-dependent joint loss function. As far as we are aware, this is the first framework to address this joint task from a task-centric perspective. Comprehensive experiments and in-depth ablation studies on the QVHighlights, TVSum, and Charades-STA datasets corroborate the effectiveness and flexibility of the proposed framework.

[Figures: pipeline, feedbacks]
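
To make the idea of "guiding masks" more concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`spans_to_mask`, `scores_to_mask`, `apply_mask`), the hard thresholding, and the `floor` parameter are all assumptions chosen for illustration. It only shows one plausible way the output of one task (moment spans or per-clip highlight scores) could be turned into a per-clip mask that re-weights the features used by the other task.

```python
from typing import List, Tuple


def spans_to_mask(spans: List[Tuple[int, int]], num_clips: int) -> List[float]:
    """Hypothetical helper: 1.0 for clips inside any predicted moment span, else 0.0.

    Spans are (start, end) clip indices with `end` exclusive.
    """
    mask = [0.0] * num_clips
    for start, end in spans:
        for i in range(max(start, 0), min(end, num_clips)):
            mask[i] = 1.0
    return mask


def scores_to_mask(scores: List[float], threshold: float = 0.5) -> List[float]:
    """Hypothetical helper: 1.0 where a clip's highlight score reaches the threshold."""
    return [1.0 if s >= threshold else 0.0 for s in scores]


def apply_mask(
    features: List[List[float]], mask: List[float], floor: float = 0.1
) -> List[List[float]]:
    """Scale each clip's feature vector by its mask value.

    `floor` (an assumption) keeps masked-out clips from being zeroed entirely,
    so a wrong prediction from one task cannot fully erase evidence for the other.
    """
    guided = []
    for feat, m in zip(features, mask):
        weight = floor + (1.0 - floor) * m
        guided.append([weight * x for x in feat])
    return guided


if __name__ == "__main__":
    # Moment retrieval output guiding highlight detection (illustrative values).
    mr_mask = spans_to_mask([(1, 3)], num_clips=5)
    clip_features = [[1.0, 2.0] for _ in range(5)]
    print(apply_mask(clip_features, mr_mask, floor=0.5))
```

The soft floor is a design choice of this sketch; the actual framework may build and apply its masks differently, so consult the paper and the repository code for the real mechanism.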

Data Preparation/Installation/More Details

Please refer to MomentDETR, UMT, and QD-DETR for more details.

Training and Evaluation

  • Training (taking QVHighlights as an example)
bash taskweave/scripts/train.sh
bash taskweave/scripts/train_audio.sh
  • Evaluation (taking QVHighlights as an example)
bash taskweave/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash taskweave/scripts/inference.sh results/{direc}/model_best.ckpt 'test'

References

If you are using our code, please consider citing the following paper.

@inproceedings{yang2024taskweave,
  title={Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection},
  author={Yang, Jin and Wei, Ping and Li, Huan and Ren, Ziyang},
  booktitle={CVPR},
  year={2024}
}

This implementation builds on MomentDETR and QD-DETR. We are grateful for the open-source contributions of MomentDETR, QD-DETR, and UMT.

taskweave's Issues

Asking for the link of extracted features

Thank you for your great work!
The original extracted features from the Moment-DETR repository are no longer available, and the author of Moment-DETR said he has lost the file as well.
Have you saved this file? Could you share it with me? Thank you very much!

Cannot reproduce the results (both training and inference)

I trained your model with all of your configurations on a single NVIDIA A40 and got results about 1% lower than yours (as shown in eval.log.txt and my_inference.txt).
Moreover, I used the checkpoint you provided in ./results/best_model_mr/model_best.ckpt and found that the evaluation result on the val set is also about 1% lower than reported (as shown in ckpt_inference.txt). In addition, that result does not match the evaluation result of any epoch in ./results/best_model_mr/eval.log.txt, which confuses me a lot.
My environment is Python 3.7.16 with PyTorch 1.11.0+cu113 and Torchvision 0.12.0+cu113. The basic settings are shown in my_inference.txt and ckpt_inference.txt. Could you please help me figure out the reason? Thanks.
eval.log.txt
my_inference.txt
ckpt_inference.txt
