vsd's Introduction

Visual Spatial Description

  • The VSDv2 dataset is now available.

This repository contains code and data for our paper Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation.

Note: Please go into the VL-T5 directory and follow the README there for pretrained models and feature extraction.

Setup

# Create python environment (optional)
conda create -n vsd python=3.7
source activate vsd

# Install python dependencies
pip install -r requirements.txt

# For captioning evaluation
python -c "import language_evaluation; language_evaluation.download('coco')"

Code structure

# Store images, features, and annotations
./datasets

# Image feature extraction
./feature_extraction

# Train VL-T5
./VL-T5/
    src/
        modeling_t5.py, modeling_bart.py                      <= VL-T5/VL-BART model classes
        caption_sp.py, vrd_caption.py                         <= fine-tuning
        param.py                                              <= (argparse) configuration
        tokenization.py                                       <= custom tokenizer
        utils.py, dist_utils.py                               <= utility functions
    snap/                                                     <= store weight checkpoints
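
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_paths` and the list `EXPECTED_PATHS` are assumptions built only from the tree listed above, so adjust them if your checkout differs.

```python
from pathlib import Path

# Paths copied from the layout listed above (assumed relative to the repo root).
EXPECTED_PATHS = [
    "datasets",
    "feature_extraction",
    "VL-T5/src/modeling_t5.py",
    "VL-T5/src/modeling_bart.py",
    "VL-T5/src/caption_sp.py",
    "VL-T5/src/vrd_caption.py",
    "VL-T5/src/param.py",
    "VL-T5/src/tokenization.py",
    "VL-T5/src/utils.py",
    "VL-T5/src/dist_utils.py",
    "VL-T5/snap",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected paths are present.")
```

Run it from the repository root; an empty result means every listed file and directory was found.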

Pretrained Models

  • Pretrained VL-BART and VL-T5 models are provided by [1].
  • Download snap/ from Google Drive
gdrive download 1_SBj4sZ0gUqfBon1gFBiNRAmfHv5w_ph --recursive

Run

bash ./baseline.sh gpu_num
bash ./end2end.sh gpu_num
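
Both scripts take a single positional argument. The sketch below shows one way to assemble the command from Python, assuming `gpu_num` is the number of GPUs to use (the README does not define it). `build_command` is a hypothetical helper, not something the repository provides.

```python
def build_command(script, gpu_num):
    """Build the argument list for one of the run scripts.

    Assumption: `gpu_num` is a positive GPU count passed as the only argument.
    """
    if gpu_num < 1:
        raise ValueError("gpu_num must be at least 1")
    return ["bash", script, str(gpu_num)]


# Example: the command for the baseline script with one GPU.
print(" ".join(build_command("./baseline.sh", 1)))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root to launch the script.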

Acknowledgement

This repo is adapted from VLT5.

Reference

Please cite our paper if you use our models or data in your project.

@inproceedings{zhao2022vsd,
  title     = {Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text
               Generation},
  author    = {Yu Zhao and
               Jianguo Wei and
               Zhichao Lin and
               Yueheng Sun and
               Meishan Zhang and
               Min Zhang},
  booktitle = {EMNLP},
  year      = {2022}
}

vsd's Issues

Execution steps

Hello, we like this paper and are trying to reproduce its results. Can you provide the steps to execute the code?

no Q

Where can I get the datasets?

Where can I find vrd_boxes36.h5?

Hi!

Thank you for making this interesting research study and making your code public! We are trying to reproduce the results as a part of our school project assignment, and I have looked around and was able to find most files needed. However,

  1. I'm unable to find any vrd_boxes36.h5 to put in sp3000. Where can I download this file, or is there a way for me to produce this file myself from Spatialsense?
  2. I assumed the train/test/val.json files that need to be placed under the spall folder are the 3 JSON files in https://github.com/zhaoyucs/VSD/tree/master/dataset/VSDv2. Is this assumption correct?

Looking forward to your reply soon. You can also reach me at [email protected] or on WeChat, if that's more convenient for you.
