The vsd from zhaoyucs

vsd's Introduction

Visual Spatial Description

The VSDv2 dataset is now available.

This repository contains the code and data for our paper, Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation.

**Note:** Please go into VLT5 and follow the README there for pretrained models and feature extraction.

Setup

# Create python environment (optional)
conda create -n vsd python=3.7
source activate vsd

# Install python dependencies
pip install -r requirements.txt

# For captioning evaluation
python -c "import language_evaluation; language_evaluation.download('coco')"
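Before running anything, it can help to confirm that the environment matches the setup steps above. The following is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and the only facts taken from this README are the Python 3.7 target and the `language_evaluation` module name.

```python
import importlib.util
import sys


def check_environment(modules=("language_evaluation",), expected_python=(3, 7)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; it is not part of the VSD repository.
    """
    problems = []
    # Compare only major.minor, since the setup step pins python=3.7.
    if tuple(sys.version_info[:2]) != tuple(expected_python):
        problems.append(
            "Python %d.%d expected, found %d.%d"
            % (tuple(expected_python) + tuple(sys.version_info[:2]))
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("module not importable: %s" % name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run inside the activated `vsd` environment, the script prints nothing when both checks pass. Other modules from `requirements.txt` can be added to the `modules` tuple as needed.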

Code structure

# Store images, features, and annotations
./datasets

# Image feature extraction
./feature_extraction

# Train VL-T5
./VL-T5/
    src/
        modeling_t5.py, modeling_bart.py                      <= VL-T5/VL-BART model classes
        caption_sp.py, vrd_caption.py                         <= fine-tuning
        param.py                                              <= (argparse) configuration
        tokenization.py                                       <= custom tokenizer
        utils.py, dist_utils.py                               <= utility functions
    snap/                                                     <= store weight checkpoints
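A quick way to confirm that a checkout matches the layout listed above is to test each path for existence. This is a minimal sketch under stated assumptions: the paths are copied verbatim from the listing (the actual directory may be named differently in your checkout, for example `VLT5` as in the note at the top), and `missing_paths` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# Paths copied from the "Code structure" listing; adjust if your checkout differs.
EXPECTED_PATHS = [
    "datasets",
    "feature_extraction",
    "VL-T5/src/modeling_t5.py",
    "VL-T5/src/modeling_bart.py",
    "VL-T5/src/caption_sp.py",
    "VL-T5/src/vrd_caption.py",
    "VL-T5/src/param.py",
    "VL-T5/src/tokenization.py",
    "VL-T5/src/utils.py",
    "VL-T5/src/dist_utils.py",
    "VL-T5/snap",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

An empty result means every listed file and directory is present. It says nothing about whether the datasets, features, or checkpoints inside them are complete.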

Pretrained Models

Pretrained VL-BART and VL-T5 checkpoints are provided by [1].
Download snap/ from Google Drive:

gdrive download 1_SBj4sZ0gUqfBon1gFBiNRAmfHv5w_ph --recursive

Run

bash ./baseline.sh gpu_num
bash ./end2end.sh gpu_num
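Both scripts take a single positional argument. The sketch below wraps that call from Python and validates the argument first. It assumes `gpu_num` is the number of GPUs to use, which this README does not state explicitly; the helper names `build_command` and `run` are hypothetical.

```python
import subprocess

# Script names taken from the "Run" section above.
SCRIPTS = ("./baseline.sh", "./end2end.sh")


def build_command(script, gpu_num):
    """Build the argument list for one run (hypothetical helper).

    Assumption: gpu_num is a positive integer GPU count.
    """
    if script not in SCRIPTS:
        raise ValueError("unknown script: %r" % (script,))
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(gpu_num, bool) or not isinstance(gpu_num, int) or gpu_num < 1:
        raise ValueError("gpu_num must be a positive integer, got %r" % (gpu_num,))
    return ["bash", script, str(gpu_num)]


def run(script, gpu_num):
    """Launch the script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(script, gpu_num), check=True)
```

Under that assumption, `run("./baseline.sh", 2)` is equivalent to typing `bash ./baseline.sh 2` in the repository directory.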

Acknowledgement

This repo is adapted from VLT5.

Reference

Please cite our paper if you use our models or data in your project.

@inproceedings{zhao2022vsd,
  title     = {Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text
               Generation},
  author    = {Yu Zhao and
               Jianguo Wei and
               Zhichao Lin and
               Yueheng Sun and
               Meishan Zhang and
               Min Zhang},
  booktitle = {EMNLP},
  year      = {2022}
}

vsd's Issues

Execution steps

Hello, we like this paper and are trying to reproduce its results. Could you provide the steps to execute the code?

Where can I find vrd_boxes36.h5?

Hi!

Thank you for this interesting study and for making your code public! We are trying to reproduce the results as part of a school project, and I was able to find most of the files needed. However, I have two questions.

I'm unable to find any vrd_boxes36.h5 to put in sp3000. Where can I download this file, or is there a way for me to produce it myself from SpatialSense?

I assumed the train/test/val.json files that need to be placed under the spall folder are the three JSON files in https://github.com/zhaoyucs/VSD/tree/master/dataset/VSDv2. Is this assumption correct?

Looking forward to your reply. You can also reach me by email or on WeChat, if that is more convenient for you.
