
dsve-loc's Introduction

Deep semantic-visual embedding with localization

Training and evaluation code for the paper "Finding beans in burgers: Deep semantic-visual embedding with localization".

This code allows training new models, reproducing the experiments, and extracting features for both images and texts.

Author and contact: Martin Engilberge

Main dependencies

This code is written in Python. To use it you will need:

  • Python 3.7
  • PyTorch 1.0
  • SRU[cuda]
  • NumPy
  • SciPy
  • Torchvision
  • MS COCO API (pycocotools)
  • Visual Genome API
  • NLTK
  • OpenCV

An environment file for conda is available in the repository (environment.yml).

Getting started

You will first need to set the paths to the datasets and word embeddings in the file misc/config.py. Comments in the config file contain links where you can download the data.
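
Before launching training, it can help to check that every path set in the config actually exists. The sketch below is not part of the repository. The names `coco_root` and `word_embedding_path` and their values are hypothetical placeholders, so substitute whatever misc/config.py really defines.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, value in paths.items() if not Path(value).exists()]


if __name__ == "__main__":
    # Hypothetical names and locations; replace with the values from misc/config.py.
    configured = {
        "coco_root": "/data/coco",
        "word_embedding_path": "/data/embeddings/word_vectors.bin",
    }
    for name in find_missing_paths(configured):
        print(f"Missing path for {name}: {configured[name]}")
```

An empty result means every configured location was found.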

To train and run a model you will need:

To reproduce experiments in the paper:

Once the required paths have been set in the config file you can start training models using the following command:

python train.py

By default, all the scripts use the GPU. You can switch to CPU mode by uncommenting device = torch.device("cpu") at the beginning of the script.

Model evaluation

Models can be evaluated on three tasks:

  • cross modal retrieval:
python eval_retrieval.py -p "path/to/model/model.pth.tar" -te
  • pointing game:
python pointing_game.py -p "path/to/model/model.pth.tar"
  • semantic segmentation:
python semantic_seg.py -p "path/to/model/model.pth.tar"
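
To run the three evaluations above in sequence on one checkpoint, a small wrapper along these lines may be convenient. It is a sketch, not a script shipped with the repository: the function names are made up for illustration, it uses only the script names and flags listed above, and it assumes it is launched from the repository root with the current Python interpreter.

```python
import subprocess
import sys


def build_eval_commands(model_path):
    """Return the three evaluation commands listed above for one checkpoint."""
    return [
        [sys.executable, "eval_retrieval.py", "-p", model_path, "-te"],
        [sys.executable, "pointing_game.py", "-p", model_path],
        [sys.executable, "semantic_seg.py", "-p", model_path],
    ]


def run_all(model_path):
    """Run each evaluation in turn, stopping at the first one that fails."""
    for command in build_eval_commands(model_path):
        subprocess.run(command, check=True)
```

Calling `run_all("path/to/model/model.pth.tar")` would then execute the three tasks one after another.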

Features extraction

The feature space produced by the joint embedding captures semantic properties. Two scripts can be used to extract features from that space for images and texts.

For images, the script takes a folder as input and produces the embedding representation for all the JPEG images in the folder.

python image_features_extraction.py -p "path/to/model/model.pth.tar" -d "path/to/image/folder/" -o "path/to/output/file"
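
To preview which files in a folder are likely to be processed, the following sketch lists the JPEG images it contains. It is an illustration only: the helper name is hypothetical, and the assumption that files are matched by a `.jpg` or `.jpeg` extension in the top level of the folder may not reflect how the extraction script selects images.

```python
from pathlib import Path

# Assumed extensions, compared case-insensitively.
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def list_jpeg_images(folder):
    """Return the sorted JPEG files found directly inside `folder`."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in JPEG_SUFFIXES
    )
```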

For text the script takes a text file and produces the embedding representation for each line.

python text_features_extraction.py -p "path/to/model/model.pth.tar" -d "path/to/text/file/" -o "path/to/output/file"
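
Once image and text features are extracted, they can be compared in the joint space. The sketch below ranks candidate vectors by cosine similarity to a query vector. It assumes the embeddings have already been loaded as plain lists of floats, since the format of the output file is not described here, and it uses cosine similarity as a common choice for comparing embeddings rather than as a statement of what the repository's scripts do. The function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar to `query`."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

For example, a text embedding can serve as the query and a list of image embeddings as the candidates, which gives a simple text-to-image retrieval ordering.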

Reference

If you found this code useful, please cite the following paper:

@inproceedings{engilberge2018finding,
  title={Finding beans in burgers: Deep semantic-visual embedding with localization},
  author={Engilberge, Martin and Chevallier, Louis and P{\'e}rez, Patrick and Cord, Matthieu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3984--3993},
  year={2018}
}

License

By downloading this program, you commit to comply with the license as stated in the LICENSE.md file.
