DJSRH


This repository is for "Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval"

(to appear in ICCV 2019, Oral)

By Shupeng Su*, Zhisheng Zhong*, Chao Zhang (* Authors contributed equally).


Introduction

Cross-modal hashing encodes multimedia data into a common binary hash space in which the correlations among samples from different modalities can be effectively measured. Deep cross-modal hashing further improves retrieval performance, as deep neural networks can generate more semantically relevant features and hash codes. In this paper, we study unsupervised deep cross-modal hash coding and propose Deep Joint-Semantics Reconstructing Hashing (DJSRH), which has the following two main advantages. First, to learn binary codes that preserve the neighborhood structure of the original data, DJSRH constructs a novel joint-semantics affinity matrix that integrates the original neighborhood information from different modalities and is therefore capable of capturing the latent intrinsic semantic affinity of the input multi-modal instances. Second, DJSRH then trains the networks to generate binary codes that maximally reconstruct the above joint-semantics relations via the proposed reconstructing framework, which is better suited to batch-wise training because it reconstructs the specific similarity values, unlike the common Laplacian constraint, which merely preserves the similarity order. Extensive experiments demonstrate significant improvements by DJSRH in various cross-modal retrieval tasks.
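
As a rough illustration of the two ideas above, the following dependency-free Python sketch fuses per-modality cosine affinities into one joint matrix and then scores hash codes by how well their cross-modal cosine similarities reconstruct it. It is a simplified sketch, not the authors' implementation: the function names, the plain weighted fusion with `beta`, the toy features, and the squared-error loss are assumptions made for illustration, and the full model described in the paper contains further terms that are omitted here.

```python
import math


def _unit_rows(rows):
    """Scale every row to unit L2 norm (all-zero rows are left unchanged)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def cosine_matrix(a_rows, b_rows):
    """Cosine similarity between every row of a_rows and every row of b_rows."""
    a_unit, b_unit = _unit_rows(a_rows), _unit_rows(b_rows)
    return [[sum(x * y for x, y in zip(u, v)) for v in b_unit] for u in a_unit]


def joint_affinity(image_feats, text_feats, beta=0.5):
    """Assumed fusion rule: S = beta * S_I + (1 - beta) * S_T."""
    s_img = cosine_matrix(image_feats, image_feats)
    s_txt = cosine_matrix(text_feats, text_feats)
    n = len(s_img)
    return [
        [beta * s_img[i][j] + (1.0 - beta) * s_txt[i][j] for j in range(n)]
        for i in range(n)
    ]


def reconstruction_loss(affinity, image_codes, text_codes):
    """Sum of squared differences between the affinity and cos(B_I, B_T)."""
    code_sim = cosine_matrix(image_codes, text_codes)
    return sum(
        (s - c) ** 2
        for s_row, c_row in zip(affinity, code_sim)
        for s, c in zip(s_row, c_row)
    )


# Toy batch of three image/text pairs: the first two are near-duplicates.
image_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
text_feats = [[2.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
S = joint_affinity(image_feats, text_feats, beta=0.5)

# Codes that keep the first two samples together reconstruct S better
# than codes that group the first and third samples.
good_codes = [[1, 1], [1, 1], [1, -1]]
bad_codes = [[1, 1], [1, -1], [1, 1]]
loss_good = reconstruction_loss(S, good_codes, good_codes)
loss_bad = reconstruction_loss(S, bad_codes, bad_codes)
```

Because the loss compares actual similarity values rather than only their ranking, it can be evaluated on any mini-batch independently, which is the property the paragraph above attributes to the reconstructing framework.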


Usage

Requirements

  • python == 2.7.x
  • pytorch == 0.3.1
  • torchvision
  • CV2
  • PIL
  • h5py
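
Before training, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for the listed packages (`torch` for pytorch, `cv2` for CV2). It avoids Python 3 only features so that it can also run under Python 2.7.

```python
import importlib
import sys

# Usual import names for the packages listed in the requirements.
REQUIRED_MODULES = ["torch", "torchvision", "cv2", "PIL", "h5py"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python " + sys.version.split()[0])
    absent = missing_modules()
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules are importable.")
```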

Datasets

For datasets, we follow the Deep Cross-Modal Hashing GitHub repository (Jiang, CVPR 2017). You can download these datasets from:

Process

The experimental results reported below are average values. If you want better results, run the experiment a few more times (2 to 5).

  • Clone this repo: git clone https://github.com/zzs1994/DJSRH.git
  • Change the 'DATASET_DIR' in settings.py to where you place the datasets.
  • An example to train a model:
python train.py
  • Set EVAL = True in settings.py to run validation.
  • Ablation studies (optional): if you want to evaluate other components of our DJSRH, please refer to our paper and settings.py.
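
The two settings mentioned in the steps above might look like the fragment below. This is a hypothetical sketch: only the names `DATASET_DIR` and `EVAL` come from this README, the path is a placeholder, and the real settings.py contains other options that are not shown here.

```python
# Hypothetical fragment of settings.py (illustration only).

# Directory where the downloaded datasets are stored; replace the
# placeholder with your own location.
DATASET_DIR = "/path/to/datasets"

# False trains a model; set to True to run validation instead.
EVAL = False
```

After editing these values, run `python train.py` as described in the steps above.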

Ablation studies

Table 1. The mAP@50 results on NUS-WIDE to evaluate the effectiveness of each component in DJSRH.

Model     Configuration    64 bits (I→T)   64 bits (T→I)   128 bits (I→T)   128 bits (T→I)
DJSRH-1   S=SI             0.717           0.712           0.741            0.735
DJSRH-2   S=ST             0.702           0.606           0.734            0.581
DJSRH-3   βSI+(1−β)ST      0.724           0.720           0.747            0.738
DJSRH-4   +(η=0.4)         0.790           0.745           0.803            0.757
DJSRH-5   +(μ=1.5)         0.793           0.747           0.812            0.768
DJSRH     +(λ12=0.1)       0.798           0.771           0.817            0.789
DJSRH-6   −(α=1)           0.786           0.770           0.811            0.782

From the table we can observe that each proposed component contributes to the final results: the full DJSRH configuration achieves the highest mAP@50 in all four settings.
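
For readers unfamiliar with the mAP@50 metric reported here, the sketch below shows one common way to compute it for hash codes: rank the database items by Hamming distance to each query, keep the top 50, and average the precision at each relevant hit. It is a minimal sketch and not the repository's evaluation code. The function names are hypothetical, and both the relevance rule (two items are relevant if they share at least one label) and the normalization by the number of hits within the top k are assumptions, since conventions differ between papers.

```python
def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def average_precision_at_k(query_code, query_labels, db_codes, db_labels, k=50):
    """AP over the k database items closest to the query in Hamming space."""
    # sorted() is stable, so ties keep their original database order.
    order = sorted(
        range(len(db_codes)),
        key=lambda i: hamming_distance(query_code, db_codes[i]),
    )
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order[:k], start=1):
        # Assumed relevance rule: at least one shared label.
        if query_labels & db_labels[idx]:
            hits += 1
            precision_sum += float(hits) / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(query_codes, query_labels, db_codes, db_labels, k=50):
    """Mean of the per-query AP@k values."""
    aps = [
        average_precision_at_k(code, labels, db_codes, db_labels, k)
        for code, labels in zip(query_codes, query_labels)
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: 4-bit codes, labels given as sets of class indices.
db_codes = [[1, 1, -1, -1], [1, 1, -1, 1], [-1, -1, -1, -1]]
db_labels = [{0}, {1}, {0, 2}]
query_codes = [[1, 1, -1, -1], [-1, -1, -1, -1]]
query_labels = [{0}, {1}]

map_at_50 = mean_average_precision_at_k(query_codes, query_labels, db_codes, db_labels, k=50)
```

For an I→T task the queries would be image codes and the database text codes, and the reverse for T→I.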


Comparisons with SOTAs

Table 2. The mAP@50 results on image-query-text (I→T) and text-query-image (T→I) retrieval tasks at various encoding lengths and datasets. The best results are shown in red and the second-best in blue.

Figure 1. The precision@top-R curves on different datasets at an encoding length of 128 bits.


Citation

If you find this code useful, please cite our paper:

@inproceedings{su2019deep,
	title={Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval},
	author={Shupeng Su and Zhisheng Zhong and Chao Zhang},
	booktitle={International Conference on Computer Vision},
	year={2019}
}

All rights are reserved by the authors.

