
dchmt's Introduction

Differentiable Cross-Modal Hashing via Multimodal Transformers (paper)

This project has been moved to clip-based-cross-modal-hash

Framework

The main architecture of our method is shown in the framework figure.

We propose a selecting mechanism for hash code generation that transforms the discrete hash space into a continuous one: each hash code is encoded as a series of $2D$ vectors, as shown in the hash figure.
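The selection idea can be sketched as follows. This is a minimal NumPy illustration of the general technique (soft selection between the two values of a bit), not the authors' implementation; the shapes and names are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_bits(logits):
    """Each hash bit is a 2D vector of logits; softmax 'selects'
    between -1 and +1 in a continuous (differentiable) way."""
    p = softmax(logits)              # (n_bits, 2), rows sum to 1
    return p[:, 1] - p[:, 0]         # soft bit in (-1, 1)

def hard_bits(logits):
    """At retrieval time, the selection is discretized by argmax."""
    return np.where(np.argmax(logits, axis=-1) == 1, 1, -1)

logits = np.array([[2.0, -1.0],      # leans toward -1
                   [-0.5, 3.0]])     # leans toward +1
sb = soft_bits(logits)               # continuous, usable for gradients
hb = hard_bits(logits)               # discrete code for retrieval
```

During training the soft bits keep the objective differentiable; discretization only happens when codes are stored for retrieval.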

Dependencies

Our code is written in Python; install the following packages to run it:

  • pytorch 1.9.1
  • sklearn
  • tqdm
  • pillow

Training

Processing dataset

Before training, download the original data: COCO (the 2017 train, val, and annotation files), NUS-WIDE (all files), and MIRFLICKR-25K (mirflickr25k and mirflickr25k_annotations_v080). Then use the "data/make_XXX.py" scripts to generate the .mat files.

For example:

cd COCO_DIR  # contains train/val images and annotation files

mkdir mat

cp DCMHT/data/make_coco.py mat

cd mat  # make_coco.py expects to run inside mat (note --coco-dir ../)

python make_coco.py --coco-dir ../ --save-dir ./

After all the .mat files are generated, the dataset directory will look like this:

dataset
├── base.py
├── __init__.py
├── dataloader.py
├── coco
│   ├── caption.mat 
│   ├── index.mat
│   └── label.mat 
├── flickr25k
│   ├── caption.mat
│   ├── index.mat
│   └── label.mat
└── nuswide
    ├── caption.txt  # Notice! It is a txt file!
    ├── index.mat 
    └── label.mat
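The generated files can be inspected with SciPy. Below is a sketch using a dummy file: the actual key names and shapes inside each .mat file are determined by the data/make_XXX.py scripts, so the "label" key and one-hot shape here are assumptions for illustration only:

```python
import numpy as np
from scipy.io import savemat, loadmat

# Write a dummy label.mat in the same container format
# (key name and shape assumed for illustration).
savemat("label.mat", {"label": np.eye(3, dtype=np.int64)})

mat = loadmat("label.mat")   # returns a dict of arrays
labels = mat["label"]        # e.g. (n_samples, n_classes) multi-hot labels
```

caption.mat and index.mat can be inspected the same way; note that for NUS-WIDE the captions are stored as a .txt file instead.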

Download CLIP pretrained model

The pretrained model download URLs are listed around line 30 of CLIP/clip/clip.py. This code is based on "ViT-B/32".

You should copy ViT-B-32.pt into this directory.

Start

After the dataset has been prepared, run the following command to train:

python main.py --is-train --hash-layer select --dataset coco --caption-file caption.mat --index-file index.mat --label-file label.mat --similarity-function euclidean --loss-type l2 --vartheta 0.75 --lr 0.0001 --output-dim 64 --save-dir ./result/coco/64 --clip-path ./ViT-B-32.pt --batch-size 256
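To compare several code lengths, the same command can be swept over --output-dim. A convenience sketch, shown as a dry run that only prints the commands (the flags mirror the command above):

```shell
# Dry run: print one training command per hash-code length.
for dim in 16 32 64 128; do
  echo python main.py --is-train --hash-layer select --dataset coco \
    --caption-file caption.mat --index-file index.mat --label-file label.mat \
    --similarity-function euclidean --loss-type l2 --vartheta 0.75 \
    --lr 0.0001 --output-dim "$dim" --save-dir "./result/coco/$dim" \
    --clip-path ./ViT-B-32.pt --batch-size 256
done
```

Remove the `echo` to actually launch the runs.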

Result

The retrieval results are shown in the result figure.

Citation

@inproceedings{10.1145/3503161.3548187,
  author = {Tu, Junfeng and Liu, Xueliang and Lin, Zongxiang and Hong, Richang and Wang, Meng},
  title = {Differentiable Cross-Modal Hashing via Multimodal Transformers},
  year = {2022},
  booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
  pages = {453--461},
  numpages = {9},
}

Acknowledgements

CLIP

SSAH

GCH

AGAH

DADH

deep-cross-modal-hashing

Apology

2023/03/01

I found that Figure 1 shows the wrong formula for $\vartheta$; the correct one is Equation (10). The paper has already been published, so I cannot fix it there.


dchmt's Issues

PR curve

Hello, I have a question: which function in your code do you use to draw the PR curve? Is it `calc_precisions_hash_my`, and what is your `Gnd` parameter?
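For reference, a PR curve for hash retrieval is typically computed by ranking the database by Hamming distance and measuring precision and recall as the retrieval list grows; `Gnd` usually denotes the binary ground-truth relevance matrix (two items are relevant iff they share a label). A minimal NumPy sketch of this standard computation, not the repository's `calc_precisions_hash_my` itself:

```python
import numpy as np

def pr_curve(qB, rB, q_labels, r_labels):
    """Precision/recall over growing top-k lists, ranking the
    database by Hamming distance from each query's hash code."""
    n_bits = qB.shape[1]
    gnd = (q_labels @ r_labels.T > 0).astype(float)   # relevance ("Gnd")
    hamm = 0.5 * (n_bits - qB @ rB.T)                 # Hamming distances
    order = np.argsort(hamm, axis=1)
    precisions, recalls = [], []
    for k in range(1, rB.shape[0] + 1):
        topk = order[:, :k]
        hits = np.take_along_axis(gnd, topk, axis=1).sum(axis=1)
        precisions.append(float((hits / k).mean()))
        recalls.append(float((hits / np.maximum(gnd.sum(axis=1), 1)).mean()))
    return precisions, recalls

qB = np.array([[1, 1]])                # one query, 2-bit {-1,+1} code
rB = np.array([[1, 1], [-1, -1]])      # two database items
p, r = pr_curve(qB, rB, np.array([[1, 0]]), np.array([[1, 0], [0, 1]]))
```

Plotting `p` against `r` gives the PR curve for that code length.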

Question about hyperparameters for the Flickr and NUS-WIDE datasets

Hello, and thank you for sharing the code for this paper published at ACM MM. I ran into a problem while following the steps on your GitHub page to reproduce the paper's results, and I would like to ask about it.

  1. When I run the Flickr dataset directly with the command you gave for COCO (python main.py --is-train --hash-layer select --dataset flicker25k --caption-file caption.mat --index-file index.mat --label-file label.mat --similarity-function euclidean --loss-type l2 --vartheta 0.75 --lr 0.0001 --output-dim 64 --save-dir ./result/flicker25k/64 --clip-path ./ViT-B-32.pt --batch-size 256), the loss does not converge. I suspect each dataset needs its own hyperparameters, so could you tell me the settings you used for Flickr and NUS-WIDE?
  2. The paper says all experiments were run on a single GV100; could you share the training and inference time on each dataset?
    Looking forward to your reply. Best wishes!

question

Is the parameter "pretrained" the same as "clip-path"? If not, what should "pretrained" be set to?

problem

How do I specify self.args.pretrained? Thanks.

bits parameter

Hello, I would like to ask about the bits parameter. I tried changing `output_dim` to 32, but the final mAP is the same as with `output_dim` set to 64. Why is that?
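For reference, mean average precision (mAP) is computed from the codes the model actually produces, so it should normally change with the code length. A minimal NumPy sketch of the standard Hamming-ranking mAP (an illustration, not the repository's evaluation code):

```python
import numpy as np

def mean_average_precision(qB, rB, q_labels, r_labels):
    """Standard Hamming-ranking mAP for {-1,+1} hash codes."""
    n_bits = qB.shape[1]
    aps = []
    for i in range(qB.shape[0]):
        gnd = (q_labels[i] @ r_labels.T > 0).astype(float)  # relevance
        hamm = 0.5 * (n_bits - qB[i] @ rB.T)                # distances
        gnd = gnd[np.argsort(hamm)]                         # rank database
        rel = gnd.sum()
        if rel == 0:
            continue                                        # no relevant item
        ranks = np.arange(1, gnd.size + 1)
        aps.append(float(((np.cumsum(gnd) / ranks) * gnd).sum() / rel))
    return float(np.mean(aps))

qB = np.array([[1, 1]])                # one query, 2-bit code
rB = np.array([[1, 1], [-1, -1]])      # two database items
score = mean_average_precision(qB, rB,
                               np.array([[1, 0]]),
                               np.array([[1, 0], [0, 1]]))
```

If the measured mAP does not move at all between bit settings, it is worth checking that the codes being evaluated really came from a model retrained with the new `output_dim`.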
