M3TR

PyTorch implementation of M3TR: Multi-modal Multi-label Recognition with Transformer (ACM MM 2021).

Prerequisites

Python 3.6+

PyTorch 1.7

CUDA 10.1

Tesla V100 × 4

Datasets

Train

python main.py  --data COCO2014 --data_root_dir $DATA_PATH$ --save_dir $SAVE_PATH$ --i 448  --lr 3e-4 -b 64

Test

python main.py  --data COCO2014 --data_root_dir $DATA_PATH$ --save_dir $SAVE_PATH$ --i 448  --lr 3e-4 -b 64 -e --resume checkpoint/COCO2014/checkpoint_COCO.pth

Citation

  • If you find this work helpful, please cite our paper:
@inproceedings{Zhao2021M3TR,
author = {Zhao, Jiawei and Zhao, Yifan and Li, Jia},
title = {M3TR: Multi-Modal Multi-Label Recognition with Transformer},
year = {2021},
address = {New York, NY, USA},
booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
pages = {469–477},
}

m3tr's Issues

Question about reproducing results with GloVe word vectors

Traceback (most recent call last):
  File "main.py", line 73, in <module>
    main(args)
  File "main.py", line 59, in main
    model = get_model(num_classes, args)
  File "/media/omnisky/data/wr/M3TR-master/models/__init__.py", line 12, in get_model
    model = model_dict[args.model_name](vit, res101, num_classes)
  File "/media/omnisky/data/wr/M3TR-master/models/M3TR.py", line 55, in __init__
    self.sem_embedding = self.get_word_embedding(self.num_classes).detach()
  File "/media/omnisky/data/wr/M3TR-master/models/M3TR.py", line 81, in get_word_embedding
    loaded = torch.load(embedding_path)
  File "/home/omnisky/anaconda2/envs/faster/lib/python3.7/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/omnisky/anaconda2/envs/faster/lib/python3.7/site-packages/torch/serialization.py", line 766, in _legacy_load
    if magic_number != MAGIC_NUMBER:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

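Hello, with my environment set up I can successfully reproduce your original code. However, after changing #embedding_path = './Bert1/voc_embeddings.pt' and #embedding_path = './Bert1/coco_embeddings.pt' in M3TR.py to embedding_path = './Glove/voc_glove_word2vec.pkl' and embedding_path = './Glove/coco_glove_word2vec.pkl' respectively, I get the error above. Could you advise how to fix it?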
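The traceback suggests that torch.load is reading a file that was not written by torch.save: the first object unpickled is a NumPy array, and comparing it with the expected magic number raises the ValueError. A minimal sketch of a loader that handles both cases is shown below. It assumes the .pkl files are plain pickles holding a NumPy array of shape (num_classes, dim); load_word_embedding is a hypothetical helper name, not a function from the repository.

```python
import pickle

import numpy as np
import torch


def load_word_embedding(path):
    """Load label embeddings from a torch file (.pt) or a plain pickle (.pkl)."""
    if path.endswith(".pkl"):
        # Assumption: the file was written with pickle.dump and holds a NumPy array.
        with open(path, "rb") as f:
            loaded = pickle.load(f)
        # Convert to a float32 tensor so it matches typical model weights.
        return torch.from_numpy(np.asarray(loaded, dtype=np.float32))
    # Files written with torch.save can still go through torch.load.
    return torch.load(path)
```

If this fits your files, the torch.load(embedding_path) call in get_word_embedding could be swapped for such a helper. The GloVe vectors may also have a different width from the BERT embeddings, so it is worth checking the layer that consumes them.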

Inference Results

Hello, I am getting odd results, such as mAP = 1.0 after the first epoch, and I want to check whether the model really works that well on my custom dataset. However, I don't know how to analyse the outputs coming from the model. Apart from the outputs1, outputs2 and out3 variables, only outputs is returned as the model's final prediction, and all of its values are negative. Since it's a sigmoid output, I thought of printing the labels whose prediction is > 0, but that doesn't work because everything is negative. How should we use the
outputs = self.on_forward(inputs, targets, is_train=is_train)
line in trainer.py to print or plot the predicted labels for a given image?

I have attached an example screenshot showing the output vector I get for batch size = 4 (while mAP is always 1.0).
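One way to inspect the predictions is sketched below. It assumes outputs holds raw logits of shape (batch, num_classes), so a logit > 0 corresponds to a sigmoid probability > 0.5. The helper predicted_labels and the class_names list are hypothetical and not part of trainer.py.

```python
import torch


def predicted_labels(logits, class_names, threshold=0.5):
    """Return, per image, the class names whose sigmoid probability exceeds threshold."""
    probs = torch.sigmoid(logits)  # map raw scores into (0, 1)
    results = []
    for row in probs:
        idx = torch.nonzero(row > threshold).flatten().tolist()
        results.append([class_names[i] for i in idx])
    return results


# Toy batch of two images and three classes (made-up values).
logits = torch.tensor([[-2.0, 1.5, -0.3], [-4.0, -3.0, -5.0]])
names = ["cat", "dog", "person"]
print(predicted_labels(logits, names))  # [['dog'], []]
```

With all-negative logits, as in the second row, no label passes the 0.5 threshold. Average precision depends only on how the scores are ranked, so negative values by themselves do not explain an mAP of 1.0; checking the targets fed to the metric is probably a better starting point.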
