CVPR 2020: Multimodal Categorization of Crisis Events in Social Media

This is an unofficial implementation for the CVPR 2020 paper Multimodal Categorization of Crisis Events in Social Media.

Abavisani, Mahdi, et al. "Multimodal categorization of crisis events in social media." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

To cite the paper:

@inproceedings{abavisani2020multimodal,
  title={Multimodal categorization of crisis events in social media},
  author={Abavisani, Mahdi and Wu, Liwei and Hu, Shengli and Tetreault, Joel and Jaimes, Alejandro},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14679--14689},
  year={2020}
}

Note

This implementation follows the original paper wherever possible. Because we needed experiment results urgently, we have not had time to make it highly configurable or to add clean handlers.

To Run

  • Initialize by running bash setup.sh
  • Run the pipeline with python main.py

Stats

We use mixed-precision training, so it runs fast on GPUs with Tensor Cores (e.g. V100). The default configuration consumes about 13GB of GPU memory, and each epoch takes about 3 minutes on an Amazon g4dn.xlarge instance (with V100 GPU).

Warning: The model is saved after every epoch, so training consumes about 400MB of disk roughly every 3 minutes. Make sure enough disk space is available before starting a long run.
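To make the warning concrete, here is a minimal sketch that estimates the disk footprint of a run and prunes old checkpoints. It is not part of this repository: the function names, the `*.pt` file pattern, and the idea of keeping only the newest few checkpoints are all assumptions made for illustration, so adjust them to whatever main.py actually writes.

```python
from pathlib import Path

CHECKPOINT_MB = 400  # approximate per-epoch checkpoint size quoted above


def estimated_disk_mb(num_epochs, checkpoint_mb=CHECKPOINT_MB):
    """Disk usage in MB if the checkpoint of every epoch is kept."""
    return num_epochs * checkpoint_mb


def prune_checkpoints(directory, keep=3, pattern="*.pt"):
    """Delete all but the `keep` most recently modified checkpoint files.

    `pattern` is a hypothetical naming convention, not one taken from
    this repository. Returns the list of deleted paths.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Newest first, so everything after the first `keep` entries is stale.
    files = sorted(
        Path(directory).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    stale = files[keep:]
    for path in stale:
        path.unlink()
    return stale
```

For example, `estimated_disk_mb(50)` gives 20000, so a 50-epoch run would need roughly 20GB if nothing is pruned.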

Confusions

Equation 4

The authors state in the text that $$\alpha_{v_i}$$ depends entirely on $$e_i$$, and that $$\alpha_{e_i}$$ depends entirely on $$v_i$$, whereas Equation 4 as printed implies the opposite. This implementation follows the text rather than the equation.
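The cross-modal reading of the text, in which each modality's attention mask is computed from the other modality's embedding, can be sketched as below. This is a framework-free illustration only, not the code in this repository: the sigmoid gating, the helper names, and the toy weights are assumptions, and the paper's actual projections and dimensions are not reproduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weight, bias, x):
    """Return weight @ x + bias for a list-of-rows matrix and a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def cross_attention_masks(v, e, W_v, b_v, W_e, b_e):
    """Compute both masks following the text rather than the printed equation.

    alpha_v is the mask for the image features, computed only from the
    text embedding e; alpha_e is the mask for the text features, computed
    only from the image embedding v.
    """
    alpha_v = [sigmoid(z) for z in affine(W_v, b_v, e)]
    alpha_e = [sigmoid(z) for z in affine(W_e, b_e, v)]
    return alpha_v, alpha_e


# Toy 2-dimensional embeddings and hypothetical weights.
v = [0.5, -1.0]
e = [1.0, 2.0]
W_v = [[0.1, 0.2], [0.3, -0.4]]
b_v = [0.0, 0.0]
W_e = [[-0.5, 0.5], [0.2, 0.1]]
b_e = [0.0, 0.0]

alpha_v, alpha_e = cross_attention_masks(v, e, W_v, b_v, W_e, b_e)
```

With this wiring, changing `v` while holding `e` fixed leaves `alpha_v` unchanged and alters only `alpha_e`, which is the dependence the text describes.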

Self-Attention in Fully Connected Layers

After obtaining a multimodal representation that incorporates both visual and textual information, the authors use fully-connected layers to perform classification. Here the authors write:

"We add self-attention in the fully-connected networks."

We assume this means that a fully-connected layer is added to act as self-attention.

DenseNet

The authors did not specify which DenseNet variant (for example DenseNet-121, DenseNet-169, or DenseNet-201) they used.

Todos

  • Fix the dataloader deadlock that occurs when num_workers > 1.

Contributors

  • paulcccccch
  • omnyx2
