Geometric Multimodal Contrastive Representation Learning

Official Implementation of "Geometric Multimodal Contrastive Representation Learning", ICML 2022.

@article{poklukar2022gmc,
  title={Geometric Multimodal Contrastive Representation Learning},
  author={Poklukar, Petra and Vasco, Miguel and Yin, Hang and Melo, Francisco S and Paiva, Ana and Kragic, Danica},
  journal={arXiv preprint arXiv:2202.03390},
  year={2022}
}

Method

Setup/Installation

conda env create -f gmc.yml
conda activate GMC
poetry install

Additionally, set up the Delaunay Component Analysis (DCA) evaluation framework by following the instructions in its official repository.

Download Datasets

cd gmc_code/
bash download_unsupervised_dataset.sh
bash download_supervised_dataset.sh
bash download_rl_dataset.sh

Experiments

The gmc_code folder contains the code to replicate the experiments presented in the paper. For each experiment, set the local machine path in that experiment's ingredients/machine_ingredients.py file by copying the output of pwd into it (e.g. for the unsupervised experiment):

cd unsupervised/
pwd

# Edit unsupervised/ingredients/machine_ingredients.py
@machine_ingredient.config
def machine_config():
    m_path = "copy-output-of-pwd-here"
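As a small convenience, the value printed by pwd can also be obtained from Python. The sketch below is not part of the repository: `experiment_root` is a hypothetical helper name, and it assumes the script is run from inside the experiment folder (e.g. unsupervised/).

```python
import os


def experiment_root() -> str:
    """Return the absolute path of the current working directory.

    This normally matches the output of `pwd` (it can differ when the
    folder is reached through a symlink) and is the value to paste into
    m_path in ingredients/machine_ingredients.py.
    """
    return os.getcwd()


if __name__ == "__main__":
    # Run from inside the experiment folder, e.g. unsupervised/
    print(experiment_root())
```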

To replicate the results, download the pretrained models:

cd gmc_code/
bash download_unsupervised_pretrain_models.sh
bash download_supervised_pretrain_models.sh
bash download_rl_pretrain_models.sh

1) Unsupervised Learning (MHD)

- Train Model

echo "** Train GMC"
python main_unsupervised.py -f with experiment.stage="train_model" 

echo "** Train classifier"
python main_unsupervised.py -f with experiment.stage="train_downstream_classfier"

- Evaluate/Replicate Results

echo "** Evaluate GMC - Classification"
python main_unsupervised.py -f with experiment.evaluation_mods=[0,1,2,3] experiment.stage="evaluate_downstream_classifier"

echo "** Evaluate GMC - DCA"
python main_unsupervised.py -f with experiment.stage="evaluate_dca"
  • To evaluate with partial observations, set experiment.evaluation_mods to one of [0], [1], [2], [3];
  • The DCA results are saved in the evaluation/gmc_mhd/log_0/results_dca_evaluation/ folder. For example, the geometric alignment of complete and image representations is given in the joint_m1/DCA_results_version0.log file.
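To run the partial-observation evaluation for every single modality in turn, the command lines can be generated rather than typed by hand. This is a minimal sketch: `evaluation_command` is a hypothetical helper, and only the script name, the `experiment.evaluation_mods` key, and the stage name are taken from the commands in this README.

```python
import shlex
from typing import List, Sequence


def evaluation_command(
    script: str,
    mods: Sequence[int],
    stage: str = "evaluate_downstream_classifier",
) -> List[str]:
    """Build the argument list for one evaluation run."""
    # Format the modality indices as they appear in the README, e.g. [0,1,2,3]
    mods_arg = "[" + ",".join(str(m) for m in mods) + "]"
    return [
        "python", script, "-f", "with",
        f"experiment.evaluation_mods={mods_arg}",
        f"experiment.stage={stage}",
    ]


# One command per single-modality evaluation of the unsupervised experiment.
commands = [evaluation_command("main_unsupervised.py", [m]) for m in range(4)]

for argv in commands:
    # shlex.join quotes the bracketed argument so the line is safe to paste.
    print(shlex.join(argv))
```

Each list can also be passed directly to `subprocess.run(argv, check=True)` from inside the experiment folder. For the RL experiment, the `stage` argument would be `"evaluate_downstream_controller"`.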

2) Supervised Learning (CMU-MOSI/CMU-MOSEI)

- Train Model

echo "** Train representation model"
python main_supervised.py -f with experiment.scenario="mosei" experiment.stage="train_model" 

- Evaluate/Replicate Results

echo "** Evaluate GMC - Classification"
python main_supervised.py -f with experiment.scenario="mosei" experiment.evaluation_mods=[0,1,2] experiment.stage="evaluate_downstream_classifier"

echo "** Evaluate GMC - DCA"
python main_supervised.py -f with experiment.scenario="mosei" experiment.stage="evaluate_dca"
  • You can use the CMU-MOSI dataset for both training and evaluation by setting experiment.scenario="mosi";
  • To evaluate with partial observations, set experiment.evaluation_mods to one of [0], [1], [2];
  • The DCA results are saved in the evaluation/gmc_mosei/log_0/results_dca_evaluation/ folder. For example, the geometric alignment of complete and text representations is given in the joint_m1/DCA_results_version0.log file.

3) Reinforcement Learning (Multimodal Atari Games)

- Train Model

echo "** Train representation model"
python main_rl.py -f with experiment.stage="train_model" 

echo "** Train controller"
python main_rl.py -f with experiment.stage="train_downstream_controller" 

- Evaluate/Replicate Results

echo "** Evaluate GMC - RL Performance"
python main_rl.py -f with experiment.evaluation_mods=[0,1] experiment.stage="evaluate_downstream_controller"

echo "** Evaluate GMC - DCA"
python main_rl.py -f with experiment.stage="evaluate_dca"
  • To evaluate with partial observations, set experiment.evaluation_mods to one of [0], [1];
  • The DCA results are saved in the evaluation/gmc_pendulum/log_0/results_dca_evaluation/ folder. For example, the geometric alignment of complete and text representations is given in the joint_m1/DCA_results_version0.log file.
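The DCA result locations named in this README for the three experiments follow one pattern, which can be sketched as a small path helper. The names `EXPERIMENT_FOLDERS`, `dca_log_path`, and `read_dca_log` are hypothetical, and the sketch assumes the evaluation/ folder sits directly under whatever directory `root` points to.

```python
from pathlib import Path
from typing import Optional

# Result folder names as listed in this README (one per experiment).
EXPERIMENT_FOLDERS = {
    "unsupervised": "gmc_mhd",
    "supervised": "gmc_mosei",
    "rl": "gmc_pendulum",
}


def dca_log_path(
    experiment: str,
    comparison: str = "joint_m1",
    root: Path = Path("."),
) -> Path:
    """Compose the path to a DCA results log for one experiment."""
    return (
        root
        / "evaluation"
        / EXPERIMENT_FOLDERS[experiment]
        / "log_0"
        / "results_dca_evaluation"
        / comparison
        / "DCA_results_version0.log"
    )


def read_dca_log(path: Path) -> Optional[str]:
    """Return the log contents, or None if the evaluation has not been run."""
    return path.read_text() if path.is_file() else None


if __name__ == "__main__":
    for name in EXPERIMENT_FOLDERS:
        path = dca_log_path(name)
        status = "found" if read_dca_log(path) is not None else "missing"
        print(f"{name}: {path} ({status})")
```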

FAQ

For any additional questions, feel free to email `miguel.vasco[at]tecnico.ulisboa.pt`.

gmc's Issues

projection head for modality-specific representations

Thank you for your invaluable contribution to this research! I have a question regarding the implementation details described in the paper.

In the paper, it is stated that the shared projection head 𝑔() maps the intermediate representations of both the modality-specific representations and the joint representations. However, upon reviewing the code at supervised/architectures/models/gmc.py lines 60-61, it appears that the modality-specific representations only pass through the modality-specific encoder, while the joint representations are the only ones that pass through the shared projection head.

This seems to be inconsistent with the paper's description of the geometric alignment of joint representations with each modality-specific representation. Could you please clarify this aspect of the implementation? I would greatly appreciate your insight on this matter.

Thank you very much for your time and assistance.

About the GMC loss function

Dear Authors, thanks for sharing the code! I really enjoyed your paper. There is one point I'm not sure I understood. In your paper, the contrastive loss function seems to be an adapted version of InfoNCE: instead of the symmetric construction used in SimCLR, you use a construction where (for each single modality) each pair of augmentations (i.e. $z_{1:M}^i$ and $z_m^i$) is passed as a positive only once, and the denominator becomes the sum of all negatives surrounding this pair ($\sum_{j\neq i}{(s_{m, 1:M}(i,j)+s_{m,m}(i,j)+s_{1:M, 1:M}(i,j))}$, excluding the positive pair), which looks really interesting. However, in the implementation, the loss still looks the same as in SimCLR: the pair of augmentations is passed as a positive twice, and each time the denominator is different, using only the negative samples corresponding to one of the two augmentations (i.e. $\sum_{j\neq i}{(s_{m, 1:M}(i,j)+s_{m,m}(i,j))}+s_{m, 1:M}(i,i)$ for the $m$-centric pass and vice versa for the other pass, with the positive pair included in the summation). Did I miss anything?

Thanks for your attention!
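To make the difference raised in this question concrete, the two constructions can be written out in plain Python. This is only a sketch of the formulas as the question describes them, not code from the paper or the repository. The function names, the cosine similarity with temperature `tau`, and the averaging over samples are assumptions made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def _s(a: Sequence[Vector], b: Sequence[Vector], i: int, j: int, tau: float) -> float:
    """Similarity score s_{a,b}(i, j) = exp(cos(a_i, b_j) / tau)."""
    return math.exp(_cosine(a[i], b[j]) / tau)


def single_pass_loss(z_m, z_joint, tau: float = 0.1) -> float:
    """Positive pair used once; denominator sums three kinds of negatives
    and excludes the positive pair. Needs at least two samples."""
    n = len(z_m)
    total = 0.0
    for i in range(n):
        pos = _s(z_m, z_joint, i, i, tau)
        denom = sum(
            _s(z_m, z_joint, i, j, tau)
            + _s(z_m, z_m, i, j, tau)
            + _s(z_joint, z_joint, i, j, tau)
            for j in range(n) if j != i
        )
        total += -math.log(pos / denom)
    return total / n


def two_pass_loss(z_m, z_joint, tau: float = 0.1) -> float:
    """SimCLR-style: positive pair used twice, once per anchor, and each
    denominator includes the positive pair plus that anchor's negatives."""
    n = len(z_m)
    total = 0.0
    for i in range(n):
        pos = _s(z_m, z_joint, i, i, tau)
        for anchor, other in ((z_m, z_joint), (z_joint, z_m)):
            denom = pos + sum(
                _s(anchor, other, i, j, tau) + _s(anchor, anchor, i, j, tau)
                for j in range(n) if j != i
            )
            total += -math.log(pos / denom)
    return total / (2 * n)


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings, identical for both views
    print(single_pass_loss(z, z, tau=1.0), two_pass_loss(z, z, tau=1.0))
```

The two functions differ only in which terms enter the denominator and in how often the positive pair is used, which is exactly the point the question asks the authors to confirm.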

ModuleNotFoundError

Traceback (most recent call last):
  File "/home/tang/git_src/gmc/gmc_code/unsupervised/main_unsupervised.py", line 14, in <module>
    from gmc_code.unsupervised.utils.general_utils import (
  File "/home/tang/git_src/gmc/gmc_code/unsupervised/utils/general_utils.py", line 4, in <module>
    from gmc_code.unsupervised.modules.trainers.dca_evaluation_trainer import DCAEvaluator
  File "/home/tang/git_src/gmc/gmc_code/unsupervised/modules/trainers/dca_evaluation_trainer.py", line 6, in <module>
    from gmc_code.DelaunayComponentAnalysis.schemes import (
ModuleNotFoundError: No module named 'gmc_code.DelaunayComponentAnalysis'

It looks like this file is not in the library: "gmc_code.DelaunayComponentAnalysis"
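The import in the traceback suggests that the DCA code is expected to be importable as gmc_code.DelaunayComponentAnalysis. This is an inference from the traceback, not something the README states, although it is consistent with the setup step that asks for DCA to be installed separately from its official repository. A quick way to check whether the module can be found before launching an experiment is sketched below; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. gmc_code) is itself missing.
        return False


if __name__ == "__main__":
    target = "gmc_code.DelaunayComponentAnalysis"
    if module_available(target):
        print(f"{target} is importable")
    else:
        print(f"{target} not found; check the DCA setup step")
```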
