This repository contains the code of two distinct research projects which are closely related and share much of the same codebase. The second project is and extension to the multimodal domain of the first one.

General Representation Learning through Latent Space Masking and Prediction

Description

We want to generalize the self-distillation learning paradigm so that it applied to any kind of unimodal or fused multimodal data without the need of modality-specific augmentation or masking strategies. Instead we embed the input data into a universal input array and apply a single masking strategy in the latent space instead of the data space. We test this genealized apporach on a multitude of datasets containing text, images, audio and video data.

How to run

#TODO update this section to run with poetry Install dependencies

# clone project
git clone https://github.com/marcomoldovan/multimodal-self-distillation
cd multimodal-self-distillation

# install the correct python version
sudo apt-get install python3.10 # Linux, Python 3.7 or higher
brew install [email protected] #MacOS, Python 3.7 or higher
choco install python --version=3.9 # Windows, Python 3.7-3.9

# create python virtual environment and activate it
python3 -m venv myenv
source myenv/bin/activate

# if you have several version of python you can create a virtual environment with a specific version:
virtualenv --python=/usr/bin/<python3.x> myenv
myenv\Scripts\activate.bat

# [ALTERNATIVE] create conda environment
conda create -n myenv python=<3.x>
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/unimodal

python train.py experiment=unimodal/experiment_name.yaml

You can override any parameter from command line like this

python train.py trainer.max_epochs=20 datamodule.batch_size=64

Self-Supervised Multimodal Alignment with Self-Distillation

Description

We view pairs of multimodal datapoints as augmentations of the same semantic concept and leverage this observation to apply the self-distillation paradigm to the multimodal setting in order to learn a coordinated multimodal representation space. We show that this approach is able to learn a representation space that is more aligned than the one learned by a standard contrastive loss while avoiding the need for negative mining, a cruicial weekness of the contrastive approach.

How to run

Install dependencies

# clone project
git clone https://github.com/marcomoldovan/multimodal-self-distillation
cd multimodal-self-distillation

# install the correct python version
sudo apt-get install python3.10 # Linux, Python 3.7 or higher
brew install [email protected] #MacOS, Python 3.7 or higher
choco install python --version=3.9 # Windows, Python 3.7-3.9

# create python virtual environment and activate it
python3 -m venv myenv
source myenv/bin/activate

# if you have several version of python you can create a virtual environment with a specific version:
virtualenv --python=/usr/bin/<python3.x> myenv
myenv\Scripts\activate.bat

# [ALTERNATIVE] create conda environment
conda create -n myenv python=<3.x>
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/multimodal

python train.py experiment=multimodal/experiment_name.yaml

You can override any parameter from command line like this

python train.py trainer.max_epochs=20 datamodule.batch_size=64

marcomoldovan / multimodal-self-distillation Goto Github PK

multimodal-self-distillation's Introduction

General Representation Learning through Latent Space Masking and Prediction

Description

How to run

Self-Supervised Multimodal Alignment with Self-Distillation

Description

How to run

multimodal-self-distillation's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent