
shamanez / self-supervised-embedding-fusion-transformer


The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

License: MIT License

Python 97.29% Jupyter Notebook 0.11% C++ 0.48% Cuda 1.76% Shell 0.07% Lua 0.29%
emotion-recognition self-supervised-learning bert multimodal-deep-learning multimodal-sentiment-analysis multimodal-emotion-recognition

self-supervised-embedding-fusion-transformer's Introduction

Model Overview

Please replace Table 6 in the paper

Please replace Table 6 of the paper with this table.

Basic structure of the code

Inspiration from fairseq

  1. This code structure is built on top of the Fairseq interface.
  2. Fairseq is an open-source project by the Facebook AI team that combines different SOTA architectures for sequential data processing.
  3. It also includes SOTA optimization mechanisms such as early stopping, learning-rate warm-up, and learning-rate schedulers.
  4. We are developing our own architecture to be compatible with the Fairseq interface.
  5. For a deeper understanding, please read the paper published on the Fairseq interface.

Merging our own architecture with the Fairseq interface

  1. This can be a bit tricky in the beginning. First, it is important to understand that Fairseq is built so that all architectures can be accessed through terminal commands (args).

  2. Since our architecture shares many properties with the transformer architecture, we followed a tutorial that describes how to use Roberta for a custom classification task.

  3. We built our architecture by adding new code to the following directories of the Fairseq interface.

    • fairseq/data
    • fairseq/models
    • fairseq/modules
    • fairseq/tasks
    • fairseq/criterions
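
The idea that every component is reachable through command-line arguments can be illustrated with a small name-to-class registry. This is a simplified sketch in plain Python, not Fairseq's actual implementation: `TASK_REGISTRY`, `ARCH_REGISTRY`, `register`, `build_from_args`, and the two stand-in classes are hypothetical names introduced for illustration. Only the strings `emotion_prediction` and `robertEMO_large` come from this README's training command.

```python
import argparse

# Hypothetical registries: simplified stand-ins for the name -> class
# tables that a framework such as Fairseq maintains internally.
TASK_REGISTRY = {}
ARCH_REGISTRY = {}


def register(registry, name):
    """Return a decorator that stores the decorated class under `name`."""
    def decorator(cls):
        if name in registry:
            raise ValueError(f"duplicate registration: {name}")
        registry[name] = cls
        return cls
    return decorator


@register(TASK_REGISTRY, "emotion_prediction")
class EmotionPredictionTask:
    """Stand-in for a task class (the real one lives under fairseq/tasks)."""


@register(ARCH_REGISTRY, "robertEMO_large")
class EmotionModel:
    """Stand-in for a model class (the real one lives under fairseq/models)."""


def build_from_args(argv):
    """Resolve the --task and --arch strings to registered classes."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", choices=sorted(TASK_REGISTRY), required=True)
    parser.add_argument("--arch", choices=sorted(ARCH_REGISTRY), required=True)
    args = parser.parse_args(argv)
    return TASK_REGISTRY[args.task](), ARCH_REGISTRY[args.arch]()


task, model = build_from_args(
    ["--task", "emotion_prediction", "--arch", "robertEMO_large"]
)
print(type(task).__name__, type(model).__name__)
```

Placing a new file in one of the directories above and registering its class under a name is what makes that name usable as a value for `--task`, `--arch`, or `--criterion`.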

Main scripts of the code

Our main scripts are categorized into the following parts

  1. The custom dataloader for loading raw audio, face frames, and text is in fairseq/data/raw_audio_text_video_dataset.py

  2. The emotion prediction task, similar to other tasks such as translation, is in fairseq/tasks/emotion_prediction.py

  3. The custom architecture of our model, similar to roberta and wav2vec, is in fairseq/models/mulT_emo.py

  4. To obtain inter-modal attention, we slightly modify the self-attention architecture. The changes can be found in fairseq/modules/transformer_multi_encoder.py and fairseq/modules/transformer_layer.py

  5. Finally, the custom loss function script can be found in fairseq/criterions/emotion_prediction_cri.py
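
As a rough illustration of inter-modal attention, the sketch below implements generic scaled dot-product attention in which the queries come from one modality and the keys and values come from another. It assumes this is what "inter-modal attention" refers to; it is not the code in fairseq/modules, and it omits the learned projections, multiple heads, and masking that a real transformer layer would have. The function names and the toy vectors are made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    queries: list of d-dimensional vectors from modality A (e.g. text)
    keys, values: equally long lists of vectors from modality B (e.g. audio)
    Returns one fused vector per query position.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Similarity of this query to every key of the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other modality's value vectors.
        fused_vec = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(fused_vec)
    return outputs


# Toy inputs (assumed values): 2 text positions and 3 audio positions, dim 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(text, audio, audio)
print(len(fused), len(fused[0]))
```

The output has one vector per text position, each a weighted mix of audio vectors, so the sequence lengths of the two modalities do not need to match.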

Prerequisite models

Our model uses pretrained SSL methods to extract features. It is important to download these checkpoints before training. Please use the following links to download the pretrained SSL models.

  1. For audio features - wav2vec
  2. For facial features - Fabnet
  3. For sentence (text) features - Roberta
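
Since training depends on all three checkpoints, a small pre-flight check can catch a missing download early. The sketch below is only an illustration: the README does not state where the checkpoints should be stored, so every path in `REQUIRED_CHECKPOINTS` is a hypothetical placeholder to be replaced with the real locations, and `missing_checkpoints` is a helper name invented here.

```python
from pathlib import Path

# Hypothetical locations: the README does not specify where the
# downloaded checkpoints live, so adjust these to your own layout.
REQUIRED_CHECKPOINTS = {
    "audio (wav2vec)": "pretrained_ssl/wav2vec.pt",
    "face (Fabnet)": "pretrained_ssl/fabnet.pth",
    "text (Roberta)": "pretrained_ssl/roberta/model.pt",
}


def missing_checkpoints(root, required=REQUIRED_CHECKPOINTS):
    """Return the labels of checkpoints that are not present under `root`."""
    root = Path(root)
    return [
        label for label, rel_path in required.items()
        if not (root / rel_path).is_file()
    ]


if __name__ == "__main__":
    missing = missing_checkpoints(".")
    if missing:
        print("Missing pretrained models:", ", ".join(missing))
    else:
        print("All pretrained checkpoints found.")
```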

Training Command

python train.py --data ./T_data-old/mosei_sent --restore-file None \
    --task emotion_prediction --reset-optimizer --reset-dataloader --reset-meters \
    --init-token 0 --separator-token 2 --arch robertEMO_large \
    --criterion emotion_prediction_cri --num-classes 1 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.1 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 --clip-norm 0.0 \
    --lr 1e-03 --max-epoch 32 --best-checkpoint-metric loss \
    --encoder-layers 2 --encoder-attention-heads 4 \
    --max-sample-size 150000 --max-tokens 150000000 --batch-size 4 \
    --encoder-layers-cross 2 --max-positions-t 512 --max-positions-a 936 --max-positions-v 301 \
    --no-epoch-checkpoints --update-freq 2 --find-unused-parameters --ddp-backend=no_c10d \
    --lr-scheduler reduce_lr_on_plateau --regression-target-mos
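
The command combines `--batch-size 4` with `--update-freq 2`, which in Fairseq accumulates gradients over that many batches before each optimizer step. The arithmetic below sketches the resulting upper bound on samples per update. The single-GPU default is an assumption, since the command does not say how many GPUs are used, and real batches may be smaller when other limits such as `--max-tokens` apply. `effective_batch_size` is a helper name made up for this example.

```python
def effective_batch_size(batch_size, update_freq, num_gpus=1):
    """Upper bound on samples contributing to one optimizer step.

    batch_size:  per-GPU batch size (--batch-size)
    update_freq: number of batches accumulated per step (--update-freq)
    num_gpus:    number of data-parallel workers (assumed 1 here)
    """
    return batch_size * update_freq * num_gpus


# Values taken from the training command above, on an assumed single GPU.
print(effective_batch_size(batch_size=4, update_freq=2))  # 8
```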

Validation Command

CUDA_VISIBLE_DEVICES=1 python validate.py --data ./T_data/emocap --path './checkpoints/checkpoint_best.pt' --task emotion_prediction --valid-subset test --batch-size 4

self-supervised-embedding-fusion-transformer's Issues

Dataset for IEMOCAP

I found that the IEMOCAP dataset has been split and renamed. Could you upload the segmented dataset to a cloud drive?

Reproduction on MOSI/MOSEI

Dear Authors,
Could I have the two command lines used to train on MOSI and MOSEI sentiment? As far as I know, the one given in the README does not include all the parameters mentioned in the paper.
Thank you!

Training process for this project

Hi, siriwardhana. Could you describe this project in more detail, including the pre-processing of the raw data, label assignment, and the division into three parts (train, valid, test)? Thanks

2-class problem

Is the following code in fairseq/data/raw_audio_text_video_dataset.py the 2-class setup?
If so, why are the sentiment_score values 0 and 1?

            self.emotion_dictionary = {   #modei senti 2 class
                '0': 0,
                '1': 1
            }

Replication issues

Dear Authors,

Thank you for sharing your software with us. I am trying to replicate your results but I am having the following issues/comments:

  1. Am I correct to use the pre-processed files from your other repo (BERT-like-is-All-You-Need)? If not, can you share the pre-processed data or explain how to pre-process it from the raw data?
  2. If the first step is correct, I then used mosi_data as the path to the raw dataset by modifying raw_audio_text_video_dataset.py to load the data root path directly (I did not find any command-line argument for it). So far, this works for MOSI.
  3. Training runs but, sadly, the loss does not decrease. I am using a batch size of 16 for MOSI and I have 64 GB of GPU memory (4*16).

Is there something I am missing? I'd appreciate your help.
