katerynach / multimodal-emotion-recognition Goto Github PK

This repository provides implementation for the paper "Self-attention fusion for audiovisual emotion recognition with incomplete data".

License: MIT License

Python 100.00%

multimodal-emotion-recognition's People

Stargazers

Watchers

multimodal-emotion-recognition's Issues

Video_Speech_Actor_[01-24].zip doesn't exist

Hello! I'm here just to report that in the page that you linked (https://zenodo.org/records/1188976#.YkgJVijP2bh) the Video_Speech_Actor_[01-24].zip file doesn't exist (anymore?)

CMU-MOSEI

Could you share the settings related to the operation of the CUM-MOSEI dataset in the paper?

RAVDESS_multimodalcnn_15_best0.pth

Excuse me, would it be possible for you to kindly provide me with the file RAVDESS_multimodalcnn_15_best0.pth? I would be extremely grateful for your assistance.

detection.py

Hello, there is no recognition detection.py file in the code, I want to test the recognition results of the following weight file, do you plan to open the code detection.py file later

error

In accordance with the code of the paper ,I reproduced the experimental results to achieve 92 accuracy.

File "/content/drive/MyDrive/multimodal/multimodal-emotion-recognition-main/ravdess.py", line 25, in load_audio
audios = librosa.core.load(audiofile, sr)
TypeError: load() takes 1 positional argument but 2 were given

issue on this part of the code,
def load_audio(audiofile, sr):
audios = librosa.core.load(audiofile, sr)
y = audios[0]
return y, sr

layer size mismatch while combining models

hey
i wanted to add attention layer of ia model to it model just before the pooling layer i.e between the last convo1d layer and pooling layer. And i am making changes in forward_feature_3 . But there is some issue

CODE:
def forward_feature_3(self, x_audio, x_visual):
x_audio = self.audio_model.forward_stage1(x_audio)
x_visual = self.visual_model.forward_features(x_visual)
x_visual = self.visual_model.forward_stage1(x_visual)

    proj_x_a = x_audio.permute(0,2,1)
    proj_x_v = x_visual.permute(0,2,1)

    h_av = self.av1(proj_x_v, proj_x_a)
    h_va = self.va1(proj_x_a, proj_x_v)
    
    h_av = h_av.permute(0,2,1)
    h_va = h_va.permute(0,2,1)
    
    x_audio = h_av+x_audio
    x_visual = h_va + x_visual

    x_audio = self.audio_model.forward_stage2(x_audio)       
    x_visual = self.visual_model.forward_stage2(x_visual)

    proj_x_a = x_audio
    proj_x_v = x_visual

    h_av_new = self.av_new(proj_x_v, proj_x_a)
    h_va_new = self.va_new(proj_x_a, proj_x_v)
    
    if h_av_new.size(1) > 1: #if more than 1 head, take average
        h_av_new = torch.mean(h_av_new, axis=1).unsqueeze(1)
   
    h_av_new = h_av_new.sum([-2])

    if h_va_new.size(1) > 1: #if more than 1 head, take average
        h_va_new = torch.mean(h_va_new, axis=1).unsqueeze(1)

    h_va_new = h_va_new.sum([-2])

    x_audio = h_va_new*x_audio
    x_visual = h_av_new*x_visual
    
    audio_pooled = x_audio.mean([-1]) #mean accross temporal dimension
    video_pooled = x_visual.mean([-1])

    x = torch.cat((audio_pooled, video_pooled), dim=-1)
    x1 = self.classifier_1(x)
    return x1

ERROR:

train_epoch(i, train_loader, model, criterion, optimizer, opt,

File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\train.py", line 119, in train_epoch
train_epoch_multimodal(epoch, data_loader, model, criterion, optimizer, opt, epoch_logger, batch_logger)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\train.py", line 64, in train_epoch_multimodal
outputs = model(audio_inputs, visual_inputs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\models\multimodalcnn2.py", line 208, in forward
return self.forward_feature_3(x_audio, x_visual)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\models\multimodalcnn2.py", line 235, in forward_feature_3
h_av_new = self.av_new(proj_x_v, proj_x_a)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\models\transformer_timm.py", line 88, in forward
q = self.q(x_q).reshape(B, Nq, 1, self.num_heads, -1).permute(2, 0, 3, 1, 4)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x144 and 128x128)

Model Checkpoint link is down

Hello, the pretrained model checkpoints link on the README seems to be down.
Could you please add an updated link?

Thank you and congrats for the wonderful work

An error occurred in the repetition test

Hello, your data processing file on the CMU-MOSEI dataset is not up. Moreover, when Ravdess was tested, there was no index test. Are there any omissions in your public documents?Also, based on your existing code, run out the results. Are you having problems getting Ravdess model results? prec5 can be greater than 100%. :(

There were some problems during training

Why does top1 have very low accuracy ？

inference using main file

Initializing efficientnet
Traceback (most recent call last):
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\main.py", line 158, in
test_data = get_test_set(opt, spatial_transform=video_transform)
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\dataset.py", line 34, in get_test_set
test_data = RAVDESS(
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\ravdess.py", line 56, in init
self.data = make_dataset(subset, annotation_path)
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\ravdess.py", line 34, in make_dataset
with open(annotation_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'ravdess_preprocessing/annotations.txt'

please share the sample data with annotation file, i ravedess website dont give annotation data. please do let me know the format of annotation file required

Ask for help when loaded the train model

Hi, first, thank you for your work. I found the following problems when I loaded your train model. I guess that the train model provided did not match the original code. How to solve this problem？
$F6 RI44N4{X7TE_WA` BN6G$

Hello, I want to reproduce your code. Can you provide more details in the data preparation section？

5折交叉

你好，源代码是在1折交叉，如果我想进行5折交叉，需要更改哪些地方呢

_pickle.UnpicklingError: could not find MARK

when I run "python main.py", there are errors to me.

Traceback (most recent call last):
File "/home/wangjl/multimodal-emotion-recognition/main.py", line 50, in
model, parameters = generate_model(opt)
File "/home/wangjl/multimodal-emotion-recognition/model.py", line 13, in generate_model
model = multimodalcnn.MultiModalCNN(opt.n_classes, fusion = opt.fusion, seq_length = opt.sample_duration, pretr_ef=opt.pretrain_path, num_heads=opt.num_heads)
File "/home/wangjl/multimodal-emotion-recognition/models/multimodalcnn.py", line 171, in init
init_feature_extractor(self.visual_model, pretr_ef)
File "/home/wangjl/multimodal-emotion-recognition/models/multimodalcnn.py", line 107, in init_feature_extractor
checkpoint = torch.load(f, map_location=torch.device('cpu'))
File "/mnt/raid1/dataMapping/NLP/env/MER/lib/python3.9/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/mnt/raid1/dataMapping/NLP/env/MER/lib/python3.9/site-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: could not find MARK

The pretrained model checkpoints link on the README seems to be down

Hello, the pretrained model checkpoints link on the README seems to be down.
Could you please add an updated link?

Thank you and congrats for the wonderful work

Thank you for your work

Hello, Dear Author

I reorganized your code to fit my needs, and open sourced the code, you can click here to visit it. To thank your work, I mentioned you in the README.md. This is my first time to do such a thing and I don't know if that's okay, so I'd like to get your permission.

Looking forward to hearing from you
Best regards.

Request for Trained model on Ravdess dataset

We are referencing your paper "Self-attention fusion for audiovisual emotion recognition with incomplete data, ICPR 2022" in our research.

And, we want to apply facial expression recognition with code shared at "https://github.com/katerynaCh/multimodal-emotion-recognition".

Can you share the trained model on Ravdess dataset?

It will be of great help to our research.

I look forward to hearing from you.

predicted class

how do i know the predicted class?
i am getting the precision but i couldn't able to interpret the predicted class from pretrained weights..

Training time

Hey, how long will it take to complete the training?
i know it takes long to train but can u give an approximate.
I've been training for almost 5 hours and have reached only 3/100 epoch. Am I missing something or doing something wrong?
I am student and new to the domain, if you could pls help.

katerynach / multimodal-emotion-recognition Goto Github PK

multimodal-emotion-recognition's People

Stargazers

Watchers

Forkers

multimodal-emotion-recognition's Issues

Recommend Projects

Recommend Topics

Recommend Org