multimodal-emotion-recognition's Issues

CMU-MOSEI

Could you share the settings used in the paper for running on the CMU-MOSEI dataset?

RAVDESS_multimodalcnn_15_best0.pth

Excuse me, would it be possible for you to kindly provide me with the file RAVDESS_multimodalcnn_15_best0.pth? I would be extremely grateful for your assistance.

detection.py

Hello, there is no detection.py file in the code. I want to test the recognition results of the following weight file. Do you plan to release detection.py later?

error

Following the code from the paper, I reproduced the experimental results and reached an accuracy of 92.

ravdess.py error

File "/content/drive/MyDrive/multimodal/multimodal-emotion-recognition-main/ravdess.py", line 25, in load_audio
audios = librosa.core.load(audiofile, sr)
TypeError: load() takes 1 positional argument but 2 were given

The issue is in this part of the code:
def load_audio(audiofile, sr):
    audios = librosa.core.load(audiofile, sr)
    y = audios[0]
    return y, sr
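
The TypeError suggests that the installed librosa accepts only the file path positionally, which matches recent librosa releases where `sr` is keyword-only. A minimal sketch of a possible fix follows, assuming that is the cause. The `loader` parameter is hypothetical and exists only so the function can be exercised without librosa installed.

```python
def load_audio(audiofile, sr, loader=None):
    """Load an audio file, passing the sample rate by keyword."""
    if loader is None:
        # Imported lazily so the sketch can be read without librosa installed.
        import librosa
        loader = librosa.load
    # Pass `sr` by keyword; passing it positionally triggers the TypeError
    # when the loader accepts only the path as a positional argument.
    y, _ = loader(audiofile, sr=sr)
    return y, sr
```

With librosa installed, `y, sr = load_audio("clip.wav", 22050)` would resample the clip to 22050 Hz on load; "clip.wav" is a placeholder path.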

layer size mismatch while combining models

Hey,
I wanted to add the attention layer of the ia model to the it model just before the pooling layer, i.e. between the last Conv1d layer and the pooling layer. I am making the changes in forward_feature_3, but there is an issue.

CODE:
def forward_feature_3(self, x_audio, x_visual):
    x_audio = self.audio_model.forward_stage1(x_audio)
    x_visual = self.visual_model.forward_features(x_visual)
    x_visual = self.visual_model.forward_stage1(x_visual)

    proj_x_a = x_audio.permute(0,2,1)
    proj_x_v = x_visual.permute(0,2,1)

    h_av = self.av1(proj_x_v, proj_x_a)
    h_va = self.va1(proj_x_a, proj_x_v)
    
    h_av = h_av.permute(0,2,1)
    h_va = h_va.permute(0,2,1)
    
    x_audio = h_av+x_audio
    x_visual = h_va + x_visual

    x_audio = self.audio_model.forward_stage2(x_audio)       
    x_visual = self.visual_model.forward_stage2(x_visual)

    proj_x_a = x_audio
    proj_x_v = x_visual

    h_av_new = self.av_new(proj_x_v, proj_x_a)
    h_va_new = self.va_new(proj_x_a, proj_x_v)
    
    if h_av_new.size(1) > 1: #if more than 1 head, take average
        h_av_new = torch.mean(h_av_new, axis=1).unsqueeze(1)
   
    h_av_new = h_av_new.sum([-2])

    if h_va_new.size(1) > 1: #if more than 1 head, take average
        h_va_new = torch.mean(h_va_new, axis=1).unsqueeze(1)

    h_va_new = h_va_new.sum([-2])

    x_audio = h_va_new*x_audio
    x_visual = h_av_new*x_visual
    
    audio_pooled = x_audio.mean([-1]) #mean accross temporal dimension
    video_pooled = x_visual.mean([-1])

    x = torch.cat((audio_pooled, video_pooled), dim=-1)
    x1 = self.classifier_1(x)
    return x1

ERROR:

train_epoch(i, train_loader, model, criterion, optimizer, opt,

File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\train.py", line 119, in train_epoch
train_epoch_multimodal(epoch, data_loader, model, criterion, optimizer, opt, epoch_logger, batch_logger)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\train.py", line 64, in train_epoch_multimodal
outputs = model(audio_inputs, visual_inputs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\models\multimodalcnn2.py", line 208, in forward
return self.forward_feature_3(x_audio, x_visual)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\models\multimodalcnn2.py", line 235, in forward_feature_3
h_av_new = self.av_new(proj_x_v, proj_x_a)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\HP pav\Desktop\Capstone\multimodal-emotion-recognition\models\transformer_timm.py", line 88, in forward
q = self.q(x_q).reshape(B, Nq, 1, self.num_heads, -1).permute(2, 0, 3, 1, 4)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python310\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x144 and 128x128)
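
One plausible reading of this error: in the first attention stage the features are permuted to (batch, time, channels) before the attention call, but in the second stage `proj_x_a = x_audio` is passed without a permute, so the linear layer sees the time axis (144) as its last dimension instead of the 128 channels. The shape arithmetic can be sketched in plain Python. The batch size of 32 is an assumption inferred from 4096 / 128, and the helper names are hypothetical.

```python
from math import prod

def linear_mat1_shape(input_shape):
    """Shape of the 2-D matrix nn.Linear multiplies: leading dims flattened, last dim kept."""
    *leading, last = input_shape
    return prod(leading), last

def permute_shape(shape, order):
    """Shape after tensor.permute(*order)."""
    return tuple(shape[i] for i in order)

IN_FEATURES = 128            # from the 128x128 weight in the error message
stage2_out = (32, 128, 144)  # assumed (batch, channels, time)

print(linear_mat1_shape(stage2_out))  # (4096, 144): last dim is time, not channels
fixed = permute_shape(stage2_out, (0, 2, 1))
print(linear_mat1_shape(fixed))       # (4608, 128): last dim now matches IN_FEATURES
```

If this diagnosis holds, permuting with `permute(0, 2, 1)` before `self.av_new` and `self.va_new`, as the first stage already does, should give the linear layer the 128-dimensional input it expects. The later multiplication with `x_audio` and `x_visual` may then need its own shape check.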

Model Checkpoint link is down

Hello, the pretrained model checkpoints link on the README seems to be down.
Could you please add an updated link?

Thank you, and congratulations on the wonderful work.

An error occurred in the reproduction test

Hello, the data processing file for the CMU-MOSEI dataset has not been uploaded. Moreover, when testing on RAVDESS, there was no test index. Is anything missing from the published files? Also, I ran your existing code to get results. Are there problems with the RAVDESS model results? prec5 can be greater than 100%. :(

inference using main file

Initializing efficientnet
Traceback (most recent call last):
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\main.py", line 158, in <module>
test_data = get_test_set(opt, spatial_transform=video_transform)
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\dataset.py", line 34, in get_test_set
test_data = RAVDESS(
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\ravdess.py", line 56, in __init__
self.data = make_dataset(subset, annotation_path)
File "D:\emotion_recongintion\multimodal-emotion-recognition-main\multimodal-emotion-recognition-main\ravdess.py", line 34, in make_dataset
with open(annotation_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'ravdess_preprocessing/annotations.txt'

Please share sample data together with the annotation file; the RAVDESS website does not provide annotation data. Please also let me know the required format of the annotation file.
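
RAVDESS encodes its labels in the file names rather than in a separate annotation file: each name has seven hyphen-separated fields, the third being the emotion code and the seventh the actor ID. The sketch below only decodes those two fields. The layout of annotations.txt expected by ravdess.py is not shown in this thread, so writing that file is left out, and the function name is hypothetical.

```python
import os

# Emotion codes from the RAVDESS file naming convention.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_name(path):
    """Return (emotion, actor_id) decoded from a RAVDESS file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style name: {path}")
    return RAVDESS_EMOTIONS[fields[2]], int(fields[6])

print(parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav"))  # ('fearful', 12)
```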

Help needed when loading the trained model

Hi, first of all, thank you for your work. I ran into the following problems when loading your trained model. My guess is that the provided trained model does not match the original code. How can I solve this?

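5-fold cross-validation

Hello, the source code uses 1-fold cross-validation. If I want to run 5-fold cross-validation, which parts do I need to change?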
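
As a starting point, a speaker-independent 5-fold split can be sketched by assigning the 24 RAVDESS actors to folds and holding out one fold per run. This is a generic round-robin scheme, not necessarily the split used in the paper, and the function names are hypothetical. It also does not carve out a validation set.

```python
def actor_folds(actor_ids, n_folds=5):
    """Split actor IDs into n_folds disjoint folds (round-robin)."""
    folds = [[] for _ in range(n_folds)]
    for i, actor in enumerate(sorted(actor_ids)):
        folds[i % n_folds].append(actor)
    return folds

def train_test_for_fold(actor_ids, fold, n_folds=5):
    """Return (train_actors, test_actors) with fold `fold` held out."""
    folds = actor_folds(actor_ids, n_folds)
    test = folds[fold]
    train = [a for k, f in enumerate(folds) if k != fold for a in f]
    return train, test

actors = range(1, 25)  # RAVDESS has 24 actors
train, test = train_test_for_fold(actors, fold=0)
print(test)  # [1, 6, 11, 16, 21]
```

Each fold would then need its own annotation file (or subset filter) and its own training run, with the reported metric averaged over the five runs.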

_pickle.UnpicklingError: could not find MARK

When I run "python main.py", I get the following error.

Traceback (most recent call last):
File "/home/wangjl/multimodal-emotion-recognition/main.py", line 50, in <module>
model, parameters = generate_model(opt)
File "/home/wangjl/multimodal-emotion-recognition/model.py", line 13, in generate_model
model = multimodalcnn.MultiModalCNN(opt.n_classes, fusion = opt.fusion, seq_length = opt.sample_duration, pretr_ef=opt.pretrain_path, num_heads=opt.num_heads)
File "/home/wangjl/multimodal-emotion-recognition/models/multimodalcnn.py", line 171, in __init__
init_feature_extractor(self.visual_model, pretr_ef)
File "/home/wangjl/multimodal-emotion-recognition/models/multimodalcnn.py", line 107, in init_feature_extractor
checkpoint = torch.load(f, map_location=torch.device('cpu'))
File "/mnt/raid1/dataMapping/NLP/env/MER/lib/python3.9/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/mnt/raid1/dataMapping/NLP/env/MER/lib/python3.9/site-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: could not find MARK
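
"could not find MARK" is raised by the pickle module when the bytes it reads are not a valid pickle stream. One common cause, though not confirmed here, is that the file at the pretrained path is not a real checkpoint, for example an HTML page saved from a dead download link or a truncated download. A minimal sketch for inspecting the first bytes of the file follows; the function name and the returned descriptions are hypothetical.

```python
def sniff_checkpoint(path):
    """Guess what kind of file sits at `path` from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(16)
    if not head:
        return "empty file"
    if head.startswith(b"PK"):
        return "zip archive (current torch.save format)"
    if head[:1] == b"\x80":
        return "pickle stream (legacy torch.save format)"
    if head.lstrip().startswith(b"<"):
        return "looks like HTML/XML, probably a failed download"
    return "unknown format"
```

If the result is anything other than a zip archive or a pickle stream, downloading the checkpoint again is likely the first thing to try.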

Thank you for your work

Hello, Dear Author

I reorganized your code to fit my needs and open-sourced the result; you can click here to visit it. To acknowledge your work, I mentioned you in the README.md. This is my first time doing something like this and I don't know whether it is okay, so I'd like to ask for your permission.

Looking forward to hearing from you
Best regards.

predicted class

How do I know the predicted class?
I am getting the precision, but I am not able to work out the predicted class from the pretrained weights.
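
In general, the predicted class is the index of the largest output score for a sample, mapped back to a label name. The sketch below shows this in plain Python for one sample. The class name list is hypothetical; the real order must match the label indices used in the annotation file during training.

```python
import math

def predict(logits, class_names):
    """Return (predicted_name, probability) for one sample's output scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return class_names[idx], probs[idx]

# Hypothetical label order, for illustration only.
CLASS_NAMES = ["neutral", "calm", "happy"]
name, prob = predict([0.1, 2.0, -1.0], CLASS_NAMES)
print(name)  # calm
```

On a batch of PyTorch outputs, `outputs.argmax(dim=1)` gives the same index per sample.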

Training time

Hey, how long will it take to complete the training?
I know training takes a long time, but can you give an approximate figure?
I've been training for almost 5 hours and have only reached epoch 3 of 100. Am I missing something or doing something wrong?
I am a student and new to the domain, so any help would be appreciated.
