hujingwen6666 / mmgcn

Python 99.54% Shell 0.46%

mmgcn's Issues

Cannot reproduce MELD result

I can only get a weighted F1-score of 0.5760. Here are the classification report and confusion matrix from the stdout log:

              precision    recall  f1-score   support

           0     0.7017    0.8503    0.7689      1256
           1     0.4492    0.4875    0.4676       281
           2     0.0000    0.0000    0.0000        50
           3     0.4138    0.1154    0.1805       208
           4     0.5358    0.5398    0.5378       402
           5     0.0000    0.0000    0.0000        68
           6     0.4594    0.4261    0.4421       345

    accuracy                         0.6103      2610
   macro avg     0.3657    0.3456    0.3424      2610
weighted avg     0.5623    0.6103    0.5760      2610

[[1068   57    0   19   78    0   34]
 [  74  137    0    0   35    0   35]
 [  28    5    0    0    8    0    9]
 [ 121   16    0   24   13    0   34]
 [ 109   31    0    4  217    0   41]
 [  33   10    0    2    3    0   20]
 [  89   49    0    9   51    0  147]]
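The reported averages can be re-derived from the confusion matrix alone, which helps confirm that the log is internally consistent before comparing against the paper. The following is a minimal sketch in plain Python. It assumes rows are true classes and columns are predicted classes, which matches the log because the row sums equal the support column. The names `CONFUSION`, `per_class_f1`, and `summarize` are illustrative and are not part of the repository.

```python
# Confusion matrix copied from the log above (rows: true class, columns: predicted class).
CONFUSION = [
    [1068, 57, 0, 19, 78, 0, 34],
    [74, 137, 0, 0, 35, 0, 35],
    [28, 5, 0, 0, 8, 0, 9],
    [121, 16, 0, 24, 13, 0, 34],
    [109, 31, 0, 4, 217, 0, 41],
    [33, 10, 0, 2, 3, 0, 20],
    [89, 49, 0, 9, 51, 0, 147],
]


def per_class_f1(cm):
    """F1 per class, using F1 = 2*TP / (support + number of predictions)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # true instances of class k
        predicted = sum(cm[i][k] for i in range(n))  # times class k was predicted
        denom = support + predicted
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def summarize(cm):
    """Accuracy, macro F1, weighted F1, and the classes that were never predicted."""
    n = len(cm)
    supports = [sum(row) for row in cm]
    total = sum(supports)
    f1 = per_class_f1(cm)
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "macro_f1": sum(f1) / n,
        "weighted_f1": sum(f * s for f, s in zip(f1, supports)) / total,
        "never_predicted": [k for k in range(n) if sum(row[k] for row in cm) == 0],
    }


summary = summarize(CONFUSION)
print(summary)
```

The sketch also lists the classes whose column is entirely zero. In this log those are classes 2 and 5, which the model never predicts, and this is why their precision, recall, and F1 are all 0.0000 and why the macro average is far below the weighted average.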

The command is:

python train.py --base-model LSTM --graph-model --nodal-attention --dropout 0.4 --lr 0.001 --batch-size 16 --l2 0.0 --graph_type=MMGCN --epochs=60 --graph_construct=direct --multi_modal --mm_fusion_mthd=concat_subsequently --modals=avl --Dataset=MELD --Deep_GCN_nlayers 4 --use_speaker

The processed data meaning

I'm currently reviewing the processed dataset provided with your code. In MELD_features_raw1.pkl, I understand that the second part contains speaker information. Each identifier corresponds to a segment of speech, and for each segment there are as many vectors as there are sentences. However, I noticed that each vector has a dimension of 9. Does the value 9 have any significance? Thank you so much.
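One way to investigate the question is to test whether the 9-dimensional vectors are one-hot encodings, in which case 9 would be the number of distinct speaker slots. This is only an assumption to check and is not confirmed anywhere in this thread. The sketch below uses made-up vectors in place of the real pickle contents, and the helper name `one_hot_index` is hypothetical.

```python
def one_hot_index(vector, tol=1e-6):
    """Return the position of the single 1 in a one-hot vector, or None otherwise."""
    ones = [i for i, v in enumerate(vector) if abs(v - 1.0) <= tol]
    zeros = [i for i, v in enumerate(vector) if abs(v) <= tol]
    # One-hot means exactly one entry equals 1 and every other entry equals 0.
    if len(ones) == 1 and len(ones) + len(zeros) == len(vector):
        return ones[0]
    return None


# Hypothetical 9-dimensional speaker vectors for one dialogue (made-up values,
# standing in for what would be loaded from the pickle file).
dialogue_speakers = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
]

speaker_ids = [one_hot_index(v) for v in dialogue_speakers]
print(speaker_ids)
```

If every vector in the real data decodes to an index rather than `None`, the one-hot interpretation is plausible; which speaker each index stands for would still have to come from the authors.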

adj matrix problem

When computing the adj matrix with
adj = D.mm(adj).mm(D)
many, or even all, of the values turn out to be NaN. What is causing this?
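A common source of NaN in this kind of product is a degree that is zero or negative. The sketch below assumes that D is the inverse-square-root degree matrix, as in the usual GCN normalization D^{-1/2} A D^{-1/2}; the thread does not confirm how the repository builds D. Under that assumption, a zero degree gives inf on the diagonal and inf * 0 becomes NaN in the matrix product, while a negative degree gives NaN directly. NumPy is used instead of the PyTorch tensors from the issue to keep the example self-contained, and the function names are illustrative.

```python
import numpy as np


def naive_normalize(adj):
    """D^{-1/2} A D^{-1/2} with no guard; produces NaN when a degree is <= 0."""
    adj = np.asarray(adj, dtype=np.float64)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = np.diag(adj.sum(axis=1) ** -0.5)
        return d @ adj @ d


def safe_normalize(adj, eps=1e-12):
    """Same normalization, but rows with non-positive degree get a scale of 0."""
    adj = np.asarray(adj, dtype=np.float64)
    degree = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    positive = degree > eps
    # Only take the inverse square root where it is well defined.
    inv_sqrt[positive] = degree[positive] ** -0.5
    d = np.diag(inv_sqrt)
    return d @ adj @ d


# Toy graph: nodes 0 and 1 are connected, node 2 is isolated (degree 0).
toy_adj = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

naive = naive_normalize(toy_adj)
safe = safe_normalize(toy_adj)
print("naive contains NaN:", bool(np.isnan(naive).any()))
print("safe is finite:", bool(np.isfinite(safe).all()))
```

Checking `adj.sum(axis=1)` for zeros and negative values before normalizing is a quick way to see whether this applies. If all values are NaN rather than only some rows and columns, the NaN is more likely already present in adj before the normalization step.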

The requirements of the experiment environment

Hello, could you share the pip requirements?

IEMOCAP audio feature dimension is 1582, but it should be 100

Regarding the audio features, you said that the raw acoustic features are extracted using the OpenSmile toolkit with the IS10 configuration, which should give 100 dimensions. This configuration was also used in the paper "COGMEN COntextualized GNN based Multimodal Emotion recognitioN".

Your code runs well, but when I print the audio feature shape, I get 1582 dimensions instead of 100.

torch.Size([50, 1582])
torch.Size([44, 1582])
torch.Size([40, 1582])
torch.Size([27, 1582])
torch.Size([38, 1582])
torch.Size([26, 1582])
torch.Size([47, 1582])
torch.Size([60, 1582])

May I ask how you obtained the acoustic features?

Is modality feature extractor available?

How did you obtain a DenseNet model trained on the FER+ dataset? Did you fine-tune it yourself? If so, could you share the extraction model?
How do you extract vision features from video data? In the paper, you use DenseNet to extract vision features, so I am wondering how this is done for video datasets. Did you use only a single sample frame to get the features, or time-series frame data?
Is there any plan to share the code for extracting all (text, vision, audio) features?

Thank you

mmgcn's Issues

Cannot reproduce MELD result

The processed data meaning

adj matrix problem

The requirements of the experiment environment

IEMOCAP audio feature dimension is 1582, but it should be 100

Is modality feature extractor available?

the tensor U in the forward method of model LSTM has given shape torch.Size([94, 32, 100]).

Class MMGCN2

MELD Speakers Mapping

How can you extract vision features?
