
mmgcn's People

Contributors

fatds-lrc


mmgcn's Issues

Cannot reproduce MELD result

I can only get a weighted F1-score of 0.5760. Here are the classification report and confusion matrix from the stdout log:

              precision    recall  f1-score   support

           0     0.7017    0.8503    0.7689      1256
           1     0.4492    0.4875    0.4676       281
           2     0.0000    0.0000    0.0000        50
           3     0.4138    0.1154    0.1805       208
           4     0.5358    0.5398    0.5378       402
           5     0.0000    0.0000    0.0000        68
           6     0.4594    0.4261    0.4421       345

    accuracy                         0.6103      2610
   macro avg     0.3657    0.3456    0.3424      2610
weighted avg     0.5623    0.6103    0.5760      2610

[[1068   57    0   19   78    0   34]
 [  74  137    0    0   35    0   35]
 [  28    5    0    0    8    0    9]
 [ 121   16    0   24   13    0   34]
 [ 109   31    0    4  217    0   41]
 [  33   10    0    2    3    0   20]
 [  89   49    0    9   51    0  147]]

The command is:

python train.py --base-model LSTM --graph-model --nodal-attention --dropout 0.4 --lr 0.001 --batch-size 16 --l2 0.0 --graph_type=MMGCN --epochs=60 --graph_construct=direct --multi_modal --mm_fusion_mthd=concat_subsequently --modals=avl --Dataset=MELD --Deep_GCN_nlayers 4 --use_speaker
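The summary figures in the log can be re-derived from the confusion matrix alone, which helps confirm that the 0.5760 is the support-weighted F1 rather than the macro average. The following is a minimal sketch in plain Python; the helper name `report_from_confusion` is hypothetical and not part of the repository, and it assumes rows are true labels and columns are predictions (this matches the `support` column of the report).

```python
# Confusion matrix copied from the log: rows = true labels, columns = predictions.
cm = [
    [1068, 57, 0, 19, 78, 0, 34],
    [74, 137, 0, 0, 35, 0, 35],
    [28, 5, 0, 0, 8, 0, 9],
    [121, 16, 0, 24, 13, 0, 34],
    [109, 31, 0, 4, 217, 0, 41],
    [33, 10, 0, 2, 3, 0, 20],
    [89, 49, 0, 9, 51, 0, 147],
]


def report_from_confusion(cm):
    """Return (accuracy, macro F1, support-weighted F1) for a square confusion matrix."""
    n = len(cm)
    support = [sum(row) for row in cm]                              # true count per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted count per class
    total = sum(support)

    f1 = []
    for k in range(n):
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted)
        denom = support[k] + predicted[k]
        f1.append(2 * cm[k][k] / denom if denom else 0.0)

    accuracy = sum(cm[k][k] for k in range(n)) / total
    macro_f1 = sum(f1) / n
    weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total
    return accuracy, macro_f1, weighted_f1


accuracy, macro_f1, weighted_f1 = report_from_confusion(cm)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

Columns 2 and 5 of the matrix are all zero, so those two classes are never predicted. Their F1 of 0 pulls the macro average down to about 0.34, while the weighted average is dominated by class 0.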

Meaning of the processed data

I'm currently reviewing the processed dataset you've provided in your code. In MELD_features_raw1.pkl, I understand that the second part contains speaker information. Each identifier corresponds to a segment of speech, and for each segment, there are as many vectors as there are sentences. However, I noticed that each vector has a dimension of 9. Does the value 9 have any significance? Thank you so much.

adj matrix problem

When computing the adj matrix with
adj = D.mm(adj).mm(D)
many of the values, sometimes all of them, come out as NaN. What is causing this?
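One possible cause, sketched here under assumptions: the issue does not show how `D` is built, so this example assumes `D` is the diagonal matrix of inverse square-root degrees used in the usual symmetric normalization D^{-1/2} A D^{-1/2}. In that case any node with degree 0 gives an infinite entry in `D`, and `inf * 0` inside the matrix product becomes NaN. A negative row sum (for example from negative similarity weights) would make the power NaN directly. NumPy stands in for the PyTorch `mm` calls, and `normalize_adj` with its `guard` flag is a hypothetical helper, not code from the repository.

```python
import numpy as np


def normalize_adj(adj, guard=False):
    """Symmetric normalization D^{-1/2} A D^{-1/2}.

    With guard=True, non-finite entries of D^{-1/2} (from zero or negative
    degrees) are set to 0 so they cannot spread NaN through the product.
    """
    deg = adj.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        d_inv_sqrt = np.power(deg, -0.5)        # degree 0 -> inf, negative degree -> nan
        if guard:
            d_inv_sqrt[~np.isfinite(d_inv_sqrt)] = 0.0
        D = np.diag(d_inv_sqrt)
        return D @ adj @ D                      # same shape of computation as D.mm(adj).mm(D)


# Node 2 is isolated, so its degree is 0.
adj = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

broken = normalize_adj(adj)              # contains NaN in the row and column of node 2
fixed = normalize_adj(adj, guard=True)   # all entries finite

print(np.isnan(broken).any(), np.isfinite(fixed).all())
```

If this is the cause, printing the row sums of `adj` before normalization should show zeros or negative values for the affected nodes.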

IEMOCAP audio feature dimension is 1582 but should be 100

Regarding the audio features, you said the raw acoustic features are extracted using the OpenSmile toolkit with the IS10 configuration, which should give 100 dimensions. This configuration was also used in the paper "COGMEN COntextualized GNN based Multimodal Emotion recognitioN".

Your code runs well, but when I print the audio feature shape, I get 1582 dimensions instead of 100.

torch.Size([50, 1582])
torch.Size([44, 1582])
torch.Size([40, 1582])
torch.Size([27, 1582])
torch.Size([38, 1582])
torch.Size([26, 1582])
torch.Size([47, 1582])
torch.Size([60, 1582])

May I ask how you obtained the acoustic features?

MELD Speakers Mapping

This work is nice! What is the mapping for the speaker indices in the MELD features file?

How did you extract the vision features?

Hi, thanks for sharing the code.

I have some questions.

  1. How did you obtain a DenseNet model trained on the FER+ dataset? Did you fine-tune it yourself? If so, could you share the extraction model?
  2. How did you extract vision features from the video data? In the paper, you use DenseNet to extract vision features, so I am wondering how this works for video datasets. Did you use a single sampled frame to get the feature, or a time series of frames?
  3. Is there any plan to share the code for extracting all (text, vision, audio) features?

Thank you
