SBNet (ICASSP 2023)

Official implementation of SBNet as described in "Single-branch Network for Multimodal Training".

Paper Link: SBNet

Presentation: https://youtu.be/bXeiy8kQQtY

Proposed Methodology

(a) Two independent modality-specific embedding networks to extract features (left) and a conventional two-branch network (right) with two independent modality-specific branches that learn discriminative joint representations for the multimodal task. (b) The proposed network with a single modality-invariant branch.
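The difference between the two designs can be illustrated with a framework-free toy sketch. Everything in it is an assumption made for illustration: the single linear layer with ReLU and L2 normalization, the helper names (`make_layer`, `embed`, `count_params`), and the 4-dimensional stand-in features. It also assumes that face and voice features share one dimensionality, so the same weights can process either. The actual model is defined in the repository's training code.

```python
import math
import random


def make_layer(in_dim, out_dim, seed):
    """Hypothetical helper: random weight matrix of shape (out_dim, in_dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def embed(weights, features):
    """Linear projection, ReLU, then L2 normalization (toy stand-in for a branch)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]
    norm = math.sqrt(sum(h * h for h in hidden)) or 1.0  # avoid division by zero
    return [h / norm for h in hidden]


def count_params(*layers):
    """Total number of weights across the given layers."""
    return sum(len(row) for layer in layers for row in layer)


face_feat = [0.2, -0.1, 0.7, 0.4]   # stand-in for a face feature vector
voice_feat = [0.5, 0.3, -0.2, 0.1]  # stand-in for a voice feature vector

# Two-branch design: each modality has its own weights.
face_branch = make_layer(4, 3, seed=1)
voice_branch = make_layer(4, 3, seed=2)
two_branch_out = (embed(face_branch, face_feat), embed(voice_branch, voice_feat))

# Single-branch design: one set of weights handles both modalities.
shared_branch = make_layer(4, 3, seed=0)
single_branch_out = (embed(shared_branch, face_feat), embed(shared_branch, voice_feat))
```

In this toy setting, `count_params(shared_branch)` is half of `count_params(face_branch, voice_branch)`, because the shared branch replaces two modality-specific ones.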

Installation

We have used the following setup for our experiments:

python==3.6.5

CUDA and cuDNN Setup:

For TensorFlow:

  • CUDA Toolkit 10.1
  • cuDNN v7.6.5.32 for CUDA 10.1

For PyTorch:

  • CUDA Toolkit 10.2
  • cuDNN v8.2.1.32 for CUDA 10.2

To install PyTorch and TensorFlow with GPU support:

  pip install tensorflow-gpu==1.13.1
  pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
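After installation, a quick check can confirm which versions are visible to the interpreter. The following is a minimal sketch, not part of the repository: `describe_environment` is a hypothetical helper name, and the script only reports what it finds instead of enforcing the versions listed above.

```python
import importlib
import sys


def describe_environment():
    """Collect interpreter and framework versions without failing on missing packages."""
    info = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("torch", "tensorflow"):
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
            if name == "torch":
                # torch.cuda.is_available() reports whether a usable GPU was detected.
                info["torch_cuda_available"] = str(module.cuda.is_available())
        except Exception as exc:  # missing package or broken CUDA setup
            info[name] = "unavailable ({})".format(type(exc).__name__)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```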

Feature Extraction

We perform experiments on cross-modal verification and cross-modal matching tasks on the large-scale VoxCeleb1 dataset.

Facial Feature Extraction

For face feature extraction, we use Facenet. The official implementation from the authors is available here.

Voice Feature Extraction

For voice embeddings, we use the method described in Utterance Level Aggregator. The code we used is released by the authors and is publicly available here.

Extracted Features

The face and voice features used in our work can be accessed here. Once downloaded, place the files like this:

|-- data
  |-- voice
    |-- .csv files
  |-- face
    |-- .csv files
|-- imgs
|-- ssnet_cent_git
|-- ssnet_fop
|-- twobranch_cent_git
|-- twobranch_fop
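Before training, it can help to confirm that the downloaded features landed in the layout shown above. The sketch below is an illustration only: `check_feature_layout` is a hypothetical helper, and it assumes nothing about the file names beyond the `.csv` extension.

```python
from pathlib import Path


def check_feature_layout(root="."):
    """Return the .csv files found under data/voice and data/face, keyed by modality."""
    found = {}
    for modality in ("voice", "face"):
        folder = Path(root) / "data" / modality
        if not folder.is_dir():
            raise FileNotFoundError("missing directory: {}".format(folder))
        csv_files = sorted(p.name for p in folder.glob("*.csv"))
        if not csv_files:
            raise FileNotFoundError("no .csv files in {}".format(folder))
        found[modality] = csv_files
    return found
```

Calling `check_feature_layout(".")` from the repository root either returns the file names per modality or raises a `FileNotFoundError` naming the directory that needs attention.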

Training and Testing

FOP Loss

# Training
python main.py --save_dir ./model --batch_size 128 --max_num_epoch 100 --dim_embed 128 --split_type <face_only, voice_only, hefhev, hevhef, random, fvfv, vfvf>

# Testing
python test.py --split_type vfvf --sh unseenunheard --test random

Cent/Git Loss

# Training
python main.py --save_dir ./model --batch_size 128 --max_num_epoch 100 --split_type <face_only, voice_only, hefhev, hevhef, random, fvfv, vfvf> --loss <git, cent>

# Testing
python test.py --split_type fvfv --sh unseenunheard --test random
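Each training command takes a single `--split_type` value per run, so comparing splits means launching one run per value. The sketch below assembles those commands from the flags shown above; `build_train_command` and `SPLIT_TYPES` are hypothetical names, and it assumes the commands are launched from the directory that holds the matching `main.py`.

```python
SPLIT_TYPES = ["face_only", "voice_only", "hefhev", "hevhef", "random", "fvfv", "vfvf"]


def build_train_command(split_type, loss=None, save_dir="./model"):
    """Assemble one training command.

    loss=None mirrors the FOP command; loss="git" or "cent" mirrors the Cent/Git one.
    """
    if split_type not in SPLIT_TYPES:
        raise ValueError("unknown split type: {}".format(split_type))
    cmd = ["python", "main.py", "--save_dir", save_dir,
           "--batch_size", "128", "--max_num_epoch", "100"]
    if loss is None:
        cmd += ["--dim_embed", "128", "--split_type", split_type]
    else:
        cmd += ["--split_type", split_type, "--loss", loss]
    return cmd


if __name__ == "__main__":
    # Print one Cent-loss training command per split type for review.
    for split in SPLIT_TYPES:
        print(" ".join(build_train_command(split, loss="cent")))
```

Returning the command as a list keeps it ready for `subprocess.run(cmd, check=True)` without shell quoting issues.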

Baseline

For baseline results, we leverage the work from FOP.

Citation

@inproceedings{saeed2023sbnet,
  title={Single-branch Network for Multimodal Training},
  author={Saeed, Muhammad Saad and Nawaz, Shah and Khan, Muhammad Haris and Zaheer, Muhammad Zaigham and Nandakumar, Karthik and Yousaf, Muhammad Haroon and Mahmood, Arif},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  organization={IEEE}
}

@inproceedings{saeed2022fusion,
  title={Fusion and Orthogonal Projection for Improved Face-Voice Association},
  author={Saeed, Muhammad Saad and Khan, Muhammad Haris and Nawaz, Shah and Yousaf, Muhammad Haroon and Del Bue, Alessio},
  booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7057--7061},
  year={2022},
  organization={IEEE}
}
