Auditory AudioInceptionNeXt

This repository implements the model proposed in the paper:

Kin Wai Lau, Yasar Abbas Ur Rehman, Yuyang Xie, Lan Ma, AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023

[arXiv paper]

The implementation is based on the code for Slow-Fast Auditory Streams for Audio Recognition (ICASSP 2021). For more information, please refer to that repository (link).

Citing

If you use this code, please cite:

@article{lau2023audioinceptionnext,
  title={AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023},
  author={Lau, Kin Wai and Rehman, Yasar Abbas Ur and Xie, Yuyang and Ma, Lan},
  journal={arXiv preprint arXiv:2307.07265},
  year={2023}
}

Pretrained models

You can download our models pretrained on VGG-Sound and EPIC-Sounds:

AudioInceptionNeXt (VGG-Sound) link
AudioInceptionNeXt (EPIC-Sounds) link

Preparation

Requirements:
- PyTorch 1.7.1
- librosa: conda install -c conda-forge librosa
- h5py: conda install h5py
- wandb: pip install wandb
- fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
- simplejson: pip install simplejson
- psutil: pip install psutil
- tensorboard: pip install tensorboard

Add this repository to $PYTHONPATH:

export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
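
A quick way to confirm the requirements above are installed before launching a run is to check that each one can be found by the current Python interpreter. This is a minimal sketch, not part of the repository; it assumes the import names shown (for example, PyTorch imports as `torch`) and only checks that the packages are findable, not their versions (the list above pins PyTorch 1.7.1).

```python
import importlib.util

# Import names of the requirements listed above (PyTorch imports as "torch").
REQUIRED_MODULES = ["torch", "librosa", "h5py", "wandb", "fvcore",
                    "simplejson", "psutil", "tensorboard"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```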

VGG-Sound: see the instructions in the Auditory Slow-Fast repository (link)
EPIC-KITCHENS: see the instructions in the Auditory Slow-Fast repository (link)
EPIC-Sounds: see the instructions in the EPIC-Sounds annotations repository (links)
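
The EPIC-Sounds commands read all audio from a single HDF5 file (EPIC-KITCHENS-100_audio.hdf5), and a wrong path usually surfaces only when h5py fails to open it with an OSError. The hypothetical helper `looks_like_hdf5` below uses only the standard library to check that the path exists and carries the 8-byte HDF5 format signature (at offset 0, or at 512, 1024, 2048, ... when the file has a user block). It is a pre-flight sketch, not something the repository provides.

```python
from pathlib import Path

# The 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is an existing file carrying the HDF5 signature.

    The signature is searched at offset 0 and then at 512, 1024, 2048, ...
    (the offsets allowed when the file starts with a user block).
    """
    p = Path(path).expanduser()
    if not p.is_file():
        return False
    size = p.stat().st_size
    with p.open("rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False
```

For example, `looks_like_hdf5("/path/to/EPIC-KITCHENS-100_audio.hdf5")` returns False for a mistyped path instead of failing later inside the data loader.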

Training/validation on VGG-Sound

To train the model, run:

python tools/run_net.py --cfg configs/VGG-Sound/AudioInceptionNeXt.yaml --init_method tcp://localhost:9996 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations
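
The trailing KEY VALUE pairs after --cfg and --init_method override entries of the YAML config. If you launch runs from Python rather than a shell, building the argument list directly avoids line-continuation mistakes such as a missing backslash. The helper `build_run_net_cmd` is hypothetical; the script path, flags, and keys are the ones used in this README, and the GPU count of 4 is only an example.

```python
import shlex


def build_run_net_cmd(cfg, port, num_gpus, overrides):
    """Assemble a tools/run_net.py invocation as an argument list.

    `overrides` maps config keys (e.g. "OUTPUT_DIR") to values; they are
    appended as alternating KEY VALUE tokens after NUM_GPUS.
    """
    cmd = ["python", "tools/run_net.py",
           "--cfg", cfg,
           "--init_method", f"tcp://localhost:{port}",
           "NUM_GPUS", str(num_gpus)]
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_run_net_cmd(
        "configs/VGG-Sound/AudioInceptionNeXt.yaml", 9996, 4,
        {"OUTPUT_DIR": "/path/to/output_dir",
         "VGGSOUND.AUDIO_DATA_DIR": "/path/to/dataset",
         "VGGSOUND.ANNOTATIONS_DIR": "/path/to/annotations"})
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd) to launch.
    print(shlex.join(cmd))
```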

To validate the model, run:

python tools/run_net.py --cfg configs/VGG-Sound/AudioInceptionNeXt.yaml --init_method tcp://localhost:9998 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth
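
Override values arrive as strings, and TRAIN.ENABLE False together with TEST.ENABLE True is what turns the run into evaluation only. SlowFast-derived code loads configs with fvcore's CfgNode (built on yacs), whose merge_from_list decodes each value with ast.literal_eval and keeps it as a string when that fails. The sketch below is a simplified, hypothetical stand-in that mimics this decoding into a nested dict; unlike the real CfgNode, it does not check keys against the config schema.

```python
from ast import literal_eval


def parse_overrides(tokens):
    """Turn ["TRAIN.ENABLE", "False", ...] into {"TRAIN": {"ENABLE": False}, ...}.

    Values are decoded with ast.literal_eval where possible ("False" becomes
    the bool False, "2" the int 2); anything else, such as a path, stays a string.
    """
    if len(tokens) % 2 != 0:
        raise ValueError("overrides must be KEY VALUE pairs")
    cfg = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        try:
            value = literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a Python literal, keep the raw string
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg
```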

Fine-tuning/validation on EPIC-Sounds

To fine-tune from the VGG-Sound pretrained model, run:

python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioInceptionNeXt.yaml --init_method tcp://localhost:9996 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir \
EPICSOUND.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model
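
Fine-tuning starts from a checkpoint trained on a different label set, so layers such as the classification head generally will not match the new model's shapes and have to be skipped while the remaining weights are loaded. The sketch below illustrates that shape-matching step with plain dictionaries of parameter shapes; the parameter names and shapes in the usage note are made up, and whether this repository's checkpoint loader behaves exactly this way is an assumption, not something stated here.

```python
def compatible_weights(pretrained, model):
    """Split pretrained parameters into loadable entries and skipped names.

    Both arguments map parameter names to shape tuples. For a real PyTorch
    state dict, build them with {k: tuple(v.shape) for k, v in sd.items()}.
    """
    loadable, skipped = {}, []
    for name, shape in pretrained.items():
        if model.get(name) == shape:
            loadable[name] = shape
        else:
            skipped.append(name)  # absent from the new model, or shape mismatch
    return loadable, skipped
```

With made-up shapes, `compatible_weights({"stem.weight": (64, 1, 7, 7), "head.weight": (10, 768)}, {"stem.weight": (64, 1, 7, 7), "head.weight": (5, 768)})` keeps `stem.weight` and skips `head.weight`, which is then trained from scratch.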

To validate the model, run:

python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioInceptionNeXt.yaml --init_method tcp://localhost:9997 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
EPICSOUND.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth
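
Each command passes a different port to --init_method (9996, 9997, 9998), presumably so that runs started on the same machine do not collide. If a port is already in use, a free one can be requested from the operating system by binding to port 0 on the loopback address (127.0.0.1, i.e. localhost), as in this hypothetical helper. Another process could still take the port between this check and the launch, so treat it as a convenience rather than a guarantee.

```python
import socket


def free_tcp_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host` by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print(f"--init_method tcp://localhost:{free_tcp_port()}")
```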
