Giter VIP home page Giter VIP logo

audioinceptionnext's Introduction

Auditory AudioInceptionNeXt

This repository implements the model proposed in the paper:

Kin Wai Lau, Yasar Abbas Ur Rehman, Yuyang Xie, Lan Ma, AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023

[arXiv paper]

The implementation code is based on the Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021. For more information, please refer to the link.

Citing

When using this code, kindly reference:

@article{lau2023audioinceptionnext,
  title={AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023},
  author={Lau, Kin Wai and Rehman, Yasar Abbas Ur and Xie, Yuyang and Ma, Lan},
  journal={arXiv preprint arXiv:2307.07265},
  year={2023}
}

Pretrained models

You can download our pretrained models on VGG-Sound and EPIC-Sounds:

  • AudioInceptionNeXt (VGG-Sound) link
  • AudioInceptionNeXt (EPIC-Sound) link

Preparation

  • Requirements:
    • PyTorch 1.7.1
    • librosa: conda install -c conda-forge librosa
    • h5py: conda install h5py
    • wandb: pip install wandb
    • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
    • simplejson: pip install simplejson
    • psutil: pip install psutil
    • tensorboard: pip install tensorboard
  • Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
  • VGG-Sound: See the instruction in Auditory Slow-Fast repository link
  • EPIC-KITCHENS: See the instruction in Auditory Slow-Fast repository link
  • EPIC-Sounds See the instruction in Epic-Sounds annotations repository link and link

Training/validation on VGG-Sound

To train the model run:

python tools/run_net.py --cfg configs/VGG-Sound/AudioInceptionNeXt.yaml --init_method tcp://localhost:9996 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations 

To validate the model run:

python tools/run_net.py --cfg configs/VGG-Sound/AudioInceptionNeXt.yaml --init_method tcp://localhost:9998 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

Fine Tune/validation on EPIC-Sounds

To fine-tuning from VGG-Sound pretrained model:

python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioInceptionNeXt.yaml --init_method tcp://localhost:9996 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir \
EPICSOUND.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model run:

python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioInceptionNeXt.yaml --init_method tcp://localhost:9997 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

audioinceptionnext's People

Contributors

stevenlauhkhk avatar

Stargazers

 avatar Yasar Abbas avatar

Watchers

 avatar

audioinceptionnext's Issues

OSError: Unable to synchronously open file (invalid file name)

Thanks for open-sourcing your code!
I have fine-tuning from VGG-Sound pretrained model,but when I run the command to validate the model, the error occurs. The EPICKITCHENS.AUDIO_DATA_FILE parameter is an absolute path and this path can used in train command.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.