mrdf's Introduction

Code for Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection. (ICASSP 2024)

Environment

Python 3.8, PyTorch 1.13, pytorch_lightning 1.7.7, CUDA 11.6

conda create -n df116 python=3.8
# activate the new environment so that pip installs into it
conda activate df116
pip install -r requirements.txt
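
To confirm that the installed packages match the versions listed above, a small check like the following may help. It is only a sketch and not part of the repository: the function name `check_environment` is hypothetical, and the distribution names `torch` and `pytorch-lightning` are assumed to be the ones pulled in by requirements.txt.

```python
import sys
from importlib import metadata

# Versions from the Environment section. The distribution names are assumptions.
EXPECTED = {"torch": "1.13", "pytorch-lightning": "1.7.7"}


def check_environment(expected=EXPECTED, python=(3, 8)):
    """Return a list of mismatch messages; an empty list means everything matches."""
    problems = []
    found_python = tuple(sys.version_info[:2])
    if found_python != tuple(python):
        problems.append(
            f"python: expected {python[0]}.{python[1]}, "
            f"found {found_python[0]}.{found_python[1]}"
        )
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        # Accept an exact match or a more specific release such as 1.13.1+cu116.
        if found != wanted and not found.startswith(wanted + "."):
            problems.append(f"{name}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    for line in check_environment():
        print(line)
```

Running the script inside the df116 environment prints one line per mismatch and nothing if the versions agree. It does not check the CUDA version.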

Code Structure

-- dataset (FakeAVCeleb)
-- model: 
    avhubert (download from https://github.com/facebookresearch/av_hubert/tree/main/avhubert) with some small changes, ImageEncoder.py, model.py
    model-related files (__init__, avdf, avdf_ensemble, avdf_multiclass, avdf_multilabel, mrdf_margin, mrdf_ce)
-- fairseq: (download from https://github.com/facebookresearch/fairseq with some small changes)
-- outputs: log, results, ckpts
-- data/FakeAVCeleb_v1.2: path to the dataset (FakeAVCeleb) and splits (5 folds)
-- utils: loss, figure, utils
-- main.py
-- train.py
-- test.py
-- requirements.txt

Data Preparation

FakeAVCeleb is an audio-visual deepfake detection dataset. It consists of 500 real videos and over 20,000 fake videos, spanning five ethnic groups, each with 100 real videos from 100 subjects. As there is no official split, we provide a balanced split with a 1:1:1:1 ratio across four categories: FakeAudio-FakeVideo (FAFV), FakeAudio-RealVideo (FARV), RealAudio-FakeVideo (RAFV), and RealAudio-RealVideo (RARV). We use 500 videos from each category, with one video per subject in each category. This gives 2000 videos in total, which we split into 5 folds for subject-independent 5-fold cross-validation so that comparisons are fair: in every fold, the test subjects are never seen during training.
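
The idea of a subject-independent split can be sketched as follows. This is a minimal illustration and not the script used to produce the provided splits. The function names, the `(subject_id, category, path)` tuple layout, and the random seed are all assumptions made for the example.

```python
import random
from collections import defaultdict

CATEGORIES = ("FAFV", "FARV", "RAFV", "RARV")


def subject_independent_folds(videos, n_folds=5, seed=0):
    """Group videos by subject, then deal whole subjects into folds.

    videos: iterable of (subject_id, category, path) tuples (hypothetical layout).
    Returns a list of n_folds lists. All videos of a subject share one fold.
    """
    by_subject = defaultdict(list)
    for subject, category, path in videos:
        by_subject[subject].append((subject, category, path))

    # Shuffle subjects (not videos) so that no subject is spread over two folds.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    folds = [[] for _ in range(n_folds)]
    for i, subject in enumerate(subjects):
        folds[i % n_folds].extend(by_subject[subject])
    return folds


def split_for_fold(folds, test_index):
    """Use one fold for testing and the remaining folds for training."""
    test = list(folds[test_index])
    train = [v for i, fold in enumerate(folds) if i != test_index for v in fold]
    return train, test
```

With 500 subjects and one video per subject in each of the four categories, every fold holds 100 subjects and 400 videos, and the 1:1:1:1 category ratio is preserved inside each fold because each subject contributes exactly one video per category.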

Train and Test

For direct inference on FakeAVCeleb, we provide an example pretrained checkpoint here. The related dataset splits are also available.

## train example
CUDA_VISIBLE_DEVICES=0 python train.py \
  --model_type MRDF_CE --save_name MRDF_CE \
  --data_root ./data/FakeAVCeleb_v1.2/ \
  --dataset fakeavceleb

CUDA_VISIBLE_DEVICES=0 python train.py \
  --model_type MRDF_Margin --save_name MRDF_Margin \
  --data_root ./data/FakeAVCeleb_v1.2/ \
  --dataset fakeavceleb

## test example
CUDA_VISIBLE_DEVICES=0 python test.py \
  --data_root ./data/FakeAVCeleb_v1.2/ \
  --checkpoint /Path/To/MRDF_Margin_train_2.ckpt \
  --train_fold train_2.txt --model_type MRDF_Margin

CUDA_VISIBLE_DEVICES=0 python test.py \
  --data_root ./data/FakeAVCeleb_v1.2/ \
  --checkpoint /Path/To/MRDF_CE_train_2.ckpt \
  --train_fold train_2.txt --model_type MRDF_CE

More examples can be found in sript/train.sh and sript/test.sh. Because of differences in environment, the results may differ slightly from those reported in the paper.
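
When evaluating several folds, the test command above can be assembled programmatically. The sketch below reuses only the flags shown in the test example. The helper name `build_test_command`, the checkpoint naming pattern (inferred from `MRDF_Margin_train_2.ckpt`), and the fold numbers other than 2 are assumptions, so check them against the files you actually have.

```python
import shlex


def build_test_command(model_type, fold,
                       data_root="./data/FakeAVCeleb_v1.2/",
                       ckpt_dir="/Path/To"):
    """Return the argument list for one test.py run (hypothetical helper)."""
    # Assumed naming pattern, inferred from the single example above.
    train_fold = f"train_{fold}.txt"
    checkpoint = f"{ckpt_dir}/{model_type}_train_{fold}.ckpt"
    return [
        "python", "test.py",
        "--data_root", data_root,
        "--checkpoint", checkpoint,
        "--train_fold", train_fold,
        "--model_type", model_type,
    ]


if __name__ == "__main__":
    # Fold 2 is the only fold shown in the examples; add others as available.
    for fold in (2,):
        print(shlex.join(build_test_command("MRDF_Margin", fold)))
```

The returned list can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment, or printed with `shlex.join` and pasted into a shell script.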

Citations

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@article{zou2024cross,
  title={Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection},
  author={Zou, Heqing and Shen, Meng and Hu, Yuchen and Chen, Chen and Chng, Eng Siong and Rajan, Deepu},
  journal={arXiv preprint arXiv:2401.05746},
  year={2024}
}

Acknowledgements

Some code is borrowed from ControlNet/LAV-DF and facebookresearch/av_hubert.
