Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation


This project reproduces the audio-visual model described in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.

Ephrat A, Mosseri I, Lang O, et al. Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation[J]. arXiv preprint arXiv:1804.03619, 2018.


Requirements

To install requirements:

pip install -r requirements.txt

You can install ffmpeg and sox using Homebrew:

brew install ffmpeg
brew install sox

Preprocessing

Video Data

  1. Download the dataset from here and place the files in data/csv.
  2. Run the following command to download the YouTube videos; it uses ffmpeg to extract each 3-second clip as 75 images (25 frames per second).
python3 video_download.py
  3. Use mtcnn to detect the face bounding boxes in each image, then use the x, y coordinates from the CSV to locate the face center point.
pip install mtcnn
python3 face_detected.py
python3 check_vaild_face.py
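
The face-locating step can be sketched as follows. This is a minimal illustration, not the code in face_detected.py. It assumes the CSV x, y values are normalized to the range 0 to 1 relative to the frame size, and that each detection is a dict with a "box" entry of [x, y, width, height] in pixels, which is the shape mtcnn's detect_faces returns. The function name pick_face and the sample detections are hypothetical.

```python
def pick_face(detections, cx_norm, cy_norm, frame_w, frame_h):
    """Return the box whose centre is nearest to the annotated point.

    detections: list of dicts with a "box" key holding [x, y, w, h] in pixels.
    cx_norm, cy_norm: annotated face centre, assumed normalized to [0, 1].
    Returns None when no face was detected.
    """
    cx, cy = cx_norm * frame_w, cy_norm * frame_h

    def sq_dist(det):
        x, y, w, h = det["box"]
        return (x + w / 2 - cx) ** 2 + (y + h / 2 - cy) ** 2

    return min(detections, key=sq_dist)["box"] if detections else None


# Hypothetical detections for one 640x360 frame with two faces.
detections = [
    {"box": [40, 60, 100, 120]},   # centre at (90, 120)
    {"box": [400, 80, 110, 130]},  # centre at (455, 145)
]
chosen = pick_face(detections, 0.7, 0.4, 640, 360)  # annotated point (448, 144)
print(chosen)
```

Picking the nearest box rather than the most confident one keeps the target speaker when several faces appear in the same frame.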

Audio Data

  1. Download the audio with the YouTube download tool, resample it to 16000 Hz with the librosa library, and then normalize it.
python3 audio_downloads.py
python3 audio_norm.py # normalize the audio data
  2. Preprocess the audio data: STFT, power-law compression, mixing, generating complex masks, etc.
python3 audio_data.py
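
To make the power-law and complex-mask steps concrete, here is a rough NumPy sketch. It is not the code in audio_data.py. Random complex arrays stand in for real STFT output, the exponent p = 0.3 and the 298 x 257 shape are assumptions, and the function names are hypothetical. The mask is the plain complex ratio clean / mix, without any additional mask compression.

```python
import numpy as np


def power_law(spec, p=0.3):
    """Compress magnitudes as |X|**p while keeping the phase unchanged."""
    return np.abs(spec) ** p * np.exp(1j * np.angle(spec))


def complex_ratio_mask(clean, mix, eps=1e-8):
    """Complex mask M such that M * mix is approximately clean (element-wise)."""
    return clean * np.conj(mix) / (np.abs(mix) ** 2 + eps)


rng = np.random.default_rng(0)
shape = (298, 257)  # assumed (time frames, frequency bins)

# Stand-ins for the STFTs of two speakers; real code would compute these from audio.
stft_a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
stft_b = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# The STFT is linear, so mixing waveforms corresponds to adding their STFTs.
mix = power_law(stft_a + stft_b)
clean_a = power_law(stft_a)

mask_a = complex_ratio_mask(clean_a, mix)
recovered_a = mask_a * mix  # applying the mask to the mixture

print(np.allclose(recovered_a, clean_a, atol=1e-5))
```

A network trained on such targets predicts one mask per speaker; multiplying each predicted mask by the mixture spectrogram gives that speaker's estimated spectrogram.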

Face embedding Feature

  • Here we use Google's FaceNet method to map face images to a high-dimensional Euclidean space. This project uses David Sandberg's open-source pre-trained FaceNet model "20180402-114759", which is then converted with the TensorFlow_to_Keras script in this project. (Model/face_embedding/

Schroff F, Kalenichenko D, Philbin J. Facenet: A unified embedding for face recognition and clustering[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 815-823.

Set the path tf_model_dir in Tensorflow_to_Keras.py, then run:

python3 Tensorflow_to_Keras.py
python3 face_emb.py

  1. Create AVdataset_train.txt and AVdataset_val.txt
python3 AV_data_log.py

Training

  • Supports resuming training after an interruption.
  • Supports multi-GPU, multi-process training.
  • According to the description in the paper, set the following parameters:
people_num = 2 # How many people you want to separate?
epochs = 100
initial_epoch = 0
batch_size = 1 # batch sizes of 2 or 4 need a GPU
gamma_loss = 0.1
beta_loss = gamma_loss * 2
  • Then use the script train.py to train
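
One way to pick initial_epoch when resuming is to read the epoch number from the newest checkpoint file name. The sketch below only illustrates that idea and is not taken from train.py. The file-name pattern "AVmodel-epoch-N.h5" and the helper latest_epoch are hypothetical, so adjust the regular expression to whatever names your checkpoints actually use.

```python
import re
import tempfile
from pathlib import Path


def latest_epoch(checkpoint_dir, pattern=r"epoch-(\d+)\.h5$"):
    """Return the highest epoch number found in checkpoint names, or 0 if none."""
    epochs = []
    for path in Path(checkpoint_dir).glob("*"):
        match = re.search(pattern, path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=0)


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 10):
        (Path(tmp) / f"AVmodel-epoch-{n}.h5").touch()
    initial_epoch = latest_epoch(tmp)

print(initial_epoch)  # 10
```

Comparing the parsed integers rather than sorting file names avoids the common mistake of treating "epoch-2" as newer than "epoch-10".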

Planned work

  • Implement with PyTorch
  • Provide a trained model
  • Optimize code style
  • ......

Part of the code references this GitHub repository: https://github.com/bill9800/speech_separation
