This project forked from ankitshah009/task-4-large-scale-weakly-supervised-sound-event-detection-for-smart-cars

Task 4 Large-scale weakly supervised sound event detection for smart cars

Home Page: http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/task-large-scale-sound-event-detection

License: Other

Python 100.00%


Coming soon: Baseline performance. The baseline system will be based on Task 3's system.

Last update Apr 2: Added strong labels.

Update Apr 1: Added evaluation folder.

Coordinators

Benjamin Elizalde, Emmanuel Vincent, Bhiksha Raj

Data Preparation, Annotations

Ankit Shah, Benjamin Elizalde

Annotations, Baseline and Subtask A Metric

Rohan Badlani, Benjamin Elizalde

Index

  1. Script to download the development data for Task 4

  2. Script to evaluate Task 4 - Subtask A (Audio tagging)

  3. Strong label annotations for testing


  1. Script to download the development data for Task 4

Prerequisite installations

  1. youtube-dl - [sudo] pip install --upgrade youtube_dl
  2. pafy - [sudo] pip install pafy
  3. tqdm (progress bar) - [sudo] pip install tqdm
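  4. multiprocessing - part of the Python standard library since Python 2.6, so no installation is normally needed; [sudo] pip install multiprocessing only applies to older interpreters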
  5. sox tool - sudo apt-get install sox
  6. ffmpeg tool - sudo apt-get install ffmpeg

Features

  1. Downloads the audio from the videos, first for the testing set and then for the training set. Multiprocessing downloads three files simultaneously, which reduces the heavy download time to 40 percent of the single-threaded time.
  2. Formats the audio with consistent parameters, currently set to 1 channel, 16-bit precision and a 44.1 kHz sampling rate.
  3. Extracts the 10-sec segments from the formatted audio according to the start and end times.
  4. The script output includes the audio from steps 1, 2 and 3 (original audio, formatted audio and 10-sec segments), unless the script is modified to remove the original audio and/or the formatted audio.
  5. To give every run of the download script a unique identifier, the script stores the launch timestamp and adds it to each output file and folder name.
  6. Please contact Ankit/Benjamin if one or more videos are not properly downloaded or available, or with any other issue. Participants can create their own scripts to download the audio. Please ensure that you have all the 10-sec clips in the lists.
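For participants who write their own download script, the formatting, segmenting and parallel steps above can be sketched as follows. This is a minimal illustration, not the code of download_audio.py: the function names are hypothetical, and the sox arguments are one way to obtain mono, 16-bit, 44.1 kHz audio and a trimmed segment.

```python
from multiprocessing import Pool

# Target format taken from the feature list above.
CHANNELS = 1
BIT_DEPTH = 16
SAMPLE_RATE_HZ = 44100


def format_command(src, dst):
    """Build sox arguments that rewrite `src` as mono, 16-bit, 44.1 kHz audio."""
    return ["sox", src, "-c", str(CHANNELS), "-b", str(BIT_DEPTH),
            "-r", str(SAMPLE_RATE_HZ), dst]


def segment_command(src, dst, start_s, end_s):
    """Build sox arguments that cut the span from start_s to end_s out of `src`."""
    duration_s = end_s - start_s  # sox `trim` takes a start position and a length
    return ["sox", src, dst, "trim", str(start_s), str(duration_s)]


def run_in_parallel(worker, jobs, processes=3):
    """Apply `worker` to every job with a pool of `processes` workers (hypothetical helper)."""
    with Pool(processes=processes) as pool:
        return pool.map(worker, jobs)
```

Each argument list can be passed to `subprocess.run(cmd, check=True)`. The worker given to `run_in_parallel` has to be a module-level function so that it can be pickled and sent to the worker processes.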

Lists

Download audio: testing_set.csv, training_set.csv

Groundtruth weak labels: groundtruth_weak_label_testing_set.csv, groundtruth_weak_label_training_set.csv

Groundtruth strong labels: groundtruth_strong_label_testing_set.csv, groundtruth_strong_label_training_set.csv

Usage

$python download_audio.py <CSV filename - relative path is also fine>

Sample usage: python download_audio.py training_set.csv

User Modifiable Parameters and Options

  1. Audio formatting can be modified in the "format_audio" method defined in the script download_youtube_audio_from_csv_and_delete_original.py
  2. Removal of original audio and/or formatted audio paths can be done by uncommenting and modifying <os.system(cmdstring2)> in "download_audio_method" function defined in download_audio.py

Output

  1. First folder contains original best audio from youtube: <csv_name><testing/training>_audio_downloaded
  2. Second folder contains the corresponding formatted audio: <csv_name><testing/training>_audio_formatted_downloaded
  3. Third folder contains the extracted 10-sec segments: <csv_name><testing/training>_audio_formatted_downloaded_and_ssegmented_downloads

Note: The string "Y" is added to the name of each downloaded audio file, because tools like sox and ffmpeg cause problems when a filename starts with "--" or "-".
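Scripts that match audio files back to the entries in the lists need to account for the added "Y". The sketch below assumes the "Y" is placed at the very start of the filename, which is what the note implies but does not state outright; both function names are hypothetical.

```python
def safe_audio_name(name, prefix="Y"):
    """Return `name` with `prefix` in front, so it can never start with '-'."""
    return prefix + name


def original_audio_name(safe_name, prefix="Y"):
    """Undo safe_audio_name; raise ValueError if the prefix is missing."""
    if not safe_name.startswith(prefix):
        raise ValueError("expected a name starting with %r: %r" % (prefix, safe_name))
    return safe_name[len(prefix):]
```

A leading "-" matters because command-line tools would otherwise parse the filename as an option.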

Audio segment count files

  1. testing_set_num_files_per_class.csv - specifies, for each class, the number of audio segments present in the testing set
  2. training_set_num_files_per_class.csv - specifies, for each class, the number of audio segments present in the training set

  2. Script to evaluate Task 4 - Subtask A (Audio tagging)

Usage

$python TaskAEvaluate.py groundtruth/groundtruth_weak_label_testing_set.csv prediction/perfect_prediction.csv output/perfect_prediction_output.csv
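This README does not describe how TaskAEvaluate.py computes its scores. As a rough illustration of what an audio tagging evaluation typically measures, the sketch below computes precision, recall and F1 over (clip, label) pairs. The function name, the pair representation and the choice of metric are all assumptions, not the script's actual behavior.

```python
def tagging_scores(reference, predicted):
    """Precision, recall and F1 over sets of (clip_id, label) pairs."""
    reference, predicted = set(reference), set(predicted)
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With this definition, a prediction identical to the groundtruth, as the name perfect_prediction.csv in the usage line suggests, scores 1.0 on all three measures.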


  3. Strong label annotations for testing

  1. Only one person was involved in the annotation of each 10-sec clip.
  2. The sound event annotations were based on the audio and not the video.
  3. The strong labels correspond to the file: groundtruth_strong_label_testing_set.csv
  4. The format of strong labels is the same as the DCASE format (Task 3 and Task 4: Audio tagging).
  5. Fewer than 2% of the 10-sec clips were labeled by AudioSet as containing a sound event but did not seem to contain it. For these clips, the start and end times were both set to 0.
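Code that consumes the strong labels may want to treat these zero-length entries separately from real events. A minimal sketch, assuming each annotation has already been parsed into an (onset, offset, label) tuple; the function name and tuple layout are hypothetical.

```python
def split_placeholder_events(events):
    """Separate real events from entries whose start and end times are both 0."""
    real, placeholders = [], []
    for onset, offset, label in events:
        if float(onset) == 0.0 and float(offset) == 0.0:
            placeholders.append((onset, offset, label))
        else:
            real.append((onset, offset, label))
    return real, placeholders
```

Whether to drop the placeholder entries or keep them as clip-level tags is a decision for each system. The sketch only makes the distinction explicit.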
