Objects that Sound

This repository contains an unofficial but full reproduction of the paper "Objects that Sound" from ECCV 2018. 😊

Environment

We implement this work in PyTorch with Python 3.6, and we strongly recommend using Ubuntu 16.04.

Training the models on 48,000 videos took less than 20 hours with an Intel i7-9700K and an RTX 2080 Ti.

For detailed package requirements, please refer to requirements.txt.

Model

We implement AVE-Net, AVOL-Net, and L3-Net, which is one of the baseline models.
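
As a rough illustration of how AVE-Net scores audio-visual correspondence, the sketch below follows the description in the paper as we read it: both embeddings are L2-normalized, their Euclidean distance is computed, and a tiny 1-to-2 linear layer followed by softmax turns that distance into "correspond / do not correspond" probabilities. It uses NumPy rather than PyTorch to stay dependency-light, and the names `l2_normalize` and `avc_head` as well as the weight values are hypothetical; this is not the code in this repository.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def avc_head(img_emb, aud_emb, weight, bias):
    """Map embedding distance to 2-way correspondence probabilities.

    img_emb, aud_emb: arrays of shape (batch, dim)
    weight, bias: arrays of shape (2,), standing in for a 1 -> 2 linear layer
                  (hypothetical values, normally learned during training)
    """
    img = l2_normalize(img_emb)
    aud = l2_normalize(aud_emb)
    # One scalar distance per image-audio pair, shape (batch, 1).
    dist = np.linalg.norm(img - aud, axis=-1, keepdims=True)
    # Broadcasting applies the 1 -> 2 linear layer, giving shape (batch, 2).
    logits = dist * weight + bias
    # Numerically stable softmax over the two classes.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)
```

Because the only input to the classifier is a single distance, the network is forced to place corresponding images and audio close together in the shared embedding space, which is what makes the embeddings usable for cross-modal retrieval.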

Dataset

We use or construct three different datasets to train and evaluate the models.

  • AudioSet-Instruments
  • AudioSet-Animal
  • AVE-Dataset

Due to resource limitations, we use a 20% subset of AudioSet-Instruments suggested in the paper.

Please refer to the Final Report for more detailed explanations of each dataset.
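
For readers who want to build a comparable subset themselves, here is a minimal sketch of drawing a reproducible 20% sample from a list of video IDs. It is only an assumption about how such a subset could be drawn, not the procedure used in this repository; the function name `sample_subset`, the `seed` parameter, and the ID format are hypothetical.

```python
import random


def sample_subset(video_ids, fraction=0.2, seed=0):
    """Return a reproducible random subset of video_ids."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Sort first so the result depends only on the seed, not on input order.
    ids = sorted(video_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    # A private Random instance avoids touching the global random state.
    return random.Random(seed).sample(ids, k)
```

Fixing the seed and sorting the IDs first means the same subset can be regenerated later, which helps when comparing runs with different training configurations.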

Results

We note that our results are quite different from those reported in the paper.

We suspect this comes from differences in dataset size, batch size, learning rate, or other subtle details of the training configuration.

1. Accuracy on AVC Task

2. Cross-modal Retrieval (Qualitative)

More qualitative results are available in our slides provided below.

3. Cross-modal Retrieval (Quantitative)

4. Sound Localization (Qualitative)

5. Embedding Visualization with t-SNE

Embeddings marked with ● are from images, and embeddings marked with × are from audio.

Acknowledgement

We gained many implementation insights from this repository; thanks to @rohitrango.

Supplementary Material

As this was made for our course project, we have prepared PPT slides with corresponding presentation videos.

Name Slide Video
Project Proposal PPTX YouTube
Progress Update PPTX YouTube
Final Presentation PPTX YouTube

Also, please refer to our Final Report for a detailed explanation of the implementation and training configurations.

Contact

We welcome any questions or issues about the implementation. If you have any, please contact us at the e-mail addresses below or open an issue.

Contributor E-mail
Kyuyeon Kim [email protected]
Hyeongryeol Ryu [email protected]
Yeonjae Kim [email protected]

Comment

If you find this repository useful, please Star ⭐ or Fork 🍴 it.

objects-that-sound's Issues

How to get audio?

Hello, I'm trying to test your model on my own video, but the localization results are poor. How do you process a video to get its audio?
This is how I extract audio:
from moviepy.editor import AudioFileClip
audioclip = AudioFileClip(video_path)  # read the audio track of the video
audioclip.write_audiofile(audio_path, fps=48000)  # save as WAV at 48 kHz
The audio data read back has shape (n, 2), i.e. it is stereo, so I average the two channels so that the program can run normally.
extractor.py:
rate, sample = wavfile.read(aud_path)
sample = np.mean(sample, axis=1)  # TODO: I added this myself
But the localization is still poor, so I'd like to know how you extract audio from your own videos.
In addition, do you get good localization results?
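
The downmix step described above can be sketched as a small helper that accepts both mono and multi-channel arrays, since `scipy.io.wavfile.read` returns shape (n,) for mono files and (n, channels) otherwise. The function name `to_mono` is hypothetical and not part of this repository. It only addresses the shape mismatch; whether averaging channels matches the preprocessing used to train the models is not confirmed here.

```python
import numpy as np


def to_mono(sample):
    """Return a 1-D float array from mono (n,) or multi-channel (n, c) audio."""
    sample = np.asarray(sample, dtype=np.float64)
    if sample.ndim == 1:
        # Already mono: nothing to average.
        return sample
    if sample.ndim == 2:
        # Average across channels, one value per time step.
        return sample.mean(axis=1)
    raise ValueError("expected audio of shape (n,) or (n, channels)")
```

With this helper, the line added to extractor.py could become `sample = to_mono(sample)`, which would then also work for videos whose audio track is already mono.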
