
binaural-sound-perception's Introduction

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

This repo contains the code of our ECCV 2020 paper

License

This software is released under a Creative Commons license which allows personal and research use only. You can view a license summary here.

Installation

  • The code is tested with PyTorch 1.5 and Python 3.7.1.

Data

wget http://data.vision.ee.ethz.ch/arunv/binaural_perception/data.zip
unzip data.zip 

Note: Audio track 3 (LAB-T***.WAV/LAB-T***_Tr3.WAV) and audio track 4 (LAB-T***.WAV/LAB-T***_Tr4.WAV) are missing for scenes 1 through 27. This is due to a manual error made while recording the initial 27 videos of the dataset.

To extract video segments and corresponding sound time-frequency representations:

python extract_videosegments.py
python extract_spectrograms.py

Training

a) Semantic prediction, depth prediction, and spatial sound super-resolution

python train_noBG_Paralleltask_depth.py

b) Depth prediction and spatial sound super-resolution

python scripts/train_noBG_Paralleltask_depth_noSeman.py

c) Semantic prediction and spatial sound super-resolution

python scripts/train_noBG_Paralleltask.py

Acknowledgement

This work was funded by Toyota Motor Europe via the research project TRACE (Toyota Research on Automated Cars in Europe) Zurich and was carried out at the Computer Vision Lab at ETH Zurich. Our code includes components adapted from an external NVIDIA repository.

Citation

@inproceedings{vasudevan2020semantic,
  title={Semantic object prediction and spatial sound super-resolution with binaural sounds},
  author={Vasudevan, Arun Balajee and Dai, Dengxin and Van Gool, Luc},
  booktitle={European Conference on Computer Vision},
  pages={638--655},
  year={2020},
  organization={Springer}
}

binaural-sound-perception's People

Contributors

arunbalajeev


binaural-sound-perception's Issues

missing configuration scripts in datasets/__init__.py

Thank you for your paper, I think this is a very impressive contribution. I would really like to reproduce the results but I am running into several issues and some I can't fix myself. I am going to create a few issues here, one for each point found, so you may be able to fix the code and supply the additional data necessary.

All of the modules imported in datasets/__init__.py ('citiscapes', 'mapillar', 'OmniAudio_4ch_audioBG_SeqSpec_diffmask_Comp_3class') are missing from the repository.

Conversely, the modules the code actually needs are not imported, so the following lines must be added:

from datasets import OmniAudio_noBG_Paralleltask_depth
from datasets import OmniAudio_noBG_Paralleltask_depth_noSeman

All the other imports must be commented out.
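Putting the two fixes together, a patched datasets/__init__.py would look roughly like this (a sketch based on the module names above, not the authors' actual file):

```python
# datasets/__init__.py (hypothetical patched version)

# Modules referenced but missing from the repository -- commented out:
# from datasets import citiscapes
# from datasets import mapillar
# from datasets import OmniAudio_4ch_audioBG_SeqSpec_diffmask_Comp_3class

# Modules the training scripts actually need:
from datasets import OmniAudio_noBG_Paralleltask_depth
from datasets import OmniAudio_noBG_Paralleltask_depth_noSeman
```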

However, this leads to another big issue. In those files, the make_dataset method tries to load SoundEnergy_60scenes.npy (line 178). That file is not in the repository, and there is no hint on how to obtain it.

Dataset size

Hello,

In the paper for this work, it states that there are 64,250 video clips. When I download the dataset from the link and run extract_videosegments.py on the directory, I only end up with 31,569 frames across the split_videoframes directories.

@arunbalajeev do you have any idea why this is? I looked at the extraction scripts and it seems reasonable.

Thanks!

Background subtraction and the manually annotated mid-frame semantic segmentation masks

Hello! First of all, thank you so much for sharing this. I'm really impressed with the great work!

Would it be possible to share the scripts, or more instructions, on the background subtraction, the sound-making target objects, your manual annotation of the middle frame of each split video, and the train/val/test split? That would let us compare against your work. Thanks!

SyntaxError: from __future__ imports must occur at the beginning of the file

Thank you for your paper, I think this is a very impressive contribution. I would really like to reproduce the results but I am running into several issues and some I can't fix myself. I am going to create a few issues here, one for each point found, so you may be able to fix the code and supply the additional data necessary.

Issue:

  File "train_noBG_Paralleltask_depth.py", line 8
    from __future__ import absolute_import

Unfortunately, nothing other than the module docstring and comments may appear before the __future__ import.
This applies to the other training scripts as well.
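For reference, Python does allow the module docstring and comments ahead of a `from __future__` import; any other statement before it triggers exactly this SyntaxError. A minimal demonstration (the source strings are hypothetical, not the repo's files):

```python
# Only the module docstring (and comments) may precede a __future__ import.
src_ok = '"""Module docstring."""\nfrom __future__ import absolute_import\n'
src_bad = 'import os\nfrom __future__ import absolute_import\n'

compile(src_ok, "<ok>", "exec")   # compiles cleanly
try:
    compile(src_bad, "<bad>", "exec")
except SyntaxError as err:
    print(err.msg)  # from __future__ imports must occur at the beginning of the file
```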

No way to get img_path

Hello! Thank you so much for this project and repo @arunbalajeev.

When I run the code, the dataloader tries to open img_path (bg.png), but this file is neither in the dataset nor created by any setup script in the project. I was wondering if you would be able to provide it?

For example:

Image.open(sem_mask), np.load(audio1), np.load(audio6), np.load(audio_path1), np.load(audio_path6), np.load(audio_path2), np.load(audio_path5)
  File "/.local/lib/python3.7/site-packages/PIL/Image.py", line 2904, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/binaural-sound-perception/dataset_public/scene0089/bg.png'

Thank you so much!

audio input size

Thanks for making the code available. I have a doubt regarding the spectrogram calculation.
As per the paper, the input audio spectrogram is of size 257x601. But the length of the audio is set to 2 sec with a sampling rate of 96 kHz and a hop size of 160. For this setting, the time axis of the spectrogram should have dimension (2x96000)/160, which comes to around 1201 (not 601). Could you please clarify, or have I grossly misunderstood something?
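The frame count in question can be checked with a quick back-of-the-envelope calculation. The FFT size of 512 below is an assumption, chosen only because it yields the 257 frequency bins reported in the paper:

```python
sr = 96000       # sampling rate (from the paper)
duration = 2.0   # clip length in seconds (from the paper)
hop = 160        # hop size (from the paper)
n_fft = 512      # assumed FFT size: 512 // 2 + 1 = 257 frequency bins

n_samples = int(sr * duration)        # 192000 samples
freq_bins = n_fft // 2 + 1            # 257
time_frames = 1 + n_samples // hop    # 1201 with centered framing

print(freq_bins, time_frames)         # 257 1201
```

Note that 1 + 96000 // 160 = 601, so a 601-frame spectrogram would match a 1-second clip at this hop size, which may be the source of the discrepancy.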

missing optimizer_two5_SharedEnc

Thank you for your paper, I think this is a very impressive contribution. I would really like to reproduce the results but I am running into several issues and some I can't fix myself. I am going to create a few issues here, one for each point found, so you may be able to fix the code and supply the additional data necessary.

train_noBG_Paralleltask_depth loads optimizer_two5_SharedEnc_depth, which is in the repository; however, train_noBG_Paralleltask wants to load optimizer_two5_SharedEnc, which is not:

  File "scripts/train_noBG_Paralleltask.py", line 19, in <module>
    import optimizer_two5_SharedEnc as optimizer
ModuleNotFoundError: No module named 'optimizer_two5_SharedEnc'

Please excuse if the line numbers do not fit your version; I had to remove the comments at the top of the file to get around the __future__ import issue.

Issue with train_noBG_Paralleltask.py

Hello, thank you for this project!

I ran into an issue when running the train_noBG_Paralleltask.py script on the downloaded dataset. I get the error:

/datasets/OmniAudio_noBG_Paralleltask.py", line 214, in make_dataset
    if it_full in audioDict.item()[sc]:
ValueError: can only convert an array of size 1 to a Python scalar

When I look at the code, audioDict is loaded from SoundEnergy_array_165scenes.npy, but it comes back as a float64 array of shape (31326,), while it_full is a path to a .npy data file. Do you have any idea what is happening? My initial thought is that audioDict is not being loaded properly and should be some kind of dictionary keyed by strings.

Thank you so much!
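For context on the ValueError above: np.load(...).item() only recovers a Python object when the file holds a pickled dict (a 0-d object array); on a plain float array of size greater than 1 it raises exactly this error. A small illustration (the file name is hypothetical):

```python
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()

# Saving a dict yields a 0-d object array; .item() recovers the dict.
dict_path = os.path.join(tmp, "energy_dict.npy")
np.save(dict_path, {"scene0001": [0.3, 0.7]})
loaded = np.load(dict_path, allow_pickle=True)
print(loaded.shape)        # () -- a 0-d wrapper around the dict
energy = loaded.item()     # plain dict again
print("scene0001" in energy)

# A plain float array of size > 1 raises instead:
arr = np.arange(5, dtype=np.float64)
try:
    arr.item()
except ValueError as err:
    print(err)  # can only convert an array of size 1 to a Python scalar
```

So if SoundEnergy_array_165scenes.npy loads as a flat float array, it was most likely saved from an array rather than from the dict the dataloader expects.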

missing manually annotated test set (AuditoryTestManual)

@arunbalajeev
Thank you for your paper and code. It's impressive work.

I would like to use your dataset, but it seems that it does not include the manually annotated test set AuditoryTestManual for evaluation.
I would appreciate it if you could share the test set. (I will use it for my research.)

Thank you !
