
binaural-sound-perception's Introduction

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

This repo contains the code of our ECCV 2020 paper

License

This software is released under a Creative Commons license which allows personal and research use only. You can view a license summary here.

Installation

  • The code is tested with PyTorch 1.5 and Python 3.7.1.

Data

wget http://data.vision.ee.ethz.ch/arunv/binaural_perception/data.zip
unzip data.zip 

Note: Audio track 3 (LAB-T***.WAV/LAB-T***_Tr3.WAV) and audio track 4 (LAB-T***.WAV/LAB-T***_Tr4.WAV) are missing for scenes 1 through 27. This is due to a manual error made while recording the initial 27 videos of the dataset.

To extract video segments and corresponding sound time-frequency representations:

python extract_videosegments.py
python extract_spectrograms.py

Training

a) Semantic prediction, depth prediction, and spatial sound super-resolution

python train_noBG_Paralleltask_depth.py

b) Depth prediction and spatial sound super-resolution

python scripts/train_noBG_Paralleltask_depth_noSeman.py

c) Semantic prediction and spatial sound super-resolution

python scripts/train_noBG_Paralleltask.py

Acknowledgement

This work was funded by Toyota Motor Europe via the research project TRACE (Toyota Research on Automated Cars in Europe) Zurich and was carried out at the Computer Vision Lab at ETH Zurich. Our code includes components adapted from an external NVIDIA repository.

Citation

@inproceedings{vasudevan2020semantic,
  title={Semantic object prediction and spatial sound super-resolution with binaural sounds},
  author={Vasudevan, Arun Balajee and Dai, Dengxin and Van Gool, Luc},
  booktitle={European Conference on Computer Vision},
  pages={638--655},
  year={2020},
  organization={Springer}
}

binaural-sound-perception's People

Contributors

arunbalajeev


binaural-sound-perception's Issues

missing configuration scripts in datasets/__init__.py

Thank you for your paper, I think this is a very impressive contribution. I would really like to reproduce the results but I am running into several issues and some I can't fix myself. I am going to create a few issues here, one for each point found, so you may be able to fix the code and supply the additional data necessary.

All of the modules imported in datasets/__init__.py ('citiscapes', 'mapillar', 'OmniAudio_4ch_audioBG_SeqSpec_diffmask_Comp_3class') are missing from the repository.

Conversely, the modules the code actually needs are not imported, so the following lines must be added:

from datasets import OmniAudio_noBG_Paralleltask_depth
from datasets import OmniAudio_noBG_Paralleltask_depth_noSeman

All the other imports must be commented out.
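Putting the two fixes together, a patched datasets/__init__.py would look roughly like this (a sketch based on the module names above, not the authors' actual file):

```python
# datasets/__init__.py (hypothetical patched version)

# Modules referenced but missing from the repository -- commented out:
# from datasets import citiscapes
# from datasets import mapillar
# from datasets import OmniAudio_4ch_audioBG_SeqSpec_diffmask_Comp_3class

# Modules the training scripts actually need:
from datasets import OmniAudio_noBG_Paralleltask_depth
from datasets import OmniAudio_noBG_Paralleltask_depth_noSeman
```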

However, this leads to another big issue. In those files, the make_dataset method tries to load SoundEnergy_60scenes.npy (line 178). That file is not in the repository, and there is no hint on how to obtain it.

Dataset size

Hello,

In the paper for this work, it states that there are 64,250 video clips. When I download the dataset from the link and run extract_videosegments.py on the directory, I only end up with 31,569 frames across the split_videoframes directories.

@arunbalajeev do you have any idea why this is? I looked at the extraction scripts and it seems reasonable.

Thanks!

Background subtraction and the manually annotated mid-frame semantic segmentation masks

Hello! First of all, thank you so much for sharing this. I'm really impressed with the great work!

Would it be possible to share the scripts, or more instructions, on the background subtraction, the sound-making target objects, your manual annotation of the middle frame of each split video, and the train/val/test split? That would let us compare against your work. Thanks!

SyntaxError: from __future__ imports must occur at the beginning of the file

Thank you for your paper, I think this is a very impressive contribution. I would really like to reproduce the results but I am running into several issues and some I can't fix myself. I am going to create a few issues here, one for each point found, so you may be able to fix the code and supply the additional data necessary.

Issue:

  File "train_noBG_Paralleltask_depth.py", line 8
    from __future__ import absolute_import

Unfortunately, nothing other than the module docstring and comments may appear before the __future__ import.
This applies to the other training scripts as well.
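For reference, Python does allow the module docstring and comments ahead of a `from __future__` import; any other statement before it triggers exactly this SyntaxError. A minimal demonstration (the source strings are hypothetical, not the repo's files):

```python
# Only the module docstring (and comments) may precede a __future__ import.
src_ok = '"""Module docstring."""\nfrom __future__ import absolute_import\n'
src_bad = 'import os\nfrom __future__ import absolute_import\n'

compile(src_ok, "<ok>", "exec")   # compiles cleanly
try:
    compile(src_bad, "<bad>", "exec")
except SyntaxError as err:
    print(err.msg)  # from __future__ imports must occur at the beginning of the file
```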

No way to get img_path

Hello! Thank you so much for this project and repo @arunbalajeev.

When I run the code, the dataloader tries to open img_path (bg.png), but this file is neither in the dataset nor created by any setup script in the project. I was wondering if you would be able to provide it?

For example:

Image.open(sem_mask), np.load(audio1), np.load(audio6), np.load(audio_path1), np.load(audio_path6), np.load(audio_path2), np.load(audio_path5)
  File "/.local/lib/python3.7/site-packages/PIL/Image.py", line 2904, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/binaural-sound-perception/dataset_public/scene0089/bg.png'

Thank you so much!

audio input size

Thanks for making the code available. I have a doubt regarding the spectrogram calculation.
As per the paper, the input audio spectrogram is of size 257x601. But the length of the audio is set to 2 sec with a sampling rate of 96 kHz and a hop size of 160. For this setting, the time axis of the spectrogram should have dimension (2x96000)/160, which comes to around 1201 (not 601). Could you please clarify, or have I grossly misunderstood something?
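The frame count in question can be checked with a quick back-of-the-envelope calculation. The FFT size of 512 below is an assumption, chosen only because it yields the 257 frequency bins reported in the paper:

```python
sr = 96000       # sampling rate (from the paper)
duration = 2.0   # clip length in seconds (from the paper)
hop = 160        # hop size (from the paper)
n_fft = 512      # assumed FFT size: 512 // 2 + 1 = 257 frequency bins

n_samples = int(sr * duration)        # 192000 samples
freq_bins = n_fft // 2 + 1            # 257
time_frames = 1 + n_samples // hop    # 1201 with centered framing

print(freq_bins, time_frames)         # 257 1201
```

Note that 1 + 96000 // 160 = 601, so a 601-frame spectrogram would match a 1-second clip at this hop size, which may be the source of the discrepancy.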

missing optimizer_two5_SharedEnc

Thank you for your paper, I think this is a very impressive contribution. I would really like to reproduce the results but I am running into several issues and some I can't fix myself. I am going to create a few issues here, one for each point found, so you may be able to fix the code and supply the additional data necessary.

train_noBG_Paralleltask_depth loads optimizer_two5_SharedEnc_depth, which is in the repository; however, train_noBG_Paralleltask wants to load optimizer_two5_SharedEnc, which is not:

  File "scripts/train_noBG_Paralleltask.py", line 19, in <module>
    import optimizer_two5_SharedEnc as optimizer
ModuleNotFoundError: No module named 'optimizer_two5_SharedEnc'

Please excuse if the line numbers do not fit your version; I had to remove the comments at the top of the file to get around the __future__ import issue.

Issue with train_noBG_Paralleltask.py

Hello, thank you for this project!

I ran into an issue when running the train_noBG_Paralleltask.py script on the downloaded dataset. I get the error:

/datasets/OmniAudio_noBG_Paralleltask.py", line 214, in make_dataset
    if it_full in audioDict.item()[sc]:
ValueError: can only convert an array of size 1 to a Python scalar

When I look at the code, audioDict is loaded from SoundEnergy_array_165scenes.npy, but it comes back as a float64 array of shape (31326,), while it_full is a path to a .npy data file. Do you have any idea what is happening? My initial thought is that audioDict is not being loaded properly and should be some kind of dictionary keyed by strings.

Thank you so much!
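For context on the ValueError above: np.load(...).item() only recovers a Python object when the file holds a pickled dict (a 0-d object array); on a plain float array of size greater than 1 it raises exactly this error. A small illustration (the file name is hypothetical):

```python
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()

# Saving a dict yields a 0-d object array; .item() recovers the dict.
dict_path = os.path.join(tmp, "energy_dict.npy")
np.save(dict_path, {"scene0001": [0.3, 0.7]})
loaded = np.load(dict_path, allow_pickle=True)
print(loaded.shape)        # () -- a 0-d wrapper around the dict
energy = loaded.item()     # plain dict again
print("scene0001" in energy)

# A plain float array of size > 1 raises instead:
arr = np.arange(5, dtype=np.float64)
try:
    arr.item()
except ValueError as err:
    print(err)  # can only convert an array of size 1 to a Python scalar
```

So if SoundEnergy_array_165scenes.npy loads as a flat float array, it was most likely saved from an array rather than from the dict the dataloader expects.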

missing manually annotated test set (AuditoryTestManual)

@arunbalajeev
Thank you for your paper and code. It's impressive work.

I would like to use your dataset, but it seems that it does not include the manually annotated test set AuditoryTestManual for evaluation.
I would appreciate it if you could share the test set. (I will use it for my research.)

Thank you !
