
l2s's Introduction

Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes

This is the official implementation of our end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. Our Neural Sound Rendering results are available here.

Requirements

Python 3.9.7
pip3 install numpy
pip3 install wheel
pip3 install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip3 install python-dateutil
pip3 install soundfile
pip3 install pandas
pip3 install scipy
pip3 install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
pip3 install librosa
pip3 install easydict
pip3 install cupy-cuda11x
pip3 install wavefile
pip3 install torchfile
pip3 install pyyaml==5.4.1
pip3 install pymeshlab
pip3 install openmesh
pip3 install gdown
pip3 install matplotlib
pip3 install IPython
pip3 install pydub
pip3 install torch-geometric==2.1.0

Please note that the requirements above install the cupy and torch-geometric libraries built for CUDA v11.7, which is the configuration we installed and tested. For other CUDA versions, you can find the appropriate installation commands at the following links.

1) https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
2) https://docs.cupy.dev/en/stable/install.html

Note - If you have issues loading the trained model, downgrade torch-geometric (pip3 install torch-geometric==2.1.0).
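
Before debugging any model-loading problems, it can help to confirm which versions actually ended up installed. A minimal check along these lines should work with the packages installed above:

import torch
import torch_geometric

print("torch:", torch.__version__)                      # expected 1.13.0+cu117
print("built for CUDA:", torch.version.cuda)            # expected 11.7
print("CUDA available:", torch.cuda.is_available())
print("torch-geometric:", torch_geometric.__version__)  # 2.1.0 is known to load the trained model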

Download Listen2Scene Dataset

To download the Listen2Scene dataset, run the following command.

source download_data.sh

You can also download it directly from the following link.

https://drive.google.com/uc?id=1FnBadVRQvtV9jMrCz_F-U_YwjvxkK8s0
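
If the shell script or a browser download gives trouble, the gdown package installed above can fetch the same file from Python. A minimal sketch (the output file name is an assumption; rename it as needed):

import gdown

url = "https://drive.google.com/uc?id=1FnBadVRQvtV9jMrCz_F-U_YwjvxkK8s0"
gdown.download(url, "Listen2Scene_dataset.zip", quiet=False)  # output name is a placeholder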

Evaluation

Download the trained model, sample 3D indoor real-environment meshes from the ScanNet dataset, and sample source-receiver path files using the following command.

source download_files.sh

Generate embeddings with different receiver and source locations for five different real 3D indoor scenes. For each of the 5 scenes, we have stored sample source-receiver locations in CSV format inside the Paths folder. Columns 2-4 give the 3D Cartesian coordinates of the source and receiver positions. A negative value in Column 1 indicates a source position; a non-negative value indicates a listener position.

python3 embed_generator.py
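
As a rough illustration of the path-file format described above, the source and listener rows can be separated as follows. This is only a sketch: the file name is a placeholder, and it assumes the CSV files have no header row.

import pandas as pd

# Placeholder name; use any CSV file from the Paths folder.
paths = pd.read_csv("Paths/example_scene.csv", header=None)

sources   = paths[paths[0] < 0]    # negative Column 1 -> source positions
listeners = paths[paths[0] >= 0]   # non-negative Column 1 -> listener positions

print(sources.iloc[:, 1:4].values)    # x, y, z of the sources
print(listeners.iloc[:, 1:4].values)  # x, y, z of the listeners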

Generate binaural IRs corresponding to each embedding file inside the Embeddings folder using the following command.

python3 evaluate.py
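
The generated IRs are two-channel (binaural) impulse responses. As a hedged sketch of how such an IR could be auralized with a dry mono recording (both file names below are placeholders, not outputs of the scripts above):

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, fs = sf.read("dry_speech.wav")            # mono source signal (placeholder name)
ir, fs_ir = sf.read("Output/example_IR.wav")   # binaural IR (placeholder name)
assert fs == fs_ir, "resample one of the signals so the sample rates match"

left = fftconvolve(dry, ir[:, 0])
right = fftconvolve(dry, ir[:, 1])
binaural = np.stack([left, right], axis=1)
binaural /= np.max(np.abs(binaural)) + 1e-8    # simple peak normalization to avoid clipping

sf.write("binaural_out.wav", binaural, fs)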

l2s's People

Contributors

anton-jeran

l2s's Issues

Error loading Model

Hi, I am trying to use the pretrained model, but there seems to be a mismatch between the model definition in the code and the provided weights.
Here is the full error:
----------path: Models/MESH2IR/netG_epoch_40.pth
----------path: Models/MESH2IR/mesh_net_epoch_40.pth
thus STAGE1_G(
(cond_net): COND_NET(
(fc): Linear(in_features=14, out_features=10, bias=True)
(relu): PReLU(num_parameters=1)
)
(fc): Sequential(
(0): Linear(in_features=10, out_features=32768, bias=False)
(1): BatchNorm1d(32768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample1): Sequential(
(0): ConvTranspose1d(2048, 1024, kernel_size=(41,), stride=(4,), padding=(19,), output_padding=(1,))
(1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample2): Sequential(
(0): ConvTranspose1d(1024, 512, kernel_size=(41,), stride=(4,), padding=(19,), output_padding=(1,))
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample3): Sequential(
(0): ConvTranspose1d(512, 256, kernel_size=(41,), stride=(4,), padding=(19,), output_padding=(1,))
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample4): Sequential(
(0): ConvTranspose1d(256, 128, kernel_size=(41,), stride=(2,), padding=(20,), output_padding=(1,))
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample5): Sequential(
(0): ConvTranspose1d(128, 128, kernel_size=(41,), stride=(2,), padding=(20,), output_padding=(1,))
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(RIR): Sequential(
(0): ConvTranspose1d(128, 2, kernel_size=(41,), stride=(1,), padding=(20,))
(1): Tanh()
)
)
Load from: Models/MESH2IR/netG_epoch_40.pth
Traceback (most recent call last):
File "C:\Users\noamk\OneDrive\Desktop\NOAM\Miluim\L2S-main\L2S-main\evaluate.py", line 191, in
evaluate()
File "C:\Users\noamk\OneDrive\Desktop\NOAM\Miluim\L2S-main\L2S-main\evaluate.py", line 111, in evaluate
netG, mesh_net = load_network_stageI(netG_path,mesh_net_path)
File "C:\Users\noamk\OneDrive\Desktop\NOAM\Miluim\L2S-main\L2S-main\evaluate.py", line 73, in load_network_stageI
mesh_net.load_state_dict(state_dict)
File "C:\Users\noamk\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MESH_NET:
Missing key(s) in state_dict: "pool1.select.weight", "pool2.select.weight", "pool3.select.weight".
Unexpected key(s) in state_dict: "pool1.weight", "pool2.weight", "pool3.weight".
[Finished in 6.9s]
Would love your help!
Thanks!
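
The missing/unexpected key pairs in the error above (poolN.weight vs. poolN.select.weight) match the TopKPooling layout change in newer torch-geometric releases; the fix recommended in this repository is downgrading to torch-geometric 2.1.0 as noted in the Requirements section. As an untested alternative sketch, the checkpoint keys could be remapped before loading (this assumes the .pth file holds a plain state_dict, as evaluate.py appears to load it):

import torch

state_dict = torch.load("Models/MESH2IR/mesh_net_epoch_40.pth", map_location="cpu")

renamed = {}
for key, value in state_dict.items():
    if key in ("pool1.weight", "pool2.weight", "pool3.weight"):
        renamed[key.replace(".weight", ".select.weight")] = value  # newer TopKPooling layout
    else:
        renamed[key] = value

mesh_net.load_state_dict(renamed)  # mesh_net constructed as in evaluate.py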

Alternative ways to download the data

Hello,
Thank you for your interesting work!

I have frequent connection and permission issues with Google Drive.
Are there other ways to download the data?
