
l2s's Introduction

Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes

This is the official implementation of our end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. Our Neural Sound Rendering results are available here.

Requirements

Python 3.9.7
pip3 install numpy
pip3 install wheel
pip3 install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip3 install python-dateutil
pip3 install soundfile
pip3 install pandas
pip3 install scipy
pip3 install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
pip3 install librosa
pip3 install easydict
pip3 install cupy-cuda11x
pip3 install wavefile
pip3 install torchfile
pip3 install pyyaml==5.4.1
pip3 install pymeshlab
pip3 install openmesh
pip3 install gdown
pip3 install matplotlib
pip3 install IPython
pip3 install pydub
pip3 install torch-geometric==2.1.0

Please note that the requirements above install the cupy and torch-geometric libraries built for CUDA v11.7, which is the configuration we installed and tested. For other CUDA versions, you can find the appropriate installation commands at the following links.

1) https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
2) https://docs.cupy.dev/en/stable/install.html

Note - If you have issues loading the trained model, downgrade torch-geometric (pip3 install torch-geometric==2.1.0).
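
Before debugging any model-loading problems, it can help to confirm which versions actually ended up installed. A minimal check along these lines should work with the packages installed above:

import torch
import torch_geometric

print("torch:", torch.__version__)                      # expected 1.13.0+cu117
print("built for CUDA:", torch.version.cuda)            # expected 11.7
print("CUDA available:", torch.cuda.is_available())
print("torch-geometric:", torch_geometric.__version__)  # 2.1.0 is known to load the trained model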

Download Listen2Scene Dataset

To download the Listen2Scene dataset, run the following command.

source download_data.sh

You can also download it directly from the following link.

https://drive.google.com/uc?id=1FnBadVRQvtV9jMrCz_F-U_YwjvxkK8s0
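
If the shell script or a browser download gives trouble, the gdown package installed above can fetch the same file from Python. A minimal sketch (the output file name is an assumption; rename it as needed):

import gdown

url = "https://drive.google.com/uc?id=1FnBadVRQvtV9jMrCz_F-U_YwjvxkK8s0"
gdown.download(url, "Listen2Scene_dataset.zip", quiet=False)  # output name is a placeholder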

Evaluation

Download the trained model, sample 3D indoor real-environment meshes from the ScanNet dataset, and sample source-receiver path files using the following command.

source download_files.sh

Generate embeddings with different receiver and source locations for five different real 3D indoor scenes. For each of the 5 scenes, we have stored sample source-receiver locations in CSV format inside the Paths folder. Columns 2-4 give the 3D Cartesian coordinates of the source and receiver positions. A negative value in Column 1 indicates a source position; a non-negative value indicates a listener position.

python3 embed_generator.py
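
As a rough illustration of the path-file format described above, the source and listener rows can be separated as follows. This is only a sketch: the file name is a placeholder, and it assumes the CSV files have no header row.

import pandas as pd

# Placeholder name; use any CSV file from the Paths folder.
paths = pd.read_csv("Paths/example_scene.csv", header=None)

sources   = paths[paths[0] < 0]    # negative Column 1 -> source positions
listeners = paths[paths[0] >= 0]   # non-negative Column 1 -> listener positions

print(sources.iloc[:, 1:4].values)    # x, y, z of the sources
print(listeners.iloc[:, 1:4].values)  # x, y, z of the listeners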

Generate binaural IRs corresponding to each embedding file inside the Embeddings folder using the following command.

python3 evaluate.py
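
The generated IRs are two-channel (binaural) impulse responses. As a hedged sketch of how such an IR could be auralized with a dry mono recording (both file names below are placeholders, not outputs of the scripts above):

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, fs = sf.read("dry_speech.wav")            # mono source signal (placeholder name)
ir, fs_ir = sf.read("Output/example_IR.wav")   # binaural IR (placeholder name)
assert fs == fs_ir, "resample one of the signals so the sample rates match"

left = fftconvolve(dry, ir[:, 0])
right = fftconvolve(dry, ir[:, 1])
binaural = np.stack([left, right], axis=1)
binaural /= np.max(np.abs(binaural)) + 1e-8    # simple peak normalization to avoid clipping

sf.write("binaural_out.wav", binaural, fs)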

l2s's People

Contributors

anton-jeran

l2s's Issues

Error loading Model

Hi, I am trying to use the pretrained model, but there seems to be a mismatch between the model definition in the code and the provided weights.
Here is the full error:
----------path: Models/MESH2IR/netG_epoch_40.pth
----------path: Models/MESH2IR/mesh_net_epoch_40.pth
thus STAGE1_G(
(cond_net): COND_NET(
(fc): Linear(in_features=14, out_features=10, bias=True)
(relu): PReLU(num_parameters=1)
)
(fc): Sequential(
(0): Linear(in_features=10, out_features=32768, bias=False)
(1): BatchNorm1d(32768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample1): Sequential(
(0): ConvTranspose1d(2048, 1024, kernel_size=(41,), stride=(4,), padding=(19,), output_padding=(1,))
(1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample2): Sequential(
(0): ConvTranspose1d(1024, 512, kernel_size=(41,), stride=(4,), padding=(19,), output_padding=(1,))
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample3): Sequential(
(0): ConvTranspose1d(512, 256, kernel_size=(41,), stride=(4,), padding=(19,), output_padding=(1,))
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample4): Sequential(
(0): ConvTranspose1d(256, 128, kernel_size=(41,), stride=(2,), padding=(20,), output_padding=(1,))
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(upsample5): Sequential(
(0): ConvTranspose1d(128, 128, kernel_size=(41,), stride=(2,), padding=(20,), output_padding=(1,))
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): PReLU(num_parameters=1)
)
(RIR): Sequential(
(0): ConvTranspose1d(128, 2, kernel_size=(41,), stride=(1,), padding=(20,))
(1): Tanh()
)
)
Load from: Models/MESH2IR/netG_epoch_40.pth
Traceback (most recent call last):
File "C:\Users\noamk\OneDrive\Desktop\NOAM\Miluim\L2S-main\L2S-main\evaluate.py", line 191, in
evaluate()
File "C:\Users\noamk\OneDrive\Desktop\NOAM\Miluim\L2S-main\L2S-main\evaluate.py", line 111, in evaluate
netG, mesh_net = load_network_stageI(netG_path,mesh_net_path)
File "C:\Users\noamk\OneDrive\Desktop\NOAM\Miluim\L2S-main\L2S-main\evaluate.py", line 73, in load_network_stageI
mesh_net.load_state_dict(state_dict)
File "C:\Users\noamk\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MESH_NET:
Missing key(s) in state_dict: "pool1.select.weight", "pool2.select.weight", "pool3.select.weight".
Unexpected key(s) in state_dict: "pool1.weight", "pool2.weight", "pool3.weight".
[Finished in 6.9s]
Would love your help!
Thanks!
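
The missing/unexpected key pairs in the error above (poolN.weight vs. poolN.select.weight) match the TopKPooling layout change in newer torch-geometric releases; the fix recommended in this repository is downgrading to torch-geometric 2.1.0 as noted in the Requirements section. As an untested alternative sketch, the checkpoint keys could be remapped before loading (this assumes the .pth file holds a plain state_dict, as evaluate.py appears to load it):

import torch

state_dict = torch.load("Models/MESH2IR/mesh_net_epoch_40.pth", map_location="cpu")

renamed = {}
for key, value in state_dict.items():
    if key in ("pool1.weight", "pool2.weight", "pool3.weight"):
        renamed[key.replace(".weight", ".select.weight")] = value  # newer TopKPooling layout
    else:
        renamed[key] = value

mesh_net.load_state_dict(renamed)  # mesh_net constructed as in evaluate.py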

Alternative ways to download the data

Hello,
Thank you for your interesting work!

I have frequent connection and permission issues with Google Drive.
Are there other ways to download the data?
