
tiny-cuda-nn's Introduction

Tiny CUDA Neural Networks

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers.

Performance

Figure: Fully fused networks vs. TensorFlow v2.5.0 w/ XLA. Measured on 64 (solid line) and 128 (dashed line) neurons wide multi-layer perceptrons on an RTX 3090. Generated by benchmarks/bench_ours.cu and benchmarks/bench_tensorflow.py using data/config_oneblob.json.

Usage

Tiny CUDA neural networks have a simple C++/CUDA API:

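#include <iostream> // std::cout in the training loop below

#include <tiny-cuda-nn/common.h>
#include <tiny-cuda-nn/config.h> // create_from_config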

// Configure the model
nlohmann::json config = {
	{"loss", {
		{"otype", "L2"}
	}},
	{"optimizer", {
		{"otype", "Adam"},
		{"learning_rate", 1e-3},
	}},
	{"encoding", {
		{"otype", "HashGrid"},
		{"n_levels", 16},
		{"n_features_per_level", 2},
		{"log2_hashmap_size", 19},
		{"base_resolution", 16},
		{"per_level_scale", 2.0},
	}},
	{"network", {
		{"otype", "FullyFusedMLP"},
		{"activation", "ReLU"},
		{"output_activation", "None"},
		{"n_neurons", 64},
		{"n_hidden_layers", 2},
	}},
};

using namespace tcnn;

auto model = create_from_config(n_input_dims, n_output_dims, config);

// Train the model (batch_size must be a multiple of tcnn::BATCH_SIZE_GRANULARITY)
GPUMatrix<float> training_batch_inputs(n_input_dims, batch_size);
GPUMatrix<float> training_batch_targets(n_output_dims, batch_size);

for (int i = 0; i < n_training_steps; ++i) {
	generate_training_batch(&training_batch_inputs, &training_batch_targets); // <-- your code

	float loss;
	model.trainer->training_step(training_batch_inputs, training_batch_targets, &loss);
	std::cout << "iteration=" << i << " loss=" << loss << std::endl;
}

// Use the model
GPUMatrix<float> inference_inputs(n_input_dims, batch_size);
generate_inputs(&inference_inputs); // <-- your code

GPUMatrix<float> inference_outputs(n_output_dims, batch_size);
model.network->inference(inference_inputs, inference_outputs);
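
The training comment above requires batch_size to be a multiple of tcnn::BATCH_SIZE_GRANULARITY. A minimal sketch of rounding a requested batch size up to the next valid value follows. It is written in Python for brevity; round_up_batch_size is a hypothetical helper, not part of the library, and the default granularity of 256 is an assumed value for illustration only (check the constant in your tiny-cuda-nn version).

```python
def round_up_batch_size(requested: int, granularity: int = 256) -> int:
    """Return the smallest multiple of `granularity` that is >= `requested`.

    `granularity` stands in for tcnn::BATCH_SIZE_GRANULARITY; the default
    of 256 is an assumption, not a value taken from the library.
    """
    if requested <= 0 or granularity <= 0:
        raise ValueError("requested and granularity must be positive")
    # Integer ceiling division, then scale back up to a multiple.
    return ((requested + granularity - 1) // granularity) * granularity


if __name__ == "__main__":
    for n in (1, 256, 1000, 65536):
        print(n, "->", round_up_batch_size(n))
```

The same rounding applies to the inference batch, since the input and output matrices share batch_size.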

Example: learning a 2D image

We provide a sample application where an image function (x,y) -> (R,G,B) is learned. It can be run via

tiny-cuda-nn$ ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json

producing an image every couple of training steps. Every 1000 steps should take a bit over 1 second with the default configuration on an RTX 4090.

Figure: learned image after 10 steps, 100 steps, and 1000 steps, next to the reference image.

Requirements

  • An NVIDIA GPU; tensor cores increase performance when available. All shown results come from an RTX 3090.
  • A C++14 capable compiler. The following choices are recommended and have been tested:
    • Windows: Visual Studio 2019 or 2022
    • Linux: GCC/G++ 8 or higher
  • A recent version of CUDA. The following choices are recommended and have been tested:
    • Windows: CUDA 11.5 or higher
    • Linux: CUDA 10.2 or higher
  • CMake v3.21 or higher.
  • The fully fused MLP component of this framework requires a very large amount of shared memory in its default configuration. It will likely only work on an RTX 3090, an RTX 2080 Ti, or higher-end GPUs. Lower end cards must reduce the n_neurons parameter or use the CutlassMLP (better compatibility but slower) instead.
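
The last requirement names two workarounds for GPUs with less shared memory: reduce n_neurons, or switch to CutlassMLP. A sketch of applying either change to a configuration dictionary might look as follows; with_fallback_network is a hypothetical helper, and the width of 32 in the usage lines is only an example value, not a documented threshold for any particular GPU.

```python
import copy


def with_fallback_network(config, use_cutlass=True, n_neurons=None):
    """Return a copy of `config` adjusted for GPUs with less shared memory.

    Hypothetical helper: it only edits the "network" section and leaves
    the caller's dictionary untouched.
    """
    cfg = copy.deepcopy(config)
    network = cfg["network"]
    if use_cutlass:
        # Better compatibility but slower, per the requirement above.
        network["otype"] = "CutlassMLP"
    if n_neurons is not None:
        # Alternative workaround: keep the network type, shrink its width.
        network["n_neurons"] = n_neurons
    return cfg


if __name__ == "__main__":
    base = {"network": {"otype": "FullyFusedMLP", "n_neurons": 64, "n_hidden_layers": 2}}
    print(with_fallback_network(base))
    print(with_fallback_network(base, use_cutlass=False, n_neurons=32))
```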

If you are using Linux, install the following packages

sudo apt-get install build-essential git

We also recommend installing CUDA in /usr/local/ and adding the CUDA installation to your PATH. For example, if you have CUDA 11.4, add the following to your ~/.bashrc

export PATH="/usr/local/cuda-11.4/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH"

Compilation (Windows & Linux)

Begin by cloning this repository and all its submodules using the following command:

$ git clone --recursive https://github.com/nvlabs/tiny-cuda-nn
$ cd tiny-cuda-nn

Then, use CMake to build the project: (on Windows, this must be in a developer command prompt)

tiny-cuda-nn$ cmake . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
tiny-cuda-nn$ cmake --build build --config RelWithDebInfo -j

If compilation fails inexplicably or takes longer than an hour, you might be running out of memory. Try running the above command without -j in that case.

PyTorch extension

tiny-cuda-nn comes with a PyTorch extension that allows using the fast MLPs and input encodings from within a Python context. These bindings can be significantly faster than full Python implementations, in particular for the multiresolution hash encoding.

The overheads of Python/PyTorch can nonetheless be extensive if the batch size is small. For example, with a batch size of 64k, the bundled mlp_learning_an_image example is ~2x slower through PyTorch than native CUDA. With a batch size of 256k and higher (default), the performance is much closer.

Begin by setting up a Python 3.X environment with a recent, CUDA-enabled version of PyTorch. Then, invoke

pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

Alternatively, if you would like to install from a local clone of tiny-cuda-nn, invoke

tiny-cuda-nn$ cd bindings/torch
tiny-cuda-nn/bindings/torch$ python setup.py install

Upon success, you can use tiny-cuda-nn models as in the following example:

import commentjson as json
import tinycudann as tcnn
import torch

with open("data/config_hash.json") as f:
	config = json.load(f)

# Option 1: efficient Encoding+Network combo.
model = tcnn.NetworkWithInputEncoding(
	n_input_dims, n_output_dims,
	config["encoding"], config["network"]
)

# Option 2: separate modules. Slower but more flexible.
encoding = tcnn.Encoding(n_input_dims, config["encoding"])
network = tcnn.Network(encoding.n_output_dims, n_output_dims, config["network"])
model = torch.nn.Sequential(encoding, network)

See samples/mlp_learning_an_image_pytorch.py for an example.
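
To get a feel for what a HashGrid encoding configuration implies, the per-level grid resolutions can be estimated ahead of time. The sketch below uses the geometric growth rule floor(base_resolution * per_level_scale^level) with the HashGrid values from the C++ configuration in the Usage section. The library's exact rounding may differ, and treating a level as "dense" when resolution^n_dims fits into 2^log2_hashmap_size entries is a simplifying assumption. hashgrid_levels is a hypothetical helper, not part of tinycudann.

```python
import math


def hashgrid_levels(n_levels, base_resolution, per_level_scale,
                    log2_hashmap_size, n_dims):
    """Estimate (level, resolution, fits_densely) for each grid level.

    Assumes resolution = floor(base_resolution * per_level_scale**level)
    and that a level fits densely if resolution**n_dims <= table size.
    """
    table_size = 1 << log2_hashmap_size
    levels = []
    for level in range(n_levels):
        resolution = math.floor(base_resolution * per_level_scale ** level)
        fits_densely = resolution ** n_dims <= table_size
        levels.append((level, resolution, fits_densely))
    return levels


if __name__ == "__main__":
    # Values from the Usage section; n_dims=2 matches the (x, y) image example.
    for level, resolution, dense in hashgrid_levels(16, 16, 2.0, 19, n_dims=2):
        kind = "dense" if dense else "hashed"
        print(f"level {level:2d}: resolution {resolution:6d} ({kind})")
    # The encoding output width is n_levels * n_features_per_level.
    print("encoded dims:", 16 * 2)
```

Under these assumptions only the coarsest levels fit without hashing, which is why log2_hashmap_size has the largest effect on the finer levels.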

Components

Following is a summary of the components of this framework. The JSON documentation lists configuration options.

Networks    
Fully fused MLP src/fully_fused_mlp.cu Lightning fast implementation of small multi-layer perceptrons (MLPs).
CUTLASS MLP src/cutlass_mlp.cu MLP based on CUTLASS' GEMM routines. Slower than fully-fused, but handles larger networks and still is reasonably fast.
Input encodings    
Composite include/tiny-cuda-nn/encodings/composite.h Allows composing multiple encodings. Can be, for example, used to assemble the Neural Radiance Caching encoding [Müller et al. 2021].
Frequency include/tiny-cuda-nn/encodings/frequency.h NeRF's [Mildenhall et al. 2020] positional encoding applied equally to all dimensions.
Grid include/tiny-cuda-nn/encodings/grid.h Encoding based on trainable multiresolution grids. Used for Instant Neural Graphics Primitives [Müller et al. 2022]. The grids can be backed by hashtables, dense storage, or tiled storage.
Identity include/tiny-cuda-nn/encodings/identity.h Leaves values untouched.
Oneblob include/tiny-cuda-nn/encodings/oneblob.h From Neural Importance Sampling [Müller et al. 2019] and Neural Control Variates [Müller et al. 2020].
SphericalHarmonics include/tiny-cuda-nn/encodings/spherical_harmonics.h A frequency-space encoding that is more suitable to direction vectors than component-wise ones.
TriangleWave include/tiny-cuda-nn/encodings/triangle_wave.h Low-cost alternative to the NeRF's encoding. Used in Neural Radiance Caching [Müller et al. 2021].
Losses    
L1 include/tiny-cuda-nn/losses/l1.h Standard L1 loss.
Relative L1 include/tiny-cuda-nn/losses/relative_l1.h Relative L1 loss normalized by the network prediction.
MAPE include/tiny-cuda-nn/losses/mape.h Mean absolute percentage error (MAPE). The same as Relative L1, but normalized by the target.
SMAPE include/tiny-cuda-nn/losses/smape.h Symmetric mean absolute percentage error (SMAPE). The same as Relative L1, but normalized by the mean of the prediction and the target.
L2 include/tiny-cuda-nn/losses/l2.h Standard L2 loss.
Relative L2 include/tiny-cuda-nn/losses/relative_l2.h Relative L2 loss normalized by the network prediction [Lehtinen et al. 2018].
Relative L2 Luminance include/tiny-cuda-nn/losses/relative_l2_luminance.h Same as above, but normalized by the luminance of the network prediction. Only applicable when network prediction is RGB. Used in Neural Radiance Caching [Müller et al. 2021].
Cross Entropy include/tiny-cuda-nn/losses/cross_entropy.h Standard cross entropy loss. Only applicable when the network prediction is a PDF.
Variance include/tiny-cuda-nn/losses/variance_is.h Standard variance loss. Only applicable when the network prediction is a PDF.
Optimizers    
Adam include/tiny-cuda-nn/optimizers/adam.h Implementation of Adam [Kingma and Ba 2014], generalized to AdaBound [Luo et al. 2019].
Novograd include/tiny-cuda-nn/optimizers/novograd.h Implementation of Novograd [Ginsburg et al. 2019].
SGD include/tiny-cuda-nn/optimizers/sgd.h Standard stochastic gradient descent (SGD).
Shampoo include/tiny-cuda-nn/optimizers/shampoo.h Implementation of the 2nd order Shampoo optimizer [Gupta et al. 2018] with home-grown optimizations as well as those by Anil et al. [2020].
Average include/tiny-cuda-nn/optimizers/average.h Wraps another optimizer and computes a linear average of the weights over the last N iterations. The average is used for inference only (does not feed back into training).
Batched include/tiny-cuda-nn/optimizers/batched.h Wraps another optimizer, invoking the nested optimizer once every N steps on the averaged gradient. Has the same effect as increasing the batch size but requires only a constant amount of memory.
Composite include/tiny-cuda-nn/optimizers/composite.h Allows using several optimizers on different parameters.
EMA include/tiny-cuda-nn/optimizers/ema.h Wraps another optimizer and computes an exponential moving average of the weights. The average is used for inference only (does not feed back into training).
Exponential Decay include/tiny-cuda-nn/optimizers/exponential_decay.h Wraps another optimizer and performs piecewise-constant exponential learning-rate decay.
Lookahead include/tiny-cuda-nn/optimizers/lookahead.h Wraps another optimizer, implementing the lookahead algorithm [Zhang et al. 2019].
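
The relative losses listed above differ only in what they normalize by. A per-sample sketch of that relationship is given here. The epsilon of 0.01 that guards against division by zero is an assumption for illustration, as is the plain-Python formulation; the library evaluates its losses on the GPU and may differ in detail.

```python
EPSILON = 0.01  # assumed guard against division by zero, for illustration


def relative_l1(pred, target, eps=EPSILON):
    """Absolute error normalized by the prediction."""
    return abs(pred - target) / (abs(pred) + eps)


def mape(pred, target, eps=EPSILON):
    """Absolute error normalized by the target."""
    return abs(pred - target) / (abs(target) + eps)


def smape(pred, target, eps=EPSILON):
    """Absolute error normalized by the mean of prediction and target."""
    return abs(pred - target) / (0.5 * (abs(pred) + abs(target)) + eps)


def relative_l2(pred, target, eps=EPSILON):
    """Squared error normalized by the squared prediction."""
    return (pred - target) ** 2 / (pred ** 2 + eps)


if __name__ == "__main__":
    pred, target = 2.0, 1.0
    for loss in (relative_l1, mape, smape, relative_l2):
        print(loss.__name__, loss(pred, target))
```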

License and Citation

This framework is licensed under the BSD 3-clause license. Please see LICENSE.txt for details.

If you use it in your research, we would appreciate a citation via

@software{tiny-cuda-nn,
	author = {M\"uller, Thomas},
	license = {BSD-3-Clause},
	month = {4},
	title = {{tiny-cuda-nn}},
	url = {https://github.com/NVlabs/tiny-cuda-nn},
	version = {1.7},
	year = {2021}
}

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Publications & Software

Among others, this framework powers the following publications:

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Thomas Müller, Alex Evans, Christoph Schied, Alexander Keller
ACM Transactions on Graphics (SIGGRAPH), July 2022
Website / Paper / Code / Video / BibTeX

Extracting Triangular 3D Models, Materials, and Lighting From Images
Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, Sanja Fidler
CVPR (Oral), June 2022
Website / Paper / Video / BibTeX

Real-time Neural Radiance Caching for Path Tracing
Thomas Müller, Fabrice Rousselle, Jan Novák, Alexander Keller
ACM Transactions on Graphics (SIGGRAPH), August 2021
Paper / GTC talk / Video / Interactive results viewer / BibTeX

As well as the following software:

NerfAcc: A General NeRF Acceleration Toolbox
Ruilong Li, Matthew Tancik, Angjoo Kanazawa
https://github.com/KAIR-BAIR/nerfacc

Nerfstudio: A Framework for Neural Radiance Field Development
Matthew Tancik*, Ethan Weber*, Evonne Ng*, Ruilong Li, Brent Yi, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, Angjoo Kanazawa
https://github.com/nerfstudio-project/nerfstudio

Please feel free to make a pull request if your publication or software is not listed.

Acknowledgments

Special thanks go to the NRC authors for helpful discussions and to Nikolaus Binder for providing part of the infrastructure of this framework, as well as for help with utilizing TensorCores from within CUDA.

tiny-cuda-nn's People

Contributors

anadodik, cedric-chedaleux, chanket, dmholtz, eltociear, enter-tainer, fferroni, hturki, ilya-muromets, jamesperlman, jc211, jsharpe, kacperkan, mabl, mickeyding, solonets, tcantenot, tom94, udonda, vasuagrawal, ventusff, vladislavzavadskyy, weiphil, wendazhou, yzy1996


tiny-cuda-nn's Issues

Loading weights into TinyCUDA

Hi! I'm very excited by TinyCUDA and I'd like to test it out for an inference task on a pre-trained model. I have the network weights as a .npy file and I'd ideally like to load them into the fully fused MLP. From a quick scan of the codebase it looks like there isn't any way to load pre-computed model weights (please correct me if I'm wrong). Do you have any advice on how I could go about accomplishing this?

compile error in win10 with vs2019.16.11 and cuda 11.3

Hello,
I now have the same issue as #74, but with VS 2019 16.11 and CUDA 11.3. The compile error looks the same, so could someone please help check it?

Hardware: GEFORCE RTX 3090
DRIVER: 472.84
CUDA: 11.3
VS: 2019.16.11.11
windows 10 SDK: 10.0.19041.0

In addition, I used CUDA 11.3 because the stable PyTorch release only supports CUDA 11.3 for compiling bindings/torch.

Error log:
Compiling CUDA source file ....\src\common_device.cu...
Compiling CUDA source file ....\src\common.cu...
Compiling CUDA source file ....\src\cpp_api.cu...
Compiling CUDA source file ....\src\encoding.cu...
Compiling CUDA source file ....\src\cutlass_mlp.cu...
Compiling CUDA source file ....\src\object.cu...
Compiling CUDA source file ....\src\reduce_sum.cu...
Compiling CUDA source file ....\src\loss.cu...
Compiling CUDA source file ....\src\network.cu...
Compiling CUDA source file ....\src\optimizer.cu...

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\common.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\common.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\common_device.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\common_device.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\cpp_api.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\cpp_api.cu"
Compiling CUDA source file ....\src\fully_fused_mlp.cu...

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\encoding.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\encoding.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\network.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\network.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\cutlass_mlp.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\cutlass_mlp.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\object.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\object.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\loss.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\loss.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\optimizer.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\optimizer.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\reduce_sum.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\reduce_sum.cu"

E:\Zhansheng\software\installed\tiny-cuda-nn\build\src>"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"E:\Zhansheng\software\installed\tiny-cuda-nn\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\include" -I"E:\Zhansheng\software\installed\tiny-cuda-nn\dependencies\cutlass\tools\util\include" -I"E:\Zhansheng\software\installed\NVDIA-GPU-computing-toolkit\CUDA\v11.3\include" --keep-dir x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR="RelWithDebInfo"" -D_MBCS -D"CMAKE_INTDIR="RelWithDebInfo"" -Xcompiler "/EHsc /W1 /nologo /O2 /FdE:\Zhansheng\software\installed\tiny-cuda-nn\build\src\RelWithDebInfo\tiny-cuda-nn.pdb /FS /Zi /MD /GR" -o tiny-cuda-nn.dir\RelWithDebInfo\fully_fused_mlp.obj "E:\Zhansheng\software\installed\tiny-cuda-nn\src\fully_fused_mlp.cu"
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\xutility(1309): error : expected a "(" [E:\Zhansheng\software\installed\tiny-cuda-nn\build\src\tiny-cuda-nn.vcxproj]
detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=const char *, _Sentinel=const char *]"
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\xlocale(1990): here

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\xutility(1310): error : identifier "_Verify_range" is undefined [E:\Zhansheng\software\installed\tiny-cuda-nn\build\src\tiny-cuda-nn.vcxproj]
detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=const char *, _Sentinel=const char *]"
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\xlocale(1990): here

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\xutility(1309): error : expected a "(" [E:\Zhansheng\software\installed\tiny-cuda-nn\build\src\tiny-cuda-nn.vcxproj]
detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=__wchar_t *, _Sentinel=__wchar_t *]"
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\xlocale(1991): here
...

Error log file:
build.log

Thanks for your contribution!

Regards

Any plans for double backward / second-order gradients, i.e. backward for backward functions?

Hi,
First of all, thanks for the great repo! I've already built a project based on tcnn and found it extremely helpful.

However, during usage I found that, because the backward functions are implemented in C++, PyTorch cannot track them, so autograd.grad(..., create_graph=True) fails to generate a grad_fn for the gradients (i.e. second-order gradients are unavailable).

This functionality is helpful when the training losses depend on first-order gradients. For example, when training an SDF MLP, an eikonal loss is typically used, which is applied to dy_dx (nablas) of the network. To achieve this, d(dy_dx)_dparam is needed.
Ref: https://arxiv.org/abs/2002.10099
Fig:
image
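To make the eikonal term concrete, here is a minimal pure-Python sketch of the loss mean((||grad f|| - 1)^2). The finite-difference gradient only stands in for dy_dx; in PyTorch that gradient would come from autograd.grad(..., create_graph=True), and optimizing the loss then means differentiating through it, which is the double backward asked about here. The names sphere_sdf, gradient and eikonal_loss are illustrative and not part of tiny-cuda-nn.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Exact signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def gradient(f, p, h=1e-4):
    """Central-difference estimate of df/dp (stands in for autograd's dy_dx)."""
    grad = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += h
        lo[i] -= h
        grad.append((f(hi) - f(lo)) / (2.0 * h))
    return grad

def eikonal_loss(f, points):
    """Mean of (||grad f|| - 1)^2 over the sample points."""
    total = 0.0
    for p in points:
        norm = math.sqrt(sum(g * g for g in gradient(f, p)))
        total += (norm - 1.0) ** 2
    return total / len(points)

points = [(0.3, -0.2, 0.9), (1.5, 0.4, -0.7), (-0.8, 1.1, 0.2)]

# A true SDF has unit gradient norm, so its eikonal loss is close to zero.
loss_true_sdf = eikonal_loss(sphere_sdf, points)

# Scaling the field by 2 doubles the gradient norm, so the loss is about (2 - 1)^2.
loss_scaled = eikonal_loss(lambda p: 2.0 * sphere_sdf(p), points)
```

The loss depends on the parameters only through the gradient of the network output, which is why a first-order backward pass alone is not enough to train with it.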

Currently I'm writing custom backward_backward functions on top of tcnn's grid.h and fully_fused_mlp.cu, but it would be really nice if this could be officially supported. 😄

BR,
Ventus


🎉🎉🎉 UPDATE: to all people who reach here

For now, partial support for double backward, for grid encodings only, is implemented in the tiny-cuda-nn repo.

An example usage script can be found here.

For implementation details, please check the original PR #69 .

Max Level - Grid Encoding

I don't understand why max_level is being multiplied here.

max_level = (max_level * num_grid_features) / N_FEATURES_PER_LEVEL;

This is effectively saying max_level *= n_levels, and it's not clear why it must be so. I also assume that max_level_gpu is some kind of fine-grained control over a "per entry" max_level.
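The arithmetic behind that reading can be checked with a small sketch. It assumes num_grid_features equals n_levels * N_FEATURES_PER_LEVEL, which is what the question implies but is not confirmed here. One possible interpretation, also unconfirmed, is that the incoming max_level is a fraction in [0, 1] that the multiplication converts into a level count. The function name is hypothetical.

```python
def effective_max_level(max_level, n_levels, n_features_per_level):
    """Reproduce the quoted expression under the stated assumption."""
    # Assumption: total feature count is levels times features per level.
    num_grid_features = n_levels * n_features_per_level
    return (max_level * num_grid_features) / n_features_per_level

# With 16 levels and 2 features per level the expression is max_level * 16.
full = effective_max_level(1.0, n_levels=16, n_features_per_level=2)
half = effective_max_level(0.5, n_levels=16, n_features_per_level=2)
print(full, half)
```

Under this reading, a max_level of 0.5 would correspond to using the first 8 of 16 levels.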

requires_grad be False after encoding

Thanks for the great project!
I encountered a problem when calling backward. Specifically, requires_grad of the encoder's outputs is False. I just want to calculate the gradients of the sigma value with respect to the coordinates in order to estimate the normals.

import torch
import torch.nn.functional as F
import tinycudann as tcnn

sh_enc_degree = 4

sh_encoder = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={"otype": "HashGrid", "degree": sh_enc_degree},
    dtype=torch.float32,
)

inputs = F.normalize(torch.randn(100, 3), dim=-1).cuda()
inputs.requires_grad_()
outputs = sh_encoder(inputs)
outputs.requires_grad

Horizontal wrapping hash grid?

How difficult would it be to make the HashGrid wrap horizontally? This would be useful for representing panoramic scenes with a NeRF.

Error while installing python extension

Hello,
This is most likely an issue on my end, not with the repo, but I am posting it here because I thought people here would probably know what is going on.
I am trying to install the Python extension, but keep getting this error:

running build_ext
error: [WinError 2] System cannot find the file specified

While doing some shenanigans trying to get it working, I made slight progress and got a different error instead:

running build_ext
building 'tinycudann_bindings._C' extension
error: Don't know how to compile ../../src/common.cu

It might be caused by a messed-up PATH or something, but I can build tcnn with CMake successfully, so I am clueless.
I would like to install it into an Anaconda env.
I have the latest MSVC (VS19) installed and in PATH.
The latest CUDA is installed and in PATH.
All the Python dependencies are installed, including torch with an active GPU.

Any help appreciated
BR,
Paul

CUTLASS Error when output size of network > 16

There appears to be a bug when the output size of a network is greater than 16. This appears to be related to the padding of the output, as it jumps to size 32. I am mostly testing this using instant-ngp by setting the n_output_dims of the density network to something larger. Here is a more complete log, obtained by replacing the exit() with an assert(False):

python: /home/dronelab/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/cutlass_matmul.h:363: 
void tcnn::fc_multiply_impl(cudaStream_t, const typename Gemm::Arguments&) 
[with Gemm = cutlass::gemm::device::Gemm<
cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::arch::OpClassTensorOp, cutlass::arch::Sm80, 
cutlass::gemm::GemmShape<128, 32, 32>, cutlass::gemm::GemmShape<32, 32, 32>, cutlass::gemm::GemmShape<16, 8, 8>,
tcnn::ActivationEpilogue<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::FloatRoundStyle::round_to_nearest>, 
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>, 2, 8, 8, false, cutlass::arch::OpMultiplyAdd, false>; 
cudaStream_t = CUstream_st*; 
typename Gemm::Arguments = cutlass::gemm::device::Gemm<cutlass::half_t, cutlass::layout::RowMajor, 
cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::arch::OpClassTensorOp, cutlass::arch::Sm80, 
cutlass::gemm::GemmShape<128, 32, 32>, cutlass::gemm::GemmShape<32, 32, 32>, cutlass::gemm::GemmShape<16, 8, 8>, 
tcnn::ActivationEpilogue<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>, 2, 8, 8, false, cutlass::arch::OpMultiplyAdd, false>::Arguments]: Assertion `false' failed.

Pytorch binding - How can I customize?

I want to modify a few lines of your code and recompile it for use with the PyTorch bindings.
To check that the build succeeded, I ran the binary mlp_learning_an_image and saw that it worked.
Since my goal was to use it from PyTorch, I ran the command tiny-cuda-nn/bindings/torch$ python setup.py install
so that I would see the difference when importing tinycudann from Python.
However, tinycudann does not seem to have been updated when I run mlp_learning_an_image.py.

What am I missing? Thanks.
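One thing worth ruling out in a situation like this is that Python is still importing an older installed copy of the package. The sketch below uses only the standard library to report where a module would be loaded from, without importing it. locate_module is a hypothetical helper name, and whether a stale copy is actually the cause here is not confirmed.

```python
import importlib.util

def locate_module(name):
    """Return the file Python would import `name` from, or None if it is absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# Prints the path of the tinycudann package that Python would pick up,
# or None if it is not installed in the active environment.
print(locate_module("tinycudann"))
```

If the printed path points at an old site-packages copy rather than the freshly built one, the rebuilt extension is not the one being loaded.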

Suggestion: Format source code for long lines ......

Dear friends,
Thanks for your good work.
We have one suggestion: would you consider using a source code formatting tool to reformat your source code? Some lines are too long to read and debug comfortably.
Best regards,

namespace "std" has no member "lcm" tiny-cuda-nn

void set_alignment(uint32_t alignment) override {
	alignment = std::lcm(alignment, min_alignment());
	m_n_padded_output_dims = next_multiple(m_n_output_dims, alignment);
	m_n_to_pad = m_n_padded_output_dims - m_n_output_dims;
}

the command line is:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\bin\nvcc.exe" -gencode=arch=compute_86,code="compute_86,compute_86" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu -I"N:\Test\tiny-cuda-nn\include" -I"N:\Test\tiny-cuda-nn\dependencies" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -Xcompiler="/EHsc -Zi -Ob0" -g -D_WINDOWS -D"CMAKE_INTDIR="Debug"" -D"CMAKE_INTDIR="Debug"" -D_MBCS -Xcompiler "/EHsc /W1 /nologo /Od /FdN:\Test\tiny-cuda-nn\build\src\Debug\tiny-cuda-nn.pdb /FS /Zi /RTC1 /MDd /GR" -o tiny-cuda-nn.dir\Debug\encoding.obj "N:\Test\tiny-cuda-nn\src\encoding.cu"” 1。 tiny-cuda-nn C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 11.5.targets 785

environment:
visual c++ 2019, cmake 3.18
windows 10
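For context, std::lcm was added to the <numeric> header in C++17, so an error like this one typically means the file is not being compiled in C++17 mode; whether that is the cause in this report is not confirmed. The padding logic of set_alignment can be sketched in Python as follows. The behaviour of next_multiple is assumed from its name, and the alignment values are purely illustrative.

```python
import math

def lcm(a, b):
    """Least common multiple, the counterpart of C++17 std::lcm."""
    return a * b // math.gcd(a, b)

def next_multiple(value, divisor):
    """Smallest multiple of divisor that is >= value (assumed semantics)."""
    return ((value + divisor - 1) // divisor) * divisor

def padded_output_dims(n_output_dims, alignment, min_alignment):
    """Return (padded size, number of padding entries) as in set_alignment."""
    alignment = lcm(alignment, min_alignment)
    padded = next_multiple(n_output_dims, alignment)
    return padded, padded - n_output_dims

print(padded_output_dims(3, 8, 16))    # (16, 13)
print(padded_output_dims(17, 16, 16))  # (32, 15)
```

The second call shows how an output size just above 16 gets padded to 32 when the alignment is 16.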

How big can you make a fully fused MLP while retaining performance benefits?

The speedup in this repo relies on getting the memory traffic close to the chip - in caches/registers etc. This is going to stop working if an MLP is sufficiently large, but I'm unclear where the boundary is.

Does anyone know the answers to these questions:

  1. How big can you make an MLP while retaining the performance benefits? (Has anyone tested this?)
  2. Can you "trade off" a smaller batch size for a larger MLP and still keep the benefits?
  3. Would using more powerful hardware (e.g. an A100, which has 40MB of L2 cache, compared with an RTX 3090, which has 6MB) expand this performance window?

I could potentially help out with testing (3)
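As a rough back-of-the-envelope aid for these questions, the sketch below computes how many bytes one layer's weight matrix and one thread block's activations occupy in half precision. The figure of 128 batch rows per thread block is taken from a description of fully_fused_mlp.cu elsewhere on this page and is an assumption here, as is the function name. It does not say where the performance boundary lies; it only shows how quickly the footprint grows.

```python
def fused_mlp_footprint(width, rows_per_block=128, bytes_per_value=2):
    """Bytes for one layer's weights and for one block's activations."""
    weights = width * width * bytes_per_value
    activations = rows_per_block * width * bytes_per_value
    return weights, activations

for width in (64, 128, 256, 512):
    w, a = fused_mlp_footprint(width)
    print(f"width={width:4d}  weights={w / 1024:7.1f} KiB  activations={a / 1024:7.1f} KiB")
```

Weights grow quadratically with the layer width while activations grow linearly, so comparing these numbers against the on-chip memory of a given GPU gives a first hint of when the data stops fitting.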

Error when building bindings/torch

Hello.

First of all, thank you for releasing this cool work. I read the recent works and found this release.

I have successfully built tiny-cuda-nn, ran mlp_learning_an_image, and checked that it works.
However, I'm facing the following error when building bindings/torch.

tinycudann/bindings.cpp: In function ‘void* void_data_ptr(at::Tensor&)’:
tinycudann/bindings.cpp:60:48: error: expected primary-expression before ‘float’
case torch::kFloat32: return tensor.data_ptr<float>();
^~~~~
tinycudann/bindings.cpp:60:48: error: expected ‘;’ before ‘float’
tinycudann/bindings.cpp:60:53: error: expected unqualified-id before ‘>’ token
case torch::kFloat32: return tensor.data_ptr<float>();
^
tinycudann/bindings.cpp:61:56: error: expected primary-expression before ‘>’ token
case torch::kHalf: return tensor.data_ptr<torch::Half>();
^
tinycudann/bindings.cpp:61:58: error: expected primary-expression before ‘)’ token
case torch::kHalf: return tensor.data_ptr<torch::Half>();
^
tinycudann/bindings.cpp: In member function ‘std::tuple<tcnn::cpp::Context, at::Tensor> Module::fwd(at::Tensor, at::Tensor)’:
tinycudann/bindings.cpp:96:59: error: expected primary-expression before ‘float’
m_module->inference(stream, batch_size, input.data_ptr<float>(), void_data_ptr(output), void_data_ptr(params));
^~~~~
tinycudann/bindings.cpp:98:63: error: expected primary-expression before ‘float’
ctx = m_module->forward(stream, batch_size, input.data_ptr<float>(), void_data_ptr(output), void_data_ptr(params), input.requires_grad());
^~~~~
tinycudann/bindings.cpp: In member function ‘std::tuple<at::Tensor, at::Tensor> Module::bwd(const tcnn::cpp::Context&, at::Tensor, at::Tensor, at::Tenr)’:
tinycudann/bindings.cpp:161:48: error: expected primary-expression before ‘float’
input.requires_grad() ? dL_dinput.data_ptr<float>() : nullptr,
^~~~~
tinycudann/bindings.cpp:161:48: error: expected ‘:’ before ‘float’
tinycudann/bindings.cpp:161:48: error: expected primary-expression before ‘float’
tinycudann/bindings.cpp:164:20: error: expected primary-expression before ‘float’
input.data_ptr<float>(),
^~~~~
tinycudann/bindings.cpp: In member function ‘at::Tensor Module::initial_params(size_t)’:
tinycudann/bindings.cpp:175:53: error: expected primary-expression before ‘float’
m_module->initialize_params(seed, output.data_ptr<float>());
^~~~~
error: command 'gcc' failed with exit status 1

I googled this kind of error, and it is said to be about incorrect use of types.
However, I feel like there's nothing wrong with the code.

I've read #48, but it seems the problem is not the same.

I'm having a strong suspicion

My configuration is as below.
ubuntu 18.04
RTX8000
Driver version
cuda 11.0
torch 1.2.1

Thank you in advance!

Composite encoding with Grid

Hi Thomas,

Thanks for sharing this code, it's super helpful.

Is combining the Grid encoding with other encodings using Composite not supported? When I use the Grid encoding inside a Composite encoding for the first 3 dimensions, I get a RuntimeError: CUDA error: an illegal memory access was encountered, whereas if I replace it with a Frequency encoding the code works.

My config is:

{
	"loss": {
		"otype": "RelativeL2"
	},
	"optimizer": {
		"otype": "Adam",
		"learning_rate": 1e-2,
		"beta1": 0.9,
		"beta2": 0.99,
		"epsilon": 1e-15,
		"l2_reg": 1e-6
	},
	"encoding":	{
		"otype": "Composite",
		"nested": [
			{
				"n_dims_to_encode": 3,
				"otype": "Grid",           // Component type.
				"type": "Hash",            // Type of backing storage of the
										   // grids. Can be "Hash", "Tiled"
										   // or "Dense".
				"n_levels": 16,            // Number of levels (resolutions)
				"n_features_per_level": 2, // Dimensionality of feature vector
										   // stored in each level's entries.
				"log2_hashmap_size": 19,   // If type is "Hash", is the base-2
										   // logarithm of the number of elements
										   // in each backing hash table.
				"base_resolution": 16,     // The resolution of the coarsest
										   // level is base_resolution^input_dims.
				"per_level_scale": 2.0,    // The geometric growth factor, i.e.
										   // the factor by which the resolution
										   // of each grid is larger (per axis)
										   // than that of the preceding level.
				"interpolation": "Linear"  // How to interpolate nearby grid
										   // lookups. Can be "Nearest", "Linear",
										   // or "Smoothstep" (for smooth
										   // derivatives).
			},
			{
				"otype": "Identity"
			}

		]
	},
	"network": {
		"otype": "FullyFusedMLP",
		"activation": "ReLU",
		"output_activation": "None",
		"n_neurons": 64,
		"n_hidden_layers": 5
	}
}

Thanks in advance!

compiler errors: common_device.h(75): error: more than one conversion function from "tcnn::network_precision_t" to a built-in type applies:

tiny-cuda-nn/include/tiny-cuda-nn/common_device.h(75): error: more than one conversion function from "tcnn::network_precision_t" to a built-in type applies:
function "__half::operator float() const"
function "__half::operator short() const"
function "__half::operator unsigned short() const"
function "__half::operator int() const"
function "__half::operator unsigned int() const"
function "__half::operator long long() const"
function "__half::operator unsigned long long() const"
function "__half::operator __nv_bool() const"
detected during:
instantiation of "void tcnn::warp_activation<T,fragment_t>(tcnn::Activation, const fragment_t &, fragment_t &) [with T=tcnn::network_precision_t, fragment_t=tcnn::vector_fragment_t<tcnn::network_precision_t, 8U>]"
(245): here
instantiation of "void tcnn::kernel_activation(uint32_t, tcnn::Activation, const T *, T *) [with T=tcnn::network_precision_t, N=8U]"
(287): here
instantiation of "void tcnn::activation_gpu(cudaStream_t, uint32_t, tcnn::Activation, const T *, T *) [with T=tcnn::network_precision_t]"

environment:
ubuntu 18.04
gtx 1080
g++ 9.4.0
cuda-11.0

Why does the fully fused MLP use a skew?

Hi,

I was just reading through the code in fully_fused_mlp.cu to try to better understand what makes tiny-cuda-nn so fast. The overall approach makes sense to me, with a single thread block handling 128 rows of the input batch (computed 16 rows at a time), and each warp within that thread block performing the multiplication with one column of the layer weights. However, I see that when shared memory is used (either to store the first layer's weights, or to store the intermediate activations), there's a SKEW parameter that gets used to stride the data. From what I can tell, when the feature size is a multiple of 16, this makes the data go from being densely packed to having 2 empty __half after each 8 __half value, increasing the shared memory usage by 25%. I wasn't able to find a place where the empty __half values get used, so I'm posting this issue to ask what the SKEW parameter is there for. My guess is that it's something related to the performance optimizations for tiny-cuda-nn, but if that's the case I'm not sure why this helps. I'd appreciate any explanation you could provide :).

Thanks for the wonderful library!
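One common reason for padding rows in shared memory is to avoid bank conflicts; whether that is the motivation for SKEW in fully_fused_mlp.cu is not confirmed on this page. The sketch below only illustrates the general effect. It assumes 32 shared memory banks of 4 bytes each, 2-byte __half values, and an illustrative skew of 8 values per row; the function name is hypothetical.

```python
BANKS = 32       # number of shared memory banks (assumed)
BANK_BYTES = 4   # width of one bank in bytes (assumed)
HALF_BYTES = 2   # sizeof(__half)

def row_start_banks(width, skew, n_rows=8):
    """Bank index holding the first element of each row for a given row stride."""
    stride_bytes = (width + skew) * HALF_BYTES
    return [(r * stride_bytes // BANK_BYTES) % BANKS for r in range(n_rows)]

# Densely packed rows of 64 halves span exactly 32 banks,
# so every row starts in the same bank.
dense = row_start_banks(width=64, skew=0)
print(dense)    # [0, 0, 0, 0, 0, 0, 0, 0]

# With 8 padding halves per row, consecutive rows start in different banks.
skewed = row_start_banks(width=64, skew=8)
print(skewed)   # [0, 4, 8, 12, 16, 20, 24, 28]
```

In the dense layout, threads that read the same column of different rows all hit one bank and are serialized; the padded layout spreads those accesses across banks at the cost of some unused shared memory.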

CUDA error occurred

Hi, I have tried to run the mlp_learning_an_image demo with a 2080 GPU. However, a CUDA error occurred.

D:\codes\tiny-cuda-nn\build>mlp_learning_an_image.exe ../data/images/albert.exr ../data/config.json
Loading custom json config '../data/config.json'.
Saved exr file. [ reference.exr ]
Created NetworkWithInputEncoding with dimensionality Encoding(2,0)->Network(128)->Output(3).
Trainer: Initializing 83968 params and resetting training.
FullyFusedMLP: initializing 83968 params
Beginning optimization with 10000000 training steps.
GPUMatrix: Allocating 14680064 bytes shared among 5 matrices.
GPUMatrix: Allocating 16777216 bytes shared among 1 matrices.
GPUMatrix: Allocating 167772160 bytes shared among 10 matrices.
GPUMatrix: Allocating 16777216 bytes shared among 1 matrices.
GPUMatrix: Allocating 169869312 bytes shared among 11 matrices.
CUTLASS GEMM: Allocating temporary workspace of 131072 bytes.
CUTLASS GEMM: Allocating temporary workspace of 1048576 bytes.
CUTLASS GEMM: Allocating temporary workspace of 1048576 bytes.
CUTLASS GEMM: Allocating temporary workspace of 1048576 bytes.
CUTLASS GEMM: Allocating temporary workspace of 1048576 bytes.
CUTLASS GEMM: Allocating temporary workspace of 1048576 bytes.
reduce_sum: Allocating temporary workspace with size 65536 bytes.
Step#0: loss=6.73637 time=15516[s]
GPUMatrix: Allocating 268435456 bytes shared among 1 matrices.
GPUMatrix: Allocating 570425344 bytes shared among 3 matrices.
CUTLASS GEMM: Freeing temporary workspace of 131072 bytes.
CUTLASS GEMM: Freeing temporary workspace of 1048576 bytes.
CUTLASS GEMM: Freeing temporary workspace of 1048576 bytes.
CUTLASS GEMM: Freeing temporary workspace of 1048576 bytes.
CUTLASS GEMM: Freeing temporary workspace of 1048576 bytes.
CUTLASS GEMM: Freeing temporary workspace of 1048576 bytes.
Uncaught exception: CUDA Error: cudaFuncSetAttribute(kernel_mlp_fused<WIDTH, BLOCK_DIM_Z, N_ITERS, half, ACTIVATION, INFERENCE>, cudaFuncAttributeMaxDynamicSharedMemorySize, (int)shmem_size) failed with error invalid argument

I have already changed the GPU information in CMakeLists.txt and include/tiny-cuda-nn/cutlass_matmul.h.

Wrong config option in sample code

In line 150 of samples/mlp_learning_an_image.cu the following code is used to set the number of layers:

{"n_layers", 4},

However, this is ignored and the default is used instead, as the correct name for this option is n_hidden_layers. No warning is given either, since unknown options are silently ignored.

PS: Proper documentation for the config would be very nice, as currently one has to look up the available options in the code. The ability to print the configuration actually in use would also help with debugging and avoiding errors like this one.
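Until something like that exists, a user-side check can catch misspelled options before the config is handed to the library. The sketch below is not part of tiny-cuda-nn; the set of known keys is copied from the network configs shown on this page and may be incomplete, and the helper name is hypothetical.

```python
# Assumption: network options seen in the example configs on this page.
KNOWN_NETWORK_KEYS = {
    "otype", "activation", "output_activation", "n_neurons", "n_hidden_layers",
}

def unknown_keys(config, known_keys):
    """Return the keys of config that are not in known_keys, sorted."""
    return sorted(set(config) - set(known_keys))

network_config = {"otype": "FullyFusedMLP", "n_neurons": 64, "n_layers": 4}

for key in unknown_keys(network_config, KNOWN_NETWORK_KEYS):
    print(f"warning: unknown network option '{key}' will be ignored")
```

Run on the config above, this flags n_layers, the misspelled option from the sample.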

Problems running PyTorch bindings

Hello everyone, thank you for sharing this great technique! I have been having problems building the Python bindings from source.
I tried running python setup.py install, but got the following error message: error_log.txt.

My system setup is as follows:
Windows 10
GTX 1060
CUDA 11.3
Visual Studio 2019
Cmake 3.22.1
PyTorch 1.10.2

Thanks for the attention :)

Will "nerf_network.h" become a default header file in tiny-cuda-nn?

Hi, thanks for your great work and for making it open source.

I was trying to implement the NeRF network in C++ using this library, following your great paper "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding". Then I saw that nerf_network.h in the instant-ngp repo contains the NeRF network built on tiny-cuda-nn.

Is it possible to make nerf_network.h a default header file in tiny-cuda-nn? It contains most of what is needed to build a NeRF model. However, I saw that the license of the instant-ngp repo differs from that of the tiny-cuda-nn repo, and I am afraid that copying the source code of nerf_network.h would violate the license.

Fused MLP

For the fused MLP, is it true that the more layers there are, the greater the benefit of fusing?

Creating fully fused with existing weights

Hi, thanks for this! I cannot seem to find anything in the docs about how to create/init a network with existing weights, or even export and store trained weights. Maybe I've missed that. Could anyone point me to the docs for that or give a hint?

build failing

Build is failing

[ 5%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
make[2]: *** [src/CMakeFiles/tiny-cuda-nn.dir/build.make:76: src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:134: src/CMakeFiles/tiny-cuda-nn.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Multi-GPU support for the PyTorch bindings?

Hi,

I found that the following code would fail:

import torch as th
import tinycudann as tcnn
config = {
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 1
        }
net = tcnn.Network(16, 16, config)
net = net.to("cuda:1")
out = net(th.rand((256, 16), device="cuda:1"))

It seems the module does not have proper support to run on a different gpu even if we have called .to(device). Is it possible to fix this?

In addition, I also tried using torch.nn.DataParallel together with the hash encoding & tiny mlp. They seem to fail in such use cases. Is it possible to fix this? Thanks a lot!

Falcor version

Thanks for the great work.
I want to integrate NRC into the Falcor rendering framework. Which Falcor version does your implementation use?

Does sample code with one-blob encoding still work?

Thanks for sharing the great library.

I tried the sample code using the image in the data directory and have confirmed the program instantly learns the image with "HashGrid" encoding (config_hash.json). However, with one-blob encoding (config_oneblob.json), the program constantly produces completely black images.

I bisected to find the revision that introduced this issue, and it seems to be c88c1dcf81d89e0b4217ae4aaac34c963bf29d86 (the first revision for v1.2).

Does the sample with one-blob encoding still work in environments other than mine?

My environment:
Windows 10 21H2 (19044.1526)
Visual Studio Community 2022 (17.1.0)
cmake 3.21.4
RTX 3080 10GB
CUDA 11.6
Driver: 511.79

Thanks,

Compilation error when building bindings/torch

Hi there,
First, thanks for the really cool lib.

When I try to build the torch bindings, I get the following error:

/mnt/DeepDrive/Projects/tiny-cuda-nn/src/fully_fused_mlp.cu(416): error: explicit type is missing ("int" assumed)

/mnt/DeepDrive/Projects/tiny-cuda-nn/src/fully_fused_mlp.cu(416): error: expected a ")"

/mnt/DeepDrive/Projects/tiny-cuda-nn/src/fully_fused_mlp.cu(496): error: explicit type is missing ("int" assumed)

/mnt/DeepDrive/Projects/tiny-cuda-nn/src/fully_fused_mlp.cu(496): error: expected a ")"

FAILED: /mnt/DeepDrive/Projects/tiny-cuda-nn/bindings/torch/src/fully_fused_mlp.o
/home/pierre/anaconda3/envs/torch-ngp/bin/nvcc -I/mnt/DeepDrive/Projects/tiny-cuda-nn/include -I/mnt/DeepDrive/Projects/tiny-cuda-nn/dependencies -I/home/pierre/anaconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/include -I/home/pierre/anaconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pierre/anaconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/include/TH -I/home/pierre/anaconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/include/THC -I/home/pierre/anaconda3/envs/torch-ngp/include -I/home/pierre/anaconda3/envs/torch-ngp/include/python3.8 -c -c /mnt/DeepDrive/Projects/tiny-cuda-nn/src/fully_fused_mlp.cu -o /mnt/DeepDrive/Projects/tiny-cuda-nn/bindings/torch/src/fully_fused_mlp.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++14 --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing -DTCNN_MIN_GPU_ARCH=61 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61

Configuration:
ubuntu 18.04
NVidia GeForce 1080ti
Driver Version 470.103.01
cuda 11.3.1
torch 10.2

Note: I have tried the solution given in #45, but it does not seem to work.

Thanks in advance,

Pierre

Weight normalization

Hello. I would like to ask, do you plan to implement the weight normalization? Thank you.

Got cutlass error: Error Internal at: 363, when trying to run samples/mlp_learning_an_image_pytorch.py

Hi, thank you for your PyTorch extension!
When I tried to run samples/mlp_learning_an_image_pytorch.py, I got an error message:

Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
NetworkWithInputEncoding(n_input_dims=2, n_output_dims=3, seed=1337, dtype=torch.float32, hyperparams={'encoding': {'base_resolution': 16, 'interpolation': 'Linear', 'log2_hashmap_size': 15, 'n_features_per_level': 2, 'n_levels': 16, 'otype': 'Grid', 'per_level_scale': 1.5, 'type': 'Hash'}, 'network': {'activation': 'ReLU', 'n_hidden_layers': 2, 'n_neurons': 64, 'otype': 'CutlassMLP', 'output_activation': 'None'}, 'otype': 'NetworkWithInputEncoding'})
Writing 'reference.jpg'... done.
Beginning optimization with 10000000 training steps.
samples/mlp_learning_an_image_pytorch.py:74: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
xs = xs * torch.tensor([shape[1], shape[0]], device=xs.device).float()
samples/mlp_learning_an_image_pytorch.py:74: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
xs = xs * torch.tensor([shape[1], shape[0]], device=xs.device).float()
Got cutlass error: Error Internal at: 363

Maybe there is something wrong with my environment?

My environment:
Ubuntu 20.04.4 LTS
GeForce GTX 1080 Ti
CUDA 11.0 / Driver Version: 470.86
pytorch 1.7.1+cu110
cmake 3.22.2
I installed tinycudann by running python setup.py install.

Thank you

Compilation issues

Hello,

I'm having trouble compiling this project on my system. I'm using Ubuntu 20.04, RTX 2070 super, CUDA 11.6/ Driver Version: 510.47, cmake 3.23.0-rc1 and gcc 9.3.0.

The errors I get are when running:
tiny-cuda-nn$ cmake --build build --config RelWithDebInfo -j 16
and I see:

[  5%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/common_device.cu.o
[ 15%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o
[ 15%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/cpp_api.cu.o
[ 21%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/cutlass_mlp.cu.o
[ 26%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/encoding.cu.o
[ 31%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/cutlass_resnet.cu.o
[ 36%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/loss.cu.o
[ 42%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/network.cu.o
[ 47%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/object.cu.o
[ 52%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/optimizer.cu.o
[ 57%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/reduce_sum.cu.o
[ 63%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/fully_fused_mlp.cu.o
/home/a/Downloads/tiny-cuda-nn-1.4/src/fully_fused_mlp.cu(416): error: explicit type is missing ("int" assumed)

/home/a/Downloads/tiny-cuda-nn-1.4/src/fully_fused_mlp.cu(416): error: expected a ")"

/home/a/Downloads/tiny-cuda-nn-1.4/src/fully_fused_mlp.cu(496): error: explicit type is missing ("int" assumed)

/home/a/Downloads/tiny-cuda-nn-1.4/src/fully_fused_mlp.cu(496): error: expected a ")"

etc ... I have attached the full output for reference err.txt

Ultimately I would like to use tiny-cuda-nn with https://github.com/NVlabs/instant-ngp but I see the same problems with tiny-cuda-nn when compiling that also. Any help with this would be appreciated.

Cheers,
Alex

Other neural architectures?

Hi Thomas,

Really nice work!!! It's great to see that neural networks have been accelerated so much on rendering.

I'm not a GPU expert, so I'm wondering whether this can be extended to architectures other than MLPs, such as CNNs. If so, that may benefit a lot of neural rendering work that uses convolutions on images.

Thanks,
Shilin

[pytorch extension] call forward multiple times before backward

Hi,
Thanks for the PyTorch extension! In a simple test, I found that tcnn.Network doesn't support calling forward multiple times before backward, which can be useful sometimes.
It seems to be caused by the intermediate buffers (m_forward) being saved as an attribute of the torch.nn.Module instead of in the ctx of the torch.autograd.Function. In any case, maybe the error message could be clearer?

A simple example:

import torch
import tinycudann as tcnn

net = tcnn.Network(
    n_input_dims=3,
    n_output_dims=3,
    network_config={
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 16,
        "n_hidden_layers": 2,
    },
).cuda()

x = torch.rand(256, 3, device='cuda')
y = net(x)
y.sum().backward() # OK


x2 = torch.rand(256, 3, device='cuda')
y = net(x)
y2 = net(x2)
(y + y2).sum().backward() # RuntimeError: Must call forward() before calling backward()

Loading and saving weights of pytorch wrapper of tiny-cuda-nn

Hi! I'm very excited about the PyTorch version of tiny-cuda-nn.
I'd like to test it on an inference task with a pre-trained model.
How can I save the weights of a trained model in Python?
Also, how can I load the weights of a pre-trained model from the saved weight file?

Do you have any advice on how I could go about accomplishing this?

Thanks :)

fatal error: filesystem: No such file or directory

json.hpp:3954:14: fatal error: filesystem: No such file or directory
#include <filesystem>
         ^~~~~~~~~~~~
compilation terminated.
src/CMakeFiles/tiny-cuda-nn.dir/build.make:103: recipe for target 'src/CMakeFiles/tiny-cuda-nn.dir/cutlass_mlp.cu.o' failed
make[2]: *** [src/CMakeFiles/tiny-cuda-nn.dir/cutlass_mlp.cu.o] Error 1
CMakeFiles/Makefile2:133: recipe for target 'src/CMakeFiles/tiny-cuda-nn.dir/all' failed
make[1]: *** [src/CMakeFiles/tiny-cuda-nn.dir/all] Error 2

Environment:
Ubuntu 18.04
CUDA 11.0
GTX 1080

The backward error of SphericalHarmonics encoding

Hi Thomas,

Thanks for your amazing work.

I ran into a problem when backpropagating through the output of the SphericalHarmonics module.
Here is the code:

import torch
import torch.nn.functional as F
import tinycudann as tcnn

sh_enc_degree = 4

sh_encoder = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={"otype": "SphericalHarmonics", "degree": sh_enc_degree},
    dtype=torch.float32,
)

inputs = F.normalize(torch.randn(100, 3), dim=-1).cuda()
inputs.requires_grad_()
outputs = sh_encoder(inputs)

outputs.backward(torch.zeros_like(outputs))

And here is the error in console:

Traceback (most recent call last):
  File "test_sh_encoder.py", line 17, in <module>
    outputs.backward(torch.zeros_like(outputs))
  File "xxx/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "xxx/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
  File "xxx/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "xxx/site-packages/torch/autograd/function.py", line 210, in wrapper
    outputs = fn(ctx, *args)
  File "xxx/tiny-cuda-nn/bindings/torch/tinycudann/modules.py", line 52, in backward
    input_grad, weight_grad = ctx.native_tcnn_module.bwd(ctx.native_ctx, input, params, output, scaled_grad)
RuntimeError: Encoding: forward(prepare_input_gradients) must be called before backward(dL_dinput)

I see that kernel_sh_backward is implemented in spherical_harmonics.h, so maybe the problem is in the PyTorch autograd wrapper?

By contrast, the backward pass of HashGrid works without errors. Thanks :)

Benchmarking

Hi! First, thanks for sharing this! It's super impressive.

I'm trying to benchmark tiny-cuda-nn on clang-cuda, and I'd like to compare it with the numbers in the graph in the README.md. What were the parameters used to generate that graph? Is it just running both benchmarks on 'data/config.json' and changing the number of neurons from 128 to 64?

Thanks!

CPU support

Hello.

I was just curious if you guys are planning on developing CPU support as well.

Thanks.

Win10 / VS 2019 build error: nvcc.exe (...) exited with code 1

Hi, I am having an issue with Win10 / VS 2019. I run cmake and everything is fine except for one warning. Then I open tiny-cuda-nn.sln and run Build Solution, which exits with multiple errors (cmake output at the bottom). Any hints? I am not a Windows person; my development knowledge on that platform is scarce.

EDIT: Just double checked, the issue only seems to happen on master. Downloading the release zip and building that works just fine.

VS Error:

Severity	Code	Description	Project	File	Line	Suppression State
Error	MSB3721	The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc.exe" -gencode=arch=compute_86,code=\"compute_86,compute_86\" -gencode=arch=compute_86,code=\"sm_86,compute_86\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu   -I"C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\include" -I"C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\dependencies" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include"     --keep-dir x64\Debug  -maxrregcount=0  --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /Fd"C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\src\Debug\tiny-cuda-nn.pdb" /FS /Zi /RTC1 /MDd /GR" -o tiny-cuda-nn.dir\Debug\common.obj "C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\src\common.cu"" exited with code 1.	tiny-cuda-nn	C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 11.6.targets	790	

cmake:

-- Selecting Windows SDK version 10.0.17763.0 to target Windows 10.0.19043.
-- The CUDA compiler identification is NVIDIA 11.6.55
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Targeting GPU architectures: 86
CMake Warning (dev) at CMakeLists.txt:120 (set):
  Cannot set "TCNN_DEFINITIONS": current scope has no parent.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Configuring done
-- Generating done
-- Build files have been written to: C:/Users/rootkid/Documents/Unreal Projects/laif5/external/tiny-cuda-nn

Help implementing 3D SDF/UDF fitting

Hi,
This framework is really amazing, thanks a lot for sharing it.

I've been playing around with samples/mlp_learning_an_image.cu and I was wondering whether it could be adapted to fit a point cloud or a mesh by fitting its UDF/SDF.
I'm not really an expert in CUDA programming. Do you think it's doable with reasonable effort? Do you have any suggestions?
I was thinking about computing the coordinates and ground-truth values in Python, saving them in NumPy files, and then loading those files inside the CUDA program (with cnpy). After that, I'm not sure about the next steps. In the image case, you create a CUDA texture that seems quite specific to 2D images; how should I adapt it to 3D data?

Thanks in advance for any kind of help,
Luca

Do you plan to have a python wrapper for the fully fused MLP?

Hi, I am not an expert in CUDA coding but have more experience with PyTorch/TensorFlow.
Do you have any plans to provide a Python (more specifically PyTorch) wrapper for this code?
Alternatively, could you point to where the forward/backward functions of this MLP implementation live, so that we could incorporate them into other Python code?

Thanks a lot

What does "fully fused" actually mean?

What does "fully fused" actually mean?

I can't find a technical definition of this anywhere.

Why is this faster/ideal?

How is this different to pytorch/jax/TF2?

`linear_kernel` truncates `n_elements` to `uint32_t`

Hi folks, the linear_kernel call casts n_elements to uint32_t (a downcast for wider types, or an upcast for narrower ones) here:

https://github.com/NVlabs/tiny-cuda-nn/blob/master/include/tiny-cuda-nn/common.h#L288

So if I call this with more elements than the maximum value of uint32_t, the count overflows.

Could we not just keep the type of n_elements the same, e.g. like this:

--- a/include/tiny-cuda-nn/common.h
+++ b/include/tiny-cuda-nn/common.h
@@ -285,7 +285,7 @@ inline void linear_kernel(K kernel, uint32_t shmem_size, cudaStream_t stream, T
        if (n_elements <= 0) {
                return;
        }
-       kernel<<<n_blocks_linear(n_elements), n_threads_linear, shmem_size, stream>>>((uint32_t)n_elements, args...);
+       kernel<<<n_blocks_linear(n_elements), n_threads_linear, shmem_size, stream>>>(n_elements, args...);
 }
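
The truncation described in this issue can be illustrated outside CUDA. The following is a minimal Python sketch, not tiny-cuda-nn code: the helper name `cast_to_uint32` is hypothetical, and it mimics the `(uint32_t)n_elements` cast by keeping only the low 32 bits of the value.

```python
UINT32_MAX = 2**32 - 1  # largest value a uint32_t can hold


def cast_to_uint32(n_elements: int) -> int:
    """Mimic the C++ cast `(uint32_t)n_elements` (hypothetical helper)."""
    # Conversion to an unsigned 32-bit integer keeps the value modulo 2**32,
    # which is the same as masking off everything above the low 32 bits.
    return n_elements & UINT32_MAX


small = 1_000_000        # fits in 32 bits, so the cast leaves it unchanged
large = UINT32_MAX + 5   # does not fit in 32 bits, so the cast wraps around

print(cast_to_uint32(small))
print(cast_to_uint32(large))
```

With an element count just above the 32-bit limit, the kernel would be told to process only a handful of elements. Note that dropping the cast, as the patch proposes, only helps if the kernel's own element-count parameter is wide enough to hold the value; whether that holds for every kernel launched this way is an assumption not checked here.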

compile error in win10 with vs2019.16.11 and cuda 11.0

[E:\cv\instant-ngp-master\build\dependencies\tiny-cuda-nn\src\tiny-cuda-nn.vcxproj]
E:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\include\xmemory(309): error : no operator "+" matches these operands [E:\cv\instant-ngp-master\build\dependencies\tiny-cuda-nn\src\tiny-cuda-nn.vcxproj]

E:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\include\xmemory(919): error : expected a "(" [E:\cv\instant-ngp-master\build\dependencies\tiny-cuda-nn\src\tiny-cuda-nn.vcxproj]

Compiler Errors WRT json.hpp and CPP version 14 requirement

The problem seems to stem from this warning:
CUDACOMPILE : nvcc warning : The -std=c++14 flag is not supported with the configured host compiler. Flag will be ignored. [P:\opensource\tiny-cuda-nn\build\src\tiny-cuda-nn.vcxproj]
json.hpp needs C++14.
Is there a way to tweak the nvcc compiler settings?

This also happens when compiling instant-ngp.

I tried to build on multiple systems; specs are listed below:

  • cuda 11.4, sm86 - 3090
  • cuda 10.2 sm70 - TitanV

build output:
tiny-cuda-nn_build.log
