QuickMLP: Fused Networks for Scene Representation Networks

This project contains fused CUDA kernels for evaluating scene representation networks with custom activation functions, various input encodings, and flexible layer specifications.

It is the successor of fV-SRN (https://github.com/shamanDevel/fV-SRN), my previous project, and adapts further ideas from tiny-cuda-nn (https://github.com/NVlabs/tiny-cuda-nn), a concurrent development of such fused MLPs.

Features

| Feature | QuickMLP (this project) | fV-SRN | tiny-cuda-nn |
| --- | --- | --- | --- |
| Compilation | full CUDA code generation and compilation at runtime | partial CUDA runtime compilation | static CUDA compilation |
| Layer sizes | different sizes supported at every layer | limited layer sizes, all must be identical | flexible layer sizes, but all must be identical and limited to a pre-defined set |
| Bias | supported in the fully-connected layers | supported | not supported |
| Activation functions | fully customizable and user-definable, can differ at every layer | fixed set | fixed set |
| Training | fused training + inference | only inference is fused | fused training + inference |
| Kernel fusion | single kernel for input encoding + network evaluation | single kernel for input encoding + network evaluation | one kernel per input encoding, one kernel for the network |

In other words, this project builds upon the runtime-compilation idea of fV-SRN to provide fully fused kernels with maximal flexibility. The user can freely customize the network layer sizes and activation functions and arbitrarily plug them together. To support training as well, we adapt ideas from tiny-cuda-nn on how the weight update and backpropagation can be realized.

Example network specification

TBA

Performance

TBA

Compilation + Project Structure

Requirements

  • CUDA 11.x (tested with 11.6)
  • C++17-compatible compiler (tested with MSVC2019 and GCC 9.4.0)
  • PyTorch for the bindings, tested with version 1.11, but newer versions should work as well

Don't forget to clone this repository with submodules (git clone --recurse-submodules). If you forgot, you can initialize them afterwards with git submodule init && git submodule update.

C++ library

The C++ library is located in src_cpp/include and src_cpp/src.

To compile, add the root folder of QuickMLP as a subdirectory in your CMake project. Then, link against the target qmlp::qmlp-library.

Python / PyTorch bindings

Compile the PyTorch extension: Use setup.py!

  1. Activate your Python environment, if desired (virtualenv or conda).
  2. Go to the root directory of QuickMLP.
  3. Call pip install -e .
  4. Enjoy!

Note: right now, compilation is only possible in developer mode, i.e. the files in this folder are used directly and not copied into the Python installation. I haven't figured out yet how to copy the resource files (kernel sources) to the installation target in setup.py. Ideas, issues, and PRs are welcome!
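
After installation, a quick smoke test is to import the extension from a fresh Python shell. A minimal sketch, assuming the package is registered under the name qmlp (the actual module name is not confirmed here; check setup.py):

# Hypothetical smoke test -- the module name "qmlp" is an assumption,
# not a confirmed detail; check setup.py for the registered package name.
import torch  # the extension builds against PyTorch, so import it first
import qmlp   # assumed module name of the compiled extension

# The fused kernels require a CUDA-capable device.
assert torch.cuda.is_available()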

API Documentation

The following documentation is written for the Python bindings, but it holds true for the C++ library as well. Just change the class names from snake_case to CamelCase and you'll have the associated C++ class / method.

Example JSON specification of the network and encoding:

{
    "num_inputs": 3,
    "num_outputs": 1,
    "activation_specification": [
        "qmlp/builtin-activations.json"
    ],
    "encodings": [
        {
            "id": "identity",
            "start_in": 0,
            "n_in": 3
        }
    ],
    "network": [
        {
            "n_out": 16,
            "bias": false,
            "activation": "relu"
        },
        {
            "n_out": 1,
            "bias": false,
            "activation": "relu"
        }
    ],
	"options": {}  //<-- optional
}
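
To make this concrete, the sketch below shows how such a specification could be passed to the Python bindings and evaluated on a batch of inputs. The class name qmlp.FusedNetwork and its constructor signature are assumptions for illustration, not the confirmed API:

# Hypothetical usage sketch -- qmlp.FusedNetwork and its constructor
# signature are assumptions, not the confirmed QuickMLP API.
import json
import torch
import qmlp  # assumed module name of the compiled extension

spec = {
    "num_inputs": 3,
    "num_outputs": 1,
    "activation_specification": ["qmlp/builtin-activations.json"],
    "encodings": [{"id": "identity", "start_in": 0, "n_in": 3}],
    "network": [
        {"n_out": 16, "bias": False, "activation": "relu"},
        {"n_out": 1, "bias": False, "activation": "relu"},
    ],
}

net = qmlp.FusedNetwork(json.dumps(spec))  # assumed entry point

x = torch.rand(1024, 3, device="cuda")  # batch of 3D input positions
y = net(x)                              # fused encoding + network evaluation
print(y.shape)                          # expected: (1024, 1)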

TODO: json documentation for encoding+network

Compile Options

The compile options are key-value pairs in the options field of the network specification. The following options are available:

  • overwrite_blocksize_inference [int]: overwrites the kernel block size for the network inference. If unspecified (or negative), the maximal size is used
  • overwrite_blocksize_forward [int]: overwrites the kernel block size for the network forward kernel, i.e. inference with added storing of intermediate results for the backward kernels. If unspecified (or negative), the maximal size is used
  • overwrite_blocksize_backward [int]: overwrites the kernel block size for the network backward kernel. If unspecified (or negative), the maximal size is used
  • overwrite_blocksize_weight_update [int]: overwrites the kernel block size for the weight update kernels during the backward pass. If unspecified (or negative), the maximal size is used
  • skew_shared_memory [bool]: If true, skew the shared-memory layout. This reduces bank conflicts but requires 1.5x the amount of shared memory. Default: false
  • parallel_weight_update [bool]: If true, the weight-update kernels are launched in separate CUDA streams, allowing for possible parallel execution. Default: true
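
Continuing the Python sketch from above, the options field could be filled in like this; the values shown are illustrative examples, not tuned recommendations:

# Illustrative options block (values are examples, not recommendations).
spec["options"] = {
    "overwrite_blocksize_inference": -1,    # negative -> use the maximal block size
    "overwrite_blocksize_forward": -1,
    "overwrite_blocksize_backward": -1,
    "overwrite_blocksize_weight_update": -1,
    "skew_shared_memory": False,   # trade 1.5x shared memory for fewer bank conflicts
    "parallel_weight_update": True,  # launch weight-update kernels in separate streams
}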

Roadmap

Encodings:

  • Identity
  • 1D-6D Dense & Hash Grid
  • Line Integration
  • Spherical Harmonics
  • Fourier Features

Activations:

  • ReLU, CeLU, Sine, Identity
  • Snake and other trigonometric ones
  • sigmoid, ...

Network:

  • Fused forward evaluation
  • Input Encoding + Network fusion
  • Proper padding if input/output channels are not a multiple of 16
  • Proper handling if the batch size is not a multiple of 16
  • Gradients for the input
  • Gradients for the weight matrices
  • Gradients for the bias vector

License

QuickMLP is shipped under the permissive MIT license.

Bug reports

If you find bugs in the library, feel free to open an issue. I will continue to use this library in future projects and will therefore keep improving and extending it. Of course, pull requests are more than welcome.
