This repository contains a university assignment for the High Performance Computation course.
In this assignment we implemented a simplified clone of the PyTorch deep learning library using NumPy and CUDA. The following features are supported:
- General:
  - Dynamic graphs with backward propagation.
  - SGD optimizer (with momentum).
  - `Module` class for building neural network modules.
- CPU operations:
  - ReLU.
  - Log.
  - Element-wise addition.
  - Element-wise multiplication.
  - Matrix multiplication.
  - Softmax.
  - Summation of tensors along several axes.
- CUDA operations:
  - ReLU.
  - Element-wise addition.
  - Matrix multiplication (plain 2-D matrices only, no batch support).
  - Summation of tensors along several axes.
The above lists will be updated over time.
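To illustrate what "dynamic graphs with backward propagation" means in the list above, here is a minimal NumPy-only sketch. It is not KaruiFlow's actual code or API: the `Tensor` class and its methods are hypothetical names made up for this example, and it assumes that both operands of an operation have the same shape (no broadcasting).

```python
import numpy as np


class Tensor:
    """Hypothetical minimal tensor; the graph is recorded as operations run."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self.parents = parents    # tensors this one was computed from
        self.backward_fn = None   # pushes self.grad into the parents

    def __add__(self, other):
        out = Tensor(self.data + other.data, parents=(self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, parents=(self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out.backward_fn = backward_fn
        return out

    def relu(self):
        out = Tensor(np.maximum(self.data, 0.0), parents=(self,))

        def backward_fn():
            # The gradient passes only where the input was positive.
            self.grad += (self.data > 0) * out.grad

        out.backward_fn = backward_fn
        return out

    def sum(self):
        out = Tensor(self.data.sum(), parents=(self,))

        def backward_fn():
            # Every element contributes to the sum with weight 1.
            self.grad += np.ones_like(self.data) * out.grad

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Order the graph topologically, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = np.ones_like(self.data)
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()


x = Tensor([1.0, -2.0, 3.0])
w = Tensor([0.5, 0.5, 2.0])
b = Tensor([0.0, 0.0, -1.0])
loss = (x * w + b).relu().sum()
loss.backward()
```

After `loss.backward()`, `x.grad`, `w.grad` and `b.grad` hold the gradients of `loss` with respect to each input. The graph is "dynamic" because it is built as a side effect of ordinary Python expressions, so it can differ from one forward pass to the next.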
Since this is a PyTorch-like Python library, you can build arbitrary neural networks from the provided components and train them with gradient descent. You can also use it for general-purpose computation on the GPU.
Everything is exposed through a Python API built with Cython that (almost) matches the PyTorch API, so please use the PyTorch documentation as a reference for the functionality available in this project.
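As a rough illustration of gradient descent with momentum, the sketch below applies the update rule documented for PyTorch's `torch.optim.SGD` (without dampening, weight decay or Nesterov momentum) to plain NumPy arrays. The function `sgd_momentum_step` and its parameters are hypothetical and are not part of this project's API. The optimizer in this repository may differ in details.

```python
import numpy as np


def sgd_momentum_step(params, grads, velocities, lr=0.1, momentum=0.9):
    """Update `params` in place (hypothetical helper, for illustration only)."""
    for p, g, v in zip(params, grads, velocities):
        v *= momentum   # decay the previous velocity
        v += g          # accumulate the current gradient
        p -= lr * v     # step against the accumulated direction


# Minimise f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
velocity = np.zeros_like(w)
for _ in range(200):
    sgd_momentum_step([w], [w.copy()], [velocity])
```

Keeping one velocity buffer per parameter is what distinguishes this from plain SGD: with `momentum=0.0` the buffer equals the current gradient and the update reduces to `p -= lr * g`.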
Folders:
- Architecture - contains several pictures depicting the overall architecture of the project.
- KaruiFlow - the core code written in C++ and CUDA.
- KaruiFlowCython - Cython wrapper for the C++ code.
The project has several dependencies:
- CUDA 11.2.
- cuBLAS.
- cuTENSOR.
All of the dependencies are located in the `KaruiFlow/dependencies` folder. You have to download the corresponding `.lib` files and put them into the `lib` folders.
The project is compiled with MSVC 2019. First build the KaruiFlow solution, then run `pip install -e .` in the KaruiFlowCython directory. At this point you're good to go.
The project is still in development and has a very limited set of features. At the moment, most of the features are implemented in NumPy only.