
gorgonia / gorgonia

Gorgonia is a library that helps facilitate machine learning in Go.

Home Page: https://gorgonia.org/

License: Apache License 2.0

Go 96.34% C 2.81% Assembly 0.02% Python 0.05% Cuda 0.78%
machine-learning artificial-intelligence neural-network computation-graph differentiation golang go gradient-descent gorgonia deep-learning

gorgonia's Introduction


Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily. If this sounds like Theano or TensorFlow, it's because the idea is quite similar. Specifically, the library is fairly low-level, like Theano, but has higher-level goals, like TensorFlow.

Gorgonia:

  • Can perform automatic differentiation
  • Can perform symbolic differentiation
  • Can perform gradient descent optimizations
  • Can perform numerical stabilization
  • Provides a number of convenience functions to help create neural networks
  • Is fairly quick (comparable in speed to Theano and TensorFlow)
  • Supports CUDA/GPGPU computation (OpenCL not yet supported, send a pull request)
  • Will support distributed computing

Goals

The primary goal for Gorgonia is to be a highly performant machine learning/graph computation library that can scale across multiple machines. It should bring the appeal of Go (a simple compilation and deployment process) to the ML world. It is currently a long way from that goal; however, the first steps are already in place.

The secondary goal for Gorgonia is to provide a platform for the exploration of non-standard deep-learning and neural network-related things. This includes things like neo-hebbian learning, corner-cutting algorithms, evolutionary algorithms, and the like.

Why Use Gorgonia?

The main reason to use Gorgonia is developer comfort. If you're using a Go stack extensively, you now have the ability to create production-ready machine learning systems in an environment that you are already familiar and comfortable with.

ML/AI at large is usually split into two stages: the experimental stage, where one builds various models and tests and retests them; and the deployment stage, where a model, after being tested and played with, is deployed. This split necessitates different roles, like data scientist and data engineer.

Typically, the two stages have different tools: Python (PyTorch, etc.) is commonly used for the experimental stage, and the model is then rewritten in a more performant language such as C++ (using dlib, mlpack, etc.). Of course, nowadays the gap is closing, and people frequently share tools between the two. TensorFlow is one such tool that bridges the gap.

Gorgonia aims to do the same, but for the Go environment. Gorgonia is currently fairly performant - its speeds are comparable to PyTorch's and TensorFlow's CPU implementations. GPU implementations are a bit finicky to compare due to the heavy cgo tax, but rest assured that this is an area of active improvement.

Getting started

Installation

The package is go-gettable: go get -u gorgonia.org/gorgonia.

Gorgonia is compatible with Go modules.
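
For a module-based project, the setup amounts to the following (example.com/myproject is a placeholder module path):

mkdir myproject && cd myproject
go mod init example.com/myproject
go get -u gorgonia.org/gorgonia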

Documentation

Up-to-date documentation, references, and tutorials are present on the official Gorgonia website at https://gorgonia.org.

Keeping Updated

The Gorgonia project has a Slack channel on gopherslack, as well as a Twitter account. Official updates and announcements will be posted to those two sites.

Usage

Gorgonia works by creating a computation graph and then executing it. Think of it as a programming language, but one limited to mathematical functions and with no branching capability (no if/then, no loops). This is the dominant paradigm the user should get used to thinking in. The computation graph is an AST.

Microsoft's CNTK, with its BrainScript, is perhaps the best at exemplifying the idea that building a computation graph and running the computation graphs are different things and that the user should be in different modes of thought when going about them.

Whilst Gorgonia's implementation doesn't enforce the separation of thought as far as CNTK's BrainScript does, the syntax does help a little bit.

Here's an example - say you want to define a math expression z = x + y. Here's how you'd do it:

package gorgonia_test

import (
	"fmt"
	"log"

	. "gorgonia.org/gorgonia"
)

// Basic example of representing mathematical equations as graphs.
//
// In this example, we want to represent the following equation
//		z = x + y
func Example_basic() {
	g := NewGraph()

	var x, y, z *Node
	var err error

	// define the expression
	x = NewScalar(g, Float64, WithName("x"))
	y = NewScalar(g, Float64, WithName("y"))
	if z, err = Add(x, y); err != nil {
		log.Fatal(err)
	}

	// create a VM to run the program on
	machine := NewTapeMachine(g)
	defer machine.Close()

	// set initial values then run
	Let(x, 2.0)
	Let(y, 2.5)
	if err = machine.RunAll(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%v", z.Value())
	// Output: 4.5
}

You might note that it's a little more verbose than other packages of a similar nature. For example, instead of compiling to a callable function, Gorgonia specifically compiles into a program, which requires a *TapeMachine to run. It also requires a manual Let(...) call.

The author would like to contend that this is a Good Thing - it shifts one's thinking towards the machine that runs the program, which helps a lot in figuring out where things might go wrong.
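
For quick experiments, the package also provides a Must helper that panics if the wrapped call returns an error (you'll see it used in the issues later in this document). A sketch of the same z = x + y expression using it:

// A terser sketch of the example above using Must. Must panics on
// error, so prefer the explicit error-handling form in production code.
g := NewGraph()
x := NewScalar(g, Float64, WithName("x"))
y := NewScalar(g, Float64, WithName("y"))
z := Must(Add(x, y)) // panics instead of returning an error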

Additionally, there is no support for branching - that is to say, there are no conditionals (if/else) or loops. The aim is not to build a Turing-complete computer.


More examples are present in the examples subfolder of the project, and step-by-step tutorials are present on the main website.

Using CUDA

Gorgonia comes with CUDA support out of the box. Please see the reference documentation on how CUDA works on the gorgonia.org website, or jump straight to the tutorial.
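
As a rough sketch of what this involves (the linked documentation is authoritative): CUDA support is gated behind a build tag, so a CUDA-enabled binary is built with

go build -tags='cuda' .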

About Gorgonia's development process

Versioning

We use SemVer 2.0.0 for our versioning. Before 1.0, Gorgonia's APIs are expected to change quite a bit. The API is defined by the exported functions, variables, and methods. For the developers' sanity, there are minor deviations from SemVer that we will apply before version 1.0. They are enumerated below:

  • The MINOR number will be incremented every time there is a breaking change to the API. This means any deletion, or any change in function signatures or interface methods, will lead to a change in the MINOR number.
  • Additive changes will NOT change the MINOR version number before version 1.0. This means that if new functionality is added without breaking the way you use Gorgonia, there will not be an increment in the MINOR version; there will be an increment in the PATCH version.

API Stability

Gorgonia's API is, as of right now, not considered stable. It will be stable from version 1.0 onwards.

Go Version Support

Gorgonia supports two versions below the master branch of Go. This means Gorgonia will support the current released version of Go and up to four previous versions - provided nothing breaks. Where possible, a shim will be provided (for things like the new sort APIs or math/bits, which came out in Go 1.9).

The current version of Go is 1.13.1. The earliest version Gorgonia supports is Go 1.11.x, but Gonum supports only Go 1.12+. Therefore, the minimum Go version needed to run the master branch is Go 1.12.

Hardware and OS supported

Gorgonia runs on:

  • linux/AMD64
  • linux/ARM7
  • linux/ARM64
  • win32/AMD64
  • darwin/AMD64
  • freeBSD/AMD64

If you have tested Gorgonia on other platforms, please update this list.

Hardware acceleration

Gorgonia uses some hand-written assembly to accelerate some mathematical operations. Unfortunately, only amd64 is supported at the moment.

Contributing

Obviously, since you are most probably reading this on GitHub, GitHub will form the major part of the workflow for contributing to this package.

See also: CONTRIBUTING.md

Contributors and Significant Contributors

All contributions are welcome. However, there is a new class of contributors, called Significant Contributors.

A Significant Contributor has shown a deep understanding of how the library works and/or its environs. Here are examples of what constitutes a Significant Contribution:

  • Wrote significant amounts of documentation on why/the mechanics of particular functions/methods and how the different parts affect one another
  • Wrote code and tests around the more intricately connected parts of Gorgonia
  • Wrote code and tests, and had at least 5 pull requests accepted
  • Provided expert analysis on parts of the package (for example, you may be a floating point operations expert who optimized one function)
  • Answered at least 10 support questions.

The Significant Contributors list will be updated once a month (if anyone even uses Gorgonia, that is).

How To Get Support

The best way to get support right now is to open a ticket on GitHub.

Frequently Asked Questions

Why are there seemingly random runtime.GC() calls in the tests?

The answer to this is simple: the design of the package uses CUDA in a particular way. Specifically, a CUDA device and context are tied to a VM, rather than to the package as a whole. This means that for every VM created, a different CUDA context is created per device. This way, all the operations will play nicely with other applications that may be using CUDA (this needs to be stress-tested, however).

The CUDA contexts are only destroyed when the VM gets garbage collected (with the help of a finalizer function). In the tests, about 100 VMs get created, and garbage collection can, for the most part, be considered random. This leads to cases where the GPU runs out of memory because too many contexts are in use.

Therefore, at the end of any test that may use the GPU, a runtime.GC() call is made to force garbage collection and free GPU memory.

In production, one is unlikely to start that many VMs, so this is not a problem. If it is for you, open a ticket on GitHub, and we'll look into adding a Finish() method for the VMs.

Licence

Gorgonia is licensed under a variant of Apache 2.0. It's the same as the Apache 2.0 licence, except that you may not profit commercially and directly from the package itself (for example, by providing commercial support for the package) unless you're a Significant Contributor. It's perfectly fine to profit directly from a derivative of Gorgonia (for example, if you use Gorgonia as a library in your own product).

Everyone is still allowed to use Gorgonia for commercial purposes (for example: using it in software for your business).

Dependencies

Gorgonia uses very few dependencies - and they're all pretty stable, so as of now there isn't a need for vendoring tools. This is the list of external packages that Gorgonia calls, ranked in order of how heavily this package relies on them (sub-packages are omitted):

Package | Used For | Vitality | Notes | Licence
--- | --- | --- | --- | ---
gonum/graph | Sorting the *ExprGraph | Vital. Removal means Gorgonia will not work | Development of Gorgonia is committed to keeping up with the most updated version | gonum license (MIT/BSD-like)
gonum/blas | Linear algebra operations in the tensor subpackage | Vital. Removal means Gorgonia will not work | Development of Gorgonia is committed to keeping up with the most updated version | gonum license (MIT/BSD-like)
cu | CUDA drivers | Needed for CUDA operations | Same maintainer as Gorgonia | MIT/BSD-like
math32 | float32 operations | Can be replaced by float32(math.XXX(float64(x))) | Same maintainer as Gorgonia; same API as the built-in math package | MIT/BSD-like
hm | Type system for Gorgonia | Gorgonia's graphs are pretty tightly coupled with the type system | Same maintainer as Gorgonia | MIT/BSD-like
vecf64 | Optimized []float64 operations | Can be generated in the tensor/genlib package. However, plenty of optimizations have been made/will be made | Same maintainer as Gorgonia | MIT/BSD-like
vecf32 | Optimized []float32 operations | Can be generated in the tensor/genlib package. However, plenty of optimizations have been made/will be made | Same maintainer as Gorgonia | MIT/BSD-like
set | Various set operations | Can be easily replaced | Stable API for the past year | set licence (MIT/BSD-like)
gographviz | Used for printing graphs | Graph printing is only vital to debugging. Gorgonia can survive without it, but with a major (arguably nonvital) feature loss | Last update 12th April 2017 | gographviz license (Apache 2.0)
rng | Helper functions to generate initial weights | Can be replaced fairly easily. Gorgonia can do without the convenience functions too | | rng license (Apache 2.0)
errors | Error wrapping | Gorgonia won't die without it. In fact, Gorgonia has also used goerrors/errors in the past | Stable API for the past 6 months | errors licence (MIT/BSD-like)
gonum/mat | Compatibility between Tensor and Gonum's Matrix | | Development of Gorgonia is committed to keeping up with the most updated version | gonum license (MIT/BSD-like)
testify/assert | Testing | Can do without, but it will be a massive pain in the ass to test | | testify license (MIT/BSD-like)

Various Other Copyright Notices

These are the packages and libraries that inspired and were adapted from in the process of writing Gorgonia (the Go packages that were used were already declared above):

Source | How it's Used | Licence
--- | --- | ---
Numpy | Inspired large portions. Directly adapted algorithms for a few methods (explicitly labelled in the docs) | MIT/BSD-like. Numpy licence
Theano | Inspired large portions. (Unsure: number of directly adapted algorithms) | MIT/BSD-like. Theano's licence
Caffe | im2col and col2im directly taken from Caffe. Convolution algorithms inspired by the original Caffe methods | Caffe licence

gorgonia's People

Contributors

3ygun, auxten, barthr, blackrez, c-bata, cfgt, chewxy, dcu, docmerlin, freakomonk, ifraixedes, jokebroker, jorgecarleitao, joshpattman, kabaka0, kortschak, lddl, lynic, markkremer, mattn, ndari, owulveryck, pa-m, pbarker, siquus, stock1218, syk-08, tantanchen, volmedo, wzzhu


gorgonia's Issues

CSV read/writing to Tensor

Currently Tensors are gobbable, and certain tensors have WriteNpy methods to write to numpy files. Writing to CSV seems like a good idea too.
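
A minimal sketch of what such a writer could look like for a row-major matrix, using only the standard library (WriteCSV and its layout assumptions are hypothetical, not part of the current API):

package tensorcsv

import (
	"encoding/csv"
	"io"
	"strconv"
)

// WriteCSV writes a row-major matrix of shape (rows, cols), backed by a
// flat []float64, as one CSV record per row.
func WriteCSV(w io.Writer, data []float64, rows, cols int) error {
	cw := csv.NewWriter(w)
	record := make([]string, cols)
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			record[j] = strconv.FormatFloat(data[i*cols+j], 'g', -1, 64)
		}
		if err := cw.Write(record); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}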

Rework `Op`

Currently, Op is not extensible by third parties who want to write their own ops. The main roadblock was the unexported methods; the remaining roadblock is the type system.

The Ideal Op interface should be this:

type Op interface {
	// metadata
	Type() Type
	Arity() int
	InferShape(...types.Shape) types.Shape
	ReturnsPtr() bool
	CallsExtern() bool
	OverwritesInput() int

	// the actual op
	Do(...Value) (Value, error)

	// serialization and shit
	WriteHash(h hash.Hash)
	Hashcode() uint32
	fmt.Stringer
}

Further optional op types:

type SymDiffOp interface {
	Op

	DiffWRT(int) []bool
	SymDiff(inputs Nodes, outputNode, gradNode *Node) (Nodes, error)
}
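
To make the extensibility goal concrete, here is a minimal sketch of a third-party op written against the ideal interface. The stub types (Type, Value, Shape) stand in for Gorgonia's real ones, and negOp itself is made up for illustration:

package opsketch

import (
	"fmt"
	"hash"
)

// Stub stand-ins for Gorgonia's real types, for illustration only.
type (
	Type  interface{}
	Value interface{}
	Shape []int
)

// negOp is a toy unary op that negates a float64.
type negOp struct{}

func (op negOp) Type() Type                   { return nil } // a real op returns its type signature here
func (op negOp) Arity() int                   { return 1 }
func (op negOp) InferShape(ss ...Shape) Shape { return ss[0] } // unary op: the shape passes through
func (op negOp) ReturnsPtr() bool             { return false }
func (op negOp) CallsExtern() bool            { return false }
func (op negOp) OverwritesInput() int         { return -1 } // -1: no input is overwritten

// Do performs the actual computation.
func (op negOp) Do(vs ...Value) (Value, error) {
	f, ok := vs[0].(float64)
	if !ok {
		return nil, fmt.Errorf("negOp: expected float64, got %T", vs[0])
	}
	return -f, nil
}

func (op negOp) WriteHash(h hash.Hash) { h.Write([]byte("neg")) }
func (op negOp) Hashcode() uint32      { return 0x6e6567 } // arbitrary but stable, for the sketch
func (op negOp) String() string        { return "neg" }    // satisfies the embedded fmt.Stringer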

How to Get There

  1. Export all the methods
  2. Add Arity() method
  3. Rework all the InferShape() methods
  4. Rework anything that calls SymDiff() and DiffWRT() to use the SymDiffOp interface
  5. Move type system to external package (see #26)
  6. Clean up Value types and interface (see #44)
  7. Move Op into its own package (Unfeasible)

Rework errors

This task is broken into two parts:

  • Wrap all errors in Gorgonia with the errors package, giving them meaningful error messages (a sketch follows this list).
  • Remove all the runtime.Caller() calls from the basic errors.
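
For the first part, wrapping with the errors package (github.com/pkg/errors, already listed in the Dependencies section above) looks roughly like this sketch; loadWeights and open are made-up stand-ins:

package main

import (
	"fmt"

	"github.com/pkg/errors"
)

// open is a made-up stand-in for an operation that can fail.
func open(path string) error { return fmt.Errorf("no such file: %s", path) }

func loadWeights(path string) error {
	if err := open(path); err != nil {
		// Wrapf attaches a message and records the stack trace at the
		// wrap site, replacing manual runtime.Caller() bookkeeping.
		return errors.Wrapf(err, "failed to load weights from %q", path)
	}
	return nil
}

func main() {
	fmt.Printf("%+v\n", loadWeights("weights.bin")) // %+v prints the stack trace
}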

Distributed Computing

There are many ways to do distributed computing for something like Gorgonia. There are a few things that need to be cleared up when discussing distributed neural networks.

Firstly, which part is distributed? The currently dominant methods basically work by splitting up the calculation of the gradients and the gradient updates across different parts of the network.

Other, more traditional systems have different batches being trained in parallel across the network - but this usually relies on special algorithms that can handle delays and latencies.

Or the entire neural network, if large enough, could be split up across the network. This is Google level engineering that I have no ability to emulate.

The more future-looking method involves synthetic/approximated gradients, functioning more like a database with locks and updates. I am personally in favour of this future-looking design. However, it is a deceptively simple problem and I have run into various hairy issues with this.

Of course, one can also combine the multiple notions of distributedness, but I think that may be a bit too ambitious.

Existing Implementations

These gradient descent methods lend themselves to being easily parallelized:

Things To Be Aware/Think About

  • Latency kills progress
  • CAP theorem - well, marginally. Distributed NNs are far from requiring consistency. In fact I'd argue that distributed NNs require linearizability the most
  • Network consensus - given the abundance of RAFT implementations in Go, I'd say this is one of the few problems to be least worried about.
  • CapnProto looks good, but everyone else is using Protobuf to do their talking. Why?

Fix up TensorDot

TensorDot() is currently broken and in the process of being rewritten. This needs to be fixed ASAP.

Sigmoid for Tensors is slow

This is mainly due to the fact that Tensor.Apply(sigmoidFn) is slow. There should be a way to optimize this for entire arrays.
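
One plausible direction - a sketch, not the package's actual fix - is an array-level kernel that makes a single pass over the flat backing slice, instead of dispatching a function value per element:

package vecops

import "math"

// sigmoidSlice applies the logistic function 1/(1+e^(-x)) in one pass
// over a flat float64 slice, avoiding the per-element function-value
// dispatch that makes Tensor.Apply(sigmoidFn) slow.
func sigmoidSlice(xs []float64) {
	for i, x := range xs {
		xs[i] = 1 / (1 + math.Exp(-x))
	}
}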

Table driven tests for Repeat()

The current tests for Repeat may be incomplete; a table-driven skeleton is sketched at the end of this issue.

Which File

github.com/chewxy/gorgonia/tensor/f64/matop_test.go

Which function

TestRepeat

What to Test

  • Repeat scalar on 1, 2, n axes
  • Repeat colvec on 1, 2, n axes
  • Repeat rowvec on 1, 2, n axes
  • Repeat vector on 1, 2, n axes
  • Repeat matrix on 1, 2, n axes
  • Idiotic actions a user might do (these should all return errors).
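
A table-driven skeleton covering the cases above might look like this; the rows and expectations are illustrative, not a specification of Repeat's actual behaviour:

package f64

import "testing"

var repeatTests = []struct {
	name    string
	shape   []int // shape of the input tensor
	axis    int   // axis to repeat along
	repeats []int
	wantErr bool // the "idiotic actions" rows land here
}{
	{"scalar on axis 0", []int{}, 0, []int{3}, false},
	{"colvec on axis 1", []int{4, 1}, 1, []int{2}, false},
	{"rowvec on axis 0", []int{1, 4}, 0, []int{3}, false},
	{"matrix, per-element repeats", []int{2, 3}, 1, []int{1, 2, 3}, false},
	{"matrix, nonsense axis", []int{2, 3}, 5, []int{2}, true},
}

func TestRepeat(t *testing.T) {
	for _, tt := range repeatTests {
		// Build a tensor of tt.shape, call Repeat(tt.axis, tt.repeats...),
		// then assert on the resulting shape and data, and check that the
		// error rows return errors rather than panicking.
		_ = tt
	}
}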

Add Stack() to Tensor

It'd be like Numpy's stack.

Preliminary design looks something like this:

func (t *Tensor) Stack(other *Tensor, axis int) (*Tensor, error)

And a package-level function:

func Stack(axis int, others ...*Tensor) (*Tensor, error)
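
Hypothetical usage, assuming Numpy's stack semantics where a new axis is created at the given position (the constructors mirror the snippet in the Transpose issue below):

a := tf64.NewTensor(tf64.WithShape(2, 3), tf64.WithBacking(tf64.RangeFloat64(0, 6)))
b := tf64.NewTensor(tf64.WithShape(2, 3), tf64.WithBacking(tf64.RangeFloat64(6, 12)))
s, err := Stack(0, a, b) // s would have shape (2, 2, 3)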

Transpose on Views Bug

T := tf64.NewTensor(tf64.WithShape(8, 10), tf64.WithBacking(tf64.RangeFloat64(0, 80)))
T2, _ := T.Slice(ss(0))
T2.T()

fmt.Printf("%v\n", T2.AP)

yields:

Shape: (10, 1), Stride: [1], Dims: 2, Lock: false

Should be:

Shape: (10), Stride: [1], Dims: 1, Lock: false

Further investigation shows that this bug is entirely due to an issue in the T() and Slice() methods, which don't play well with views.

Add Kronecker() to Tensor

It's like Outer(), but applies to Tensors of dimension greater than 1. Can be quite difficult - there are a lot of weird corner cases to think about.

Create axis iterator for AP

What

An AxisIterator is one where you iterate along an axis or multiple axes (but no more than len(ap.shape) of them).

type AxisIterator struct {
    *AP

    // additional fields for tracking position etc
}

The AxisIterator conforms to a hypothetical iterator interface:

type Iterator interface {
    next() (int, error)
}
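
Hypothetical usage of such an iterator; the constructor name and the exhaustion sentinel are assumptions, since neither has been designed yet:

it := NewAxisIterator(ap, 1) // hypothetical constructor: iterate along axis 1
for {
	i, err := it.next()
	if err != nil {
		break // the error doubles as the exhaustion signal, io.EOF-style
	}
	fmt.Println(i) // i is the next flat index into the backing slice
}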

Nice to have features:

  • support arbitrary starting position
  • step

Purpose

  • replace the various iterator implementations in each concrete Tensor type
  • work as a helper struct/function for various access needs, instead of writing them out manually.

Standardize Solvers

Currently, the different Solvers have different features. They should all support the same features: l1reg, l2reg, and clip.

Also, the Solver code is messy. Clean it up, with tests.

Break all the User-Unfriendly APIs (and replace them with better ones)

There are a great number of things that I am not happy about with regard to Gorgonia's API. The original package Gorgonia was based on was designed to do a few machine learning things well (notably LSTMs and deep perceptrons). As it becomes more and more general-purpose, some API changes will be needed. The only way I can discover these API unfriendlinesses is through the creation of varied neural-network stuff.

For now, this issue will act as a living document of sorts. Bear in mind that these are extremely trivial to fix with gorename, so they will all be concentrated here in this issue.

Here are the current ones on my list of bugbears; please feel free to add your own by commenting.

NewMatrix, NewVector functions

Example:

x := NewMatrix(g, Float64, WithName("x"), WithShape(2, 3), WithValue(xT))

This is clearly Bad Design with capital letters. There are two things that I'm not happy about with this:

  1. Given that we already know it's a Matrix/Vector, why not enforce the shape right away?
  2. The New... prefix makes one think that one is creating a new Matrix, not a new *Node that represents and holds a Tensor with 2 dimensions. An alternative would be the older IsAVector() name, but the reason for moving away from that is that Is...() is typically reserved for functions that return a bool

Proposed Fix

x := NewNodeOfVector(g, Float64, 5, WithName("x"))
y := NewNodeOfMatrix(g, Float32, 2, 3, WithName("y"), WithValue(yT))

or

x := NodeOfVector(g, Float64, 5, WithName("x"))
y := NodeOfMatrix(g, Float32, 2, 3, WithName("y"), WithValue(yT))

NewNodeFromAny should be called NodeFromValue

It's currently called NewNodeFromAny just to fit into the whole New...() naming scheme.

NewTensor in each of the Tensor packages should really just be New

FIXED in #71: the whole tensor package was rewritten from the ground up to be more generic.

See also: Package names

Create Mental Separation When Creating Nodes

The one thing I like about CNTK is BrainScript, which HN user IshKebab's comment made me look deeper into. I find that it creates two modes of thinking: one mode for defining the computation graph, and one mode for writing the code surrounding the runtime of the computation graph. This was clearly what was lacking in Theano.

On the other hand, Theano and TensorFlow both share semantics with Numpy, which makes defining the computation graph a lot more familiar.

Subtle UX bug

This will fail:

g := NewGraph()
x := NewVector(g, Float64, WithShape(4))
e := NewMatrix(g, Float64, WithShape(4, 10))
w := NewMatrix(g, Float64, WithShape(20, 10))
w2 := NewMatrix(g, Float64, WithShape(10, 20))
xe := Must(Mul(x, e))
act := Must(Cube(Must(Mul(w, xe))))
do := Must(Dropout(act, 0.5))

act2 := Must(Cube(Must(Mul(do, w2))))
cost := Must(Sum(act2))

_, err := Grad(cost, x, w, w2)
if err != nil {
	// ioutil.WriteFile("fullGraph.dot", []byte(g.ToDot()), 0644)
	t.Errorf("%+v", err)
}

Specifically it will fail when calculating the gradients of this line:

Must(Mul(do, w2))

>>> Shape mismatch: (20) and (10)

This is because Mul(a, b) has its semantics overloaded. When a is a vector and b is a matrix, Mul does bᵀ × a, but there is no way for the Grad function to know this. Therefore Mul(vec, mat) is allowed (no panics), but when it comes to calculating the symbolic derivatives, it fails due to a shape mismatch.

Current Solution

A hacky workaround would be this: wherever Mul(vec, mat) is called, switch the matrix and vector around, so it becomes Mul(mat, vec). But this should still be fixed properly, because it is poor usability.

Multislice Bug

Simplest Reproduction Case

import T "github.com/chewxy/gorgonia"

x := T.NewMatrix(g, T.Float64, T.WithShape(2, 3), T.WithName("x"))

T.Slice(x, T.S(0), T.S(1))

What Happens

Panic. Specifically, an index-out-of-range error when inferring the shape of sliceOp.

Suggested Fix

Rework all the slicing-related code to share one common architecture.

Add Im2Col Op

Im2Col takes an image as a 3-Tensor and turns it into a colvec. It is extremely useful in building convolutional neural networks for image-related work.

Add SVD() to Tensor

Singular Value Decomposition. Should be fairly easy if you are familiar with linear algebra.

Returns an error if the Tensor is not a matrix.

Add RollAxis

func (t *Tensor) RollAxis(axis, start int)

Similar to Numpy's rollaxis, which is essentially this:

axes = list(range(0, n))
axes.remove(axis)
axes.insert(start, axis)
return a.transpose(axes)
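
A rough Go translation of that recipe - a sketch only, since it assumes a Dims() method and a variadic Transpose(axes ...int), neither of which is settled here:

// rollAxis builds the permutation [0, n), moves axis so that it ends up
// at position start, then defers to Transpose - a direct port of the
// Numpy recipe above. Bounds checking is omitted.
func rollAxis(t *Tensor, axis, start int) (*Tensor, error) {
	n := t.Dims()
	axes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		if i != axis {
			axes = append(axes, i)
		}
	}
	axes = append(axes[:start], append([]int{axis}, axes[start:]...)...)
	return t.Transpose(axes...)
}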

Restore AVX/SSE code for tensorf32

Something went wrong in the transfer to this repository, and all the assembly files for float32 operations fail to pass the tests. Figure out what's wrong and fix it.

`concatOp` and `sliceIncrOp` need serious optimization

In a slice-heavy neural network, concatOp and sliceIncrOp are the major bottlenecks, because they rely on FlatIterator.

Here's the relevant pprof:

Showing top 10 nodes out of 104 (cum >= 83.20s)
      flat  flat%   sum%        cum   cum%
    20.38s  7.58%  7.58%     54.31s 20.19%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Next
    17.12s  6.36% 13.94%     59.04s 21.95%  runtime.findrunnable
    15.52s  5.77% 19.71%     15.52s  5.77%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).ndNext
    13.97s  5.19% 24.90%     18.15s  6.75%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).singleNext
    12.56s  4.67% 29.57%     14.50s  5.39%  runtime.runqgrab
    10.53s  3.91% 33.48%     39.56s 14.70%  runtime.chansend
     8.79s  3.27% 36.75%      9.92s  3.69%  runtime.casgstatus
        8s  2.97% 39.72%         8s  2.97%  runtime.releaseSudog
     7.03s  2.61% 42.34%      7.95s  2.96%  runtime.lock
     6.91s  2.57% 44.91%     83.20s 30.93%  runtime.schedule

and cumulatively:

Showing top 30 nodes out of 104 (cum >= 15.62s)
      flat  flat%   sum%        cum   cum%
     0.02s 0.0074% 0.0074%    172.46s 64.10%  runtime.goexit
         0     0% 0.0074%    100.49s 37.35%  github.com/chewxy/cubNN/TestNN2
         0     0% 0.0074%    100.49s 37.35%  testing.tRunner
     0.02s 0.0074% 0.015%    100.44s 37.33%  github.com/chewxy/cubNN.(*neuralnetwork2).train
     0.37s  0.14%  0.15%     99.69s 37.06%  github.com/chewxy/gorgonia.(*tapeMachine).RunAll
     0.05s 0.019%  0.17%     99.31s 36.91%  github.com/chewxy/gorgonia.(*execOp).exec
     1.05s  0.39%  0.56%     99.22s 36.88%  github.com/chewxy/gorgonia.execOp.exec
     1.07s   0.4%  0.96%     93.24s 34.66%  runtime.mcall
     3.11s  1.16%  2.12%     91.07s 33.85%  runtime.park_m
     6.91s  2.57%  4.68%     83.20s 30.93%  runtime.schedule
     2.77s  1.03%  5.71%     65.08s 24.19%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Chan.func1
    17.12s  6.36% 12.08%     59.04s 21.95%  runtime.findrunnable
    20.38s  7.58% 19.65%     54.31s 20.19%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Next
     2.23s  0.83% 20.48%     43.19s 16.05%  runtime.chansend1
         0     0% 20.48%     41.33s 15.36%  github.com/chewxy/gorgonia.(*sliceIncrOp).Do
     0.08s  0.03% 20.51%     41.33s 15.36%  github.com/chewxy/gorgonia.sliceIncrOp.Do
    10.53s  3.91% 24.42%     39.56s 14.70%  runtime.chansend
         0     0% 24.42%     37.17s 13.82%  github.com/chewxy/gorgonia.(*concatOp).Do
         0     0% 24.42%     37.17s 13.82%  github.com/chewxy/gorgonia.concatOp.Do
     0.04s 0.015% 24.44%     37.07s 13.78%  github.com/chewxy/gorgonia/tensor.Concat
     0.21s 0.078% 24.52%     37.02s 13.76%  github.com/chewxy/gorgonia/tensor/f64.(*Tensor).Concat
     3.02s  1.12% 25.64%     35.35s 13.14%  github.com/chewxy/gorgonia/tensor/f64.assignArray
     3.91s  1.45% 27.09%     33.61s 12.49%  github.com/chewxy/gorgonia/tensor/f64.(*Tensor).VAdd
     1.25s  0.46% 27.56%     30.55s 11.36%  runtime.chanrecv2
     6.22s  2.31% 29.87%     29.30s 10.89%  runtime.chanrecv
     2.24s  0.83% 30.70%     24.64s  9.16%  runtime.systemstack
     5.65s  2.10% 32.80%     20.15s  7.49%  runtime.runqsteal
    13.97s  5.19% 38.00%     18.15s  6.75%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).singleNext
     1.79s  0.67% 38.66%     15.65s  5.82%  runtime.recv
     0.27s   0.1% 38.76%     15.62s  5.81%  runtime.goready

Also of particular note is the assignArray function. It uses the (*FlatIterator).Chan() method, which may or may not be a detriment.

Refactor type system out into its own package

Required before #3 happens

Type system should be refined too:

  • remove typeClass (think about this first!)
  • concretify (replace *typeVariable with a unified type using pruneCompletely()) more aggressively, instead of relying on *typeVariable everywhere, which leads to a lot of GC pressure
  • keep functionType but export it.

Simplify Value type

Currently Value is this:

type Value interface {
    Type() Type
    Shape() types.Shape
    Size() int
    Dtype() Dtype
    Eq(other Value) bool
    Data() interface{}

    clone() (Value, error)
    zero() Value

    fmt.Formatter
}

As I added the Data() interface method to Value, I realized that this could have been better (better because we're going with the "smaller interfaces mean fewer leaks" idea):

type Value interface {
    Shape() types.Shape
    Size() int
    Data() interface{}
    fmt.Formatter
}

With this definition of Value, we'd of course need to write these functions:

func typeOf(v Value) Type {}
func dtypeOf(v Value) Dtype {}
func valueEq(a, b Value) bool {}
func cloneValue(v Value) (Value, error) {}
func zero(v Value) Value {}

But our interface becomes smaller, and the Tensor type can be removed completely, because types.Tensor inherently already fulfils the new Value interface.

And instead of having a catchall-type for Scalar, this can possibly be done:

type F64 float64
type F32 float32
type I int

then we can get rid of NewScalarValue and NewTensorValue. This would simplify the end API too (see also: #33).
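
As a sanity check that the smaller interface is cheap to satisfy, here is a sketch of F64 implementing it; Shape is a stub standing in for types.Shape, and the nil scalar shape is an assumed convention:

package valuesketch

import "fmt"

// Shape is a stub standing in for types.Shape.
type Shape []int

// F64 is a scalar float64 that satisfies the proposed smaller Value
// interface directly: Shape, Size, Data, and fmt.Formatter.
type F64 float64

func (f F64) Shape() Shape      { return nil } // assumed convention: scalars have a nil shape
func (f F64) Size() int         { return 1 }
func (f F64) Data() interface{} { return float64(f) }

// Format satisfies fmt.Formatter.
func (f F64) Format(s fmt.State, c rune) { fmt.Fprintf(s, "%v", float64(f)) }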

Add Solve() Method to Tensor

Solve a matrix. Should be fairly easy if you are familiar with linear algebra.

Return an error if the Tensor is not a matrix.

Remove the panic()'s

There are a few places in Gorgonia where we panic on errors instead of returning meaningful errors. For most (all?) of these cases, we can probably return meaningful errors instead. This is a non-trivial amount of work, as the signatures of the panicking functions will change, so some work needs to be done to adjust those functions and all the places where they are called.

Create `May`

May is the Maybe monad. It would make bugs like these nonexistent.

On the plus side, if the Maybe monad is exported, it also helps users - they'd then have both Must() and May().

errors clean up

After the initial work on the errors, there is some more work that needs to be done to clean things up:

  • We need to add a way to handle the errors at the top of the package (probably in gorgonia.go) which uses errors.Print to print the stack trace of caught errors for the user.
  • For this, we may wish to have a Handle(err) function which gives users of this library clean access to our errors; it would essentially do errors.Print(err) if err is not nil.
  • We may wish to add this error handling to our tests. Currently, we use t.Error(err) when an error is encountered, which does not give us the stack trace. Adding a Handle(err) would be a very cheap and effective way to add stack traces to the errors in our tests, which would help in debugging.

Table driven tests for Transpose()

Rewrite the tests for all the individual Tensor packages to use table-driven tests for Transpose.

The reason is that I think the current tests are incomplete, and something is leaking. A table-driven test would be more complete.

Rationale: I was working on improving the performance of Materialize() and kept running into a bug with Transpose().

Add MaxPool Op

MaxPool basically subsamples a Tensor, returning the max value of each subsampled region. It can currently sorta be achieved with funny slicing and maxing, but an op of its own would be better.

Readme example differentiation typo

The code only prints the value if an error occurred. The ifs should test for err == nil.

if xgrad, err := x.Grad(); err != nil {
    fmt.Printf("dz/dx: %v", xgrad)
}

if ygrad, err := y.Grad(); err != nil {
    fmt.Printf("dz/dy: %v", ygrad)
}
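
The corrected checks, per the report:

if xgrad, err := x.Grad(); err == nil {
    fmt.Printf("dz/dx: %v", xgrad)
}

if ygrad, err := y.Grad(); err == nil {
    fmt.Printf("dz/dy: %v", ygrad)
}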

CuBLAS

Now that the float32 bugs appear to have been resolved, it's time to start porting CuBLAS.

Add Max() to Tensor

The skeleton's there:

func (t *Tensor) Max(along ...int) (retVal *Tensor, err error) {
    return nil, nil
}

This may be required to do #15
