gonum / gonum

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

Home Page: https://www.gonum.org/

License: BSD 3-Clause "New" or "Revised" License

Languages: Shell 0.17%, Go 93.85%, Assembly 2.48%, TeX 0.24%, Makefile 0.01%, MATLAB 0.20%, Fortran 2.95%, Ragel 0.11%

Topics: go, golang, scientific-computing, data-analysis, matrix, statistics, graph

gonum's Introduction

Gonum


Installation

The core packages of the Gonum suite are written in pure Go with some assembly. Installation is done using go get.

go get -u gonum.org/v1/gonum/...

Supported Go versions

Gonum supports and is tested using the gc compiler on the two most recent Go releases, on Linux (386, amd64 and arm64), and on macOS and Windows (both amd64 only).

Note that floating point behavior may differ between compiler versions and between architectures due to differences in floating point operation implementations.

Release schedule

The Gonum modules are released on a six-month release schedule, aligned with the Go releases: when Go-1.x is released, Gonum-v0.n.0 is released around the same time, and six months later, when Go-1.x+1 is released, Gonum-v0.n+1.0 follows.

The release schedule, based on the current Go release schedule, is thus:

  • Gonum-v0.n.0: February
  • Gonum-v0.n+1.0: August

Build tags

The Gonum packages use a variety of build tags to set non-standard build conditions. Building Gonum applications will work without knowing how to use these tags, but they can be used during testing and to control the use of assembly and CGO code.

The current list of non-internal tags is as follows:

  • safe — do not use assembly or unsafe
  • bounds — use bounds checks even in internal calls
  • noasm — do not use assembly implementations
  • tomita — use Tomita, Tanaka, Takahashi pivot choice for maximal clique calculation, otherwise use random pivot (only in topo package)
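Tags are applied with the go tool's standard -tags flag. For example, to run the test suite without assembly or unsafe:

go test -tags safe gonum.org/v1/gonum/...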

Issues

If you find any bugs, feel free to file an issue on the GitHub issue tracker. Discussions on API changes, added features, code review, or similar requests are preferred on the gonum-dev Google Group.

https://groups.google.com/forum/#!forum/gonum-dev

License

Original code is licensed under the Gonum License found in the LICENSE file. Portions of the code are subject to the additional licenses found in THIRD_PARTY_LICENSES. All third party code is licensed either under a BSD or MIT license.

Code in graph/formats/dot is dual licensed Public Domain Dedication and Gonum License, and users are free to choose the license which suits their needs for this code.

The W3C test suites in graph/formats/rdf are distributed under both the W3C Test Suite License and the W3C 3-clause BSD License.


gonum's Issues

unit: Package documentation has unimplemented features

Many of the examples in doc.go do not work. For example, unit.Temperature, unit.Pressure, unit.Density, unit.Bar, unit.Energy, and unit.Area do not exist.

Currently, it is unclear how this package is meant to be used. Are the features described in the documentation ones that used to exist and are now deprecated, or are they features that are still desired? I would be happy to help improve the package and documentation, if the authors could explain their vision.

Thanks!

blas/gonum: Move scaling of C into the loop

The Dgemm code right now scales the data in C and then performs C += alpha * A * B. This makes two passes over the data in C, and is also serial. The beta scaling should be moved into the loops, as it is in the reference implementation.
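A minimal sketch of the fused form (illustrative only, not the actual blas/gonum kernel), assuming row-major storage where c[i*ldc:] is row i of C:

    for i := 0; i < m; i++ {
        ci := c[i*ldc : i*ldc+n]
        // Scale this row of C by beta in the same pass that accumulates into it,
        // instead of sweeping over all of C first.
        for j := range ci {
            ci[j] *= beta
        }
        for l := 0; l < k; l++ {
            tmp := alpha * a[i*lda+l]
            for j, v := range b[l*ldb : l*ldb+n] {
                ci[j] += tmp * v
            }
        }
    }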

optimize: Hessian rescaling in Quasi-Newton methods

In the Quasi-Newton methods, especially close to the optimum, it often gets to the point that the search direction is very nearly perpendicular to the gradient. This harms our convergence by at least an order of magnitude or two in many cases. We should implement some form of detection for this case and do an (approximate) Hessian restart and/or some other form of conditioning.

mat: Test behavior of zero-sized matrices

list_test.go does not test zero-sized matrices. I quickly checked adding them and the test suite panics. We should make sure there is a fixed behavior when there is a zero-sized argument (at least on a per function/method basis).

mat: Improvements to list_test testing behavior

List test works by generating random matrices. We should improve this behavior by a) having an inner loop where we test on a set of random matrices, and b) using const*rand.NormFloat64() instead of rand.Float64() to generate random elements.

Fix a) helps to check "nonlinear" functions like Max where the location of the maximum element can vary, and fix b) ensures we have matrix values of a variety of signs and sizes. Plausibly we should also have a small chance of an element being 0 (5% or something); see the sketch below.
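A minimal sketch of such an element generator (names and constants hypothetical):

    import "math/rand"

    // randomElement returns a random matrix element: a scaled normal value,
    // with a small chance of being exactly zero.
    func randomElement(rnd *rand.Rand) float64 {
        const scale = 10
        if rnd.Float64() < 0.05 {
            return 0
        }
        return scale * rnd.NormFloat64()
    }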

mat: add capacity to define alternative formatted styles

Ideally this would be simple, like time format definitions. So the current unicode style would be:

[0  1]
⎡2  3⎤
⎢4  5⎥
⎣6  7⎦

a MATLAB input style would be:

[0 1]
[2 3; 4 5; 6 7]

etc.

We need to sort out some heuristics for how this works and be able to document them simply. For example: cell width alignment is not calculated when there is no newline between 3 and 4, or between 5 and 6.
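For reference, this is how a matrix is printed today with mat.Formatted (shown with the current gonum.org/v1/gonum/mat API); the proposal would add alternative styles as options or format verbs on top of this:

    import (
        "fmt"

        "gonum.org/v1/gonum/mat"
    )

    func main() {
        a := mat.NewDense(4, 2, []float64{0, 1, 2, 3, 4, 5, 6, 7})
        // Prints a in the current Unicode-bracket style.
        fmt.Printf("%v\n", mat.Formatted(a, mat.Squeeze()))
    }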

graph: Consider how to help with dynamically created graph implementations

I do a lot with what I'd call "dynamic graphs". This is a graph where a state generates its successors, rather than having its entire representation preconstructed. I implemented a toy problem with our interface last night just to test it for my use case.

The problem is that a lot of methods end up being incompletely implemented. Many times in these implementations it's intractable to generate Predecessors (meaning that it's also intractable to generate Neighbors). In fact, it's computationally expensive to determine if an edge even exists between two nodes in some cases. You have to compute the successors of each and check if one happens to be equivalent to one of the successors of the other. For combinatorially large numbers of successors to generate, this gets expensive quickly.

Numeric IDs are a problem too, but you can sneak around that by having a map[string]int in the graph and implementing a String() method on the node type, so every time you generate a node you check if g.idMap[n.String()] exists and if not request a new ID and store it. It's ugly, but acceptable.

Other things that are impossible to implement would be NodeList, EdgeList, and by extension Order (#18), as well as Degree (which has been removed but worth considering in the future). In fact, these graphs may or may not be infinite.

Obviously this means that using these types of graphs with certain algorithms is impossible. I can use A*, but not FloydWarshall, for instance. That's acceptable since, well, it's a logical fact that if you can't implement certain methods you can't use them. However, it does lead to a lot of dead methods to implement.

The problem I have is fragmenting graph into increasingly small interfaces. It seems messy to have an individual interface for every conceivable method. Any ideas on how to handle this?
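One purely illustrative shape for the minimal capability such a graph can offer (the names below are hypothetical, not part of the Gonum graph API):

    // Node is an identified graph node.
    type Node interface {
        ID() int
    }

    // SuccessorGraph is the smallest useful contract for a dynamically
    // generated graph: it can only enumerate successors, computed on demand.
    type SuccessorGraph interface {
        Successors(n Node) []Node
    }

Algorithms like A* need no more than this; anything that requires Predecessors, NodeList, EdgeList or Order would have to sit behind larger interfaces.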

lapack: Tests should be based on the explicit block size

Right now, most of the blocked algorithms are tested by creating large matrices. Instead, the actual block sizes should be obtained from Ilaenv and those values should be used to generate matrix sizes (blockSize, 2*blockSize, blockSize - 1, etc.).
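A sketch of the idea; the literal 64 is only a placeholder for the block size that would be obtained from Ilaenv for the routine under test:

    nb := 64 // placeholder; obtain from Ilaenv for the routine under test
    sizes := []int{nb - 1, nb, nb + 1, 2*nb - 1, 2 * nb, 2*nb + 1}
    for _, n := range sizes {
        // build an n-sized test problem here and exercise the blocked code path
        _ = n
    }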

stat: Export corrToCov and covToCorr?

In one of my projects I am given a covariance matrix, and need to convert it to a correlation matrix. I have currently vendored the functionality, but it may be more generally useful.
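The conversion itself is small: corr(i,j) = cov(i,j) / (sigma_i * sigma_j). A sketch of an exported in-place version on today's mat.SymDense (not the vendored internal code):

    import (
        "math"

        "gonum.org/v1/gonum/mat"
    )

    // covToCorr converts a covariance matrix into a correlation matrix in place.
    func covToCorr(c *mat.SymDense) {
        n := c.Symmetric()
        sigma := make([]float64, n)
        for i := 0; i < n; i++ {
            sigma[i] = math.Sqrt(c.At(i, i))
        }
        for i := 0; i < n; i++ {
            for j := i; j < n; j++ {
                c.SetSym(i, j, c.At(i, j)/(sigma[i]*sigma[j]))
            }
        }
    }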

optimize: Minibatch optimization

It would be nice to add optimizers for mini-batch optimization. The classic is stochastic gradient descent, but there are other nice methods like "Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods" by @Sohl-Dickstein, @poolio and @ganguli-lab.

I like sofopt (sum-of-functions optimization) as a package name, since that's what it really is, though it does cause some confusion with the similarly-named algorithm in the paper I note above.
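For reference, the core stochastic gradient descent update is tiny; the work is in fitting it into the optimize machinery. A sketch, with hypothetical names:

    // sgdStep performs one update w <- w - lr*g, where g is the gradient
    // estimated on a minibatch.
    func sgdStep(w, g []float64, lr float64) {
        for i, gi := range g {
            w[i] -= lr * gi
        }
    }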

stat/distmv: Add test for StudentsT marginal

There is a TestMarginal for Normal which checks the samples drawn against the computed marginal. Student's T should have a similar test, and MarginalSingle for Student's T should be checked against that test as well.

stat: ROC signature

I came up with what I think is an improvement to the ROC function,
in which the cutoffs are defined explicitly instead of the number of cutoffs n.
This allows for unequally-spaced cutoffs, allows for all cutoffs to be used
more naturally, and simplifies the code itself.

Is this a good idea -- should I put a PR in for it, @sbinet @kortschak ?

Also, I was thinking maybe ROC should output an error as well, instead of all
the panic statements? Or is it appropriate that it panic?

What I did is here, in case you wanted to look at the specifics of what I am
suggesting:
https://github.com/Armadilloa16/stat/tree/roc
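The proposed signature would look roughly like the following (illustrative only; parameter names are hypothetical):

ROC(cutoffs, y []float64, classes []bool, weights []float64) (tpr, fpr []float64)

Here the cutoffs are supplied explicitly rather than derived from a count n.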

floats: Add ScaleTo?

Unlike other functions, Scale does not have its To variant. Is there a reason for it? Reasons for adding: 1) consistency, 2) it could be backed by asm.DscalTo, 3) optimize could use it in several places.
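A sketch of what the To variant could look like, following the conventions of the other floats To functions:

    // ScaleTo multiplies the elements of s by c, storing the result in dst,
    // and returns dst. It panics if the slice lengths do not match.
    func ScaleTo(dst []float64, c float64, s []float64) []float64 {
        if len(dst) != len(s) {
            panic("floats: slice lengths do not match")
        }
        for i, v := range s {
            dst[i] = c * v
        }
        return dst
    }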

optimize: Consider adding Method.Supports(Function) (bool, error)

@btracey's comment from #45: Given we do use some magic, I can see the need for support. It seems like the better case is for the optimizer to have a Supports function that returns a boolean (whether it supports the Function) together with an error that is nil if it does, and otherwise provides a detailed explanation, e.g. "Bfgs requires the Function to have a Gradient. Function does not implement either Gradient or FunctionGradient."
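One possible shape for the proposal (illustrative only; Function stands for whatever problem type the Method is given):

    // Supporter reports whether a Method can optimize a given Function,
    // returning a descriptive error when it cannot.
    type Supporter interface {
        Supports(f Function) (bool, error)
    }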

lapack/gonum: review all functions

Much of the code in native sacrifices code quality in the name of correctness. Now there is a lot of redundancy in tests, and code quality can be improved with a reduced risk of error. We should take a second pass over the functions to make them idiomatic Go (removing gotos, etc.).

unit: Intermittent test failure in TestFormat

I seem to be getting an occasional test failure in TestFormat (and Travis CI does as well):

--- FAIL: TestFormat (0.00 seconds)
unit_test.go:44: Format "%#v": got: "&unit.Unit{dimensions:unit.Dimensions{6:-1, 4:2}, formatted:"", value:6.62606957e-34}" expected: "&unit.Unit{dimensions:unit.Dimensions{4:2, 6:-1}, formatted:"", value:6.62606957e-34}"
FAIL

go test -race doesn't find anything wrong.
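The symptom (the same dimensions printed in a different order) is what Go's randomized map iteration produces, and Dimensions is a map, so formatting needs to sort the keys before printing. A sketch, using plain int keys in place of the package's dimension type:

    import "sort"

    // sortedDims returns the dimension keys of dims in a deterministic order.
    func sortedDims(dims map[int]int) []int {
        keys := make([]int, 0, len(dims))
        for d := range dims {
            keys = append(keys, d)
        }
        sort.Ints(keys)
        return keys
    }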

mat: Redesign Eigen

Right now, Eigen stores the eigenvalue matrix as a *Dense. Conceptually this is wrong because, in the general case, it is really a complex matrix. The code instead uses extra rows and columns to represent complex values, which means that the physical size of the matrix is larger than its conceptual size. It would be better to have the values just be complex. Acyclic imports mean we can't have mat128 depend on mat64 and vice versa, but we can have mat64 import blas128. Eigen can return the blas structs, which can then be converted into mat128 matrices. This will take some work to interface with the lapack routines, but not doing this work means we just push the problem onto the user (making them do the conversion from 2x2 float64 blocks to complex values). The eigenvalues could be represented by blas128.Banded.

We should also reconsider the function signatures. First of all, it seems easier to me to have it be
Eigen(a *Dense, epsilon) (eigenvalues blas128.Banded, eigenvectors *Dense)
I'm not sure what the EigenFactors struct helps with.
Secondly, we should also have
EigenSym(a *Symmetric, epsilon) (eigenvalues *Diagonal, eigenvectors *Dense)
as the eigenvalues of a symmetric matrix are real.
The asymmetry between the two (Diagonal vs. Banded) is unfortunate; however, there is no Diagonal matrix in BLAS. We could add a definition to both blas64 and blas128, or we could plausibly have a non-blas package containing the definition of RawDiagonal (name TBD) for both complex and real diagonal matrices. We could also leave it as-is and solve the problem in a different manner (there are a few choices).

graph: consider SetEdge() returning an error

The motivation is that for some graphs (e.g., planar) not every edge is legal and checking if setting an edge is possible can be the same amount of work as setting it and seeing if it fails.

However, that would mean adding error checking and responding to errors from SetEdge() all over the place for a very minor application.

graph: graph serialisation

It would be good to have some standard graph serialisation capacity.

Which would we like to support?

Probably DOT at the very least, but I think we should also support at least one of the richer markup languages, probably GraphML since it seems to have the widest compatibility, though GEXF would probably be nice as well.

In the first instance marshaling support would be the goal, with unmarshaling to be added later.

mat: add Cholesky.SymShift (name tentative)

There is a need for a method on Cholesky that updates the factorization after a permutation matrix P has been applied to the matrix A as P^T * A * P.

According to @btracey: the use case is finding the marginal of a Gaussian without reconstructing the covariance matrix. You can drop rows of the Cholesky factorization, but only if they're the last variables. So you need to swap the variables around to make them the last ones, and then you can chop off those rows.

LINPACK has http://www.netlib.org/linpack/dchex.f which performs, given two column indices k < l,
a right circular shift rearranging the columns of U as
1,...,k-1,l,k,k+1,...,l-1,l+1,...,n
or a left circular shift rearranging the columns of U as
1,...,k-1,k+1,k+2,...,l,k,l+1,...,n

Implementation is not difficult; the approach is similar to that of Cholesky.SymRankOne when alpha > 0.

When solving a least-squares problem with Cholesky, it seems useful to also be able to update a given vector together with U, denoted as z in dchex. The API should account for that.

The exact API is open for discussion, but this addition probably should/could wait until after the 1.0 release (?).
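For testing such a method, a naive baseline is straightforward: apply the permutation to A explicitly and re-factorize from scratch. A sketch using today's mat API (a and perm are assumed given; perm[i] is the original index moved to position i):

    import "gonum.org/v1/gonum/mat"

    // permutedCholesky factorizes Pᵀ A P by explicitly permuting A.
    func permutedCholesky(a *mat.SymDense, perm []int) (*mat.Cholesky, bool) {
        n := a.Symmetric()
        pap := mat.NewSymDense(n, nil)
        for i := 0; i < n; i++ {
            for j := i; j < n; j++ {
                pap.SetSym(i, j, a.At(perm[i], perm[j]))
            }
        }
        chol := new(mat.Cholesky)
        ok := chol.Factorize(pap)
        return chol, ok
    }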

mat: Increase usage of the list_test routines

The list_test routines have caught several bugs in existing routines, and have pre-emptively caught many subtle bugs during my development. With #239, the four major classes are tested. We should add calls to list_test to existing routines/methods that do not yet have them, such as MulElem.

optimize: Relative or absolute MoreThuente.StepTolerance?

Discussion copied from #148:

Should it be the case that StepTolerance is a relative value while Minimum and Maximum are absolute values? Seems odd.

I think that Minimum and MaximumStep being absolute is not odd. They are a priori bounds on the step independent of the actual value of the step. Our current optimizers are unconstrained and do not bound the step. But for example Nocedal's LBFGS-B code uses the maximum bound to limit the step length in the given direction and that was also my motivation for including it here.

Sorry, I do understand why Max and Min steps are absolute, but I don't understand why StepTolerance should be relative, especially when the other Step numbers are all absolute.

I don't insist on the step tolerance being relative, it's simply what the original code is doing. What would probably make most sense would be to make it an absolute tolerance for the interval length scaled by a norm of the descent direction, something like |step_max - step_min| * ||dir|| < abs_step_tol.

I think absolute is easier to think about. There's no clear definition of "relative" here (though I know you provided one). Additionally, the tolerance, even absolutely, is already scaled by ||dir||. Our "coordinates" have a step of size ||dir|| equal to 1.

Yes, that's why I offered the alternative to measure the interval width in real, non-scaled-on-dir units. However, for that we would have to change the Linesearcher interface and that does not feel worthwhile. I don't have a strong argument for a relative tolerance, so I will change it to an absolute tolerance and see what it does with the tests.
