gonum / gonum

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

Home Page: https://www.gonum.org/

License: BSD 3-Clause "New" or "Revised" License

Languages: Shell 0.17%, Go 93.85%, Assembly 2.48%, TeX 0.24%, Makefile 0.01%, MATLAB 0.20%, Fortran 2.95%, Ragel 0.11%

Topics: go, golang, scientific-computing, data-analysis, matrix, statistics, graph

gonum's Introduction

Gonum


Installation

The core packages of the Gonum suite are written in pure Go with some assembly. Installation is done using go get.

go get -u gonum.org/v1/gonum/...

Supported Go versions

Gonum supports and is tested using the gc compiler on the two most recent Go releases, on Linux (386, amd64 and arm64), and on macOS and Windows (both amd64 only).

Note that floating point behavior may differ between compiler versions and between architectures due to differences in floating point operation implementations.

Release schedule

The Gonum modules are released on a six-month release schedule, aligned with the Go releases: when Go-1.x is released, Gonum-v0.n.0 is released around the same time, and six months later, when Go-1.x+1 is released, Gonum-v0.n+1.0 follows.

The release schedule, based on the current Go release schedule, is thus:

  • Gonum-v0.n.0: February
  • Gonum-v0.n+1.0: August

Build tags

The Gonum packages use a variety of build tags to set non-standard build conditions. Building Gonum applications will work without knowing how to use these tags, but they can be used during testing and to control the use of assembly and CGO code.

The current list of non-internal tags is as follows:

  • safe — do not use assembly or unsafe
  • bounds — use bounds checks even in internal calls
  • noasm — do not use assembly implementations
  • tomita — use Tomita, Tanaka, Takahashi pivot choice for maximal clique calculation, otherwise use random pivot (only in topo package)
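Tags are applied with the go tool's standard -tags flag. For example, to run the test suite without assembly or unsafe:

go test -tags safe gonum.org/v1/gonum/...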

Issues

If you find any bugs, feel free to file an issue on the GitHub issue tracker. Discussions on API changes, added features, code review, or similar requests are preferred on the gonum-dev Google Group.

https://groups.google.com/forum/#!forum/gonum-dev

License

Original code is licensed under the Gonum License found in the LICENSE file. Portions of the code are subject to the additional licenses found in THIRD_PARTY_LICENSES. All third party code is licensed either under a BSD or MIT license.

Code in graph/formats/dot is dual licensed Public Domain Dedication and Gonum License, and users are free to choose the license which suits their needs for this code.

The W3C test suites in graph/formats/rdf are distributed under both the W3C Test Suite License and the W3C 3-clause BSD License.


gonum's Issues

unit: Package documentation has unimplemented features

Many of the examples in doc.go do not work. For example, unit.Temperature, unit.Pressure, unit.Density, unit.Bar, unit.Energy, and unit.Area do not exist.

Currently, it is unclear how this package is meant to be used. Are the features described in the documentation ones that used to exist and are now deprecated, or are they features that are still desired? I would be happy to help improve the package and documentation, if the authors could explain their vision.

Thanks!

blas/gonum: Move scaling of C into the loop

The Dgemm code right now scales the data in C and then performs C += alpha * A * B. This makes two passes over the data in C, and is also serial. The beta scaling should be moved into the loops, as it is in the reference implementation.
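A minimal sketch of the fused form (illustrative only, not the actual blas/gonum kernel), assuming row-major storage where c[i*ldc:] is row i of C:

    for i := 0; i < m; i++ {
        ci := c[i*ldc : i*ldc+n]
        // Scale this row of C by beta in the same pass that accumulates into it,
        // instead of sweeping over all of C first.
        for j := range ci {
            ci[j] *= beta
        }
        for l := 0; l < k; l++ {
            tmp := alpha * a[i*lda+l]
            for j, v := range b[l*ldb : l*ldb+n] {
                ci[j] += tmp * v
            }
        }
    }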

optimize: Hessian rescaling in Quasi-Newton methods

In the Quasi-Newton methods, especially close to the optimum, it often gets to the point that the search direction is very nearly perpendicular to the gradient. This harms our convergence by at least an order of magnitude or two in many cases. We should implement some form of detection for this case and do an (approximate) Hessian restart and/or some other form of conditioning.

mat: Test behavior of zero-sized matrices

list_test.go does not test zero-sized matrices. I quickly checked adding them and the test suite panics. We should make sure there is a fixed behavior when there is a zero-sized argument (at least on a per function/method basis).

mat: Improvements to list_test testing behavior

List test works by generating random matrices. We should improve this behavior by a) having an inner loop where we test on a set of random matrices, and b) using const*rand.NormFloat64() instead of rand.Float64() to generate random elements.

Fix a) helps to check "nonlinear" functions like Max where the location of the maximum element can vary, and fix b) ensures we have matrix values of a variety of signs and sizes. Plausibly we should also have a small chance of an element being 0 (5% or something); see the sketch below.
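A minimal sketch of such an element generator (names and constants hypothetical):

    import "math/rand"

    // randomElement returns a random matrix element: a scaled normal value,
    // with a small chance of being exactly zero.
    func randomElement(rnd *rand.Rand) float64 {
        const scale = 10
        if rnd.Float64() < 0.05 {
            return 0
        }
        return scale * rnd.NormFloat64()
    }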

mat: add capacity to define alternative formatted styles

Ideally this would be simple, like time format definitions. So the current unicode style would be:

[0  1]
⎡2  3⎤
⎢4  5⎥
⎣6  7⎦

a MATLAB input style would be:

[0 1]
[2 3; 4 5; 6 7]

etc.

We need to sort out some heuristics for how this works and be able to document them simply. For example: cell width alignment is not calculated when there is no newline between 3 and 4, or between 5 and 6.
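For reference, this is how a matrix is printed today with mat.Formatted (shown with the current gonum.org/v1/gonum/mat API); the proposal would add alternative styles as options or format verbs on top of this:

    import (
        "fmt"

        "gonum.org/v1/gonum/mat"
    )

    func main() {
        a := mat.NewDense(4, 2, []float64{0, 1, 2, 3, 4, 5, 6, 7})
        // Prints a in the current Unicode-bracket style.
        fmt.Printf("%v\n", mat.Formatted(a, mat.Squeeze()))
    }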

graph: Consider how to help with dynamically created graph implementations

I do a lot with what I'd call "dynamic graphs". This is a graph where a state generates its successors, rather than having its entire representation preconstructed. I implemented a toy problem with our interface last night just to test it for my use case.

The problem is that a lot of methods end up being incompletely implemented. Many times in these implementations it's intractable to generate Predecessors (meaning that it's also intractable to generate Neighbors). In fact, it's computationally expensive to determine if an edge even exists between two nodes in some cases. You have to compute the successors of each and check if one happens to be equivalent to one of the successors of the other. For combinatorially large numbers of successors to generate, this gets expensive quickly.

Numeric IDs are a problem too, but you can sneak around that by having a map[string]int in the graph and implementing a String() method on the node type, so every time you generate a node you check if g.idMap[n.String()] exists and if not request a new ID and store it. It's ugly, but acceptable.

Other things that are impossible to implement would be NodeList, EdgeList, and by extension Order (#18), as well as Degree (which has been removed but worth considering in the future). In fact, these graphs may or may not be infinite.

Obviously this means that using these types of graphs with certain algorithms is impossible. I can use A*, but not FloydWarshall, for instance. That's acceptable since, well, it's a logical fact that if you can't implement certain methods you can't use them. However, it does lead to a lot of dead methods to implement.

The problem I have is fragmenting graph into increasingly small interfaces. It seems messy to have an individual interface for every conceivable method. Any ideas on how to handle this?
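One purely illustrative shape for the minimal capability such a graph can offer (the names below are hypothetical, not part of the Gonum graph API):

    // Node is an identified graph node.
    type Node interface {
        ID() int
    }

    // SuccessorGraph is the smallest useful contract for a dynamically
    // generated graph: it can only enumerate successors, computed on demand.
    type SuccessorGraph interface {
        Successors(n Node) []Node
    }

Algorithms like A* need no more than this; anything that requires Predecessors, NodeList, EdgeList or Order would have to sit behind larger interfaces.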

lapack: Tests should be based on the explicit block size

Right now, most of the blocked algorithms are tested by creating large matrices. Instead, the actual block sizes should be obtained from Ilaenv and those values should be used to generate matrix sizes (blockSize, 2*blockSize, blockSize - 1, etc.).
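A sketch of the idea; the literal 64 is only a placeholder for the block size that would be obtained from Ilaenv for the routine under test:

    nb := 64 // placeholder; obtain from Ilaenv for the routine under test
    sizes := []int{nb - 1, nb, nb + 1, 2*nb - 1, 2 * nb, 2*nb + 1}
    for _, n := range sizes {
        // build an n-sized test problem here and exercise the blocked code path
        _ = n
    }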

stat: Export corrToCov and covToCorr?

In one of my projects I am given a covariance matrix, and need to convert it to a correlation matrix. I have currently vendored the functionality, but it may be more generally useful.
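The conversion itself is small: corr(i,j) = cov(i,j) / (sigma_i * sigma_j). A sketch of an exported in-place version on today's mat.SymDense (not the vendored internal code):

    import (
        "math"

        "gonum.org/v1/gonum/mat"
    )

    // covToCorr converts a covariance matrix into a correlation matrix in place.
    func covToCorr(c *mat.SymDense) {
        n := c.Symmetric()
        sigma := make([]float64, n)
        for i := 0; i < n; i++ {
            sigma[i] = math.Sqrt(c.At(i, i))
        }
        for i := 0; i < n; i++ {
            for j := i; j < n; j++ {
                c.SetSym(i, j, c.At(i, j)/(sigma[i]*sigma[j]))
            }
        }
    }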

optimize: Minibatch optimization

It would be nice to add optimizers for mini-batch optimization. The classic is stochastic gradient descent, but there are other nice methods like "Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods" by @Sohl-Dickstein, @poolio and @ganguli-lab.

I like sofopt (sum-of-functions optimization) as a package name, since that's what it really is, though it does cause some confusion with the similarly-named algorithm in the paper I note above.
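For reference, the core stochastic gradient descent update is tiny; the work is in fitting it into the optimize machinery. A sketch, with hypothetical names:

    // sgdStep performs one update w <- w - lr*g, where g is the gradient
    // estimated on a minibatch.
    func sgdStep(w, g []float64, lr float64) {
        for i, gi := range g {
            w[i] -= lr * gi
        }
    }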

stat/distmv: Add test for StudentsT marginal

There is a TestMarginal for Normal which checks the samples drawn against the computed marginal. Student's T should have a similar test, and MarginalSingle for Student's T should be checked against that test as well.

stat: ROC signature

I came up with what I think is an improvement to the ROC function,
in which the cutoffs are defined explicitly instead of the number of cutoffs n.
This allows for unequally-spaced cutoffs, allows for all cutoffs to be used
more naturally, and simplifies the code itself.

Is this a good idea -- should I put a PR in for it, @sbinet @kortschak ?

Also, I was thinking maybe ROC should output an error as well, instead of all
the panic statements? Or is it appropriate that it panic?

What I did is here, in case you wanted to look at the specifics of what I am
suggesting:
https://github.com/Armadilloa16/stat/tree/roc
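The proposed signature would look roughly like the following (illustrative only; parameter names are hypothetical):

ROC(cutoffs, y []float64, classes []bool, weights []float64) (tpr, fpr []float64)

Here the cutoffs are supplied explicitly rather than derived from a count n.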

floats: Add ScaleTo?

Unlike other functions, Scale does not have its To variant. Is there a reason for it? Reasons for adding: 1) consistency, 2) it could be backed by asm.DscalTo, 3) optimize could use it in several places.
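A sketch of what the To variant could look like, following the conventions of the other floats To functions:

    // ScaleTo multiplies the elements of s by c, storing the result in dst,
    // and returns dst. It panics if the slice lengths do not match.
    func ScaleTo(dst []float64, c float64, s []float64) []float64 {
        if len(dst) != len(s) {
            panic("floats: slice lengths do not match")
        }
        for i, v := range s {
            dst[i] = c * v
        }
        return dst
    }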

optimize: Consider adding Method.Supports(Function) (bool, error)

@btracey's comment from #45: Given we do use some magic, I can see the need for support. It seems like the better case is for the optimizer to have a Supports function that returns a boolean (whether it supports the Function) together with an error that is nil if it does, and otherwise provides a detailed explanation, e.g. "Bfgs requires the Function to have a Gradient. Function does not implement either Gradient or FunctionGradient."
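One possible shape for the proposal (illustrative only; Function stands for whatever problem type the Method is given):

    // Supporter reports whether a Method can optimize a given Function,
    // returning a descriptive error when it cannot.
    type Supporter interface {
        Supports(f Function) (bool, error)
    }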

lapack/gonum: review all functions

Much of the code in native sacrifices code quality in the name of correctness. Now there is a lot of redundancy in tests, and code quality can be improved with a reduced risk of error. We should take a second pass over the functions to make them idiomatic Go (removing gotos, etc.).

unit: Intermittent test failure in TestFormat

I seem to be getting an occasional test failure in TestFormat (and Travis CI does as well):

--- FAIL: TestFormat (0.00 seconds)
unit_test.go:44: Format "%#v": got: "&unit.Unit{dimensions:unit.Dimensions{6:-1, 4:2}, formatted:"", value:6.62606957e-34}" expected: "&unit.Unit{dimensions:unit.Dimensions{4:2, 6:-1}, formatted:"", value:6.62606957e-34}"
FAIL

go test -race doesn't find anything wrong.
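The symptom (the same dimensions printed in a different order) is what Go's randomized map iteration produces, and Dimensions is a map, so formatting needs to sort the keys before printing. A sketch, using plain int keys in place of the package's dimension type:

    import "sort"

    // sortedDims returns the dimension keys of dims in a deterministic order.
    func sortedDims(dims map[int]int) []int {
        keys := make([]int, 0, len(dims))
        for d := range dims {
            keys = append(keys, d)
        }
        sort.Ints(keys)
        return keys
    }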

mat: Redesign Eigen

Right now, Eigen stores the eigenvalue matrix as a *Dense. Conceptually this is wrong because, in the general case, it is really a complex matrix. The code instead uses extra rows and columns to represent complex values, which means that the physical size of the matrix is larger than its conceptual size. It would be better to have the values just be complex. Acyclic imports mean we can't have mat128 depend on mat64 and vice versa, but we can have mat64 import blas128. Eigen can return the blas structs, which can then be converted into mat128 matrices. This will take some work to interface with the lapack routines, but not doing this work means we just push the problem onto the user (making them do the conversion from 2x2 float64 blocks to complex values). The eigenvalues could be represented by blas128.Banded.

We should also reconsider the function signatures. First of all, it seems easier to me to have it be
Eigen(a *Dense, epsilon) (eigenvalues blas128.Banded, eigenvectors *Dense)
I'm not sure what the EigenFactors struct helps with.
Secondly, we should also have
EigenSym(a *Symmetric, epsilon) (eigenvalues *Diagonal, eigenvectors *Dense)
as the eigenvalues of a symmetric matrix are real.
The asymmetry between the two (Diagonal vs. Banded) is unfortunate; however, there is no Diagonal matrix in BLAS. We could add a definition to both blas64 and blas128, or we could plausibly have a non-blas package containing the definition of RawDiagonal (name TBD) for both complex and real diagonal matrices. We could also leave it as-is and solve the problem in a different manner (there are a few choices).

graph: consider SetEdge() returning an error

The motivation is that for some graphs (e.g., planar) not every edge is legal and checking if setting an edge is possible can be the same amount of work as setting it and seeing if it fails.

However, that would mean adding error checking and responding to errors from SetEdge() all over the place for a very minor application.

graph: graph serialisation

It would be good to have some standard graph serialisation capacity.

Which would we like to support?

Probably DOT at the very least, but I think we should also support at least one of the richer markup languages, probably GraphML since it seems to have the widest compatibility, though GEXF would probably be nice as well.

In the first instance marshaling support would be the goal, with unmarshaling to be added later.

mat: add Cholesky.SymShift (name tentative)

There is a need for a method on Cholesky that updates the factorization after a permutation matrix P has been applied to the matrix A as P^T * A * P.

According to @btracey: the use case is finding the marginal of a Gaussian without reconstructing the covariance matrix. You can drop rows of the Cholesky factorization, but only if they're the last variables. So you need to swap the variables around to make them the last ones, and then you can chop off those rows.

LINPACK has http://www.netlib.org/linpack/dchex.f which performs, given two column indices k < l,
a right circular shift rearranging the columns of U as
1,...,k-1,l,k,k+1,...,l-1,l+1,...,n
or a left circular shift rearranging the columns of U as
1,...,k-1,k+1,k+2,...,l,k,l+1,...,n

Implementation is not difficult; the approach is similar to that of Cholesky.SymRankOne when alpha > 0.

When solving a least-squares problem with Cholesky, it seems useful to also be able to update a given vector together with U, denoted as z in dchex. The API should account for that.

The exact API is open for discussion, but this addition probably should/could wait until after the 1.0 release (?).
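For testing such a method, a naive baseline is straightforward: apply the permutation to A explicitly and re-factorize from scratch. A sketch using today's mat API (a and perm are assumed given; perm[i] is the original index moved to position i):

    import "gonum.org/v1/gonum/mat"

    // permutedCholesky factorizes Pᵀ A P by explicitly permuting A.
    func permutedCholesky(a *mat.SymDense, perm []int) (*mat.Cholesky, bool) {
        n := a.Symmetric()
        pap := mat.NewSymDense(n, nil)
        for i := 0; i < n; i++ {
            for j := i; j < n; j++ {
                pap.SetSym(i, j, a.At(perm[i], perm[j]))
            }
        }
        chol := new(mat.Cholesky)
        ok := chol.Factorize(pap)
        return chol, ok
    }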

mat: Increase usage of the list_test routines

The list_test routines have caught several bugs in existing routines, and have pre-emptively caught many subtle bugs during my development. With #239, the four major classes are tested. We should add calls to list_test to existing routines/methods that do not yet have them, such as MulElem.

optimize: Relative or absolute MoreThuente.StepTolerance?

Discussion copied from #148:

Should it be the case that StepTolerance is a relative value while Minimum and Maximum are absolute values? Seems odd.

I think that Minimum and MaximumStep being absolute is not odd. They are a priori bounds on the step independent of the actual value of the step. Our current optimizers are unconstrained and do not bound the step. But for example Nocedal's LBFGS-B code uses the maximum bound to limit the step length in the given direction and that was also my motivation for including it here.

Sorry, I do understand why Max and Min steps are absolute, but I don't understand why StepTolerance should be relative, especially when the other Step numbers are all absolute.

I don't insist on the step tolerance being relative, it's simply what the original code is doing. What would probably make most sense would be to make it an absolute tolerance for the interval length scaled by a norm of the descent direction, something like |step_max - step_min| * ||dir|| < abs_step_tol.

I think absolute is easier to think about. There's no clear definition of "relative" here (though I know you provided one). Additionally, the tolerance, even absolutely, is already scaled by ||dir||. Our "coordinates" have a step of size ||dir|| equal to 1.

Yes, that's why I offered the alternative to measure the interval width in real, non-scaled-on-dir units. However, for that we would have to change the Linesearcher interface and that does not feel worthwhile. I don't have a strong argument for a relative tolerance, so I will change it to an absolute tolerance and see what it does with the tests.
