Feature request
Problem
Fastmat offers a nice set of features for efficiently dealing with structured, sparse and other special matrices. Some users might create pretty advanced matrices which take time to compute, using the fastmat classes as containers to allow fast products. Storing these for later use (to disk) is not straightforward:
- as Cython is used, we cannot pickle them out of the box (as @ChristophWWagner mentioned)
- converting back to numpy and using its store functions loses all the structure and benefits of fastmat, so there is no point in doing that
- many fastmat matrix types exploit a typical structure and define the matrix by only a subset of values compared to the full matrix. Keeping these mechanisms is also beneficial when storing such matrices to disk
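To illustrate the last point: a circulant matrix, for instance, is fully determined by its first column, so storing n values suffices instead of n*n. A minimal numpy sketch (plain numpy, not fastmat code, just to show the idea behind such structure-exploiting classes):

```python
import numpy as np

# a circulant matrix is fully defined by its first column c;
# every further column is a cyclic shift of it
c = np.array([1, 2, 3, 4])
n = len(c)

C = np.empty((n, n), dtype=c.dtype)
for k in range(n):
    # column k is c rolled down by k positions
    C[:, k] = np.roll(c, k)
```

Storing just `c` (n values) to disk and rebuilding `C` on load keeps both the storage footprint and the structural information.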
Solution
I did some research on the topic but have not found a thorough solution yet.
First idea: Make fastmat Pickle-able
As I'm a Python newbie I was also new to pickle. I learned that pickle allows pretty convenient serialization of Python objects, e.g. for file IO. I also made up a small example which was pretty easy to implement. Consider some class like this:
import fastmat
class SomeBlockMatrix(fastmat.Matrix):
    def __init__(self, items):
        # items is a list of matrices (fastmat, numpy, scipy sparse)
        # that is accessed for calculation of products
        self._items = items
    [...]
Now we use this as:
import numpy
import scipy.sparse
a = numpy.random.randn(10, 20)
b = scipy.sparse.rand(10, 20)
A = SomeBlockMatrix([a, b, a])
B = SomeBlockMatrix([A, A])
But how do we store it to disk? To do so, we have to tell pickle how to pickle, which means we have to provide a __reduce__() function for SomeBlockMatrix. This function returns the class itself, so that pickle can instantiate a new object of that class upon loading. Furthermore, it returns a tuple of arguments that are passed to the constructor of the class, so that an object with the same content is initialized by pickle:
class SomeBlockMatrix(fastmat.Matrix):
    def __init__(self, items):
        # items is a list of matrices (fastmat, numpy, scipy sparse)
        # that is accessed for calculation of products
        self._items = items
    [...]

    # tell pickle how to pickle
    def __reduce__(self):
        # first element: the class
        # second element: tuple of arguments required by the constructor
        #   (note the trailing comma -- it must actually be a tuple,
        #   not just parentheses around self._items)
        # reference:
        # https://stackoverflow.com/questions/19855156/whats-the-exact-usage-of-reduce-in-pickler
        return (self.__class__, (self._items,))
This pretty much did it: we can now write this to disk and load it back, since every item is picklable itself:
import pickle

filename = 'test.mat'

# store to disk
with open(filename, 'wb') as f:
    pickle.dump([A, B], f)

# load from disk
with open(filename, 'rb') as f:
    C, D = pickle.load(f)

# with C == A and D == B
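For matrices that carry extra state not passed to the constructor (e.g. lazily computed caches), __reduce__ may optionally return a third element: a state object that pickle hands to __setstate__ after construction. A minimal sketch with a plain Python stand-in class (not actual fastmat API):

```python
import pickle

class CachedMatrix:
    """Toy stand-in for a matrix class with extra cached state."""
    def __init__(self, items):
        self._items = items
        self._cache = None  # filled lazily, not a constructor argument

    def __reduce__(self):
        # third element: a state dict, restored via __setstate__
        return (self.__class__, (self._items,), {'_cache': self._cache})

    def __setstate__(self, state):
        self.__dict__.update(state)

m = CachedMatrix([1, 2, 3])
m._cache = 'expensive result'

# round-trip through pickle keeps both items and cached state
m2 = pickle.loads(pickle.dumps(m))
```

Without the third element, `m2._cache` would be reset to `None` by the constructor on load.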
Note
When I tried to pickle Cython objects that have no pickle interface yet, such as fastmat matrices, I always ran into segmentation faults. There was no warning message, as there would be for pure Python objects.
Dill instead of Pickle
https://pypi.python.org/pypi/dill
dill extends python’s pickle module
I got some IOErrors when I called my pickling function to save a file from a different module than the one the load function resided in; the corresponding module was not found. There are some hints, e.g. in the discussion of https://stackoverflow.com/questions/2121874/python-pickling-after-changing-a-modules-directory, that this might not happen with dill, as it directly serializes the objects. Not tested by me, though.
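The difference shows up with objects that pickle stores only by reference (module plus qualified name), such as a lambda: pickle fails to serialize it, whereas dill would reportedly serialize the function body itself (not verified here). A small sketch of the pickle side:

```python
import pickle

# pickle stores functions by reference, not by value, so an
# anonymous function defined on the fly cannot be pickled
f = lambda x: x + 1

try:
    pickle.dumps(f)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    pickled_ok = False

# pickled_ok is False; with dill, dill.dumps(f) is claimed to
# succeed because it serializes the object itself (untested)
```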
Further reads
Numpy is much faster at storing/loading matrices than pickle:
https://github.com/mverleg/array_storage_benchmark
Security issues of pickle:
https://www.synopsys.com/blogs/software-security/python-pickling/
More on Dill vs. pickle:
https://stackoverflow.com/questions/33968685/pickle-yet-another-importerror-no-module-named-my-module
Harsh corner cases when pickling on Linux and unpickling on Windows:
uqfoundation/dill#218
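Regarding the storage benchmark above: for dense arrays, a round-trip through numpy's native .npy format is straightforward to compare against pickle (with the caveat from the problem statement that going dense loses all fastmat structure). A minimal in-memory sketch:

```python
import io
import pickle

import numpy as np

a = np.random.randn(100, 100)

# numpy's native .npy format
buf_np = io.BytesIO()
np.save(buf_np, a)
buf_np.seek(0)
b = np.load(buf_np)

# pickle, for comparison
buf_pkl = io.BytesIO()
pickle.dump(a, buf_pkl)
buf_pkl.seek(0)
c = pickle.load(buf_pkl)
```

Both round-trips reproduce the array exactly; the benchmark linked above compares their speed and file sizes.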