larray's People

Contributors

alixdamman, gbryon, gdementen, jehanneman

larray's Issues

ptp is broken

TypeError: ptp() got an unexpected keyword argument 'keepdims'

sparse array support

by either

  • using a pd.MultiIndex
  • storing a pd.DataFrame in memory instead of np.ndarray (mostly done in pandasbased3 branch)
  • implementing our own MultiIndex-like object

refactor viewer Model to include the concept of axes names

The goal is to clean up the current code.
It would probably help to store/support LArrays directly in the model, instead of converting to np.ndarray, but I don't want the whole model to require the use of LArrays (because in that case we will not be able to send the code back to upstream Spyder). One option is to make a generic model and have a specific LArray model which would inherit from it. The goal would be to have as much functionality as possible/reasonable available in the generic model (i.e. plot, copy & paste, filter -- but not by labels obviously). Unsure if that is reasonable though :)
One clear requirement is to keep the ability to view non-LArrays (np.ndarray, lists, tuples). The easiest way to do this would be to convert them to LArray in the init of the Model, but I would rather avoid that for the above reason.
Another point to keep in mind is that it should be able to handle Pandas DataFrames in the future without too much change.

generalize/extend Session to be more LArray-like

It should be as close as possible to an LArray with an "array" axis.

# sum the age axis of all arrays *iff* they have such an axis
s.sum(x.age)
# sum all axes of each array present in the session
s.sum_by(x.array)

to_clipboard is broken

The bug is in Pandas and/or in pyperclip (used by Pandas).
I suspect it is fixed in upstream pyperclip but not in the copy included in Pandas. So it might only need a PR to pandas which simply updates pyperclip to the latest version.

split core.py

  • at a minimum all Axis related stuff should go to an "axis" module
    Axis, AxisCollection
  • potential other modules:
    group
    expr

Implement a better syntax for initializing an array with "constant" values

eg. provide an alternative to:

>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> LArray([[0, 1], [2, 3]], [nat, sex])
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3

because it is so error prone.

For a 1d array, stack works nicely, but for 2+, it quickly gets awful.

>>> stack([('M', 0), ('F', 1)], 'sex')

implementing a from_lists function would probably solve this nicely (though a better name might help):

>>> from_lists([['nat\\sex', 'M', 'F'],
...             ['BE',         0,   1],
...             ['FO',         2,   3]])
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3
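
As a rough illustration of what such a function would have to parse, here is a minimal pure-Python sketch. The name from_lists_sketch, the sep parameter and the returned structure are all hypothetical; it only handles the 2D case and returns plain tuples and lists rather than an LArray.

```python
def from_lists_sketch(rows, sep="\\"):
    """Parse a 2D table given as a list of rows (hypothetical helper).

    The first row is a header: its first cell holds the two axis names
    joined by `sep`, the remaining cells are the column labels.
    Each following row starts with its row label.
    """
    header, *body = rows
    row_axis, col_axis = header[0].split(sep)
    col_labels = list(header[1:])
    row_labels = [row[0] for row in body]
    data = [list(row[1:]) for row in body]
    # every data row must provide exactly one value per column label
    if any(len(values) != len(col_labels) for values in data):
        raise ValueError("each row needs one value per column label")
    return (row_axis, row_labels), (col_axis, col_labels), data


rows = [["nat\\sex", "M", "F"],
        ["BE", 0, 1],
        ["FO", 2, 3]]
nat, sex, data = from_lists_sketch(rows)
```

Keeping the labels next to the values in each row is what makes this layout less error prone than passing the data and the axes separately.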

cleanup unit tests

we should rewrite most LArray unit tests using small-ish arrays created using ndtest() instead of the current demo-related examples.

add XXX_by aggregate methods

eg a.sum_by(x.age)

which should be equivalent to

a.sum(a.axes - x.age)

(which does not work, because aggregate functions do not support an AxisCollection argument)

but this works:

a.sum(*(a.axes - x.age))
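
To make the intended semantics concrete, here is a small pure-Python sketch of "sum over every axis except the given ones". It does not use larray: the array is modelled as a dict mapping label tuples to values, and sum_by_sketch is a hypothetical name.

```python
from collections import defaultdict


def sum_by_sketch(cells, axes, kept):
    """Sum `cells` over every axis except those named in `kept`.

    `cells` maps a tuple of labels (one per axis, in `axes` order) to a value.
    """
    # positions of the axes which survive the aggregation
    kept_pos = [axes.index(name) for name in kept]
    totals = defaultdict(float)
    for labels, value in cells.items():
        totals[tuple(labels[i] for i in kept_pos)] += value
    return dict(totals)


axes = ("nat", "sex", "age")
cells = {
    ("BE", "M", 0): 1.0, ("BE", "M", 1): 2.0,
    ("BE", "F", 0): 3.0, ("BE", "F", 1): 4.0,
    ("FO", "M", 0): 5.0, ("FO", "M", 1): 6.0,
    ("FO", "F", 0): 7.0, ("FO", "F", 1): 8.0,
}
# the axes which actually get aggregated: the analogue of a.axes - x.age
summed_axes = [name for name in axes if name != "age"]
by_age = sum_by_sketch(cells, axes, ["age"])
```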

generalize stack to more than 1 dimension

Here are a few syntax experiments (but see also #30):

# 2D
stack([(('BE', 'M'), 1.0), (('BE', 'F'), 0.0),
       (('FO', 'M'), 1.0), (('FO', 'F'), 0.0)], ('nat', 'sex'))

# 3D
# a) flat list, label tuple
stack([(('BE', 1, 'M'), 1.0), (('BE', 1, 'F'), 0.0),
       (('BE', 2, 'M'), 1.0), (('BE', 2, 'F'), 0.0),
       (('BE', 3, 'M'), 1.0), (('BE', 3, 'F'), 0.0),
       (('FO', 1, 'M'), 1.0), (('FO', 1, 'F'), 0.0),
       (('FO', 2, 'M'), 1.0), (('FO', 2, 'F'), 0.0),
       (('FO', 3, 'M'), 1.0), (('FO', 3, 'F'), 0.0)],
      ('nat', 'type', 'sex'))

# b) recursive structure
stack([('BE', [(1, [('M', 1.0), ('F', 0.0)]),
               (2, [('M', 1.0), ('F', 0.0)]),
               (3, [('M', 1.0), ('F', 0.0)])]),
       ('FO', [(1, [('M', 1.0), ('F', 0.0)]),
               (2, [('M', 1.0), ('F', 0.0)]),
               (3, [('M', 1.0), ('F', 0.0)])])],
       ('nat', 'type', 'sex'))
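
The two 3D variants carry the same information, so one could be implemented on top of the other. A minimal sketch (flatten_stack_sketch is a hypothetical helper, not part of larray) converting the recursive structure b) into the flat list of (label tuple, value) pairs used in a):

```python
def flatten_stack_sketch(items, depth, prefix=()):
    """Yield (label_tuple, value) pairs from a nested
    [(label, children), ...] structure with `depth` levels."""
    for label, child in items:
        labels = prefix + (label,)
        if depth == 1:
            # innermost level: `child` is the value itself
            yield labels, child
        else:
            yield from flatten_stack_sketch(child, depth - 1, labels)


nested = [('BE', [(1, [('M', 1.0), ('F', 0.0)]),
                  (2, [('M', 1.0), ('F', 0.0)])]),
          ('FO', [(1, [('M', 1.0), ('F', 0.0)]),
                  (2, [('M', 1.0), ('F', 0.0)])])]
flat = list(flatten_stack_sketch(nested, depth=3))
```

The depth could be taken from the length of the axes tuple, e.g. len(('nat', 'type', 'sex')).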

implement Axis.set[] to create LSet directly

axis.set[a, b]

should be equivalent to:

axis[a, b].set()

The goal is mostly that the __repr__ in #44 actually works, so one option might be to simply change LSet __repr__ to:

axis[a, b].set()

But the .set[] syntax would also be more efficient, so...

implement replace_axis

to complement with_axes, we need a way to replace only one axis (or a few axes). The goal is to have something nicer than:

a.with_axes(a.axes.replace(x.products, industries))

e.g:

a.replace_axis(x.products, industries)
# or
a.with_axes(products=industries)
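
A minimal sketch of the replacement logic, assuming axes are modelled as plain (name, labels) tuples instead of Axis objects. replace_axis_sketch and the length check are illustrative assumptions, not the larray implementation.

```python
def replace_axis_sketch(axes, old_name, new_axis):
    """Return a new list of (name, labels) axes where the axis called
    `old_name` is swapped for `new_axis`; other axes are left untouched."""
    names = [name for name, _ in axes]
    if old_name not in names:
        raise KeyError(old_name)
    old_labels = axes[names.index(old_name)][1]
    # assumption: the data is not touched, so the length must not change
    if len(new_axis[1]) != len(old_labels):
        raise ValueError("replacement axis must have the same length")
    return [new_axis if name == old_name else (name, labels)
            for name, labels in axes]


axes = [("products", ["p0", "p1"]), ("year", [2015, 2016])]
new_axes = replace_axis_sketch(axes, "products", ("industries", ["i0", "i1"]))
```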

fix read_excel for sparse files

It works when using engine='xlrd' but since the default engine changed to 'xlwings' it does not work.
To fix this nicely would need a lot of work: support for sparse arrays (#28) and reindex (on sparse index). However, we could use a temporary shortcut: read the data as (or convert it to) a pd.DataFrame, reindex, convert back to larray. Far from optimal but much easier to implement.

explore implementing set operations on Axis and Group

setdiff1d (numpy) -- works
delete (numpy) -- works for int, slice or list of indices
list.remove (python) -- works for value (inplace)
list.pop (python) -- index

since LArray is more like numpy arrays than Python lists => not remove and pop
=> delete and idelete?
does the label version (delete?) return only unique labels? i.e. is it a set-like op?

union
intersect
setdiff
setxor
setin

possibly on LArray too (though 1d/flattened only in that case like numpy -- because otherwise that returns non cubic arrays).
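
One open design question is ordering: the numpy set functions return sorted unique values, whereas for axis labels it may be preferable to keep the original order. A pure-Python sketch on plain label lists, under that assumption (results are unique, labels must be hashable, and the function names simply mirror the list above):

```python
def union(a, b):
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys([*a, *b]))


def intersect(a, b):
    other = set(b)
    return [label for label in dict.fromkeys(a) if label in other]


def setdiff(a, b):
    other = set(b)
    return [label for label in dict.fromkeys(a) if label not in other]


def setxor(a, b):
    # labels present in exactly one of the two inputs
    return setdiff(a, b) + setdiff(b, a)


left = ['a2', 'a0', 'a1']
right = ['a1', 'a3']
```

With these inputs, union(left, right) keeps 'a2' first instead of sorting it after 'a1'.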

position or index

We should pick one of the two terms and stick with it. Currently we use both (.i and .ipoints but PGroup and posarg*). We should either have:
.p[], .ppoints[], PGroup and posarg*
or
.i[], .ipoints[], IGroup and iarg* (or indarg*)

add unit tests for Excel I/O

This is a very important part and it is not tested at all currently, and I manage to break it every other release.

implement some way to escape special characters in labels

list of characters/patterns with special meaning:

  • current: , ; : .. [ ] >> name[] name.i[] {} numbers
  • whitespace could be considered a special character (because it is not kept as is) and we might want to make it "escapable"
  • planned (for automatic patterns): * ?
  • potential (for logic operators): | & !

we might want to reserve some or all other special characters just in case: # @ % / = + -
Or, we could define the precise list of characters a label can be made of which we can guarantee will not be interpreted.
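
One possible escaping scheme, shown only as a sketch: a backslash makes the next character literal. This rule and the split_labels_sketch name are assumptions for illustration, not existing larray behaviour. Only the separator is handled here, and whitespace handling is left out.

```python
def split_labels_sketch(text, sep=",", escape="\\"):
    """Split `text` on unescaped separators, removing the escape character."""
    labels, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            # keep the following character literally, whatever it is
            current.append(next(chars, ""))
        elif ch == sep:
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels


with_comma = split_labels_sketch(r"a\,b,c")
with_backslash = split_labels_sketch(r"x\\y,z")
```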

provide larrayenv package

which depends on larray and all optional larray dependencies so that our users only need to do:

conda update larrayenv

and be sure to have all the functionalities installed.

implement .extend on Axis

... it should accept a list of labels or an Axis and return a new Axis object (not modify the Axis in-place)
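
A minimal sketch of the non-in-place behaviour, using a hypothetical immutable AxisSketch class as a stand-in for Axis:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisSketch:
    """Stand-in for Axis; frozen so that it cannot be modified in place."""
    name: str
    labels: tuple

    def extend(self, labels):
        # accept either another axis or any iterable of labels
        extra = labels.labels if isinstance(labels, AxisSketch) else tuple(labels)
        return AxisSketch(self.name, self.labels + tuple(extra))


sex = AxisSketch('sex', ('M', 'F'))
sex2 = sex.extend(['U'])
sex3 = sex.extend(AxisSketch('other', ('X',)))
```

Here the result keeps the name of the axis being extended, which is one possible choice when the argument is itself an axis.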

implement .reindex_axis

>>> arr = ndtest((2, 3))
>>> arr.reindex_axis(x.a, ['a1', 'a0', 'a1', 'a3'])
a\\b |  b0 |  b1 |  b2
  a1 |   3 |   4 |   5
  a0 |   0 |   1 |   2
  a1 |   3 |   4 |   5
  a3 | nan | nan | nan

It might be easy to implement using something vaguely looking like:

>>> new_axis = Axis(old_axis.name, new_labels)
>>> missing_value = missing[dtype]
>>> old_indices = old_axis.translate(new_labels, missing=-1)
>>> result = self.i[old_indices]
>>> # fix up those which were missing
>>> result[old_indices == -1] = missing_value
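
The same idea can be tried without larray on a plain list of rows. This is only a sketch: reindex_rows_sketch is a hypothetical name, the first axis is the one being reindexed, and NaN is assumed as the missing value.

```python
def reindex_rows_sketch(rows, old_labels, new_labels, missing=float("nan")):
    """Reorder/duplicate `rows` to follow `new_labels`; labels which are
    not in `old_labels` produce a row filled with `missing`."""
    position = {label: i for i, label in enumerate(old_labels)}
    width = len(rows[0]) if rows else 0
    result = []
    for label in new_labels:
        i = position.get(label, -1)  # -1 marks a missing label
        result.append(list(rows[i]) if i != -1 else [missing] * width)
    return result


rows = [[0, 1, 2],
        [3, 4, 5]]
result = reindex_rows_sketch(rows, ['a0', 'a1'], ['a1', 'a0', 'a1', 'a3'])
```

Note that filling with NaN implies a float result, which is why the integer values in the expected output above would have to be converted when a label is missing.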

split unit tests

at a minimum, move Axis, AxisCollection and LGroup tests out of test_la.py

Window functions API

Rolling, expanding, ...

See
http://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.rolling.html#xarray.DataArray.rolling

For numpy:
https://gist.github.com/seberg/3866040
http://www.rigtorp.se/2011/01/01/rolling-statistics-numpy.html

Bottleneck also supports move_*
https://pypi.python.org/pypi/Bottleneck
there are no built-in move functions in numpy, so it compares against its own implementation:
https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/slow/move.py

But it seems like Pandas works well even with numpy arrays, so I guess I shouldn't bother and simply use Pandas which has a lot more features than all the other solutions anyway.

http://stackoverflow.com/a/30141358/288162
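
For reference, this is what a trailing rolling mean computes, written as a slow pure-Python sketch (in the spirit of a reference implementation to compare against, not the pandas or Bottleneck code). It assumes that positions with fewer than window values produce NaN, which is my understanding of the pandas default.

```python
def rolling_mean_sketch(values, window):
    """Mean of the last `window` values at each position."""
    nan = float("nan")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            # not enough observations yet
            out.append(nan)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


means = rolling_mean_sketch([1, 2, 3, 4, 5], window=3)
```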
