larray-project / larray
N-dimensional labelled arrays in Python
Home Page: https://larray.readthedocs.io/
License: GNU General Public License v3.0
TypeError: ptp() got an unexpected keyword argument 'keepdims'
Namely to raise a more meaningful error when the totals are swapped.
by either
equivalent to:
view(Session('filepath'))
The goal is to clean up the current code.
It would probably help to store/support LArrays directly in the model, instead of converting to np.ndarray, but I don't want the whole model to require the use of LArrays (because in that case we would not be able to send the code back to upstream Spyder). One option is to make a generic model and a specific LArray model which would inherit from it. The goal would be to have as much functionality as possible/reasonable available in the generic model (i.e. plot, copy & paste, filter -- but not by labels, obviously). Unsure if that is reasonable though :)
One clear requirement is to keep the ability to view non-LArrays (np.ndarray, lists, tuples). The easiest way to do this would be to convert them to LArray in the Model's __init__, but I would rather avoid that for the above reason.
Another point to keep in mind is that it should be able to handle Pandas DataFrames in the future without too much change.
mostly for output/reporting
short/long
language
...
It should be as close as possible to an LArray with an "array" axis.
# sum the age axis of all arrays *iff* they have such an axis
s.sum(x.age)
# sum all axes of each array present in the session
s.sum_by(x.array)
I.e. fill missing data points after non-missing data points.
See:
http://stackoverflow.com/questions/22491628/extrapolate-values-in-pandas-dataframe/35959909#35959909
Pandas supports interpolation natively (ie fill missing data points between non-missing data points).
http://pandas.pydata.org/pandas-docs/stable/missing_data.html#interpolation
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html#pandas-dataframe-interpolate
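For reference, a minimal Pandas sketch of the difference (interpolation vs. the extrapolation we want; padding with the nearest known value is just one naive extrapolation option):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# interpolation: fills the gap *between* known points (index 2 -> 2.0);
# the leading NaN is left untouched by default
interp = s.interpolate()

# naive extrapolation: also fill before the first and after the last
# known point by padding with the nearest known value
extrap = s.interpolate().ffill().bfill()
```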
e.g. pycharm_minicourse.s_pop (when qx is not clipped)
s_pop = proj_pop.sum_by(x.time)[2040:]
should allow creating groups without an axis (i.e. relying on axis guessing).
eg.
G[2:7]
As a follow-up to #12, we also need tests for the open_excel stuff.
The bug is in Pandas and/or in pyperclip (used by Pandas).
I suspect it is fixed in upstream pyperclip but not in the copy included in Pandas. So it might only need a PR to Pandas which simply updates pyperclip to the latest version.
so that we can register the extension(s) with the viewer/editor/future IDE
e.g. .lacsv and .lah5, or .lcsv and .lh5?
Replace *args and **kwargs with the equivalent explicit arguments of the NumPy functions.
eg. provide an alternative to:
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> LArray([[0, 1], [2, 3]], [nat, sex])
nat\sex | M | F
BE | 0 | 1
FO | 2 | 3
because it is so error prone.
For a 1d array, stack works nicely, but for 2+, it quickly gets awful.
>>> stack([('M', 0), ('F', 1)], 'sex')
implementing a from_lists function would probably solve this nicely (though a better name might help):
>>> from_lists([['nat\sex', 'M', 'F'],
... ['BE', 0, 1],
... ['FO', 2, 3]])
nat\sex | M | F
BE | 0 | 1
FO | 2 | 3
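A rough sketch of the parsing such a function would need (2D only; it returns plain label lists and an ndarray rather than an actual LArray, and from_lists_2d is of course a hypothetical name):

```python
import numpy as np

def from_lists_2d(rows):
    # the header cell 'nat\sex' carries both axis names
    row_name, col_name = rows[0][0].split('\\')
    col_labels = rows[0][1:]
    row_labels = [r[0] for r in rows[1:]]
    data = np.array([r[1:] for r in rows[1:]])
    return (row_name, row_labels), (col_name, col_labels), data

(rname, rlabels), (cname, clabels), data = from_lists_2d(
    [['nat\\sex', 'M', 'F'],
     ['BE', 0, 1],
     ['FO', 2, 3]])
```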
via specialized methods (union, difference (or setdiff) and intersection)
We might want to use a template for part of it.
we should rewrite most LArray unit tests using small-ish arrays created using ndtest() instead of the current demo-related examples.
eg a.sum_by(x.age)
which should be equivalent to
a.sum(a.axes - x.age)
(which does not work, because aggregate functions do not support an AxisCollection argument)
but this works:
a.sum(*(a.axes - x.age))
The objective is twofold: keep a trace of them and write them as we go during development, so that making a release is not as painful.
Here are a few syntax experiments (but see also #30):
# 2D
stack([(('BE', 'M'), 1.0), (('BE', 'F'), 0.0),
(('FO', 'M'), 1.0), (('FO', 'F'), 0.0)], ('nat', 'sex'))
# 3D
# a) flat list, label tuple
stack([(('BE', 1, 'M'), 1.0), (('BE', 1, 'F'), 0.0),
(('BE', 2, 'M'), 1.0), (('BE', 2, 'F'), 0.0),
(('BE', 3, 'M'), 1.0), (('BE', 3, 'F'), 0.0),
(('FO', 1, 'M'), 1.0), (('FO', 1, 'F'), 0.0),
(('FO', 2, 'M'), 1.0), (('FO', 2, 'F'), 0.0),
(('FO', 3, 'M'), 1.0), (('FO', 3, 'F'), 0.0)],
('nat', 'type', 'sex'))
# b) recursive structure
stack([('BE', [(1, [('M', 1.0), ('F', 0.0)]),
(2, [('M', 1.0), ('F', 0.0)]),
(3, [('M', 1.0), ('F', 0.0)])]),
('FO', [(1, [('M', 1.0), ('F', 0.0)]),
(2, [('M', 1.0), ('F', 0.0)]),
(3, [('M', 1.0), ('F', 0.0)])])],
('nat', 'type', 'sex'))
the fact that it does not break ndtest's title argument
Should add some tests also.
axis.set[a, b]
should be equivalent to:
axis[a, b].set()
The goal is mostly that the __repr__ in #44 actually works, so one option might be to simply change LSet.__repr__ to:
axis[a, b].set()
But the .set[] syntax would also be more efficient, so...
Something like "Python for Econometrics" should be an inspiration. It's a bit messy (order of chapters seems weird to me), but it covers a lot of stuff.
https://www.kevinsheppard.com/images/0/09/Python_introduction.pdf
to complement with_axes, we need a way to replace only one axis (or a few axes). The goal is to have something nicer than:
a.with_axes(a.axes.replace(x.products, industries))
e.g:
a.replace_axis(x.products, industries)
# or
a.with_axes(products=industries)
It works when using engine='xlrd' but since the default engine changed to 'xlwings' it does not work.
To fix this nicely would need a lot of work: support for sparse arrays (#28) and reindex (on a sparse index). However, we could use a temporary shortcut: read the data as (or convert it to) a pd.DataFrame, reindex, then convert back to an LArray. Far from optimal, but much easier to implement.
setdiff1d (numpy) -- works
delete (numpy) -- works for int, slice or list of indices
list.remove (python) -- works for value (inplace)
list.pop (python) -- index
since LArray is more like numpy arrays than Python lists => not remove and pop
=> delete and idelete?
does the label version (delete?) return only unique labels? i.e. a set-like op?
union
intersect
setdiff
setxor
setin
possibly on LArray too (though 1d/flattened only in that case, like numpy -- because otherwise that would return non-cubic arrays).
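For comparison, the numpy flavour of these operations (the proposed setin would correspond to np.in1d):

```python
import numpy as np

a = np.array(['a', 'b', 'c', 'd'])
b = np.array(['c', 'd', 'e'])

union = np.union1d(a, b)      # ['a' 'b' 'c' 'd' 'e']
inter = np.intersect1d(a, b)  # ['c' 'd']
diff = np.setdiff1d(a, b)     # ['a' 'b']
xor = np.setxor1d(a, b)       # ['a' 'b' 'e']
isin = np.in1d(a, b)          # [False False True True]
```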
We should pick one of the two terms and stick with it. Currently we use both (.i and .ipoints but PGroup and posarg*). We should either have:
.p[], .ppoints[], PGroup and posarg*
or
.i[], .ipoints[], IGroup and iarg* (or indarg*)
This is a very important part and it is not tested at all currently, and I manage to break it every other release.
list of characters/patterns with special meaning:
, ; : .. [ ] >> name[] name.i[] {} numbers
* ?
| & !
we might want to reserve some or all other special characters just in case: # @ % / = + -
Or, we could define the precise list of characters a label can be made of which we can guarantee will not be interpreted.
which depends on larray and all optional larray dependencies so that our users only need to do:
conda update larrayenv
and be sure to have all the functionalities installed.
should also change other methods' index_col default value to None instead of [], but the code needs to be changed too
... it should accept a list of labels or an Axis and return a new Axis object (not modify the Axis in-place)
>>> letters = Axis('letters', 'a..z')
>>> letters[':c'].set() & letters['b:d'].set()
letters.set[OrderedSet(['b', 'c'])]
It should rather be:
letters.set['b', 'c']
and possibly Group
>>> arr = ndtest((2, 3))
>>> arr.reindex_axis(x.a, ['a1', 'a0', 'a1', 'a3'])
a\b | b0 | b1 | b2
a1 | 3 | 4 | 5
a0 | 0 | 1 | 2
a1 | 3 | 4 | 5
a3 | nan | nan | nan
It might be easy to implement using something vaguely like:
>>> new_axis = Axis(old_axis.name, new_labels)
>>> missing_value = missing[dtype]
>>> old_indices = old_axis.translate(new_labels, missing=-1)
>>> result = self.i[old_indices]
>>> # fix up those which were missing
>>> result[old_indices == -1] = missing_value
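The same idea in pure numpy (a hypothetical sketch, with a dict-based label lookup standing in for Axis.translate):

```python
import numpy as np

old_labels = ['a0', 'a1']
data = np.arange(6, dtype=float).reshape(2, 3)
new_labels = ['a1', 'a0', 'a1', 'a3']

# map each new label to its position on the old axis, -1 if missing
pos = {label: i for i, label in enumerate(old_labels)}
indices = np.array([pos.get(label, -1) for label in new_labels])

result = data[indices]           # -1 temporarily picks the last row...
result[indices == -1] = np.nan   # ...then rows for missing labels are fixed up
```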
at a minimum, move Axis, AxisCollection and LGroup tests out of test_la.py
Rolling, expanding, ...
See
http://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.rolling.html#xarray.DataArray.rolling
For numpy:
https://gist.github.com/seberg/3866040
http://www.rigtorp.se/2011/01/01/rolling-statistics-numpy.html
Bottleneck also supports move_*
https://pypi.python.org/pypi/Bottleneck
there are no built-in move functions in numpy, so it compares against its own implementation:
https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/slow/move.py
But it seems like Pandas works well even with numpy arrays, so I guess I shouldn't bother and should simply use Pandas, which has a lot more features than all the other solutions anyway.
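A quick reminder of what the Pandas window API gives us out of the box:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# rolling: window-sized moving aggregate (NaN until the window is full)
rolled = s.rolling(window=3).mean()   # NaN NaN 2.0 3.0 4.0

# expanding: aggregate over everything seen so far
expanded = s.expanding().sum()        # 1.0 3.0 6.0 10.0 15.0
```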