cardillo / joinery

Data frames for Java

Home Page: https://joinery.sh

License: GNU General Public License v3.0

Java 98.37% CSS 0.49% JavaScript 1.02% C# 0.12%

data-frame data-frames dataframe-library java joinery

joinery's People

lejon longlingmichael davegerson patrickhookerdgi peterfig robmcdan sometxdude dafei1288 alessandroleite wbuchanan ebottabi shriyaarora edouardswiac joseam84 sujan7 borisanc medmatix rapidark mindis nraov whiletruelearn miaochenal ylazouski quant2007 miguelvm narendramohan benmccann dims12 williamdengnewyork tonyvu2014 winrec luigidurso woung717 smallleopard hujunxianligong brianchambers iuliandumitru ronlygithub gitwyy listenbehind spudmcq izhangzhihao show1po xuqianjin-stars haitwang-cloud frankmanzhu bsound83 yackoa harsham4026 mmmika daniellansun stexxen yildizib dqinyuan ebllyfork allenray hugeorangedev dyf102 shujiehan tool-recommender-bot huangdongsheng tfnick yjjj0007 ai-nick donaldlee2010 readreply pydawan stahamtan jiyulongxu gavinage newsky chong-zi suyuan bennabnm mji1653 hkscript waijay1992 zenithda gavin-kang yamingd dial911 andy-wagner honoripaddr mugbya 1621740748 noideafornickname lynshine tafoca liaoxuewei knoppixmeister angkorpeach 478368324 joejoesucks frostgu foxcodehu willcup gm19900510 0autumn caozhitong wangzaidali

joinery's Issues

sorting / grouping implementation efficiency improvements

such as avoiding copies, making efficient use of memory, parallelization, etc.

add support for additional plot styles

area, bar, scatter, etc.

Extra argument apply?

Hi,

How can I add an extra argument, numShares?
public void getStaticProfile(int numShares) {
    this.service.getDf().apply(new Function<Object, Number>() {
        public Number apply(Object value) {
            BigDecimal b = new BigDecimal(value.toString());
            System.out.println(b);
            return b.multiply(new BigDecimal(numShares));
        }
    });
}

Rgds,
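
One possible answer to the question above, as a minimal sketch: the function object can capture the extra value instead of receiving it as a second parameter. The example below does not use joinery itself. `java.util.function.Function` stands in for the library's function interface, and `multiplyBy` and `applyAll` are hypothetical helper names introduced only for illustration. On Java 7 the captured parameter must be declared `final`; on Java 8 and later it only needs to be effectively final.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class CapturedArgumentSketch {
    // Builds a one-argument function that remembers numShares (hypothetical helper).
    static Function<Object, Number> multiplyBy(final int numShares) {
        final BigDecimal factor = BigDecimal.valueOf(numShares);
        return new Function<Object, Number>() {
            @Override
            public Number apply(Object value) {
                // The anonymous class reads the captured 'factor' from the enclosing method.
                return new BigDecimal(value.toString()).multiply(factor);
            }
        };
    }

    // Stand-in for an element-wise apply over one column (not the joinery API).
    static List<Number> applyAll(List<?> values, Function<Object, Number> f) {
        List<Number> out = new ArrayList<>();
        for (Object v : values) {
            out.add(f.apply(v));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(applyAll(Arrays.asList("1.5", 2, 10L), multiplyBy(3)));
    }
}
```

The same capture works with a lambda on Java 8 and later, as long as the captured variable is never reassigned.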

Calculate covariance, is it possible?

Could you show me a way to calculate the covariance of this DataFrame?
                        0                1
0_left     29789,00000000   29657,00000000
0_right    39140,00000000   26047,00000000
0        1380349,00000000  698550,00000000

Rgds,
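
For reference, the computation being asked about can be sketched in plain Java, independent of any data frame library. This assumes the sample covariance (denominator n - 1) is what is wanted and that each column has already been extracted into a `double[]`; the class and method names are hypothetical.

```java
class CovarianceSketch {
    // Sample covariance of two equally long columns, using the n - 1 denominator.
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two columns of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    // Covariance matrix: entry [i][j] is the covariance of column i and column j.
    static double[][] covarianceMatrix(double[][] columns) {
        double[][] out = new double[columns.length][columns.length];
        for (int i = 0; i < columns.length; i++) {
            for (int j = 0; j < columns.length; j++) {
                out[i][j] = covariance(columns[i], columns[j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two columns from the frame in the question.
        double[][] columns = {
            {29789, 39140, 1380349},
            {29657, 26047, 698550}
        };
        for (double[] row : covarianceMatrix(columns)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```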

Update license to add linking exception or dual license under Apache

Hi, I would like to know why this project's copyright is assigned to IBM. Is it supported by IBM? Additionally, is it possible to use another open source license, such as MIT?

add rank and quantile statistics methods

allow custom csv delimiters

and other reading flexibility such as preferred numeric type and date time format

method name tab completion in shell

groupby use of bitset requires too much memory

worst case is single element groups in which the higher the index the larger the bitset, could go back to hashmap based implementation or investigate sparse bitset.
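
To make the trade-off concrete: a `java.util.BitSet` sizes its backing array by the highest bit that is set, so a group holding only row N still pays for roughly N bits. With one single-element group per row, the total grows quadratically with the row count. The sketch below shows the hashmap-based alternative mentioned above, where memory is proportional to the number of rows. It is an illustration under that assumption, not the library's implementation, and `groupRows` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HashGroupingSketch {
    // Maps each distinct key to the list of row numbers where it occurs.
    // LinkedHashMap keeps groups in order of first appearance.
    static <K> Map<K, List<Integer>> groupRows(List<K> keys) {
        Map<K, List<Integer>> groups = new LinkedHashMap<>();
        for (int row = 0; row < keys.size(); row++) {
            groups.computeIfAbsent(keys.get(row), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupRows(Arrays.asList("a", "b", "a", "c")));
    }
}
```

Boxed `Integer` row numbers carry their own overhead, so a primitive int list or a sparse bitset may still be preferable for large groups.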

better support for timeseries data

this will take some significant refactoring since currently the row and column indices are hard coded as strings (and parsing to/from datetime types is not a solution).

consider using ndarray for dataframe backing store

http://nd4j.org, http://math.nist.gov/javanumerics/jama/, or http://dst.lbl.gov/ACSSoftware/colt/ for example. the primary issue appears to be support for primitives versus objects.

all date output (string, csv, swing table, ...) should use consistent date formatting rules

add ability to display grid of charts

occasionally, it is clearer to display a grid of charts with one column per chart rather than plotting all the columns on a single chart.

Support for Factors

It would be great to have factors (like in R) as a type for columns. I didn't find anything similar. Are there any plans in this direction?

Add ability to plot to user provided component

This isn't an issue (sorry), just wanted to say thanks for making this. As a programmer who used pandas religiously in grad school, and now forced to program in pure Java, this is a godsend.

PS, wondering what it would take to embed the dataframe plot into an existing JFrame/JPanel? I have an existing JPanel with a JFree chart in there, and would love to swap it out with the plot that's mostly controlled by the dataframe.

There is a long way to go for joinery!

Joinery is missing too many useful APIs. For example, I cannot even set a specific type for fields read from a .csv file in the read_csv method. Joinery just reads the original String values as Double, and I can do nothing about it.

add joinery version to shell banner

fix plotting with time series data to use dates as x-axis

add option to plot trend line with points in charting functions

need sortby function (similar to groupby)

fix date/time serialization in csv writer

dates are currently written out as strings in a format not recognized by the csv reader
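
One way to avoid this kind of mismatch is to route both the writer and the reader through a single shared formatter, so whatever is written can always be parsed back. The sketch below assumes `java.time` types and the ISO-8601 local date-time format; both are illustrative choices, not what the library's CSV code actually uses, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class DateRoundTripSketch {
    // Single source of truth for the on-disk date format.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_LOCAL_DATE_TIME;

    // Used by the (hypothetical) CSV writer for every date cell.
    static String write(LocalDateTime value) {
        return FORMAT.format(value);
    }

    // Used by the (hypothetical) CSV reader; throws DateTimeParseException on other formats.
    static LocalDateTime read(String cell) {
        return LocalDateTime.parse(cell, FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2015, 3, 14, 9, 26, 53);
        String cell = write(original);
        System.out.println(cell + " round-trips: " + read(cell).equals(original));
    }
}
```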

handle null values in aggregation functions

Creating a new column based on existing columns + function

The function provided as argument would be iterated through the rows and its output would be stored in the newly created column.

I can think of many use cases but the one why I am posting is: creating mid prices from bid and ask columns for time series of prices.

Ideally the iterated function would accept the whole current row as argument. The row itself would be a Map so as to be able to access individual items using the column name, not using their position.
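
A minimal sketch of the requested behavior, written against plain collections rather than the DataFrame class: each row is presented to the function as a `Map` keyed by column name, and the results are collected into a new column. The names `deriveColumn` and `mid`, and the `bid`/`ask` column labels, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DerivedColumnSketch {
    // Calls f once per row, passing the row as a name -> value map, and returns the new column.
    static List<Object> deriveColumn(List<String> columns, List<List<Object>> rows,
                                     Function<Map<String, Object>, Object> f) {
        List<Object> out = new ArrayList<>();
        for (List<Object> row : rows) {
            Map<String, Object> byName = new LinkedHashMap<>();
            for (int c = 0; c < columns.size(); c++) {
                byName.put(columns.get(c), row.get(c));
            }
            out.add(f.apply(byName));
        }
        return out;
    }

    // Example row function: mid price from the bid and ask columns.
    static Double mid(Map<String, Object> row) {
        double bid = ((Number) row.get("bid")).doubleValue();
        double ask = ((Number) row.get("ask")).doubleValue();
        return (bid + ask) / 2.0;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("bid", "ask");
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(99.0, 101.0),
            Arrays.<Object>asList(10.0, 11.0));
        System.out.println(deriveColumn(columns, rows, DerivedColumnSketch::mid));
    }
}
```

Building a map per row is convenient but allocates on every iteration; a reusable row view would be a natural optimization.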

Coercion methods

This is more of a question or potential enhancement request than anything. I was wondering what it would take to create methods that coerce existing objects into a DataFrame object. I would imagine 2D arrays would be fairly easy to handle (although I could be completely wrong). I am wrapping up some other work on readers/parsers for Stata-formatted files (and others in the future), and my hope is to build those classes and methods around the idea of coercing the data into a DataFrame; that would bring the advantage of joins/unions of files from different statistical software platforms. Also, I haven't looked too much into the documentation yet, but a way to retain metadata with the file would be helpful as well, e.g., variable labels (distinct from column names) and value labels (analogous to descriptions in a lookup table in a SQL database).

add melt implementation

inverse of pivot, possibly add stack and unstack as well.
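
As a rough illustration of what melt would do, the sketch below reshapes a wide table into long form: every non-identifier cell becomes its own row of the form [id, variable, value]. It works on plain lists and supports a single identifier column only; the method name and signature are hypothetical, not a proposal for the final API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MeltSketch {
    // Wide -> long: one output row [id, column name, cell value] per non-id cell.
    static List<List<Object>> melt(List<String> columns, List<List<Object>> rows, String idColumn) {
        int idIndex = columns.indexOf(idColumn);
        if (idIndex < 0) {
            throw new IllegalArgumentException("unknown column: " + idColumn);
        }
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            for (int c = 0; c < columns.size(); c++) {
                if (c == idIndex) {
                    continue; // the id column is repeated on every output row instead
                }
                out.add(Arrays.<Object>asList(row.get(idIndex), columns.get(c), row.get(c)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "x", "y");
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, 10, 20),
            Arrays.<Object>asList(2, 30, 40));
        for (List<Object> longRow : melt(columns, rows, "id")) {
            System.out.println(longRow);
        }
    }
}
```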

interrupt in the shell should stop the currently running statement

currently, interrupt (^C) in the shell stops the shell altogether, this is ok if nothing is running, but it would be nice if it could also be used to stop the current command and leave the shell running if one wants to cancel a long-running operation.

handle missing values in plots

add summary statistics view

similar to summary(df) in R

implement rolling window version of apply

diff and percentChange could be updated to use this as well.
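
To illustrate how diff and percentChange could be expressed through a rolling apply, here is a sketch over a single `double[]` column. The names, the `NaN` padding for positions without a full window, and the array-based signature are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

class RollingApplySketch {
    // Applies f to each trailing window of the given size; positions without a full window get NaN.
    static double[] rolling(double[] values, int window, ToDoubleFunction<double[]> f) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (i + 1 < window) {
                out[i] = Double.NaN;
            } else {
                out[i] = f.applyAsDouble(Arrays.copyOfRange(values, i + 1 - window, i + 1));
            }
        }
        return out;
    }

    // diff as a window of two: current value minus previous value.
    static double[] diff(double[] values) {
        return rolling(values, 2, w -> w[1] - w[0]);
    }

    // percentChange as a window of two: relative change from the previous value.
    static double[] percentChange(double[] values) {
        return rolling(values, 2, w -> (w[1] - w[0]) / w[0]);
    }

    public static void main(String[] args) {
        double[] prices = {1, 4, 9, 16};
        System.out.println(Arrays.toString(diff(prices)));
        System.out.println(Arrays.toString(percentChange(prices)));
    }
}
```

Copying each window keeps the sketch short; a real implementation would more likely pass a view or an offset to avoid the per-window allocation.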

add diff implementation

add interactive js shell to dataframe utils

ensure column alignment in string representation of dataframes

improve detecting and dealing with missing values

improve chart series labeling using indices

update indices to use objects instead of forcing strings

should the column and row indices have parameter types

after toying with this idea a bit, it isn't trivial to make the row and column indices generic, but with a little work and/or casting it could work. the biggest challenge is how to represent the type of index returned by grouping (could be any object type or a list depending on the number of columns). at this point I don't think it is worth the added complexity.

differentiate between applying aggregate functions, row functions, and single value functions

the current way of overloading is confusing and prone to surprises, each of these operations probably needs its own name.

add sort by comparator function

similar to grouping by a custom key function, it should be possible to sort with a custom comparator.
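
A small sketch of the idea, using rows held as plain lists: the caller supplies any `Comparator` over rows, and the sort returns a reordered copy. `sortRows` and `byNumberDescending` are hypothetical names, and the example comparator assumes the chosen column holds `Number` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparatorSortSketch {
    // Returns a sorted copy; List.sort is stable, so ties keep their original order.
    static <R> List<R> sortRows(List<R> rows, Comparator<? super R> comparator) {
        List<R> copy = new ArrayList<>(rows);
        copy.sort(comparator);
        return copy;
    }

    // Example custom comparator: order rows by a numeric column, largest first.
    static Comparator<List<Object>> byNumberDescending(final int column) {
        return (a, b) -> Double.compare(
            ((Number) b.get(column)).doubleValue(),
            ((Number) a.get(column)).doubleValue());
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", 3),
            Arrays.<Object>asList("c", 2));
        System.out.println(sortRows(rows, byNumberDescending(1)));
    }
}
```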

better multilevel index support

this will also require refactoring the indices to no longer be strictly strings (similar to timeseries support)

Inheritance of joinery DataFrame

Is there a way to extend the joinery DataFrame class to add custom methods?

allow shell commands to span multiple lines

add slicing operations

convert throws if column contains longs and then doubles

the initial convert implementation only checked the first row for type conversions, therefore it was possible it would detect a column as Long numeric value but later in the frame a Double would be found causing the conversion to fail. though it is rather expensive, convert shouldn't be used all that often (the most common use will be just after reading from disk) so I think it is safe to scan the entire column to test the conversion.
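
The full-column scan described above can be sketched as follows. The type starts as the narrowest candidate and is only ever widened, so a Double appearing late in the column can no longer cause a failure. This is an illustration on a column of raw strings, not the actual convert code; the `Kind` enum and method names are hypothetical, and blank cells are assumed to be missing values.

```java
import java.util.Arrays;
import java.util.List;

class ColumnTypeSketch {
    enum Kind { LONG, DOUBLE, STRING }

    // Scans every cell and widens LONG -> DOUBLE -> STRING as needed.
    static Kind detect(List<String> column) {
        Kind kind = Kind.LONG;
        boolean sawValue = false;
        for (String cell : column) {
            if (cell == null || cell.isEmpty()) {
                continue; // missing values do not influence the detected type
            }
            sawValue = true;
            if (kind == Kind.LONG && !parsesAsLong(cell)) {
                kind = Kind.DOUBLE;
            }
            if (kind == Kind.DOUBLE && !parsesAsDouble(cell)) {
                return Kind.STRING; // cannot widen further, stop early
            }
        }
        return sawValue ? kind : Kind.STRING;
    }

    static boolean parsesAsLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean parsesAsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A check of the first row alone would report LONG for this column.
        System.out.println(detect(Arrays.asList("1", "2", "3.5")));
    }
}
```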