nl253 / dataframe

Dataframe & Series library for statistics and tabular data manipulation (like pandas)

License: MIT License

JavaScript 100.00%

data-analysis data-analytics data-manipulation javascript library node node-js pandas series statistics tabular tabular-data

dataframe's Issues

Fix JSDocs type annotation *[] is not accepted

Tests are failing because of this.

Make DataFrame indexable using integer indexes

So that:

df[0] === df.cols[0]
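
A minimal sketch of how integer indexing might be forwarded to cols using a Proxy `get` trap. `MiniFrame` is a hypothetical stand-in that holds only an array of columns; it is not the library's DataFrame class.

```javascript
// Hypothetical stand-in for DataFrame: only holds an array of columns.
class MiniFrame {
  constructor(cols) {
    this.cols = cols;
    // Returning a Proxy from the constructor makes `new MiniFrame(...)` indexable.
    return new Proxy(this, {
      get(target, key, receiver) {
        // Property keys arrive as strings, so "0" must be recognised as an index.
        if (typeof key === "string" && /^\d+$/.test(key)) {
          return target.cols[Number(key)];
        }
        return Reflect.get(target, key, receiver);
      },
    });
  }
}

const mf = new MiniFrame([[1, 2, 3], ["a", "b", "c"]]);
console.log(mf[0] === mf.cols[0]); // true
```

Every property access on the instance now goes through the trap, which may matter for hot loops.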

Document DataFrame.row indexer Proxy

This is a new feature and does not have any documentation.

Fix overloading of the `in` operator for this.rows

There is a draft implementation that overloads the `in` operator, but it does not work properly: it works with a string dataframe but not with a numeric one. You should be able to:

const hasRow = [1,2,3] in df;

You need to use the proxy that is already returned from the DataFrame.rows getter.
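
A sketch of a `has` trap for rows, assuming rows are plain arrays. The left operand of `in` is always coerced to a property key, so `[1, 2, 3] in proxy` reaches the trap as the string "1,2,3". That coercion is one possible reason a numeric dataframe behaves differently from a string one, although the issue does not confirm it. `makeRows` is a hypothetical helper.

```javascript
// Hypothetical helper: wraps an array of rows so that `row in rows` works.
function makeRows(rows) {
  // Pre-compute the string form of each row once.
  const keys = new Set(rows.map((row) => String(row)));
  return new Proxy(rows, {
    // `[1, 2, 3] in proxy` reaches this trap with key === "1,2,3".
    has(target, key) {
      return typeof key === "string" && (keys.has(key) || Reflect.has(target, key));
    },
  });
}

const numericRows = makeRows([[1, 2, 3], [4, 5, 6]]);
console.log([1, 2, 3] in numericRows); // true
console.log([7, 8, 9] in numericRows); // false
```

Because the comparison is on strings, the rows `[1, 2, 3]` and `["1,2", 3]` are indistinguishable, and the key set goes stale if rows are mutated after wrapping.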

Add DataFrame tests

Don't re-implement tests for functionality that is forwarded to the Column. Test agg, call, matrix and other core methods in DataFrame.

Add export to SQL updates and inserts

Remove transpose from docs as it's no longer a part of the API

Object.freeze what can be frozen

Some properties should never be assigned to. Research which ones can be made read-only and Object.freeze the object they are on.

Document dataset lookup

Currently it's not clear where these default datasets come from and how you can plug into this framework by changing a DataFrame.opts variable.

Printing many columns in some cases causes the columns to not fit in the terminal screen

Optimize DataFrame.sample which currently constructs a new array for each chosen row

Allow for selection of column(s) using regex

E.g.:

df.slice(/coordX/, /userNam/)
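
One way the column lookup could work, sketched on a plain array of column names. `matchColNames` is a hypothetical helper, and what df.slice should do with the matched columns is left open.

```javascript
// Hypothetical helper: returns the names of columns that match any pattern.
// Patterns with the `g` or `y` flag keep state between .test() calls, so avoid them here.
function matchColNames(colNames, ...patterns) {
  return colNames.filter((name) => patterns.some((re) => re.test(name)));
}

const names = ["coordX", "coordY", "userName", "age"];
console.log(matchColNames(names, /coordX/, /userNam/)); // [ 'coordX', 'userName' ]
```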

Allow indexing of df.cols also by the name of the columns

So that:

df.cols[0] === df.cols.firstName

Generalise all conversion and export methods

df.toJSON, df.toHTML, df.toObj and df.toCSV have a lot in common. Called without arguments, each converts the internal structure of the DataFrame object to the target format and returns the result. If an argument is supplied, it is treated as the name of the file in which to save the result of the conversion. The task is to generalise these methods into one new method and make each of them a partial application of it.
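
A rough sketch of that shape, assuming the data is a plain array of row arrays. `exportWith`, `rowsToCSV` and `rowsToJSON` are hypothetical names, and the CSV serialiser does no quoting or escaping.

```javascript
const fs = require("fs");

// Hypothetical generic exporter: serialise, then either return or write to disk.
function exportWith(serialize, data, fileName) {
  const text = serialize(data);
  if (fileName === undefined) return text;
  fs.writeFileSync(fileName, text);
  return fileName;
}

// Serialisers for a plain array of row arrays (an assumption about the data layout).
const rowsToCSV = (rows) => rows.map((row) => row.join(",")).join("\n");
const rowsToJSON = (rows) => JSON.stringify(rows);

// Each public method becomes a partial application of exportWith.
const toCSV = exportWith.bind(null, rowsToCSV);
const toJSON = exportWith.bind(null, rowsToJSON);

console.log(toCSV([[1, 2], [3, 4]])); // two lines: 1,2 and 3,4
console.log(toJSON([[1, 2], [3, 4]])); // [[1,2],[3,4]]
```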

Figure out what hasn't been documented

This should be done once a higher level of API stability is reached, i.e. after all bugs have been dealt with and all features implemented.

Refactor Column tests

The tests for the Column class were written quite a while ago and need to be reviewed.

Overload the `in` operator for DataFrame.cols so that you can check if a column is part of DataFrame

You should be able to:

(df.cols[0] in df.cols) === true

Update docs (df.val and df.col have been removed)

Refactor API into high level many-signature methods calling low-level, specific methods

Currently the API is clean in that there are very few methods with many if statements in them. However, the argument checks become costly if someone calls these methods in a loop. Ideally there would be a way to call selectByFunc directly instead of calling select and having it check isFunc(params[0]) every time.

In essence:

class DataFrame {

  select(...params) {
    // choose select* depending on value of params
    // ...
  }

  selectByFunc(f) { /* ... */ }

  selectByIdx(n, m) { /* ... */ }

}
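
A runnable version of the same idea on a plain array of rows. `MiniSelect` is a hypothetical stand-in, and treating `selectByIdx(n, m)` as a half-open range is an assumption.

```javascript
// Hypothetical stand-in operating on an array of rows.
class MiniSelect {
  constructor(rows) {
    this.rows = rows;
  }

  // High-level entry point: inspects its arguments once, then delegates.
  select(...params) {
    if (typeof params[0] === "function") return this.selectByFunc(params[0]);
    return this.selectByIdx(params[0], params[1]);
  }

  // Low-level methods do no argument checking, so they are cheap to call in a loop.
  selectByFunc(f) {
    return this.rows.filter(f);
  }

  // Assumption: n is inclusive and m is exclusive, as in Array.prototype.slice.
  selectByIdx(n, m) {
    return this.rows.slice(n, m);
  }
}

const ms = new MiniSelect([[1, 2], [3, 4], [5, 6]]);
console.log(ms.select((row) => row[0] > 1)); // [ [ 3, 4 ], [ 5, 6 ] ]
console.log(ms.selectByIdx(0, 2)); // [ [ 1, 2 ], [ 3, 4 ] ]
```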

Add documentation for developers and a note about feature requests

Expand Column tests

Not all Column functionality is covered with tests.

DataFrame.dist doesn't make it clear what the method does, it should be called distance

Add a row indexer object

So that:

df.row[0] === [df.cols[0][0], df.cols[1][0], ...]

This should be done by returning a Proxy, which allows the indexing operator to be overloaded.
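
A sketch of such a Proxy over an array of columns. `makeRowIndexer` is a hypothetical helper. Each access builds a fresh array, so the `===` above should be read as "has the same elements", not as reference equality.

```javascript
// Hypothetical helper: builds a read-only row view over an array of columns.
function makeRowIndexer(cols) {
  return new Proxy({}, {
    get(_target, key) {
      // Only non-negative integer keys such as "0" or "12" are treated as row numbers.
      if (typeof key !== "string" || !/^\d+$/.test(key)) return undefined;
      const i = Number(key);
      // Assemble the i-th row lazily from the i-th value of every column.
      return cols.map((col) => col[i]);
    },
  });
}

const row = makeRowIndexer([[1, 2, 3], ["a", "b", "c"]]);
console.log(row[0]); // [ 1, 'a' ]
console.log(row[2]); // [ 3, 'c' ]
```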

JSDocs are not as concise as they could be, check if type|null could be replaced with type?

Reconsider the use of null as an alias for "all columns"

Currently the API understands null as meaning "all columns" which may be confusing. "all" would be better but may clash with actual column names.

Add enums for fixed string parameters such as "des", "asc", "is" etc.

This would be something along the lines of:

const SortOrder = {
  ASCENDING: "asc",
  DESCENDING: "des"
  // ...
}
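
A sketch of how such an enum could be frozen and used to validate a parameter. `sortNumbers` is a hypothetical function used only to show the check.

```javascript
// Frozen so that the allowed values cannot be changed at run time.
const SortOrder = Object.freeze({
  ASCENDING: "asc",
  DESCENDING: "des",
});

// Hypothetical sort helper that accepts only members of SortOrder.
function sortNumbers(xs, order = SortOrder.ASCENDING) {
  if (!Object.values(SortOrder).includes(order)) {
    throw new RangeError(`unknown sort order: ${order}`);
  }
  const sorted = [...xs].sort((a, b) => a - b);
  return order === SortOrder.DESCENDING ? sorted.reverse() : sorted;
}

console.log(sortNumbers([3, 1, 2], SortOrder.DESCENDING)); // [ 3, 2, 1 ]
```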

Allow replacing columns by assigning to properties of df.cols

So that

df.cols.firstName = df.cols.firstName.map(name => name[0].toUpperCase().concat(name.slice(1)))

works.
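
A sketch of name-based reads and writes using Proxy `get` and `set` traps, assuming columns live in one array and their names in a parallel array. `makeCols` is a hypothetical helper.

```javascript
// Hypothetical stand-in: columns in an array, names in a parallel array.
function makeCols(colNames, cols) {
  const idxOf = (key) => (typeof key === "string" ? colNames.indexOf(key) : -1);
  return new Proxy(cols, {
    get(target, key, receiver) {
      const idx = idxOf(key);
      return idx >= 0 ? target[idx] : Reflect.get(target, key, receiver);
    },
    set(target, key, value, receiver) {
      const idx = idxOf(key);
      if (idx >= 0) {
        target[idx] = value; // replace the column stored under that name
        return true;
      }
      return Reflect.set(target, key, value, receiver);
    },
  });
}

const cols = makeCols(["firstName", "age"], [["ada", "alan"], [36, 41]]);
cols.firstName = cols.firstName.map((name) => name[0].toUpperCase().concat(name.slice(1)));
console.log(cols[0]); // [ 'Ada', 'Alan' ]
console.log(cols[0] === cols.firstName); // true
```

A column named like an array property (for example "length" or "map") would shadow that property in this sketch, so a real implementation would need a rule for such clashes.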

Turn logging off depending on the environment

Custom ValidationError to control formatting

I would prefer to avoid having to do this and have the error constructor do it all:

throw new Error(msg('you need to provide pairs of colId, newName (e.g. df.rename(1, "Width", -2, "Length"))'))
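
A sketch of an error subclass that formats its own message. The "[dataframe]" prefix merely stands in for whatever `msg` does today, which the issue does not show.

```javascript
// Hypothetical error class; the formatting applied here is only an example.
class ValidationError extends Error {
  constructor(message) {
    // Formatting happens once, in the constructor, instead of at every throw site.
    super(`[dataframe] ${message}`);
    this.name = "ValidationError";
  }
}

try {
  throw new ValidationError("you need to provide pairs of colId, newName");
} catch (err) {
  console.log(err instanceof ValidationError); // true
  console.log(err.message); // [dataframe] you need to provide pairs of colId, newName
}
```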

Optimize the loading of DataFrame from a CSV text file (most common case)

The map method on column does not convert to ColStr if the mapping maps elements to string

This also applies to other datatypes. Currently the convert method is not called (but should be) after a call to map.
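
A sketch of the intended behaviour, assuming numeric columns are backed by typed arrays (an assumption about the internals). A typed array's own map coerces results back to numbers, so mapping to strings needs a conversion step. `convert` and `mapAndConvert` are hypothetical functions, not the library's methods.

```javascript
// Hypothetical dtype inference: a typed array for numbers, a plain array otherwise.
function convert(values) {
  return values.every((v) => typeof v === "number")
    ? Float64Array.from(values)
    : Array.from(values);
}

// Mapping always goes through convert, so the result type follows the mapped values.
function mapAndConvert(col, f) {
  return convert(Array.from(col, f));
}

const nums = Float64Array.of(1, 2, 3);
console.log(mapAndConvert(nums, (x) => `#${x}`)); // [ '#1', '#2', '#3' ]
console.log(mapAndConvert(nums, (x) => x * 2) instanceof Float64Array); // true
```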

Document export to file of df.toHTML, df.toJSON etc.

Document missing features

Depends on #24.

Ensure df.colNames is always a ColStr so that new unlabeled columns are "col1", "col2", ... not 0, 1, 2 ...

This will ensure that df.colNames has a consistent type.
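
A sketch of the labelling rule on plain arrays. `normalizeColNames` is a hypothetical helper, and numbering unlabeled columns by their position is an assumption.

```javascript
// Hypothetical helper: fills in missing labels and coerces the rest to strings.
function normalizeColNames(names, nCols) {
  return Array.from({ length: nCols }, (_, i) => {
    const name = names[i];
    return name === undefined || name === null ? `col${i + 1}` : String(name);
  });
}

console.log(normalizeColNames(["id", undefined, 7], 4)); // [ 'id', 'col2', '7', 'col4' ]
```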

Allow loading of archived data as DataFrame (gzip, zip, lzma etc.)

The constructor should accept filenames with the .zip, .gzip, .lzma and .7z extensions.

When such a filename is passed to the constructor, the library should first look for an existing, already unarchived file: the same path without the archive extension. If that file is present, it is read as usual. If it is not, the library should unarchive the data into the same directory as the archive, under that name, and then read it.
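
A sketch of that lookup for gzip only, since gzip is the one listed format that Node's built-in zlib module handles; zip, lzma and 7z would need third-party packages. `resolveDataFile` is a hypothetical helper, and accepting ".gz" alongside ".gzip" is an assumption.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const zlib = require("zlib");

// Hypothetical helper covering gzip only.
function resolveDataFile(fileName) {
  const match = /\.(gz|gzip)$/.exec(fileName);
  if (match === null) return fileName; // not an archive: read as usual
  const target = fileName.slice(0, -match[0].length);
  if (!fs.existsSync(target)) {
    // Unarchive next to the archive, dropping only the archive extension.
    fs.writeFileSync(target, zlib.gunzipSync(fs.readFileSync(fileName)));
  }
  return target;
}

// Usage: create a small archive in a temporary directory, then resolve it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "df-"));
const archive = path.join(dir, "iris.csv.gz");
fs.writeFileSync(archive, zlib.gzipSync("a,b\n1,2\n"));
console.log(fs.readFileSync(resolveDataFile(archive), "utf8")); // the CSV text written above
```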

a function (val, idx) => bool with an optional column id to ensure val is the value of that column and not the whole row
ints to get rows
regex AND column id that selects all rows where the matching column values match the pattern

Add summary method to Column

So that:

> df.summary()
{
    "mean": 123.12312,
    "var": 123.3112,
    "min": 12,
    "max": 111123,
    "std": 11.11
}
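
A sketch of such a summary over a plain numeric array. `summary` here is a hypothetical free function, and the use of population variance (dividing by n) is an assumption.

```javascript
// Hypothetical summary over a plain numeric array.
function summary(xs) {
  const mean = xs.reduce((acc, x) => acc + x, 0) / xs.length;
  // Population variance: divide by n, not n - 1 (an assumption).
  const variance = xs.reduce((acc, x) => acc + (x - mean) ** 2, 0) / xs.length;
  return {
    mean,
    var: variance,
    // Spreading is fine for small inputs; very large arrays would need a loop.
    min: Math.min(...xs),
    max: Math.max(...xs),
    std: Math.sqrt(variance),
  };
}

console.log(summary([2, 4, 4, 4, 5, 5, 7, 9]));
// { mean: 5, var: 4, min: 2, max: 9, std: 2 }
```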
