
frames's People

Contributors

aghid, dinal, gkirok, gshatz, gtopper, katyakats, omesser, orkiguazio, pavius, sharon-iguazio, sweetops, taliguaz, tebeka, yaronha


frames's Issues

make generic rows to col (frame) converter

Instead of having a specific implementation per backend that iterates over full/sparse rows and creates columns and frames, add a generic class/method to add rows to a frame, e.g. Frame.AddRow(index interface{}, row map[string]interface{})
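The issue gives only the Go signature. As a language-neutral illustration, here is a minimal Python sketch of a generic row-to-column builder; `RowsToColumns` and `add_row` are hypothetical names, not the frames API, and `None` is an assumed fill value. Columns first seen in a later row are back-filled and columns missing from a row are padded, so every column always has as many entries as the index. That is the invariant the "column size mismatch" error under "RowIter should handle new columns" complains about.

```python
from typing import Any, Dict, Hashable, List


class RowsToColumns:
    """Hypothetical generic row-to-column builder (not the frames API)."""

    def __init__(self, fill: Any = None):
        self.fill = fill
        self.index: List[Hashable] = []
        self.columns: Dict[str, List[Any]] = {}

    def add_row(self, index: Hashable, row: Dict[str, Any]) -> None:
        n = len(self.index)  # number of rows added so far
        for name in row:
            if name not in self.columns:
                # New column: back-fill the rows that were added before it.
                self.columns[name] = [self.fill] * n
        for name, values in self.columns.items():
            # Columns missing from this (sparse) row are padded with `fill`.
            values.append(row.get(name, self.fill))
        self.index.append(index)
```

With pandas, `pd.DataFrame(builder.columns, index=builder.index)` turns the accumulated columns into a frame.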

Rename Verbose to LogLevel

We use Verbose in the configuration to set the log level, and the code uses the same name. IMO this is misleading, and we should rename Verbose to LogLevel.
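Renaming a configuration key breaks existing files, such as the `verbose = "info"` line in the CSV example further down. A minimal Python sketch of a backward-compatible lookup follows; `log_level` is an assumed spelling for the new key, and `log_level_from_config` is a hypothetical helper.

```python
import warnings


def log_level_from_config(config: dict, default: str = "info") -> str:
    """Prefer the new (hypothetical) `log_level` key, fall back to `verbose`."""
    if "log_level" in config:
        return config["log_level"]
    if "verbose" in config:
        # Old files keep working, but users are nudged toward the new name.
        warnings.warn("'verbose' is deprecated, use 'log_level'", DeprecationWarning)
        return config["verbose"]
    return default
```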

Multiple file/location backends

We'd like to be able to work with various file formats: CSV, Parquet, JSON, ...
We'd also like a way to fetch the file from various locations: file system, S3, HDFS, ...
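Keeping the two axes separate means N formats and M locations need N + M pieces of code rather than N × M combinations. The Python sketch below shows only the dispatch step; the `FORMATS`/`LOCATIONS` tables and `resolve` are hypothetical, and a real backend would map each entry to an actual reader (pandas, for instance, provides `read_csv`, `read_parquet` and `read_json`).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lookup tables; a real backend would map these to reader objects.
FORMATS = {".csv": "csv", ".parquet": "parquet", ".json": "json"}
LOCATIONS = {"": "file", "file": "file", "s3": "s3", "hdfs": "hdfs"}


def resolve(url: str) -> tuple:
    """Split a URL into independent (location, format) choices."""
    parsed = urlparse(url)
    location = LOCATIONS.get(parsed.scheme)
    if location is None:
        raise ValueError(f"unsupported location: {parsed.scheme!r}")
    # The format is chosen from the file extension, regardless of location.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    fmt = FORMATS.get(suffix)
    if fmt is None:
        raise ValueError(f"unsupported format: {suffix!r}")
    return location, fmt
```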

Python client hangs when there is a write error

Using the following code:

import pandas as pd
import numpy as np
import v3io_frames as v3f

backend, table = 'tsdb', 'miki_stocks'

# Load the stocks data, indexed by date
df = pd.read_csv('_t/stocks.csv', parse_dates=['Date'], index_col='Date')
# Lazily split the frame into chunks of 100 rows each
chunks = (c for _, c in df.groupby(np.arange(len(df))//100))
c = v3f.Client('http://localhost:8080')
# On a write error, this call hangs instead of raising
c.write(backend, table, chunks)
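The report doesn't say where the client blocks. One way a streaming write can hang is a producer stuck on a full queue after the consumer has died. The following minimal Python sketch assumes chunks are handed to a background sender through a bounded queue; `stream_chunks` and `send` are hypothetical and not part of v3io_frames. It shows how the first send error can be re-raised in the caller instead of blocking forever.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def stream_chunks(chunks, send, maxsize=4):
    """Send chunks from a worker thread; re-raise the first send error."""
    q = queue.Queue(maxsize=maxsize)
    failed = threading.Event()
    errors = []

    def sender():
        while True:
            item = q.get()
            if item is _DONE:
                return
            try:
                send(item)
            except Exception as exc:  # record the error and stop consuming
                errors.append(exc)
                failed.set()
                return

    worker = threading.Thread(target=sender, daemon=True)
    worker.start()

    def put(item):
        # Never block forever: give up as soon as the sender has failed.
        while not failed.is_set():
            try:
                q.put(item, timeout=0.05)
                return True
            except queue.Full:
                pass
        return False

    try:
        for chunk in chunks:
            if not put(chunk):
                break
    finally:
        put(_DONE)  # let the sender exit even if `chunks` itself raised
        worker.join()
    if errors:
        raise errors[0]
```

The design choice is that every blocking step on the producer side is bounded by the `failed` event, so a dead sender always turns into an exception in the caller.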

Create backends on server startup

Currently we create a backend on every request. We'd like to create them when the server starts.

We need to think about how clients can reference a specific backend and how we can avoid keeping connections open for the whole life of the server (a pool?).
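A minimal Python sketch of building backends once at startup and letting requests reference them by name follows. It reuses the `name`/`type`/`rootDir` keys from the `[[backends]]` config shown in the CSV issue; `BackendRegistry` and the factory mapping are hypothetical. Connection pooling is left to each factory, which could return an object that owns its own pool.

```python
from typing import Callable, Dict, List


class BackendRegistry:
    """Hypothetical registry: every backend is built once, at server startup."""

    def __init__(self, configs: List[dict], factories: Dict[str, Callable[[dict], object]]):
        self._backends: Dict[str, object] = {}
        for cfg in configs:
            name, kind = cfg["name"], cfg["type"]
            if name in self._backends:
                raise ValueError(f"duplicate backend name: {name!r}")
            self._backends[name] = factories[kind](cfg)

    def get(self, name: str) -> object:
        """Requests reference a backend by its configured name."""
        try:
            return self._backends[name]
        except KeyError:
            raise KeyError(f"unknown backend: {name!r}") from None
```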

Server errors don't propagate to the client

@yaronha reported that the server shows the error below, but the client operation ended successfully with a nil (None?) DataFrame:

18.10.27 15:51:19.101 v3io-frames (I) read request {"request": {"backend":"stream","schema":null,"data_format":"","row_layout":false,"multi_index":false,"query":"","table":"stockes_stream","columns":null,"filter":"","group_by":"","Join":null,"limit":0,"max_in_message":0,"marker":"","segments":null,"total_segment":0,"sharding_keys":null,"sort_key_range_start":"","sort_key_range_end":"","start":"now-2d","end":"","step":"","aggragators":"","seek":"time","shard":"0","sequence":140}}
18.10.27 15:51:19.265 v3io-frames (E) can't query {"error": "Error in Seek operation - Failed POST with status 400"}
18.10.27 15:51:19.265 v3io-frames (E) error reading {"error": "can't query: Error in Seek operation - Failed POST with status 400", "errorVerbose": "Error in Seek operation - Failed POST with status 400\ncan't query\ngithub.com/v3io/frames/api.(*API).Read\n\tC:/Users/yaronh/go/src/github.com/v3io/frames/api/api.go:96\ngithub.com/v3io/frames/server.(*Server).handleRead.func1\n\tC:/Users/yaronh/go/src/github.com/v3io/frames/server/server.go:180\nruntime.goexit\n\tC:/Go/src/runtime/asm_amd64.s:2361", "errorCauses": [{"error": "can't query: Error in Seek operation - Failed POST with status 400", "errorVerbose": "Error in Seek operation - Failed POST with status 400\ncan't query", "errorCauses": [{"error": "Error in Seek operation - Failed POST with status 400"}]}]}

RowIter should handle new columns

@yaronha said:

RowIter doesn't handle the case where new rows don't have some of the fields the old ones had; see below:

18.10.27 20:11:16.451 v3io-frames (E) error during iteration {"error": "Failed to create frame - \"cleaned\" column size mismatch (15 != 239)"}
18.10.27 20:11:16.453 v3io-frames (E) error reading {"error": "error during iteration: Failed to create frame - \"cleaned\" column size mismatch (15 != 239)", "errorVerbose": "Failed to create frame - \"cleaned\" column size mismatch (15 != 239)\nerror during iteration\ngithub.com/v3io/frames/api.(*API).Read\n\tC:/Users/yaronh/go/src/github.com/v3io/frames/api/api.go:106\ngithub.com/v3io/frames/server.(*Server).handleRead.func1\n\tC:/Users/yaronh/go/src/github.com/v3io/frames/server/server.go:180\nruntime.goexit\n\tC:/Go/src/runtime/asm_amd64.s:2361", "errorCauses": [{"error": "error during iteration: Failed to create frame - \"cleaned\" column size mismatch (15 != 239)", "errorVerbose": "Failed to create frame - \"cleaned\" column size mismatch (15 != 239)\nerror during iteration", "errorCauses": [{"error": "Failed to create frame - \"cleaned\" column size mismatch (15 != 239)"}]}]}

add create param if_exists

In the create operation, allow the user to select what happens if the table already exists:
fail (default), overwrite (delete the old table and write), or ignore (don't fail; keep working with the old table).
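pandas' `DataFrame.to_sql` has a similar `if_exists` parameter (`'fail'`, `'replace'`, `'append'`). A minimal Python sketch of the three proposed behaviors over an in-memory table store follows; `IfExists`, `create`, and the dict-based store are hypothetical.

```python
from enum import Enum


class IfExists(Enum):
    FAIL = "fail"            # default: refuse to touch an existing table
    OVERWRITE = "overwrite"  # delete the old table, then create a new one
    IGNORE = "ignore"        # don't fail; keep working with the old table


def create(tables: dict, name: str, schema: dict, if_exists: IfExists = IfExists.FAIL) -> dict:
    """Hypothetical create operation over an in-memory dict of tables."""
    if name in tables:
        if if_exists is IfExists.FAIL:
            raise FileExistsError(f"table {name!r} already exists")
        if if_exists is IfExists.IGNORE:
            return tables[name]
        del tables[name]  # OVERWRITE: drop the old table first
    tables[name] = {"schema": schema, "rows": []}
    return tables[name]
```

A string parameter from a request converts directly, e.g. `IfExists("overwrite")`.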

Refactor configuration

Currently, configuration comes from the backend configuration, environment variables, and command-line switches (to framesd), and soon from session details as well.

We use our own code to manage this. However, there are packages for this, probably cobra + viper.
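cobra and viper are Go libraries; viper resolves each key by precedence, with flags taking priority over environment variables, which take priority over the config file. The same layering can be sketched in Python with the standard library's `ChainMap`. The `--log-level`/`--addr` flags and the `FRAMESD_` environment prefix are assumptions for illustration.

```python
import argparse
import os
from collections import ChainMap

ENV_PREFIX = "FRAMESD_"  # hypothetical prefix: FRAMESD_LOG_LEVEL -> log_level


def load_config(file_cfg, argv, environ=os.environ):
    """Resolve settings with precedence: command line > environment > file."""
    parser = argparse.ArgumentParser(prog="framesd")
    parser.add_argument("--log-level", dest="log_level")
    parser.add_argument("--addr")
    # Keep only flags that were actually given, so they don't mask lower layers.
    cli = {k: v for k, v in vars(parser.parse_args(argv)).items() if v is not None}
    env = {
        key[len(ENV_PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX)
    }
    # ChainMap looks keys up in order: cli first, then env, then the file.
    return ChainMap(cli, env, file_cfg)
```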

Change slice column message keys to match dtype

Currently, slice column message data fields are named by type: ints for []int, floats for []float, ...
Instead, have the data field name match the dtype (e.g. `[]int`); this will simplify the logic in both the server and the clients.
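A minimal Python sketch of the proposed encoding, assuming a dict/JSON-style message. The dtype strings follow the `[]int` example in the issue, but the full set (`[]float`, `[]string`, `[]bool`, `[]time`) and the `encode_column`/`decode_column` helpers are hypothetical. The point is that the decoder reads the data key straight from the dtype instead of mapping ints, floats, ... back to types.

```python
from datetime import datetime

# Hypothetical dtype names, following the `[]int` example in the issue.
PY_TO_DTYPE = {bool: "[]bool", int: "[]int", float: "[]float", str: "[]string", datetime: "[]time"}


def encode_column(name: str, values: list) -> dict:
    """Encode a slice column so that the data key *is* the dtype."""
    kinds = {type(v) for v in values}
    if len(kinds) != 1:
        raise TypeError(f"column {name!r} must have exactly one value type, got {kinds}")
    dtype = PY_TO_DTYPE[kinds.pop()]
    return {"name": name, "dtype": dtype, dtype: values}


def decode_column(msg: dict) -> list:
    # No per-type branching: the dtype says which key holds the data.
    return msg[msg["dtype"]]
```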

Client should hold session information

Currently, the server holds the connection information per backend. We'd like to change that and have the client pass connection information with every request (much like a regular database driver).
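A minimal Python sketch of a client that owns its session and attaches it to every request. `Session`, its fields, `SketchClient`, and `build_request` are hypothetical and not the v3io_frames API.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """Hypothetical per-client connection details (field names are illustrative)."""
    url: str
    container: str
    user: str
    token: Optional[str] = None


class SketchClient:
    def __init__(self, address: str, session: Session):
        self.address = address  # address of the frames server itself
        self.session = session  # backend connection details, owned by the client

    def build_request(self, backend: str, table: str, **params) -> dict:
        # The session travels with every request, so the server keeps no
        # per-backend connection state of its own.
        return {"session": asdict(self.session), "backend": backend, "table": table, **params}
```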

KV reader adds index columns to the Frame as well

@yaronha said:

I noticed that in kv/reader Next() we have the index in both Indices and Columns (we only delete it from ByName: delete(byName, indexColKey)).
Can you fix it so the KV index column (__name) is only added to the indices and is not duplicated in Columns (see the picture below, which shows the issue):
[Image showing the issue]
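On the client side, the desired result is a pandas frame where `__name` is the index and not also a column. A minimal sketch, assuming rows arrive as dicts keyed by column name (`rows_to_frame` is a hypothetical helper):

```python
import pandas as pd

INDEX_COL = "__name"  # the KV key column named in the issue


def rows_to_frame(rows: list) -> pd.DataFrame:
    """Build a frame whose KV key lives only in the index, not in the columns."""
    df = pd.DataFrame(rows)
    # set_index moves the column into the index and, with the default
    # drop=True, removes it from the columns, so it is not duplicated.
    return df.set_index(INDEX_COL)
```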

Bug with MultiIndex indices

@yaronha said:

df.index.levels only returns the unique values, whereas df.index.labels returns the mapping (renamed df.index.codes in pandas 0.24). So len(col) will not be the frame size if index values repeat, and this causes an error in the server's checkEqualLen().

I assume you need to use df.index.get_level_values(). We can also optimize: if len(level) == 1, we can make it a Label column (i.e. the same value probably repeats itself in all the rows, since there is only one such value).
see: http://pandas.pydata.org/pandas-docs/stable/advanced.html
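A minimal Python sketch of the approach suggested above, using `get_level_values()` so that every level yields one value per row. `index_columns` is a hypothetical helper. It calls `remove_unused_levels()` first because a MultiIndex can keep level values that no longer appear after filtering, which would make the `len(level) == 1` check miss single-valued levels.

```python
import pandas as pd


def index_columns(df: pd.DataFrame) -> list:
    """Return (name, values, is_label) for each level of df's MultiIndex."""
    # Drop level values that no longer occur (e.g. after filtering rows).
    mi = df.index.remove_unused_levels()
    out = []
    for i, name in enumerate(mi.names):
        # One value per row, so len(values) == len(df) even with repeats.
        values = list(mi.get_level_values(i))
        # A single distinct value could be sent as a label column instead
        # (assumes the level has no missing values).
        is_label = len(mi.levels[i]) == 1
        out.append((name, values, is_label))
    return out
```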

CSV write truncates data

Assuming config.toml is

verbose = "info"

[[backends]]
name = "weather"
type = "csv"
rootDir = "/tmp/weather"

Then run:

#!/bin/bash

root_dir=/tmp/weather
rm -rf ${root_dir}
mkdir -p ${root_dir}
cp _examples/client/py/weather.csv ${root_dir}
go run cmd/framesd/framesd.go -config _t/config.toml

And then

PYTHONPATH=clients/py python _examples/client/py/client.py

You'll see the generated file is truncated.

Grafana interface

We'd like to have an HTTP interface for Grafana.

  • Read-only support
  • Parameters in the URL query
  • Return JSON

We just need a skeleton; @yaronha will take over and fill it in.
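A minimal skeleton along those lines, using only Python's standard library: read-only (only GET is implemented, so the base class answers other methods with 501), parameters taken from the URL query, JSON responses. The `table`/`start`/`end` parameter names, the `read` callback, and the empty-list stub are assumptions; Grafana-specific endpoints are not modeled.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def handle_query(path: str, read) -> tuple:
    """Parse URL query parameters, call `read`, return (status, JSON body).

    `read(table, start, end)` stands in for a frames read (hypothetical).
    """
    params = {k: v[0] for k, v in parse_qs(urlparse(path).query).items()}
    if "table" not in params:
        return 400, json.dumps({"error": "missing 'table' parameter"})
    rows = read(params["table"], params.get("start", ""), params.get("end", ""))
    return 200, json.dumps(rows)


class ReadOnlyHandler(BaseHTTPRequestHandler):
    # Only do_GET exists, so POST/PUT/DELETE get 501 from the base class.
    def do_GET(self):
        # Stub reader that returns no rows; the real one would query frames.
        status, body = handle_query(self.path, lambda table, start, end: [])
        data = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8081) -> None:
    HTTPServer(("localhost", port), ReadOnlyHandler).serve_forever()
```

Calling `serve()` starts it locally (the port is arbitrary); a request such as `/query?table=...&start=now-1h` then returns JSON.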

Remove nunique test in Python client

Currently, the Python client checks whether a column has only one value and, if so, encodes it as a label column. However, the nunique check is expensive (84.2µs for a column of size 1000 with 3 unique elements). Remove it.

Support empty column names & duplicate column names

We need to decide what to do with empty and duplicate column names.

Looking at pandas:

  • read_csv automatically assigns names (such as Unnamed: 0) to columns with an empty header, and renames duplicate headers (a, a.1, ...).
  • If a frame has duplicate column names and you select one of them, pandas returns a DataFrame with all the columns that match this name.
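One option is to normalize names on write, following the conventions read_csv uses. A minimal Python sketch; `normalize_columns` is a hypothetical helper, and its collision handling (e.g. for `["a", "a.1", "a"]`) is simpler than pandas' own, so it is not guaranteed to produce identical names in every case.

```python
def normalize_columns(names: list) -> list:
    """Name empty columns and make duplicate names unique.

    Empty names at position i become "Unnamed: i"; repeated names get
    ".1", ".2", ... suffixes until the result is unique.
    """
    used = set()
    counts = {}
    out = []
    for i, name in enumerate(names):
        base = f"Unnamed: {i}" if name is None or name == "" else name
        candidate = base
        while candidate in used:
            counts[base] = counts.get(base, 0) + 1
            candidate = f"{base}.{counts[base]}"
        used.add(candidate)
        out.append(candidate)
    return out
```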
