dataframe-go's People

Contributors

pasdam, pjebs, propersam, rocketlaunchr-cto

dataframe-go's Issues

Potential collision and risk from indirect dependence "github.com/gotestyourself/gotestyourself"

Background

The rocketlaunchr/dataframe-go repo imports gotestyourself indirectly through its old path.
As a result, github.com/gotestyourself/gotestyourself and gotest.tools coexist in this repo:
https://github.com/rocketlaunchr/dataframe-go/blob/master/go.mod (Line 20 & 40)

github.com/gotestyourself/gotestyourself v2.2.0+incompatible // indirect
gotest.tools v2.2.0+incompatible // indirect 

That's because gotestyourself has already renamed its import path from "github.com/gotestyourself/gotestyourself" to "gotest.tools". When you import gotestyourself through the old path "github.com/gotestyourself/gotestyourself", it is reintroduced under the new path by the import statements "import gotest.tools" in gotestyourself's own Go source files.

https://github.com/gotestyourself/gotest.tools/blob/v2.2.0/fs/example_test.go#L8

package fs_test
import (
	…
	"gotest.tools/assert"
	"gotest.tools/assert/cmp"
	"gotest.tools/fs"
	"gotest.tools/golden"
)

"github.com/gotestyourself/gotestyourself" and "gotest.tools" are the same repo. Each import path works in isolation from the other, which brings potential risks and problems.

Solution

Add a replace statement to the go.mod file:

replace github.com/gotestyourself/gotestyourself => gotest.tools v2.2.0

Then clean the go.mod.

Progress for re-write of dataframe-go?

The README says "Once Go 1.18 (Generics) is introduced, the ENTIRE package will be rewritten." As Go 1.18 has been out for a while, I'm wondering whether work has started on rewriting the entire package. If so, how is it progressing?

undefined: dataframe.LoadFromCSV

When I run the example below to read a df from a dummy string, I get the following errors:

➜  learn-go git:(main) ✗ go run dev.go
# command-line-arguments
./dev.go:21:13: undefined: dataframe.LoadFromCSV
./dev.go:21:65: undefined: dataframe.CSVLoadOptions

package main

import (
	"context"
	"fmt"
	"strings"

	imports "github.com/rocketlaunchr/dataframe-go"
)

func main() {

	csvStr := `colA,colB
	1,"First"
	2,"Second"
	3,"Third"
	4,"Fourth"`

	ctx := context.Background()

	df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
		DictateDataType: map[string]interface{}{
			"colA": int64(0),
			"colB": "",
		},
	})

	fmt.Println(err)
	fmt.Println(df)

}

I'm not really sure what I'm doing wrong here. I was trying to follow the approach used in this example: https://github.com/rocketlaunchr/dataframe-go/blob/master/imports/infer_test.go

Here is the output of go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/andre/.cache/go-build"
GOENV="/home/andre/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/andre/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/andre/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build868960458=/tmp/go-build -gno-record-gcc-switches"

P.S. I'm new to Go, so it could well be me doing something silly :)

nothing for Xaxis and Yaxis

I copied the following code from the README, but the chart shows no values on the X axis or the Y axis.

import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display()
<-plt.Closed

CSV import does not support dictated type for fields with potential empty values

When importing a CSV in which a field has empty values in some rows, the import ignores the dictated data type and tries to interpret the field as a string.

csvStr := `sometimes_empty,label
,"First"
2,"Second"
,"Third"
4,"Fourth"`

ctx := context.Background()	

df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions {
	DictateDataType: map[string]interface{}	{
		"sometimes_empty": int64(0),
		"label": "",
	},
})

fmt.Println(err)
fmt.Println(df)

This code produces the error:

can't force string:  to int64. row: 0 field: sometimes_empty

How can I create a dataframe from a CSV with empty values so that it still follows the dictated type and has NaN in place of the empty values?

Expected output should be:

+-----+-----------------+--------+
|     | SOMETIMES EMPTY | LABEL  |
+-----+-----------------+--------+
| 0:  |       NaN       | First  |
| 1:  |        2        | Second |
| 2:  |       NaN       | Third  |
| 3:  |        4        | Fourth |
+-----+-----------------+--------+
| 4X2 |      INT64      | STRING |
+-----+-----------------+--------+
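
As a point of comparison, the behaviour asked for above can be sketched with the standard library alone. This is not dataframe-go's API: parseNullableInt64 and loadColumn are hypothetical helpers, and a nil pointer stands in for the NaN shown in the expected table.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// parseNullableInt64 (hypothetical helper) returns nil for an empty cell
// and a pointer to the parsed value otherwise.
func parseNullableInt64(cell string) (*int64, error) {
	cell = strings.TrimSpace(cell)
	if cell == "" {
		return nil, nil
	}
	v, err := strconv.ParseInt(cell, 10, 64)
	if err != nil {
		return nil, err
	}
	return &v, nil
}

// loadColumn (hypothetical helper) parses CSV text, skips the header row,
// and returns column col (0-based) as nullable int64 values.
func loadColumn(data string, col int) ([]*int64, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	out := make([]*int64, 0, len(records)-1)
	for i, rec := range records[1:] {
		if col < 0 || col >= len(rec) {
			return nil, fmt.Errorf("row %d: no column %d", i, col)
		}
		v, err := parseNullableInt64(rec[col])
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", i, err)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	csvStr := `sometimes_empty,label
,"First"
2,"Second"
,"Third"
4,"Fourth"`

	col, err := loadColumn(csvStr, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, v := range col {
		if v == nil {
			fmt.Printf("%d: NaN\n", i) // empty cell
		} else {
			fmt.Printf("%d: %d\n", i, *v)
		}
	}
}
```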

How to use CSVLoadOptions?

Hi,
I have a CSV file with four fields (USERID, MOVIEID, RATING, TIMESTAMP). By default, LoadFromCSV loads
all fields as strings. I want to load them as float64 instead, so I created a CSVLoadOptions:
var csvOp imports.CSVLoadOptions
csvOp.DictateDataType =make(map[string]interface{})
csvOp.DictateDataType["USERID"]= float64(0)
csvOp.DictateDataType["MOVIEID"]=float64(0)
csvOp.DictateDataType["RATING"]=float64(0)
csvOp.DictateDataType["TIMESTAMP"]=float64(0)

ratingDf, err := imports.LoadFromCSV(ctx, file,csvOp)

But loading fails with an error and I don't know why. Am I using CSVLoadOptions incorrectly?

How to convert all dataframe data to a gonum dense matrix?

I want to use this package, but I ran into a problem: the values property of SeriesInt64 is private. Why?

Could you tell me how to convert a dataframe to a gonum dense matrix?

Also, how do I use LoadFromCSV(ctx, strings.NewReader(csvStr))? Which ctx should I pass, and how do I define the context.Context?

LoadFromJSON Not Working

files, err := ioutil.ReadFile("device.json")
if err != nil {
	fmt.Println(err)
}

var ctx = context.Background()
df2, _ := imports.LoadFromJSON(ctx, strings.NewReader(string(files)))

fmt.Println(df2.Table())

Error to read parquet with latest parquet-go

  1. Create a file with python pandas
dataframe = pandas.DataFrame({
        "A": ["a", "b", "c", "d"],
        "B": [2, 3, 4, 1],
        "C": [10, 20, None, None]
    })

dataframe.to_parquet("1.parquet")

This file looks like: (screenshot not preserved)

  2. Read this file
func main() {
    ctx := context.Background()
    fr, _ := local.NewLocalFileReader("1.parquet")
    df, err := imports.LoadFromParquet(ctx, fr)
    if err != nil {
        panic(err)
    }
    fmt.Println(df)
}
  3. Got a unique name error
panic: names of series must be unique: 

goroutine 1 [running]:
github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
        .../rocketlaunchr/[email protected]/dataframe.go:41 +0x33c
github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
        .../go/pkg/mod/github.com/rocketlaunchr/[email protected]/imports/parquet.go:110 +0x8ae
main.main()
        .../main.go:13 +0x78
  4. Following the stack, I found some useful information
  • All series in method imports.LoadFromParquet have empty names

(screenshot not preserved)

  • goFieldNameToActual
    Each key in this map has the prefix "Scheme", but goName doesn't; maybe that is why the name can't be found in this map

(screenshots not preserved)

This is the first time I have used Go to read parquet files. Is this error caused by breaking changes in parquet-go, or by something else?

Getting back Float64/Int64/Mixed series from dataframe

I wanted to know if there is a way to convert a series interface to get the original type of series (Float64/Int64/Mixed) underneath it.
I will describe my use case.

After creating a dataframe, I am trying to use gonum to do some analysis, for example linear regression of two series from the dataframe. For this I have to iterate over the whole series (using ValuesIterator) to copy each element into a []float64, which is what gonum requires.
ToSeriesFloat64 does not help since it is not implemented by Series.

Is there an easier way to access the whole underlying series as the corresponding concrete slice?
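
For reference, the manual conversion described above can be sketched with the standard library only. This assumes the values have already been collected from the series into a []interface{}; toFloat64s is a hypothetical helper, not part of dataframe-go, and it maps nil or unsupported values to NaN.

```go
package main

import (
	"fmt"
	"math"
)

// toFloat64s (hypothetical helper) converts loosely typed values into a
// []float64. nil and unsupported types become NaN.
func toFloat64s(vals []interface{}) []float64 {
	out := make([]float64, len(vals))
	for i, v := range vals {
		switch x := v.(type) {
		case float64:
			out[i] = x
		case int64:
			out[i] = float64(x)
		case int:
			out[i] = float64(x)
		default:
			out[i] = math.NaN()
		}
	}
	return out
}

func main() {
	// Stand-in for values read from a series one row at a time.
	vals := []interface{}{50.3, nil, int64(23), 56}
	fmt.Println(toFloat64s(vals))
}
```

The resulting slice can then be handed to any function that expects a []float64.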

Reading from Parquet

Hello,

Are there any plans to support reading a Parquet file into a dataframe? I have a need for this and am evaluating this library to use in an application.

Thanks!

Problem getting the package

Problem getting the package:

$ go get -u github.com/rocketlaunchr/dataframe-go
go get: github.com/rocketlaunchr/dataframe-go@none updating to
        github.com/rocketlaunchr/[email protected] requires
        github.com/blend/[email protected]: reading github.com/blend/go-sdk/go.mod at revision v1.1.1: unknown revision v1.1.1

Expand docs to include other common dataframe operations, etc.

Greetings!

Just a minor suggestion, but if you have the time, it could be useful to expand the docs a bit more to cover some additional common operations applied to dataframe-like structures, where supported.

For example:

  • retrieving a single row
  • retrieving a single column
  • selecting row/column subsets by indices or ranges
  • selecting a single value by <row, column> indices

Further, one other thing I noticed when employing the package for the first time, is that many of the dataframe.xx() function calls include a nil as the first argument.

From looking at the code for dataframe.go, these appear to be relating to an optional Options struct, so it makes sense that this would be set to nil in many instances. It may just be worth mentioning this explicitly in the examples for .Append() in the docs.

Finally two other things that could be useful to consider including in the docs:

  • Limitations compared with R/pandas
  • Cheatsheet of commands comparing dataframe-go with R/pandas (more effort, and probably better suited for a separate wiki page, etc., but would be really useful for people coming from these worlds..)

Thanks for taking the time to put together and share this really useful package!

TODO: new features/updates

  1. Update copyright + remove coverall
  2. Add Row function to dataframe (also allow it to decide what of data is returned) + DeepLock opt
  3. In utils, Search function to allow option to stop after finding N values.
  4. In imports pkg, for importing from SQL, change from stmt to interface.
  5. Allow imports.CSVLoadOptions to have default Comma,
  6. Allow imports.DictateDataType to accept Series type (to allow for custom data)
  7. Create random Series and Dataframes (with a random interface)
  8. Change signature of Sort functions to accept context. Perhaps add ctx to more functions.
  9. Allow custom time formats when importing.
  10. Add some more popular Pandas functions. (https://www.dataquest.io/blog/pandas-cheat-sheet/)
  11. Add example in readme on how to integrate with gonum pkg.
  12. Add Filter, GroupBy and Join (and also Append df): https://github.com/robpike/filter
  13. Generate fake data for Dataframe
  14. SeriesTime should be able to generate intervals
  15. Arima forecasting
  16. Add epsilon to SeriesFloat64
  17. Consider implementing OrderedMaps with doubly linked lists: http://www.tugberkugurlu.com/archive/implementing-ordered-map-in-go-2-0-by-using-generics-with-delete-operation-in-o-1-time-complexity and https://github.com/elliotchance/orderedmap
  18. Create a new Series type that uses a linked list container/list package for float64 data instead of an []float64. (PUT IN XSERIES PKG): https://github.com/huandu/skiplist
  19. Remove locking functionality -> leave to users to implement externally
  20. Clean up api with regards to a function to accept values (and always make options optional). V(args ...interface{})
  21. Advocate dot import for V function (pkg name dot)
  22. When importing from csv, allow series names to be chosen when the csv file doesn't have a headings row (Headings option) Thanks @pasdam
  23. df.AddSeries should increase number of rows of newly added series automatically to match number of rows in df
  24. Make Apply and Filter operate concurrently
  25. Add alias for maths variables as an option: fn := funcs.RegFunc("sin(2*𝜋*x/24)"); funcs.Evaluate(ctx, df, fn, 1)
  26. Create adapter for: https://github.com/go-echarts/go-echarts
  27. Parquet export - configure time type per series
  28. Explore Range embedding/chaining another Range (to define multiple ranges)
  29. For Values Iterator, interpret Step=0 as Step=1
  30. For utils/faker, use gofakeit v6 and also create constants for the function names.
  31. Use rowerr for json importing

OrderedMap bug

For ordered map:

When setting an existing value, should we remove old value and append new?

	o.store[key] = val
	o.keys = append(o.keys, key)

For Delete function, if key is not found, then return immediately.
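
One possible resolution of both points can be sketched as follows. This is a minimal illustration, not the package's actual OrderedMap: the orderedMap type and its methods are hypothetical, and it takes the "keep the original position" answer to the question above. Removing the old key and appending it again would be the other valid choice.

```go
package main

import "fmt"

// orderedMap is a hypothetical, minimal insertion-ordered map.
type orderedMap struct {
	store map[string]interface{}
	keys  []string
}

func newOrderedMap() *orderedMap {
	return &orderedMap{store: map[string]interface{}{}}
}

// Set appends the key only when it is new, so updating an existing key
// never leaves a duplicate entry in keys.
func (o *orderedMap) Set(key string, val interface{}) {
	if _, exists := o.store[key]; !exists {
		o.keys = append(o.keys, key)
	}
	o.store[key] = val
}

// Delete returns immediately when the key is not present.
func (o *orderedMap) Delete(key string) {
	if _, exists := o.store[key]; !exists {
		return
	}
	delete(o.store, key)
	for i, k := range o.keys {
		if k == key {
			o.keys = append(o.keys[:i], o.keys[i+1:]...)
			break
		}
	}
}

// Keys returns a copy of the keys in insertion order.
func (o *orderedMap) Keys() []string {
	return append([]string(nil), o.keys...)
}

func main() {
	m := newOrderedMap()
	m.Set("a", 1)
	m.Set("b", 2)
	m.Set("a", 3)         // update in place, no duplicate key
	m.Delete("missing")   // no-op
	fmt.Println(m.Keys()) // [a b]
}
```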

Appending a dataframe with another one.

Hello. Is there a simple way to join two dataframes of the same dimensions? Something like df = append(df, another_df) or similar.

Name1 Name2
0 D E
1 F G

and

Name1 Name2
2 D E
3 F G

=

Name1 Name2
0 D E
1 F G
2 D E
3 F G

Can I read CSV file in batch?

I would like to know if I can read a CSV file in batches. If it is possible, how can I do it? I looked through the docs and examples and could not find the required information. Any help would be appreciated.

Thank you.
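
Independently of what dataframe-go itself offers, batch reading can be sketched with the standard encoding/csv package. The readBatches and batchSizes helpers below are hypothetical names, and the header row is treated as an ordinary record here.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readBatches (hypothetical helper) reads CSV records from r and calls fn
// once per batch of at most size records.
func readBatches(r io.Reader, size int, fn func(batch [][]string) error) error {
	if size <= 0 {
		return errors.New("batch size must be positive")
	}
	cr := csv.NewReader(r)
	batch := make([][]string, 0, size)
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		batch = append(batch, rec)
		if len(batch) == size {
			if err := fn(batch); err != nil {
				return err
			}
			// Start a fresh slice so fn may keep the previous batch.
			batch = make([][]string, 0, size)
		}
	}
	if len(batch) > 0 {
		return fn(batch) // final, possibly shorter batch
	}
	return nil
}

// batchSizes (hypothetical helper) reports how many records each batch held.
func batchSizes(data string, size int) ([]int, error) {
	var sizes []int
	err := readBatches(strings.NewReader(data), size, func(b [][]string) error {
		sizes = append(sizes, len(b))
		return nil
	})
	return sizes, err
}

func main() {
	data := "colA,colB\n1,First\n2,Second\n3,Third\n4,Fourth\n"
	sizes, err := batchSizes(data, 2)
	fmt.Println(sizes, err)
}
```

Each batch could then be converted into its own dataframe or processed and discarded, which keeps memory use bounded by the batch size.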

Export to Parquet example

Can someone give me an example of how to export a dataframe to a parquet file (I couldn't find one anywhere)? I have no idea how to define the writer passed to ExportToParquet.
Given a dataframe df, I have this code inside main:

ctx := context.Background()
w, err := os.Create("output.parquet")
exports.ExportToParquet(ctx, writer.NewParquetFromWriter(w, df, 4))

Getting this error: (screenshot not preserved)

Thanks again!

Examples on the main Page don't work

The examples with ctx arguments don't work. I also wanted to know how to load data directly from SQL into a dataframe and serve it as an API. I read your blog too, but it had exactly the same examples.

This is a great package. thanks for working so hard.

Draw graphs from columns of dataframe

Hi! At the moment I have managed to plot a separate dataframe column by this strange method:

func main() {
	// all values of df are strings representing floating point numbers
	df, err := imports.LoadFromCSV(ctx, r, imports.CSVLoadOptions{Comma: ';'})
	if err != nil {
		panic(err)
	}
	s := df.Series[2] // trying to plot column 2
	series := dataframe.NewSeriesFloat64("test_name", nil, nil)

	i := s.ValuesIterator(dataframe.ValuesOptions{InitialRow: 0, Step: 1, DontReadLock: false})
	for {
		row, vals, _ := i()
		if row == nil {
			break
		}
		val, err := strconv.ParseFloat(vals.(string), 64)
		if err != nil {
			continue
		}
		series.Append(val)
	}
	Plot(series)
}

func Plot(ser *dataframe.SeriesFloat64) {
	ctx := context.TODO()
	cs, _ := wcharczuk_chart.S(ctx, ser, nil, nil)
	graph := chart.Chart{
		Title:  "test_graph",
		Width:  640,
		Height: 480,
		Series: []chart.Series{cs},
	}
	f, err := os.Create("graph.svg")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	plt := bufio.NewWriter(f)
	defer plt.Flush() // runs before f.Close(), so buffered SVG data reaches the file
	_ = graph.Render(chart.SVG, plt)
}

Is there a simpler or more elegant way to do this? Also, can I plot several columns on one plot, and if so, how? Thanks in advance.

sort having issue with ctx

sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

In this code from the README, ctx is not defined anywhere beforehand. I understand it comes from the context package, but I think ctx needs to be initialized before calling df.Sort(ctx, sks). Kindly guide me. Thanks in advance.
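
For readers with the same question, creating a context with the standard library looks like the sketch below. doWork is a hypothetical stand-in for any function that takes a context.Context as its first argument, such as the df.Sort(ctx, sks) call above; the two run helpers exist only for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// doWork (hypothetical) stands in for a context-aware call. It returns the
// context's error if the context is already cancelled, otherwise nil.
func doWork(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
		return nil
	}
}

// runWithBackground uses the root context, which is never cancelled.
func runWithBackground() error {
	ctx := context.Background()
	return doWork(ctx)
}

// runCancelled cancels the context before the call to show the error path.
func runCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return doWork(ctx)
}

func main() {
	fmt.Println(runWithBackground()) // <nil>
	fmt.Println(runCancelled())      // context canceled

	// A context that expires on its own after five seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	fmt.Println(doWork(ctx)) // <nil>
}
```

In most programs context.Background() is enough; the timeout and cancel variants matter when a long-running operation should be interruptible.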

Error to import csv, raised parquet-go error

Hi, I got an error when importing CSV and need help!
The code is from README.md:

package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/rocketlaunchr/dataframe-go/imports"
)

var ()

func main() {
	csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
	fmt.Println(csvStr)

	ctx := context.Background()

	df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))
	fmt.Println(df)
	fmt.Println(err)
}

Running it produces this error:

$ go run main.go
# github.com/xitongsys/parquet-go/parquet
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:631:16: not enough arguments in call to iprot.ReadStructBegin
        have ()
        want (context.Context)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:637:37: not enough arguments in call to iprot.ReadFieldBegin
        have ()
        want (context.Context)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:649:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:659:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:669:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:679:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:689:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:699:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:704:28: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:708:15: not enough arguments in call to iprot.ReadFieldEnd
        have ()
        want (context.Context)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:708:15: too many errors

Inconsistent behavior for Apply when using with ApplyDataFrameFn

I'm trying to concatenate two columns in a dataframe and put it into a new column. The behavior is very inconsistent. Sometimes the strings are concatenated into the new column. Sometimes the value is just set to NaN.

In this run, the value for concat_contact_number in the resulting dataframe was correctly set to 97312345678.
The map value for concat_contact_number also reflects the concatenated value.

Expected output:

$ go run main.go 
INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
INFO[0000] In prepareDataframe:                         
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |      97312345678      |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+ 
INFO[0000] In main:                                     
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |      97312345678      |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+ 

In this run, the value for concat_contact_number in the resulting dataframe was incorrectly set to NaN.
Same as with the correct run, the map value for concat_contact_number is also set to the expected concatenated value.

Erroneous output:

$ go run main.go 
INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
INFO[0000] In prepareDataframe:                         
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |          NaN          |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+ 
INFO[0000] In main:                                     
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |          NaN          |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+ 

It can be observed that in both cases the map value for 2 is always <nil>. Is this expected?

Run this code several times to see the deviations in the output. The issue may not show up immediately: sometimes it takes 10 runs, sometimes only 2. Again, the behavior is inconsistent.

Working code:

package main

import (
	"context"
	"fmt"
	"strings"

	dataframe "github.com/rocketlaunchr/dataframe-go"
	"github.com/rocketlaunchr/dataframe-go/imports"
	log "github.com/sirupsen/logrus"
)

// applyConcatDf returns an ApplyDataFrameFn that concatenates the given column names into another column
func applyConcatDf(dest_column string, columns []string) dataframe.ApplyDataFrameFn {
	return func(vals map[interface{}]interface{}, row, nRows int) map[interface{}]interface{} {
		vals[dest_column] = ""
		for _, key := range columns {
			log.Infof("vals[%s]: %s", key, vals[key].(string))
			vals[dest_column] = vals[dest_column].(string) + vals[key].(string)
			log.Infof("vals[%s]: %s", dest_column, vals[dest_column].(string))
		}

		log.Infof("vals: %v", vals)
		return vals
	}
}

// setupDataframe initializes the dataframe from a CSV string
func setupDataframe() *dataframe.DataFrame {
	ctx := context.Background()

	csvStr := `contact_number_country_code,contact_number
"973","12345678"`

	df, _ := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
		DictateDataType: map[string]interface{}{
			"contact_number_country_code": "",
			"contact_number":              "",
		},
	})

	return df
}

// prepareDataframe applies the concatenation on the loaded dataframe
func prepareDataframe(df *dataframe.DataFrame) {
	ctx := context.Background()

	sConcatContactNumber := dataframe.NewSeriesString("concat_contact_number", &dataframe.SeriesInit{Size: df.NRows()})
	df.AddSeries(sConcatContactNumber, nil)

	_, err := dataframe.Apply(ctx, df, applyConcatDf("concat_contact_number", []string{"contact_number_country_code", "contact_number"}), dataframe.FilterOptions{InPlace: true})

	if err != nil {
		log.WithError(err).Error("concatenation cannot be applied")
	}

	fmt.Println(df)
}

func main() {
	df := setupDataframe()
	prepareDataframe(df)
	fmt.Println(df)
}

Timeindex and Resample like pandas

Hi, this is a very helpful package. I was wondering whether there are plans to implement a Timeindex for series / dataframes and a resample operation like the one in pandas. It could be extremely useful. If no one is working on it, I am happy to contribute.

Support for Go modules?

Go modules makes it easier to work on multiple forked packages at the same time.
I suppose a simple go mod init github.com/rocketlaunchr/dataframe-go would be sufficient to have it.

If you accept the proposal, I can submit a PR later.

panic: runtime error: invalid memory address or nil pointer dereference

content, err := ioutil.ReadFile(parsedDirectory, file.Name())
if err != nil {
	fmt.Println(err)
	return
}
df, err := imports.LoadFromCSV(ctx, bytes.NewReader(content))
var writers io.Writer
jsonErr := exports.ExportToJSON(ctx, writers, df)
if jsonErr != nil {
	fmt.Println("Json export error")
}

The above script throws the error below:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x4e9f01]

goroutine 1 [running]:
encoding/json.(*Encoder).Encode(0xc000045e00, 0x5b83c0, 0xc0000601e0, 0x0, 0xc0002f22e8)
/usr/local/go/src/encoding/json/stream.go:231 +0x1b1
github.com/rocketlaunchr/dataframe-go/exports.ExportToJSON(0x62a220, 0xc000016080, 0x0, 0x0, 0xc00001a080, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/github.com/rocketlaunchr/[email protected]/exports/jsonl.go:83 +0x395
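
The trace shows the panic inside encoding/json's Encoder.Encode, and the script declares var writers io.Writer without ever assigning it, so a nil writer is the likely cause. The sketch below reproduces the idea with the standard library only; encodeTo and encodeToString are hypothetical helpers, not dataframe-go functions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// encodeTo (hypothetical helper) writes v as JSON to w. It rejects a nil
// writer up front instead of letting the encoder panic later.
func encodeTo(w io.Writer, v interface{}) error {
	if w == nil {
		return errors.New("nil io.Writer")
	}
	return json.NewEncoder(w).Encode(v)
}

// encodeToString uses a bytes.Buffer as the concrete io.Writer.
func encodeToString(v interface{}) (string, error) {
	var buf bytes.Buffer
	if err := encodeTo(&buf, v); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	var writers io.Writer // declared but never assigned, so it is nil
	fmt.Println(encodeTo(writers, map[string]int{"a": 1}))

	out, err := encodeToString(map[string]int{"a": 1})
	fmt.Print(out, err)
}
```

Any concrete writer works in place of the buffer, for example a file returned by os.Create or os.Stdout.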

current license creates uncertainty

Licenses are code; tricky to the point of being analogous to cryptographic algorithms. Writing a custom one often creates unintended consequences. For instance, as currently worded, even with the best of intentions, a web developer might deploy the package believing themselves to be in full compliance with the license, then later learn that a site user is non-compliant; this new knowledge causes the developer to become immediately non-compliant.

Other than a return to a standard MIT license or one of the others listed at https://pkg.go.dev/license-policy, I don't have any good suggestions for what to do about this; I don't know of any standard open-source licenses that satisfy the full intent of the current license. That doesn't mean there aren't any -- the JSON license, for instance, is on the pkg.go.dev list, and is an MIT derivative that tries to do the right thing. Here's a related conversation on a stack exchange site that covers some of the issues in more detail: https://softwareengineering.stackexchange.com/questions/199055/open-source-licenses-that-explicitly-prohibit-military-applications

Getting dataframe.ApplySeriesFn undefined error

Thanks for creating this library!

I can get this code to work:

ctx := context.TODO()

// step 1: open the csv
csvfile, err := os.Open("data/example.csv")
if err != nil {
	log.Fatal(err)
}

dataframe, err := imports.LoadFromCSV(ctx, csvfile)

Here's the data that's printed:

fmt.Print(dataframe.Table())

+-----+------------+-----------------+
|     | FIRST NAME | FAVORITE NUMBER |
+-----+------------+-----------------+
| 0:  |  matthew   |       23        |
| 1:  |   daniel   |        8        |
| 2:  |  allison   |       42        |
| 3:  |   david    |       18        |
+-----+------------+-----------------+
| 4X2 |   STRING   |     STRING      |
+-----+------------+-----------------+

I cannot get this code working:

s := dataframe.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})

fmt.Print(dataframe.Table())

Here's the error message:

./dataframe_go.go:36:22: dataframe.ApplySeriesFn undefined (type *dataframe.DataFrame has no field or method ApplySeriesFn)
./dataframe_go.go:40:11: dataframe.Apply undefined (type *dataframe.DataFrame has no field or method Apply)
./dataframe_go.go:40:44: dataframe.FilterOptions undefined (type *dataframe.DataFrame has no field or method FilterOptions)

Here's the code: https://github.com/MrPowers/go-dataframe-examples/blob/master/dataframe_go.go

Sorry if this is a basic question. I am a Go newbie!

Thanks again for making this library!
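
Two observations follow from the output quoted above. The error messages mention type *dataframe.DataFrame, which suggests the variable named dataframe shadows the package of the same name; giving the variable another name such as df would likely resolve them. Separately, the table reports FAVORITE NUMBER as STRING, so the assertion val.(int64) would panic at run time. A type switch avoids that, as in this standard-library sketch, where double is a hypothetical callback body and not part of the library:

```go
package main

import (
	"fmt"
	"strconv"
)

// double (hypothetical) doubles a value that may arrive as an int64 or as
// a numeric string. Anything else, including nil, yields nil.
func double(val interface{}) interface{} {
	switch v := val.(type) {
	case int64:
		return 2 * v
	case string:
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return nil
		}
		// Keep the result a string so it matches the column type.
		return strconv.FormatInt(2*n, 10)
	default:
		return nil
	}
}

func main() {
	fmt.Println(double(int64(21))) // 42
	fmt.Println(double("23"))      // 46
	fmt.Println(double(nil))       // <nil>
}
```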
