rocketlaunchr / dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

License: Other

Go 100.00%

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

dataframe-go's Issues

Potential collision and risk from indirect dependence "github.com/gotestyourself/gotestyourself"

Background

The rocketlaunchr/dataframe-go repo imports gotestyourself indirectly through its old path.
As a result, github.com/gotestyourself/gotestyourself and gotest.tools coexist in this repo:
https://github.com/rocketlaunchr/dataframe-go/blob/master/go.mod (lines 20 and 40)

github.com/gotestyourself/gotestyourself v2.2.0+incompatible // indirect
gotest.tools v2.2.0+incompatible // indirect

That is because gotestyourself has renamed its import path from "github.com/gotestyourself/gotestyourself" to "gotest.tools". When you import gotestyourself through the old path "github.com/gotestyourself/gotestyourself", it is reintroduced under the new path by the "gotest.tools/..." import statements in its own Go source files, for example:

https://github.com/gotestyourself/gotest.tools/blob/v2.2.0/fs/example_test.go#L8

package fs_test
import (
	…
	"gotest.tools/assert"
	"gotest.tools/assert/cmp"
	"gotest.tools/fs"
	"gotest.tools/golden"
)

"github.com/gotestyourself/gotestyourself" and "gotest.tools" are the same repo. Each copy works in isolation, but having both in the dependency graph brings potential risks and problems.

Solution

Add a replace statement to the go.mod file:

replace github.com/gotestyourself/gotestyourself => gotest.tools v2.2.0

Then clean up the go.mod.

Progress for re-write of dataframe-go?

The README says that "Once Go 1.18 (Generics) is introduced, the ENTIRE package will be rewritten." As Go 1.18 has been out for a while, I'm wondering whether work has started on rewriting the entire package. If so, how is it progressing?

is group by supported?

undefined: dataframe.LoadFromCSV

When I run the example below to read a df from a dummy string, I get the following errors:

➜  learn-go git:(main) ✗ go run dev.go
# command-line-arguments
./dev.go:21:13: undefined: dataframe.LoadFromCSV
./dev.go:21:65: undefined: dataframe.CSVLoadOptions

package main

import (
	"context"
	"fmt"
	"strings"

	imports "github.com/rocketlaunchr/dataframe-go"
)

func main() {

	csvStr := `colA,colB
	1,"First"
	2,"Second"
	3,"Third"
	4,"Fourth"`

	ctx := context.Background()

	df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
		DictateDataType: map[string]interface{}{
			"colA": int64(0),
			"colB": "",
		},
	})

	fmt.Println(err)
	fmt.Println(df)

}

I'm not really sure what I'm doing wrong here. I was trying to follow the approach used in this example: https://github.com/rocketlaunchr/dataframe-go/blob/master/imports/infer_test.go

Here is the output of go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/andre/.cache/go-build"
GOENV="/home/andre/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/andre/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/andre/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build868960458=/tmp/go-build -gno-record-gcc-switches"

P.S. I'm new to Go, so it could well be me doing something silly :)

How to remove duplicate rows in DataFrame?

How can I remove duplicate rows in a DataFrame, like pandas' DataFrame.drop_duplicates() in Python?
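The thread does not name a built-in for this, so the following is only a minimal sketch of the idea using plain Go slices rather than the dataframe-go API. The function name dropDuplicates and the row representation ([]string per row) are assumptions made for illustration; with a real DataFrame you would first read the rows out and then rebuild the frame from the result.

```go
package main

import (
	"fmt"
	"strings"
)

// dropDuplicates returns the input rows with exact duplicates removed.
// The first occurrence of each row is kept and the original order is preserved.
func dropDuplicates(rows [][]string) [][]string {
	seen := make(map[string]struct{}, len(rows))
	out := make([][]string, 0, len(rows))
	for _, row := range rows {
		// Build a map key from the row; "\x1f" (unit separator) is assumed
		// not to occur inside the cell values.
		key := strings.Join(row, "\x1f")
		if _, dup := seen[key]; dup {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, row)
	}
	return out
}

func main() {
	rows := [][]string{
		{"United States", "32"},
		{"United Kingdom", "17"},
		{"United States", "32"},
	}
	for _, r := range dropDuplicates(rows) {
		fmt.Println(r)
	}
}
```

Keeping the first occurrence mirrors the default behaviour of drop_duplicates in pandas; keeping the last would only require iterating in reverse.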

nothing for Xaxis and Yaxis

I copied the following code from the README, but the chart shows no values on the X and Y axes.

import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display()
<-plt.Closed

CSV import does not support dictated type for fields with potential empty values

When importing a CSV in which a field is empty in some rows, the import ignores the dictated data type and tries to interpret the field as a string.

csvStr := `sometimes_empty,label
,"First"
2,"Second"
,"Third"
4,"Fourth"`

ctx := context.Background()

df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
	DictateDataType: map[string]interface{}{
		"sometimes_empty": int64(0),
		"label":           "",
	},
})

fmt.Println(err)
fmt.Println(df)

This code produces the error:

can't force string:  to int64. row: 0 field: sometimes_empty

How can I create a dataframe from a CSV with empty values so that it still follows the dictated type and has NaN in place of the empty values?

Expected output should be:

+-----+-----------------+--------+
|     | SOMETIMES EMPTY | LABEL  |
+-----+-----------------+--------+
| 0:  |       NaN       | First  |
| 1:  |        2        | Second |
| 2:  |       NaN       | Third  |
| 3:  |        4        | Fourth |
+-----+-----------------+--------+
| 4X2 |      INT64      | STRING |
+-----+-----------------+--------+
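To make the expected behaviour concrete, here is a small standalone sketch that parses one column as int64 and maps empty cells to nil. It uses only the standard library, not dataframe-go; the helper name parseNullableInt64 is hypothetical, and whether the library can be configured to behave this way is not settled in this thread.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// parseNullableInt64 reads the named CSV column as int64 values.
// Empty cells become nil instead of causing a conversion error.
func parseNullableInt64(csvStr, column string) ([]*int64, error) {
	records, err := csv.NewReader(strings.NewReader(csvStr)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("no header row")
	}
	col := -1
	for i, name := range records[0] {
		if name == column {
			col = i
			break
		}
	}
	if col < 0 {
		return nil, fmt.Errorf("column %q not found", column)
	}
	out := make([]*int64, 0, len(records)-1)
	for row, rec := range records[1:] {
		if rec[col] == "" {
			out = append(out, nil) // missing value
			continue
		}
		v, err := strconv.ParseInt(rec[col], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", row, err)
		}
		out = append(out, &v)
	}
	return out, nil
}

func main() {
	csvStr := `sometimes_empty,label
,"First"
2,"Second"
,"Third"
4,"Fourth"`

	vals, err := parseNullableInt64(csvStr, "sometimes_empty")
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, v := range vals {
		if v == nil {
			fmt.Println(i, "NaN")
		} else {
			fmt.Println(i, *v)
		}
	}
}
```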

How to use CSVLoadOptions?

Hi,
I have a CSV file with four fields (USERID, MOVIEID, RATING, TIMESTAMP). By default, LoadFromCSV loads every field as a string. I want them loaded as float64 instead, so I created a CSVLoadOptions:

var csvOp imports.CSVLoadOptions
csvOp.DictateDataType = make(map[string]interface{})
csvOp.DictateDataType["USERID"] = float64(0)
csvOp.DictateDataType["MOVIEID"] = float64(0)
csvOp.DictateDataType["RATING"] = float64(0)
csvOp.DictateDataType["TIMESTAMP"] = float64(0)

ratingDf, err := imports.LoadFromCSV(ctx, file, csvOp)

But loading fails with an error and I don't know why. Am I using CSVLoadOptions incorrectly?

How to convert all the data in a dataframe to a gonum dense matrix?

I want to use this package, but I ran into a problem: the values of SeriesInt64 are private. Why?

Could you explain how to convert a dataframe to a gonum dense matrix?

Also, when calling LoadFromCSV(ctx, strings.NewReader(csvStr)), which ctx should I pass, and how do I define the context.Context?

LoadFromJSON Not Working

files, err := ioutil.ReadFile("device.json")
if err != nil {
	fmt.Println(err)
}

var ctx = context.Background()
df2, _ := imports.LoadFromJSON(ctx, strings.NewReader(string(files)))

fmt.Println(df2.Table())

Error to read parquet with latest parquet-go

Create a file with python pandas

dataframe = pandas.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": [2, 3, 4, 1],
    "C": [10, 20, None, None]
})

dataframe.to_parquet("1.parquet")

This file looks like:

Read this file

func main() {
    ctx := context.Background()
    fr, _ := local.NewLocalFileReader("1.parquet")
    df, err := imports.LoadFromParquet(ctx, fr)
    if err != nil {
        panic(err)
    }
    fmt.Println(df)
}

Got a unique name error

panic: names of series must be unique: 

goroutine 1 [running]:
github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
        .../rocketlaunchr/[email protected]/dataframe.go:41 +0x33c
github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
        .../go/pkg/mod/github.com/rocketlaunchr/[email protected]/imports/parquet.go:110 +0x8ae
main.main()
        .../main.go:13 +0x78

Following the stack trace, I found some useful information:

All series in imports.LoadFromParquet have empty names.

goFieldNameToActual
Each key in this map has the prefix "Scheme", but goName does not. This may be why the name cannot be found in the map.

This is the first time I have used Go to read Parquet files. Is this error caused by breaking changes in parquet-go, or by something else?

Getting back Float64/Int64/Mixed series from dataframe

I wanted to know if there is a way to convert a Series interface value back to the original concrete series type (Float64/Int64/Mixed) underneath it.
I will describe my use case.

After creating a dataframe, I am trying to use gonum to do some analysis, for example a linear regression of two series from the dataframe. For this I have to iterate over the whole series (using ValuesIterator) to copy each element into a []float64, which is what gonum requires.
ToSeriesFloat64 does not help since it is not implemented by Series.

Is there an easier way to access the whole underlying series as the corresponding concrete slice?

Reading from Parquet

Hello,

Are there any plans to support reading a Parquet file into a dataframe? I have a need for this and am evaluating this library to use in an application.

Thanks!

Problem getting the package

Problem getting the package:

$ go get -u github.com/rocketlaunchr/dataframe-go
go get: github.com/rocketlaunchr/dataframe-go@none updating to
        github.com/rocketlaunchr/[email protected] requires
        github.com/blend/[email protected]: reading github.com/blend/go-sdk/go.mod at revision v1.1.1: unknown revision v1.1.1

Expand docs to include other common dataframe operations, etc.

Greetings!

Just a minor suggestion, but if you have the time, it could be useful to expand the docs a bit more to cover some additional common operations applied to dataframe-like structures, where supported.

For example:

retrieving a single row
retrieving a single column
selecting row/column subsets by indices or ranges
selecting a single value by <row, column> indices

Further, one other thing I noticed when employing the package for the first time, is that many of the dataframe.xx() function calls include a nil as the first argument.

From looking at the code for dataframe.go, these appear to be relating to an optional Options struct, so it makes sense that this would be set to nil in many instances. It may just be worth mentioning this explicitly in the examples for .Append() in the docs.

Finally two other things that could be useful to consider including in the docs:

Limitations compared with R/pandas
Cheatsheet of commands comparing dataframe-go with R/pandas (more effort, and probably better suited for a separate wiki page, etc., but would be really useful for people coming from these worlds..)

Thanks for taking the time to put together and share this really useful package!

TODO: new features/updates

~~Update copyright + remove coverall~~
~~Add Row function to dataframe (also allow it to decide what of data is returned) + DeepLock opt~~
~~In utils, Search function to allow option to stop after finding N values.~~
~~In imports pkg, for importing from SQL, change from stmt to interface.~~
~~Allow imports.CSVLoadOptions to have default Comma,~~
~~Allow imports.DictateDataType to accept Series type (to allow for custom data)~~
~~Create random Series and Dataframes (with a random interface)~~
~~Change signature of Sort functions to accept context. Perhaps add ctx to more functions.~~
~~Allow custom time formats when importing.~~
Add some more popular Pandas functions. (https://www.dataquest.io/blog/pandas-cheat-sheet/)
~~Add example in readme on how to integrate with gonum pkg.~~
Add ~~Filter~~, GroupBy and Join (and also Append df): https://github.com/robpike/filter
~~Generate fake data for Dataframe~~
~~SeriesTime should be able to generate intervals~~
Arima forecasting
Add epsilon to SeriesFloat64
Consider implementing OrderedMaps with doubly linked lists: http://www.tugberkugurlu.com/archive/implementing-ordered-map-in-go-2-0-by-using-generics-with-delete-operation-in-o-1-time-complexity and https://github.com/elliotchance/orderedmap
Create a new Series type that uses a linked list container/list package for float64 data instead of an []float64. (PUT IN XSERIES PKG): https://github.com/huandu/skiplist
Remove locking functionality -> leave to users to implement externally
Clean up api with regards to a function to accept values (and always make options optional). V(args ...interface{})
Advocate dot import for V function (pkg name dot)
~~When importing from csv, allow series names to be chosen when the csv file doesn't have a headings row (Headings option)~~ Thanks @pasdam
df.AddSeries should increase number of rows of newly added series automatically to match number of rows in df
Make Apply and Filter operate concurrently
Add alias for maths variables as an option: fn := funcs.RegFunc("sin(2*𝜋*x/24)"); funcs.Evaluate(ctx, df, fn, 1)
Create adapter for: https://github.com/go-echarts/go-echarts
Parquet export - configure time type per series
Explore Range embedding/chaining another Range (to define multiple ranges)
~~For Values Iterator, interpret Step=0 as Step=1~~
For utils/faker, use gofakeit v6 and also create constants for the function names.
Use rowerr for json importing

It shouldn't print all the rows and columns as default

By default, it should print only as much as fits in the console.

How to import this package

Hi. I am new to Go. Can someone please guide me on how to import this package locally?

OrderedMap bug

For ordered map:

When setting an existing value, should we remove old value and append new?

	o.store[key] = val
	o.keys = append(o.keys, key)

For Delete function, if key is not found, then return immediately.
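The two points above can be sketched as follows. This is an illustrative standalone type, not the package's actual OrderedMap; the field names keys and store follow the snippet in the issue, everything else is an assumption. It takes one possible answer to the question: an existing key keeps its original position on update rather than being moved to the end.

```go
package main

import "fmt"

// OrderedMap is a minimal insertion-ordered map used only for illustration.
type OrderedMap struct {
	keys  []string
	store map[string]interface{}
}

func NewOrderedMap() *OrderedMap {
	return &OrderedMap{store: make(map[string]interface{})}
}

// Set stores val under key. The key is appended to the order only when it is
// new, so updating an existing key never creates a duplicate entry in keys.
func (o *OrderedMap) Set(key string, val interface{}) {
	if _, exists := o.store[key]; !exists {
		o.keys = append(o.keys, key)
	}
	o.store[key] = val
}

// Delete removes key. If the key is not present it returns immediately.
func (o *OrderedMap) Delete(key string) {
	if _, exists := o.store[key]; !exists {
		return
	}
	delete(o.store, key)
	for i, k := range o.keys {
		if k == key {
			o.keys = append(o.keys[:i], o.keys[i+1:]...)
			break
		}
	}
}

// Keys returns a copy of the keys in insertion order.
func (o *OrderedMap) Keys() []string {
	return append([]string(nil), o.keys...)
}

func main() {
	m := NewOrderedMap()
	m.Set("a", 1)
	m.Set("b", 2)
	m.Set("a", 3) // update: "a" stays first, no duplicate key
	fmt.Println(m.Keys())
	m.Delete("missing") // no-op
	m.Delete("a")
	fmt.Println(m.Keys())
}
```

If "move to the end on update" is the desired semantics instead, Set would call Delete(key) first and then append unconditionally.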

add support for parquet read/write

https://github.com/xitongsys/parquet-go

Error to read csv encoding utf-8 with bom and export back to parquet

The error is: exception recovered: reflect.StructOf: field 0 has invalid name
at ompluscator/[email protected]/builder.go:192
export.csv

Appending a dataframe with another one.

Hello. Is there a simple way to join two dataframes of the same dimensions? Something like df = append(df, another_df) or similar. For example, combining

	Name1	Name2
0	D	E
1	F	G

and

	Name1	Name2
2	D	E
3	F	G

	Name1	Name2
0	D	E
1	F	G
2	D	E
3	F	G

Can I read CSV file in batch?

I would like to know if I can read a CSV file in batches. If it is possible, how can I do it? I looked through the docs and examples and could not find the required information. Any help would be appreciated.

Thank you.

Export to Parquet example

Can someone give me an example of how to export a dataframe to a Parquet file? I couldn't find one anywhere, and I have no idea how to define the writer passed to ExportToParquet.
Given a dataframe df, I have this code inside main:

ctx := context.Background()
w, err := os.Create("output.parquet")
exports.ExportToParquet(ctx, writer.NewParquetFromWriter(w, df, 4))

Getting this error:

Thanks again!

Examples on the main Page don't work

The examples with ctx arguments don't work. I also wanted to know how to load data directly from SQL into a dataframe and serve it as an API. I read your blog too, but it has exactly the same examples.

This is a great package. Thanks for working so hard.

Please provide a way to create a Dataframe of type array/slice and export to parquet

The library is very useful and to some extent imitates Python's pandas library. It would be very helpful if you could add a NewSeriesArray() method, so that a series of array/slice values can be created and exported to a proper Parquet file.

how to achieve multi index ?

hi there,

I hope someone can help me. How can I achieve a multi-index similar to https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html

Any help is much appreciated.

Regards,
Julio

Draw graphs from columns of dataframe

Hi! At the moment I have managed to plot a separate dataframe column by this strange method:

func main() {
	// all values of df are strings representing floating point numbers
	df, err := imports.LoadFromCSV(ctx, r, imports.CSVLoadOptions{Comma: ';'})
	if err != nil {
		panic(err)
	}
	s := df.Series[2] // trying to plot column 2
	series := dataframe.NewSeriesFloat64("test_name", nil, nil)

	i := s.ValuesIterator(dataframe.ValuesOptions{InitialRow: 0, Step: 1, DontReadLock: false})
	for {
		row, vals, _ := i()
		if row == nil {
			break
		}
		val, err := strconv.ParseFloat(vals.(string), 64)
		if err != nil {
			continue
		}
		series.Append(val)
	}
	Plot(series)
}

func Plot(ser *dataframe.SeriesFloat64) {
	ctx := context.TODO()
	cs, _ := wcharczuk_chart.S(ctx, ser, nil, nil)
	graph := chart.Chart{
		Title:  "test_graph",
		Width:  640,
		Height: 480,
		Series: []chart.Series{cs},
	}
	f, err := os.Create("graph.svg")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	plt := bufio.NewWriter(f)
	_ = graph.Render(chart.SVG, plt)
}

Is there a simpler or more elegant way to do this? Also, can I plot several columns on one plot, and if so, how? Thanks in advance.

sort having issue with ctx

sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

In this code from the README, ctx is not defined anywhere beforehand. I understand it comes from the context package, but I think ctx needs to be initialized before calling df.Sort(ctx, sks). Kindly guide me. Thanks in advance.

Add equivalent of `pandas`.`read_html`

To get more feature parity with Pandas, integrate with https://github.com/nfx/go-htmltable.

Error to import csv, raised parquet-go error

Hi, I get an error when importing a CSV and need help!
The code is from README.md:

package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/rocketlaunchr/dataframe-go/imports"
)

var ()

func main() {
	csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
	fmt.Println(csvStr)

	ctx := context.Background()

	df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))
	fmt.Println(df)
	fmt.Println(err)
}

Running it produces these errors:

$ go run main.go
# github.com/xitongsys/parquet-go/parquet
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:631:16: not enough arguments in call to iprot.ReadStructBegin
        have ()
        want (context.Context)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:637:37: not enough arguments in call to iprot.ReadFieldBegin
        have ()
        want (context.Context)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:649:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:659:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:669:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:679:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:689:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:699:30: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:704:28: not enough arguments in call to iprot.Skip
        have (thrift.TType)
        want (context.Context, thrift.TType)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:708:15: not enough arguments in call to iprot.ReadFieldEnd
        have ()
        want (context.Context)
C:\Users\hellogo\go\pkg\mod\github.com\xitongsys\[email protected]\parquet\parquet.go:708:15: too many errors

Is this a fork of GOTA?

Is this a fork of GOTA?

Bad import, was an upstream dependency deleted?

go: github.com/sjwhitworth/[email protected] requires
        github.com/rocketlaunchr/[email protected] requires
        github.com/blend/[email protected]: reading github.com/blend/go-sdk/go.mod at revision v1.1.1: unknown revision v1.1.1

It looks like v1.1.1 of github.com/blend/go-sdk is missing. Are you seeing the same or am I taking crazy pills today?

Inconsistent behavior for Apply when using with ApplyDataFrameFn

I'm trying to concatenate two columns in a dataframe and put it into a new column. The behavior is very inconsistent. Sometimes the strings are concatenated into the new column. Sometimes the value is just set to NaN.

In this run, the value for concat_contact_number in the resulting dataframe was correctly set to 97312345678.
The map value for concat_contact_number also reflects the concatenated value.

Expected output:

$ go run main.go 
INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
INFO[0000] In prepareDataframe:                         
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |      97312345678      |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+ 
INFO[0000] In main:                                     
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |      97312345678      |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+

In this run, the value for concat_contact_number in the resulting dataframe was incorrectly set to NaN.
Same as with the correct run, the map value for concat_contact_number is also set to the expected concatenated value.

Erroneous output:

$ go run main.go 
INFO[0000] In applyConcatDf: vals[contact_number_country_code]: 973 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 973 
INFO[0000] In applyConcatDf: vals[contact_number]: 12345678 
INFO[0000] In applyConcatDf: vals[concat_contact_number]: 97312345678 
INFO[0000] In applyConcatDf: vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973] 
INFO[0000] In prepareDataframe:                         
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |          NaN          |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+ 
INFO[0000] In main:                                     
INFO[0000] +-----+-----------------------------+----------------+-----------------------+
|     | CONTACT NUMBER COUNTRY CODE | CONTACT NUMBER | CONCAT CONTACT NUMBER |
+-----+-----------------------------+----------------+-----------------------+
| 0:  |             973             |    12345678    |          NaN          |
+-----+-----------------------------+----------------+-----------------------+
| 1X3 |           STRING            |     STRING     |        STRING         |
+-----+-----------------------------+----------------+-----------------------+

It can be observed that in both cases the map value for 2 is always <nil>. Is this expected?

Run this code several times to see the output vary. The issue may not show up immediately: sometimes it takes 10 runs, sometimes only 2. Again, the behavior is inconsistent.

Working code:

package main

import (
	"context"
	"fmt"
	"strings"

	dataframe "github.com/rocketlaunchr/dataframe-go"
	"github.com/rocketlaunchr/dataframe-go/imports"
	log "github.com/sirupsen/logrus"
)

// applyConcatDf returns an ApplyDataFrameFn that concatenates the given column names into another column
func applyConcatDf(dest_column string, columns []string) dataframe.ApplyDataFrameFn {
	return func(vals map[interface{}]interface{}, row, nRows int) map[interface{}]interface{} {
		vals[dest_column] = ""
		for _, key := range columns {
			log.Infof("vals[%s]: %s", key, vals[key].(string))
			vals[dest_column] = vals[dest_column].(string) + vals[key].(string)
			log.Infof("vals[%s]: %s", dest_column, vals[dest_column].(string))
		}

		log.Infof("vals: %v", vals)
		return vals
	}
}

// setupDataframe initializes the dataframe from a CSV string
func setupDataframe() *dataframe.DataFrame {
	ctx := context.Background()

	csvStr := `contact_number_country_code,contact_number
"973","12345678"`

	df, _ := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
		DictateDataType: map[string]interface{}{
			"contact_number_country_code": "",
			"contact_number":              "",
		},
	})

	return df
}

// prepareDataframe applies the concatenation on the loaded dataframe
func prepareDataframe(df *dataframe.DataFrame) {
	ctx := context.Background()

	sConcatContactNumber := dataframe.NewSeriesString("concat_contact_number", &dataframe.SeriesInit{Size: df.NRows()})
	df.AddSeries(sConcatContactNumber, nil)

	_, err := dataframe.Apply(ctx, df, applyConcatDf("concat_contact_number", []string{"contact_number_country_code", "contact_number"}), dataframe.FilterOptions{InPlace: true})

	if err != nil {
		log.WithError(err).Error("concatenation cannot be applied")
	}

	fmt.Println(df)
}

func main() {
	df := setupDataframe()
	prepareDataframe(df)
	fmt.Println(df)
}
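One plausible explanation, not confirmed in this thread: the logged map carries each cell under both its index and its name, so the returned map holds `2: <nil>` alongside `concat_contact_number: 97312345678`. Go randomizes map iteration order, so if the library writes the returned entries back in iteration order (an assumption about its internals), whichever of the two is applied last wins. A sketch of a function body that avoids the ambiguity by returning a fresh map with only the changed column follows; concatInto is a hypothetical helper and uses no dataframe-go types.

```go
package main

import (
	"fmt"
	"strings"
)

// concatInto concatenates the string cells named in columns and returns a
// new map containing only the destination column. The input map is not
// modified, so no stale index-keyed entry for the same column is returned.
func concatInto(vals map[interface{}]interface{}, dest string, columns []string) map[interface{}]interface{} {
	var sb strings.Builder
	for _, key := range columns {
		s, _ := vals[key].(string) // nil or non-string cells contribute ""
		sb.WriteString(s)
	}
	return map[interface{}]interface{}{dest: sb.String()}
}

func main() {
	// Same shape as the map logged in the issue: index keys and name keys.
	vals := map[interface{}]interface{}{
		0:                             "973",
		1:                             "12345678",
		2:                             nil,
		"contact_number_country_code": "973",
		"contact_number":              "12345678",
		"concat_contact_number":       nil,
	}
	out := concatInto(vals, "concat_contact_number",
		[]string{"contact_number_country_code", "contact_number"})
	fmt.Println(out) // map[concat_contact_number:97312345678]
}
```

Inside applyConcatDf, the same idea would mean building and returning a new map instead of assigning into vals and returning it.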

Timeindex and Resample like pandas

Hi, this is a very helpful package. I was wondering whether there are plans to implement a Timeindex for series/dataframes and a resample operation like the one in pandas. It could be extremely useful. If no one is working on it, I am happy to contribute.

How to set_index with two columns?

I want to use two columns to be a unique index. How can I implement it?

Indirect dependency `github.com/blend/go-sdk v1.1.1` does not exist

I suspect that the library maintainers prepended "legacy-" to versions before changing the versioning scheme. At the least, this dependency should be updated to legacy-v1.1.1.

Support for Go modules?

Go modules makes it easier to work on multiple forked packages at the same time.
I suppose a simple go mod init github.com/rocketlaunchr/dataframe-go would be sufficient to have it.

If you accept the proposal, I can submit a PR later.

panic: runtime error: invalid memory address or nil pointer dereference

content, err := ioutil.ReadFile(parsedDirectory, file.Name())
if err != nil {
	fmt.Println(err)
	return
}
df, err := imports.LoadFromCSV(ctx, bytes.NewReader(content))
var writers io.Writer
jsonErr := exports.ExportToJSON(ctx, writers, df)
if jsonErr != nil {
	fmt.Println("Json export error")
}

The above script throws the error below:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x4e9f01]

goroutine 1 [running]:
encoding/json.(*Encoder).Encode(0xc000045e00, 0x5b83c0, 0xc0000601e0, 0x0, 0xc0002f22e8)
/usr/local/go/src/encoding/json/stream.go:231 +0x1b1
github.com/rocketlaunchr/dataframe-go/exports.ExportToJSON(0x62a220, 0xc000016080, 0x0, 0x0, 0xc00001a080, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/github.com/rocketlaunchr/[email protected]/exports/jsonl.go:83 +0x395

DF Practice

current license creates uncertainty

Licenses are code; tricky to the point of being analogous to cryptographic algorithms. Writing a custom one often creates unintended consequences. For instance, as currently worded, even with the best of intentions, a web developer might deploy the package believing themselves to be in full compliance with the license, then later learn that a site user is non-compliant; this new knowledge causes the developer to become immediately non-compliant.

Other than a return to a standard MIT license or one of the others listed at https://pkg.go.dev/license-policy, I don't have any good suggestions for what to do about this; I don't know of any standard open-source licenses that satisfy the full intent of the current license. That doesn't mean there aren't any -- the JSON license, for instance, is on the pkg.go.dev list, and is an MIT derivative that tries to do the right thing. Here's a related conversation on a stack exchange site that covers some of the issues in more detail: https://softwareengineering.stackexchange.com/questions/199055/open-source-licenses-that-explicitly-prohibit-military-applications

Getting dataframe.ApplySeriesFn undefined error

Thanks for creating this library!

I can get this code to work:

ctx := context.TODO()

// step 1: open the csv
csvfile, err := os.Open("data/example.csv")
if err != nil {
	log.Fatal(err)
}

dataframe, err := imports.LoadFromCSV(ctx, csvfile)

Here's the data that's printed:

fmt.Print(dataframe.Table())

+-----+------------+-----------------+
|     | FIRST NAME | FAVORITE NUMBER |
+-----+------------+-----------------+
| 0:  |  matthew   |       23        |
| 1:  |   daniel   |        8        |
| 2:  |  allison   |       42        |
| 3:  |   david    |       18        |
+-----+------------+-----------------+
| 4X2 |   STRING   |     STRING      |
+-----+------------+-----------------+

I cannot get this code working:

s := dataframe.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})

fmt.Print(dataframe.Table())

Here's the error message:

./dataframe_go.go:36:22: dataframe.ApplySeriesFn undefined (type *dataframe.DataFrame has no field or method ApplySeriesFn)
./dataframe_go.go:40:11: dataframe.Apply undefined (type *dataframe.DataFrame has no field or method Apply)
./dataframe_go.go:40:44: dataframe.FilterOptions undefined (type *dataframe.DataFrame has no field or method FilterOptions)

Here's the code: https://github.com/MrPowers/go-dataframe-examples/blob/master/dataframe_go.go

Sorry if this is a basic question. I am a Go newbie!

Thanks again for making this library!
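The error text ("type *dataframe.DataFrame has no field or method ApplySeriesFn") suggests that the local variable named dataframe, assigned from imports.LoadFromCSV, shadows the dataframe package for the rest of the function. The sketch below shows the same effect with the standard strings package as a stand-in, so it runs without third-party modules; the function name shout is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// shout upper-cases the first word of a sentence.
func shout(sentence string) string {
	// If this variable were named `strings`, it would shadow the package
	// within this function, and the strings.ToUpper call below would fail
	// to compile because the identifier would refer to a []string value.
	words := strings.Fields(sentence)
	if len(words) == 0 {
		return ""
	}
	return strings.ToUpper(words[0])
}

func main() {
	fmt.Println(shout("matthew daniel allison david")) // MATTHEW
}
```

Applied to the issue's code, that would mean naming the loaded value something like df, so that dataframe.ApplySeriesFn and dataframe.Apply resolve to the package again.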