
Comments (6)

kniren commented on August 22, 2024

I think this is a good idea, but it needs some consideration. Limiting the argument to a map[string]string is better than the current solution from a usability perspective, but what happens if we want to force all columns to be parsed as a given type? What if we want to specify what counts as NA, or other options that the Read* family of methods might need?

I'm all for simplicity, and I find your solution better than the current implementation, but can we do better before we settle? Ever since I read https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis I have thought that this could be the way to go for such issues, but it is paramount that we are very mindful of the API that will go into the stable release.

I'm interested in knowing your opinion on this!

from gota.

tobgu commented on August 22, 2024

Thanks for the link, I had not read it before. Feels like the technique has the potential to fill the void of not having default/named arguments for some use cases.

I don't think it would be a good idea to make the configuration functions operate on the dataframe or series themselves in this case since that would clutter those data structures with data that may only be of use initially, during construction.

Perhaps it would be good to have the functions operate on a configuration struct instead? So, for this particular case something along the lines of:

type Config struct {
    columnTypes map[string]string
}

func ColumnTypes(columnTypes map[string]string) func(*Config) error {
    // A function literal must be anonymous in Go
    return func(c *Config) error {
        c.columnTypes = columnTypes
        return nil
    }
}

func ReadRecords(records [][]string, configs ...func(*Config) error) DataFrame {
    // New empty Config
    // Loop over configs applying each in turn, return error if one of them fails
    ...

Config could be extended to hold more data and additional functions could be added to operate on it in the future if needed.
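
For readers who want to see the idea end to end, here is a minimal, self-contained sketch of the pattern described above. The Option type name, the buildConfig helper, and the nil-map check are assumptions added for illustration only; they are not part of gota.

```go
package main

import (
	"errors"
	"fmt"
)

// Config collects settings that are only needed while a DataFrame is built.
type Config struct {
	columnTypes map[string]string
}

// Option mutates a Config and may reject invalid input.
// (Hypothetical name; the thread spells the type out as func(*Config) error.)
type Option func(*Config) error

// ColumnTypes returns an Option that records the requested type per column.
// The nil check is an assumption, included to show an option that can fail.
func ColumnTypes(columnTypes map[string]string) Option {
	return func(c *Config) error {
		if columnTypes == nil {
			return errors.New("column types: nil map")
		}
		c.columnTypes = columnTypes
		return nil
	}
}

// buildConfig (hypothetical helper) starts from an empty Config and applies
// each option in order, stopping at the first error.
func buildConfig(opts ...Option) (Config, error) {
	var c Config
	for _, opt := range opts {
		if err := opt(&c); err != nil {
			return Config{}, err
		}
	}
	return c, nil
}

func main() {
	c, err := buildConfig(ColumnTypes(map[string]string{"A": "string", "B": "int"}))
	fmt.Println(c.columnTypes["B"], err)

	_, err = buildConfig(ColumnTypes(nil))
	fmt.Println(err)
}
```

Because the options only touch Config, nothing construction-specific has to live on the resulting data structure.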

BTW: How come you return the error from the constructors as a field inside the DataFrame instead of as a separate value, as usually seen in Go programs (e.g. func foo() (value, error))?


kniren commented on August 22, 2024

This feels like a best-of-both-worlds solution. Good thinking!

The rationale behind returning errors as a field inside DataFrame and Series objects is that it enables pipelines of operations written as continuous method chains, such as df.Filter().Filter().Select(). If an error happens at any stage of the pipeline, the remaining operations become no-ops and the error cascades to the end of the chain. I feel this makes for a much better API.
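
As a rough illustration of that rationale, the sketch below uses a deliberately simplified stand-in type. Frame, its Rows field, and the Head method are hypothetical and do not reflect gota's actual DataFrame; only the idea of carrying the error inside the value comes from the comment above.

```go
package main

import (
	"errors"
	"fmt"
)

// Frame is a hypothetical stand-in for DataFrame: it carries its own error.
type Frame struct {
	Rows []int
	Err  error
}

// Filter keeps rows for which keep returns true. If an earlier step failed,
// it returns the frame untouched so the error reaches the end of the chain.
func (f Frame) Filter(keep func(int) bool) Frame {
	if f.Err != nil {
		return f
	}
	if keep == nil {
		return Frame{Err: errors.New("filter: nil predicate")}
	}
	var out []int
	for _, r := range f.Rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	return Frame{Rows: out}
}

// Head keeps the first n rows and is likewise a no-op after a failure.
func (f Frame) Head(n int) Frame {
	if f.Err != nil {
		return f
	}
	if n < 0 || n > len(f.Rows) {
		return Frame{Err: fmt.Errorf("head: %d out of range", n)}
	}
	return Frame{Rows: f.Rows[:n]}
}

func main() {
	df := Frame{Rows: []int{1, 2, 3, 4, 5, 6}}
	even := func(r int) bool { return r%2 == 0 }

	// Every step succeeds, so Err stays nil.
	ok := df.Filter(even).Head(2)
	fmt.Println(ok.Rows, ok.Err)

	// The first step fails; the later steps are skipped and the
	// original error is still there to check once at the end.
	bad := df.Filter(nil).Filter(even).Head(1)
	fmt.Println(bad.Rows, bad.Err)
}
```

The trade-off is that the caller has to remember to inspect the error field at the end of the chain, since the compiler will not flag an unchecked error the way it would with a (value, error) return.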


tobgu commented on August 22, 2024

Regarding the return value: I see, makes sense!


kniren commented on August 22, 2024

This has been implemented on the dev branch as of commit c6251c5

The LoadOptions type contains the following fields:

type LoadOptions struct {
    detectTypes bool
    hasHeader   bool
    types       map[string]Type
    defaultType Type
}

The corresponding functions to configure these options are:

  • CfgDetectTypes
  • CfgHasHeader
  • CfgColumnTypes
  • CfgDefaultType

As an example:

_ = LoadRecords(
    [][]string{
        {"A", "B", "C", "D"},
        {"a", "1", "true", "0"},
        {"b", "2", "true", "0.5"},
    },
    CfgHasHeader(true),
    CfgDetectTypes(true),
    CfgColumnTypes(map[string]Type{
        "A": String,
        "B": Int,
    }),
    CfgDefaultType(String),
)

But we can also call it with the default options:

_ = LoadRecords([][]string{...})
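
To make the "default options" case concrete, here is a small sketch of how defaults and overrides could interact. It reuses the field and function names listed above, but the LoadOption signature, the default values, the Type stand-in, and the resolve and columnType helpers are all assumptions; the code on the dev branch may differ.

```go
package main

import "fmt"

// Type is a hypothetical stand-in for the library's column type.
type Type string

const (
	String Type = "string"
	Int    Type = "int"
)

// LoadOptions mirrors the fields listed above.
type LoadOptions struct {
	detectTypes bool
	hasHeader   bool
	types       map[string]Type
	defaultType Type
}

// LoadOption is an assumed signature for the Cfg* functions.
type LoadOption func(*LoadOptions)

func CfgHasHeader(b bool) LoadOption   { return func(o *LoadOptions) { o.hasHeader = b } }
func CfgDetectTypes(b bool) LoadOption { return func(o *LoadOptions) { o.detectTypes = b } }
func CfgDefaultType(t Type) LoadOption { return func(o *LoadOptions) { o.defaultType = t } }
func CfgColumnTypes(m map[string]Type) LoadOption {
	return func(o *LoadOptions) { o.types = m }
}

// resolve (hypothetical helper) seeds assumed defaults, then lets each
// option override them in the order given.
func resolve(opts ...LoadOption) LoadOptions {
	o := LoadOptions{detectTypes: true, hasHeader: true, defaultType: String}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

// columnType returns the explicit type for a column, else the default.
func (o LoadOptions) columnType(name string) Type {
	if t, ok := o.types[name]; ok {
		return t
	}
	return o.defaultType
}

func main() {
	def := resolve()
	fmt.Println(def.hasHeader, def.columnType("B"))

	custom := resolve(CfgHasHeader(false), CfgColumnTypes(map[string]Type{"B": Int}))
	fmt.Println(custom.hasHeader, custom.columnType("B"), custom.columnType("C"))
}
```

Seeding the defaults before the loop is what lets a call with no options behave sensibly while still allowing any single setting to be overridden.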


tobgu commented on August 22, 2024

Cool, hope to find some time to try it out soon.

