Why was indexing into a DataFrame removed in latest? Looking at the commit history too

Column name indexing removed in .4? about corefxlab HOT 15 CLOSED

JamesAlexander42 commented on July 29, 2024

Column name indexing removed in .4?

from corefxlab.

Comments (15)

rhysparry commented on July 29, 2024

I think @Hamaze was specifically asking why the API was changed that way. Maybe you can point to the API review?

I know that I'd rather write:

df["Int3"] = df["Int1"] * 2 + df["Int2"];

As opposed to:

df.Columns["Int3"] = df.Columns["Int1"] * 2 + df.Columns["Int2"];

eerhardt commented on July 29, 2024

> There's enough support here to consider bringing back the column name indexer on DataFrame.

I agree. Personally I like the ease of use of df["Int1"] as well, so I'm glad I'm not alone.

It should be pretty easy to add the API back as a wrapper over the .Columns[string] indexer, and a test or two. Anyone want to make a PR for that?
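For anyone picking this up, the wrapper pattern can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's code: ToyDataFrame and ToyColumnCollection are hypothetical stand-ins that store plain double[] columns, just enough to show an indexer on the frame forwarding to the Columns indexer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only. These are NOT the
// Microsoft.Data.Analysis types, just enough surface to show the pattern.
public sealed class ToyColumnCollection
{
    private readonly Dictionary<string, double[]> _columns =
        new Dictionary<string, double[]>();

    // Plays the role of the .Columns[string] indexer mentioned above.
    public double[] this[string columnName]
    {
        get => _columns[columnName];
        set => _columns[columnName] = value;
    }
}

public sealed class ToyDataFrame
{
    public ToyColumnCollection Columns { get; } = new ToyColumnCollection();

    // The wrapper: forwards straight to Columns[columnName], so both
    // spellings read and write the same underlying column.
    public double[] this[string columnName]
    {
        get => Columns[columnName];
        set => Columns[columnName] = value;
    }
}
```

With a forwarding indexer like this, df["Int3"] and df.Columns["Int3"] refer to the same column, so both spellings from this thread would keep working side by side.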

pgovind commented on July 29, 2024

Hello! Which API are you talking about? I still see this:

corefxlab/src/Microsoft.Data.Analysis/DataFrame.cs

Line 72 in f44a099

public object this[long rowIndex, int columnIndex]

JamesAlexander42 commented on July 29, 2024

I described the change I'm referring to above.

pgovind commented on July 29, 2024

Ah I see. It still exists! We just moved that to the DataFrameColumnCollection class. See this for an example:

corefxlab/tests/Microsoft.Data.Analysis.Tests/DataFrameTests.cs

Line 648 in f44a099

df.Columns["Int3"] = df.Columns["Int1"] * 2 + df.Columns["Int2"];

JamesAlexander42 commented on July 29, 2024

Yeah, the former is more pandas-esque and comfortable IMO.

MikaelUmaN commented on July 29, 2024

Agreed.

I suspect the reason is for the row filter to work.

But you are more often interested in column selection than row selection so it's better to have the penalty there instead.

df.Rows[2 .. 12]

df["Int3"] = df["Int1"] * 2 + df["Int2"];

pgovind commented on July 29, 2024

Just out of curiosity, are you using DataFrame in a notebook? Reason I ask is that we've worked on a cool extension for DataFrame in notebooks that'll let you write df.Int1 * 2 + df.Int2. To be specific, with the new extension you can now refer to a column as a field of a DataFrame object. With intellisense enabled in notebooks, this will be very discoverable.

JamesAlexander42 commented on July 29, 2024

I'm not using it in a notebook context for this exercise. Using it in an asp.net app.

MikaelUmaN commented on July 29, 2024

Haven't tested the new notebook support yet but will do.

I would say that even though notebooks are very useful, I much prefer the experience to be the same when doing normal software and when doing notebooks.

Usually I prototype in notebooks and then structure and copy stuff to some kind of software that is more production-like. So I would avoid using any extensions in notebook except for ones that are interactive such as plotting.

zyzhu commented on July 29, 2024

I concur with @MikaelUmaN.

For instance, I would expect code that runs in F# kernel notebook can be run in FSI under Visual Studio directly. I would also expect it to be compiled to be part of a bigger production system mixed with C# and F#. That's how I explore my problems in Ifsharp notebook and put them in production all the time.

However, if syntax involving dataframe relies on an extra notebook extension that only works in notebook, the beauty of production-ready scripts is no longer feasible.

cc @cartermp @dsyme to chime in.

pgovind commented on July 29, 2024

Just tagging @eerhardt for visibility here. This is great feedback! We're busy helping out with .NET 5 stuff this week, but I'll revisit this next week. There's enough support here to consider bringing back the column name indexer on DataFrame.

dsyme commented on July 29, 2024

> Just out of curiosity, are you using DataFrame in a notebook? Reason I ask is that we've worked on a cool extension for DataFrame in notebooks that'll let you write df.Int1 * 2 + df.Int2. To be specific, with the new extension you can now refer to a column as a field of a DataFrame object. With intellisense enabled in notebooks, this will be very discoverable.

> However, if syntax involving dataframe relies on an extra notebook extension that only works in notebook, the beauty of production-ready scripts is no longer feasible.

Yes, we need to be very careful about promoting non-standard extensions to the programming model for C# or F# which are deployed only through select channels. Notebook programming should ideally not be using variations of these programming languages, though these things are subtle.

This is a tricky area because there is a notable tendency to use the incremental dynamicity of notebook programming

@pgovind What APIs are you using to craft this language variation? Please discuss this with @MadsTorgersen, @jaredpar and myself. We can't have random variations on C# and F# floating around that fragment the overall programming experience.

pgovind commented on July 29, 2024

So, just to be clear, the extension I'm talking about here is only a prototype to explore the dotnet-interactive extensions APIs. There are no immediate plans to productize it right now, and we definitely don't want to create fragmentation. It lives here: https://github.com/dotnet/interactive/blob/main/src/Microsoft.DotNet.Interactive.ExtensionLab/DataFrameTypeGeneratorExtension.cs

> What APIs are you using to craft this language variation?

It's not really a variation. It's a prototype right now (and not part of the type itself). Basically, given a DataFrame object, it looks at the types of the columns and spits out code to create a new SomeNameDataFrame type with the column names as properties. This code is then compiled on demand and the dotnet-interactive shell then exposes this type for use in the notebook.
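To make that description concrete, here is a rough sketch of the "spits out code" step only. It is not the extension's actual implementation: TypedFrameSourceSketch, Generate, and the shape of the emitted class are assumptions made for illustration, and the on-demand compilation step is left out. It also assumes every column name is already a valid C# identifier.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TypedFrameSourceSketch
{
    // Builds C# source text for a wrapper class that exposes each column
    // of a DataFrame as a typed property. Nothing is compiled here; the
    // result is just a string that a host could compile later.
    public static string Generate(
        string typeName,
        IReadOnlyList<(string Name, string ColumnType)> columns)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"public class {typeName}");
        sb.AppendLine("{");
        sb.AppendLine("    private readonly DataFrame _df;");
        sb.AppendLine($"    public {typeName}(DataFrame df) {{ _df = df; }}");

        foreach (var (name, columnType) in columns)
        {
            // Each property casts the untyped column looked up by name.
            sb.AppendLine(
                $"    public {columnType} {name} => ({columnType})_df.Columns[\"{name}\"];");
        }

        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Calling Generate("SomeNameDataFrame", new[] { ("Int1", "PrimitiveDataFrameColumn<int>") }) returns source for a class with an Int1 property, which is what would let notebook code write df.Int1 instead of df.Columns["Int1"].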

dsyme commented on July 29, 2024

@pgovind The problem is that this sort of "generating API from dynamic data" is a completely new thing in the .NET universe (the closest thing is F# type providers, and then source generators, though those are normally part of the static toolchain).

It doesn't really fit any existing part of the existing C#/F#/.NET programming model and can never really be incorporated into project-based programming, for example. It can only be done in notebook-like environments that assume a complete compiler toolchain at each stage of execution, even in production scripts.

It's a powerful thing to be sure but we have to be aware of the direction this is going. I understand why you're thinking of doing this but yes, fragmentation of the programming experience is an intrinsic part of this direction, as tempting as it is.

An approach that does fit within existing norms is to drive the code generation off some kind of static schema (declared or acquired).
