
Comments (4)

mwaskom commented on September 12, 2024

Thanks for elaborating.

> and would immediately open up seaborn for use from polars

To be clear, this is already the case:

import polars as pl
import seaborn as sns
df = pl.DataFrame({"cat": ["x", "y", "z"], "val": [1, 2, 3]})
sns.barplot(df, x="cat", y="val")


> using a more generic library like narwhals for any internal dataframe operations

This is a complete non-starter.

> Assuming seaborn still requires NumPy types for interactivity with matplotlib

I can't see that changing any time soon, but I don't know what specifically is on matplotlib's roadmap.

from seaborn.

mwaskom commented on September 12, 2024

Thanks for flagging. Just skimmed your link but it looks like it's operating at a very different level from the dataframe interchange protocol? The relevance to seaborn (i.e., is there a simple way to be more agnostic about input data structure types) isn't super obvious.


WillAyd commented on September 12, 2024

Apologies as I should have been more clear - the technical documentation I provided was just a reference, not something I'd expect seaborn to have to implement from scratch. The dataframe libraries that you would interact with should do most of the heavy lifting for that.

@MarcoGorelli probably knows best here, but from a cursory glance of the seaborn source code, I think you could adopt the Arrow PyCapsule interface in a piece-wise fashion:

  1. Maintain the dependency on pandas, but just swap out checks for the __dataframe__ dunder with checks for __arrow_c_schema__
  2. Drop the dependency on pandas, using a more generic library like narwhals for any internal dataframe operations

Step 1 I think would be pretty easy, and would immediately open up seaborn for use from polars, excluding any data types that polars has which pandas does not (most likely Decimal / aggregate types)

Step 2 would take a little more time. I'm not sure if narwhals is even fully capable of abstracting all of the dataframe operations that seaborn needs today, but in theory this would make your dependencies more lightweight by dropping pandas

Overall, rather than seaborn having to customize solutions towards the various dataframe type systems, the ecosystem would converge on the Arrow type system. Assuming seaborn still requires NumPy types for interactivity with matplotlib, there will still be a gap where Arrow types don't have a plottable equivalent, but I think that's better than the status quo where seaborn is tied to the pandas type system, given Arrow is better documented and more stable
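
Step 1 above amounts to changing which dunder is probed on the incoming object. Below is a minimal, standard-library-only sketch of that check. The function `detect_protocol` and the two stand-in classes are hypothetical names for illustration and are not seaborn code. Besides the `__arrow_c_schema__` name mentioned above, the sketch also probes `__arrow_c_stream__`, which is the method the Arrow PyCapsule interface defines for exporting stream (tabular) data.

```python
# Dunders defined by the Arrow PyCapsule interface that this sketch probes.
ARROW_DUNDERS = ("__arrow_c_stream__", "__arrow_c_schema__")


def detect_protocol(data):
    """Report which exchange protocol `data` advertises, preferring Arrow."""
    if any(hasattr(data, name) for name in ARROW_DUNDERS):
        return "arrow_pycapsule"
    if hasattr(data, "__dataframe__"):
        return "interchange"
    return "unknown"


class ArrowOnly:
    """Hypothetical stand-in: advertises only the Arrow PyCapsule interface."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("stand-in only; no real Arrow export")


class InterchangeOnly:
    """Hypothetical stand-in: advertises only the interchange protocol."""

    def __dataframe__(self, allow_copy=True):
        raise NotImplementedError("stand-in only; no real interchange object")


print(detect_protocol(ArrowOnly()))
print(detect_protocol(InterchangeOnly()))
print(detect_protocol([1, 2, 3]))
```

The check only inspects attribute names, so it never triggers a data conversion on its own; what happens after detection is left to the caller.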


MarcoGorelli commented on September 12, 2024

Thanks for the ping, and thanks both for comments! 🙏

It's true that Seaborn accepts Polars objects, but plotting fails if the object contains data types not recognised by the interchange protocol (#3533). (I think we all find this frustrating, and feel at least slightly let down by the interchange protocol, but that's a different story...)

Seaborn currently uses

pd.api.interchange.from_dataframe(data)

and that's what fails when the interchange protocol falls short. But if in pandas we first tried using the (superior, better maintained, less fallible) PyCapsule interface, then Seaborn's current code could "just work"
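
The "try the PyCapsule interface first, then fall back to the interchange protocol" idea can be sketched as an ordered chain of converters. This is an illustration under stated assumptions, not pandas or seaborn code: `convert_with_fallback`, `via_pycapsule`, `via_interchange`, and `ToyFrame` are all hypothetical. In a real implementation the first converter would wrap an Arrow PyCapsule import and the second would wrap `pd.api.interchange.from_dataframe`; here both are toy stand-ins so the example runs with the standard library alone.

```python
def convert_with_fallback(data, converters):
    """Try each (name, func) pair in order and return the first success."""
    errors = []
    for name, func in converters:
        try:
            return name, func(data)
        except Exception as exc:  # remember the failure, then try the next path
            errors.append(f"{name}: {exc}")
    raise TypeError("no converter succeeded: " + "; ".join(errors))


def via_pycapsule(data):
    """Toy stand-in for an Arrow PyCapsule based conversion."""
    if not hasattr(data, "__arrow_c_stream__"):
        raise TypeError("object does not expose __arrow_c_stream__")
    return dict(data.columns)


def via_interchange(data):
    """Toy stand-in for an interchange-protocol based conversion."""
    if not hasattr(data, "__dataframe__"):
        raise TypeError("object does not expose __dataframe__")
    return dict(data.__dataframe__().columns)


class ToyFrame:
    """Hypothetical frame that supports only the interchange protocol."""

    def __init__(self, columns):
        self.columns = columns

    def __dataframe__(self, allow_copy=True):
        return self


# Preferred path first, fallback second.
CONVERTERS = [("pycapsule", via_pycapsule), ("interchange", via_interchange)]

used, result = convert_with_fallback(ToyFrame({"val": [1, 2, 3]}), CONVERTERS)
print(used, result)
```

Ordering the converters in a list keeps the preference explicit, and collecting the per-path errors means a total failure reports why each path was rejected.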

> > using a more generic library like narwhals for any internal dataframe operations
>
> This is a complete non-starter.

😆 fair enough


So, in summary, there might not be anything actionable on Seaborn's side here (though I hope the fallback in #3534 makes it into the next release). Still, good to catch up and hear your opinion on the topic 🙌

