This issue covers some of the feedback I have as a new (and currently limited) user of Polars. There's a lot to unpack for these proposals, so I'll start with some context for a foundation.
I'm a longtime Pandas user who has worked on projects ranging from discovery work to implementing packaged compute pipelines in Python. I love what Pandas can do, but not how it's designed or how it's often used. That's a major reason I've grown interested in Polars, and I hope to get more involved to help polish and promote it as the viable DataFrame library it is.
Motivation
Before listing my feedback I want to establish some core assumptions behind my perspective.
I believe that:
1. As an onboarding user, the onus is on you to learn the library.
2. Thorough documentation isn't strictly necessary to learn Polars, but it scales Polars' ability to onboard new users (in turn, hopefully, surrounding the project with more support), so onboarding should be a priority for the documentation.
3. The API design is a primary feature of a DataFrame library and is a great selling point for a library like Polars.
4. Some people are thorough and will read through the documentation entirely, while others will look for quicker ways to get hands-on. IME some people are just hands-on, and once you get them started it's easier to guide them through more details. It's really difficult to get these users to read the docs in their entirety without being more creative with the presentation and flow of the docs. FWIW I started out by reading through the entire user guide.
Feedback
- Introduction page is perfect. I wouldn't change it.
- Getting Started feels short. This is where I think the gap from Motivation point 4 can be filled. The Getting Started content itself can and probably should be compact and to the point, but we can bridge into details and concepts more effectively from here.
- For example, Expressions could be a new concept to both Pandas and non-Pandas users. I'd guess that's a gigantic chunk of potential new users (if not the majority). Leading into expressions immediately may start some users off with more information to digest than they need at that point. This can be improved by providing 5-10 minutes of opinionated, walk-through style content that covers a generic workflow using familiar DataFrame library behaviors (create a DataFrame, modify a DataFrame, view data from a DataFrame, common operations, important and relevant data types, etc.).
- User Guide needs more directional hand-holding to educate new users. We can use links to redirect users to the context and details they need for the topics being covered. This leads into 2 major directions for documentation: (1) topic introduction and (2) example completeness. Once users have been introduced to relatable DataFrame library behaviors, we can help them connect the dots between typical DataFrame usage and concepts like expressions, contexts, etc.
- I've been thinking a lot about example robustness and I'm struggling to see it as a net positive for the User Guide specifically. IMO the User Guide should focus on introducing users to Polars enough to get them moving and familiar enough to prevent anti-patterns or minimize user turnover. Otherwise you can end up with a pretty busy Cookbook. Cookbooks shouldn't be too busy and should have clear objectives and recipes for visiting users (another IMO). My feedback is that I'd expect example robustness to be an initiative for the Reference Guide. That doesn't mean it can't be augmented or supported in some form by the User Guide.
- At times I expected to find examples in the Reference Guide and didn't. This can be solved with the example completeness initiative.
- Maybe refining the Cookbook within the User Guide could help organize some of these changes. In other words, making the distinction between User Guide and Cookbook clearer.
Proposals
I'm putting this issue together to gauge interest. Please feel free to tear this apart.
Improve Getting Started page by adding a guided walk-through
The walkthrough could be composed of the following:
- Standards - Includes import, API philosophy, references.
- Create data - Create a DataFrame, create a column/Series.
- Modify data - Modify columns and values in a DataFrame.
- View data - Selection of data from DataFrames with and without filters. Could include config references here.
- General Ops - Merge, group, morph, etc.
- Datetime behaviors and usage - Units, methods, etc.
- Additional data types - Categoricals, struct, etc.
- I/O - .csv, .parquet, SQL, why not Excel.
This gives the Cookbook a chance to provide surface-level explanations for certain decisions Polars makes through a relatable medium that includes pointers, recommendations, or just links to more relevant content. To me this is closer to actually guiding users through their onboarding rather than just giving them a topic-by-topic guide.
Filling in API Reference gaps
Edit: I'm posting this now just to get it out there. Reading it over, I want to start by clearly defining the direction for User Guide content, Cookbook content, and Reference content. I think that's the common denominator here.