Comments (4)
`pl_flavor` doesn't refer to the difference between a `large_string` and a `string`. It refers to the difference between a `large_string` and a `utf8_view`, which doesn't seem to be implemented in pyarrow yet.
It seems @ritchie46 intended to close this as not planned, so I'll do that now. Sorry if I'm mistaken on that point.
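The following sketch makes the `large_string` versus `utf8_view` distinction a little more concrete. A `large_string` array keeps one int64 offset per value into a single contiguous data buffer. A view array instead keeps a fixed 16-byte view per value: short values (up to 12 bytes) sit inline in the view, and longer ones store a 4-byte prefix plus a buffer index and offset. This is a stdlib-only illustration based on my reading of the Arrow columnar format spec. `encode_view` and `decode_view` are hypothetical helpers, not pyarrow or Polars functions, and the sketch collapses Arrow's multiple variadic data buffers into one.

```python
import struct

INLINE_MAX = 12  # per the Arrow view layout, values up to 12 bytes are stored inline


def encode_view(value: str, data: bytearray, buffer_index: int = 0) -> bytes:
    """Encode one string as a 16-byte view (simplified: a single data buffer)."""
    raw = value.encode("utf-8")
    if len(raw) <= INLINE_MAX:
        # int32 length + the bytes themselves, zero-padded to 12 bytes
        return struct.pack("<i12s", len(raw), raw)
    offset = len(data)
    data.extend(raw)  # long values are appended to an out-of-line data buffer
    # int32 length, 4-byte prefix, int32 buffer index, int32 offset
    return struct.pack("<i4sii", len(raw), raw[:4], buffer_index, offset)


def decode_view(view: bytes, buffers: list[bytearray]) -> str:
    """Inverse of encode_view: read the string back from a 16-byte view."""
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_MAX:
        return view[4:4 + length].decode("utf-8")
    _prefix, buffer_index, offset = struct.unpack_from("<4sii", view, 4)
    return bytes(buffers[buffer_index][offset:offset + length]).decode("utf-8")


data = bytearray()
views = [encode_view(s, data) for s in ["polars", "a string longer than twelve bytes"]]
print([len(v) for v in views])                  # [16, 16]
print(len(data))                                # 33: only the long value is out of line
print([decode_view(v, [data]) for v in views])  # ['polars', 'a string longer than twelve bytes']
```

The fixed-size views are what allow cheap slicing and reuse of data buffers. The cost is that consumers must understand the newer view type, which is why a conversion step is still needed for tools that only read `large_string`.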
> I'm still new to Polars. What are some use cases of LargeString?
Our in-memory engine favors large chunks (often single-chunked DataFrames). It is pretty easy to reach the 2GB limit of Arrow's regular `string` type on user data that way.
> Is it feasible to expose this boolean flag in py-polars as well?
This is to convert to `string_view` and is only temporary until Arrow consumers implement binview.
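The 2GB string limit mentioned in this reply comes from the offsets: a regular Arrow `string` array stores signed 32-bit offsets into its data buffer, so the total UTF-8 payload of one array (chunk) cannot exceed 2**31 - 1 bytes, whereas `large_string` uses 64-bit offsets. The stdlib-only sketch below illustrates that arithmetic. `build_offsets` is a hypothetical helper, and the 30-million-row, 80-byte column is an illustrative assumption, not a measured workload.

```python
# Arrow's regular `string` type stores int32 offsets; `large_string` stores int64 offsets.
INT32_MAX = 2**31 - 1  # 2_147_483_647 bytes, the ~2GB ceiling per `string` array
INT64_MAX = 2**63 - 1  # ceiling for a `large_string` array


def build_offsets(values: list[str], max_offset: int = INT32_MAX) -> list[int]:
    """Compute Arrow-style offsets for one chunk of strings.

    offsets[i] and offsets[i + 1] delimit value i inside the shared UTF-8 data
    buffer, so the last offset equals the total byte size of the chunk. Raise if
    that total no longer fits in `max_offset`.
    """
    offsets = [0]
    for value in values:
        end = offsets[-1] + len(value.encode("utf-8"))
        if end > max_offset:
            raise OverflowError(f"chunk needs {end} bytes of data, limit is {max_offset}")
        offsets.append(end)
    return offsets


# Back-of-the-envelope check for an illustrative single-chunk column:
# 30 million rows averaging 80 bytes each already exceeds int32 offsets.
rows, avg_bytes = 30_000_000, 80
print(rows * avg_bytes > INT32_MAX)  # True
print(rows * avg_bytes > INT64_MAX)  # False
```

Because an engine that prefers single-chunked columns puts all of those bytes behind one offsets buffer, the 32-bit ceiling is reached by column size alone, regardless of how short the individual strings are.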
We will not do that. Arrow's default string can only hold 2GB of data per column chunk, leading to all kinds of slicing requirements. We deem the default string utterly unusable for our use cases. You can always cast from `LargeString` to `String` and implement your own slicing if required.
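"Implement your own slicing" could look roughly like the stdlib-only sketch below: greedily group values so that each chunk's UTF-8 payload stays under the 32-bit offset limit, so every chunk could be stored as a regular `string` array. `split_into_chunks` is a hypothetical helper, not part of Polars or pyarrow, and a greedy split is only one possible strategy.

```python
def split_into_chunks(values: list[str], max_bytes: int = 2**31 - 1) -> list[list[str]]:
    """Greedily group `values` so each chunk's UTF-8 payload fits in `max_bytes`.

    With the default limit, every chunk could be stored as a regular Arrow
    `string` array (int32 offsets) instead of one `large_string` array.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for value in values:
        size = len(value.encode("utf-8"))
        if size > max_bytes:
            # No slicing strategy helps if one value alone is too large.
            raise ValueError(f"value of {size} bytes exceeds the per-chunk limit")
        if current and current_bytes + size > max_bytes:
            # Close the current chunk before it would overflow.
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(value)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks(["aaa", "bb", "cccc", "d"], max_bytes=5))
# [['aaa', 'bb'], ['cccc', 'd']]
```

Assuming pyarrow is available, each chunk could then be built with `pa.array(chunk, type=pa.string())` and the pieces combined with `pa.chunked_array(...)`; that step is not exercised here.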
Thanks for the quick reply!
> We deem the default string utterly unusable for our use cases
I'm still new to Polars. What are some use cases of `LargeString`?
> You can always cast from LargeString to String and implement your own slicing if required.
We will probably do this for pyiceberg. apache/iceberg-python#520
It looks like in Rust there's a `pl_flavor` boolean flag that can be set to use the regular Arrow string instead (1, 2), but this is not available in Python.
Is it feasible to expose this boolean flag in py-polars as well?
Related Issues (20)
- Ensure column names are unique for horizontal concat in the IR
- Expression input not working correctly for either argument of `Expr.str.replace_many`
- PanicException on reading Parquet file from S3
- add an `ignore_nulls` option to `json_encode` for the `pl.Struct` column
- Change all columns one time
- map_elements replaces all array elements with nulls
- Let arr.reshape use Expr as inputs instead of just python tuple
- Import and Export Schema Objects to JSON
- error[E0599]: no variant or associated item named `Struct` found for enum `polars_core::datatypes::DataType` in the current scope
- `with_columns(dict)` fail silently if key exists as column
- Polars write database - Rest API call failing with AWS Lambda trigger
- Change `read_csv` and `read_ipc` to use object store instead of fsspec
- `.over()` fails with `.top_k_by`
- `join_nulls` in "asof" join
- exception on numpy slicing literal column with object column
- Scanning cloud paths with percentages '%' fails
- Make pl.Enum(...) return type rather than instance
- Elementwise check on `join` expressions is too restrictive
- Built-in datasets and a function to load them
- Python test workflows may fail due to failure to download `torch` dependency