That is PyArrow's default behaviour for pandas conversion (which we use for this). If you want a faster and more accurate conversion (including a much-improved string type), you need to opt in to pandas' newer Arrow-backed dtypes, like so:
pd_df = df.to_pandas(use_pyarrow_extension_array=True)
pd_df.dtypes
# ID int64[pyarrow]
# Name large_string[pyarrow]
# Age int64[pyarrow]
Once the pandas ecosystem has more fully adopted the Arrow dtypes (which will take a bit of time), this will likely become the default conversion path. For now it requires opt-in.
This issue should be closed because it's not a bug. pandas doesn't have a dedicated string dtype when using NumPy arrays as the backing store, even if you literally pass it a NumPy string array:
>>> pd.DataFrame({'a': np.array(['a', 'b'], dtype='U1')}).dtypes
a object
dtype: object