Comments (4)
I tried to create a repro like:
import pandas as pd
import os
import polars as pl
os.environ["POLARS_FORCE_OOC"] = "1"
pl.Config.set_verbose(True)
pl.enable_string_cache()
# Still not sure whether this has anything to do with pyarrow-produced files
df = pd.DataFrame({"a": ["hi", "bye", "hello"], "b": [3, 1, 2], "c": ["test3", "test1", "test2"], "d": ["d1", "d2", "d3"], "e": [2, 1, 1]})
df["a"] = df["a"].astype("category")
df["c"] = df["c"].astype("category")
# df["d"] = df["d"].astype("category")
df.to_parquet("test.parquet")
ldf = pl.scan_parquet("test.parquet")
ldf = ldf.sort("e", "b")
ldf = ldf.with_columns(d = pl.concat_list(pl.col("c"), pl.col("d")))
ldf = ldf.explode(pl.col("d"))
ldf = ldf.sort("d", "e")
df = ldf.collect(streaming=True)
print(df)
But this seems to work fine.
I also tried the following, thinking that it might be valid to search for binary offset values within a string series:
DataType::String => {
    let ca = s.str().unwrap();
    let ca = ca.as_binary();
    let idx = match search_values.dtype() {
        DataType::BinaryOffset => {
            let search_values = search_values.binary_offset().unwrap();
            search_sorted_bin_array_with_binary_offset(&ca, search_values, side, descending)
        },
        DataType::String => {
            let search_values = search_values.str()?;
            let search_values = search_values.as_binary();
            search_sorted_bin_array(&ca, &search_values, side, descending)
        },
        _ => unreachable!(),
    };
    Ok(IdxCa::new_vec(s.name(), idx))
},
But that panics with a different error. The attempted code is probably just conceptually wrong; I don't know how the string view type is implemented.
@ritchie46 do you know if it's valid to search a String series with binary offset values? Or is the issue related to losing track of the schema during execution?
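For what it's worth, here is a minimal sketch of why the idea is at least plausible at the level of ordering. It uses only the Python standard library, not Polars, and the `search_sorted` helper is a hypothetical stand-in, so it says nothing about how the string view or binary offset arrays are actually laid out. It only shows that a sorted search over strings and over their UTF-8 bytes returns the same indices, because byte-wise UTF-8 order matches code point order.

```python
from bisect import bisect_left, bisect_right


def search_sorted(sorted_values, needle, side="left"):
    """Hypothetical helper: insertion index of `needle` in an ascending list."""
    if side == "left":
        return bisect_left(sorted_values, needle)
    return bisect_right(sorted_values, needle)


# An ascending string column, including some non-ASCII values.
strings = sorted(["bye", "hello", "hi", "test1", "été", "日本"])

# The same column viewed as raw UTF-8 bytes (an assumption standing in
# for a binary representation; not the real Polars memory layout).
binary = [s.encode("utf-8") for s in strings]

needles = ["a", "hello", "hj", "zebra", "é", "日"]

# Search the string column with string needles ...
str_idx = [search_sorted(strings, n) for n in needles]
# ... and the binary column with byte needles.
bin_idx = [search_sorted(binary, n.encode("utf-8")) for n in needles]

print(str_idx)
print(bin_idx)
print(str_idx == bin_idx)
```

So if the search is semantically valid, a panic would more likely come from the physical representation or from a dtype/schema mismatch than from the comparison itself, but that is a guess.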
This seems to have been fixed. Perhaps by 2b28777?