Comments (4)
I'm running into a similar problem in #16693, involving list len, which seems to have been introduced by that exact same commit.
from polars.
Can you get a repro? It is hard for us to do anything without one.
P.S. What is the call stack if you set RUST_BACKTRACE=1?
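In case it helps anyone following along: a minimal sketch of one way to get that backtrace from a Python script, by running it in a child process with the environment variable set. The helper name `run_with_rust_backtrace` and the script name `repro.py` are made up for illustration; only `RUST_BACKTRACE` itself comes from the question above.

```python
import os
import subprocess
import sys


def run_with_rust_backtrace(script_path: str, level: str = "1") -> subprocess.CompletedProcess:
    """Run a Python script in a child process with RUST_BACKTRACE set.

    `level` is passed through unchanged; "1" gives a short backtrace and
    "full" a verbose one.
    """
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, RUST_BACKTRACE=level)
    return subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
    )
```

Calling `run_with_rust_backtrace("repro.py").stderr` would then contain the panic output, if the script panics. From a shell, `RUST_BACKTRACE=1 python repro.py` does the same thing.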
I've gotten to the point where I can localise where the error happens:
df = ...
df_le = df.filter(pl.col("foo") <= pl.col("bar").sum().over("x", "y"))
df_gt = df.filter(pl.col("foo") > pl.col("bar").sum().over("x", "y"))
df_le.write_parquet("df_le.parquet") # <- runs fine
df_gt.write_parquet("df_gt.parquet") # <- line crashes
But when I try to save the dataframe that occurs before this operation, the line that previously crashed runs fine...
df = ...
df.write_parquet("df.parquet") # <- runs fine
df_le = df.filter(pl.col("foo") <= pl.col("bar").sum().over("x", "y"))
df_gt = df.filter(pl.col("foo") > pl.col("bar").sum().over("x", "y"))
df_le.write_parquet("df_le.parquet") # <- runs fine
df_gt.write_parquet("df_gt.parquet") # <- runs fine
But when I make a copy of df and save that before doing the operation, the line crashes again:
df = ...
df_clone = df.clone()
df_clone.write_parquet("df_clone.parquet") # <- runs fine
df_le = df.filter(pl.col("foo") <= pl.col("bar").sum().over("x", "y"))
df_gt = df.filter(pl.col("foo") > pl.col("bar").sum().over("x", "y"))
df_le.write_parquet("df_le.parquet") # <- runs fine
df_gt.write_parquet("df_gt.parquet") # <- line crashes
Then finally, when trying to replicate the error using df_clone.parquet, no errors happen. Seems like I've stumbled upon the Polars variant of the observer effect 😄
I have been able to work around the error when running our tests locally by computing the window sum as a column first:
df = ...
df = df.with_columns(pl.col("bar").sum().over("x", "y").alias("bar"))
total_le_weight = df.filter(pl.col("foo") <= pl.col("bar")).drop("bar")
total_gt_weight = df.filter(pl.col("foo") > pl.col("bar")).drop("bar")
But when our tests run elsewhere they still fail.
PS: full backtrace looks like:
thread '<unnamed>' panicked at /Users/runner/work/polars/polars/crates/polars-core/src/frame/chunks.rs:42:57:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
0: 0x151052b94 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0d2dc83dc1a2d180
1: 0x14f0c2380 - core::fmt::write::hafaf83c1f13acb2d
2: 0x15102c0dc - std::io::Write::write_fmt::h99a5e7783791b568
3: 0x151056388 - std::sys_common::backtrace::print::hdef7d6c5b6479962
4: 0x151055ca0 - std::panicking::default_hook::{{closure}}::hd93f15c5fd187f2b
5: 0x15105763c - std::panicking::rust_panic_with_hook::h5ec40fc780130b7e
6: 0x1510566a0 - std::panicking::begin_panic_handler::{{closure}}::hf0062a47097d69cb
7: 0x151056608 - std::sys_common::backtrace::__rust_end_short_backtrace::hac3ff190b7c839f5
8: 0x1510565fc - _rust_begin_unwind
9: 0x1511afee0 - core::panicking::panic_fmt::h7ad8ed088f78f191
10: 0x1511aff3c - core::panicking::panic_bounds_check::hc4c59508c9c9cfdb
11: 0x14ff12780 - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::hf6a01e76e499b926
12: 0x14ff11c9c - polars_lazy::physical_plan::executors::stack::StackExec::execute_impl::hd91b93ed511967eb
13: 0x14ff11128 - <polars_lazy::physical_plan::executors::stack::StackExec as polars_lazy::physical_plan::executors::executor::Executor>::execute::h0b8aee35b0155ddf
14: 0x14ff03830 - polars_lazy::frame::LazyFrame::collect::hf6a4525f2adaf417
15: 0x14efae104 - polars::lazyframe::PyLazyFrame::__pymethod_collect__::haf2b4fa979e8e1dc
16: 0x14e9b6650 - pyo3::impl_::trampoline::trampoline::h573cd7b2aab8b9f4
17: 0x14efbfe08 - polars::lazyframe::_::__INVENTORY::trampoline::h9bef7636d256b696
18: 0x100c8b2c0 - _method_vectorcall_VARARGS_KEYWORDS
19: 0x100dafe80 - _call_function
20: 0x100da6e84 - __PyEval_EvalFrameDefault
21: 0x100da098c - __PyEval_Vector
22: 0x100c805dc - _method_vectorcall
23: 0x100dafe80 - _call_function
24: 0x100da64b0 - __PyEval_EvalFrameDefault
25: 0x100da098c - __PyEval_Vector
26: 0x100dafe80 - _call_function
27: 0x100da6e84 - __PyEval_EvalFrameDefault
28: 0x100da098c - __PyEval_Vector
29: 0x100dafe80 - _call_function
30: 0x100da6e84 - __PyEval_EvalFrameDefault
31: 0x100da098c - __PyEval_Vector
32: 0x100dafe80 - _call_function
33: 0x100da6e84 - __PyEval_EvalFrameDefault
34: 0x100da098c - __PyEval_Vector
35: 0x100dafe80 - _call_function
36: 0x100da64b0 - __PyEval_EvalFrameDefault
37: 0x100da098c - __PyEval_Vector
38: 0x100c805dc - _method_vectorcall
39: 0x100dafe80 - _call_function
40: 0x100da64b0 - __PyEval_EvalFrameDefault
41: 0x100da098c - __PyEval_Vector
42: 0x100dafe80 - _call_function
43: 0x100da6e84 - __PyEval_EvalFrameDefault
44: 0x100da098c - __PyEval_Vector
45: 0x100dafe80 - _call_function
46: 0x100da6e84 - __PyEval_EvalFrameDefault
47: 0x100da098c - __PyEval_Vector
48: 0x100dafe80 - _call_function
49: 0x100da6e84 - __PyEval_EvalFrameDefault
50: 0x100da098c - __PyEval_Vector
51: 0x100dafe80 - _call_function
52: 0x100da64b0 - __PyEval_EvalFrameDefault
53: 0x100da098c - __PyEval_Vector
54: 0x100c7d12c - _PyVectorcall_Call
55: 0x100da2bf0 - __PyEval_EvalFrameDefault
56: 0x100da098c - __PyEval_Vector
57: 0x100da2bf0 - __PyEval_EvalFrameDefault
58: 0x100da098c - __PyEval_Vector
59: 0x100dafe80 - _call_function
60: 0x100da7d80 - __PyEval_EvalFrameDefault
61: 0x100da098c - __PyEval_Vector
62: 0x100c805dc - _method_vectorcall
63: 0x100dafe80 - _call_function
64: 0x100da7d80 - __PyEval_EvalFrameDefault
65: 0x100da098c - __PyEval_Vector
66: 0x100c7ca18 - __PyObject_FastCallDictTstate
67: 0x100d0a6bc - _slot_tp_call
68: 0x100c7c6c8 - __PyObject_MakeTpCall
69: 0x100daff78 - _call_function
70: 0x100da64b0 - __PyEval_EvalFrameDefault
71: 0x100da098c - __PyEval_Vector
72: 0x100dafe80 - _call_function
73: 0x100da6e84 - __PyEval_EvalFrameDefault
74: 0x100da098c - __PyEval_Vector
75: 0x100da2bf0 - __PyEval_EvalFrameDefault
76: 0x100da098c - __PyEval_Vector
77: 0x100dafe80 - _call_function
78: 0x100da7d80 - __PyEval_EvalFrameDefault
79: 0x100da098c - __PyEval_Vector
80: 0x100c805dc - _method_vectorcall
81: 0x100dafe80 - _call_function
82: 0x100da7d80 - __PyEval_EvalFrameDefault
83: 0x100da098c - __PyEval_Vector
84: 0x100c7ca18 - __PyObject_FastCallDictTstate
85: 0x100d0a6bc - _slot_tp_call
86: 0x100c7d28c - __PyObject_Call
87: 0x100da2bf0 - __PyEval_EvalFrameDefault
88: 0x100da098c - __PyEval_Vector
89: 0x100dafe80 - _call_function
90: 0x100da6430 - __PyEval_EvalFrameDefault
91: 0x100da098c - __PyEval_Vector
92: 0x100c805dc - _method_vectorcall
93: 0x100dafe80 - _call_function
94: 0x100da64b0 - __PyEval_EvalFrameDefault
95: 0x100da098c - __PyEval_Vector
96: 0x100da2bf0 - __PyEval_EvalFrameDefault
97: 0x100da098c - __PyEval_Vector
98: 0x100dafe80 - _call_function
99: 0x100da6430 - __PyEval_EvalFrameDefault
100: 0x100da098c - __PyEval_Vector
101: 0x100dafe80 - _call_function
102: 0x100da64b0 - __PyEval_EvalFrameDefault
103: 0x100da098c - __PyEval_Vector
104: 0x100da2bf0 - __PyEval_EvalFrameDefault
105: 0x100da098c - __PyEval_Vector
106: 0x100dafe80 - _call_function
107: 0x100da7d80 - __PyEval_EvalFrameDefault
108: 0x100da098c - __PyEval_Vector
109: 0x100c805dc - _method_vectorcall
110: 0x100dafe80 - _call_function
111: 0x100da7d80 - __PyEval_EvalFrameDefault
112: 0x100da098c - __PyEval_Vector
113: 0x100c7ca18 - __PyObject_FastCallDictTstate
114: 0x100d0a6bc - _slot_tp_call
115: 0x100c7c6c8 - __PyObject_MakeTpCall
116: 0x100daff78 - _call_function
117: 0x100da64b0 - __PyEval_EvalFrameDefault
118: 0x100da098c - __PyEval_Vector
119: 0x100da2bf0 - __PyEval_EvalFrameDefault
120: 0x100da098c - __PyEval_Vector
121: 0x100dafe80 - _call_function
122: 0x100da7d80 - __PyEval_EvalFrameDefault
123: 0x100da098c - __PyEval_Vector
124: 0x100c805dc - _method_vectorcall
125: 0x100dafe80 - _call_function
126: 0x100da7d80 - __PyEval_EvalFrameDefault
127: 0x100da098c - __PyEval_Vector
128: 0x100c7ca18 - __PyObject_FastCallDictTstate
129: 0x100d0a6bc - _slot_tp_call
130: 0x100c7c6c8 - __PyObject_MakeTpCall
131: 0x100daff78 - _call_function
132: 0x100da64b0 - __PyEval_EvalFrameDefault
133: 0x100da098c - __PyEval_Vector
134: 0x100dafe80 - _call_function
135: 0x100da6430 - __PyEval_EvalFrameDefault
136: 0x100da098c - __PyEval_Vector
137: 0x100dafe80 - _call_function
138: 0x100da6430 - __PyEval_EvalFrameDefault
139: 0x100da098c - __PyEval_Vector
140: 0x100da2bf0 - __PyEval_EvalFrameDefault
141: 0x100da098c - __PyEval_Vector
142: 0x100dafe80 - _call_function
143: 0x100da7d80 - __PyEval_EvalFrameDefault
144: 0x100da098c - __PyEval_Vector
145: 0x100c805dc - _method_vectorcall
146: 0x100dafe80 - _call_function
147: 0x100da7d80 - __PyEval_EvalFrameDefault
148: 0x100da098c - __PyEval_Vector
149: 0x100c7ca18 - __PyObject_FastCallDictTstate
150: 0x100d0a6bc - _slot_tp_call
151: 0x100c7c6c8 - __PyObject_MakeTpCall
152: 0x100daff78 - _call_function
153: 0x100da64b0 - __PyEval_EvalFrameDefault
154: 0x100da098c - __PyEval_Vector
155: 0x100dafe80 - _call_function
156: 0x100da6430 - __PyEval_EvalFrameDefault
157: 0x100da098c - __PyEval_Vector
158: 0x100dafe80 - _call_function
159: 0x100da6430 - __PyEval_EvalFrameDefault
160: 0x100da098c - __PyEval_Vector
161: 0x100e0ca44 - _pyrun_file
162: 0x100e0c120 - __PyRun_SimpleFileObject
163: 0x100e0b740 - __PyRun_AnyFileObject
164: 0x100e35980 - _pymain_run_file_obj
165: 0x100e34f50 - _pymain_run_file
166: 0x100e34454 - _pymain_run_python
167: 0x100e342a8 - _Py_RunMain
168: 0x100e35b80 - _pymain_main
169: 0x100c0b558 - _main
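Most of the frames above belong to the Python interpreter rather than Polars. A small sketch for reducing a backtrace like this to its Rust frames; the function name `rust_frames` and the prefix list are assumptions based on the symbols visible in this particular trace, not a general-purpose parser.

```python
import re

# Matches lines such as "12: 0x14ff11c9c - some::symbol::h0123456789abcdef".
FRAME_RE = re.compile(r"^\s*(\d+):\s+0x[0-9a-f]+\s+-\s+(.*)$")

# Symbol prefixes of CPython frames as they appear in the trace above.
INTERPRETER_PREFIXES = (
    "_Py",
    "__Py",
    "_call_function",
    "_method_vectorcall",
    "_slot_tp_call",
    "_pymain",
    "_pyrun",
    "_main",
)


def rust_frames(backtrace: str) -> list[str]:
    """Return 'index: symbol' for every frame that is not a CPython frame."""
    kept = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # not a frame line (e.g. the panic message)
        index, symbol = match.groups()
        if symbol.startswith(INTERPRETER_PREFIXES):
            continue
        # Drop the trailing Rust symbol hash, if present.
        symbol = re.sub(r"::h[0-9a-f]{16}$", "", symbol)
        kept.append(f"{index}: {symbol}")
    return kept
```

Applied to the trace above, this keeps frames 0 through 17, which end in `StackExec::execute_impl` and `LazyFrame::collect`.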
It seems like the exact version where this error occurs is polars 0.20.22. It also seems to behave differently depending on hardware. Looking at the release notes of 0.20.22, I see a specific change to filtering in #15686. Could that be the cause?