Comments (4)

kszlim commented on September 27, 2024

I'm running into a similar problem in #16693, to do with list len, which seems to have been introduced by that exact same commit.

from polars.

ritchie46 commented on September 27, 2024

Can you get a repro? It is hard for us to do anything without one.

P.S. what is the callstack if you set RUST_BACKTRACE=1?
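For anyone unsure how to capture that callstack from a Python test run, here is a minimal sketch, assuming the failing code can be launched as a child process. The helper name `run_with_backtrace` and the script name `repro.py` are hypothetical; the only thing taken from the thread is the `RUST_BACKTRACE=1` variable itself.

```python
import os
import subprocess
import sys


def run_with_backtrace(args):
    """Run a Python child process with RUST_BACKTRACE=1 set in its environment."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, RUST_BACKTRACE="1")
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )


# Stand-in child command that only echoes the variable back; in practice this
# would be the failing script, e.g. run_with_backtrace(["repro.py"]).
result = run_with_backtrace(
    ["-c", "import os; print(os.environ['RUST_BACKTRACE'])"]
)
print(result.stdout.strip())
```

Any panic message and stack trace from the child would then be available in `result.stderr`.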

maxzw commented on September 27, 2024

I've gotten to a point where I could localise when the error happens:

df = ...
df_le = df.filter(pl.col("foo") <= pl.col("bar").sum().over("x", "y"))
df_gt = df.filter(pl.col("foo") > pl.col("bar").sum().over("x", "y"))
df_le.write_parquet("df_le.parquet")  # <- runs fine
df_gt.write_parquet("df_gt.parquet")  # <- line crashes

But when I try to save the dataframe that occurs before this operation, the line that previously crashed runs fine...

df = ...
df.write_parquet("df.parquet")   # <- runs fine
df_le = df.filter(pl.col("foo") <= pl.col("bar").sum().over("x", "y"))
df_gt = df.filter(pl.col("foo") > pl.col("bar").sum().over("x", "y"))
df_le.write_parquet("df_le.parquet")  # <- runs fine
df_gt.write_parquet("df_gt.parquet")  # <- runs fine

But when I make a copy of df and save that before doing the operation, the line crashes again:

df = ...
df_clone = df.clone()
df_clone.write_parquet("df_clone.parquet")   # <- runs fine
df_le = df.filter(pl.col("foo") <= pl.col("bar").sum().over("x", "y"))
df_gt = df.filter(pl.col("foo") > pl.col("bar").sum().over("x", "y"))
df_le.write_parquet("df_le.parquet")  # <- runs fine
df_gt.write_parquet("df_gt.parquet")  # <- line crashes

Then finally, when trying to replicate the error using df_clone.parquet, no errors happen. Seems like I've stumbled upon the Polars variant of the observer effect 😄

I have been able to solve the error when running our tests locally by:

df = ...
df = df.with_columns(pl.col("bar").sum().over("x", "y").alias("bar"))
total_le_weight = df.filter(pl.col("foo") <= pl.col("bar")).drop("bar")
total_gt_weight = df.filter(pl.col("foo") > pl.col("bar")).drop("bar")

But when our tests run elsewhere they still fail.

PS: full backtrace looks like:

thread '<unnamed>' panicked at /Users/runner/work/polars/polars/crates/polars-core/src/frame/chunks.rs:42:57:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
   0:        0x151052b94 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0d2dc83dc1a2d180
   1:        0x14f0c2380 - core::fmt::write::hafaf83c1f13acb2d
   2:        0x15102c0dc - std::io::Write::write_fmt::h99a5e7783791b568
   3:        0x151056388 - std::sys_common::backtrace::print::hdef7d6c5b6479962
   4:        0x151055ca0 - std::panicking::default_hook::{{closure}}::hd93f15c5fd187f2b
   5:        0x15105763c - std::panicking::rust_panic_with_hook::h5ec40fc780130b7e
   6:        0x1510566a0 - std::panicking::begin_panic_handler::{{closure}}::hf0062a47097d69cb
   7:        0x151056608 - std::sys_common::backtrace::__rust_end_short_backtrace::hac3ff190b7c839f5
   8:        0x1510565fc - _rust_begin_unwind
   9:        0x1511afee0 - core::panicking::panic_fmt::h7ad8ed088f78f191
  10:        0x1511aff3c - core::panicking::panic_bounds_check::hc4c59508c9c9cfdb
  11:        0x14ff12780 - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::hf6a01e76e499b926
  12:        0x14ff11c9c - polars_lazy::physical_plan::executors::stack::StackExec::execute_impl::hd91b93ed511967eb
  13:        0x14ff11128 - <polars_lazy::physical_plan::executors::stack::StackExec as polars_lazy::physical_plan::executors::executor::Executor>::execute::h0b8aee35b0155ddf
  14:        0x14ff03830 - polars_lazy::frame::LazyFrame::collect::hf6a4525f2adaf417
  15:        0x14efae104 - polars::lazyframe::PyLazyFrame::__pymethod_collect__::haf2b4fa979e8e1dc
  16:        0x14e9b6650 - pyo3::impl_::trampoline::trampoline::h573cd7b2aab8b9f4
  17:        0x14efbfe08 - polars::lazyframe::_::__INVENTORY::trampoline::h9bef7636d256b696
  18:        0x100c8b2c0 - _method_vectorcall_VARARGS_KEYWORDS
  19:        0x100dafe80 - _call_function
  20:        0x100da6e84 - __PyEval_EvalFrameDefault
  21:        0x100da098c - __PyEval_Vector
  22:        0x100c805dc - _method_vectorcall
  23:        0x100dafe80 - _call_function
  24:        0x100da64b0 - __PyEval_EvalFrameDefault
  25:        0x100da098c - __PyEval_Vector
  26:        0x100dafe80 - _call_function
  27:        0x100da6e84 - __PyEval_EvalFrameDefault
  28:        0x100da098c - __PyEval_Vector
  29:        0x100dafe80 - _call_function
  30:        0x100da6e84 - __PyEval_EvalFrameDefault
  31:        0x100da098c - __PyEval_Vector
  32:        0x100dafe80 - _call_function
  33:        0x100da6e84 - __PyEval_EvalFrameDefault
  34:        0x100da098c - __PyEval_Vector
  35:        0x100dafe80 - _call_function
  36:        0x100da64b0 - __PyEval_EvalFrameDefault
  37:        0x100da098c - __PyEval_Vector
  38:        0x100c805dc - _method_vectorcall
  39:        0x100dafe80 - _call_function
  40:        0x100da64b0 - __PyEval_EvalFrameDefault
  41:        0x100da098c - __PyEval_Vector
  42:        0x100dafe80 - _call_function
  43:        0x100da6e84 - __PyEval_EvalFrameDefault
  44:        0x100da098c - __PyEval_Vector
  45:        0x100dafe80 - _call_function
  46:        0x100da6e84 - __PyEval_EvalFrameDefault
  47:        0x100da098c - __PyEval_Vector
  48:        0x100dafe80 - _call_function
  49:        0x100da6e84 - __PyEval_EvalFrameDefault
  50:        0x100da098c - __PyEval_Vector
  51:        0x100dafe80 - _call_function
  52:        0x100da64b0 - __PyEval_EvalFrameDefault
  53:        0x100da098c - __PyEval_Vector
  54:        0x100c7d12c - _PyVectorcall_Call
  55:        0x100da2bf0 - __PyEval_EvalFrameDefault
  56:        0x100da098c - __PyEval_Vector
  57:        0x100da2bf0 - __PyEval_EvalFrameDefault
  58:        0x100da098c - __PyEval_Vector
  59:        0x100dafe80 - _call_function
  60:        0x100da7d80 - __PyEval_EvalFrameDefault
  61:        0x100da098c - __PyEval_Vector
  62:        0x100c805dc - _method_vectorcall
  63:        0x100dafe80 - _call_function
  64:        0x100da7d80 - __PyEval_EvalFrameDefault
  65:        0x100da098c - __PyEval_Vector
  66:        0x100c7ca18 - __PyObject_FastCallDictTstate
  67:        0x100d0a6bc - _slot_tp_call
  68:        0x100c7c6c8 - __PyObject_MakeTpCall
  69:        0x100daff78 - _call_function
  70:        0x100da64b0 - __PyEval_EvalFrameDefault
  71:        0x100da098c - __PyEval_Vector
  72:        0x100dafe80 - _call_function
  73:        0x100da6e84 - __PyEval_EvalFrameDefault
  74:        0x100da098c - __PyEval_Vector
  75:        0x100da2bf0 - __PyEval_EvalFrameDefault
  76:        0x100da098c - __PyEval_Vector
  77:        0x100dafe80 - _call_function
  78:        0x100da7d80 - __PyEval_EvalFrameDefault
  79:        0x100da098c - __PyEval_Vector
  80:        0x100c805dc - _method_vectorcall
  81:        0x100dafe80 - _call_function
  82:        0x100da7d80 - __PyEval_EvalFrameDefault
  83:        0x100da098c - __PyEval_Vector
  84:        0x100c7ca18 - __PyObject_FastCallDictTstate
  85:        0x100d0a6bc - _slot_tp_call
  86:        0x100c7d28c - __PyObject_Call
  87:        0x100da2bf0 - __PyEval_EvalFrameDefault
  88:        0x100da098c - __PyEval_Vector
  89:        0x100dafe80 - _call_function
  90:        0x100da6430 - __PyEval_EvalFrameDefault
  91:        0x100da098c - __PyEval_Vector
  92:        0x100c805dc - _method_vectorcall
  93:        0x100dafe80 - _call_function
  94:        0x100da64b0 - __PyEval_EvalFrameDefault
  95:        0x100da098c - __PyEval_Vector
  96:        0x100da2bf0 - __PyEval_EvalFrameDefault
  97:        0x100da098c - __PyEval_Vector
  98:        0x100dafe80 - _call_function
  99:        0x100da6430 - __PyEval_EvalFrameDefault
 100:        0x100da098c - __PyEval_Vector
 101:        0x100dafe80 - _call_function
 102:        0x100da64b0 - __PyEval_EvalFrameDefault
 103:        0x100da098c - __PyEval_Vector
 104:        0x100da2bf0 - __PyEval_EvalFrameDefault
 105:        0x100da098c - __PyEval_Vector
 106:        0x100dafe80 - _call_function
 107:        0x100da7d80 - __PyEval_EvalFrameDefault
 108:        0x100da098c - __PyEval_Vector
 109:        0x100c805dc - _method_vectorcall
 110:        0x100dafe80 - _call_function
 111:        0x100da7d80 - __PyEval_EvalFrameDefault
 112:        0x100da098c - __PyEval_Vector
 113:        0x100c7ca18 - __PyObject_FastCallDictTstate
 114:        0x100d0a6bc - _slot_tp_call
 115:        0x100c7c6c8 - __PyObject_MakeTpCall
 116:        0x100daff78 - _call_function
 117:        0x100da64b0 - __PyEval_EvalFrameDefault
 118:        0x100da098c - __PyEval_Vector
 119:        0x100da2bf0 - __PyEval_EvalFrameDefault
 120:        0x100da098c - __PyEval_Vector
 121:        0x100dafe80 - _call_function
 122:        0x100da7d80 - __PyEval_EvalFrameDefault
 123:        0x100da098c - __PyEval_Vector
 124:        0x100c805dc - _method_vectorcall
 125:        0x100dafe80 - _call_function
 126:        0x100da7d80 - __PyEval_EvalFrameDefault
 127:        0x100da098c - __PyEval_Vector
 128:        0x100c7ca18 - __PyObject_FastCallDictTstate
 129:        0x100d0a6bc - _slot_tp_call
 130:        0x100c7c6c8 - __PyObject_MakeTpCall
 131:        0x100daff78 - _call_function
 132:        0x100da64b0 - __PyEval_EvalFrameDefault
 133:        0x100da098c - __PyEval_Vector
 134:        0x100dafe80 - _call_function
 135:        0x100da6430 - __PyEval_EvalFrameDefault
 136:        0x100da098c - __PyEval_Vector
 137:        0x100dafe80 - _call_function
 138:        0x100da6430 - __PyEval_EvalFrameDefault
 139:        0x100da098c - __PyEval_Vector
 140:        0x100da2bf0 - __PyEval_EvalFrameDefault
 141:        0x100da098c - __PyEval_Vector
 142:        0x100dafe80 - _call_function
 143:        0x100da7d80 - __PyEval_EvalFrameDefault
 144:        0x100da098c - __PyEval_Vector
 145:        0x100c805dc - _method_vectorcall
 146:        0x100dafe80 - _call_function
 147:        0x100da7d80 - __PyEval_EvalFrameDefault
 148:        0x100da098c - __PyEval_Vector
 149:        0x100c7ca18 - __PyObject_FastCallDictTstate
 150:        0x100d0a6bc - _slot_tp_call
 151:        0x100c7c6c8 - __PyObject_MakeTpCall
 152:        0x100daff78 - _call_function
 153:        0x100da64b0 - __PyEval_EvalFrameDefault
 154:        0x100da098c - __PyEval_Vector
 155:        0x100dafe80 - _call_function
 156:        0x100da6430 - __PyEval_EvalFrameDefault
 157:        0x100da098c - __PyEval_Vector
 158:        0x100dafe80 - _call_function
 159:        0x100da6430 - __PyEval_EvalFrameDefault
 160:        0x100da098c - __PyEval_Vector
 161:        0x100e0ca44 - _pyrun_file
 162:        0x100e0c120 - __PyRun_SimpleFileObject
 163:        0x100e0b740 - __PyRun_AnyFileObject
 164:        0x100e35980 - _pymain_run_file_obj
 165:        0x100e34f50 - _pymain_run_file
 166:        0x100e34454 - _pymain_run_python
 167:        0x100e342a8 - _Py_RunMain
 168:        0x100e35b80 - _pymain_main
 169:        0x100c0b558 - _main

maxzw commented on September 27, 2024

It seems the exact version where this error first occurs is polars-0.20.22. It also seems to behave differently depending on hardware. Looking at the release notes of 0.20.22, I see a specific change to filtering in #15686; could that be the cause?
