Comments (6)
Many thanks for the bug report and reproducer - I can reproduce the issue.
I can't see any obvious issue in your code, but I'm not an expert in the parallel functionality, so I can't tell whether there's anything that's not expected to work when parallel=True. I'll bring this up in the next triage meeting, but note that it's not until Tuesday, January 16th.
from numba.
This issue shouldn't be closed. PR #9397 is still awaiting review.
Thank you so much @gmarkall
I think I have found the reason behind the issue and managed to make it work, though I believe there is a serious problem with the hoisting step that needs to be fixed.
By calling _groupby_apply_nb_parallel_dynamic_size.parallel_diagnostics(level=4), I was able to check that everything seemed fine with the serialization step, but something appears to be wrong with the hoisting step.
This is what it looks like:
(Copying the relevant part of the code below)
                                                                                  |
    for k in prange(ngroups):-----------------------------------------------------| #1
        start_idx = start_indices[k]                                              |
        end_idx = end_indices[k]                                                  |
        res = func([data[start_idx:end_idx, ...] for data in dataargs], *funcargs)|
        result[k, :] = res--------------------------------------------------------| #0
                                                                                  |
    return result
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found
Instruction hoisting:
loop #1:
Has the following hoisted:
closure__locals___listcomp__v6__v20build_slice_6 = global(slice: <class 'slice'>)
closure__locals___listcomp__v6__vconst22_8 = const(ellipsis, Ellipsis)
$_list_extend_gv_tuple.1 = global(tuple: <class 'tuple'>)
$const318.27 = const(NoneType, None)
$const320.28 = const(NoneType, None)
$322build_slice.29 = global(slice: <class 'slice'>)
$322build_slice.30 = call $322build_slice.29($const318.27, $const320.28, func=$322build_slice.29, args=(Var($const318.27, numba_issue.py:57), Var($const320.28, numba_issue.py:57)), kws=(), vararg=None, varkwarg=None, target=None)
msg.25 = const(str, Sizes of result, res do not match on /home/numba_issue.py (57))
assert.26 = global(assert_equiv: <intrinsic assert_equiv>)
closure__locals___listcomp__v6__v6build_list_0 = build_list(items=[])
Failed to hoist the following:
....
....
Quite a lot is being hoisted, in particular:
closure__locals___listcomp__v6__v6build_list_0 = build_list(items=[])
which looks like the empty list built for the list comprehension [data[start_idx:end_idx, ...] for data in dataargs].
If it is, that should be the issue: the list obviously cannot be hoisted out of the loop, since the slicing indices are defined by the prange variable.
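To see why hoisting that build_list would corrupt results, here is a plain-Python sketch (no Numba; the function names and toy data are made up for illustration) of what moving the comprehension's empty list out of the loop does:

```python
def run_correct(data, indices):
    # What the source code intends: a fresh list is built on every iteration.
    results = []
    for k in indices:
        buf = []              # created inside the loop
        buf.append(data[k])
        results.append(sum(buf))
    return results

def run_hoisted(data, indices):
    # What the hoisted build_list(items=[]) would amount to: the list is
    # created once before the loop, so appends accumulate across iterations.
    buf = []                  # hoisted out of the loop
    results = []
    for k in indices:
        buf.append(data[k])   # list keeps growing
        results.append(sum(buf))
    return results

print(run_correct([1, 2, 3], range(3)))  # [1, 2, 3]
print(run_hoisted([1, 2, 3], range(3)))  # [1, 3, 6]
```

The hoisted variant silently produces different values, which matches the kind of wrong results reported under parallel=True.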
I went further and tried to make the dependence even more explicit, to help the compiler. I then replaced
for k in prange(ngroups):
    start_idx = start_indices[k]
    end_idx = end_indices[k]
    res = func([data[start_idx:end_idx, ...] for data in dataargs], *funcargs)
    result[k, :] = res
with the one-liner below:
for k in prange(ngroups):
    result[k, :] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs], *funcargs)
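Both formulations are semantically identical, as a serial NumPy sketch confirms (the helper names and the toy stand-in for the jitted func are hypothetical, for illustration only), so any behavioral difference under parallel=True must come from the compiler transform rather than the Python semantics:

```python
import numpy as np

def apply_two_step(func, start_indices, end_indices, dataargs, result):
    # Original formulation: explicit start_idx/end_idx temporaries.
    for k in range(len(start_indices)):
        start_idx = start_indices[k]
        end_idx = end_indices[k]
        result[k, :] = func([data[start_idx:end_idx, ...] for data in dataargs])
    return result

def apply_one_liner(func, start_indices, end_indices, dataargs, result):
    # Rewritten formulation: indices inlined into the comprehension.
    for k in range(len(start_indices)):
        result[k, :] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs])
    return result

rng = np.random.default_rng(0)
data = rng.random((10, 2))
starts = np.array([0, 5])
ends = np.array([5, 10])

def f(chunks):
    # Toy stand-in for the jitted func: per-group column sums.
    return chunks[0].sum(axis=0)

r1 = apply_two_step(f, starts, ends, [data], np.empty((2, 2)))
r2 = apply_one_liner(f, starts, ends, [data], np.empty((2, 2)))
print(np.array_equal(r1, r2))  # True
```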
This is what the hoisting diagnosis looks like after this change:
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found
Instruction hoisting:
loop #1:
Has the following hoisted:
closure__locals___listcomp__v6__v44build_slice_10 = global(slice: <class 'slice'>)
closure__locals___listcomp__v6__vconst46_12 = const(ellipsis, Ellipsis)
$_list_extend_gv_tuple.1 = global(tuple: <class 'tuple'>)
$const282.21 = const(NoneType, None)
$const284.22 = const(NoneType, None)
$286build_slice.23 = global(slice: <class 'slice'>)
$286build_slice.24 = call $286build_slice.23($const282.21, $const284.22, func=$286build_slice.23, args=(Var($const282.21, numba_issue.py:54), Var($const284.22, numba_issue.py:54)), kws=(), vararg=None, varkwarg=None, target=None)
msg.25 = const(str, Sizes of result, $276call_function_ex.18 do not match on /home/numba_issue.py (54))
assert.26 = global(assert_equiv: <intrinsic assert_equiv>)
Failed to hoist the following:
....
....
I.e., there is a single difference: the line below no longer appears in the hoisted list.
closure__locals___listcomp__v6__v6build_list_0 = build_list(items=[])
After that change, the code works with both PARALLEL = True and PARALLEL = False.
I am now wary of using Numba's parallel optimization, as it seems it cannot always be trusted to perform these optimizations correctly. I would be happy to hear back from you on what could be causing the issue, so that I can continue using it with confidence once I understand in detail what might be going wrong. I would also be happy to take part in a PR to fix it.
Best regards,
Neto
I have found another, similar issue, so I will add it here rather than creating a separate one.
Replacing
for k in prange(ngroups):
    result[k, :] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs], *funcargs)
with
for k in prange(ngroups):
    result[k, ...] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs], *funcargs)
i.e., using Ellipsis instead of ":" results in the same issue of parallel=True giving wrong results (and not only wrong: the number of unique results equals the number of CPUs in the machine).
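For a 2-D array the two spellings select exactly the same row, as a quick NumPy check shows, so the change should be behavior-neutral:

```python
import numpy as np

a = np.zeros((3, 4))
b = np.zeros((3, 4))
row = np.arange(4.0)

a[1, :] = row    # explicit slice over the remaining axis
b[1, ...] = row  # Ellipsis expands to the same slice for a 2-D array

print(np.array_equal(a, b))  # True
```

That these equivalent spellings behave differently under parallel=True points again at the transform, not the indexing semantics.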
It is not possible to reproduce it with the above example (sorry); I was unable to create a minimal example for the Ellipsis issue, as it only happens when I run multiple tests at the same time (using pytest), and not when the function runs in isolation.
I have, however, captured the diagnostics for the function, and as you can see, the result[k, ...] store also seems to be hoisted out of the prange loop, which is wrong.
Parallel loop listing for Function _groupby_apply_nb_parallel_dynamic_size
----------------------------------------------------------------------------------------------------|loop #ID
@njit(parallel=PARALLEL_FAST_GROUPBY, cache=False)                                                  |
def _groupby_apply_nb_parallel_dynamic_size(                                                        |
    start_indices: np.array,                                                                        |
    end_indices: np.array,                                                                          |
    dataargs: numba.typed.List,                                                                     |
    func: callable,                                                                                 |
    funcargs,                                                                                       |
    return_dtype=np.float64,                                                                        |
):                                                                                                  |
    """                                                                                             |
    start_indices : 1d np.array with the start indices of each group                                |
    end_indices : 1d np.array with the end indices of each group                                    |
    dataargs: List of np 2d arrays. These are arguments of func                                     |
    func: jitted function                                                                           |
    funcargs: other arguments to func.                                                              |
    return_dtype : output array dtype                                                               |
    """                                                                                             |
                                                                                                    |
    ngroups = len(start_indices)                                                                    |
    if ngroups == 0:                                                                                |
        raise Exception("empty number of grouped, (start_indices)")                                 |
                                                                                                    |
    # First call to determine the size of the output.                                               |
    # Assumes the first group will have output size equal to the output size of all other groups    |
    out_cols_size = (                                                                               |
        func(                                                                                       |
            [data[start_indices[0] : end_indices[0], ...] for data in dataargs],                    |
            *funcargs,                                                                              |
        )                                                                                           |
    ).shape[0]                                                                                      |
    result = np.empty(shape=(ngroups, out_cols_size), dtype=return_dtype)                           |
                                                                                                    |
    for k in prange(ngroups):-----------------------------------------------------------------------| #10
        result[k, ...] = func(                                                                      |
            [data[start_indices[k] : end_indices[k], ...] for data in dataargs],                    |
            *funcargs,                                                                              |
        )                                                                                           |
                                                                                                    |
    return result                                                                                   |
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found
Instruction hoisting:
loop #10:
Has the following hoisted:
closure__locals___listcomp__v85__v44build_slice_10 = global(slice: <class 'slice'>)
closure__locals___listcomp__v85__vconst46_12 = const(ellipsis, Ellipsis)
$_list_extend_gv_tuple.1 = global(tuple: <class 'tuple'>)
$const278.21 = const(ellipsis, Ellipsis)
Failed to hoist the following:
not pure: $closure__locals___listcomp__v85_implicit0.497 = getiter(value=dataargs)
dependency: closure__locals___listcomp__v85__v20binary_subscr_6 = static_getitem(value=start__indices, index=None, index_var=$parfor__index_493.512, fn=<built-in function getitem>)
dependency: closure__locals___listcomp__v85__v34binary_subscr_9 = static_getitem(value=end__indices, index=None, index_var=$parfor__index_493.512, fn=<built-in function getitem>)
dependency: closure__locals___listcomp__v85__v44build_slice_11 = call $push_global_to_block.510(closure__locals___listcomp__v85__v20binary_subscr_6, closure__locals___listcomp__v85__v34binary_subscr_9, func=$push_global_to_block.510, args=(Var(closure__locals___listcomp__v85__v20binary_subscr_6, fastapply.py:239), Var(closure__locals___listcomp__v85__v34binary_subscr_9, fastapply.py:239)), kws=(), vararg=None, varkwarg=None, target=None)
dependency: closure__locals___listcomp__v85__v48build_tuple_13 = build_tuple(items=[Var(closure__locals___listcomp__v85__v44build_slice_11, fastapply.py:239), Var(closure__locals___listcomp__v85__vconst46_12, fastapply.py:239)])
dependency: closure__locals___listcomp__v85__v60list_append_15 = getattr(value=closure__locals___listcomp__v85__v6build_list_0, attr=append)
dependency: closure__locals___listcomp__v85__v60list_append_16 = call closure__locals___listcomp__v85__v60list_append_15(closure__locals___listcomp__v85__v50binary_subscr_14, func=closure__locals___listcomp__v85__v60list_append_15, args=(Var(closure__locals___listcomp__v85__v50binary_subscr_14, fastapply.py:239),), kws=(), vararg=None, varkwarg=None, target=None)
dependency: $264build_list.13 = build_tuple(items=[Var(closure__locals___listcomp__v85__v6build_list_0, fastapply.py:239)])
dependency: $268list_extend.15_var_funcargs = call $push_global_to_block.511(funcargs, func=$push_global_to_block.511, args=(Var(funcargs, fastapply.py:205),), kws=(), vararg=None, varkwarg=None, target=None)
dependency: $268list_extend.16 = $264build_list.13 + $268list_extend.15_var_funcargs
stored array: $272call_function_ex.18 = call func(*$268list_extend.16, func=func, args=[], kws=[], vararg=$268list_extend.16, varkwarg=None, target=None)
dependency: $280build_tuple.22 = build_tuple(items=[Var($parfor__index_493.512, <string>:3), Var($const278.21, fastapply.py:238)])
dependency: closure__locals___listcomp__v85__v10for_iter_2 = iternext(value=$closure__locals___listcomp__v85_implicit0.497)
dependency: closure__locals___listcomp__v85__v10for_iter_3 = pair_first(value=closure__locals___listcomp__v85__v10for_iter_2)
dependency: closure__locals___listcomp__v85__v10for_iter_4 = pair_second(value=closure__locals___listcomp__v85__v10for_iter_2)
--------------------------------------------------------------------------------
@DrTodd13 Do you have any thoughts on this issue?
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.