
Comments (6)

gmarkall avatar gmarkall commented on June 2, 2024 1

Many thanks for the bug report and reproducer - I can reproduce the issue.

I can't see any obvious issue in your code, but I'm not an expert in the parallel functionality, so I can't tell whether there's anything that isn't expected to work when parallel=True. I'll bring this up at the next triage meeting, but note that it isn't until Tuesday January 16th.

netomenoci avatar netomenoci commented on June 2, 2024 1

This issue shouldn't be closed. PR #9397 is still waiting for review.

netomenoci avatar netomenoci commented on June 2, 2024

Thank you so much @gmarkall

I think I have found the reason behind the issue and managed to make it work, even though I think there is a serious problem with the hoisting step that needs to be fixed.

By calling _groupby_apply_nb_parallel_dynamic_size.parallel_diagnostics(level=4), I was able to check that everything seemed fine with the serialization step, but something seems wrong with the hoisting step.

This is what it looks like (copying the relevant part of the code below):


                                                                                                         | 
    for k in prange(ngroups):----------------------------------------------------------------------------| #1
        start_idx = start_indices[k]                                                                     | 
        end_idx = end_indices[k]                                                                         | 
        res = func([data[start_idx:end_idx, ...] for data in dataargs], *funcargs)                       | 
        result[k, :] = res-------------------------------------------------------------------------------| #0
                                                                                                         | 
    return result       

 ---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found

Instruction hoisting:
loop #1:
  Has the following hoisted:
    closure__locals___listcomp__v6__v20build_slice_6 = global(slice: <class 'slice'>)
    closure__locals___listcomp__v6__vconst22_8 = const(ellipsis, Ellipsis)
    $_list_extend_gv_tuple.1 = global(tuple: <class 'tuple'>)
    $const318.27 = const(NoneType, None)
    $const320.28 = const(NoneType, None)
    $322build_slice.29 = global(slice: <class 'slice'>)
    $322build_slice.30 = call $322build_slice.29($const318.27, $const320.28, func=$322build_slice.29, args=(Var($const318.27, numba_issue.py:57), Var($const320.28, numba_issue.py:57)), kws=(), vararg=None, varkwarg=None, target=None)
    msg.25 = const(str, Sizes of result, res do not match on /home/numba_issue.py (57))
    assert.26 = global(assert_equiv: <intrinsic assert_equiv>)
    closure__locals___listcomp__v6__v6build_list_0 = build_list(items=[])
  Failed to hoist the following:
....
....

It seems that a lot of things are being hoisted, in particular:

closure__locals___listcomp__v6__v6build_list_0 = build_list(items=[])

which looks like the empty list backing the comprehension [data[start_idx:end_idx, ...] for data in dataargs].

If it is, that should be the issue: it obviously cannot be hoisted outside of the loop, since the slice indices are defined by the prange variable.
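The failure mode described above can be illustrated in plain Python. This is a hypothetical simulation of what hoisting the `build_list` would do, not Numba's actual transform: if the comprehension's backing list is allocated once outside the loop instead of fresh on every iteration, appended slices accumulate and every iteration sees stale data.

```python
# Hypothetical illustration of why hoisting `build_list(items=[])` out of the
# loop is unsafe: the comprehension's backing list must be fresh per iteration.

def correct(groups):
    results = []
    for start, end in groups:
        sliced = []              # build_list inside the loop: fresh each iteration
        for data in ([10, 20, 30, 40],):
            sliced.append(data[start:end])
        results.append(sliced)
    return results

def hoisted(groups):
    results = []
    sliced = []                  # build_list hoisted out of the loop (the bug)
    for start, end in groups:
        for data in ([10, 20, 30, 40],):
            sliced.append(data[start:end])
        results.append(sliced)   # every entry aliases the same growing list
    return results

groups = [(0, 2), (2, 4)]
print(correct(groups))   # [[[10, 20]], [[30, 40]]]
print(hoisted(groups))   # both entries alias one list holding all the slices
```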

I went further and tried to make the dependence even more explicit to help the compiler. I then replaced

    for k in prange(ngroups):
        start_idx = start_indices[k]
        end_idx = end_indices[k]
        res = func([data[start_idx:end_idx, ...] for data in dataargs], *funcargs)
        result[k, :] = res
        

with the one-liner below:

    for k in prange(ngroups):
        result[k, :] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs], *funcargs)

This is what the hoisting diagnosis looks like after this change:


---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found

Instruction hoisting:
loop #1:
  Has the following hoisted:
    closure__locals___listcomp__v6__v44build_slice_10 = global(slice: <class 'slice'>)
    closure__locals___listcomp__v6__vconst46_12 = const(ellipsis, Ellipsis)
    $_list_extend_gv_tuple.1 = global(tuple: <class 'tuple'>)
    $const282.21 = const(NoneType, None)
    $const284.22 = const(NoneType, None)
    $286build_slice.23 = global(slice: <class 'slice'>)
    $286build_slice.24 = call $286build_slice.23($const282.21, $const284.22, func=$286build_slice.23, args=(Var($const282.21, numba_issue.py:54), Var($const284.22, numba_issue.py:54)), kws=(), vararg=None, varkwarg=None, target=None)
    msg.25 = const(str, Sizes of result, $276call_function_ex.18 do not match on /home/numba_issue.py (54))
    assert.26 = global(assert_equiv: <intrinsic assert_equiv>)
  Failed to hoist the following:
  ....
  ....

I.e., there is a single difference: the line below disappeared from the hoisting.

closure__locals___listcomp__v6__v6build_list_0 = build_list(items=[])

After that change, the code works with both PARALLEL = True and PARALLEL = False.
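For reference, the two formulations are semantically identical in plain NumPy, so any difference under parallel=True points at the compiler transform, not the code. A minimal sketch (with a hypothetical stand-in `func` that takes the per-group mean over rows, since the real user function is not shown here):

```python
import numpy as np

def func(slices):
    # Hypothetical stand-in for the user function: mean over rows, per column.
    return slices[0].mean(axis=0)

data = np.arange(12, dtype=np.float64).reshape(6, 2)
start_indices = np.array([0, 2, 4])
end_indices = np.array([2, 4, 6])
ngroups = len(start_indices)

# Version 1: explicit start_idx/end_idx locals (the original code).
result_a = np.empty((ngroups, 2))
for k in range(ngroups):
    start_idx = start_indices[k]
    end_idx = end_indices[k]
    result_a[k, :] = func([data[start_idx:end_idx, ...]])

# Version 2: indices inlined into the slice (the workaround one-liner).
result_b = np.empty((ngroups, 2))
for k in range(ngroups):
    result_b[k, :] = func([data[start_indices[k]:end_indices[k], ...]])

assert np.array_equal(result_a, result_b)  # identical under sequential semantics
```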

I am now wary of using Numba's parallel optimization, as it seems the optimizations cannot always be trusted. I would be happy to hear back from you on what the cause of the issue could be, so I can feel safer continuing to use it after understanding in detail what might be going wrong. I would also be happy to be part of the PR to solve the issue.

Best regards,
Neto

netomenoci avatar netomenoci commented on June 2, 2024

I have found another similar issue, so I will just add it here rather than creating a separate new one.

Replacing

    for k in prange(ngroups):
        result[k, :] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs], *funcargs)

with

    for k in prange(ngroups):
        result[k, ...] = func([data[start_indices[k]:end_indices[k], ...] for data in dataargs], *funcargs)

i.e., using Ellipsis instead of ":", results in the same issue of parallel giving wrong results (and not only that: the number of unique results equals the number of CPUs on the machine).
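For a 2-D result array, `result[k, :]` and `result[k, ...]` are equivalent assignment targets in NumPy (Ellipsis simply expands to all remaining dimensions), which a quick check confirms, so the divergence again points at how the Ellipsis form is handled by the parallel transform:

```python
import numpy as np

a = np.zeros((3, 4))
b = np.zeros((3, 4))
row = np.arange(4, dtype=np.float64)

for k in range(3):
    a[k, :] = row    # explicit slice over the second axis
    b[k, ...] = row  # Ellipsis: expands to the remaining dimensions

assert np.array_equal(a, b)  # identical for a 2-D array
```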

It is not possible to reproduce it with the above example (sorry). I was unable to create a minimal example for the Ellipsis issue, as it only happens when I run multiple tests at the same time (using pytest); it doesn't happen when running the function in isolation.

I have captured the diagnostics on the function though, and as you can see, the result[k, ...] assignment also seems to be hoisted out of the prange loop, which is wrong.


Parallel loop listing for  Function _groupby_apply_nb_parallel_dynamic_size
----------------------------------------------------------------------------------------------------|loop #ID
@njit(parallel=PARALLEL_FAST_GROUPBY, cache=False)                                                  | 
def _groupby_apply_nb_parallel_dynamic_size(                                                                   | 
    start_indices: np.array,                                                                        | 
    end_indices: np.array,                                                                          | 
    dataargs: numba.typed.List,                                                                     | 
    func: callable,                                                                                 | 
    funcargs,                                                                                       | 
    return_dtype=np.float64,                                                                        | 
):                                                                                                  | 
    """                                                                                             | 
    start_indices : 1d np.array with the start indices of each group                                | 
    end_indices : 1d np.array with the end indices of each group                                    | 
    dataargs: List of np 2d arrays. These are arguments of func                                     | 
    func: jitted function                                                                           | 
    funcargs: other arguments to func.                                                              | 
    return_dtype : output array dtype                                                               | 
    """                                                                                             | 
                                                                                                    | 
    ngroups = len(start_indices)                                                                    | 
    if ngroups == 0:                                                                                | 
        raise Exception("empty number of grouped, (start_indices)")                                 | 
                                                                                                    | 
    # First call to determinize the size of the output.                                             | 
    # Assumes the first group will have output size equal to the output size of all other groups    | 
    out_cols_size = (                                                                               | 
        func(                                                                                       | 
            [data[start_indices[0] : end_indices[0], ...] for data in dataargs],                    | 
            *funcargs,                                                                              | 
        )                                                                                           | 
    ).shape[0]                                                                                      | 
    result = np.empty(shape=(ngroups, out_cols_size), dtype=return_dtype)                           | 
                                                                                                    | 
    for k in prange(ngroups):-----------------------------------------------------------------------| #10
        result[k, ...] = func(                                                                      | 
            [data[start_indices[k] : end_indices[k], ...] for data in dataargs],                    | 
            *funcargs,                                                                              | 
        )                                                                                           | 
                                                                                                    | 
    return result                                                                                   | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found

Instruction hoisting:
loop #10:
  Has the following hoisted:
    closure__locals___listcomp__v85__v44build_slice_10 = global(slice: <class 'slice'>)
    closure__locals___listcomp__v85__vconst46_12 = const(ellipsis, Ellipsis)
    $_list_extend_gv_tuple.1 = global(tuple: <class 'tuple'>)
    $const278.21 = const(ellipsis, Ellipsis)
  Failed to hoist the following:
    not pure: $closure__locals___listcomp__v85_implicit0.497 = getiter(value=dataargs)
    dependency: closure__locals___listcomp__v85__v20binary_subscr_6 = static_getitem(value=start__indices, index=None, index_var=$parfor__index_493.512, fn=<built-in function getitem>)
    dependency: closure__locals___listcomp__v85__v34binary_subscr_9 = static_getitem(value=end__indices, index=None, index_var=$parfor__index_493.512, fn=<built-in function getitem>)
    dependency: closure__locals___listcomp__v85__v44build_slice_11 = call $push_global_to_block.510(closure__locals___listcomp__v85__v20binary_subscr_6, closure__locals___listcomp__v85__v34binary_subscr_9, func=$push_global_to_block.510, args=(Var(closure__locals___listcomp__v85__v20binary_subscr_6, fastapply.py:239), Var(closure__locals___listcomp__v85__v34binary_subscr_9, fastapply.py:239)), kws=(), vararg=None, varkwarg=None, target=None)
    dependency: closure__locals___listcomp__v85__v48build_tuple_13 = build_tuple(items=[Var(closure__locals___listcomp__v85__v44build_slice_11, fastapply.py:239), Var(closure__locals___listcomp__v85__vconst46_12, fastapply.py:239)])
    dependency: closure__locals___listcomp__v85__v60list_append_15 = getattr(value=closure__locals___listcomp__v85__v6build_list_0, attr=append)
    dependency: closure__locals___listcomp__v85__v60list_append_16 = call closure__locals___listcomp__v85__v60list_append_15(closure__locals___listcomp__v85__v50binary_subscr_14, func=closure__locals___listcomp__v85__v60list_append_15, args=(Var(closure__locals___listcomp__v85__v50binary_subscr_14, fastapply.py:239),), kws=(), vararg=None, varkwarg=None, target=None)
    dependency: $264build_list.13 = build_tuple(items=[Var(closure__locals___listcomp__v85__v6build_list_0, fastapply.py:239)])
    dependency: $268list_extend.15_var_funcargs = call $push_global_to_block.511(funcargs, func=$push_global_to_block.511, args=(Var(funcargs, fastapply.py:205),), kws=(), vararg=None, varkwarg=None, target=None)
    dependency: $268list_extend.16 = $264build_list.13 + $268list_extend.15_var_funcargs
    stored array: $272call_function_ex.18 = call func(*$268list_extend.16, func=func, args=[], kws=[], vararg=$268list_extend.16, varkwarg=None, target=None)
    dependency: $280build_tuple.22 = build_tuple(items=[Var($parfor__index_493.512, <string>:3), Var($const278.21, fastapply.py:238)])
    dependency: closure__locals___listcomp__v85__v10for_iter_2 = iternext(value=$closure__locals___listcomp__v85_implicit0.497)
    dependency: closure__locals___listcomp__v85__v10for_iter_3 = pair_first(value=closure__locals___listcomp__v85__v10for_iter_2)
    dependency: closure__locals___listcomp__v85__v10for_iter_4 = pair_second(value=closure__locals___listcomp__v85__v10for_iter_2)
--------------------------------------------------------------------------------

gmarkall avatar gmarkall commented on June 2, 2024

@DrTodd13 Do you have any thoughts on this issue?

github-actions avatar github-actions commented on June 2, 2024

This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
