
Comments (8)

jmorel commented on June 24, 2024

I ran into the issue while creating assets on the demo setup. I tried to reproduce it locally but couldn't. I'll run my test case against the demo as soon as it's available again and fetch the traceback.

from substra-backend.

GuillaumeCisco commented on June 24, 2024

I've just written a document about this error.
It is available here.

Please review it and share your feedback.

samlesu commented on June 24, 2024

Thanks @jmorel, do you have the associated traceback from the backend?

Kelvin-M commented on June 24, 2024

@jmorel, did you manage to reproduce it? It would help to have the backend traceback so we can find a fix for this issue.
I suspect we will need to change the way we initialize the Fabric client.

jmorel commented on June 24, 2024

I got the error again while running a very large compute plan (about 1,400 tuples) on the demo environment. Here is the stack trace from org4-worker:

[2020-01-08 08:04:52,559: ERROR/ForkPoolWorker-1] <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1578470692.558659550","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3934,"referenced_errors":[{"created":"@1578470692.558654186","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}"
>
Traceback (most recent call last):
  File "/usr/src/app/substrapp/ledger_utils.py", line 167, in call_ledger
    response = loop.run_until_complete(chaincode_calls[call_type](**params))
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/dist-packages/hfc/fabric/client.py", line 1640, in chaincode_invoke
    res = await asyncio.gather(*responses)
  File "/usr/local/lib/python3.6/dist-packages/aiogrpc/channel.py", line 40, in __call__
    return await fut
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1578470692.558659550","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3934,"referenced_errors":[{"created":"@1578470692.558654186","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/substrapp/ledger_utils.py", line 180, in call_ledger
    response = [r for r in e.args[0] if r.response.status != 200][0].response.message
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/substrapp/tasks/tasks.py", line 472, in on_success
    log_success_tuple(tuple_type, subtuple['key'], retval['result'])
  File "/usr/src/app/substrapp/ledger_utils.py", line 371, in log_success_tuple
    _update_tuple_status(tuple_type, tuple_key, 'done', extra_kwargs=extra_kwargs)
  File "/usr/src/app/substrapp/ledger_utils.py", line 324, in _update_tuple_status
    update_ledger(fcn=invoke_fcn, args=invoke_args, sync=True)
  File "/usr/src/app/substrapp/ledger_utils.py", line 107, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/src/app/substrapp/ledger_utils.py", line 233, in update_ledger
    return _invoke_ledger(*args, **kwargs)
  File "/usr/src/app/substrapp/ledger_utils.py", line 212, in _invoke_ledger
    response = call_ledger('invoke', fcn=fcn, args=args, kwargs=params)
  File "/usr/src/app/substrapp/ledger_utils.py", line 182, in call_ledger
    raise LedgerError(str(e))
substrapp.ledger_utils.LedgerError: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1578470692.558659550","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3934,"referenced_errors":[{"created":"@1578470692.558654186","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}"
>

Kelvin-M commented on June 24, 2024

Thank you @jmorel.
We currently do not retry on this kind of error: https://github.com/SubstraFoundation/substra-backend/blob/master/backend/substrapp/ledger_utils.py#L89-L117
We could add this error to the list of errors we retry on.
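
A minimal sketch of what "retry on this kind of error" could look like, assuming we only treat the UNAVAILABLE status from the traceback as transient. The names `retry_on`, `is_retryable`, `RETRYABLE_STATUSES` and `FakeRpcError` are hypothetical and not part of substra-backend. gRPC call errors such as `grpc._channel._Rendezvous` expose their status through a `code()` method returning a `grpc.StatusCode`; to keep the sketch runnable without grpc installed, the stand-in exception returns a plain string instead, so in real code the check would compare against `grpc.StatusCode.UNAVAILABLE`.

```python
import functools
import time


class FakeRpcError(Exception):
    """Hypothetical stand-in for a gRPC call error (e.g. _Rendezvous).

    Real gRPC errors return a grpc.StatusCode from code(); a plain string
    is used here so the sketch runs without grpc installed.
    """

    def __init__(self, status):
        super().__init__(status)
        self.status = status

    def code(self):
        return self.status


# Assumption: only UNAVAILABLE ("failed to connect to all addresses") is transient.
RETRYABLE_STATUSES = {"UNAVAILABLE"}


def is_retryable(exc):
    """Return True if exc exposes a code() whose value is a transient status."""
    code = getattr(exc, "code", None)
    return callable(code) and code() in RETRYABLE_STATUSES


def retry_on(predicate, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Hypothetical decorator: re-call fn while predicate(exc) holds.

    Non-matching errors, and the error from the last attempt, are re-raised.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if not predicate(exc) or attempt == max_attempts:
                        raise
                    sleep(delay)
        return wrapper
    return decorator


# Usage: a call that fails twice with UNAVAILABLE, then succeeds.
calls = []


@retry_on(is_retryable, max_attempts=3, sleep=lambda seconds: None)
def flaky_invoke():
    calls.append(1)
    if len(calls) < 3:
        raise FakeRpcError("UNAVAILABLE")
    return "ok"


print(flaky_invoke())  # ok (succeeds on the third call)
```

Keeping the retryable set explicit means other statuses still fail fast instead of being silently retried.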

samlesu commented on June 24, 2024

Thanks @jmorel, this is clearer, and it looks like a new error.

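As the traceback shows, the backend also fails to parse this error correctly (the `IndexError: tuple index out of range` raised at `ledger_utils.py` line 180, because `e.args` is empty for this gRPC error). That should be fixed.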
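
One way to make the parsing at line 180 defensive is sketched below. It is an assumption-based sketch, not the backend's actual fix: `extract_error_message` and `NoArgsError` are hypothetical names. The helper assumes, as the original line does, that some exceptions carry a list of proposal responses in `e.args[0]`, each with `.response.status` and `.response.message`, and it falls back to `str(e)` when `args` is empty (as with the `_Rendezvous` above) or has another shape.

```python
from types import SimpleNamespace


def extract_error_message(exc):
    """Best-effort error message from a failed ledger call (hypothetical).

    Looks for the first non-200 proposal response in exc.args[0]; if args
    is empty or not a list of responses, falls back to str(exc).
    """
    responses = exc.args[0] if exc.args else None
    if isinstance(responses, (list, tuple)):
        for r in responses:
            response = getattr(r, "response", None)
            if response is not None and getattr(response, "status", 200) != 200:
                return response.message
    return str(exc)


class NoArgsError(Exception):
    """Stand-in for an error with empty .args, like the gRPC error above."""

    def __str__(self):
        return "failed to connect to all addresses"


# Usage with fake proposal responses built from SimpleNamespace.
ok = SimpleNamespace(response=SimpleNamespace(status=200, message=""))
bad = SimpleNamespace(response=SimpleNamespace(status=500, message="endorsement failed"))
print(extract_error_message(Exception([ok, bad])))  # endorsement failed
print(extract_error_message(NoArgsError()))         # failed to connect to all addresses
```

With this shape, an unexpected error still produces a readable `LedgerError` message instead of a secondary `IndexError` that hides the original cause.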

Before adding a retry, I think it would be worth understanding the cause of this error. Retrying may not be the only solution, nor the best long-term one.

Kelvin-M commented on June 24, 2024

Where are we on this one?
Since we have a ledger retry strategy, it should cover short connection interruptions between two nodes. Longer interruptions may point to a bigger problem that should not be handled directly in the backend, no?
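
To make "short" versus "longer" interruptions concrete, here is a small sketch of how a bounded retry budget decides which outages are absorbed. It assumes capped exponential backoff; the parameters (`base=1.0`, `factor=2.0`, `max_delay=30.0`, five attempts) are illustrative, not the values used in `ledger_utils.py`, and both function names are hypothetical. As a simplification it ignores the time spent inside the calls themselves and assumes the outage starts at the first attempt.

```python
def backoff_delays(max_attempts, base=1.0, factor=2.0, max_delay=30.0):
    """Seconds slept between attempts with capped exponential backoff.

    There are max_attempts - 1 waits, since nothing is slept after the
    last attempt.
    """
    return [min(base * factor ** i, max_delay) for i in range(max_attempts - 1)]


def outage_covered(outage_seconds, delays):
    """True if the total wait outlasts an outage starting at the first attempt.

    Simplification: call durations are ignored.
    """
    return sum(delays) >= outage_seconds


# Usage: with 5 attempts the retries wait 1 + 2 + 4 + 8 = 15 seconds in total.
delays = backoff_delays(5)
print(delays, sum(delays))          # [1.0, 2.0, 4.0, 8.0] 15.0
print(outage_covered(10, delays))   # True: a 10 s blip is ridden out
print(outage_covered(120, delays))  # False: a 2 min outage surfaces as an error
```

Any outage longer than the budget still ends in a `LedgerError`, which is the case the comment suggests handling outside the backend.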
