Comments (2)
It also looks like the matching execution got retried properly because none of the tuples are stuck in status doing and all of them were executed properly.
from substra-backend.
@jmorel I have got the same issue here
[2020-01-16 12:11:40,724: INFO/MainProcess] Received task: substrapp.tasks.tasks.prepare_tuple[4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745]
[2020-01-16 12:11:40,788: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:11:40,788: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:11:40,788: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:11:40,888: INFO/ForkPoolWorker-1] create peer delivery stream
[2020-01-16 12:11:40,905: INFO/ForkPoolWorker-1] create peer delivery stream
[2020-01-16 12:12:25,878: ERROR/ForkPoolWorker-1] exception calling callback for <Future at 0x7f1ce298fd30 state=finished raised _MultiThreadedRendezvous>
Traceback (most recent call last):
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.6/dist-packages/aiogrpc/utils.py", line 126, in _next
return next(self._iterator)
File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 416, in __next__
return self._next()
File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 703, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.CANCELLED
details = "Locally cancelled by application!"
debug_error_string = "None"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 324, in _invoke_callbacks
callback(self)
File "/usr/lib/python3.6/asyncio/futures.py", line 417, in _call_set_state
dest_loop.call_soon_threadsafe(_set_state, destination, source)
File "/usr/lib/python3.6/asyncio/base_events.py", line 637, in call_soon_threadsafe
self._check_closed()
File "/usr/lib/python3.6/asyncio/base_events.py", line 377, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
[2020-01-16 12:12:30,910: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:12:30,910: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:12:30,910: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:12:30,931: ERROR/ForkPoolWorker-1] compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing (trie 1)
[2020-01-16 12:12:40,951: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:12:40,952: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:12:40,952: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:12:40,969: ERROR/ForkPoolWorker-1] compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing (trie 2)
[2020-01-16 12:12:55,983: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:12:55,983: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:12:55,983: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:12:56,057: ERROR/ForkPoolWorker-1] compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing (trie 3)
[2020-01-16 12:13:16,068: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:13:16,068: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:13:16,068: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:13:16,089: ERROR/ForkPoolWorker-1] compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing (trie 4)
[2020-01-16 12:13:28,150: INFO/MainProcess] missed heartbeat from [email protected]
[2020-01-16 12:13:41,285: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:13:41,286: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:13:41,286: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:13:41,386: ERROR/ForkPoolWorker-1] compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing (trie 5)
[2020-01-16 12:13:41,496: ERROR/ForkPoolWorker-1] Task substrapp.tasks.tasks.prepare_tuple[4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745] raised unexpected: composite
Traintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing
Traceback (most recent call last):
[2020-01-16 12:13:28,150: INFO/MainProcess] missed heartbeat from [email protected]
[2020-01-16 12:13:41,285: INFO/ForkPoolWorker-1] DISCOVERY: adding channel peers query
[2020-01-16 12:13:41,286: INFO/ForkPoolWorker-1] DISCOVERY: adding config query
[2020-01-16 12:13:41,286: INFO/ForkPoolWorker-1] DISCOVERY: adding chaincodes/collection query
[2020-01-16 12:13:41,386: ERROR/ForkPoolWorker-1] compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing (trie 5)
[2020-01-16 12:13:41,496: ERROR/ForkPoolWorker-1] Task substrapp.tasks.tasks.prepare_tuple[4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745] raised unexpected: composite
Traintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/hfc/fabric/client.py", line 1711, in chaincode_invoke
timeout=wait_for_event_timeout)
File "/usr/lib/python3.6/asyncio/tasks.py", line 362, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/substrapp/ledger_utils.py", line 176, in call_ledger
response = loop.run_until_complete(chaincode_calls[call_type](**params))
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/local/lib/python3.6/dist-packages/hfc/fabric/client.py", line 1721, in chaincode_invoke
raise TimeoutError('waitForEvent timed out.')
TimeoutError: waitForEvent timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/substrapp/ledger_utils.py", line 340, in _update_tuple_status
update_ledger(fcn=invoke_fcn, args=invoke_args, sync=True)
File "/usr/src/app/substrapp/ledger_utils.py", line 116, in _wrapper
return fn(*args, **kwargs)
File "/usr/src/app/substrapp/ledger_utils.py", line 249, in update_ledger
return _invoke_ledger(*args, **kwargs)
File "/usr/src/app/substrapp/ledger_utils.py", line 228, in _invoke_ledger
response = call_ledger('invoke', fcn=fcn, args=args, kwargs=params)
File "/usr/src/app/substrapp/ledger_utils.py", line 178, in call_ledger
raise LedgerTimeout(str(e))
substrapp.ledger_utils.LedgerTimeout: waitForEvent timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/substrapp/tasks/tasks.py", line 449, in prepare_tuple
log_start_tuple(tuple_type, subtuple['key'])
File "/usr/src/app/substrapp/ledger_utils.py", line 346, in log_start_tuple
_update_tuple_status(tuple_type, tuple_key, 'doing')
File "/usr/src/app/substrapp/ledger_utils.py", line 342, in _update_tuple_status
_wait_until_status_after_timeout(tuple_type, tuple_key, status)
File "/usr/src/app/substrapp/ledger_utils.py", line 298, in _wait_until_status_after_timeout
f'{tuple_type} {tuple_key} wrong status {status}: expecting {expected_status}')
substrapp.ledger_utils.LedgerTimeoutNotHandled: compositeTraintuple 4537dc402640c0a5d347702318d810412e48ae37f21fdf00e9b71ca3bd772745 wrong status todo: expecting doing
[2020-01-16 12:17:40,035: INFO/MainProcess] missed heartbeat from [email protected]
anyway, I think we need be able to create a new smart contract to handle the retry on a tuple.
from substra-backend.
Related Issues (20)
- Algo fetch: HTTP errors aren't displayed in the logs HOT 4
- Local setup (skaffold): Incorrect DNS assumptions
- Cannot launch using docker-compose HOT 1
- Error 500 when linking dataset with datasamples
- Error 400 when adding a dataset HOT 3
- single-node compute plan: worker tries to delete missing image HOT 2
- Chart not compatible with K8S >=1.16 HOT 1
- Document password requirements on the Helm chart values
- 502 when under load HOT 5
- [Edge case] Crash when composite traintuple head and trunk models are identifcal HOT 1
- Evicted training tasks pods are never deleted
- Serializers for the CompositeTraintuple are not consistent (single vs compute plan)
- Can't get a stable deployment using line 0.7.1 of the compatibility table HOT 2
- authenticate_worker in Process has a strange behavior (not systematic)
- Ease the process to add extra volumes for local data registration HOT 2
- Add an auto-cleanup of old docker-registry images
- Is this project dead? HOT 1
- installation of backend with helm does not seem to work with microk8s HOT 1
- which backend url do I use to instantiate a Client? HOT 1
- BUG: container kaniko exit 1 Susbtra-backend crash HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from substra-backend.