Comments (3)
I have a few questions and comments:
- Are the children you're running sending their answers to a push endpoint? This would explain both why no delivery acknowledgement is being received by the child itself and why messages continue to be received from the child (by the push endpoint rather than the parent service itself). However, I'm not sure how you could be waiting for an answer on a question where you've set a push endpoint because an error should be raised if trying to wait for an answer from a push subscription.
- As far as I can see, delivery acknowledgements are always sent. Can you see whether the child has logged sending a delivery acknowledgement message in a case when one isn't received by the parent?
- You could be right about the delivery acknowledgement being sent before the parent starts listening. I think I may have seen this mess up answers in manual testing. My fix for this should stop this being a problem as long as at least one subsequent message from the child is received by the parent.
- It would be very useful to see an example of a service experiencing an error and not terminating. I'm not sure how this could happen currently unless the question is being answered by more than one instance of the service (because of the known question answering duplication that happens in Cloud Run due to Pub/Sub limitations) and one instance fails while another continues.
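
The acknowledgement race described above can be sketched as follows. This is only an illustration of the idea (treat any message from the child as proof of delivery, so a missed acknowledgement does not matter), not the SDK's actual implementation. The function name mirrors `wait_for_answer`, but the signature, the `DeliveryNotAcknowledged` exception, and the message shape (`kind`, `output_values`) are all assumptions made for the example.

```python
import queue
import time


class DeliveryNotAcknowledged(Exception):
    """Hypothetical error: nothing at all arrived from the child in time."""


def wait_for_answer(messages, acknowledgement_timeout=1.0, poll_interval=0.05):
    """Return the child's result, treating ANY message as proof of delivery.

    `messages` is a queue.Queue of dicts standing in for a pull subscription.
    """
    delivered = False
    deadline = time.monotonic() + acknowledgement_timeout

    while True:
        try:
            message = messages.get(timeout=poll_interval)
        except queue.Empty:
            # Only give up if nothing has ever been received from the child.
            if not delivered and time.monotonic() > deadline:
                raise DeliveryNotAcknowledged("No message received from the child.")
            continue

        # The explicit acknowledgement may have been sent before the parent
        # started listening, so any later message counts as acknowledgement.
        delivered = True

        if message["kind"] == "result":
            return message["output_values"]
```

In this sketch, a log record followed by a result is enough to avoid the timeout even though no acknowledgement message was ever seen. Once delivery is established the loop waits indefinitely for a result, so a real implementation would still need heartbeats or an overall timeout.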
from octue-sdk-python.
I just went through the logs above and followed question acee1bcd-19cf-4609-8ef6-d5392150a393 through from being asked to the point of no delivery acknowledgement being received by the parent. I don't see evidence of the question running, any results being returned for it, or any other strange behaviour - just the child failing to acknowledge that one question. Am I missing something?
Are the children you're running sending their answers to a push endpoint? This would explain both why no delivery acknowledgement is being received by the child itself and why messages continue to be received from the child (by the push endpoint rather than the parent service itself). However, I'm not sure how you could be waiting for an answer on a question where you've set a push endpoint because an error should be raised if trying to wait for an answer from a push subscription.
The power loss service sends its results to a push endpoint. The child wake calculations use pull subscriptions, via wait_for_answer in power loss. The results (any events) from power loss get sent to the main worker Cloud Run service.
As far as I can see, delivery acknowledgements are always sent. Can you see whether the child has logged sending a delivery acknowledgement message in a case when one isn't received by the parent?
I think these are debug-level logs. I haven't seen this being logged in the wake service. The child must be sending the acknowledgement, but the parent doesn't receive it for whatever reason.
It would be very useful to see an example of a service experiencing an error and not terminating. I'm not sure how this could happen currently unless the question is being answered by more than one instance of the service (because of the known question answering duplication that happens in Cloud Run due to Pub/Sub limitations) and one instance fails while another continues.
I'll try to find questions that show this. I'm not at my computer at the moment, and these questions are tough to find among the others because they do end up getting answers.
What happens is that the push endpoint, where power loss sends its events, receives an exception. We handle the exception and the question gets marked as failed. Meanwhile, the wakes continue to run, and power loss receives the wake results after some time (as usual) and posts its own result to the endpoint. The endpoint receives this result and our handler marks the question as complete.
From what I've seen, when this happens the exception posted is the delivery timeout error that occurs in wait_for_answer. So the exception is raised neither in the child nor in the parent's calculation, but inside the message handler in wait_for_answer.
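
To make the sequence concrete, here is a minimal sketch of how an endpoint handler could end up marking the same question as failed and then as complete. The handler, the event shape (`kind`, `message`), and the question ID are hypothetical stand-ins for our own endpoint code, not anything from the SDK.

```python
def handle_event(statuses, question_id, event):
    """Update a question's status from one event posted to the push endpoint."""
    if event["kind"] == "exception":
        statuses[question_id] = "failed"
    elif event["kind"] == "result":
        statuses[question_id] = "complete"
    return statuses[question_id]


statuses = {}
question_id = "example-question"  # hypothetical question ID

# Order of events as seen by the endpoint in the scenario described above.
events = [
    # 1. The delivery timeout raised inside wait_for_answer is posted first.
    {"kind": "exception", "message": "delivery timeout"},
    # 2. The wakes keep running, so power loss later posts its own result.
    {"kind": "result", "output_values": {}},
]

history = [handle_event(statuses, question_id, event) for event in events]
print(history)  # ['failed', 'complete']
```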
I just went through the logs above and followed question acee1bcd-19cf-4609-8ef6-d5392150a393 through from being asked to the point of no delivery acknowledgement being received by the parent. I don't see evidence of the question running, any results being returned for it, or any other strange behaviour - just the child failing to acknowledge that one question. Am I missing something?
I think so. Power loss logging Wake efficiencies from the wake service is {'WP1': array([0.58601053, 0.78744018,
would indicate that it has received results from all running wake services. This assumes that ask multiple exits when one of the child services raises any exception (or it might not be re-raising the exception, as mentioned in the previous point). Check the worker for service events. I'm out right now but will check the rest of the logs for this question and update this comment if necessary.