
Comments (3)

cortadocodes commented on June 28, 2024

I have a few questions and comments:

  • Are the children you're running sending their answers to a push endpoint? This would explain both why no delivery acknowledgement is being received by the child itself and why messages continue to be received from the child (by the push endpoint rather than the parent service itself). However, I'm not sure how you could be waiting for an answer on a question where you've set a push endpoint because an error should be raised if trying to wait for an answer from a push subscription.
  • As far as I can see, delivery acknowledgements are always sent. Can you see whether the child has logged sending a delivery acknowledgement message in a case when one isn't received by the parent?
  • You could be right about the delivery acknowledgement being sent before the parent starts listening. I think I may have seen this mess up answers in manual testing. My fix for this should stop this being a problem as long as at least one subsequent message from the child is received by the parent.
  • It would be very useful to see an example of a service experiencing an error and not terminating. I'm not sure how this could happen currently unless the question is being answered by more than one instance of the service (because of the known question-answering duplication that happens in Cloud Run due to Pub/Sub limitations) and one instance fails while another continues.
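
To make the third point concrete, here is a minimal sketch of the idea, not the SDK's actual implementation. It assumes the parent reads events from a local queue and that each event is a dict with a "kind" key; `wait_for_answer_sketch`, `DeliveryTimeout`, and both timeout parameters are hypothetical names chosen for illustration. Any event from the child counts as proof of delivery, so a lost acknowledgement only causes a delivery timeout if nothing else arrives either.

```python
import queue


class DeliveryTimeout(Exception):
    """Hypothetical error: nothing at all was heard from the child in time."""


def wait_for_answer_sketch(events, acknowledgement_timeout=1.0, answer_timeout=5.0):
    """Return the child's output values, treating any event as proof of delivery.

    `events` is a queue.Queue of dicts with a "kind" key (an assumed event shape).
    The timeouts apply to the wait for each individual event.
    """
    delivered = False

    while True:
        # Before anything has arrived, use the (shorter) acknowledgement timeout.
        timeout = answer_timeout if delivered else acknowledgement_timeout

        try:
            event = events.get(timeout=timeout)
        except queue.Empty:
            if delivered:
                raise TimeoutError("No result received from the child in time.")
            raise DeliveryTimeout("Nothing was received from the child.")

        # Any event (not just the acknowledgement) shows the question arrived.
        delivered = True

        if event["kind"] == "result":
            return event["output_values"]


# Usage: the acknowledgement was missed, but a later log record and the
# result still arrive, so no delivery timeout is raised.
events = queue.Queue()
events.put({"kind": "log_record"})
events.put({"kind": "result", "output_values": {"answer": 42}})
print(wait_for_answer_sketch(events))
```

If no subsequent message is received at all, this sketch still raises the delivery timeout, which matches the caveat in that bullet.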

from octue-sdk-python.

cortadocodes commented on June 28, 2024

I just went through the logs above and followed question acee1bcd-19cf-4609-8ef6-d5392150a393 through from being asked to the point of no delivery acknowledgement being received by the parent. I don't see evidence of the question running, any results being returned for it, or any other strange behaviour - just the child failing to acknowledge that one question. Am I missing something?


nvn-nil commented on June 28, 2024

> Are the children you're running sending their answers to a push endpoint? This would explain both why no delivery acknowledgement is being received by the child itself and why messages continue to be received from the child (by the push endpoint rather than the parent service itself). However, I'm not sure how you could be waiting for an answer on a question where you've set a push endpoint because an error should be raised if trying to wait for an answer from a push subscription.

The power loss service is sending results to a push endpoint. The child wake calculations use pull subscriptions, with power loss calling wait_for_answer on them. The results (any events) from power loss get sent to the main worker Cloud Run service.

> As far as I can see, delivery acknowledgements are always sent. Can you see whether the child has logged sending a delivery acknowledgement message in a case when one isn't received by the parent?

I think these are debug-level logs. I haven't seen this getting logged in the wake service. The child must be sending the acknowledgement, but the parent doesn't receive it for whatever reason.

> It would be very useful to see an example of a service experiencing an error and not terminating. I'm not sure how this could happen currently unless the question is being answered by more than one instance of the service because of the known question answering duplication that happens in Cloud Run due to Pub/Sub limitations and one instance fails while another continues

I'll try to find questions which have this. I'm not at my computer at the moment and these questions are tough to find among others because they do end up getting answers.

What happens is that the push endpoint, where power loss sends its events, receives an exception. We handle the exception and the question gets marked as failed. Meanwhile, wakes continue to run and power loss receives wake results after some time (as usual) and posts its own result to the endpoint. The endpoint receives this result and our handler marks the question as complete.

From what I've seen, when this happens the exception posted is the delivery timeout error which occurs in wait_for_answer. So the exception is raised not in the child and not in the parent's calculation, but inside the message handler in wait_for_answer.
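
To show what the endpoint sees in that sequence, here's a rough sketch. It isn't our actual handler: `handle_event`, the event dicts, and the question ID are all made up for illustration, and it only assumes each posted event has a "kind" of either "exception" or "result".

```python
def handle_event(statuses, question_id, event):
    """Append a status to a question's history based on one posted event."""
    history = statuses.setdefault(question_id, [])

    if event["kind"] == "exception":
        history.append("failed")
    elif event["kind"] == "result":
        history.append("complete")

    return history


statuses = {}

# 1. The delivery timeout raised inside wait_for_answer is posted as an exception.
handle_event(statuses, "question-1", {"kind": "exception", "message": "delivery timeout"})

# 2. The wakes keep running, and power loss later posts its own result.
handle_event(statuses, "question-1", {"kind": "result", "output_values": {}})

print(statuses["question-1"])  # ['failed', 'complete']
```

So the same question ends up with both a failed and a complete status, in that order.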

> I just went through the logs above and followed question acee1bcd-19cf-4609-8ef6-d5392150a393 through from being asked to the point of no delivery acknowledgement being received by the parent. I don't see evidence of the question running, any results being returned for it, or any other strange behaviour - just the child failing to acknowledge that one question. Am I missing something?

I think so. Power loss logging "Wake efficiencies from the wake service is {'WP1': array([0.58601053, 0.78744018, ..." would indicate that it's gotten results from all running wake services, assuming ask multiple exits when one of the child services raises any exception (or it might not be re-raising the exception, as mentioned in the previous point). Check the worker for service events. I'm out right now but will check the rest of the logs for this question and update this comment if necessary.

