backbeat's People

Contributors

bauerjon, i-tier-bot, jenniferalons, keithwiersema, kevbuchanan, larrylv, minostro, nchainani, sydneycodes

backbeat's Issues

Signalling without a subject...

The primary cause of #12 was that the client was signalling Activities with no subject and no parent context. This resulted in the Server.create_workflow function finding/creating a Workflow with a nil subject (Thanks, Ruby), and thereafter every subjectless signal was scheduled by that same Workflow.

I propose that if a Client signals without a subject, we should assume the signal is a one-off with no associated workflow, and instead of nil, give it a SecureRandom.uuid. The Principle of Least Surprise says that if we signal an activity without a subject, it should not surprise you by joining a list of other activities in a workflow.
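A minimal sketch of the proposed fallback, in plain Ruby. The helper name `normalize_subject` is hypothetical and is not part of Backbeat; only `SecureRandom.uuid` comes from the proposal above.

```ruby
require "securerandom"

# Hypothetical helper (not Backbeat's actual API): give subjectless
# signals a unique subject so they never share one nil-subject workflow.
def normalize_subject(subject)
  subject.nil? ? SecureRandom.uuid : subject
end

one_off_a = normalize_subject(nil)
one_off_b = normalize_subject(nil)
explicit  = normalize_subject("order-42")
```

With this in place, two subjectless signals get different subjects and so would land in different workflows, while an explicit subject is passed through unchanged.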

Extensibility/Customizability and Configuration are difficult with a clone-and-use app

I'm currently working on adding some additional instrumentation to the project and running into a bit of a problem figuring out how to extend or add instrumentation to the app. The kind of instrumentation I want to add (Zipkin, Steno, etc.) is not something I would consider globally applicable to the open-source community.
The first option is to just fork and modify the project.
While this would obviously be the most straightforward and easy way to get the changes I want, I don't see that as a sustainable way of using Backbeat and don't want to encourage that kind of behavior.
What I would actually prefer is a way of putting Backbeat into a container which I control, then using a public configuration API to add the instrumentation I need, just as Rails does.

I think this is a far better way forward with the project, forcing modularity and a clear API for modifications. Since Backbeat is already a Rack application, it is relatively easy to pack it up into a directory and then point a new Rackup file to the backbeat App.
To take this far enough to be viable, I think that we should consider repackaging Backbeat Server as a Gem in a manner similar to Rails, with a simple generator to produce the environment configuration files. Then we would be able to VCS the configuration privately and add Groupon-specific tweaks/tooling to it, while maintaining a clean and configurable project for the Open Source community.
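As a rough illustration of the "container I control" idea, here is a sketch of wrapping a Rack app in private instrumentation middleware. `FakeBackbeatApp`, `TimingMiddleware`, and the reporter lambda are all hypothetical stand-ins; the real Backbeat app constant is not named in this issue.

```ruby
# Stand-in for the Backbeat Rack app (hypothetical; any object that
# responds to #call(env) and returns [status, headers, body] works).
FakeBackbeatApp = lambda do |_env|
  [200, { "content-type" => "text/plain" }, ["ok"]]
end

# Hypothetical private middleware: the place where tracing or metrics
# hooks could live without modifying the wrapped app.
class TimingMiddleware
  def initialize(app, reporter)
    @app = app
    @reporter = reporter
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @reporter.call(env["PATH_INFO"], status, elapsed)
    [status, headers, body]
  end
end

events = []
reporter = ->(path, status, seconds) { events << [path, status, seconds] }
app = TimingMiddleware.new(FakeBackbeatApp, reporter)
response = app.call("PATH_INFO" => "/health")
```

In a private Rackup file the same wiring would be expressed as `use TimingMiddleware, reporter` followed by `run` with the packaged app, so the instrumentation stays in the privately versioned configuration.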

Make retry scheduling calculation configurable

Our retry logic currently makes it impossible to have a task that retries at T+0, T+5, T+25, T+125. The minimum amount of time for the first retry is 4 minutes, and can be up to 60 minutes. The interval input makes almost no difference.

I've published a branch with a new backoff algorithm, but there's some concern about how it will affect existing activities.

In order to satisfy all clients, we will need to implement a backoff selection mechanism in the server. In addition to selecting the number of retries and the interval between them, clients may now also choose a specific backoff algorithm, which will be implemented as a backoff strategy in the scheduler that inserts the next attempt on an activity into Sidekiq. In order to make it possible to calculate backoff factors, we will also need to add a max_retries column to the node state, because the node's retry count is a decrementing field, and there is otherwise no way of telling how far it is from the original number.

_IMPL_

Modify the Backbeat::Server class, where it creates the NodeDetail object for the node. Here, we will accept an additional parameter, backoff, which will be one of: [:legacy, :exponential, :constant].
When instantiating the NodeDetail object, automatically record the retries as max_retries.

Write a database migration to add the backoff string(32) column to the node_details table.
Write a database migration to add the max_retries int column to the node_details table.

Modify the Backbeat::Schedulers::ScheduleRetry singleton as follows:
Isolate the computation of the time: only update the node and return the time that should be used to schedule the async job. Delegate the backoff calculation to an instance of BackoffCalculator, which accepts a node.

Modify Backbeat::Node such that it delegates backoff to :node_detail.

Write a BackoffCalculator, which accepts a node and an optional time which defaults to Time.now. This class reads the node.backoff and selects a BackoffStrategy function based on a mapping of that value to these functions. If no backoff is specified, default to :legacy.

The BackoffStrategy always accepts a retry_number and returns a time. It uses the time that may be injected into BackoffCalculator.

Implement the three strategies:

  • :legacy => Same as is currently present in the ScheduleRetry block.
  • :exponential => Calculates retry as an exponential function with three products: the exponential product 2^r, where r is the number of the retry; a stampede-reduction factor randomly chosen between 0.8 and 1.2; and finally the retry_interval.
  • :constant => Calculates retry as a simple addition, adding the retry_interval to the current time.
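
The calculator and strategies described above might look roughly like the sketch below. Several details are assumptions not stated in this issue: the `Node` Struct and its `retries_remaining` field are hypothetical stand-ins, retry_interval is treated as seconds, the retry number is derived as max_retries minus the remaining retries, and an injectable random generator is added for testability. The :legacy formula is not reproduced here, so that strategy is left as a placeholder.

```ruby
# Hypothetical stand-in for a node; the real Backbeat::Node differs.
Node = Struct.new(:backoff, :retries_remaining, :max_retries, :retry_interval)

class BackoffCalculator
  STRATEGIES = {
    # now + 2^r * jitter * retry_interval, with jitter in 0.8..1.2
    exponential: lambda do |now, interval, retry_number, rng|
      now + (2**retry_number) * rng.rand(0.8..1.2) * interval
    end,
    # now + retry_interval
    constant: ->(now, interval, _retry_number, _rng) { now + interval },
    # Placeholder: the existing ScheduleRetry formula is not shown in this issue.
    legacy: ->(*) { raise NotImplementedError, "legacy formula not sketched here" }
  }.freeze

  def initialize(node, now = Time.now, rng = Random.new)
    @node = node
    @now = now
    @rng = rng
  end

  # Assumption: retries count down, so distance from max_retries
  # gives the number of the current retry.
  def retry_number
    @node.max_retries - @node.retries_remaining
  end

  def next_attempt_at
    strategy = STRATEGIES.fetch((@node.backoff || :legacy).to_sym)
    strategy.call(@now, @node.retry_interval, retry_number, @rng)
  end
end

now = Time.at(1_000)
constant_at = BackoffCalculator.new(Node.new("constant", 3, 4, 5), now).next_attempt_at
exponential_at = BackoffCalculator.new(Node.new("exponential", 2, 4, 5), now).next_attempt_at
```

Storing max_retries alongside the decrementing count is what makes `retry_number` computable; without it the exponential strategy has no exponent to work with.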

_AC_
Unit tests should be written for the changes.
Integration tests should be updated to use a variety of the backoff strategies.
Backbeat integration tests pass

Integration branch: master

BACKBEAT_SERVER_VERSION=exponential-backoff bundle exec rake docker:test

Can Backbeat server sit directly on top of a Rails app?

This service looks pretty slick! Great job!

If I have a Rails app and the app needs a workflow, is there a way to directly include Backbeat server as part of that Rails app? Or do I have to run Backbeat server as its own service?

Single Client can monopolize server with poorly-defined job subject

We had an issue today where a mistake caused blocking jobs to be queued onto the same workflow. When the size of this workflow grew, it caused a resource contention issue on the table, and on the JVM as GC began to intensify. In the end, the entire server was monopolized by repeated calls to ScheduleNextNode for the same workflow.

While we cannot prevent clients from behaving badly and creating massive backlogs of workflows, we can reduce the potential for one job to monopolize the worker pool.

A co-worker found: https://github.com/mhenrixon/sidekiq-unique-jobs. What do you think?
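
To make the uniqueness idea concrete, here is a plain-Ruby sketch of allowing at most one pending scheduling job per workflow. It does not use the linked gem's API; `UniqueEnqueuer` is hypothetical, and a real deployment would need a store shared across worker processes rather than an in-memory Set.

```ruby
require "set"

# Hypothetical in-memory guard: at most one pending job per workflow id.
class UniqueEnqueuer
  def initialize
    @pending = Set.new
    @mutex = Mutex.new
  end

  # Yields and returns true when the job is accepted; returns false
  # when a job for the same workflow is already pending.
  def enqueue(workflow_id)
    accepted = @mutex.synchronize { !@pending.add?(workflow_id).nil? }
    yield workflow_id if accepted
    accepted
  end

  # Call when the job finishes so the workflow can be scheduled again.
  def done(workflow_id)
    @mutex.synchronize { @pending.delete(workflow_id) }
  end
end

queue = []
enqueuer = UniqueEnqueuer.new
first  = enqueuer.enqueue("wf-1") { |id| queue << id }
second = enqueuer.enqueue("wf-1") { |id| queue << id }
enqueuer.done("wf-1")
third  = enqueuer.enqueue("wf-1") { |id| queue << id }
```

The duplicate request is dropped while the first is pending, so repeated ScheduleNextNode calls for one workflow would occupy at most one worker slot at a time.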

How do clients initiate a new workflow without supporting the legacy_type /decision endpoint?

I think I'm misunderstanding the API.

It seems to me that a client must implement the /decision endpoint for the first activity in a workflow, because the /signal behavior of Server seen here defaults the legacy_type to decision, which causes the client to attempt to contact the decision callback.

What's the expected interaction pattern? Here, I'll post a big Gist with my test in it.

My issue is that in the integration test, the client fails to call the Java app: after the signal API is used, the node has legacy_type: decision, so the call goes to the decision endpoint instead.
