groupon / backbeat


A workflow service for processing asynchronous tasks across distributed systems

License: BSD 3-Clause "New" or "Revised" License

Ruby 99.52% Shell 0.08% HTML 0.40%

backbeat's Issues

Signalling without a subject...

The primary cause of #12 was that the client was signalling Activities with no subject and no parent context. As a result, the Server.create_workflow function found or created a Workflow with a nil subject (Thanks, Ruby), and from then on every subjectless signal is scheduled by that same Workflow.

I propose that if a Client signals without a subject, we should assume the signal is a one-off with no associated workflow, and instead of nil, give it a SecureRandom.uuid. The Principle of Least Surprise says that if we signal an activity without a subject, it should not surprise you by joining a list of other activities in a workflow.
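A minimal sketch of the proposed fallback, assuming a hypothetical helper named normalize_subject (this is not Backbeat's actual code, and where it would be called inside Server.create_workflow is left open):

```ruby
require "securerandom"

# Hypothetical helper: returns the subject unchanged when one is given,
# and a fresh UUID when the client signalled without a subject.
def normalize_subject(subject)
  return subject unless subject.nil?

  # Each subjectless signal gets its own one-off subject, so it can
  # never be attached to a shared nil-subject Workflow.
  SecureRandom.uuid
end
```

With this in place, two subjectless signals would resolve to two different subjects, and therefore to two different workflows.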

Extensibility/Customizability and Configuration are difficult with a clone-and-use app

I'm currently working on adding some additional instrumentation to the project and running into a bit of a problem figuring out how to extend or add instrumentation to the app. The kind of instrumentation I want to add (Zipkin, Steno, etc.) is not something I would consider globally applicable to the open-source community.
The first option is to just fork and modify the project.
While this would obviously be the most straightforward and easy way to get the changes I want, I don't see that as a sustainable way of using Backbeat and don't want to encourage that kind of behavior.
What I would actually prefer is a way of putting Backbeat into a container which I control, then using a public configuration API to add the instrumentation I need, just as Rails does.

I think this is a far better way forward with the project, forcing modularity and a clear API for modifications. Since Backbeat is already a Rack application, it is relatively easy to pack it up into a directory and then point a new Rackup file to the backbeat App.
To take this far enough to be viable, I think that we should consider repackaging Backbeat Server as a Gem in a manner similar to Rails, with a simple generator to produce the environment configuration files. Then we would be able to VCS the configuration privately and add Groupon-specific tweaks/tooling to it, while maintaining a clean and configurable project for the Open Source community.
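To illustrate the container idea, here is a rough sketch of wrapping a Rack app with private instrumentation. Everything named here is an assumption: TimingMiddleware and the reporter callable are hypothetical, and backbeat_app is a stand-in lambda, since the issue does not name the constant that Backbeat's Rack app is exposed under.

```ruby
# Hypothetical Rack middleware that times each request and hands the
# result to a reporter callable (e.g. something that forwards to Zipkin).
class TimingMiddleware
  def initialize(app, reporter)
    @app = app
    @reporter = reporter
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @reporter.call(env["PATH_INFO"], status, elapsed)
    [status, headers, body]
  end
end

# Stand-in for the real Backbeat Rack app (an assumption for this sketch).
backbeat_app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }

# Collect reports in memory; a private deployment would send them elsewhere.
reports = []
reporter = ->(path, status, seconds) { reports << [path, status, seconds] }

app = TimingMiddleware.new(backbeat_app, reporter)
response = app.call("PATH_INFO" => "/workflows")
```

In a privately versioned Rackup file, the same middleware would be registered with `use TimingMiddleware, reporter` ahead of a `run` line pointing at the packaged Backbeat app, which keeps the tweak out of the open-source tree.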

Backbeat Integration Tests Open Sourced

@keithwiersema does the team still run the backbeat integration tests with the docker setup? Would this be something you could ask to be open sourced?

Thanks!

a node can keep adding child-nodes even after it's complete

This is because there is no validation of the node's status in the "/decisions" endpoint.

Make retry scheduling calculation configurable

Our retry logic currently makes it impossible to have a task that retries at T+0, T+5, T+25, T+125. The minimum amount of time for the first retry is 4 minutes, and can be up to 60 minutes. The interval input makes almost no difference.

I've published a branch with a new backoff algorithm, but there's some concern about how it will affect existing activities.

In order to satisfy all clients, we will need to implement a backoff selection mechanism in the server. In addition to selecting the number of retries and the interval between them, clients may now also choose a specific backoff algorithm, which will be implemented as a backoff strategy in the scheduler that inserts the next attempt on an activity into Sidekiq. In order to make it possible to calculate backoff factors, we will also need to add a max_retries column to the node state, because the node's retries field is a decremented field, and there is otherwise no way of telling how far it is from the original number.

_IMPL_

Modify the Backbeat::Server class, where it creates the NodeDetail object for the node. Here, we will accept an additional parameter, backoff, which will be one of: [:legacy, :exponential, :constant].
When instantiating the NodeDetail object, automatically record the retries as max_retries.

Write a database migration to add the backoff string(32) column to the node_details table.
Write a database migration to add the max_retries int column to the node_details table.

Modify the Backbeat::Schedulers::ScheduleRetry singleton as follows:
Isolate the computation of the time: the scheduler should only update the node and return the time at which the async job should be scheduled. Delegate the backoff calculation to an instance of BackoffCalculator, which accepts a node.

Modify Backbeat::Node such that it delegates backoff to :node_detail.

Write a BackoffCalculator, which accepts a node and an optional time which defaults to Time.now. This class reads node.backoff and selects a BackoffStrategy function based on a mapping of that value to these functions. If no backoff is specified, default to :legacy.

The BackoffStrategy always accepts a retry_number and returns a time. It uses the time that may be injected into BackoffCalculator.

Implement the three strategies:

:legacy => Same as is currently present in the ScheduleRetry block.
:exponential => Calculates retry as a product of three factors: the exponential factor 2^r, where r is the number of the retry; a stampede-reduction factor randomly chosen between 0.8 and 1.2; and the retry_interval.
:constant => Calculates retry as a simple addition, adding the retry_interval to the current time.
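The calculator and strategies described above could look roughly like the sketch below. Several things are assumptions made for illustration: the Node struct stands in for Backbeat::Node, retry_interval is treated as seconds, the retry number is taken as max_retries minus the remaining retries, and the injectable rng exists only to make the jitter testable. The :legacy strategy is left unimplemented because its current logic is not reproduced in this issue.

```ruby
# Stand-in for Backbeat::Node with only the fields this sketch needs.
Node = Struct.new(:backoff, :retry_interval, :max_retries, :retries_remaining,
                  keyword_init: true)

class BackoffCalculator
  # Each strategy takes (retry_number, interval, now, rng) and returns a Time.
  STRATEGIES = {
    # 2^r * jitter in [0.8, 1.2] * retry_interval, added to the current time.
    exponential: ->(r, interval, now, rng) { now + (2**r) * rng.rand(0.8..1.2) * interval },
    # Simple addition of the retry_interval to the current time.
    constant: ->(_r, interval, now, _rng) { now + interval },
    # Placeholder: the existing ScheduleRetry logic would be moved here.
    legacy: ->(*) { raise NotImplementedError, "legacy backoff is not part of this sketch" }
  }.freeze

  def initialize(node, now = Time.now, rng: Random.new)
    @node = node
    @now = now
    @rng = rng
  end

  # Returns the time at which the next attempt should be scheduled.
  def next_attempt_at
    strategy = STRATEGIES.fetch((@node.backoff || :legacy).to_sym)
    retry_number = @node.max_retries - @node.retries_remaining
    strategy.call(retry_number, @node.retry_interval, @now, @rng)
  end
end
```

Keeping the strategies as plain callables in a frozen hash means an unknown backoff value fails loudly with a KeyError instead of silently falling back to another algorithm.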

_AC_
Unit tests should be written for the changes.
Integration tests should be updated to use a variety of the backoff strategies.
Backbeat integration tests pass

Integration branch: master

BACKBEAT_SERVER_VERSION=exponential-backoff bundle exec rake docker:test

Can Backbeat server sit directly on top of a Rails app?

This service looks pretty slick! Great job!

If I have a Rails app and the app needs a workflow, is there a way to directly include Backbeat server as part of that Rails app? Or do I have to run Backbeat server as its own service?

Single Client can monopolize server with poorly-defined job subject

We had an issue today where a mistake caused blocking jobs to be queued onto the same workflow. When the size of this workflow grew, it caused a resource contention issue on the table, and on the JVM as GC began to intensify. In the end, the entire server was monopolized by repeated calls to ScheduleNextNode for the same workflow.

While we cannot prevent clients from behaving badly and creating massive backlogs of workflows, we can reduce the potential for one job to monopolize the worker pool.

A co-worker found: https://github.com/mhenrixon/sidekiq-unique-jobs. What do you think?

How do clients initiate a new workflow without supporting the legacy_type /decision endpoint?

I think I'm misunderstanding the API.

It seems to me that a client must implement the /decision endpoint for the first activity in a workflow, because the /signal behavior of Server seen here defaults the legacy_type to decision, which causes the client to attempt to contact the decision callback.

What's the expected interaction pattern? Here, I'll post a big GIST with my test in it.

My issue is that in the integration test, the client fails to call the Java app: after I use the signal API, the node has legacy_type: decision, so the client tries to call the decision endpoint instead.
