
floe2's People

Contributors

kumbhare


Forkers

lazycrazyowl

floe2's Issues

Channel grouping and message dispersion feature -- how messages are distributed when multiple pellets subscribe to the same output stream.

Currently, messages are duplicated to each subscribing pellet (an AND-split), e.g. for task parallelism.

This feature adds the ability to selectively send messages to desired downstream pellets (a multi-select split).

We will use named streams to implement this feature.

That is, a pellet can generate more than one named stream. The emitter interface will take an additional parameter that refers to the stream name. The stream names must be static, predefined by the pellet, and cannot be changed at runtime.

A downstream neighbour can then subscribe not only to the pellet, but also to a particular named stream of that pellet.

By default, all pellets generate a stream named "DEFAULT_STREAM", and a subscription made without any parameters subscribes to the stream of the same name.
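The named-stream idea above can be sketched as follows. This is a minimal conceptual model, not the actual Floe API: the class and method names (`Emitter`, `emit`, `subscribe`) are illustrative assumptions.

```python
# Conceptual sketch of named output streams (NOT the actual Floe API):
# a pellet declares its stream names up front, emits to one of them,
# and downstream pellets subscribe to (pellet, stream) pairs.

DEFAULT_STREAM = "DEFAULT_STREAM"

class Emitter:
    def __init__(self, stream_names=(DEFAULT_STREAM,)):
        # Stream names are static: declared once, never changed at runtime.
        self.streams = {name: [] for name in stream_names}

    def emit(self, message, stream=DEFAULT_STREAM):
        if stream not in self.streams:
            raise ValueError(f"undeclared stream: {stream}")
        for subscriber in self.streams[stream]:
            subscriber(message)

    def subscribe(self, callback, stream=DEFAULT_STREAM):
        # Subscribing without a stream name uses DEFAULT_STREAM.
        self.streams[stream].append(callback)
```

A subscriber that names a stream receives only messages emitted on that stream; emitting on an undeclared stream fails fast, matching the "static, predefined" requirement.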

Enable ZooKeeper (ZK) notifications for application start/stop.

Currently, once the resource mapping is obtained from the resource manager, it is stored in ZK where individual containers monitor the ZK location and deploy any flakes assigned to them as required.

However, there is no way for the coordinator (and in turn, the user) to know whether all flakes have been deployed successfully.
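One way to close that feedback loop can be sketched as below, with ZK replaced by an in-memory store and purely hypothetical names (`DeploymentBarrier`, `mark_deployed`): each container marks its flakes as deployed, and the coordinator is notified once all expected flakes have reported. In the real system the markers would be ZK nodes that the coordinator watches.

```python
# Hedged sketch of the missing feedback path (ZooKeeper simulated with an
# in-memory set): containers report per-flake deployment, and the
# coordinator callback fires once every expected flake has reported.

class DeploymentBarrier:
    def __init__(self, expected_flakes, on_complete):
        self.expected = set(expected_flakes)
        self.deployed = set()
        self.on_complete = on_complete  # coordinator notification

    def mark_deployed(self, flake_id):
        # In the real system this would be a ZK node creation that the
        # coordinator watches; here it is a direct call.
        self.deployed.add(flake_id)
        if self.deployed >= self.expected:
            self.on_complete()
```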

scale down stopped working.

Scale down does not work.
The resource mapping is correctly updated and the signal is received by the container, but the scale-down action does not decrement the pellet instance count.

Move the messaging and subscription logic from PelletExecutor to Flake.

Currently:

The predecessor pellet instance creates and sends a message meant directly for the succeeding pellet's instance. This is nice because the notion of a "flake" is completely hidden from the preceding flake's dispersion strategy; it works directly with the instances.

But on the flip side, even though the flake is the unit of communication, multiple subscriptions and backchannel messages are sent per flake, since each flake creates multiple pellet instances.

This also has implications on fault tolerance (e.g. in case of reducer pellets). (Will add more details later).

Runtime scale-down feature and command-line API.

Add the feature to perform scale down at runtime (i.e. gracefully terminate PE instances).
- Add a command-line API to do so.
- Use the updated data structure from scale up to hold the resource mapping 'delta'.
- Update Container/AppsAssignmentMonitor to look for changes in the app assignment and respond to the scale-down request.

Note: It is important to cleanly close data connections to ensure no messages are lost during scale down, and any messages received but pending in the queue should be processed before terminating the instance.

scalable streaming map reduce.

Multiple features need to be added as part of this.

NOTE: This issue is only for scalable MR (not elastic MR), i.e. the number of reducers is decided at deployment time.

  1. channel grouping
  2. channel dispersion interface and strategies
  3. special pellet types for mapper and reducers
  4. appbuilder apis
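Item 1 above, channel grouping with a reducer count fixed at deployment, can be sketched as key-hash routing (a common approach, not necessarily Floe's exact strategy):

```python
import zlib

# Sketch of channel (key) grouping for scalable streaming MR: the number
# of reducers is fixed at deployment, and every message with the same key
# is routed to the same reducer channel.

def reducer_for(key, num_reducers):
    # Stable hash (zlib.crc32) so routing is deterministic across runs,
    # unlike Python's randomized built-in hash() for strings.
    return zlib.crc32(key.encode()) % num_reducers
```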

Running multiple pellet instances per flake does not work correctly.

This regression was caused by an effort to optimize away the middle layer of communication between the message emitter and the multiple backends (one per edge in the application).

Fix: bring the middle layer back! (Needs more refactoring to keep the back channel and other pieces intact.)

signallable pellets.

Ability to send signals to running pellets.

  1. Note: signals should be delivered on the same thread as data messages. This lets the user respond to a signal without worrying about threading issues.
  2. A side effect is that while a signal is being processed, data messages will queue up in the ZMQ queues.
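The single-thread delivery described above can be sketched with signals and data sharing one inbox, so the user's signal handler needs no locking (the loop and names are illustrative assumptions):

```python
import queue

# Sketch of signallable pellets: signals and data share one queue, so the
# signal handler runs on the same thread as data processing and needs no
# locking; data naturally queues while a signal is being handled.

def pellet_loop(inbox, on_data, on_signal, max_msgs):
    for _ in range(max_msgs):
        kind, payload = inbox.get()
        if kind == "signal":
            on_signal(payload)   # data keeps queuing meanwhile
        else:
            on_data(payload)
```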

Scale up after all flakes have been terminated results in a blocked flake.

The flake first connects to the preceding flakes and then creates the pellet instance. But given ZMQ's PUSH policy that the send method blocks until at least one client connects, this leads to an issue, since both the backend send and the control-signal receive are on the same thread. Because the backend send blocks, the control signal to increment the pellet count is never received, and a deadlock occurs.
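One way out of this deadlock can be sketched as below (an assumed structure, not Floe's code, with ZMQ sockets simulated by bounded queues): instead of a blocking send that starves the control channel, use a bounded, timed send and service control messages between attempts.

```python
import queue

# Sketch of deadlock avoidance for the flake's shared thread: control
# messages are serviced between bounded send attempts, so an "increment
# pellet" signal is never starved by a send that cannot complete.

def flake_loop(out_channel, control, pending, steps):
    sent = 0
    for _ in range(steps):
        # Service control first, before attempting another send.
        try:
            signal = control.get_nowait()
            signal()
        except queue.Empty:
            pass
        if pending:
            try:
                out_channel.put(pending[0], timeout=0.01)
                pending.pop(0)
                sent += 1
            except queue.Full:
                continue  # no subscriber yet; retry after checking control
    return sent
```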

Dynamic Tasks (with a static list of alternate implementations).

Add support for dynamic pellets, with the AppBuilderAPI to easily create such dynamic tasks.

Add runtime support for dynamically switching the active alternate for a pellet.

NOTE: This issue covers only dynamic alternate switching, without synchronization/consistency guarantees. A separate issue will be opened for those.
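A dynamic task with a static list of alternates can be sketched as follows; the class and method names are illustrative, not the AppBuilderAPI:

```python
# Sketch of a dynamic pellet with a static list of alternates: the set of
# implementations is fixed at build time, but the active one can be
# switched at runtime (without any consistency guarantees, per the note).

class DynamicPellet:
    def __init__(self, alternates, active):
        self._alternates = dict(alternates)  # static: fixed at build time
        self.switch_to(active)

    def switch_to(self, name):
        if name not in self._alternates:
            raise KeyError(f"unknown alternate: {name}")
        self._active = self._alternates[name]

    def process(self, msg):
        return self._active(msg)
```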

Runtime scale-up feature and command-line API.

Add the feature to perform scale up/out at runtime depending on available resources.
- Add a command-line API to do so.
- Update the data structure to hold the resource mapping 'delta'.
- Update Container/AppsAssignmentMonitor to look for changes in the app assignment and respond to the scale-up request.

Note: Active load balancing is not part of this and will be handled separately.

This is decoupled from the scale-down feature, since scaling down needs additional work to cleanly close data connections and ensure no messages are lost.
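The resource-mapping 'delta' mentioned above can be sketched as a map of per-pellet instance-count changes that a container applies to its current assignment (the structure is an assumption for illustration):

```python
# Sketch of applying a resource-mapping "delta": only changed pellet
# instance counts are recorded, and the container merges the delta into
# its current assignment (names and structure are illustrative).

def apply_delta(assignment, delta):
    updated = dict(assignment)
    for pellet, change in delta.items():
        updated[pellet] = updated.get(pellet, 0) + change
        if updated[pellet] <= 0:
            del updated[pellet]  # pellet fully scaled down on this container
    return updated
```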

AndSplit (i.e. multiple-subscriber) feature does not work.

When multiple subscribers subscribe to the same stream, the stream should be duplicated across them.
Currently, the stream is instead split (round-robin) across them.

Expected:

  1. Duplicate the stream across multiple subscriber pellets (task parallelism).
  2. Split the stream (round-robin, or load-balanced) across multiple instances of the same subscriber pellet (data parallelism).
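The expected behavior above can be sketched in one dispersion step (names are illustrative, not Floe's dispersion-strategy interface): duplicate across distinct pellets, round-robin within each pellet's instances.

```python
import itertools

# Sketch of the expected AndSplit dispersion: every distinct subscriber
# pellet gets a copy of the message (task parallelism), but within each
# pellet the copies round-robin across its instances (data parallelism).

def disperse(message, subscribers, counters):
    """subscribers: {pellet_name: [instance, ...]};
    counters: {pellet_name: itertools.count()} holding round-robin state."""
    targets = []
    for pellet, instances in subscribers.items():
        idx = next(counters[pellet]) % len(instances)  # split within pellet
        targets.append(instances[idx])                 # duplicate across pellets
    return targets
```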

Performing a scale-up command in the container after a scale down (which terminates the flake) does not work.

The issue is with heartbeats.
(done) Fix for previously terminated flakes: the last heartbeat sent by the flake marks it as terminated, which makes the container clean up the flake, so a subsequent scale up initializes a new flake.

Fix for failed flakes (where the heartbeat may stop without the terminate flag): reduce the flake-to-container heartbeat interval, and periodically clean up flakes whose heartbeat is not received in time. (Will be filed and fixed as a separate issue, as part of the fault-tolerance feature.)
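The proposed periodic cleanup can be sketched as a reaper over per-flake heartbeat timestamps (function and field names are illustrative assumptions):

```python
# Sketch of the proposed fix for failed flakes: the container records the
# last heartbeat time per flake and periodically reaps any flake whose
# heartbeat is older than a timeout, so a later scale up starts fresh.

def reap_stale_flakes(last_heartbeat, now, timeout):
    stale = [f for f, t in last_heartbeat.items() if now - t > timeout]
    for flake in stale:
        del last_heartbeat[flake]  # cleanup enables a fresh flake on scale up
    return stale
```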
