
elfstore's Introduction

ElfStore: Edge-Local Federated Store

Sumit K Monga, Sheshadri K R, Yogesh Simmhan.

To appear in the Proceedings of the IEEE International Conference on Web Services (ICWS), 2019

ElfStore is a first-of-its-kind edge-local federated store for streams of data blocks. It uses reliable fog devices as a super-peer overlay to monitor the edge resources, locates data within 2 hops via federated metadata indexing using Bloom filters, and maintains approximate global statistics about the reliability and storage capacity of edges. Edges host the actual data blocks, and we use a unique differential replication scheme to select the edges on which to replicate blocks, to guarantee a minimum reliability and to balance storage utilization.

Command Line Interface (CLI), credits: Ishan Sharma

Link to the paper: https://arxiv.org/pdf/1905.08932.pdf

Instructions to set up:

  1. Run the Maven build at the level of the 'src' directory to generate the executable jar: mvn clean compile assembly:single
  2. For setup and installation of ElfStore, please refer to the README file, which contains step-by-step instructions to test ElfStore using the Command Line Interface (CLI) on your local machine.

Note: This software is a research prototype made available on a best-effort basis, without any guarantees 🙂 If you have any questions or comments, you can send the respective author a note.

Copyright 2019 DREAM:Lab, Indian Institute of Science, Bangalore

Licensed under the Apache License, Version 2.0 (the "License"); you may not use the files in this project except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

elfstore's People

Contributors

ishansharma07 · sheshadri-1992 · simmhan · skmonga

Forkers

nifey tmnnt

elfstore's Issues

Handling of edge failures wherein an edge device may miss heartbeats due to network link failures and comes back

In the current implementation, an edge device is removed from all usage for read() and put() by its managing Fog if it misses a certain number of heartbeats. However, these heartbeat misses may be due to network link failures, in which case the edge device may come back once the network failure heals. What should the proper behaviour of the Fog be, in terms of allowing or discarding this edge device once it comes back?
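A minimal sketch of one way to handle this, assuming a grace window between suspending an edge and permanently evicting it; all names (EdgeTracker, MISS_LIMIT, GRACE_MISSES) are illustrative, not part of ElfStore's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker: suspend an edge after a few missed heartbeats, but
// only evict it (triggering re-replication) after a longer grace window, so
// a short network outage does not cause permanent removal.
public class EdgeTracker {
    private static final int MISS_LIMIT = 3;    // misses before the edge is suspended
    private static final int GRACE_MISSES = 10; // misses before the edge is evicted

    private final Map<Integer, Integer> misses = new ConcurrentHashMap<>();

    public void onHeartbeat(int edgeId) {
        misses.put(edgeId, 0); // edge is alive again; reinstate it if suspended
    }

    /** Called for every heartbeat interval in which an edge fails to report. */
    public void onMiss(int edgeId) {
        int m = misses.merge(edgeId, 1, Integer::sum);
        if (m == MISS_LIMIT) {
            suspend(edgeId);   // stop routing read()/put() to this edge
        } else if (m == GRACE_MISSES) {
            evict(edgeId);     // give up: drop from index, recover its blocks
        }
    }

    private void suspend(int edgeId) { /* remove from allocation candidates */ }
    private void evict(int edgeId)   { /* drop from index, trigger recovery */ }
}
```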

Support for edge rejoining with previous blocks and their metadata

When an edge dies and is removed from the fog/its index, and it rejoins later with replicas, should we include its blocks (after MD5 verification) among the available replicas? May this cause excess reliability? Handle it using #13? What if it rejoins quickly while re-replication is still ongoing?
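The MD5 verification mentioned above could look like the following sketch, using only standard library calls; the class and method names are illustrative:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

// Verify a rejoining edge's block against the checksum kept in the stream
// metadata before re-admitting it as a replica.
public class BlockVerifier {
    public static boolean matchesChecksum(Path blockFile, String expectedMd5Hex) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(blockFile)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md5.update(buf, 0, n); // stream the block through the digest
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString().equals(expectedMd5Hex);
    }
}
```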

Block recovery in a given setup with incremental edge failures works fine with few failures but all blocks are not recovered for later edge failures

It has been observed with both the D20 and D272 setups that, as edge devices fail incrementally, the complete recovery of blocks succeeds only for the first few failures (2 in both setups); for subsequent failures, not all blocks are recovered properly. Ideally the system should be able to handle failures up to the point where there is still enough storage available to replicate the lost blocks (link this issue with replica identification to make sure the storage assumption holds true).

Print replica allocation state diagram.

As part of verifying the correctness of the replica identification algorithm, print the following, which enables easy verification:

  • A state diagram depicting the choices made for replica allocation. (local allocation + remote allocation)
  • Print the Global Edge allocation table.
  • Min Replica, Max Replica and Expected Reliability constraints (all of these are stream-level constraints).
  • The Reliability choices of the Edge and Fog which achieved the Expected Reliability.

Enable support for changing edge reliability with time

Currently we assume edge reliability to be a fixed value. However, that is rarely the case, as the reliability of an edge device changes with usage over time. We need to accommodate the changing reliability of the edge device in a future implementation.
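One possible way to track a time-varying reliability, sketched here as an exponentially weighted moving average over heartbeat outcomes; the smoothing factor and names are assumptions, not part of the current implementation:

```java
// Illustrative EWMA tracker: edge reliability drifts toward recent behaviour.
public class EdgeReliability {
    private static final double ALPHA = 0.05; // weight of the newest observation (assumed)
    private double reliability;

    public EdgeReliability(double initialReliability) {
        this.reliability = initialReliability;
    }

    /** Update with 1.0 for a delivered heartbeat, 0.0 for a missed one. */
    public void observe(double outcome) {
        reliability = ALPHA * outcome + (1 - ALPHA) * reliability;
    }

    public double current() {
        return reliability;
    }
}
```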

Add support for updating the block data

This new feature allows updating a block's data using FindBlock with the force flag set to true (details at #39). All replicas are fetched and their content is updated. The block should also maintain a block version number as an internal property.

Handle metadata updates when edge dies

How do we update the bloom filter and metadata index when an edge dies?
Maintain one bloom filter per edge and construct the parent fog's bloom filter as the OR of all of these.
Loss of an edge then requires only a local bloom filter recomputation.
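A minimal sketch of that composition, using java.util.BitSet as a stand-in for whatever bloom filter representation ElfStore actually uses:

```java
import java.util.BitSet;
import java.util.Collection;

// The parent fog's filter is the bitwise OR (union) of its edges' filters.
public class FogBloomIndex {
    public static BitSet combine(Collection<BitSet> edgeFilters) {
        BitSet parent = new BitSet();
        for (BitSet edgeFilter : edgeFilters) {
            parent.or(edgeFilter); // union of the edge-level filters
        }
        return parent;
    }
}
```

When an edge dies, the fog drops that edge's filter and recomputes the OR over the surviving ones: purely local work, with no network traffic and no need to re-hash the surviving blocks.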

Recovery of blocks doesn't work to completion in presence of simultaneous edge failures

With simultaneous edge failures (2, to be precise) on a D20 setup, the blocks of neither of the two failed edge devices were recovered successfully.

One issue that may be causing the recovery failure is the lack of an implementation of java.util.concurrent.RejectedExecutionHandler, which handles tasks that cannot be executed by a ThreadPoolExecutor when there are no threads or queue slots left to hold them. In such a case, an implementation of this interface could wait to add the lost block to the recovery queue until it succeeds.
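A sketch of such a handler; the interface and getQueue() are standard java.util.concurrent, while whether this matches ElfStore's recovery executor is an assumption:

```java
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

// Instead of silently dropping a recovery task when the pool is saturated,
// block until the task can be enqueued, so no lost block is ever discarded.
public class BlockingRecoveryHandler implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        try {
            // put() waits for space in the work queue rather than failing.
            // A real implementation should also check executor.isShutdown()
            // first, since put() would block forever on a terminated pool.
            executor.getQueue().put(task);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while queueing recovery task", e);
        }
    }
}
```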

Add retry mechanism to handle write failures of edge

The current implementation finds the replica locations, which are then written to by the client. The order we took is: first write the local replica, then write the remote ones. However, if there are fog failures or edge failures, the write may not succeed, and a retry is needed to achieve the expected reliability. To meet the desiderata in terms of the number of replicas and the expected reliability, one order that can be taken is to write the remote replicas first and the local replica last.
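The reordering could look like the following client-side sketch; ReplicaLocation and writeTo(...) are hypothetical placeholders for the actual Thrift write path:

```java
import java.util.List;

// Write remote replicas first, then the local one, retrying each failed
// location a bounded number of times.
public class ReplicaWriter {
    private static final int MAX_RETRIES = 3; // assumed bound

    public void writeAll(byte[] block, ReplicaLocation local, List<ReplicaLocation> remotes) {
        for (ReplicaLocation remote : remotes) {
            writeWithRetry(block, remote);
        }
        writeWithRetry(block, local); // local replica is written last
    }

    private void writeWithRetry(byte[] block, ReplicaLocation loc) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            if (writeTo(block, loc)) {
                return;
            }
            // A later revision could instead ask the fog for an alternative
            // location (see the blacklisted-fogs issue below).
        }
    }

    // Placeholder for the actual Thrift call; returns whether the write succeeded.
    private boolean writeTo(byte[] block, ReplicaLocation loc) { return true; }
}

interface ReplicaLocation {}
```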

Include a simple commandline shell

Have a (Python-based?) shell to perform add edge, create stream, append file, put block (with and without lock), update block, get file, get block, list stream, list block, etc.

Support checkpointing of Fog's state in case a fog restart is required

Currently a Fog maintains all its state in memory. As a result, if a Fog is restarted, all of this state is lost, which might require all its edge devices to resend their data and metadata information, along with the neighbours and buddies doing their part. Instead, provide a facility wherein the Fog can checkpoint its state and be restarted with all the metadata preserved.

StreamMetadata object should also contain the mapping of blockId and its MD5 checksum as a dynamic property

Currently the StreamMetadata object doesn't contain the mapping from a blockId to its MD5 checksum. This mapping is only maintained at the fog where the stream was registered. As a result, when a client or a fog uses getStreamMetadata(), the returned object doesn't contain the mapping. The mapping should not be returned to the client directly, but it is useful for the fog devices (recovery, when a fog device dies, is one scenario where it helps). So this mapping should be added to the StreamMetadata object as a dynamic property.

Stress test verification for storage utilization

Past stress test runs show that even though storage is available on the edge devices, it is not used fully, because the Fog devices were unable to return heartbeat responses to their edges, to their buddies, and to the fogs for which they are a neighbour. This issue was seen when we were using TNonblockingServer, which uses a single network I/O thread that also handles the requests themselves. Since the current implementation uses TThreadedSelectorServer, which allows separate thread pools for network I/O and for request handling, stress test reruns are required to validate the correct behaviour.
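For reference, a minimal sketch of that server setup with Apache Thrift's Java API; the thread counts are illustrative, not ElfStore's actual configuration:

```java
import org.apache.thrift.TProcessor;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadedSelectorServer;
import org.apache.thrift.transport.TNonblockingServerSocket;
import org.apache.thrift.transport.TTransportException;

// TThreadedSelectorServer separates network I/O (selector threads) from
// request handling (worker threads), so a slow handler no longer starves
// heartbeat responses the way TNonblockingServer's single I/O thread could.
public class FogServerBootstrap {
    public static TServer build(TProcessor processor, int port) throws TTransportException {
        TNonblockingServerSocket transport = new TNonblockingServerSocket(port);
        TThreadedSelectorServer.Args args = new TThreadedSelectorServer.Args(transport)
                .processor(processor)
                .selectorThreads(2)   // threads doing network I/O (assumed count)
                .workerThreads(16);   // threads executing request handlers (assumed count)
        return new TThreadedSelectorServer(args);
    }
}
```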

Efficient use of bloom filter to optimise for the number of network calls during GetBlock

For GetBlock, as part of FindBlock, in the current implementation a fog receiving the request checks its local data structures to see whether it contains the block locally and, if so, adds itself to the list of candidate fogs. It then looks into the bloom filters of its neighbours, and when a filter indicates a possible containment of the block, it makes an API call to the neighbour to check whether the neighbour actually contains it; such neighbours are also added to the list. A similar procedure is followed for the buddies, with one change: each buddy also contacts its own neighbours to find suitable replica fogs. However, this leads to a very high number of network calls, which can be reduced by using a positive bloom filter response alone as an indicator of possible containment of the block.

This can be done using only the contacted fog's local data and the bloom filters of its neighbours. If the bloom filter of a neighbour indicates that the neighbour may contain the block, that neighbour is returned as a candidate fog to the client. The client then uses this list and tries to read the block from those fogs. If the block is not found among them, the block is not present on the fog or its neighbours, and the buddies should be contacted to get the data. Similar bloom filter handling can be done for the buddies as well. A force flag can also be passed: when unset, the efficient version described above is used, while a set flag uses the current implementation to make sure proper replicas are returned as part of FindBlock.
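A minimal sketch of the efficient path, building the candidate list purely from local state; mayContain(...) is a single-hash stand-in for the real k-hash bloom filter membership test, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Build FindBlock candidates from the fog's own index plus cached neighbour
// bloom filters, with zero per-neighbour network calls. False positives are
// acceptable: the client simply fails to read from that fog and moves on.
public class CandidateFinder {
    public List<Integer> findCandidates(long blockId, int selfFogId,
                                        boolean hasBlockLocally,
                                        Map<Integer, BitSet> neighbourFilters) {
        List<Integer> candidates = new ArrayList<>();
        if (hasBlockLocally) {
            candidates.add(selfFogId);
        }
        for (Map.Entry<Integer, BitSet> e : neighbourFilters.entrySet()) {
            if (mayContain(e.getValue(), blockId)) {
                candidates.add(e.getKey()); // possible false positive; client verifies on read
            }
        }
        return candidates;
    }

    // Simplified single-hash membership test; a real bloom filter checks k positions.
    private boolean mayContain(BitSet filter, long blockId) {
        return filter.get((int) Math.floorMod(blockId, (long) filter.size()));
    }
}
```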

StreamMetadata should have a mechanism to allow sequential access to the blocks

The current implementation has a bug: the metadata maintains a last blockId along with only a mapping from a blockId to its MD5 checksum. Thus there is no way blocks can be consumed in the order in which they were added to the stream, so no sequential access pattern for the blocks is available. As a correction, a list of blocks can be maintained. Two cases need to be looked into carefully:

  1. PutBlock with locking
  2. PutBlock without locking

For case 1, only one writer can append blocks at any time, so there is no race condition. In case 2, however, multiple writers may be writing to the stream at the same time, and their appends can interleave. It is not necessary for the last blockId to strictly increase, and this interleaving is perfectly fine. So the race condition should be handled properly, i.e. the last blockId, the list of blocks, and the mapping of blockId to checksum should be updated atomically as a whole.
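A sketch of that atomic update under a single lock; the field and class names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The last blockId, the ordered block list, and the checksum map change
// together, so concurrent non-locking writers can interleave appends
// without tearing the metadata.
public class StreamBlockIndex {
    private final Object lock = new Object();
    private long lastBlockId = -1;
    private final List<Long> blockOrder = new ArrayList<>();    // enables sequential access
    private final Map<Long, String> blockMd5 = new HashMap<>(); // integrity checks

    public void appendBlock(long blockId, String md5Hex) {
        synchronized (lock) {
            lastBlockId = blockId; // may move non-monotonically under interleaving
            blockOrder.add(blockId);
            blockMd5.put(blockId, md5Hex);
        }
    }

    public List<Long> blocksInArrivalOrder() {
        synchronized (lock) {
            return new ArrayList<>(blockOrder); // defensive copy for readers
        }
    }
}
```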

Maintain blacklisted fogs during writes to ensure same fogs are not requested in case of write failures

It might happen that replica identification returns a set of fogs, but some of those fogs are unable to serve the write request for one of several reasons (edge failures, edge storage below the disk watermark, etc.). In such a case, during the retry phase, the replica identification API must accept a list of blacklisted fogs that will not be picked for writing again, so that only the remaining fogs are searched.
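A hypothetical shape for that API extension; the interface and parameter names are illustrative only:

```java
import java.util.List;
import java.util.Set;

public interface ReplicaIdentifier {
    /**
     * Identify fogs to host replicas of a block.
     *
     * @param blacklistedFogs fogs that failed a previous write attempt and
     *                        must not be returned again during this retry
     */
    List<Integer> identifyReplicas(long blockId, int replicaCount, Set<Integer> blacklistedFogs);
}
```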

Support searching blocks of data using metadata based query

Currently the system supports searching for data blocks via the read() API, which takes the blockId as the query parameter. This issue extends the search functionality to use the metadata parameters of the data blocks in addition to the blockId.

Proper exception handling for server and client side

Proper exception handling is needed on both the server and client sides. The server side (FogServer and EdgeServer) has some exception handling added, which should be revisited, while the client side currently lacks exception handling.

Use 2-level statistics to form global stats

A buddy creates stats for its neighbours and itself, and passes these intermediate stats to its buddy pool. This takes O(mn + mb) messages instead of O(m) messages (m: number of peers; b: buddy pool size - 1; n: neighbour count).
This will require one level of indirection to the buddy to decide the actual fog to hold a replica. It will also make the global stats more approximate.

Fog to return actual edge reliability to client as part of put block op

Each fog where we place a replica should return the exact reliability of the edge the replica is being put on. This will help the client calculate the exact reliability of the block.
Question: what do we do with this info? Can we "drop" the last replica blocks if the required reliability is already achieved?
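For reference, the client-side calculation this enables: assuming independent edge failures, a block is lost only if every replica's edge fails, so:

```java
// Block reliability from the per-edge reliabilities returned by each fog:
// R = 1 - prod(1 - r_i), assuming independent edge failures.
public class BlockReliability {
    public static double fromReplicaReliabilities(double[] edgeReliabilities) {
        double allFail = 1.0;
        for (double r : edgeReliabilities) {
            allFail *= (1.0 - r); // probability that this replica's edge fails
        }
        return 1.0 - allFail;
    }
}
```

For example, replicas on edges with reliabilities 0.9, 0.8 and 0.7 give 1 - (0.1 × 0.2 × 0.3) = 0.994; once this running value exceeds the required reliability, the remaining replica writes could indeed be dropped.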

Add retry mechanism to handle write failures of fog

The current implementation finds the replica locations, which are then written to by the client. The order we took is: first write the local replica, then write the remote ones. However, if there are fog failures, the write may not succeed, and a retry is needed to achieve the expected reliability.

Talk to the parent fog to get an alternative in case of a fog failure?
Related to general concept of handling fog failures/buddy failures, #22

Support additional APIs such as readNext(), putNext(), dropStream()

We wish to implement the following features in the future (a possible interface shape is sketched after the list):

  • readNext(), which allows searching data blocks based on the previous search (similar to the iterator pattern)
  • putNext(), which allows writing data blocks to the same locations as the previous write, without making a call to identify replicas
  • dropStream(), which puts an end to the currently running stream by adding an end time to it
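A hypothetical shape for these three APIs; the types and names are illustrative, not ElfStore's actual interface:

```java
import java.util.List;

public interface StreamClientExtensions {
    /** Continue a previous search, iterator-style, from the last block returned. */
    List<Long> readNext(long streamId, long afterBlockId, int maxBlocks);

    /** Write to the same replica locations as the previous put, skipping replica identification. */
    void putNext(long streamId, byte[] blockData);

    /** Close the stream by recording an end time; no further appends are accepted. */
    void dropStream(long streamId, long endTimeMillis);
}
```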

Priority based recovery of blocks on edge failure

Right now, there is a priority across multiple edges that fail, but for a single edge failure, all blocks have the same priority.

When an edge fails, can we recover first the blocks that have a much lower reliability than the others? How do we find the current reliability of the blocks?
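One sketch of such prioritisation, recovering the blocks with the lowest remaining reliability first (computed over the surviving replicas, e.g. with the formula shown earlier); all names are illustrative:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Order lost blocks by their current (degraded) reliability, lowest first.
public class RecoveryQueue {
    public static final class LostBlock {
        final long blockId;
        final double currentReliability; // computed over surviving replicas

        LostBlock(long blockId, double currentReliability) {
            this.blockId = blockId;
            this.currentReliability = currentReliability;
        }
    }

    private final PriorityQueue<LostBlock> queue =
            new PriorityQueue<>(Comparator.comparingDouble((LostBlock b) -> b.currentReliability));

    public void add(LostBlock block) { queue.add(block); }

    /** Returns the most at-risk block to re-replicate next, or null if empty. */
    public LostBlock nextToRecover() { return queue.poll(); }
}
```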
