tapir's Introduction

TAPIR

This repository includes code implementing TAPIR -- the Transaction Application Protocol for Inconsistent Replication. This code was used for the SOSP 2015 paper, "Building Consistent Transactions with Inconsistent Replication."

TAPIR is a new protocol for linearizable distributed transactions built using replication with no consistency guarantees. By enforcing consistency only at the transaction layer, TAPIR eliminates coordination at the replication layer, enabling TAPIR to provide the same transaction model and consistency guarantees as existing systems, like Spanner, with better latency and throughput.

In addition to TAPIR, this repo includes several other useful implementations of distributed systems, including:

  1. An implementation of a lock server designed to work with inconsistent replication (IR), our high-performance, unordered replication protocol.

  2. An implementation of Viewstamped Replication (VR), detailed in this older paper and this more recent paper.

  3. An implementation of a scalable, distributed storage system designed to work with VR that uses two-phase commit to support distributed transactions and supports both optimistic concurrency control and strict two-phase locking.

The repo is structured as follows:

  • /lib - the transport library for communication between nodes. This includes UDP-based network communication as well as the ability to simulate network conditions on a local machine, including packet delays and reorderings.

  • /replication - replication library for the distributed stores

    • /vr - implementation of viewstamped replication protocol
    • /ir - implementation of inconsistent replication protocol
  • /store - partitioned/sharded distributed store

    • /common - common data structures, backing stores and interfaces for all of the stores
    • /tapirstore - implementation of TAPIR designed to work with IR
    • /strongstore - implementation of both an OCC-based and locking-based 2PC transactional storage system, designed to work with VR
    • /weakstore - implementation of an eventually consistent storage system, using quorum writes for replication
  • /lockserver - a lock server designed to be used with IR

Compiling & Running

You can compile all of the TAPIR executables by running make in the root directory.

TAPIR depends on protobufs, libevent and openssl, so you will need the following development libraries:

  • libprotobuf-dev
  • libevent-openssl
  • libevent-pthreads
  • libevent-dev
  • libssl-dev
  • protobuf-compiler

Contact and Questions

Please email Irene at [email protected], Dan at [email protected] and Naveen at [email protected]

tapir's People

Contributors

iyzhang, maximecaron, mwhittaker, nkrsharma

tapir's Issues

Can't parse message of type "replication.vr.proto.UnloggedReplyMessage" because it is missing required fields: clientreqid

Running the tests on OS X 10.11.6 (El Capitan) using this compiler:

jaten@jatens-MacBook-Pro ~/go/src/github.com/UWSysLab/tapir (master) $ gcc -v
Configured with: --prefix=/Applications/Xcode-beta.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.36.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode-beta.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
jaten@jatens-MacBook-Pro ~/go/src/github.com/UWSysLab/tapir (master) $ 

I get the following test failure and hang:

jaten@jatens-MacBook-Pro ~/go/src/github.com/UWSysLab/tapir (master) $ make test
make test
WARNING: Paranoid mode enabled
+ CC     /Users/jaten/go/src/github.com/google/googletest/googletest/src/gtest-all.cc
+ CC     /Users/jaten/go/src/github.com/google/googletest/googletest/src/gtest_main.cc
+ AR     .obj/gtest/gtest_main.a
ar: creating archive .obj/gtest/gtest_main.a
a - .obj/gtest/gtest-all.o
a - .obj/gtest/gtest_main.o
+ LD     lib/tests/configuration-test
+ RUN    lib/tests/configuration-test
Running main() from gtest_main.cc
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from Configuration
[ RUN      ] Configuration.Basic
[       OK ] Configuration.Basic (0 ms)
[ RUN      ] Configuration.Multicast
[       OK ] Configuration.Multicast (0 ms)
[ RUN      ] Configuration.Quorum
[       OK ] Configuration.Quorum (0 ms)
[ RUN      ] Configuration.Leader
[       OK ] Configuration.Leader (0 ms)
[ RUN      ] Configuration.FromFile
[       OK ] Configuration.FromFile (0 ms)
[ RUN      ] Configuration.AddressEquality
[       OK ] Configuration.AddressEquality (0 ms)
[ RUN      ] Configuration.Equality
[       OK ] Configuration.Equality (0 ms)
[----------] 7 tests from Configuration (0 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 7 tests.
+ CC     lib/tests/simtransport-test.cc
+ CC     lib/simtransport.cc
+ CC     .obj/gen/lib/tests/simtransport-testmessage.pb.cc
+ LD     lib/tests/simtransport-test
+ RUN    lib/tests/simtransport-test
Running main() from gtest_main.cc
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from SimTransportTest
[ RUN      ] SimTransportTest.Basic
[       OK ] SimTransportTest.Basic (0 ms)
[ RUN      ] SimTransportTest.Filter
[       OK ] SimTransportTest.Filter (0 ms)
[ RUN      ] SimTransportTest.FilterModify
[       OK ] SimTransportTest.FilterModify (0 ms)
[ RUN      ] SimTransportTest.FilterDelay
[       OK ] SimTransportTest.FilterDelay (0 ms)
[ RUN      ] SimTransportTest.FilterPriority
[       OK ] SimTransportTest.FilterPriority (0 ms)
[ RUN      ] SimTransportTest.Timer
[       OK ] SimTransportTest.Timer (0 ms)
[ RUN      ] SimTransportTest.TimerCancel
[       OK ] SimTransportTest.TimerCancel (0 ms)
[ RUN      ] SimTransportTest.Timeout
[       OK ] SimTransportTest.Timeout (0 ms)
[----------] 8 tests from SimTransportTest (0 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 8 tests.
+ CC     replication/vr/tests/vr-test.cc
+ LD     replication/vr/tests/vr-test
+ RUN    replication/vr/tests/vr-test
Running main() from gtest_main.cc
[==========] Running 22 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 22 tests from Batching/VRTest
[ RUN      ] Batching/VRTest.OneOp/0
[       OK ] Batching/VRTest.OneOp/0 (1 ms)
[ RUN      ] Batching/VRTest.OneOp/1
20170108-090252-5531 64362 * VRReplica       (replica.cc:71):    Batching enabled; batch size 8
20170108-090252-5536 64362 * VRReplica       (replica.cc:71):    Batching enabled; batch size 8
20170108-090252-5536 64362 * VRReplica       (replica.cc:71):    Batching enabled; batch size 8
[       OK ] Batching/VRTest.OneOp/1 (0 ms)
[ RUN      ] Batching/VRTest.Unlogged/0
[libprotobuf ERROR google/protobuf/message_lite.cc:121] Can't parse message of type "replication.vr.proto.UnloggedReplyMessage" because it is missing required fields: clientreqid
20170108-090252-5538 64362 ! UnloggedRequestTimeoutCallback (client.cc:201):    Unlogged request timed out
... (hangs here)...

Typo in HandleDoViewChange

// If we're a recovering node, we don't want to be the leader.
if (status == STATUS_NORMAL) {
    Debug("Ignoring DO-VIEW-CHANGE for view %" PRIu64
          " because our status is RECOVERING.",
          view);
    return;
}
Seems to be a typo. It should be if (status == STATUS_RECOVERING) in Line 289, right?
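
A minimal sketch of the guard the comment and debug message seem to describe, assuming the fix suggested above is right. The enum, the STATUS_VIEW_CHANGE value, and the function names are hypothetical stand-ins for illustration; this is not the code in replica.cc.

```cpp
#include <cassert>

// Hypothetical status values. Only STATUS_NORMAL and STATUS_RECOVERING
// are named in the snippet above; STATUS_VIEW_CHANGE is an assumption.
enum ReplicaStatus { STATUS_NORMAL, STATUS_VIEW_CHANGE, STATUS_RECOVERING };

// Returns true when a DO-VIEW-CHANGE message should be ignored.
// Assumption: a recovering node must not become leader, so the guard
// tests STATUS_RECOVERING rather than STATUS_NORMAL.
bool ShouldIgnoreDoViewChange(ReplicaStatus status) {
    return status == STATUS_RECOVERING;
}

// Small usage example for the guard.
void ExampleDoViewChangeGuard() {
    assert(ShouldIgnoreDoViewChange(STATUS_RECOVERING));
    assert(!ShouldIgnoreDoViewChange(STATUS_NORMAL));
}
```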

Liveness bug in VSR implementation: Client table and uncommitted ops after view change

I believe there may be a very subtle liveness bug in https://github.com/UWSysLab/tapir/blob/master/replication/vr/replica.cc#L450 where the client table (effectively a record of committed replies) is touched on both the prepare and commit paths.

However, there is a difference between uncommitted ops and committed ops, as uncommitted ops may not survive a view change. Yet the implementation does not appear to account for this by fixing up the client table after a view change if it was modified by prepared ops that did not survive. This can then cause some client requests to be permanently blocked out, treated as duplicates, while they were never actually committed to the client table.

A cleaner approach might be to use the client table only for a single purpose i.e. only for committed data, and then to use the inflight pipeline to dedupe any uncommitted inflight ops. This way the client table never needs to be patched up after a view change.
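
To make the suggestion concrete, here is a rough sketch of that split, under stated assumptions: the class, its method names, and the idea of passing the surviving uncommitted ops into the view-change hook are all hypothetical and not taken from replica.cc. The committed table is written only on commit, and the inflight set is rebuilt from whatever survived the view change.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical sketch; none of these names come from the repository.
enum class Admit { kNew, kInflightDuplicate, kCommittedDuplicate };

class Deduper {
public:
    using OpId = std::pair<uint64_t, uint64_t>;  // (client id, request id)

    // Decide whether a client request should enter the pipeline.
    Admit OnRequest(uint64_t clientId, uint64_t reqId) {
        auto it = committed_.find(clientId);
        if (it != committed_.end() && reqId <= it->second) {
            return Admit::kCommittedDuplicate;
        }
        if (!inflight_.insert(std::make_pair(clientId, reqId)).second) {
            return Admit::kInflightDuplicate;
        }
        return Admit::kNew;
    }

    // Called only once the op is committed; the sole writer of committed_.
    void OnCommit(uint64_t clientId, uint64_t reqId) {
        inflight_.erase(std::make_pair(clientId, reqId));
        auto it = committed_.find(clientId);
        if (it == committed_.end() || reqId > it->second) {
            committed_[clientId] = reqId;
        }
    }

    // After a view change, keep only the uncommitted ops that survived.
    // The committed table needs no repair because prepares never touched it.
    void OnViewChange(const std::set<OpId> &survivingUncommitted) {
        inflight_ = survivingUncommitted;
    }

private:
    std::map<uint64_t, uint64_t> committed_;  // client id -> last committed request id
    std::set<OpId> inflight_;                 // prepared but not yet committed
};
```

With this shape, a request whose prepare was lost in a view change is admitted again on retry instead of being rejected as a duplicate.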

Benchmark clients do not abort

Regarding the benchmark client implementation in store/benchmark

When Get(.) returns an error due to lock contention at the server, the transaction should abort (as the warning message in retwisClient.cc says, for example). However, I didn't see client->Abort(.) anywhere in the code. Am I missing something?

I think it's important to abort in order to release the read locks. For example, a transaction may have two reads, R1 and R2. R1 gets its lock successfully, but R2 fails to get its lock. Not aborting the transaction leaves R1's lock "dangling", preventing any future write.
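
A small sketch of the pattern I have in mind. FakeLockingClient is a hypothetical in-memory stand-in, not the client class in store/benchmark; only the Get and Abort names come from the discussion above, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a locking store client.
class FakeLockingClient {
public:
    std::set<std::string> contended;  // keys whose read lock cannot be taken

    void Begin() {}

    // Returns false when the read lock cannot be acquired.
    bool Get(const std::string &key) {
        if (contended.count(key) > 0) {
            return false;
        }
        held_.insert(key);
        return true;
    }

    bool Commit() { held_.clear(); return true; }  // releases all locks
    void Abort() { held_.clear(); }                // releases all locks

    std::size_t HeldLocks() const { return held_.size(); }

private:
    std::set<std::string> held_;  // read locks held by the open transaction
};

// Reads every key; aborts on the first failed Get so that locks taken by
// earlier reads are released instead of being left dangling.
bool RunReadTransaction(FakeLockingClient &client,
                        const std::vector<std::string> &keys) {
    client.Begin();
    for (const auto &key : keys) {
        if (!client.Get(key)) {
            client.Abort();
            return false;
        }
    }
    return client.Commit();
}
```

In the R1/R2 example, RunReadTransaction returns false when R2 is contended, and R1's lock is released by the Abort call.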

run tests

After downloading gtest from https://github.com/google/googletest and changing GTEST_DIR := $(HOME)/Desktop/googletest/googletest (my path to googletest), I get this error:

+ CC /home/sarantitis/Desktop/googletest/googletest/src/gtest-all.cc
/home/sarantitis/Desktop/googletest/googletest/src/gtest-all.cc:38:10: fatal error: gtest/gtest.h: No such file or directory
   38 | #include "gtest/gtest.h"
      |          ^~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:225: .obj/gtest/gtest-all.o] Error 1

How can I make it work?

go/running the tests

Tapir looks like a super interesting project!

I have two quick questions:

a) Is anyone already doing Go bindings or a Golang port?

b) How do I run the tests? gtest-all.cc seems to be missing...

# on osx 10.11.6 with c++ compiler: Apple LLVM version 8.0.0 (clang-800.0.36.1)
 ~github.com/UWSysLab/tapir (master) $ make test
make test
WARNING: Paranoid mode enabled
make: *** No rule to make target `.obj/gtest/gtest-all.o', needed by `.obj/gtest/gtest_main.a'.  Stop.

About the YCSB+T benchmark

Hi, @iyzhang.

I have a question about the implementation of YCSB+T.

Where did you get the latest YCSB+T project?

The only place I can find to fork it is on Akon Dey's github.
https://github.com/akon-dey/YCSB

Could you help me to implement the ycsb+t benchmark correctly?

In his paper he describes the following methods:

"
• doTransactionInsert() creates a new account with an
initial balance captured from doTransactionDelete() operation described below.

• doTransactionRead() reads a set of account balances
determined by the key generator.

• doTransactionScan() scans the database given the start
key and the number of records and fetches them from the
data base.

• doTransactionUpdate() reads a record and add $1 from
the balance captured from delete operations to it and write
it back.

• doTransactionDelete() reads an account record, add the
amount to the captured the balance (capture used in
doTransactionInsert()) and then deletes the record.

• doTransactionReadModifyWrite() reads two records,
subtracts $1 from the one of the two and adds $1 to
the other before writing them both back.
"

In the akon-dey repository that I forked, I didn't find implementations of doTransactionInsert(), doTransactionRead(), doTransactionScan(), doTransactionUpdate() or doTransactionDelete().

I just noticed that the doTransactionReadModifyWrite() method is implemented, where it subtracts the value 1 from account A and assigns that value to account B.
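
For reference, this is roughly how I read that operation, written as a self-contained sketch. The Accounts type and both function names are my own for illustration and are not taken from YCSB+T or from this repository; a real client would run the two reads and two writes inside one transaction.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory account table: account name -> balance in dollars.
using Accounts = std::map<std::string, long>;

// Moves $1 from `from` to `to`. Returns false and changes nothing
// if either account is missing.
bool ReadModifyWrite(Accounts &accounts, const std::string &from,
                     const std::string &to) {
    auto src = accounts.find(from);
    auto dst = accounts.find(to);
    if (src == accounts.end() || dst == accounts.end()) {
        return false;
    }
    src->second -= 1;
    dst->second += 1;
    return true;
}

// Sum of all balances; a transfer between two accounts leaves it unchanged.
long TotalBalance(const Accounts &accounts) {
    long total = 0;
    for (const auto &entry : accounts) {
        total += entry.second;
    }
    return total;
}
```

The unchanged total is what makes the operation useful as a consistency check: if the sum drifts, some transfer was applied only partially.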

Could you help me understand this part of the implementation?

Regards,
Caio

Server raises `Sync` unimplemented and ycsb-t gets stuck before finishing

Is it normal for the server to raise `Sync` unimplemented when there is no client? I'm running 1 shard with 3 replicas on one machine, on different ports, using:

~/tapir/store/tapirstore/server -m txn-l -c ~/tapir/store/tools/shard0.config -i 2

However, if I start the ycsb-t client in time, the server stays up.

Also, when I tested ycsb-t with 3 shards (9 replicas in total), the throughput kept decreasing and eventually reached 0 before operationcount=1000000 had finished:

80 sec: 903644 operations; 0 current ops/sec; 
90 sec: 903644 operations; 0 current ops/sec; 
100 sec: 903644 operations; 0 current ops/sec;
110 sec: 903644 operations; 0 current ops/sec;
120 sec: 903644 operations; 0 current ops/sec;
130 sec: 903644 operations; 0 current ops/sec;
140 sec: 903644 operations; 0 current ops/sec;
150 sec: 903644 operations; 0 current ops/sec;
160 sec: 903644 operations; 0 current ops/sec;
170 sec: 903644 operations; 0 current ops/sec;
180 sec: 903644 operations; 0 current ops/sec;

Am I missing anything?

Correctness bug in VSR implementation: A replica in recovery status participates in view changes

The Viewstamped Replication Revisited paper in Section 4.2 requires that:

When a replica recovers after a crash it cannot participate in request processing and view changes until it has a state at least as recent as when it failed. If it could participate sooner than this, the system can fail. For example, if it forgets that it prepared some operation, this operation might then be known to fewer than a quorum of replicas even though it committed, which could cause the operation to be forgotten in a view change.

However, I believe there may be a bug in https://github.com/UWSysLab/tapir/blob/master/replication/vr/replica.cc#L833-L835 where a replica in recovery status is allowed by the implementation to participate in a higher view change, leading to data loss.
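
A simplified sketch of the rule quoted from the paper, for illustration only. The enums and the function are hypothetical and do not mirror replica.cc; the message names follow the paper's protocol messages, and the assumption is that a recovering replica reacts only to recovery responses until its state is restored.

```cpp
#include <cassert>

// Hypothetical names; not the types used in the repository.
enum class Status { Normal, ViewChange, Recovering };
enum class Msg {
    Request, Prepare, StartViewChange, DoViewChange, StartView, RecoveryResponse
};

// Gate applied before any per-message handling.
bool ShouldHandle(Status status, Msg msg) {
    if (status != Status::Recovering) {
        return true;  // the usual per-message checks (not shown) still apply
    }
    // Assumption: while recovering, take no part in request processing
    // or view changes; only recovery responses are processed.
    return msg == Msg::RecoveryResponse;
}

// Small usage example for the gate.
void ExampleRecoveryGate() {
    assert(!ShouldHandle(Status::Recovering, Msg::StartViewChange));
    assert(ShouldHandle(Status::Recovering, Msg::RecoveryResponse));
    assert(ShouldHandle(Status::Normal, Msg::StartViewChange));
}
```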

I found this while working on TigerBeetle's implementation of Viewstamped Replication, as I was doing a survey of existing implementations. By the way, Tapir's implementation of VSR is really nice and clean.

On a similar note, if anyone is interested, we just launched a $20k consensus challenge over at https://github.com/coilhq/viewstamped-replication-made-famous, where if you can find a correctness bug in an implementation of VSR you could earn bounties of up to $3,000.

The live launch event on Saturday also featured special interviews with Brian Oki and James Cowling, if you're a fan of the pioneering protocol and would like to take a watch: https://www.youtube.com/watch?v=_Jlikdtm4OA
