
tapir's Issues

Correctness bug in VSR implementation: A replica in recovery status participates in view changes

The Viewstamped Replication Revisited paper in Section 4.2 requires that:

When a replica recovers after a crash it cannot participate in request processing and view changes until it has a state at least as recent as when it failed. If it could participate sooner than this, the system can fail. For example, if it forgets that it prepared some operation, this operation might then be known to fewer than a quorum of replicas even though it committed, which could cause the operation to be forgotten in a view change.

However, I believe there may be a bug in https://github.com/UWSysLab/tapir/blob/master/replication/vr/replica.cc#L833-L835 where a replica in recovery status is allowed by the implementation to participate in a higher view change, leading to data loss.
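To make the rule from Section 4.2 concrete, here is a minimal sketch of the kind of guard it implies. The enum and function names are hypothetical and are not Tapir's actual identifiers. The sketch only models the recovering case; other statuses have their own rules that it does not attempt to capture.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not taken from Tapir's source.
enum class ReplicaStatus { Normal, ViewChange, Recovering };
enum class MessageKind {
    Prepare,
    Commit,
    StartViewChange,
    DoViewChange,
    StartView,
    RecoveryResponse
};

// Returns true if a replica with the given status may act on the message.
// A recovering replica only consumes the responses to its own recovery
// request; it must stay out of request processing and view changes until
// its state is at least as recent as when it failed.
bool MayProcess(ReplicaStatus status, MessageKind kind) {
    if (status == ReplicaStatus::Recovering) {
        return kind == MessageKind::RecoveryResponse;
    }
    // Rules for the other statuses are out of scope for this sketch.
    return true;
}
```

With a guard like this at the top of each message handler, a START-VIEW-CHANGE for a higher view would be dropped while the replica is still recovering, instead of pulling it into the new view.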

I found this while working on TigerBeetle's implementation of Viewstamped Replication, as I was doing a survey of existing implementations. By the way, Tapir's implementation of VSR is really nice and clean.

On a similar note, if anyone is interested, we just launched a $20k consensus challenge over at https://github.com/coilhq/viewstamped-replication-made-famous, where if you can find a correctness bug in an implementation of VSR you could earn bounties of up to $3,000.

The live launch event on Saturday also featured special interviews with Brian Oki and James Cowling, if you're a fan of the pioneering protocol and would like to take a watch: https://www.youtube.com/watch?v=_Jlikdtm4OA

Server raises `Sync` unimplemented and ycsb-t gets stuck before finishing

Is it normal for the server to report that `Sync` is unimplemented when there is no client? I'm running 1 shard with 3 replicas on one machine, each on a different port, using

~/tapir/store/tapirstore/server -m txn-l -c ~/tapir/store/tools/shard0.config -i 2

However, if I start the ycsb-t client in time, the server stays up.

Also, when I tested ycsb-t with 3 shards (9 replicas in total), the throughput kept decreasing, and in the end it reached 0 with operationcount=1000000 not finished:

80 sec: 903644 operations; 0 current ops/sec; 
90 sec: 903644 operations; 0 current ops/sec; 
100 sec: 903644 operations; 0 current ops/sec;
110 sec: 903644 operations; 0 current ops/sec;
120 sec: 903644 operations; 0 current ops/sec;
130 sec: 903644 operations; 0 current ops/sec;
140 sec: 903644 operations; 0 current ops/sec;
150 sec: 903644 operations; 0 current ops/sec;
160 sec: 903644 operations; 0 current ops/sec;
170 sec: 903644 operations; 0 current ops/sec;
180 sec: 903644 operations; 0 current ops/sec;

Am I missing anything?

Typo in HandleDoViewChange

    // If we're a recovering node, we don't want to be the leader.
    if (status == STATUS_NORMAL) {
        Debug("Ignoring DO-VIEW-CHANGE for view %" PRIu64
              " because our status is RECOVERING.",
              view);
        return;
    }
This seems to be a typo. It should be `if (status == STATUS_RECOVERING)` on line 289, right?
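A small sketch comparing the two conditions may help show why the difference matters. `STATUS_NORMAL` and `STATUS_RECOVERING` come from the snippet above; `STATUS_VIEW_CHANGE` and both function names are assumptions added for illustration.

```cpp
#include <cassert>

// STATUS_VIEW_CHANGE is an assumed third value, added for illustration.
enum ReplicaStatus { STATUS_NORMAL, STATUS_VIEW_CHANGE, STATUS_RECOVERING };

// The condition as quoted in the report: DO-VIEW-CHANGE is ignored when
// the replica is in normal status.
bool IgnoresDoViewChangeAsWritten(ReplicaStatus status) {
    return status == STATUS_NORMAL;
}

// The condition as proposed in the report: DO-VIEW-CHANGE is ignored when
// the replica is recovering, which matches the comment and the debug text.
bool IgnoresDoViewChangeAsProposed(ReplicaStatus status) {
    return status == STATUS_RECOVERING;
}
```

Under the condition as written, a recovering replica would not be filtered out by this check, while a replica in normal status would ignore the message. The proposed condition reverses both outcomes.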

go/running the tests

Tapir looks like a super interesting project!

I have two quick questions:

a) is anyone already doing go-bindings or a golang port?

b) how do I run the tests? gtest-all.cc seems to be missing...

# on osx 10.11.6 with c++ compiler: Apple LLVM version 8.0.0 (clang-800.0.36.1)
 ~github.com/UWSysLab/tapir (master) $ make test
make test
WARNING: Paranoid mode enabled
make: *** No rule to make target `.obj/gtest/gtest-all.o', needed by `.obj/gtest/gtest_main.a'.  Stop.

Can't parse message of type "replication.vr.proto.UnloggedReplyMessage" because it is missing required fields: clientreqid

Running the tests on OS X 10.11.6 (El Capitan) using this compiler:

jaten@jatens-MacBook-Pro ~/go/src/github.com/UWSysLab/tapir (master) $ gcc -v
Configured with: --prefix=/Applications/Xcode-beta.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.36.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode-beta.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
jaten@jatens-MacBook-Pro ~/go/src/github.com/UWSysLab/tapir (master) $ 

I find the following test failure and hang:

jaten@jatens-MacBook-Pro ~/go/src/github.com/UWSysLab/tapir (master) $ make test
make test
WARNING: Paranoid mode enabled
+ CC     /Users/jaten/go/src/github.com/google/googletest/googletest/src/gtest-all.cc
+ CC     /Users/jaten/go/src/github.com/google/googletest/googletest/src/gtest_main.cc
+ AR     .obj/gtest/gtest_main.a
ar: creating archive .obj/gtest/gtest_main.a
a - .obj/gtest/gtest-all.o
a - .obj/gtest/gtest_main.o
+ LD     lib/tests/configuration-test
+ RUN    lib/tests/configuration-test
Running main() from gtest_main.cc
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from Configuration
[ RUN      ] Configuration.Basic
[       OK ] Configuration.Basic (0 ms)
[ RUN      ] Configuration.Multicast
[       OK ] Configuration.Multicast (0 ms)
[ RUN      ] Configuration.Quorum
[       OK ] Configuration.Quorum (0 ms)
[ RUN      ] Configuration.Leader
[       OK ] Configuration.Leader (0 ms)
[ RUN      ] Configuration.FromFile
[       OK ] Configuration.FromFile (0 ms)
[ RUN      ] Configuration.AddressEquality
[       OK ] Configuration.AddressEquality (0 ms)
[ RUN      ] Configuration.Equality
[       OK ] Configuration.Equality (0 ms)
[----------] 7 tests from Configuration (0 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 7 tests.
+ CC     lib/tests/simtransport-test.cc
+ CC     lib/simtransport.cc
+ CC     .obj/gen/lib/tests/simtransport-testmessage.pb.cc
+ LD     lib/tests/simtransport-test
+ RUN    lib/tests/simtransport-test
Running main() from gtest_main.cc
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from SimTransportTest
[ RUN      ] SimTransportTest.Basic
[       OK ] SimTransportTest.Basic (0 ms)
[ RUN      ] SimTransportTest.Filter
[       OK ] SimTransportTest.Filter (0 ms)
[ RUN      ] SimTransportTest.FilterModify
[       OK ] SimTransportTest.FilterModify (0 ms)
[ RUN      ] SimTransportTest.FilterDelay
[       OK ] SimTransportTest.FilterDelay (0 ms)
[ RUN      ] SimTransportTest.FilterPriority
[       OK ] SimTransportTest.FilterPriority (0 ms)
[ RUN      ] SimTransportTest.Timer
[       OK ] SimTransportTest.Timer (0 ms)
[ RUN      ] SimTransportTest.TimerCancel
[       OK ] SimTransportTest.TimerCancel (0 ms)
[ RUN      ] SimTransportTest.Timeout
[       OK ] SimTransportTest.Timeout (0 ms)
[----------] 8 tests from SimTransportTest (0 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 8 tests.
+ CC     replication/vr/tests/vr-test.cc
+ LD     replication/vr/tests/vr-test
+ RUN    replication/vr/tests/vr-test
Running main() from gtest_main.cc
[==========] Running 22 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 22 tests from Batching/VRTest
[ RUN      ] Batching/VRTest.OneOp/0
[       OK ] Batching/VRTest.OneOp/0 (1 ms)
[ RUN      ] Batching/VRTest.OneOp/1
20170108-090252-5531 64362 * VRReplica       (replica.cc:71):    Batching enabled; batch size 8
20170108-090252-5536 64362 * VRReplica       (replica.cc:71):    Batching enabled; batch size 8
20170108-090252-5536 64362 * VRReplica       (replica.cc:71):    Batching enabled; batch size 8
[       OK ] Batching/VRTest.OneOp/1 (0 ms)
[ RUN      ] Batching/VRTest.Unlogged/0
[libprotobuf ERROR google/protobuf/message_lite.cc:121] Can't parse message of type "replication.vr.proto.UnloggedReplyMessage" because it is missing required fields: clientreqid
20170108-090252-5538 64362 ! UnloggedRequestTimeoutCallback (client.cc:201):    Unlogged request timed out
... (hangs here)...

run tests

After downloading gtest from https://github.com/google/googletest and changing GTEST_DIR to $(HOME)/Desktop/googletest/googletest (my path to googletest), I get this error:

+ CC     /home/sarantitis/Desktop/googletest/googletest/src/gtest-all.cc
/home/sarantitis/Desktop/googletest/googletest/src/gtest-all.cc:38:10: fatal error: gtest/gtest.h: No such file or directory
   38 | #include "gtest/gtest.h"
      |          ^~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:225: .obj/gtest/gtest-all.o] Error 1

How can I make it work?

Benchmark clients do not abort

This concerns the benchmark client implementation in store/benchmark.

When Get(.) returns an error due to lock contention at the server, the transaction should abort (as stated in the warning message in retwisClient.cc, for example). However, I didn't see client->Abort(.) anywhere in the code. Am I missing something?

I think it's important to abort in order to release the read locks. For example, a transaction may have 2 reads, R1 and R2. R1 acquires its lock successfully, but R2 fails to acquire its lock. Not aborting the transaction leaves R1's lock "dangling", which prevents any future write to that key.
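The dangling-lock scenario can be reproduced with a toy lock table. This is only a sketch: `LockTable`, `ReadAll`, and the integer transaction ids are hypothetical and do not mirror Tapir's lock server or client API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy reader/writer lock table keyed by string; hypothetical, for illustration.
class LockTable {
public:
    // Grants a read lock unless another transaction holds the write lock.
    bool LockForRead(const std::string& key, int txn) {
        auto w = writers_.find(key);
        if (w != writers_.end() && w->second != txn) return false;
        readers_[key].insert(txn);
        return true;
    }

    // Grants a write lock unless another transaction reads or writes the key.
    bool LockForWrite(const std::string& key, int txn) {
        auto w = writers_.find(key);
        if (w != writers_.end() && w->second != txn) return false;
        auto r = readers_.find(key);
        if (r != readers_.end()) {
            for (int reader : r->second) {
                if (reader != txn) return false;
            }
        }
        writers_[key] = txn;
        return true;
    }

    // Drops every lock held by the transaction; this is what an abort does.
    void ReleaseAll(int txn) {
        for (auto& entry : readers_) entry.second.erase(txn);
        for (auto it = writers_.begin(); it != writers_.end();) {
            if (it->second == txn) it = writers_.erase(it);
            else ++it;
        }
    }

private:
    std::map<std::string, std::set<int>> readers_;
    std::map<std::string, int> writers_;
};

// Takes read locks on all keys in order. On the first failure it stops and,
// if abortOnFailure is set, releases the locks acquired so far.
bool ReadAll(LockTable& locks, int txn, const std::vector<std::string>& keys,
             bool abortOnFailure) {
    for (const auto& key : keys) {
        if (!locks.LockForRead(key, txn)) {
            if (abortOnFailure) locks.ReleaseAll(txn);
            return false;
        }
    }
    return true;
}
```

If another transaction holds a write lock on the second key, calling `ReadAll` without the abort leaves the first key read-locked, so a later writer on that key is refused. With the abort, the same writer succeeds.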

About the YCSB+T benchmark

Hi, @iyzhang.

I have a question about the implementation of YCSB+T.

Where did you get the latest YCSB+T project?

The only place I can find to fork it is on Akon Dey's github.
https://github.com/akon-dey/YCSB

Could you help me to implement the ycsb+t benchmark correctly?

In his article, he describes the following methods:

"
• doTransactionInsert() creates a new account with an
initial balance captured from doTransactionDelete() operation described below.

• doTransactionRead() reads a set of account balances
determined by the key generator.

• doTransactionScan() scans the database given the start
key and the number of records and fetches them from the
data base.

• doTransactionUpdate() reads a record and add $1 from
the balance captured from delete operations to it and write
it back.

• doTransactionDelete() reads an account record, add the
amount to the captured the balance (capture used in
doTransactionInsert()) and then deletes the record.

• doTransactionReadModifyWrite() reads two records,
subtracts $1 from the one of the two and adds $1 to
the other before writing them both back.
"

In the akon repository where I made the fork, I didn't find the implementation of the methods doTransactionInsert() , doTransactionRead() , doTransactionScan() , doTransactionUpdate() and doTransactionDelete().

I just noticed that the doTransactionReadModifyWrite() method is implemented, where it subtracts the value 1 from account A and assigns that value to account B.
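For reference, the behaviour described for doTransactionReadModifyWrite() can be sketched as a transfer of one unit between two accounts. This is a minimal in-memory sketch, not the YCSB+T code: the `Accounts` alias and both function names are hypothetical, and a real client would run the two reads and two writes inside one transaction.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the account records.
using Accounts = std::map<std::string, long>;

// Reads two records, subtracts 1 from `from`, adds 1 to `to`, and writes
// both back. Returns false and changes nothing if either record is missing
// or both names refer to the same record.
bool ReadModifyWrite(Accounts& accounts, const std::string& from,
                     const std::string& to) {
    auto a = accounts.find(from);
    auto b = accounts.find(to);
    if (a == accounts.end() || b == accounts.end() || a == b) return false;
    a->second -= 1;
    b->second += 1;
    return true;
}

// Sum of all balances; a transfer should leave this value unchanged.
long TotalBalance(const Accounts& accounts) {
    long total = 0;
    for (const auto& entry : accounts) total += entry.second;
    return total;
}
```

One property worth checking in any implementation of this operation is that the total balance is the same before and after the transfer, since the amount only moves from one account to the other.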

Could you help me understand this part of the implementation?

Regards,
Caio

Liveness bug in VSR implementation: Client table and uncommitted ops after view change

I believe there may be a very subtle liveness bug in https://github.com/UWSysLab/tapir/blob/master/replication/vr/replica.cc#L450 where the client table (effectively a record of committed replies) is touched on both the prepare and commit paths.

However, there is a difference between uncommitted ops and committed ops, as uncommitted ops may not survive a view change. Yet the implementation does not appear to account for this by fixing up the client table after a view change if it was modified by prepared ops that did not survive. This can then cause some client requests to be permanently blocked out, treated as duplicates, while they were never actually committed to the client table.

A cleaner approach might be to use the client table only for a single purpose i.e. only for committed data, and then to use the inflight pipeline to dedupe any uncommitted inflight ops. This way the client table never needs to be patched up after a view change.
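A rough sketch of that separation, under several assumptions: the class and method names are hypothetical, request ids are assumed to start at 1 and increase per client, and the view change is simplified to dropping every uncommitted entry (a real implementation would rebuild the inflight set from the ops that survive in the new log).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical dedup structure; not Tapir's client table.
class Dedup {
public:
    enum class Verdict { New, InFlight, AlreadyCommitted };

    // Committed requests are recognised from the client table; prepared but
    // uncommitted requests are recognised from the inflight set.
    Verdict Classify(uint64_t client, uint64_t reqId) const {
        auto it = committed_.find(client);
        if (it != committed_.end() && reqId <= it->second) {
            return Verdict::AlreadyCommitted;
        }
        if (inflight_.count(Key(client, reqId)) > 0) return Verdict::InFlight;
        return Verdict::New;
    }

    // The prepare path only touches the inflight set.
    void OnPrepare(uint64_t client, uint64_t reqId) {
        inflight_.insert(Key(client, reqId));
    }

    // The commit path is the only place that updates the client table.
    void OnCommit(uint64_t client, uint64_t reqId) {
        inflight_.erase(Key(client, reqId));
        uint64_t& last = committed_[client];
        if (reqId > last) last = reqId;
    }

    // Simplified view change: uncommitted ops are forgotten, and the client
    // table needs no repair because it never saw them.
    void OnViewChangeDiscardUncommitted() { inflight_.clear(); }

private:
    using Key = std::pair<uint64_t, uint64_t>;  // (client id, request id)
    std::map<uint64_t, uint64_t> committed_;    // client id -> last committed request id
    std::set<Key> inflight_;
};
```

In this arrangement a request that was prepared but lost in a view change is classified as new when the client retries it, rather than being rejected as a duplicate.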
