
lu-raft-kv's People

Contributors

dependabot[bot], leakey0626, rensailong, stateis0

lu-raft-kv's Issues

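Detail issue in leader election

One requirement of leader election is that when a candidate requests votes, it becomes leader as soon as more than half of the responses are successful.

The code here, however, blocks on a CountDownLatch(size) and only starts evaluating the result after every node has responded.

In addition, the asynchronous task handling is split into two passes, submission and completion checking, which feels cumbersome. I think an ExecutorCompletionService would be clearer: after each take, check whether more than half of the responses have succeeded.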
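
The suggestion above could look roughly like the following. This is a minimal sketch, not the project's code: the class name MajorityVote, the method winsElection, and the representation of each vote request as a Callable<Boolean> are assumptions made for illustration. It uses poll with a deadline instead of take so that unresponsive peers cannot block the candidate forever.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: counts votes as they arrive and stops at a majority.
final class MajorityVote {

    /**
     * Returns true as soon as the candidate's own vote plus the granted peer
     * votes exceed half of the cluster, without waiting for slow peers.
     */
    static boolean winsElection(List<Callable<Boolean>> peerVoteRequests, long timeoutMillis) {
        int clusterSize = peerVoteRequests.size() + 1; // peers plus the candidate
        int needed = clusterSize / 2 + 1;              // strict majority
        int granted = 1;                               // the candidate votes for itself
        if (granted >= needed) {
            return true;                               // single-node cluster
        }
        ExecutorService pool = Executors.newFixedThreadPool(peerVoteRequests.size());
        ExecutorCompletionService<Boolean> completed = new ExecutorCompletionService<>(pool);
        try {
            for (Callable<Boolean> request : peerVoteRequests) {
                completed.submit(request);
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            for (int i = 0; i < peerVoteRequests.size(); i++) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                // Results come back in completion order, not submission order.
                Future<Boolean> done = completed.poll(remaining, TimeUnit.NANOSECONDS);
                if (done == null) {
                    return false;                      // timed out before reaching a majority
                }
                try {
                    if (Boolean.TRUE.equals(done.get())) {
                        granted++;
                    }
                } catch (ExecutionException e) {
                    // A failed RPC counts as a vote that was not granted.
                }
                if (granted >= needed) {
                    return true;                       // majority reached, stop early
                }
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();                        // cancel requests still in flight
        }
    }
}
```

With two peers, one of which answers immediately and one of which hangs, the method returns after the first granted vote instead of waiting for the slow peer.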

Thanks for sharing

Well written.

However, it feels like two-phase commit is missing when handling client requests.

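Log commit issue

    if (N > commitIndex) {
        LogEntry entry = logModule.read(N);
        if (entry != null && entry.getTerm() == currentTerm) {
            commitIndex = N;
        }
    }

    // Respond to the client (half succeeded)
    if (success.get() >= (count / 2)) {
        // Update
        commitIndex = logEntry.getIndex();
        // Apply to the state machine
        getStateMachine().apply(logEntry);

After a log entry has been replicated successfully, this code commits only that one entry. Consider the following case:
entry A is replicated to a majority, but the leader crashes before committing it. After restarting, the same node is elected leader again, and then entry B is replicated and committed. Wouldn't entry A then never be committed?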
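
For comparison, the commit rule in the Raft paper advances commitIndex to the highest index N that is stored on a majority of nodes and whose entry belongs to the current term; every earlier entry, including ones from older terms such as A, is committed along with it. Below is a minimal sketch of that rule. The class CommitIndexRule, the method advance, and the array-based inputs are hypothetical and are not taken from the project.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the Raft commit rule on plain arrays.
final class CommitIndexRule {

    /**
     * @param commitIndex  current commit index of the leader
     * @param matchIndexes highest replicated index per node, including the leader's own last index
     * @param termAt       termAt[i] is the term of the log entry at index i (index 0 is unused)
     * @param currentTerm  the leader's current term
     * @return the new commit index; entries up to and including it are committed
     */
    static long advance(long commitIndex, long[] matchIndexes, long[] termAt, long currentTerm) {
        long[] sorted = matchIndexes.clone();
        Arrays.sort(sorted);
        // In ascending order, the element at (n - 1) / 2 is reached by a strict majority of nodes.
        long majorityIndex = sorted[(sorted.length - 1) / 2];
        // Only an entry from the current term may be committed by counting replicas.
        if (majorityIndex > commitIndex && termAt[(int) majorityIndex] == currentTerm) {
            return majorityIndex;
        }
        return commitIndex;
    }
}
```

In the scenario from this issue, A sits at index 1 with term 1 and B at index 2 with term 2. Once B is on a majority, advance(0, {2, 2, 1}, {0, 1, 2}, 2) returns 2, so A is committed together with B, and the state machine would then apply every entry between lastApplied and the new commitIndex in order.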

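volatile does not guarantee thread safety

    if (N > commitIndex) {
        LogEntry entry = logModule.read(N);
        if (entry != null && entry.getTerm() == currentTerm) {
            commitIndex = N;
        }
    }

    // Respond to the client (half succeeded)
    if (success.get() >= (count / 2)) {
        // Update
        commitIndex = logEntry.getIndex();
        // Apply to the state machine
        getStateMachine().apply(logEntry);
        lastApplied = commitIndex;

        log.info("success apply local state machine,  logEntry info : {}", logEntry);
        // Return success.
        return ClientKVAck.ok();
    } else {
        // Roll back the log entry that was already written.
        logModule.removeOnStartIndex(logEntry.getIndex());
        log.warn("fail apply local state  machine,  logEntry info : {}", logEntry);
        // TODO: Do not apply to the state machine, but the entry is already recorded in the log. A scheduled task should take it from the retry queue and retry until the condition is met, then apply it to the state machine.
        // An error should be returned here because replication to a majority of machines did not succeed.
        return ClientKVAck.fail();
    }

Variables that are touched by multiple threads, such as commitIndex, are declared volatile, but isn't there still a thread-safety problem where the value can be overwritten by an older, smaller one? Many of the assignments here are read-then-write sequences. If that is a problem here, does it follow that every variable updated by multiple threads could suffer lost updates?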
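
volatile only guarantees visibility of individual reads and writes; a check followed by an assignment is still two separate steps, so a concurrent writer can slip in between them. One common remedy is to make the compare-and-raise step atomic. The sketch below assumes a hypothetical wrapper class MonotonicIndex and is not how the project stores commitIndex.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper: an index that can only move forward, even under contention.
final class MonotonicIndex {
    private final AtomicLong value = new AtomicLong();

    /** Atomically raises the index to candidate if it is larger; never moves backwards. */
    long advanceTo(long candidate) {
        // updateAndGet retries the function until the compare-and-set succeeds.
        return value.updateAndGet(current -> Math.max(current, candidate));
    }

    long get() {
        return value.get();
    }
}
```

A thread that arrives late with a smaller value leaves the index untouched: after advanceTo(5), a call to advanceTo(3) still returns 5. Compound updates that span several fields (for example commitIndex together with lastApplied) would still need a lock or a single owning thread.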

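Cannot read the serverPort property

Hello. I am learning Raft, and when I run RaftNodeBootStrap, System.getProperty("serverPort") fails because it returns null; the property cannot be found. Where is this property set? I would appreciate an answer when you have time. Thanks.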
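
System.getProperty reads JVM system properties, which are set with a -D option on the java command line (in an IDE, the "VM options" field, not "Program arguments") or with System.setProperty. A minimal sketch of reading the property defensively follows; the class ServerPortProperty and its method are hypothetical and only illustrate the lookup.

```java
// Hypothetical helper: fails fast with a clear message when -DserverPort is missing.
final class ServerPortProperty {

    static int requireServerPort() {
        String raw = System.getProperty("serverPort");
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalStateException(
                "Missing JVM option -DserverPort=<port>, for example -DserverPort=8775");
        }
        return Integer.parseInt(raw.trim());
    }
}
```

Started as `java -DserverPort=8775 ...`, the method returns 8775; started without the option, it throws instead of letting a null value propagate.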

java.lang.UnsatisfiedLinkError

Hello, I hit an exception when running on a Mac (M1 processor). How can I fix it?
My JDK is Zulu's JDK 11.
The exception is as follows:

Exception in thread "main" java.lang.UnsatisfiedLinkError: /private/var/folders/z4/gn9br015731gt_290mkz_9x80000gn/T/librocksdbjni333205147224129941.jnilib: dlopen(/private/var/folders/z4/gn9br015731gt_290mkz_9x80000gn/T/librocksdbjni333205147224129941.jnilib, 1): no suitable image found. Did find:
/private/var/folders/z4/gn9br015731gt_290mkz_9x80000gn/T/librocksdbjni333205147224129941.jnilib: mach-o, but wrong architecture
/private/var/folders/z4/gn9br015731gt_290mkz_9x80000gn/T/librocksdbjni333205147224129941.jnilib: mach-o, but wrong architecture
at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2442)
at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2498)
at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2627)
at java.base/java.lang.Runtime.load0(Runtime.java:768)
at java.base/java.lang.System.load(System.java:1837)
at org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(NativeLibraryLoader.java:78)
at org.rocksdb.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:56)
at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:64)
at org.rocksdb.RocksDB.(RocksDB.java:35)
at cn.think.in.java.impl.DefaultStateMachine.(DefaultStateMachine.java:52)
at cn.think.in.java.impl.DefaultStateMachine.(DefaultStateMachine.java:36)
at cn.think.in.java.impl.DefaultStateMachine$DefaultStateMachineLazyHolder.(DefaultStateMachine.java:89)
at cn.think.in.java.impl.DefaultStateMachine.getInstance(DefaultStateMachine.java:73)
at cn.think.in.java.constant.StateMachineSaveType.(StateMachineSaveType.java:32)
at cn.think.in.java.RaftNodeBootStrap.boot(RaftNodeBootStrap.java:58)
at cn.think.in.java.RaftNodeBootStrap.main(RaftNodeBootStrap.java:38)

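Startup fails when following the quick start

Hello, following the quick start instructions, I configured several run configurations in IDEA, each with a different port (-DserverPort=8775, 8776, 8777 and so on), but only one node ever starts successfully. The problem occurs on both macOS and Ubuntu.

[Screenshot 2023-04-25 22 40 57]

[Screenshot 2023-04-25 22 41 09]

Above are the run configurations of my two nodes.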

Exception in thread "main" java.lang.ExceptionInInitializerError
at cn.think.in.java.impl.DefaultStateMachine.getInstance(DefaultStateMachine.java:72)
at cn.think.in.java.constant.StateMachineSaveType.<clinit>(StateMachineSaveType.java:32)
at cn.think.in.java.RaftNodeBootStrap.boot(RaftNodeBootStrap.java:58)
at cn.think.in.java.RaftNodeBootStrap.main(RaftNodeBootStrap.java:38)
Caused by: java.lang.RuntimeException: org.rocksdb.RocksDBException: While lock file: ./rocksDB-raft/null/stateMachine/LOCK: Resource temporarily unavailable
at cn.think.in.java.impl.DefaultStateMachine.<init>(DefaultStateMachine.java:67)
at cn.think.in.java.impl.DefaultStateMachine.<init>(DefaultStateMachine.java:36)
at cn.think.in.java.impl.DefaultStateMachine$DefaultStateMachineLazyHolder.<clinit>(DefaultStateMachine.java:88)
... 4 more
Caused by: org.rocksdb.RocksDBException: While lock file: ./rocksDB-raft/null/stateMachine/LOCK: Resource temporarily unavailable
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at cn.think.in.java.impl.DefaultStateMachine.<init>(DefaultStateMachine.java:65)
... 6 more

Process finished with exit code 1
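Above is the error log I got.

I also noticed that your RaftNodeBootStrap class does not process the input args at all, which I do not quite understand. I hope you can clarify. Thanks.
[Screenshot 2023-04-25 22 43 59]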
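
The lock-file path in the log contains the segment `null` (./rocksDB-raft/null/stateMachine/LOCK). One possible reading, which is a guess from the log and not verified against the project source, is that the directory name is built by concatenating the serverPort system property while that property is unset in the running JVM; every node would then open the same directory and the second one would fail on the RocksDB lock. The sketch below only demonstrates that mechanism; the class DataDirPerNode and its method are hypothetical.

```java
// Hypothetical helper: shows how an unset property ends up as "null" in a path,
// and how a per-port directory keeps nodes from sharing one lock file.
final class DataDirPerNode {

    /** Builds a state-machine directory that is unique per port. */
    static String stateMachineDir(String serverPort) {
        if (serverPort == null) {
            // Without this check, string concatenation silently yields ".../null/...".
            throw new IllegalStateException("serverPort is not set; pass -DserverPort=<port> as a VM option");
        }
        return "./rocksDB-raft/" + serverPort + "/stateMachine";
    }

    /** What plain concatenation produces when the property is missing. */
    static String uncheckedDir(String serverPort) {
        return "./rocksDB-raft/" + serverPort + "/stateMachine";
    }
}
```

Here uncheckedDir(null) yields "./rocksDB-raft/null/stateMachine", the same path as in the log, while stateMachineDir("8775") and stateMachineDir("8776") are distinct. If this reading is right, it would also be worth checking that -DserverPort is entered under VM options rather than program arguments, since the main method ignores args.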

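Two possible problems in the system: bug report about a node still taking part in consensus after being shut down

Hello, while testing the system we found two possible problems:

  1. After a node is shut down, it still appears to take part in consensus, accepting proposals from the leader and committing them.
  2. After a node is shut down, querying that node's data from the client sometimes succeeds and sometimes fails.

First, we made a small change to the get method of the RaftClient class so that, when the client reads the value for a key, it can choose which node to read from. In the RaftClientWithCommandLine class we replaced the call to get with getCertainPeerKey.

    /**
     * @param key
     * @return
     */
    // Changed get to getCertainPeerKey and added the parameter addr so that the client can read the value from a specific node
    // addr is one of: "localhost:8777", "localhost:8778", "localhost:8779"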
    public LogEntry getCertainPeerKey(String key, String addr) {
        ClientKVReq obj = ClientKVReq.builder().key(key).type(ClientKVReq.GET).build();

        ClientKVAck response;
        Request r = Request.builder().obj(obj).url(addr).cmd(Request.CLIENT_REQ).build();
        try {
            response = CLIENT.send(r);
        } catch (Exception e) {
            r.setUrl(list.get((int) ((count.incrementAndGet()) % list.size())));
            response = CLIENT.send(r);
        }
        return (LogEntry)response.getResult();
    }

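Steps to reproduce:

  1. Start three nodes on ports 8777, 8778 and 8779, and wait for a leader to be elected
  2. Start RaftClientWithCommandLine
  3. Enter put A 0
  4. Shut down the leader node (8777 in my case) and wait for a new leader to be elected
  5. Enter put A 1
  6. Enter get A several times

Two different results appear.
One is:
[Screenshot 2023-04-27 22 18 11]

The other is:
[Screenshot 2023-04-27 22 18 36]

In the first result, node 8777 has still committed <A, 1>, which suggests it took part in consensus on the proposal from step 5. But by the time that step ran, the node had already been shut down and was never restarted, which is a contradiction. This is problem 1.
Which of the two results appears cannot be controlled. This is problem 2.

I look forward to your reply. Thanks!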
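
One possible explanation, offered as a hypothesis and not confirmed by the maintainers: the catch block in getCertainPeerKey above retries the request against a different node taken from list, so a query addressed to the stopped node may actually be answered by a live one. The self-contained sketch below reproduces that effect with a hypothetical Transport interface standing in for the RPC client; none of these names come from the project.

```java
// Hypothetical stand-in for the RPC layer, used only to illustrate the fallback effect.
final class ExactPeerQuery {

    /** Sends a key lookup to one address; throws if that node is unreachable. */
    interface Transport {
        String send(String addr, String key);
    }

    /** Mirrors the fallback in the modified client: on failure, silently asks another node. */
    static String getWithFallback(Transport transport, String key, String addr, String fallbackAddr) {
        try {
            return transport.send(addr, key);
        } catch (RuntimeException e) {
            return transport.send(fallbackAddr, key);
        }
    }

    /** Asks only the requested node and reports the failure instead of hiding it. */
    static String getExact(Transport transport, String key, String addr) {
        try {
            return transport.send(addr, key);
        } catch (RuntimeException e) {
            throw new IllegalStateException("node " + addr + " is unreachable", e);
        }
    }
}
```

With a transport in which localhost:8777 is down and localhost:8778 holds the value 1, getWithFallback returns 1 for a query addressed to 8777, which looks as if the stopped node had committed the write, whereas getExact reports that 8777 is unreachable.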

Client fails on startup with an error

Sofa-Middleware-Log SLF4J : Actual binding is of type [ com.alipay.remoting Log4j ]
2023-09-10 16:42:07,599 main INFO [com.alipay.sofa.common.log:report:30] - Sofa-Middleware-Log SLF4J : Actual binding is of type [ com.alipay.remoting Log4j ]
Exception in thread "main" java.lang.NoClassDefFoundError: com/caucho/hessian/io/SerializerFactory
at com.alipay.remoting.serialization.HessianSerializer.(HessianSerializer.java:36)
at com.alipay.remoting.serialization.SerializerManager.(SerializerManager.java:36)
at com.alipay.remoting.rpc.protocol.RpcRequestCommand.serializeContent(RpcRequestCommand.java:132)
at com.alipay.remoting.rpc.RpcCommand.serialize(RpcCommand.java:105)
at com.alipay.remoting.rpc.RpcRemoting.toRemotingCommand(RpcRemoting.java:353)
at com.alipay.remoting.rpc.RpcRemoting.invokeSync(RpcRemoting.java:179)
at com.alipay.remoting.rpc.RpcClientRemoting.invokeSync(RpcClientRemoting.java:72)
at com.alipay.remoting.rpc.RpcRemoting.invokeSync(RpcRemoting.java:143)
at com.alipay.remoting.rpc.RpcClient.invokeSync(RpcClient.java:219)
at cn.think.in.java.raft.common.rpc.DefaultRpcClient.send(DefaultRpcClient.java:42)
at cn.think.in.java.raft.common.rpc.DefaultRpcClient.send(DefaultRpcClient.java:35)
at cn.think.in.java.raft.client.RaftClientRPC.put(RaftClientRPC.java:89)
at cn.think.in.java.raft.client.RaftClient1.main(RaftClient1.java:38)
Caused by: java.lang.ClassNotFoundException: com.caucho.hessian.io.SerializerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more

The client fails to send messages.

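The Node's check for whether to start an election seems questionable

I looked at the code. When a Node is a follower and decides whether to start an election, it first checks whether it is the leader; if not, it checks whether the election timeout has passed, and if so it starts an election.
I would expect a heartbeat to push back the time at which the node starts an election, but I did not see such a check in the code.
This still runs correctly, of course, but it causes many unnecessary elections. Am I right?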
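
The behaviour described as expected above, where each heartbeat postpones the next election, can be sketched as follows. This is an illustration under assumptions: the class ElectionTimer, its methods, and the explicit nowMillis parameters are hypothetical and chosen to keep the example deterministic; the project may handle this differently or elsewhere.

```java
// Hypothetical follower-side timer: a heartbeat resets the election deadline.
final class ElectionTimer {
    private final long electionTimeoutMillis;
    private volatile long lastHeartbeatMillis;

    ElectionTimer(long electionTimeoutMillis, long nowMillis) {
        this.electionTimeoutMillis = electionTimeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Called whenever a valid heartbeat from the current leader arrives. */
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** A non-leader starts an election only after a full timeout without heartbeats. */
    boolean shouldStartElection(boolean isLeader, long nowMillis) {
        if (isLeader) {
            return false;
        }
        return nowMillis - lastHeartbeatMillis >= electionTimeoutMillis;
    }
}
```

With a 150 ms timeout created at time 0, a heartbeat at 100 ms means no election at 200 ms; only at 250 ms, a full timeout after the last heartbeat, does shouldStartElection return true. Raft additionally randomizes the timeout per node to reduce split votes, which this sketch leaves out.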

RocksDB bug

Under high concurrency, operations on RocksDB produce inconsistent data

    @SneakyThrows
    @Test
    public void test() throws RocksDBException {
        System.out.println(getLastIndex());
        System.out.println(get(getLastIndex()));

        System.out.println("-------begin--------");

        int i = 0;

        while (i < 10) {
            int finalI = i;
            CompletableFuture.runAsync(() -> {
                write(new Cmd("hello", "value"));
                System.out.println(finalI +" "+ getLastIndex());

                deleteOnStartIndex(getLastIndex());

                System.out.println(finalI +" "+ getLastIndex());
            });
            i++;
        }

        Thread.sleep(4000);

        System.out.println("----------end----------");

        System.out.println(getLastIndex());

        System.out.println(get(getLastIndex()));

        Thread.sleep(3000);

    }
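
In the test above, write, getLastIndex and deleteOnStartIndex each run as separate steps on several threads, so the last index read by one thread can be stale by the time it is used. One way to keep the entries and the last index consistent is to perform each read-modify-write under a single lock. The sketch below uses an in-memory map as a stand-in for the RocksDB store; the class LockedLog and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory log: every update of entries and lastIndex happens under one lock.
final class LockedLog {
    private final Map<Long, String> entries = new HashMap<>(); // stand-in for the RocksDB store
    private final ReentrantLock lock = new ReentrantLock();
    private long lastIndex = -1;

    /** Appends a command at lastIndex + 1 and returns the index it was stored at. */
    long append(String command) {
        lock.lock();
        try {
            long index = lastIndex + 1;
            entries.put(index, command);
            lastIndex = index;
            return index;
        } finally {
            lock.unlock();
        }
    }

    /** Removes all entries from startIndex (inclusive) to the end of the log. */
    void removeFrom(long startIndex) {
        lock.lock();
        try {
            long from = Math.max(0L, startIndex);
            if (from > lastIndex) {
                return; // nothing to remove
            }
            for (long i = from; i <= lastIndex; i++) {
                entries.remove(i);
            }
            lastIndex = from - 1;
        } finally {
            lock.unlock();
        }
    }

    long lastIndex() {
        lock.lock();
        try {
            return lastIndex;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return entries.size();
        } finally {
            lock.unlock();
        }
    }
}
```

Because each operation holds the lock for its whole read-modify-write, the number of stored entries always equals lastIndex + 1, regardless of how the threads interleave. A caller that needs "append, then delete exactly what I appended" would pass the index returned by append rather than re-reading lastIndex.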

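Startup failed when following the steps; a RocksDBException was thrown.

I started the nodes by following the steps in the article, and the error occurred when starting the second application.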

Exception in thread "main" java.lang.ExceptionInInitializerError
	at cn.think.in.java.impl.DefaultStateMachine.getInstance(DefaultStateMachine.java:72)
	at cn.think.in.java.constant.StateMachineSaveType.<clinit>(StateMachineSaveType.java:32)
	at cn.think.in.java.RaftNodeBootStrap.boot(RaftNodeBootStrap.java:58)
	at cn.think.in.java.RaftNodeBootStrap.main(RaftNodeBootStrap.java:38)
Caused by: java.lang.RuntimeException: org.rocksdb.RocksDBException: Failed to create lock file: ./rocksDB-raft/null/stateMachine/LOCK: Another program is using this file; the process [message truncated]
	at cn.think.in.java.impl.DefaultStateMachine.<init>(DefaultStateMachine.java:67)
	at cn.think.in.java.impl.DefaultStateMachine.<init>(DefaultStateMachine.java:36)
	at cn.think.in.java.impl.DefaultStateMachine$DefaultStateMachineLazyHolder.<clinit>(DefaultStateMachine.java:88)
	... 4 more
Caused by: org.rocksdb.RocksDBException: Failed to create lock file: ./rocksDB-raft/null/stateMachine/LOCK: Another program is using this file; the process [message truncated]
	at org.rocksdb.RocksDB.open(Native Method)
	at org.rocksdb.RocksDB.open(RocksDB.java:231)
	at cn.think.in.java.impl.DefaultStateMachine.<init>(DefaultStateMachine.java:65)
	... 6 more
