Giter VIP home page Giter VIP logo

Comments (9)

seeflood avatar seeflood commented on August 15, 2024

chinese version of this design doc:

0. 太长,不看

添加TryLock和Unlock API. 续租API有争议,第一版不加入续租API

1. 调研

系统 能否实现分布式锁 阻塞锁(基于watch) 可用性 写操作线性一致 sequencer(chubby论文里提出的feature) 续租
单机redis yes x 单点失效时,锁服务不可用 yes yes(need poc) yes
redis集群 yes x yes no. 故障转移可能导致丢锁 yes(need poc) yes
redis Redlock yes
nacos ×
consul yes
eureka ×
zookeeper yes yes yes 有fo能力,200 ms内完成选举 yes yes 使用zxid作为sequencer yes
etcd yes yes yes yes yes use revision yes lease.KeepAlive

可以看到能力还是有一定差异

2. High-level design

2.1. API

2.1.0. 设计原则

我们面临着很多诱惑,分布式锁其实有很多feature可以做(阻塞锁,可重入锁,读写锁,sequencer等等)

但是毕竟我们的目标是设计一套足够通用的API规范,那么在API制定上还是尽量保守些,start simple,先把简单、常用的功能抽象成API规范,等用户反馈后再考虑将更多功能抽象成API规范

2.1.1. TryLock/Unlock

最基础的加锁、解锁功能。TryLock非阻塞,如果没有抢到锁直接返回

proto:

message TryLockRequest {
  string store_name = 1;
  // resource_id is the lock key.
  string resource_id = 2;
  // client_id will be automatically generated if not set
  string client_id = 3;
  // expire is the time before expire
  int64 expire = 4;
}

message TryLockResponse {

  bool success = 1;

  string client_id = 2;
}

message UnlockRequest {
  string store_name = 1;
  // resource_id is the lock key.
  string resource_id = 2;

  string client_id = 3;
}

message UnlockResponse {
  enum Status {
    SUCCESS = 0;
    LOCK_UNEXIST = 1;
    LOCK_BELONG_TO_OTHERS = 2;
  }

  Status status = 1;
}

Q: expire字段的时间单位?

A: 秒。

Q: 能否强制让用户把秒数配大点,别配太小?

A: 没法在编译时或者启动时限制,算了

Q: 为啥不加metadata

A: 一开始尽量保守一些,等有人反馈有需求再加,或者实现组件过程中发现确实有需要再加

Q: 以后要支持sequencer、可重入锁之类的feature咋加?

A: 入参增加feature option,组件也要实现Support()接口

2.1.2. 续租

Solution A: add an API "LockKeepAlive"

rpc LockKeepAlive(stream LockKeepAliveRequest) returns (stream LockKeepAliveResponse){}
  
message LockKeepAliveRequest {
  string store_name = 1;
  // resource_id is the lock key.
  string resource_id = 2;

  string client_id = 3;
  // expire is the time to expire
  int64 expire = 4;
}

message LockKeepAliveResponse {
  enum Status {
    SUCCESS = 0;
    LOCK_UNEXIST = 1;
    LOCK_BELONG_TO_OTHERS = 2;
  }
  string store_name = 1;
  // resource_id is the lock key.
  string resource_id = 2;

  Status status = 3;
}

续租的入参、返回结果都是stream,这里参考etcd的实现,app和sidecar只需要维护一个连接,每次用锁需要续租的时候都复用该连接传递续租请求。

Q: 为啥不把续租作为一个stream参数塞到tryLock里?

A: 很多业务不用续租,让trylock尽量简单;

尽量单一职责,后续加上阻塞锁,可以公用续租API;

把带stream的部分拆开成一个接口,以便http实现简单。

Q:续租逻辑太复杂,能否让用户不感知?

A: sdk屏蔽掉这层逻辑,开个线程/协程/nodejs定时事件,自动续租

Solution B: 用户不感知续租逻辑,自动续租,App和sidecar维持统一心跳

缺点/难点:

  1. 使用统一心跳的话,难以定制心跳间隔

解法是保证心跳间隔低,比如1秒1次

  1. 如何保证可靠的故障检测?

例如以下java代码,unlock可能失败:

try{

}finally{
  lock.unlock()   
}

如果是单机锁,unlock能保证成功(除非整个jvm故障),但是unlock走网络调用的话可能失败。调用失败后,怎么保证心跳断掉?

这就要求业务在开发时要往心跳检测里上报一些细粒度的状态。

我们可以定义http callback接口,由actuator轮询检测,约定callback返回的数据结构为:

{
  "status": "UP",
  "details": {
    "lock": [
      {
        "resource_id": "res1",
        "client_id": "dasfdasfasdfa",
        "type": "unlock_fail"
      }
    ],
    "xxx": []
  }
}

应用要处理状态收集、上报、上报成功后清理、限制map容量(比如上报失败次数多了,map太大OOM怎么办),总归要求app实现一些复杂逻辑,也要放sdk里

  1. 这个实现其实和续租一样的,都是开一个单独的连接做状态管理,使用过程中有需要就通过这个公用连接上报状态。
  2. API spec依赖心跳逻辑。依赖心跳间隔、心跳返回的数据结构。相当于API spec依赖了Layotto的实现,除非我们能把心跳实现也标准化掉(包括间隔、返回的数据结构等)

结论

目前大家对续租方案有不同意见,一期先不实现续租功能。

个人倾向A方案,sdk屏蔽掉续租逻辑。虽然用grpc直接调需要处理续租逻辑,但是续租算是分布式锁常用方案,开发者的理解成本低。

抛出来看大家意见

2.2. 组件

2.2.1. 如何处理“组件不支持某个feature option”

模拟>运行报错>ignore

特殊情况才ignore,比如consistency传了是eventual consistency,但是存储系统本身是强一致的,可以ignore这个option

2.2.2. 如何处理“组件不支持某个API”

模拟>运行报错

以事务API举例

2.2.3. 选型

一期选择单机redis实现

2.2.4. "运行时报错"式设计会破坏可移植性,如何让用户更简单的评估能否移植?

A. 提供文档,文档上每种组件给的保证、feature不同,让用户自己评估

B. 在sidecar的配置里声明API支持的feature option,如果sidecar启动发现组件不match,启动时报错

C. sidecar打运行时日志,自动统计app用了哪些feature,待需要移植时进行日志分析

D. 先不管

E. 做一个静态分析工具,自动分析可移植性

结论:选A,因为简单

3. Future work

  • 可重入

会有一些计数逻辑。需要考虑是所有锁默认支持可重入,还是传参里面加个feature option、标识用户需要可重入

  • 阻塞锁
  • sequencer

4. Reference

How to do distributed locking

The Chubby lock service for loosely-coupled distributed systems

https://www.jianshu.com/p/6e72e3ee5623

http://zhangtielei.com/posts/blog-redlock-reasoning.html

http://zhangtielei.com/posts/blog-redlock-reasoning-part2.html

from layotto.

nobodyiam avatar nobodyiam commented on August 15, 2024

Do we use the client_id to indicate the lock owner? What would happen if different applications pass the same client_id?

from layotto.

seeflood avatar seeflood commented on August 15, 2024

Do we use the client_id to indicate the lock owner?

@nobodyiam yes,we use the client_id to indicate the lock owner.

What would happen if different applications pass the same client_id?

case 1. If two apps with different app-id pass the same client_id,they won't conflict because client_id is grouped by 'app-id ',while 'app-id' is configurated in sidecar's static config(configurated in config.json or passed as parameters at startup)

case 2.If two apps with same app-id pass the same client_id,they will conflict and the second app will obtained the same lock already used by the first app.Then the correctness property will be broken.
So user has to care about the uniqueness property of client_id.

We let client_id field optional.If client_id is blank,the sidecar will generate a uuid for the request.

why I don't remove this field in the request struct

The reason why I don't remove this field in the request struct is that when reentrant lock is needed,the existing client-id is required to identify client and check "whether this client can reenter this lock "

from layotto.

seeflood avatar seeflood commented on August 15, 2024

Again,this design assumes that every lock belongs to an 'app-id',like the configuration API.
But I do know that in some legacy systems,different apps with different 'app-id' use same redis to share same lock.
Although the scenarios exist,the use case is dangerous and not reasonable.So we won't take these scenarios into consideration currently.

from layotto.

nobodyiam avatar nobodyiam commented on August 15, 2024

Is it better to use the name lock_owner instead of client_id?

from layotto.

seeflood avatar seeflood commented on August 15, 2024

Is it better to use the name lock_owner instead of client_id?

Ok,I will change it to lock_owner if you guys think the name client_id is weird.
By the way,the client_id (or lock_owner) is per request,not per process,so it is different for each request,which aims to prevent multi-thread in the process trying the same lock concurrently.
I will add these conversation as comment in the proto file.

from layotto.

seeflood avatar seeflood commented on August 15, 2024

Hi guys,I submit a pull request to implement this API.

There are some changes:

  1. change 'client_id' to 'lock_owner'
  2. The 'lock_owner' in TryLockRequest will not be optional.User must generate it themselves (or our sdk will generate a uuid automatically)
  3. remove 'client_id' in TryLockResponse

I put the reason in the comment:

message TryLockRequest {
  string store_name = 1;
  // resource_id is the lock key.
  string resource_id = 2;
  // lock_owner indicate the identifier of lock owner.
  // This field is required.You can generate a uuid as lock_owner.For example,in golang:
  //
  // req.LockOwner = uuid.New().String()
  //
  // This field is per request,not per process,so it is different for each request,
  // which aims to prevent multi-thread in the same process trying the same lock concurrently.
  //
  // The reason why we don't make it automatically generated is:
  // 1. If it is automatically generated,there must be a 'my_lock_owner_id' field in the response.
  // This name is so weird that we think it is inappropriate to put it into the api spec
  // 2. If we change the field 'my_lock_owner_id' in the response to 'lock_owner',which means the current lock owner of this lock,
  // we find that in some lock services users can't get the current lock owner.Actually users don't need it at all.
  // 3. When reentrant lock is needed,the existing lock_owner is required to identify client and check "whether this client can reenter this lock".
  // So this field in the request shouldn't be removed.
  string lock_owner = 3;

  // expire is the time before expire.The time unit is second.
  int32 expire = 4;
}

message TryLockResponse {

  bool success = 1;
}

from layotto.

github-actions avatar github-actions commented on August 15, 2024

This issue has been automatically marked as stale because it has not had recent activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue or help wanted) or other activity occurs. Thank you for your contributions.

from layotto.

github-actions avatar github-actions commented on August 15, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue or help wanted. Thank you for your contributions.

from layotto.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.