
etcd's Introduction

Playing with etcd

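etcd is in effect an alternative to Apache ZooKeeper (originally a Hadoop subproject), and ZooKeeper in turn is an open-source imitation of Google Chubby. Chubby is described in the paper "The Chubby Lock Service for Loosely-Coupled Distributed Systems". The paper calls Chubby a lock service, but a simpler way to think of it is as a key-value store that resembles a distributed file system such as GFS, except that, for performance reasons, each file etcd maintains should be kept under 1 MB.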

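Documentation

The commonly used etcd version today is 2.x, which differs from the earlier 1.x in many ways. For example, etcd 1.x listens on ports 4001 (clients) and 7001 (peers), while etcd2 listens on ports 2379 (clients) and 2380 (peers). See the etcd2 documentation for the details.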

Installation

The Releases list on etcd's GitHub page includes installation instructions.

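Single-node deployment

If you do not need uninterrupted service backed by the Raft protocol, the simplest etcd setup is a single process. It listens on local port 2379, and standard tools such as curl can talk to that port to write or read key-value pairs.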
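As a rough illustration of talking to that port with curl, the sketch below wraps etcd's v2 keys HTTP API in three small shell functions. The function names and the ETCD_ENDPOINT variable are made up for this example, and it assumes an etcd process is already listening on 127.0.0.1:2379.

```shell
# Assumed endpoint: a single local etcd process on the default client port.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://127.0.0.1:2379}"

# etcd_key_url KEY: print the v2 keys API URL for KEY (hypothetical helper).
# A leading slash on KEY is dropped so /hello and hello map to the same URL.
etcd_key_url() {
    printf '%s/v2/keys/%s\n' "$ETCD_ENDPOINT" "${1#/}"
}

# etcd_put KEY VALUE: write a key-value pair with an HTTP PUT.
etcd_put() {
    curl -s -X PUT "$(etcd_key_url "$1")" --data-urlencode "value=$2"
}

# etcd_get KEY: read the key back; etcd answers with a JSON document.
etcd_get() {
    curl -s "$(etcd_key_url "$1")"
}
```

With etcd running, `etcd_put /hello Message` followed by `etcd_get /hello` should return a JSON document that contains the stored value.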

The following example downloads and starts an etcd process, then uses the etcdctl program to access this single-node deployment:

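VER=v2.3.6   # release tag; pick one from the Releases page
wget -c https://github.com/coreos/etcd/releases/download/$VER/etcd-$VER-darwin-amd64.zip
unzip etcd-$VER-darwin-amd64.zip
ln -s etcd-$VER-darwin-amd64 etcd   # stable path ./etcd for the commands below

# Run etcd in the background, sending both stdout and stderr to the log file
(./etcd/etcd > ./single-node.log 2>&1) &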

./etcd/etcdctl set /hello Message
./etcd/etcdctl get /hello

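In the example above, etcd and etcdctl both communicate over the default port 2379. For non-default ports, see the local three-process example in the static cluster section below.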

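Multi-node deployment

For fault tolerance and uninterrupted service, we usually start several etcd processes on several machines. Each process talks to client programs on local port 2379, and the etcd processes communicate and coordinate with one another on port 2380. If the current leader process dies, the remaining processes use the Raft protocol to elect a new leader.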

There are two mechanisms for multi-node deployment: static and discovery.

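Static cluster

If we can decide which machines run etcd and which port each etcd process uses, things are simple. So that the etcd processes know about one another, we pass each process the command-line flag --initial-cluster, which lists every process in the etcd cluster, for example: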

infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380

Here infra0, infra1, and infra2 are the names of the three etcd processes, each set with the command-line flag --name.

In addition, each etcd process must be told which of these members it is, by setting --initial-advertise-peer-urls, for example:

--initial-advertise-peer-urls http://10.0.1.10:2380

For more on the static approach, see the corresponding document from CoreOS.
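The --initial-cluster value is easy to mistype by hand, so it can help to generate it. The sketch below is one possible way to do that; the helper name initial_cluster and its argument layout are assumptions made for illustration, not part of etcd.

```shell
# initial_cluster PEER_PORT NAME=HOST...: print an --initial-cluster value.
# Each NAME=HOST argument becomes NAME=http://HOST:PEER_PORT.
initial_cluster() {
    port="$1"; shift
    out=""
    for member in "$@"; do
        name="${member%%=*}"   # text before the first '='
        host="${member#*=}"    # text after the first '='
        out="${out:+$out,}${name}=http://${host}:${port}"
    done
    printf '%s\n' "$out"
}

initial_cluster 2380 infra0=10.0.1.10 infra1=10.0.1.11 infra2=10.0.1.12
```

The call at the end prints the same three-member string shown earlier in this section, which every member would receive unchanged.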

To start a three-process etcd cluster on the local machine, open three terminal windows and enter the following three startup commands, one per window:

`pwd`/etcd/etcd \
    --name infra0 \
    --initial-advertise-peer-urls http://127.0.0.1:7001 \
    --listen-peer-urls http://127.0.0.1:7001 \
    --listen-client-urls http://127.0.0.1:4001 \
    --advertise-client-urls http://127.0.0.1:4001 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-cluster infra0=http://127.0.0.1:7001,infra1=http://127.0.0.1:7002,infra2=http://127.0.0.1:7003 \
    --initial-cluster-state new 

`pwd`/etcd/etcd \
    --name infra1 \
    --initial-advertise-peer-urls http://127.0.0.1:7002 \
    --listen-peer-urls http://127.0.0.1:7002 \
    --listen-client-urls http://127.0.0.1:4002 \
    --advertise-client-urls http://127.0.0.1:4002 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-cluster infra0=http://127.0.0.1:7001,infra1=http://127.0.0.1:7002,infra2=http://127.0.0.1:7003 \
    --initial-cluster-state new 

`pwd`/etcd/etcd \
    --name infra2 \
    --initial-advertise-peer-urls http://127.0.0.1:7003 \
    --listen-peer-urls http://127.0.0.1:7003 \
    --listen-client-urls http://127.0.0.1:4003 \
    --advertise-client-urls http://127.0.0.1:4003 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-cluster infra0=http://127.0.0.1:7001,infra1=http://127.0.0.1:7002,infra2=http://127.0.0.1:7003 \
    --initial-cluster-state new 

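In this example all three etcd processes run on the same machine, so they cannot all use the default port 2379 for client traffic and the default port 2380 for traffic between etcd processes. Instead, the first process uses port 4001 for clients and 7001 for the other etcd processes; likewise, the second uses 4002 and 7002, and the third uses 4003 and 7003.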
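The port scheme follows a simple pattern: member number i gets client port 4001 + i and peer port 7001 + i. A minimal sketch of that arithmetic follows; the helper name local_member_flags is hypothetical, and it prints only the per-member flags, not the shared --initial-cluster ones.

```shell
# local_member_flags INDEX: print the per-member part of the command line
# for local member INDEX (0-based), using client port 4001+INDEX and
# peer port 7001+INDEX as in the three commands above.
local_member_flags() {
    idx="$1"
    client=$((4001 + idx))
    peer=$((7001 + idx))
    printf 'etcd --name infra%s --listen-peer-urls http://127.0.0.1:%s --listen-client-urls http://127.0.0.1:%s\n' \
        "$idx" "$peer" "$client"
}

for i in 0 1 2; do
    local_member_flags "$i"
done
```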

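Note that this cluster needs at least two of its processes running to serve requests, because the Raft protocol requires agreement from a majority of members to elect a leader. A cluster of n members needs floor(n/2) + 1 of them. With only two members the majority is 2, so a two-member cluster cannot survive the loss of either one.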
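The majority rule can be worked out with integer arithmetic. The sketch below uses two hypothetical helpers, quorum and tolerated_failures, to print the majority size and the number of members that may fail for a few cluster sizes.

```shell
# quorum N: smallest number of members that forms a majority of N.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: members that may fail while a majority survives.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 2 members this prints a quorum of 2 and 0 tolerated failures, while 3 members give a quorum of 2 and 1 tolerated failure, which is why odd cluster sizes are the usual choice.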

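Dynamic discovery

Unfortunately, most of the time etcd processes are launched by a cluster management system, so we know in advance neither which machines will be used nor which port each etcd process will get. In that case we rely on a third-party service: discovery. We can run this code: https://github.com/coreos/discovery.etcd.io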

git clone https://github.com/coreos/discovery.etcd.io
cd discovery.etcd.io

go run third_party.go build github.com/coreos/discovery.etcd.io

./discovery.etcd.io --addr=:8087

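The discovery service maintains a mapping from a cluster ID to the etcd processes in that cluster. When creating an etcd cluster, we request the service's /new URL, which adds an entry for the new cluster to the mapping and returns that entry's ID. For example, the following command creates a cluster of three etcd processes: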

$ curl http://localhost:8087/new?size=3
https://discovery.etcd.io/9b14ae6ce7764df5464542caface175d

Here 3 is the number of etcd processes we expect, and 9b14ae6ce7764df5464542caface175d is the ID of our new cluster.
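The response above names discovery.etcd.io even though the request went to the local service, so the ID has to be combined with the local address before use. A small sketch of that step, using shell parameter expansion only; the helper names cluster_id and local_discovery_url are assumptions for illustration.

```shell
# cluster_id URL: print the last path component of a discovery URL.
cluster_id() {
    printf '%s\n' "${1##*/}"
}

# local_discovery_url BASE URL: attach the ID from URL to our own service at BASE.
local_discovery_url() {
    printf '%s/%s\n' "${1%/}" "$(cluster_id "$2")"
}

returned="https://discovery.etcd.io/9b14ae6ce7764df5464542caface175d"
cluster_id "$returned"
local_discovery_url http://localhost:8087 "$returned"
```

The second line of output is the value that would be passed to etcd's --discovery flag when the discovery service runs on localhost:8087.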

We can then modify the etcd startup commands from the previous section: the --initial-cluster, --initial-cluster-token, and --initial-cluster-state flags are no longer needed; instead we pass --discovery http://localhost:8087/9b14ae6ce7764df5464542caface175d:

$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --discovery http://localhost:8087/9b14ae6ce7764df5464542caface175d
  
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
  --listen-peer-urls http://10.0.1.11:2380 \
  --listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.11:2379 \
  --discovery http://localhost:8087/9b14ae6ce7764df5464542caface175d
  
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
  --listen-peer-urls http://10.0.1.12:2380 \
  --listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.12:2379 \
  --discovery http://localhost:8087/9b14ae6ce7764df5464542caface175d

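This way each etcd process registers itself with the discovery service at startup. Once enough processes have registered (3 in the example above), the etcd cluster begins serving. More processes may register afterwards, but they do not become full members of the cluster; they only work as reverse proxies.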

etcd's Issues

"failed to send out heartbeat on time" when using static mode

logs:

Jun 12 03:04:01 hlg-wuyi-coreos-01 etcd2[4187]: server is likely overloaded
Jun 12 03:04:01 hlg-wuyi-coreos-01 etcd2[4187]: failed to send out heartbeat on time (deadline exceeded for 86.557859ms)
Jun 12 03:04:01 hlg-wuyi-coreos-01 etcd2[4187]: server is likely overloaded
Jun 12 03:04:32 hlg-wuyi-coreos-01 etcd2[4187]: failed to send out heartbeat on time (deadline exceeded for 10.632199ms)
Jun 12 03:04:32 hlg-wuyi-coreos-01 etcd2[4187]: server is likely overloaded
Jun 12 03:04:32 hlg-wuyi-coreos-01 etcd2[4187]: failed to send out heartbeat on time (deadline exceeded for 10.801399ms)
Jun 12 03:04:32 hlg-wuyi-coreos-01 etcd2[4187]: server is likely overloaded

etcdctl works fine:

# etcdctl member list
251125c4d7f71035: name=hlg-wuyi-coreos-03 peerURLs=http://172.24.3.221:2380 clientURLs=http://172.24.3.221:2379 isLeader=false
8daf20902656d961: name=hlg-wuyi-coreos-02 peerURLs=http://172.24.3.220:2380 clientURLs=http://172.24.3.220:2379 isLeader=false
e06797cebe158b25: name=hlg-wuyi-coreos-01 peerURLs=http://172.24.3.150:2380 clientURLs=http://172.24.3.150:2379 isLeader=true
# etcdctl ls
/coreos.com

All etcd servers exit in static mode

Environment

bjlg-49p45-k8s-01 static-etcd2 # cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1010.6.0
VERSION_ID=1010.6.0
BUILD_ID=2016-06-28-0910
PRETTY_NAME="CoreOS 1010.6.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
bjlg-49p45-k8s-01 static-etcd2 # /usr/bin/etcd2 -version
etcd Version: 2.3.1
Git SHA: 2b67f52
Go Version: go1.5.3
Go OS/Arch: linux/amd64

Deploy bash script: 3 etcd members named infra0, infra1 and infra2

export public_ipv4_infra0=192.168.49.45
export public_ipv4_infra1=192.168.49.46
export public_ipv4_infra2=192.168.49.47
export name=infra0
export NITIAL_CLUSTER="infra0=http://$public_ipv4_infra0:2380,infra1=http://$public_ipv4_infra1:2380,infra2=http://$public_ipv4_infra2:2380"
export ETCD_INITIAL_CLUSTER_STATE=new
/usr/bin/etcd2 --name $name --initial-advertise-peer-urls http://$public_ipv4_infra0:2380 \
  --listen-peer-urls http://$public_ipv4_infra0:2380 \
  --listen-client-urls http://$public_ipv4_infra0:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://$public_ipv4_infra0:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://$public_ipv4_infra0:2380,infra1=http://$public_ipv4_infra1:2380,infra2=http://$public_ipv4_infra2:2380 \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd2/$name \

All processes exit after a period of time. Here are the logs:

infra0 log

2016-07-19 14:29:24.289951 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-07-19 14:29:24.859851 I | raft: f3aefe338854c13a is starting a new election at term 3419
2016-07-19 14:29:24.859887 I | raft: f3aefe338854c13a became candidate at term 3420
2016-07-19 14:29:24.859910 I | raft: f3aefe338854c13a received vote from f3aefe338854c13a at term 3420
2016-07-19 14:29:24.859923 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to e344a1396881242e at term 3420
2016-07-19 14:29:24.859935 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to eff15145dca03741 at term 3420
2016-07-19 14:29:26.000041 E | etcdhttp: got unexpected response error (etcdserver: request timed out) [merged 3 repeated lines in 1.71s]
2016-07-19 14:29:26.159758 I | raft: f3aefe338854c13a is starting a new election at term 3420
2016-07-19 14:29:26.159803 I | raft: f3aefe338854c13a became candidate at term 3421
2016-07-19 14:29:26.159814 I | raft: f3aefe338854c13a received vote from f3aefe338854c13a at term 3421
2016-07-19 14:29:26.159828 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to e344a1396881242e at term 3421
2016-07-19 14:29:26.159841 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to eff15145dca03741 at term 3421
2016-07-19 14:29:27.559756 I | raft: f3aefe338854c13a is starting a new election at term 3421
2016-07-19 14:29:27.559806 I | raft: f3aefe338854c13a became candidate at term 3422
2016-07-19 14:29:27.559818 I | raft: f3aefe338854c13a received vote from f3aefe338854c13a at term 3422
2016-07-19 14:29:27.559831 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to e344a1396881242e at term 3422
2016-07-19 14:29:27.559844 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to eff15145dca03741 at term 3422
2016-07-19 14:29:29.059778 I | raft: f3aefe338854c13a is starting a new election at term 3422
2016-07-19 14:29:29.059813 I | raft: f3aefe338854c13a became candidate at term 3423
2016-07-19 14:29:29.059824 I | raft: f3aefe338854c13a received vote from f3aefe338854c13a at term 3423
2016-07-19 14:29:29.059838 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to e344a1396881242e at term 3423
2016-07-19 14:29:29.059850 I | raft: f3aefe338854c13a [logterm: 3243, index: 262126] sent vote request to eff15145dca03741 at term 3423

infra1 log

2016-07-19 10:41:53.820550 I | fileutil: purged file /var/lib/etcd2/infra1/member/snap/0000000000000caa-0000000000029f36.snap successfully
2016-07-19 11:30:29.568414 I | raft: e344a1396881242e [term: 3242] received a MsgVote message with higher term from eff15145dca03741 [term: 3243]
2016-07-19 11:30:29.568457 I | raft: e344a1396881242e became follower at term 3243
2016-07-19 11:30:29.568474 I | raft: e344a1396881242e [logterm: 3242, index: 230649, vote: 0] voted for eff15145dca03741 [logterm: 3242, index: 230649] at term 3243
2016-07-19 11:30:29.568485 I | raft: raft.node: e344a1396881242e lost leader f3aefe338854c13a at term 3243
2016-07-19 11:30:29.578578 I | raft: raft.node: e344a1396881242e elected leader eff15145dca03741 at term 3243
2016-07-19 11:30:30.119368 I | raft: e344a1396881242e [term: 3243] ignored a MsgApp message with lower term from f3aefe338854c13a [term: 3242]
2016-07-19 11:30:30.119402 I | raft: e344a1396881242e [term: 3243] ignored a MsgApp message with lower term from f3aefe338854c13a [term: 3242]
2016-07-19 11:37:03.513093 I | etcdserver: start to snapshot (applied: 231836, lastsnap: 221835)
2016-07-19 11:37:03.545078 I | etcdserver: saved snapshot at index 231836
2016-07-19 11:37:03.545449 I | etcdserver: compacted raft log at 226836
2016-07-19 11:37:23.839884 I | fileutil: purged file /var/lib/etcd2/infra1/member/snap/0000000000000caa-000000000002c647.snap successfully
2016-07-19 12:32:38.217257 I | etcdserver: start to snapshot (applied: 241837, lastsnap: 231836)
2016-07-19 12:32:38.249246 I | etcdserver: saved snapshot at index 241837
2016-07-19 12:32:38.249525 I | etcdserver: compacted raft log at 236837
2016-07-19 12:32:53.859343 I | fileutil: purged file /var/lib/etcd2/infra1/member/snap/0000000000000caa-000000000002ed58.snap successfully
2016-07-19 13:28:11.867890 I | etcdserver: start to snapshot (applied: 251838, lastsnap: 241837)
2016-07-19 13:28:11.899176 I | etcdserver: saved snapshot at index 251838
2016-07-19 13:28:11.899461 I | etcdserver: compacted raft log at 246838
2016-07-19 13:28:23.876017 I | fileutil: purged file /var/lib/etcd2/infra1/member/snap/0000000000000caa-0000000000031469.snap successfully
2016-07-19 14:23:45.867896 I | etcdserver: start to snapshot (applied: 261839, lastsnap: 251838)
2016-07-19 14:23:45.900204 I | etcdserver: saved snapshot at index 261839
2016-07-19 14:23:45.900482 I | etcdserver: compacted raft log at 256839
2016-07-19 14:23:53.894054 I | fileutil: purged file /var/lib/etcd2/infra1/member/snap/0000000000000caa-0000000000033b7a.snap successfully

infra2 log

2016-07-19 14:25:00.608974 W | rafthttp: the connection to peer e344a1396881242e is unhealthy
2016-07-19 14:25:00.842435 W | etcdserver: failed to reach the peerURL(http://192.168.49.46:2380) of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:00.842488 W | etcdserver: cannot get the version of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:04.843764 W | etcdserver: failed to reach the peerURL(http://192.168.49.46:2380) of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:04.843825 W | etcdserver: cannot get the version of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:08.845047 W | etcdserver: failed to reach the peerURL(http://192.168.49.46:2380) of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:08.845106 W | etcdserver: cannot get the version of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:12.846185 W | etcdserver: failed to reach the peerURL(http://192.168.49.46:2380) of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:12.846248 W | etcdserver: cannot get the version of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:16.847425 W | etcdserver: failed to reach the peerURL(http://192.168.49.46:2380) of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:16.847487 W | etcdserver: cannot get the version of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:20.848772 W | etcdserver: failed to reach the peerURL(http://192.168.49.46:2380) of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:20.848834 W | etcdserver: cannot get the version of member e344a1396881242e (Get http://192.168.49.46:2380/version: dial tcp 192.168.49.46:2380: getsockopt: connection refused)
2016-07-19 14:25:23.777587 N | osutil: received terminated signal, shutting down...
