etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system

Home Page: https://etcd.io

License: Apache License 2.0

Topics: cncf, consensus, database, distributed-database, distributed-systems, etcd, go, key-value, kubernetes, raft

etcd's Introduction

etcd


Note: The main branch may be in an unstable or even broken state during development. For stable versions, see releases.

etcd Logo

etcd is a distributed reliable key-value store for the most critical data of a distributed system, with a focus on being:

  • Simple: well-defined, user-facing API (gRPC)
  • Secure: automatic TLS with optional client cert authentication
  • Fast: benchmarked 10,000 writes/sec
  • Reliable: properly distributed using Raft

etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.

etcd is used in production by many companies, and the development team stands behind it in critical deployment scenarios, where etcd is frequently teamed with applications such as Kubernetes, locksmith, vulcand, Doorman, and many others. Reliability is further ensured by rigorous robustness testing.

See etcdctl for a simple command-line client.

etcd reliability is important

Original image credited to xkcd.com/2347, alterations by Josh Berkus.

Maintainers

Maintainers strive to shape an inclusive open source project culture where users are heard and contributors feel respected and empowered. Maintainers aim to build productive relationships across different companies and disciplines. Read more about the maintainers' role and responsibilities.

Getting started

Getting etcd

The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, and Docker on the release page.

For more installation guides, please check out play.etcd.io and operating etcd.

Running etcd

First start a single-member cluster of etcd.

If etcd is installed using the pre-built release binaries, run it from the installation location as below:

/tmp/etcd-download-test/etcd

If etcd is moved to a directory on the system PATH, it can be run directly:

mv /tmp/etcd-download-test/etcd /usr/local/bin/
etcd

This will bring up etcd listening on port 2379 for client communication and on port 2380 for server-to-server communication.

Next, let's set a single key, and then retrieve it:

etcdctl put mykey "this is awesome"
etcdctl get mykey

etcd is now running and serving client requests. For more, please check out the etcd documentation.

etcd TCP ports

The official etcd ports are 2379 for client requests, and 2380 for peer communication.

Running a local etcd cluster

First install goreman, which manages Procfile-based applications.

Our Procfile script will set up a local example cluster. Start it with:

goreman start

This will bring up three etcd members (infra1, infra2, and infra3) and, optionally, an etcd grpc-proxy, all running locally to form a cluster.

Every cluster member and proxy accepts key-value reads and writes.

Follow the comments in the Procfile script to add a learner node to the cluster.

Install etcd client v3

go get go.etcd.io/etcd/client/v3
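
For a quick sanity check, here is a minimal Go program using the v3 client, mirroring the etcdctl put/get example above. It assumes an etcd server listening on localhost:2379; error handling is kept deliberately blunt:

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a local single-member cluster on the default client port.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Equivalent to: etcdctl put mykey "this is awesome"
	if _, err := cli.Put(ctx, "mykey", "this is awesome"); err != nil {
		panic(err)
	}

	// Equivalent to: etcdctl get mykey
	resp, err := cli.Get(ctx, "mykey")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s: %s\n", kv.Key, kv.Value)
	}
}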

Next steps

Now it's time to dig into the full etcd API and other guides.

Contact

Community meetings

etcd contributors and maintainers meet every week at 11:00 AM (USA Pacific) on Thursday and meetings alternate between community meetings and issue triage meetings. Meeting agendas are recorded in a shared Google doc and everyone is welcome to suggest additional topics or other agendas.

Issue triage meetings are aimed at getting through our backlog of PRs and Issues. Triage meetings are open to any contributor; you don't have to be a reviewer or approver to help out! They can also be a good way to get started contributing.

The meeting lead role rotates among etcd maintainers and sig-etcd leads for each meeting and is recorded in a shared Google sheet.

Meeting recordings are uploaded to the official etcd YouTube channel.

Get calendar invitations by joining the etcd-dev mailing group.

Join the CNCF-funded Zoom channel: zoom.us/my/cncfetcdproject

Contributing

See CONTRIBUTING for details on setting up your development environment, submitting patches and the contribution workflow.

Please refer to community-membership.md for information on becoming an etcd project member. We welcome and look forward to your contributions to the project!

Please also refer to roadmap to get more details on the priorities for the next few major or minor releases.

Reporting bugs

See reporting bugs for details about reporting any issues. Before opening an issue please check it is not covered in our frequently asked questions.

Reporting a security vulnerability

See security disclosure and release process for details on how to report a security vulnerability and how the etcd team manages it.

Issue and PR management

See issue triage guidelines for details on how issues are managed.

See PR management for guidelines on how pull requests are managed.

etcd Emeritus Maintainers

These emeritus maintainers dedicated a part of their career to etcd and reviewed code, triaged bugs and pushed the project forward over a substantial period of time. Their contribution is greatly appreciated.

  • Fanmin Shi
  • Anthony Romano
  • Brandon Philips
  • Joe Betz
  • Gyuho Lee
  • Jingyi Hu
  • Xiang Li
  • Ben Darnell
  • Sam Batschelet
  • Piotr Tabor
  • Hitoshi Mitake

License

etcd is under the Apache 2.0 license. See the LICENSE file for details.


etcd's Issues

Index out of range in machines.go

I have a 2 node cluster on two separate machines. I just restarted one of the nodes and the server is now failing with the following:

[etcd] 21:51:08.733989 INFO Found node configuration in '/var/cache/etcd/state/info'. Ignoring flags
panic: runtime error: index out of range

goroutine 1 [running]:
main.getMachines(0x7b6858, 0x0, 0x0, 0x1)
        /Users/dev/etcd/src/github.com/coreos/etcd/machines.go:24 +0x300
main.(*raftServer).ListenAndServe(0xc2000d74d0)
        /Users/dev/etcd/src/github.com/coreos/etcd/raft_server.go:99 +0x235
main.main()
        /Users/dev/etcd/src/github.com/coreos/etcd/etcd.go:215 +0x76f

The conf file contains: {"commitIndex":2,"peers":[]}

I'm using the snapshot feature on both nodes, as recommended in the discussion of #162.

Looking into machines.go, it's creating a slice of length len(peers)+1 and trying to set [0] to the leader and [1] to self; [1] is out of range because len(peers) is zero.
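
A sketch of the shape of a fix (hypothetical code, not the actual machines.go): building the result by appending means an empty peers list can never cause an out-of-range write.

// Construct the machine list without indexing past the end of the slice:
// leader first, self second, then any peers.
func buildMachineList(leader, self string, peers []string) []string {
	machines := make([]string, 0, len(peers)+2)
	machines = append(machines, leader, self)
	return append(machines, peers...)
}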

Allow removal of directory keys

Using etcd 0.1.1.

Running the following commands reproduces the error:

# Add subkey (and thus adding "directory" key "a")
curl http://localhost:4001/v1/keys/a/b -d value=x

# Get subkey (check that it exists)
curl http://localhost:4001/v1/keys/a/b

# Remove subkey
curl -X DELETE http://localhost:4001/v1/keys/a/b

# Get a directory, returns "[]"
curl http://localhost:4001/v1/keys/a

# Delete directory (gets errorCode 100, key not found)
curl -X DELETE http://localhost:4001/v1/keys/a

# Get a directory, still returns "[]" ...
curl http://localhost:4001/v1/keys/a

The expected behavior is that a should be deleted with curl -X DELETE http://localhost:4001/v1/keys/a, obviously the key exists since we can GET it.

I'm using directories in my case for service discovery where a directory (example : nodes/b) is a new node in the cluster, and the subkeys (example: nodes/b/status) are properties of that node.

I'm using the existence of the directory key nodes/b as an existence check (I can obviously use subkeys as a workaround).

Let me know if there is an ETA for a fix on this. Or if there is something else I can use in the meantime.

Thanks for your awesome work so far btw 👍

307 Error Code

I tried to create a lot of keys on the server in one go. Initially it works fine, but after around 50 records I receive a 502 error.

Clarify architecture in releases

I just noticed that the release binaries are only named after platform and not architecture. Maybe clarifying architecture would make it easier for people to decide whether to build their own binaries or not? Just an idea...

Nil pointer dereference in raft_stats.go

Again, probably related to #176. When one node goes down, the other node starts panicking with:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x38 pc=0x40c3cc]

goroutine 1969 [running]:
main.(*raftPeerStats).Succ(0x0, 0x9f04812)
  /Users/dev/etcd/src/github.com/coreos/etcd/raft_stats.go:91 +0x1c
main.(*transporter).SendAppendEntriesRequest(0xc2000cf8c0, 0xc200117000, 0xc2000df540, 0xc200436e60, 0x0, ...)
  /Users/dev/etcd/src/github.com/coreos/etcd/transporter.go:84 +0x6d6
github.com/coreos/go-raft.(*Peer).sendAppendEntriesRequest(0xc2000df540, 0xc200436e60)
  /Users/dev/etcd/src/github.com/coreos/go-raft/peer.go:175 +0x247
github.com/coreos/go-raft.(*Peer).flush(0xc2000df540)
  /Users/dev/etcd/src/github.com/coreos/go-raft/peer.go:161 +0x2f3
github.com/coreos/go-raft.(*Peer).heartbeat(0xc2000df540, 0xc20019b900)
  /Users/dev/etcd/src/github.com/coreos/go-raft/peer.go:146 +0x3a7
created by github.com/coreos/go-raft.(*Peer).startHeartbeat
  /Users/dev/etcd/src/github.com/coreos/go-raft/peer.go:84 +0x78

Upon investigation, it probably has something to do with the empty peers collection from #176. The SendAppendEntriesRequest method tries to record stats about the peer, and the peer stats map lookup returns nil.

thisPeerStats, ok := r.peersStats[peer.Name]

// ... snip ...

if err != nil {
  debugf("Cannot send AppendEntriesRequest to %s: %s", u, err)
  if ok {
    thisPeerStats.Fail()
  }
} else {
  if ok {
    // fails here with thisPeerStats being nil
    thisPeerStats.Succ(end.Sub(start))
  }
}

r.peersStats[peer.Name] = thisPeerStats
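
A defensive sketch of one possible fix, assuming peersStats maps peer names to *raftPeerStats (a fragment, not the actual transporter code):

// Create the stats entry on first use so we never call methods on a
// nil *raftPeerStats, even if the peer joined before stats existed.
thisPeerStats, ok := r.peersStats[peer.Name]
if !ok || thisPeerStats == nil {
	thisPeerStats = &raftPeerStats{}
	r.peersStats[peer.Name] = thisPeerStats
}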

etcd machines storage keeps duplicates after restart

I'm not sure what the intention/usage of the v1/keys/_etcd/machines/ keys is, but the listing currently contains duplicates after a process restart.

[
  {"action":"GET","key":"/_etcd/machines/node1","value":"10.70.25.118,7001,4001","index":8},
  {"action":"GET","key":"/_etcd/machines/node3","value":"10.70.25.118,7002,4002","index":8},
  {"action":"GET","key":"/_etcd/machines/node4","value":"10.70.25.118,7003,4003","index":8},
  {"action":"GET","key":"/_etcd/machines/node5","value":"10.70.30.80,7002,4002","index":8},
  {"action":"GET","key":"/_etcd/machines/node6","value":"10.70.30.80,7001,4001","index":8},
  {"action":"GET","key":"/_etcd/machines/node8","value":"10.70.25.118,7001,4001","index":8}
]

node1 and node8 describe the same process.

The output of /machines is correct, as it just asks for the peers.

10.70.25.118:4001,10.70.25.118:4002,10.70.25.118:4003,10.70.30.80:4002,10.70.30.80:4001

I'd expect either some explanation of the intention of _etcd/machines/, or for keys to be removed from that list when a process goes down.

Distributed master - merging-capable algo?

I understand that the Raft algorithm is a convenient choice for simple distribution and leader election, but I was wondering if there are any mid-term plans for a leaderless algorithm like the one doozer uses?

While convenient, it still has some side effects: if half of the servers are unreachable (IDC issue, rack failure, network disruption, you name it), Raft prevents the election of a new leader, since a majority consensus cannot be reached. A merging-capable algorithm would prevent such a situation and still allow nodes to operate independently until the connectivity issue is resolved.

Either way, congrats on this very promising tool!

separate node identity from transport

Network topologies can be complex. For example, on the Rackspace cloud, nodes have a public IP and a servicenet IP.

Perhaps, in the case of a single datacenter cluster, you want the etcd servers talking over service net but presenting the client interface via their public IPs. This would mean we need to change a few interfaces:

First on the command line we should identify a node with an id and URLs that it is listening on:

./etcd -i <cluster unique name> -s https://service-net-hostname:8000 -c https://external-hostname:8000

This would also require a change in the machines interface:

curl -L http://127.0.0.1:4001/v1/keys/_etcd/machines
[{"action":"GET","key":"/machines/node1","value":"0.0.0.0,7001,4001","index":4},
{"action":"GET","key":"/machines/node3","value":"0.0.0.0,7002,4002","index":4},
{"action":"GET","key":"/machines/node4","value":"0.0.0.0,7003,4003","index":4}]

To:

curl -L http://127.0.0.1:4001/v1/keys/_etcd/machines
[{"action":"GET","key":"/machines/node1","value":"server=http://service-net1:7001&client=http://public-net:4001","index":4},
{"action":"GET","key":"/machines/node3","value":"server=http://service-net1:7003&client=http://public-net:4003","index":4},
{"action":"GET","key":"/machines/node4","value":"server=http://service-net1:7004&client=http://public-net:4004","index":4}]

In the future we will probably need to provide a list of public URLs and a list of server URLs. But for this initial version, let's just support one of each. The use case would be cross-datacenter connections; the list would be in priority order. Let's not worry about it for now, but keep it in mind.

RFC: versioning of the protocol

We want to be able to roll out a new version of etcd that understands a new protocol without upgrading every follower at once. To do this we should have a protocol version number that encapsulates any changes to etcd's internal protocols.

  1. Add a ProtocolVersion parameter to the Join command and store it in _etcd/machines
  2. If the master sees that _etcd/protocol is less than the ProtocolVersion supported by all machines in _etcd/machines, then it SHOULD commit the new version to _etcd/protocol
  3. All followers MUST speak the new protocol after this point.

All followers MUST be capable of speaking all older protocol versions. We may consider adding a SupportedProtocols field or something to loosen this requirement further down the road.
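
A small Go sketch of the leader-side rule in step 2 (hypothetical names and types; the real data lives under _etcd/machines and _etcd/protocol):

// proposeProtocolVersion returns the version the leader should commit to
// _etcd/protocol: the lowest ProtocolVersion among all joined machines,
// but only if that is newer than the currently committed version.
func proposeProtocolVersion(current int, machineVersions map[string]int) (int, bool) {
	lowest := -1
	for _, v := range machineVersions {
		if lowest == -1 || v < lowest {
			lowest = v
		}
	}
	if lowest > current {
		return lowest, true // safe to upgrade: every machine speaks it
	}
	return current, false
}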

Accept PUT requests as well as POST for changing keys.

As it stands, etcd accepts POST requests for changing keys like so:

curl -L http://127.0.0.1:4001/v1/keys/message -d value="Hello etcd"

@philips mentioned this was mostly for convenience. However, that's no excuse not to accept PUT as well, so we can follow REST guidelines.

Current behavior:

   ______                ____  _____
  / ____/___  ________  / __ \/ ___/
 / /   / __ \/ ___/ _ \/ / / /\__ \
/ /___/ /_/ / /  /  __/ /_/ /___/ /
\____/\____/_/   \___/\____//____/
core@localhost ~ $ curl -X POST http://127.0.0.1:4001/v1/keys/message -d value="Hello world"
{"action":"SET","key":"/message","value":"Hello world","newKey":true,"index":5}
core@localhost ~ $ curl -X GET http://127.0.0.1:4001/v1/keys/message
{"action":"GET","key":"/message","value":"Hello world","index":5}
core@localhost ~ $ curl -X POST http://127.0.0.1:4001/v1/keys/message -d value="Hello etcd"
{"action":"SET","key":"/message","prevValue":"Hello world","value":"Hello etcd","index":6}
core@localhost ~ $ curl -X GET http://127.0.0.1:4001/v1/keys/message
{"action":"GET","key":"/message","value":"Hello etcd","index":6}
core@localhost ~ $ curl -X PUT http://127.0.0.1:4001/v1/keys/message -d value="Hello potato"
core@localhost ~ $ curl -X GET http://127.0.0.1:4001/v1/keys/message
{"action":"GET","key":"/message","value":"Hello etcd","index":6}
core@localhost ~ $

Desired behavior:

   ______                ____  _____
  / ____/___  ________  / __ \/ ___/
 / /   / __ \/ ___/ _ \/ / / /\__ \
/ /___/ /_/ / /  /  __/ /_/ /___/ /
\____/\____/_/   \___/\____//____/
core@localhost ~ $ curl -X POST http://127.0.0.1:4001/v1/keys/message -d value="Hello world"
{"action":"SET","key":"/message","value":"Hello world","newKey":true,"index":5}
core@localhost ~ $ curl -X GET http://127.0.0.1:4001/v1/keys/message
{"action":"GET","key":"/message","value":"Hello world","index":5}
core@localhost ~ $ curl -X POST http://127.0.0.1:4001/v1/keys/message -d value="Hello etcd"
{"action":"SET","key":"/message","prevValue":"Hello world","value":"Hello etcd","index":6}
core@localhost ~ $ curl -X GET http://127.0.0.1:4001/v1/keys/message
{"action":"GET","key":"/message","value":"Hello etcd","index":6}
core@localhost ~ $ curl -X PUT http://127.0.0.1:4001/v1/keys/message -d value="Hello potato"
{"action":"SET","key":"/message","prevValue":"Hello etcd","value":"Hello potato","index":6}
core@localhost ~ $ curl -X GET http://127.0.0.1:4001/v1/keys/message
{"action":"GET","key":"/message","value":"Hello etcd","index":6}
core@localhost ~ $

Assignment to entry in nil map in store/watcher.go

This could be another symptom of #176, but I'm also seeing this a lot in my logs now:

runtime error: assignment to entry in nil map

It seems the WatcherHub watchers map is nil for some reason.

[etcd] 00:51:11.265706 INFO Found node configuration in '/var/cache/etcd/state/info'. Ignoring flags
[etcd] 00:51:11.429209 WARN the entire cluster is down! this machine will restart the cluster.
[etcd] 00:51:11.429582 INFO etcd server [name app, listen on 0.0.0.0:4001, advertised url http://myhostname:4001]
[etcd] 00:51:11.429869 INFO raft server [name app, listen on 0.0.0.0:7001, advertised url http://myhostname:7001]
2013/09/18 00:51:11 http: panic serving myip:51101: runtime error: assignment to entry in nil map
goroutine 30 [running]:
net/http.func·007()
  /usr/local/go/src/pkg/net/http/server.go:1022 +0xac
github.com/coreos/etcd/store.(*WatcherHub).addWatcher(0xc200000338, 0xc200429000, 0x17, 0xc2003073e0, 0x0, ...)
  /Users/dev/etcd/src/github.com/coreos/etcd/store/watcher.go:55 +0x282
github.com/coreos/etcd/store.(*Store).AddWatcher(0xc2000e0180, 0xc20043f84e, 0x16, 0xc2003073e0, 0x0, ...)
  /Users/dev/etcd/src/github.com/coreos/etcd/store/store.go:497 +0x71
main.(*WatchCommand).Apply(0xc2004dbfe0, 0xc2000ff000, 0x7fad1b710b90, 0x3, 0x3, ...)
  /Users/dev/etcd/src/github.com/coreos/etcd/command.go:109 +0x89
main.WatchHttpHandler(0xc200308240, 0xc200262c40, 0xc2004dcea0, 0x4f9282, 0xc2000d0cd0, ...)
  /Users/dev/etcd/src/github.com/coreos/etcd/etcd_handlers.go:287 +0x264
main.errorHandler.ServeHTTP(0x7b6838, 0xc200308240, 0xc200262c40, 0xc2004dcea0)
  /Users/dev/etcd/src/github.com/coreos/etcd/etcd_handlers.go:53 +0x7d
net/http.(*ServeMux).ServeHTTP(0xc2000d0cc0, 0xc200308240, 0xc200262c40, 0xc2004dcea0)
  /usr/local/go/src/pkg/net/http/server.go:1416 +0x11d
net/http.serverHandler.ServeHTTP(0xc2000e0200, 0xc200308240, 0xc200262c40, 0xc2004dcea0)
  /usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc200313900)
  /usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
  /usr/local/go/src/pkg/net/http/server.go:1564 +0x266

I'll investigate further too.
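
If the map really is nil, e.g. because this WatcherHub was restored from the on-disk state file rather than built by its constructor, a lazy-initialization guard in addWatcher would avoid the panic. A fragment sketch; the field and type names here are hypothetical:

// Lazily initialize the watchers map so a hub restored from disk
// (bypassing the constructor) cannot panic on the first insert.
if wh.watchers == nil {
	wh.watchers = make(map[string][]*Watcher)
}
wh.watchers[key] = append(wh.watchers[key], w)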

High Memory Usage

Currently my etcd nodes are sitting at 229m, 228m, and 189m of resident memory. It's a bit worrying, as the log is 29M and we are making at most 80 writes/second on about 68 keys.

This may not be a pressing issue, but the memory usage makes it worrying to run on smaller nodes.

Sockets remain in TIME_WAIT / CLOSE_WAIT

I have a simple etcd setup with two machines in a cluster. Both servers eventually run out of free file descriptors (2013/09/10 16:47:32 http: Accept error: accept tcp [::]:7001: too many open files; retrying in 1s).

Etcd fails to properly close the sockets. They remain in TIME_WAIT / CLOSE_WAIT. You can use these two commands to inspect which state the sockets are in:

  • lsof -a -p $(pidof etcd)
  • netstat -ntp

So far one of the etcd servers has leaked 14 file descriptors in about two hours.

Receiving error message on install

Installed Go on a fresh EC2 instance:

sudo apt-get install golang

Followed the instructions as specified and received this error:

   go get github.com/coreos/etcd
 # github.com/coreos/etcd/store
 src/github.com/coreos/etcd/store/store.go:549: method s.checkNode is not an expression, must be called
# github.com/ccding/go-config-reader/config
src/github.com/ccding/go-config-reader/config/config.go:31: undefined: bufio.NewScanner
# github.com/coreos/go-raft
src/github.com/coreos/go-raft/log.go:242: function ends without a return statement

Inconsistent get return value for nested nodes

Hi,

I'm evaluating etcd and found one weird inconsistency in the API.

Let's say I have the following key/value pairs:

/foo/1   f1
/foo/2   f2
/bar/1   b1

Why does GET /keys/foo return an array, but GET /keys/bar return only the nested key? I guess the client libraries could check whether the returned key has the same path as the requested one, but I'm wondering if there is any specific reason why it works this way?

Add support for daemonizing

Currently etcd runs in the foreground. It would be helpful if it could daemonize itself, with this behavior configurable via a command-line argument, e.g.:

etcd -D   # returns immediately, leaving the daemon running in the background
etcd      # without the -D option it runs in the foreground (current behavior)

Feature Request : A Status Code to indicate Success or Failure (instead of just errorCode in event of failure)

In the event of a network failure, an insert into etcd might fail, but an external process might also query a non-existent key for a variety of reasons.

Non-existent keys return a distinct failure message:

{"errorCode":100,"message":"Key Not Found","cause":"/D0A1A866-6591-40CF-99EE-15AE1BF3A7F1"}

A successful value fetch from a key does not include a statusCode at this point:

[{"action":"GET","key":"/51b8cc7c-c7d8-4d6e-86ee-1a381db2f068/status","value":"COMPLETED","expiration":"2013-08-11T06:04:23.647167583-07:00","ttl":29849,"index":97}]

Is it possible to add a top-level 'statusCode' to indicate success or failure, instead of only passing errorCode in the event of a failure?

Clients can then use the statusCode to drive deterministic logic.

Remove nodes

When a node fails and does not come back for a long time, it is reasonable for etcd to remove that node from the cluster.

Add version flag

I didn't see a version flag specified anywhere. Maybe I'm crazy?

Test-and-set should be described as compare-and-swap?

Is what is described in the README as "test-and-set" what Wikipedia describes as test-and-set, or is it actually compare-and-swap?

These terms seem to be used somewhat imprecisely and interchangeably, but compare-and-swap (or compare-and-set) seems to be more common. (And to the extent that there is a meaningful difference between the two (e.g. this paper), compare-and-swap seems to better match the semantics of the etcd operation.)
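
For concreteness, here are the two primitives in Go-flavored pseudocode (the atomicity machinery is omitted; what matters is the difference in semantics). etcd's operation takes a prevValue, which matches the compare-and-swap shape:

// Classic test-and-set: unconditionally writes a fixed value (1) and
// returns what was there before.
func testAndSet(cell *int) (old int) {
	old = *cell
	*cell = 1
	return old
}

// Compare-and-swap: writes newVal only if the cell still holds expected,
// and reports whether the swap happened.
func compareAndSwap(cell *string, expected, newVal string) bool {
	if *cell != expected {
		return false
	}
	*cell = newVal
	return true
}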

To take some random examples, for the concept described in the README, the following use the term:

Tried to build from Source. No failure and no results?

Yes, it's the first time I'm building any Go project from source, but it should be fairly easy if I read your README.

I run Ubuntu 13.04 and installed the golang-go package. Running the build script produces the following output, and I can't tell whether these are errors, warnings, or a stack trace:

# github.com/ccding/go-config-reader/config
src/github.com/ccding/go-config-reader/config/config.go:35: undefined: bufio.NewScanner
# github.com/coreos/etcd/store
src/github.com/coreos/etcd/store/store.go:633: method s.checkNode is not an expression, must be called
# github.com/coreos/go-raft
src/github.com/coreos/go-raft/log.go:241: function ends without a return statement

The guide doesn't say more, so I hope to get some help here.

Odd number of instances

Just curious, how important is it that an odd number of etcd instances are running in a cluster?

FATAL open node0/info: no such file or directory

Running ./etcd -d node0 -n node0 gives FATAL open node0/info: no such file or directory

Any tips on what's wrong?

Is this the right place to post this type of question?

root@e7a377552a3b:/usr/local/etcd# ls -al
total 7684
drwxr-xr-x 11 root root    4096 Aug 12 22:27 .
drwxr-xr-x 13 root root    4096 Aug 12 22:04 ..
drwxr-xr-x  8 root root    4096 Aug 12 22:04 .git
-rw-r--r--  1 root root      36 Aug 12 22:04 .gitignore
-rw-r--r--  1 root root      76 Aug 12 22:04 .travis.yml
-rw-r--r--  1 root root     316 Aug 12 22:04 Dockerfile
-rw-r--r--  1 root root   11358 Aug 12 22:04 LICENSE
-rw-r--r--  1 root root   14176 Aug 12 22:04 README.md
-rwxr-xr-x  1 root root     449 Aug 12 22:04 build
-rw-r--r--  1 root root    4005 Aug 12 22:04 command.go
-rw-r--r--  1 root root    3202 Aug 12 22:04 config.go
-rw-r--r--  1 root root    1185 Aug 12 22:04 error.go
-rwxr-xr-x  1 root root 7681392 Aug 12 22:14 etcd
-rw-r--r--  1 root root    5483 Aug 12 22:04 etcd.go
-rw-r--r--  1 root root    7749 Aug 12 22:04 etcd_handlers.go
-rw-r--r--  1 root root     885 Aug 12 22:04 etcd_server.go
-rw-r--r--  1 root root    7920 Aug 12 22:04 etcd_test.go
drwxr-xr-x  3 root root    4096 Aug 12 22:04 fixtures
-rw-r--r--  1 root root      61 Aug 12 22:04 go_version.go
-rw-r--r--  1 root root     751 Aug 12 22:04 machines.go
-rw-r--r--  1 root root    1343 Aug 12 22:04 name_url_map.go
drwxr--r--  2 root root    4096 Aug 12 22:28 node0
-rw-r--r--  1 root root    3325 Aug 12 22:04 raft_handlers.go
-rw-r--r--  1 root root    4798 Aug 12 22:04 raft_server.go
-rw-r--r--  1 root root      57 Aug 12 22:14 release_version.go
drwxr-xr-x  2 root root    4096 Aug 12 22:04 scripts
-rw-r--r--  1 root root     895 Aug 12 22:04 snapshot.go
drwxr-xr-x  5 root root    4096 Aug 12 22:05 src
drwxr-xr-x  2 root root    4096 Aug 12 22:04 store
drwxr-xr-x  2 root root    4096 Aug 12 22:04 test
-rwxr-xr-x  1 root root      90 Aug 12 22:04 test.sh
drwxr-xr-x  5 root root    4096 Aug 12 22:04 third_party
-rw-r--r--  1 root root    4135 Aug 12 22:04 transporter.go
-rw-r--r--  1 root root    4001 Aug 12 22:04 util.go
-rw-r--r--  1 root root      35 Aug 12 22:04 version.go
drwxr-xr-x  2 root root    4096 Aug 12 22:04 web
root@e7a377552a3b:/usr/local/etcd# ./etcd -d node0 -n node0
[etcd] 23:02:26.526660 FATAL open node0/info: no such file or directory
root@e7a377552a3b:/usr/local/etcd# uname -a
Linux e7a377552a3b 3.8.0-27-generic #40~precise3-Ubuntu SMP Fri Jul 19 14:38:30 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@e7a377552a3b:/usr/local/etcd# go version
go version go1.1.2 linux/amd64

Improve "Not a File" error message

When you try to write a value to an existing directory, you receive a "Not a File" message. Ideally this should say something more descriptive about the error, like "Value already exists as directory".

Delete directory

It seems that it is not possible to delete empty directories?

$ curl 127.0.0.1:4001/v1/keys/hosts/
{"errorCode":100,"message":"Key Not Found","cause":"/hosts"}

$ curl -d value=1 127.0.0.1:4001/v1/keys/hosts/a
{"action":"SET","key":"/hosts/a","value":"1","newKey":true,"index":3}

$ curl 127.0.0.1:4001/v1/keys/hosts/
[{"action":"GET","key":"/hosts/a","value":"1","index":3}]

$ curl -X DELETE 127.0.0.1:4001/v1/keys/hosts/a
{"action":"DELETE","key":"/hosts/a","prevValue":"1","index":4}

$ curl 127.0.0.1:4001/v1/keys/hosts/
[]

$ curl -X DELETE 127.0.0.1:4001/v1/keys/hosts/
{"errorCode":100,"message":"Key Not Found","cause":"/hosts"}

$ curl -X DELETE 127.0.0.1:4001/v1/keys/hosts
{"errorCode":100,"message":"Key Not Found","cause":"/hosts"}

$ curl -d value=1 127.0.0.1:4001/v1/keys/hosts
{"errorCode":102,"message":"Not A File","cause":"/hosts"}

Is that a bug or intended functionality?

Read only mode returns Raft Internal Error

We should return a JSON error when the majority of the cluster is down, instead of:

$ curl -L 127.0.0.1:4001/v1/keys/brandon -d 'value=foobar'
raft: Command timeout

Something like this:

$ curl -L 127.0.0.1:4001/v1/keys/brandon -d 'value=foobar'
{"errorCode":500,"message":"Quorum of machines not available","cause":"set: /brandon"}

/v1/leader returns the raft port, not the client port

I've been writing a client for etcd (https://github.com/iconara/etcd-rb), and I implemented automatic failover and some other features using the /machines and /leader features. Now it seems like the latter has changed to return the raft port instead of the client port, so it can't be used by the client. Is this by design or by accident?

In other words, in previous versions curl http://127.0.0.1:4001/leader would return "127.0.0.1:4001", now curl http://127.0.0.1:4001/v1/leader returns "http://127.0.0.1:7001/" (notice that the port is for the raft protocol and not the client).

It's great that the leader and machines resources got moved into the versioned part of the URL scheme, by the way.

What are the consistency guarantees for mutations with respect to replication?

If I have a cluster of etcd processes, set the value for a key on a slave, and then immediately read the same key from the same slave, I always get a 404. I expected to get it less than half of the time, since my expectation was that the set operation blocks until the value has been committed on half of the nodes (and with all three nodes running locally, as soon as one node has the value the others should too, most of the time). Is this not how Raft is supposed to work? Or is etcd only waiting for the leader to acknowledge that the value has been persisted locally?

It would be great if the consistency level required by a mutation operation were at least configurable.

Here I have three nodes running locally on ports 4001, 4002 and 4003, the last one is the leader:

% curl -is -XDELETE -L 'http://localhost:4001/v1/keys/abc' && curl -is -XPOST -L 'http://localhost:4001/v1/keys/abc' -d 'value=foo' && curl -is -L 'http://localhost:4001/v1/keys/abc'

HTTP/1.1 307 Temporary Redirect
Location: http://0.0.0.0:4003/v1/keys/abc
Content-Type: text/plain; charset=utf-8
Content-Length: 0
Date: Sun, 04 Aug 2013 17:58:00 GMT

HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Content-Length: 62
Date: Sun, 04 Aug 2013 17:58:00 GMT

{"action":"DELETE","key":"/abc","prevValue":"foo","index":419}

HTTP/1.1 307 Temporary Redirect
Location: http://0.0.0.0:4003/v1/keys/abc
Content-Type: text/plain; charset=utf-8
Content-Length: 0
Date: Sun, 04 Aug 2013 17:58:00 GMT

HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Content-Length: 69
Date: Sun, 04 Aug 2013 17:58:00 GMT

{"action":"SET","key":"/abc","value":"foo","newKey":true,"index":420}

HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=utf-8
Content-Length: 19
Date: Sun, 04 Aug 2013 17:58:00 GMT

404 page not found

As you can see, the delete and set operations are redirected to the master and resent there. Then the get operation returns a 404 because the slave does not yet have the value. If I run the get operation in a loop I eventually (within milliseconds) get the value, but I can make tens or hundreds of HTTP requests before I do.
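
Until reads can be made consistent (or redirected to the leader), a client-side workaround is exactly the loop described above: poll the follower until the write becomes visible. A Go sketch against the v1 HTTP API used in this report:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForKey polls a follower until a freshly written key becomes readable,
// working around the replication lag described above.
func waitForKey(url string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // the follower has caught up
			}
		}
		time.Sleep(10 * time.Millisecond)
	}
	return fmt.Errorf("key %s not visible after %v", url, timeout)
}

func main() {
	fmt.Println(waitForKey("http://localhost:4001/v1/keys/abc", 2*time.Second))
}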

TestAndSet on a non-existent value

It'd be nice if you could do a TESTANDSET on a key that doesn't exist (and create it).

For context: I'm using etcd as a lock service, having a service lock its own hostname record and having slaves sleep until they can grab the lock.

Each service does a TESTANDSET on its hostname. This works fine if the hostname record exists; however, when a brand new service starts, its hostname record doesn't exist yet, so the TESTANDSET fails with a 404.

Right now I have to do the initial TESTANDSET, check for a 404, then do a SET, followed by another GET to make sure we set it (in case one of the slaves got to it first). This is a bit of extra work, and a lot less atomic than I would prefer.

I'm happy to contribute a fix, but I thought I'd see if it's something other people think is a good/bad idea first.
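
To illustrate the extra work, here is the dance in Go against a hypothetical client interface; Get, Set, and TestAndSet are stand-ins for whatever client library is in use, not a real etcd API:

// Client is a hypothetical stand-in for an etcd v1 client library.
type Client interface {
	Get(key string) (value string, found bool, err error)
	Set(key, value string) error
	// TestAndSet swaps prevValue for newValue; it currently errors (404)
	// if the key does not exist.
	TestAndSet(key, prevValue, newValue string) (swapped bool, err error)
}

// acquireLock tries to grab a free lock. If TestAndSet could atomically
// create a missing key, the fallback below would collapse into one call.
func acquireLock(c Client, key, me string) (bool, error) {
	if swapped, err := c.TestAndSet(key, "free", me); err == nil {
		return swapped, nil
	}
	// The key has never existed: racy create-then-verify fallback.
	if err := c.Set(key, me); err != nil {
		return false, err
	}
	v, _, err := c.Get(key)
	return err == nil && v == me, err
}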

Consistent GET/WATCH

All the nodes will have the newest data after one broadcast cycle.

During the broadcast cycle, a client may get stale data from the followers in the cluster (etcd has an internal leader/follower structure from go-raft), since we do not redirect GET and WATCH. Allowing this gains us much more throughput.

We will add consistent GET and WATCH variants that redirect to the leader to deal with this problem.

Cleanups to the stats API

/v1/stats/self
/v1/stats/machines/

{
    "startTime":"2013-09-11T17:47:35.394627053-07:00",
    "leader": {
        "name": "machine3",
        "uptime":"3m43.397313616s"
    },
    "operationCounts": {
        "gets":12,
        "sets":7,
        "deletes":0,
        "testAndSets":0
    },
    "raftStats": {
        "recvAppendRequestCnt":1,
        "sendAppendRequestCnt":8184,
        "sendPkgRate":36.869030756098816,
        "sendBandwidthRate":397190.0683354525
    }
}

/v1/stats/leader

Note: proxied to the leader

{
    "followers": {
        "machine0": {
            "role": "leader",
            "currentLatency": 1.400807,
            "averageLatency": 1.0508203526193118,
            "sdvLatency": 0.5267024400632015,
            "minLatency": 0.326581,
            "maxLatency": 15.584396,
            "failsCount": 321,
            "successCount": 19738,
            "failing": 0
        },
        "machine1": {
            "role": "follower",
            "currentLatency": 1.400807,
            "averageLatency": 1.0508203526193118,
            "sdvLatency": 0.5267024400632015,
            "minLatency": 0.326581,
            "maxLatency": 15.584396,
            "failsCount": 321,
            "successCount": 19738,
            "failing": 0
        }
    }
}

/cc @xiangli-cmu

addToSet command

I think a common use case for etcd would be to keep a set of values of a particular type, where the key you give denotes a set, rather than a key->value pair.

Example scenario: a memcached container spins up and registers its address under the key /backing_services/memcached. Web server containers listen on this key and are notified that another memcached server has been added.

I'd like this to work something like:

$ curl http://etcd:4001/v1/keys/backing_services/memcached -d setMember="memcached48.wonderland.com:49152"
    {
        "action": "ADDTOSET",
        "index": 1,
        "key": "/backing_services/memcached",
        "newKey": true,
        "value": "memcached48.wonderland.com:49152"
    }
$ curl http://etcd:4001/v1/keys/backing_services/memcached -d setMember="memcached48.wonderland.com:49152"
    {
        "action": "ADDTOSET",
        "index": 1,
        "key": "/backing_services/memcached",
        "newKey": false,
        "value": "memcached48.wonderland.com:49152"
    }
$ curl http://etcd:4001/v1/keys/backing_services/memcached -d setMember="memcached49.wonderland.com:49731"
    {
        "action": "ADDTOSET",
        "index": 2,
        "key": "/backing_services/memcached",
        "newKey": true,
        "value": "memcached49.wonderland.com:49731"
    }

I'm not sure about all details, but first of all I'd like your input. Does this resonate with your vision for etcd? I know this could be accomplished by simply calculating a hash of the value to be stored, and then use this as the key name. But given the practicality of this use case, I think it'd be nice to have it integrated into etcd.
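
The value-hash workaround mentioned above, sketched in Go: deriving the subkey from the member's content makes adds naturally idempotent. The /v1/keys path follows the examples in this issue; the helper itself is hypothetical:

package main

import (
	"crypto/sha1"
	"fmt"
	"net/http"
	"net/url"
)

// addToSet emulates set semantics on the current API: the subkey is the
// SHA-1 of the member, so adding the same member twice writes the same key.
func addToSet(base, set, member string) error {
	subkey := fmt.Sprintf("%x", sha1.Sum([]byte(member)))
	resp, err := http.PostForm(
		base+"/v1/keys/"+set+"/"+subkey,
		url.Values{"value": {member}},
	)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	if err := addToSet("http://etcd:4001", "backing_services/memcached",
		"memcached48.wonderland.com:49152"); err != nil {
		fmt.Println(err)
	}
}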

Ephemeral nodes?

ZooKeeper supports the creation of "ephemeral znodes", which are znodes (key-value pairs in etcd parlance) that are automatically deleted when the session that creates them terminates, either deliberately or due to a failure. Do you guys think this feature is worth adding to etcd?

I'm bringing this up because I'm working on a set of sync primitives, and some of them could be implemented more easily if this feature existed. For example, when implementing distributed locks, I have to make sure that if a node dies while holding a lock, the lock gets deleted. The way I'm doing this right now is to set a TTL on the lock file and refresh it periodically, so that if the node dies the lock file disappears after some time. However, if etcd supported ephemeral keys, I would just make the lock file ephemeral, so that if the node dies the lock file is automatically deleted.

There are many other potential applications of this feature of course; this is just an example.
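
The TTL-refresh workaround described above, as a Go sketch. The value and ttl form parameters follow the v1 HTTP API examples elsewhere on this page; treat the exact endpoint details as assumptions:

package main

import (
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// keepAlive emulates an ephemeral key: re-set the lock with a short TTL on
// every tick. If this process dies, the key expires within ttlSeconds.
func keepAlive(key, value string, ttlSeconds int, stop <-chan struct{}) {
	// Refresh at half the TTL so the key never expires while we're alive.
	ticker := time.NewTicker(time.Duration(ttlSeconds) * time.Second / 2)
	defer ticker.Stop()
	form := url.Values{"value": {value}, "ttl": {strconv.Itoa(ttlSeconds)}}
	for {
		select {
		case <-stop:
			return // stop refreshing; the key expires within ttlSeconds
		case <-ticker.C:
			// Ignore transient errors; the next tick retries.
			if resp, err := http.PostForm("http://127.0.0.1:4001/v1/keys/"+key, form); err == nil {
				resp.Body.Close()
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	go keepAlive("locks/node1", "holder-a", 10, stop)
	time.Sleep(25 * time.Second)
	close(stop) // release the "lock" by letting it expire
}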

watch returns previous index after some time

Test code: https://gist.github.com/EnosFeedler/6502204

Reproduces after a day of the cluster being up.

IRC context

[14:38:51] <enos>    philips: is this the etcd channel as well? :)
[14:39:01] <philips>     enos: yes
[14:43:35] <enos>    philips: I have a client setting a value on a key, waiting for the response, and then watching for changes to that value. What I am seeing is the watch returns the same set event that I previously snet
[14:44:34] <enos>    philips: I even then tried sending the index that was returned from the set along with the subsequent watch command to ensure that I wouldn't be observing the previous set, but it still returned that event (with the same index)
[14:44:44]   failshell ([email protected]) left IRC. (Ping timeout: 256 seconds)
[14:44:53] <enos>    philips: am I missing something about how this works?
[14:45:54] <philips>     enos: that sounds right. If you watch on an identical index watch should just hang.
[14:45:58]   chorrell ([email protected]) joined the channel.
[14:47:33] <enos>    philips: if I do SET /state, "newValue" followed by a WATCH /state.  Shouldnt the SET already have "happened" and therefore the watch should miss it?
[14:47:49] <enos>    philips: this is the behavior I want, at least.
[14:47:51] <philips>     enos: that is how it should work, yes
[14:48:05] <philips>     enos: are the set and watch happenin gon the same node?
[14:48:11] <enos>    philips: yes.
[14:48:12] <enos>    err
[14:48:15] <enos>    yes
[14:48:21] <philips>     enos: err, machine not node
[14:48:28] <philips>     enos: hrm, it shouldn't work that way
[14:48:43] <philips>     enos: Can you file a bug? I can't look at this right now.
[14:48:44] <enos>    philips: yes they are happening in the same process actually
[14:48:49] <enos>    philips: yes sure
[14:48:51] <philips>     enos: github.com/coreos/etcd/issues
[14:48:59] <philips>     enos: thank you
[14:49:01] <enos>    philips: i can try and repro it with curl, im using a node-etcd node.js client right now
[14:58:22]   zenoway ([email protected]) joined the channel.
[15:00:57] <philips>     enos: perfect
[15:02:59]   zenoway ([email protected]) left IRC. (Ping timeout: 276 seconds)
[15:08:11] <enos>    philips: if i put a 100ms delay between the SET and WATCH then the watch will successfully ignore the previoius set
[15:08:34] <enos>    philips: the trouble is when the commands are called without delay between them
[15:08:59] <philips>     enos: if this is node are you waiting for the SET callback first?
[15:09:13] <enos>    philips: yes im waiting for the set callback
[15:09:24] <philips>     enos: so you don't start the watch until set returns?
[15:09:29] <enos>    correct
[15:10:04] <philips>     enos: hrm, that sounds like a bug then. If you can provide a test case I will try and fix it.
[15:10:15] <philips>     but it sounds pretty straightforward
[15:10:16] <enos>    philips: https://gist.github.com/EnosFeedler/6502204
[15:10:21] <enos>    this is the test code
[15:10:45] <enos>    my etcd has been running since yesterday and things were working fine yesterday.  I just started seeing this issue today
[15:10:53] <enos>    but it happens consistently, every time
[15:11:31] <philips>     enos: oh, so this delay only started happening after some time?
[15:12:02] <enos>    yesterday the watch was working fine, it was ignoring the set called right before it. today, its picking up the set
[15:15:27] <enos>    philips: I am made it more clean in the gist, with program output
[15:16:02] <philips>     enos: thanks. I will turn this into a bug

Unable to build on ubuntu

I've downloaded version 1.1.2 of Go. This is the output when I try to build:

vagrant@precise64:~/etcd$ go version
go version go1
vagrant@precise64:~/etcd$ ./build
# github.com/coreos/etcd/store
src/github.com/coreos/etcd/store/store.go:638: method s.checkNode is not an expression, must be called
# github.com/ccding/go-config-reader/config
src/github.com/ccding/go-config-reader/config/config.go:35: undefined: bufio.NewScanner
# github.com/coreos/go-raft
src/github.com/coreos/go-raft/log.go:241: function ends without a return statement
vagrant@precise64:~/etcd$ uname -a
Linux precise64 3.8.0-27-generic #40~precise3-Ubuntu SMP Fri Jul 19 14:38:30 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Always return JSON list for directory listing

When listing a directory containing a single subkey, the GET command returns that single key as a JSON dictionary:

core@localhost ~/foo $ curl -L http://127.0.0.1:4001/v1/keys/foo/bar/
{"action":"GET","key":"/foo/bar/k1","value":"v1","index":16}

When a directory contains multiple subkeys, a JSON list is returned:

core@localhost ~/foo $ curl -L http://127.0.0.1:4001/v1/keys/foo/bar/
[{"action":"GET","key":"/foo/bar/k1","value":"v1","index":17},{"action":"GET","key":"/foo/bar/k2","value":"v2","index":17}]

This makes the interface cumbersome to use, as it requires inspecting the result and special casing the single subkey case. I suggest always returning a list, even when there is a single subkey.
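
Until the API changes, clients can normalize the response themselves: try decoding a list first, and fall back to wrapping a single object. A Go sketch, with the kv fields mirroring the responses shown above:

package main

import (
	"encoding/json"
	"fmt"
)

// kv mirrors the fields of a v1 GET response entry.
type kv struct {
	Action string `json:"action"`
	Key    string `json:"key"`
	Value  string `json:"value"`
	Index  int    `json:"index"`
}

// decodeListing accepts either a JSON list (multiple subkeys) or a single
// JSON object (one subkey) and always returns a slice.
func decodeListing(body []byte) ([]kv, error) {
	var many []kv
	if err := json.Unmarshal(body, &many); err == nil {
		return many, nil
	}
	var one kv
	if err := json.Unmarshal(body, &one); err != nil {
		return nil, err
	}
	return []kv{one}, nil
}

func main() {
	out, _ := decodeListing([]byte(`{"action":"GET","key":"/foo/bar/k1","value":"v1","index":16}`))
	fmt.Println(out)
}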

Add support for reading configurations from file

It would help with building platform-specific packages for etcd if it could read all of its configuration (available via command-line arguments) from a file. This would also simplify writing init.d scripts.

Support for Ephemeral keys

I would like to try etcd out for one of my current projects that uses ZooKeeper. My project makes use of ephemeral nodes in ZK. Is there any intention to add this ability to etcd? If it is already there, can we add a blurb about it to the README?
