3-clause BSD. See LICENSE file.
go-zookeeper / zk Goto Github PK
View Code? Open in Web Editor NEWThis project forked from samuel/go-zookeeper
Native ZooKeeper client for Go
License: BSD 3-Clause "New" or "Revised" License
This project forked from samuel/go-zookeeper
Native ZooKeeper client for Go
License: BSD 3-Clause "New" or "Revised" License
3-clause BSD. See LICENSE file.
Hello, i had a workaround here eBayClassifiedsGroup/PanteraS#299 (comment)
Somehow I trusted github to store the information I needed as usually people archive their repo
It seems the owner didn't think people needed legacy support #50
How to get the .deb from this new repo please :) !
This is basically a port of this commit:
z-division@4f30e43
The same issue was earlier found in the java client, ZOOKEEPER-1576.
Or is there a reason why this fix has not been ported from the z-division fork to this fork?
Add a deleteall
implementation to the client. It is especially useful in test to teardown the test nodes. Right now if the nodes have children, using the client, we need to delete recursively. Else would get ErrNotEmpty
. I briefly looked at the implementation and this seems to be the start where we can add: https://github.com/go-zookeeper/zk/blob/master/constants.go#L202. Happy to pull a PR if there is a guide on this. Thanks!
conn.Multi signature is func (c *Conn) Multi(ops ...interface{}) ([]MultiResponse, error)
where ops is any type. However it will return an error if it is not one of CreateRequest
, SetDataRequest
, DeleteRequest
, or CheckVersionRequest:
We could implement a interface that all these structs implement to limit to just the operations from the package. This could just return the opCode itself.
type operation interface {
opCode() int32 // better yet type opCodes as well.
}
In the connect loop the client does not try to reset active watches.
This leads to stale view of the world if the client becomes isolated.
I would expect to receive an event to ensure that the business logic is aware of partitioning and could react accordingly.
This has already pseudo started with the repo https://github.com/go-zookeeper/recipes.
I would suggest we write down a scheme we intend to prescribe so we know where things belong.
github.com/go-zookeeper/recipes/lock
github.com/go-zookeeper/recipes/election
We should move/remove these peripherals to other repositories or even just sub-Go packages to remove the noise of supporting recipes in the core client.
These would be different than let's say github.com/go-zookeeper/zk/zktestutils
, or where the FourLetter Helpers, maybe go-zookeeper/zk/flw
.
My hunch is that these patterns/recipes would get the most issues or questions/support, so from separating them into another repo seems smart. On the other hand that may lead to a tiny bit more overhead in redirecting or moving issue over to the other repo. On the other hand, we could have all of these be in the same repo, but sub-packages
github.com/go-zookeeper/zk/recipes/lock
github.com/go-zookeeper/zk/recipes/election
or even
github.com/go-zookeeper/zk/lock
github.com/go-zookeeper/zk/election
cc @nemith
Zookeeper 3.5 was released in May 2019 with support for TTL nodes. This library has not been updated to support 3.5 features including TTL nodes.
doc: https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_4lw
such as
$ echo mntr | nc localhost 2185
zk_version 3.4.0
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 70
zk_packets_sent 69
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_followers 4 - only exposed by the Leader
zk_synced_followers 4 - only exposed by the Leader
zk_pending_syncs 0 - only exposed by the Leader
zk_open_file_descriptor_count 23 - only available on Unix platforms
zk_max_file_descriptor_count 1024 - only available on Unix platforms
when session close then recreate session but FlagEphemeral node not recreate
zk event state not support SyncConnected, so it use Unknown replaced, but I don't think it's appropriate because the official documentation has clearly defined this state
this is the log printed by zk
{"message":"NodeWatch::watchFlow receive zk events","state":"Unknown","type":"EventNodeChildrenChanged","path":"/colla/cpf/configs/flow"}
this is the watcher response from zkCli
[zk: localhost:2181(CONNECTED) 31] ls -w /colla/cpf/configs/flow
ls -w /colla/cpf/configs/flow
[zk: localhost:2181(CONNECTED) 32]
WATCHER::
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/colla/cpf/configs/flow
A lot of the standard methods are missing go doc
. We should align the documentation with that found from the Java client lib (https://zookeeper.apache.org/doc/r3.4.14/api/index.html)
conn.go:57:6: exported type Dialer should have comment or be unexported
conn.go:69:6: exported type Conn should have comment or be unexported
conn.go:140:6: exported type Event should have comment or be unexported
conn.go:313:1: exported method Conn.Close should have comment or be unexported
conn.go:987:1: exported method Conn.AddAuth should have comment or be unexported
conn.go:1013:1: exported method Conn.Children should have comment or be unexported
conn.go:1026:1: exported method Conn.ChildrenW should have comment or be unexported
conn.go:1044:1: exported method Conn.Get should have comment or be unexported
conn.go:1076:1: exported method Conn.Set should have comment or be unexported
conn.go:1089:1: exported method Conn.Create should have comment or be unexported
conn.go:1151:1: exported method Conn.Delete should have comment or be unexported
conn.go:1160:1: exported method Conn.Exists should have comment or be unexported
conn.go:1178:1: exported method Conn.ExistsW should have comment or be unexported
conn.go:1203:1: exported method Conn.GetACL should have comment or be unexported
conn.go:1215:1: exported method Conn.SetACL should have comment or be unexported
conn.go:1228:1: exported method Conn.Sync should have comment or be unexported
conn.go:1241:6: exported type MultiResponse should have comment or be unexported
constants.go:10:2: exported const DefaultPort should have comment (or a comment on this block) or be unexported
constants.go:37:2: exported const EventNodeCreated should have comment (or a comment on this block) or be unexported
constants.go:58:2: exported const StateUnknown should have comment (or a comment on this block) or be unexported
constants.go:71:2: exported const FlagEphemeral should have comment (or a comment on this block) or be unexported
constants.go:89:6: exported type State should have comment or be unexported
constants.go:98:6: exported type ErrCode should have comment or be unexported
constants.go:101:2: exported var ErrConnectionClosed should have comment or be unexported
constants.go:208:6: exported type EventType should have comment or be unexported
constants.go:228:2: exported const ModeUnknown should have comment (or a comment on this block) or be unexported
server_help.go:18:6: exported type TestServer should have comment or be unexported
server_help.go:24:6: exported type TestCluster should have comment or be unexported
server_help.go:29:1: exported function StartTestCluster should have comment or be unexported
server_help.go:102:1: exported method TestCluster.Connect should have comment or be unexported
server_help.go:107:1: exported method TestCluster.ConnectAll should have comment or be unexported
server_help.go:111:1: exported method TestCluster.ConnectAllTimeout should have comment or be unexported
server_help.go:115:1: exported method TestCluster.ConnectWithOptions should have comment or be unexported
server_help.go:124:1: exported method TestCluster.Stop should have comment or be unexported
server_help.go:176:1: exported method TestCluster.StartServer should have comment or be unexported
server_help.go:186:1: exported method TestCluster.StopServer should have comment or be unexported
server_help.go:196:1: exported method TestCluster.StartAllServers should have comment or be unexported
server_help.go:207:1: exported method TestCluster.StopAllServers should have comment or be unexported
server_java.go:11:6: exported type ErrMissingServerConfigField should have comment or be unexported
server_java.go:18:2: exported const DefaultServerTickTime should have comment (or a comment on this block) or be unexported
server_java.go:26:6: exported type ServerConfigServer should have comment or be unexported
server_java.go:33:6: exported type ServerConfig should have comment or be unexported
server_java.go:44:1: exported method ServerConfig.Marshall should have comment or be unexported
server_java.go:113:6: exported type Server should have comment or be unexported
server_java.go:121:1: exported method Server.Start should have comment or be unexported
server_java.go:134:1: exported method Server.Stop should have comment or be unexported
structs.go:13:2: exported var ErrUnhandledFieldType should have comment or be unexported
structs.go:24:6: exported type ACL should have comment or be unexported
structs.go:30:6: exported type Stat should have comment or be unexported
structs.go:121:6: exported type PathVersionRequest should have comment or be unexported
structs.go:141:6: exported type CheckVersionRequest should have comment or be unexported
structs.go:160:6: exported type CreateRequest should have comment or be unexported
structs.go:168:6: exported type DeleteRequest should have comment or be unexported
structs.go:225:6: exported type SetDataRequest should have comment or be unexported
util.go:27:1: exported function DigestACL should have comment or be unexported
Encountered an issue: when a TCP connection disruption leads to an EOF error within the io.ReadFull()
function in authenticate()
, it results in the loop()
function indefinitely failing to establish a functional connection.
The cause lies in the cycle where, after encountering EOF, the connect()
within the loop attempts to rebuild the TCP connection. Upon successful reconnection, the authenticate()
method uses the retained lastZxid
to send a request, prompting the ZooKeeper service to actively close this TCP connection, perpetuating the EOF loop.
I suspect the reason behind the ZooKeeper service actively closing the TCP connection is due to internal Zxid
loss or expiration within the service. This issue commonly arises during scenarios such as redeployment of the ZooKeeper service container (first docker-compose down
, followed by docker-compose up -d
) or when the service faces substantial pressure.
It's worth noting that if the client and server operate within the same operating system and the system supports and enables the TCP Keep-Alive mechanism, it might help circumvent the endless EOF loop scenario. However, if TCP Keep-Alive also fails, the aforementioned issue will persist.
"zk: node does not exist" lacks context. At least node name should be present in this error and corresponding message.
We've been encountering sporadic deadlocks in a heavily concurrent service of ours which uses this library. After a rigorous investigation it narrowed down to a race in Conn.Close()
causes a deadlock when certain conditions are met. Bear with me.
The following block is simple at glance:
select {
case <-c.queueRequest(opClose, &closeRequest{}, &closeResponse{}, nil):
case <-time.After(time.Second):
}
The timeout should recover from any deadlock here, right? Not really. Although very non-apparent, if the call to c.queueRequest
blocks, it never makes it to the actual select, and this is exactly what happens. Let's look at this block in Conn.queueRequest
:
switch opcode {
case opClose:
// always attempt to send close ops.
c.sendChan <- rq
default:
…
}
return rq.recvChan
So there is clearly unguarded write to c.sendChan
. The channel itself is buffered and almost always being consumed in sendLoop
, so how it comes it may ever block here? Well, sendLoop is not always there and takes a break sometimes – say, when the connection hangs up. It is supposed to get re-started after re-connection sequence completes, but if that never happens, neither does the sendLoop
. In its absence, and in presence of pending unsent messages it becomes a blocking write. The playground snippet demonstrates this. So to hit the deadlock, the Close()
call should happen while the client is in re-connection loop while there is a number of unsent requests pending. We do signal our other go-routines prior to calling c.Close
but that has no effect on those which are already queued in send/receive logic.
The simplest of the fixes would be to add timeout around the write to sendChan
in queueRequest
(a PR is coming). The strategic one would be a revision of the shutdown sequence and upgrading to context sooner than later. That, however, is likely to require changes to API and therefore to be scheduled for the next major version.
Hello, I need removewatches
command, it would be nice if it was added to the library.
https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_WatchRemoval
Are there any plans to add this command? If not, can I implement it and create a pull request?
When ChildrenW
and GetW
try to listen an unexisting znode, although the znode will be create later, the create event and all modification of the new znode cannot trigger either ChildrenW or GetW.
It seems that only ExistsW can do this?
Hi,
we'd like to add exponential backoff if zookeeper is not reachable. Currently it's not possible to implement as that would require keeping some state inside dialer, since dialer is currently function it's not viable solution therefore so it would be great to make dialer an interface instead.
Here #95 you can find small PR which makes dialer an interface
I pull down the code for the master branch ,and run the zk_test test file,The function method is TestIntegration_Create ,Then an error was reported,Debugging indicates that an error occurs in the StartTestCluster method . error message is received unexpected error: zk: could not find testing zookeeper bin path at "zookeeper/bin": CreateFile zookeeper/bin: The system cannot find the path specified.
any plan support for gssapi sasl auth ? when zk with kerberos security enabled
The python-client kazoo supports connection to a readonly zookeeper, is there any plan to implement it in go-zookeeper?
I spent some time writing https://github.com/go-zookeeper/jute which is nearly complete which can auto generate structures and how to serialize/deserialize them to the wire similar to Java which will illuminate the reflection and bring a bit more protection as well.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x8dc9cc]
goroutine 16 [running]:
github.com/go-zookeeper/zk.(*Conn).connect(0xc000506180)
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:392 +0x30c
github.com/go-zookeeper/zk.(*Conn).loop(0xc000506180, {0xe2a5a8?, 0x13ceea0})
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:428 +0x3c
github.com/go-zookeeper/zk.Connect.func1()
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:224 +0x25
created by github.com/go-zookeeper/zk.Connect in goroutine 1
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:223 +0x3d7
FAIL command-line-arguments 0.039s
FAIL
Hello,
As the Apache ZooKeeper support ssl connection, I want to know whether this go client support ssl connectiong.
It would be nice to support Context
on Conn
methods like Get
, and should be fairly easy to implement in the request
method. I'm not sure if context cancellations would need special attention for write operations, but at least read operations would be simple.
I am facing an issue while attempting to create a TTL node. During the execution of the corresponding function, I consistently encounter an "invalid arguments" error. My parameters align with the function definition, yet the error persists. Could it be that I am using the function incorrectly?
My code like:
res, err := conn.CreateTTL("/testpath", []byte("ttl node"), zk.FlagTTL, zk.WorldACL(zk.PermAll), time.Second)
Java provides a JKUtil class with a bunch of static methods for doing common operations with zookeeper. I think we should follow this and keep the client expectations that one call to a method generates a message to the server and not try to overload this.
We can add things like #52
Whether to add a default value to the log, make a few more judgments
Hello, I use rmr to delete a node using zkCli. What would be the equivalent Method for this?
I tried using conn.Delete
but I'm not sure what is the version to pass.
conn.Delete(path, version)
The transaction class in the Java client library is just a way of storing operations and then issuing a conn.Multi against it. It would be trivial and beneficial to having something equivalent in Go.
Hi, I noticed an issue where the go-zookeeper client will not reset the session if the Zookeeper cluster loses it's session state (instance reset, redeploy, etc) and the server responds with an EOF. When this happens the client enters an endless loop trying to authenticate with the previous Session ID. It looks like the loop happens here:
Line 435 in 50daf81
I haven't had a chance to look through the code yet for this but I'm wondering if anyone knows what process needs to happen to reset the session in the client?
Hello, there was an issue in the previous repo: samuel#156 and it still unanswered.
AFAIK, there is no SASL auth in this library, is it planned?
There are additional error codes added to the server that we don't always handle ok. For example -6
is missing for UnimplementedException
. See #49.
We just need to go through https://github.com/apache/zookeeper/blob/190a227aa9d4655ebfe6ba9f5c2da426da8c5d98/zookeeper-server/src/main/java/org/apache/zookeeper/KeeperException.java#L229 and make sure we have coverage. For v1 these need to be just regular errors but in v2 we should look at doing a ZKPathError or something like that that could add contextual information to the error (#32)
Hello,
Thanks for this open source library. I was looking into Lock() and realised that it blocks until a lock is acquired.
We have a use case where we have multiple instances trying to acquire the lock and we want to return an error immediately for instances that did not acquire the lock instead of waiting to acquire the lock. Is this achievable?
Calling Exists() before Lock() is not an ideal solution because reads are not linearizable in ZooKeeper.
Thanks!
The CI pipelines fail on "upload code coverage" step with a message "Please provide an upload token from codecov.io with valid arguments", e.g. https://github.com/go-zookeeper/zk/pull/19/checks?check_run_id=573436487
when I test the CreateTTL
method,I got an error: unknown error: -6
in my cluster environment.
This is my code:
func main() {
c, _, err := zk.Connect([]string{"192.168.29.81:2181", "192.168.29.82:2181", "192.168.29.83:2181"})
if err!= nil {
panic(err)
}
if _, err:= c.CreateTTL("/test", []byte{}, zk.FlagTTL|zk.FlagEphemeral, zk.WorldACL(zk.PermAll), 20*time.Second); err != nil {
panic(err)
}
}
The zookeeper conf like this:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/data/zookeeper
clientPort=2181
forceSync=no
autopurge.snapRetainCount=5
admin.serverPort=8999
server.81=192.168.29.81:2888:3888
server.82=192.168.29.82:2888:3888
server.83=192.168.29.83:2888:3888
Save this child, please.QAQ
When we listen to node using childrenW
method
We have two cases:
node Parent
with children's nodes [child1, child2, child3]
. if I try to delete the node Parent
, I get event.Type = EventNodeChildrenChanged
.
node Parent
without children's nodes. if I try to delete the node Parent
, I get event.Type = EventNodeDeleted
.
It looks like a bug. In the first case I expected to get event.Type = EventNodeDeleted
Example of getting events:
go func() {
for {
select {
case <-ctx.Done():
return
case ev := <-events:
fmt.Println(ev.Type)
_, _, events, err = client.conn.ChildrenW(path)
}
}
}()
Not working with ZK version 3.4.9.
Hi Team,
We have a Zookeeper production cluster with 6 nodes(1 leader,2 followers, 3 observers). We are seeing a drop in watchers if the leader restarts for some reason without drop-in connections. Please see the sequence of actions listed below.
We would like to understand, is this the the bug, or when ever connection drop happens we need to recreate the watcher?
The core library should have a guide for contributions
I found this line in conn.go
https://github.com/samuel/go-zookeeper/blob/master/zk/conn.go#L717
or
https://github.com/go-zookeeper/zk/blob/master/conn.go#L654
c.conn.SetReadDeadline(time.Time{})
In my view, if there occurs a problem in the network or server, clear the timeout deadline may cause this goroutine stuck for a long time even forever.
I wonder why it was written like this, was there some reason I've missed it?
If necessary, I'd like to post an mr.
The ChildrenW
method can only watch the changes of the children, but not the specific changes that occur in the children, such as delete
, create
, or set
?
Or is there something wrong with my code?
rootPath := "/root"
for {
_, _, eventChannel, err := conn.ChildrenW(rootPath)
if err != nil {
panic(err)
}
e := <-eventChannel
if e.Err != nil {
panic(e.Err)
}
if e.Type == zk.EventNodeCreated {
fmt.Printf("child node[%s] has beed deteted\n", e.Path)
} else if e.Type == zk.EventNodeDeleted {
fmt.Printf("a new child node[%s] has beed created\n", e.Path)
} else if e.Type == zk.EventNodeDataChanged {
fmt.Printf("child node[%s] has beed changed\n", e.Path)
} else if e.Type == zk.EventNodeChildrenChanged {
fmt.Println("there is a child node has beed changed")
}
}
}
Our environment places stunnel in front of Zookeeper (this was required for transport encryption in older versions of ZK). This means that a client will always be able to establish a connection, which may then later fail (stunnel is up, but ZK is down). Similar behaviour could be encountered with other proxy configurations (e.g haproxy etc).
In this configuration, c.connect()
will never return an error, instead the call the authenticate
fails, and the client immediately enters StateDisconnected
. This means the send and recv loops are never started against the client.
Due to a misuse of the library, we were also closing the connection whenever it entered StateDisconnected
(this was to support dynamic reconfiguration of the ZK ensemble, we've switched to a custom HostProvider to better support this now).
At this point, the client is closed, with no send and recv loops executing. Any unsent requests made against the client will never receive a response, blocking any application threads accessing ZK.
To repro:
nc -l 12345
conn.AddAuth("digest", []byte("test:wIN95csScHWkVWqPPYZk6vsE4Fs="))
Expected:
The ZK access in 3 returns when the connection is closed.
Actual:
The ZK access in 3 never returns.
Right now the packet types are encoded and decoded to the wire using reflection. Since we know the types we can code the message encoding/decoding directly using the bytes instead and should get a nice performance boost.
Need to benchmark this to see the improvements.
--- FAIL: TestSlowServer (6.05s)
cluster_test.go:15: [ZKERR] ZooKeeper JMX enabled by default
cluster_test.go:15: [ZKERR] Using config: /tmp/gozk591418556/srv1/zoo.cfg
zk_test.go:908: zk: could not connect to a server
switch res.Type {
case EventNodeCreated:
wTypes = append(wTypes, watchTypeExist)
case EventNodeDeleted, EventNodeDataChanged:
wTypes = append(wTypes, watchTypeExist, watchTypeData, watchTypeChild)
case EventNodeChildrenChanged:
wTypes = append(wTypes, watchTypeChild)
}
switch res.Type {
case EventNodeCreated:
wTypes = append(wTypes, watchTypeExist)
case EventNodeDeleted:
wTypes = append(wTypes, watchTypeExist, watchTypeData, watchTypeChild)
case EventNodeDataChanged:
wTypes = append(wTypes, watchTypeExist, watchTypeData)
case EventNodeChildrenChanged:
wTypes = append(wTypes, watchTypeChild)
}
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.