
zk's Introduction

Native Go Zookeeper Client Library


License

3-clause BSD. See LICENSE file.

zk's People

Contributors

alxzh, dragonsinth, einxie, hdonnay, horizonzy, horkhe, jdef, jeffbean, jhump, kaithyrookie, linsite, mattrobenolt, mkaczanowski, moredure, nemith, nomis52, nsd20463, pmazzini, sakateka, samuel, sh-cho, spenczar, tailhook, theatrus, theckman, tobio, vespian, yarikk, yunxianghuang, zellyn


zk's Issues

conn.Multi should take a specific interface and not `interface{}`

conn.Multi's signature is func (c *Conn) Multi(ops ...interface{}) ([]MultiResponse, error), where ops can be any type. However, it returns an error at runtime if an op is not one of CreateRequest, SetDataRequest, DeleteRequest, or CheckVersionRequest.

We could introduce an interface that all of these structs implement, limiting callers to just the operations from the package. It could simply return the opCode itself:

type operation interface {
  opCode() int32  // better yet type opCodes as well.
}

clean out the core zk package

This has already informally started with the repo https://github.com/go-zookeeper/recipes.

I would suggest we write down a scheme we intend to prescribe so we know where things belong.

github.com/go-zookeeper/recipes/lock
github.com/go-zookeeper/recipes/election

We should move these peripherals to other repositories, or at least to Go sub-packages, to remove the noise of supporting recipes in the core client.

These would be distinct from, say, github.com/go-zookeeper/zk/zktestutils, or a home for the four-letter-word helpers, perhaps go-zookeeper/zk/flw.

My hunch is that these patterns/recipes would generate the most issues and support questions, so separating them into another repo seems smart. On the other hand, that may add a little overhead in redirecting and moving issues over to the other repo. Alternatively, we could keep everything in the same repo as sub-packages:

github.com/go-zookeeper/zk/recipes/lock
github.com/go-zookeeper/zk/recipes/election

or even

github.com/go-zookeeper/zk/lock
github.com/go-zookeeper/zk/election

cc @nemith

No support of TTL on znodes

ZooKeeper 3.5 reached its first stable release in May 2019 with support for TTL nodes. This library has not been updated to support 3.5 features, including TTL nodes.

support four-letter words?

doc: https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_4lw
such as

$ echo mntr | nc localhost 2185

zk_version  3.4.0
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received 70
zk_packets_sent 69
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count   4
zk_watch_count  0
zk_ephemerals_count 0
zk_approximate_data_size    27
zk_followers    4                   - only exposed by the Leader
zk_synced_followers 4               - only exposed by the Leader
zk_pending_syncs    0               - only exposed by the Leader
zk_open_file_descriptor_count 23    - only available on Unix platforms
zk_max_file_descriptor_count 1024   - only available on Unix platforms
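As a starting point, the `mntr` response above is just whitespace-separated key/value lines, so parsing it is straightforward. A minimal sketch (parseMntr is a hypothetical helper, not part of the zk package; issuing the command itself would be a net.Dial, write, read):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMntr turns the raw mntr response into a map of stat name to value.
// Lines with fewer than two fields are skipped.
func parseMntr(raw string) map[string]string {
	stats := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue // blank or malformed line
		}
		stats[fields[0]] = fields[1]
	}
	return stats
}

func main() {
	raw := "zk_version\t3.4.0\nzk_server_state\tleader\nzk_znode_count\t4\n"
	stats := parseMntr(raw)
	fmt.Println(stats["zk_server_state"], stats["zk_znode_count"])
}
```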

zk event state does not support SyncConnected

The zk event state does not support SyncConnected, so Unknown is used in its place. I don't think that's appropriate, because the official documentation clearly defines this state.

this is the log printed by zk

{"message":"NodeWatch::watchFlow receive zk events","state":"Unknown","type":"EventNodeChildrenChanged","path":"/colla/cpf/configs/flow"}

this is the watcher response from zkCli

[zk: localhost:2181(CONNECTED) 31] ls -w /colla/cpf/configs/flow
ls -w /colla/cpf/configs/flow
[zk: localhost:2181(CONNECTED) 32] 
WATCHER::

WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/colla/cpf/configs/flow

Many methods/functions are missing go doc information

A lot of the exported identifiers are missing Go doc comments. We should align the documentation with that found in the Java client library (https://zookeeper.apache.org/doc/r3.4.14/api/index.html):

conn.go:57:6: exported type Dialer should have comment or be unexported
conn.go:69:6: exported type Conn should have comment or be unexported
conn.go:140:6: exported type Event should have comment or be unexported
conn.go:313:1: exported method Conn.Close should have comment or be unexported
conn.go:987:1: exported method Conn.AddAuth should have comment or be unexported
conn.go:1013:1: exported method Conn.Children should have comment or be unexported
conn.go:1026:1: exported method Conn.ChildrenW should have comment or be unexported
conn.go:1044:1: exported method Conn.Get should have comment or be unexported
conn.go:1076:1: exported method Conn.Set should have comment or be unexported
conn.go:1089:1: exported method Conn.Create should have comment or be unexported
conn.go:1151:1: exported method Conn.Delete should have comment or be unexported
conn.go:1160:1: exported method Conn.Exists should have comment or be unexported
conn.go:1178:1: exported method Conn.ExistsW should have comment or be unexported
conn.go:1203:1: exported method Conn.GetACL should have comment or be unexported
conn.go:1215:1: exported method Conn.SetACL should have comment or be unexported
conn.go:1228:1: exported method Conn.Sync should have comment or be unexported
conn.go:1241:6: exported type MultiResponse should have comment or be unexported
constants.go:10:2: exported const DefaultPort should have comment (or a comment on this block) or be unexported
constants.go:37:2: exported const EventNodeCreated should have comment (or a comment on this block) or be unexported
constants.go:58:2: exported const StateUnknown should have comment (or a comment on this block) or be unexported
constants.go:71:2: exported const FlagEphemeral should have comment (or a comment on this block) or be unexported
constants.go:89:6: exported type State should have comment or be unexported
constants.go:98:6: exported type ErrCode should have comment or be unexported
constants.go:101:2: exported var ErrConnectionClosed should have comment or be unexported
constants.go:208:6: exported type EventType should have comment or be unexported
constants.go:228:2: exported const ModeUnknown should have comment (or a comment on this block) or be unexported
server_help.go:18:6: exported type TestServer should have comment or be unexported
server_help.go:24:6: exported type TestCluster should have comment or be unexported
server_help.go:29:1: exported function StartTestCluster should have comment or be unexported
server_help.go:102:1: exported method TestCluster.Connect should have comment or be unexported
server_help.go:107:1: exported method TestCluster.ConnectAll should have comment or be unexported
server_help.go:111:1: exported method TestCluster.ConnectAllTimeout should have comment or be unexported
server_help.go:115:1: exported method TestCluster.ConnectWithOptions should have comment or be unexported
server_help.go:124:1: exported method TestCluster.Stop should have comment or be unexported
server_help.go:176:1: exported method TestCluster.StartServer should have comment or be unexported
server_help.go:186:1: exported method TestCluster.StopServer should have comment or be unexported
server_help.go:196:1: exported method TestCluster.StartAllServers should have comment or be unexported
server_help.go:207:1: exported method TestCluster.StopAllServers should have comment or be unexported
server_java.go:11:6: exported type ErrMissingServerConfigField should have comment or be unexported
server_java.go:18:2: exported const DefaultServerTickTime should have comment (or a comment on this block) or be unexported
server_java.go:26:6: exported type ServerConfigServer should have comment or be unexported
server_java.go:33:6: exported type ServerConfig should have comment or be unexported
server_java.go:44:1: exported method ServerConfig.Marshall should have comment or be unexported
server_java.go:113:6: exported type Server should have comment or be unexported
server_java.go:121:1: exported method Server.Start should have comment or be unexported
server_java.go:134:1: exported method Server.Stop should have comment or be unexported
structs.go:13:2: exported var ErrUnhandledFieldType should have comment or be unexported
structs.go:24:6: exported type ACL should have comment or be unexported
structs.go:30:6: exported type Stat should have comment or be unexported
structs.go:121:6: exported type PathVersionRequest should have comment or be unexported
structs.go:141:6: exported type CheckVersionRequest should have comment or be unexported
structs.go:160:6: exported type CreateRequest should have comment or be unexported
structs.go:168:6: exported type DeleteRequest should have comment or be unexported
structs.go:225:6: exported type SetDataRequest should have comment or be unexported
util.go:27:1: exported function DigestACL should have comment or be unexported

The continuous logging of "authentication failed: EOF"

Encountered an issue: when a TCP connection disruption leads to an EOF error within the io.ReadFull() function in authenticate(), it results in the loop() function indefinitely failing to establish a functional connection.

The cause lies in the cycle where, after encountering EOF, the connect() within the loop attempts to rebuild the TCP connection. Upon successful reconnection, the authenticate() method uses the retained lastZxid to send a request, prompting the ZooKeeper service to actively close this TCP connection, perpetuating the EOF loop.

I suspect the reason behind the ZooKeeper service actively closing the TCP connection is due to internal Zxid loss or expiration within the service. This issue commonly arises during scenarios such as redeployment of the ZooKeeper service container (first docker-compose down, followed by docker-compose up -d) or when the service faces substantial pressure.

It's worth noting that if the client and server operate within the same operating system and the system supports and enables the TCP Keep-Alive mechanism, it might help circumvent the endless EOF loop scenario. However, if TCP Keep-Alive also fails, the aforementioned issue will persist.

Deadlock in Conn.Close()

We've been encountering sporadic deadlocks in a heavily concurrent service of ours which uses this library. A rigorous investigation narrowed it down to a race in Conn.Close() that causes a deadlock when certain conditions are met. Bear with me.

The following block looks simple at a glance:

	select {
	case <-c.queueRequest(opClose, &closeRequest{}, &closeResponse{}, nil):
	case <-time.After(time.Second):
	}

The timeout should recover from any deadlock here, right? Not really. Though far from obvious, if the call to c.queueRequest blocks, it never makes it to the actual select, and this is exactly what happens. Let's look at this block in Conn.queueRequest:

switch opcode {
case opClose:
	// always attempt to send close ops.
	c.sendChan <- rq
default:
	…
}
return rq.recvChan

So there is a clearly unguarded write to c.sendChan. The channel itself is buffered and almost always being consumed in sendLoop, so how could it ever block here? Well, sendLoop is not always there; it takes a break sometimes, say, when the connection hangs up. It is supposed to be restarted after the reconnection sequence completes, but if that never happens, neither does sendLoop. In its absence, and in the presence of pending unsent messages, this becomes a blocking write. The playground snippet demonstrates this. So to hit the deadlock, the Close() call must happen while the client is in its reconnection loop and a number of unsent requests are pending. We do signal our other goroutines prior to calling c.Close, but that has no effect on those which are already queued in the send/receive logic.

The simplest fix would be to add a timeout around the write to sendChan in queueRequest (a PR is coming). The strategic one would be a revision of the shutdown sequence and an upgrade to context sooner rather than later. That, however, is likely to require API changes and should therefore be scheduled for the next major version.

ChildrenW and GetW cannot observe the create event?

When ChildrenW or GetW is used to watch a nonexistent znode, the watch never fires: even if the znode is created later, neither the create event nor subsequent modifications of the new znode will trigger ChildrenW or GetW.

It seems that only ExistsW can do this?

Make dialer extensible by making it an interface

Hi,
we'd like to add exponential backoff for when ZooKeeper is not reachable. This is currently impossible to implement because it would require keeping state inside the dialer; since the dialer is currently a plain function, it would be great to make it an interface instead.

In #95 you can find a small PR which makes the dialer an interface.

received unexpected error: zk: could not find testing zookeeper bin path at "zookeeper/bin": CreateFile zookeeper/bin: The system cannot find the path specified.

I pulled down the code from the master branch and ran the zk_test test file; the failing function is TestIntegration_Create. Debugging shows the error occurs in the StartTestCluster method. The error message is: received unexpected error: zk: could not find testing zookeeper bin path at "zookeeper/bin": CreateFile zookeeper/bin: The system cannot find the path specified.

runtime error: invalid memory address or nil pointer dereference

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x8dc9cc]

goroutine 16 [running]:
github.com/go-zookeeper/zk.(*Conn).connect(0xc000506180)
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:392 +0x30c
github.com/go-zookeeper/zk.(*Conn).loop(0xc000506180, {0xe2a5a8?, 0x13ceea0})
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:428 +0x3c
github.com/go-zookeeper/zk.Connect.func1()
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:224 +0x25
created by github.com/go-zookeeper/zk.Connect in goroutine 1
/root/go/pkg/mod/github.com/go-zookeeper/[email protected]/conn.go:223 +0x3d7
FAIL command-line-arguments 0.039s
FAIL

Support for Context on relevant Conn methods

It would be nice to support Context on Conn methods like Get; this should be fairly easy to implement in the request method. I'm not sure whether context cancellation would need special attention for write operations, but read operations, at least, would be simple.

Create TTL Node shows invalid arguments

I am facing an issue while attempting to create a TTL node: the call consistently fails with an "invalid arguments" error, even though my arguments match the function signature. Could I be using the function incorrectly?
My code looks like:
res, err := conn.CreateTTL("/testpath", []byte("ttl node"), zk.FlagTTL, zk.WorldACL(zk.PermAll), time.Second)

Add zkutils package

Java provides a ZKUtil class with a bunch of static methods for common ZooKeeper operations. I think we should follow suit, while keeping the client expectation that one call to a method generates one message to the server, and not try to overload this.

We can add things like #52

Question: Equivalent method for rmr

Hello, I use rmr in zkCli to delete a node recursively. What would be the equivalent method here?

I tried using conn.Delete, but I'm not sure what version to pass:

conn.Delete(path, version)
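The library has no built-in rmr, but the pattern is to list children, recurse, then delete the node itself; passing version -1 tells the server to skip the version check. The sketch below codes that logic against a tiny interface (with an in-memory fake so it runs without a server); with a real *zk.Conn the same recursion applies.

```go
package main

import "fmt"

// deleter is the subset of the client needed for a recursive delete.
type deleter interface {
	Children(path string) ([]string, error)
	Delete(path string, version int32) error
}

// deleteRecursive removes path and everything under it, depth-first.
func deleteRecursive(c deleter, path string) error {
	children, err := c.Children(path)
	if err != nil {
		return err
	}
	for _, child := range children {
		if err := deleteRecursive(c, path+"/"+child); err != nil {
			return err
		}
	}
	return c.Delete(path, -1) // -1: match any version
}

// fakeConn is an in-memory stand-in for *zk.Conn, for illustration only.
type fakeConn struct{ tree map[string][]string }

func (f *fakeConn) Children(path string) ([]string, error) { return f.tree[path], nil }
func (f *fakeConn) Delete(path string, version int32) error {
	delete(f.tree, path)
	return nil
}

func main() {
	f := &fakeConn{tree: map[string][]string{
		"/a":   {"b", "c"},
		"/a/b": nil,
		"/a/c": nil,
	}}
	fmt.Println(deleteRecursive(f, "/a"), len(f.tree))
}
```

Note this is not atomic: children created concurrently between the list and the deletes can make the final Delete fail, so a real implementation would retry.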

Zookeeper client session is not reset when cluster loses session state (authentication failed: EOF)

Hi, I noticed an issue where the go-zookeeper client will not reset the session if the ZooKeeper cluster loses its session state (instance reset, redeploy, etc.) and the server responds with an EOF. When this happens, the client enters an endless loop trying to authenticate with the previous session ID. It looks like the loop happens here:

zk/conn.go

Line 435 in 50daf81

c.logger.Printf("authentication failed: %s", err)

I haven't had a chance to look through the code yet for this but I'm wondering if anyone knows what process needs to happen to reset the session in the client?

SASL authentication

Hello, there was an issue in the previous repo (samuel#156) and it is still unanswered.
AFAIK there is no SASL auth in this library; is it planned?

Add errors for missing error codes

There are additional error codes on the server that we don't always handle well. For example, -6 (UnimplementedException) is missing. See #49.

We just need to go through https://github.com/apache/zookeeper/blob/190a227aa9d4655ebfe6ba9f5c2da426da8c5d98/zookeeper-server/src/main/java/org/apache/zookeeper/KeeperException.java#L229 and make sure we have coverage. For v1 these need to be just regular errors, but in v2 we should look at adding something like a ZKPathError that could carry contextual information with the error (#32).

TryLock API for lock

Hello,

Thanks for this open source library. I was looking into Lock() and realised that it blocks until a lock is acquired.

We have a use case where we have multiple instances trying to acquire the lock and we want to return an error immediately for instances that did not acquire the lock instead of waiting to acquire the lock. Is this achievable?

Calling Exists() before Lock() is not an ideal solution because reads are not linearizable in ZooKeeper.

Thanks!
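One workaround, pending a TryLock API, is to approximate a non-blocking try-lock with a single ephemeral node: Create succeeds for exactly one contender, and ErrNodeExists means the lock is held. This is a sketch of that pattern against a tiny interface (with an in-memory fake so it runs standalone), not the library's Lock recipe, which uses sequential nodes and is fair; the single-node variant is simpler but herd-prone on release.

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeExists stands in for zk.ErrNodeExists in this sketch.
var errNodeExists = errors.New("zk: node already exists")

// creator is the one call a try-lock needs: an ephemeral create.
type creator interface {
	Create(path string, data []byte) error
}

// tryLock attempts the lock exactly once and reports failure immediately
// instead of waiting, which is the behaviour the issue asks for.
func tryLock(c creator, path string) (bool, error) {
	err := c.Create(path, nil)
	if errors.Is(err, errNodeExists) {
		return false, nil // someone else holds the lock
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// fakeZK is an in-memory stand-in for a ZooKeeper connection.
type fakeZK struct{ nodes map[string]bool }

func (f *fakeZK) Create(path string, data []byte) error {
	if f.nodes[path] {
		return errNodeExists
	}
	f.nodes[path] = true
	return nil
}

func main() {
	f := &fakeZK{nodes: map[string]bool{}}
	got1, _ := tryLock(f, "/lock")
	got2, _ := tryLock(f, "/lock")
	fmt.Println(got1, got2)
}
```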

Help: CreateTTL got `unknown error: -6`

When I test the CreateTTL method, I get an error, unknown error: -6, in my cluster environment.
This is my code:

func main() {
	// Note: zk.Connect also takes a session timeout; added here for completeness.
	c, _, err := zk.Connect([]string{"192.168.29.81:2181", "192.168.29.82:2181", "192.168.29.83:2181"}, 5*time.Second)
	if err != nil {
		panic(err)
	}

	if _, err := c.CreateTTL("/test", []byte{}, zk.FlagTTL|zk.FlagEphemeral, zk.WorldACL(zk.PermAll), 20*time.Second); err != nil {
		panic(err)
	}
}

The zookeeper conf like this:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/data/zookeeper
clientPort=2181
forceSync=no
autopurge.snapRetainCount=5
admin.serverPort=8999

server.81=192.168.29.81:2888:3888
server.82=192.168.29.82:2888:3888
server.83=192.168.29.83:2888:3888

Please help! QAQ

ChildrenW returns a wrong EventType

When we listen to a node using the ChildrenW method, we have two cases:

  1. Node Parent has children [child1, child2, child3]: if I delete node Parent, I get event.Type = EventNodeChildrenChanged.

  2. Node Parent has no children: if I delete node Parent, I get event.Type = EventNodeDeleted.

It looks like a bug; in the first case I expected to get event.Type = EventNodeDeleted.

Example of getting events:

go func() {
	for {
		select {
		case <-ctx.Done():
			return
		case ev := <-events:
			fmt.Println(ev.Type)

			// Re-arm the watch; one-shot watches must be re-registered.
			_, _, events, err = client.conn.ChildrenW(path)
			if err != nil {
				return
			}
		}
	}
}()

Watchers are crashing on the zookeeper leader restart

Hi Team,

We have a ZooKeeper production cluster with 6 nodes (1 leader, 2 followers, 3 observers). We see a drop in watchers when the leader restarts for some reason, even though connections do not drop. Please see the sequence of actions listed below.

  1. The leader restarted for some reason.
  2. All the followers restarted because of the leader restart, to elect a new leader.
  3. We see no drop in the number of connections; connections shifted from one ZK VM to another ZK VM.
  4. But we are seeing a drop in the number of watchers created. Each connection we create is associated with one to twenty watchers, depending on the use case.

We would like to understand: is this a bug, or do we need to recreate the watchers whenever a connection drop happens?

Add contribution guide.

The core library should have a guide for contributions:

  • All changes come with unit/integration tests.
  • No external libraries (keep it dependency-free, as it is today).
  • At least one peer review.
  • Maybe some code-coverage checks.
  • Go style guide (lightly extend the standard library guide).

Why clear the timeout when reading a response from the server?

I found this line in conn.go

https://github.com/samuel/go-zookeeper/blob/master/zk/conn.go#L717
or
https://github.com/go-zookeeper/zk/blob/master/conn.go#L654
c.conn.SetReadDeadline(time.Time{})

In my view, if a problem occurs in the network or the server, clearing the read deadline may cause this goroutine to get stuck for a long time, even forever.

I wonder why it was written like this; is there some reason I've missed?
If necessary, I'd like to open a PR.

Method `ChildrenW` only returns EventNodeChildrenChanged?

The ChildrenW method can only watch changes to the set of children, not the specific change that occurred to a child, such as a delete, create, or set?
Or is there something wrong with my code?

rootPath := "/root"

for {
	_, _, eventChannel, err := conn.ChildrenW(rootPath)
	if err != nil {
		panic(err)
	}

	e := <-eventChannel
	if e.Err != nil {
		panic(e.Err)
	}

	if e.Type == zk.EventNodeCreated {
		fmt.Printf("a new child node[%s] has been created\n", e.Path)
	} else if e.Type == zk.EventNodeDeleted {
		fmt.Printf("child node[%s] has been deleted\n", e.Path)
	} else if e.Type == zk.EventNodeDataChanged {
		fmt.Printf("child node[%s] has been changed\n", e.Path)
	} else if e.Type == zk.EventNodeChildrenChanged {
		fmt.Println("a child node has been changed")
	}
}

Connection deadlock when connecting to unresponsive ZK ensemble.

Our environment places stunnel in front of Zookeeper (this was required for transport encryption in older versions of ZK). This means that a client will always be able to establish a connection, which may then later fail (stunnel is up, but ZK is down). Similar behaviour could be encountered with other proxy configurations (e.g haproxy etc).

In this configuration, c.connect() will never return an error; instead the call to authenticate fails, and the client immediately enters StateDisconnected. This means the send and recv loops are never started against the client.

Due to a misuse of the library, we were also closing the connection whenever it entered StateDisconnected (this was to support dynamic reconfiguration of the ZK ensemble, we've switched to a custom HostProvider to better support this now).

At this point, the client is closed, with no send and recv loops executing. Any unsent requests made against the client will never receive a response, blocking any application threads accessing ZK.

To repro:

  1. Start netcat listening on a port: nc -l 12345
  2. Connect to the netcat port using the library
  3. In one goroutine, attempt to access ZK, e.g. conn.AddAuth("digest", []byte("test:wIN95csScHWkVWqPPYZk6vsE4Fs="))
  4. In another goroutine, close the connection after some delay

Expected:

The ZK access in 3 returns when the connection is closed.

Actual:

The ZK access in 3 never returns.

Don't use reflection for encoding/decoding

Right now the packet types are encoded and decoded to the wire using reflection. Since we know the types, we can code the message encoding/decoding directly against the bytes instead, which should give a nice performance boost.

Need to benchmark this to see the improvements.
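A sketch of what the direct (reflection-free) path could look like for one record type, following ZooKeeper's jute convention of a big-endian int32 length prefix for strings; the struct and helper here are illustrative, not the package's actual codec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// deleteRequest mirrors the shape of a delete op: a path and a version.
type deleteRequest struct {
	Path    string
	Version int32
}

// encodeDeleteRequest writes the request into buf by hand, no reflection:
// int32 length prefix, the path bytes, then the version, all big-endian.
// It returns the number of bytes written and assumes buf is large enough.
func encodeDeleteRequest(buf []byte, r *deleteRequest) int {
	n := 0
	binary.BigEndian.PutUint32(buf[n:], uint32(len(r.Path)))
	n += 4
	n += copy(buf[n:], r.Path)
	binary.BigEndian.PutUint32(buf[n:], uint32(r.Version))
	n += 4
	return n
}

func main() {
	buf := make([]byte, 64)
	n := encodeDeleteRequest(buf, &deleteRequest{Path: "/a", Version: -1})
	fmt.Println(n, buf[:n])
}
```

A benchmark comparing this against the reflection-based encoder would quantify the win; the hand-written path also avoids per-call allocations from reflect.Value.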

Flaky test: TestSlowServer

--- FAIL: TestSlowServer (6.05s)
    cluster_test.go:15: [ZKERR] ZooKeeper JMX enabled by default
    cluster_test.go:15: [ZKERR] Using config: /tmp/gozk591418556/srv1/zoo.cfg
    zk_test.go:908: zk: could not connect to a server

GetW and ChildrenW on the same path at the same time

Current:

switch res.Type {
case EventNodeCreated:
	wTypes = append(wTypes, watchTypeExist)
case EventNodeDeleted, EventNodeDataChanged:
	wTypes = append(wTypes, watchTypeExist, watchTypeData, watchTypeChild)
case EventNodeChildrenChanged:
	wTypes = append(wTypes, watchTypeChild)
}

Proposed:

switch res.Type {
case EventNodeCreated:
	wTypes = append(wTypes, watchTypeExist)
case EventNodeDeleted:
	wTypes = append(wTypes, watchTypeExist, watchTypeData, watchTypeChild)
case EventNodeDataChanged:
	wTypes = append(wTypes, watchTypeExist, watchTypeData)
case EventNodeChildrenChanged:
	wTypes = append(wTypes, watchTypeChild)
}
