Comments (4)
P50 is 0.5ms, P90 is 1.2ms, P95 is 2ms, P99 is 4ms
I think these numbers are pretty good. Olric is implemented in Go.
If we create a worker pool of goroutines with the same client, with each worker having its own DMap with the same name, and then use the worker pool to process incoming requests rather than using a single DMap, will it help improve performance?
I have never tried such a thing before, but an increasing number of context switches may reduce performance at some point.
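As a rough illustration of the trade-off discussed above, here is a minimal Go sketch of a worker pool sharing one handle. The `kvStore` type and `runPool` function are hypothetical stand-ins for olric's EmbeddedClient/DMap, which the thread says are thread-safe; they are not olric's API.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore stands in for a thread-safe DMap handle; a real program would
// create one DMap via EmbeddedClient and share it across goroutines.
type kvStore struct {
	mu sync.Mutex
	m  map[string]string
}

func (s *kvStore) Put(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

// runPool processes n requests with the given number of workers, all sharing
// one store handle. More workers than CPUs mostly adds scheduling overhead,
// matching the context-switch concern above.
func runPool(workers, n int) int {
	dm := &kvStore{m: make(map[string]string)}
	reqs := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range reqs {
				dm.Put(fmt.Sprintf("k%d", r), "v")
			}
		}()
	}
	for i := 0; i < n; i++ {
		reqs <- i
	}
	close(reqs)
	wg.Wait()
	return len(dm.m)
}

func main() {
	fmt.Println(runPool(4, 8)) // 8
}
```

Note that the workers share a single handle rather than each creating a DMap with the same name; since the handle is thread-safe, extra handles buy nothing.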
Won't the owner always win in this case, or do I miss something? It would be great if we could have an option to read from the owner/primary only.
It depends on what happened in the cluster before you ran the request. If you are adding and removing nodes frequently, you may encounter such anomalies. Only the owner node has read-write rights on the partitions, but it applies the LWW policy to return the most up-to-date result.
Olric is an AP store; that means Olric always chooses availability over consistency.
IIUC, the difference is that when the owner is down, LWW can still read from a backup, while with this new option we would need to wait for a new node to come up and have the data propagated from the backup, during which the data is unavailable?
When a node goes down, a new partition owner is assigned immediately and starts processing incoming requests. There is no active anti-entropy mechanism in Olric. It only tries to read keys from members (using the routing table, based on a consistent hashing algorithm) and applies read-repair if it's enabled.
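The key-to-owner lookup described above can be sketched as follows. The routing table contents, node names, and the FNV hash are illustrative assumptions only and do not match olric's internals:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const partitionCount = 4

// routingTable maps a partition ID to its current owner. When a node goes
// down, only the affected entries change and requests keep flowing, which
// matches the "assigned immediately" behavior described above.
var routingTable = map[uint64]string{
	0: "node-a", 1: "node-b", 2: "node-c", 3: "node-a",
}

// ownerOf hashes the key to a partition and returns that partition's owner.
func ownerOf(key string) string {
	h := fnv.New64a()
	h.Write([]byte(key))
	partID := h.Sum64() % partitionCount
	return routingTable[partID]
}

func main() {
	// The same key always lands on the same partition, hence the same owner.
	fmt.Println(ownerOf("user:42") == ownerOf("user:42")) // true
}
```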
Also, the calls to dm.lookupOnOwners and dm.lookupOnReplicas in getOnCluster are sequential. Can they be parallelized to speed up Get?
It may improve performance in some cases, but I think an increasing number of parallel network calls may decrease overall performance for some workloads. This should be carefully designed and tested.
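A minimal sketch of what such parallelization could look like, assuming hypothetical `lookupOwners`/`lookupReplicas` stand-ins for dm.lookupOnOwners and dm.lookupOnReplicas (the entry type and return values are invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical lookup result with a timestamp for the later
// LWW comparison.
type entry struct {
	timestamp int64
	value     string
}

func lookupOwners() []entry   { return []entry{{20, "owner"}} }
func lookupReplicas() []entry { return []entry{{10, "replica"}} }

// getParallel runs both lookups concurrently instead of sequentially, then
// merges the results for the LWW comparison. Whether this wins in practice
// depends on the workload, as noted above.
func getParallel() []entry {
	var owners, replicas []entry
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); owners = lookupOwners() }()
	go func() { defer wg.Done(); replicas = lookupReplicas() }()
	wg.Wait()
	return append(owners, replicas...)
}

func main() {
	fmt.Println(len(getParallel())) // 2
}
```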
Besides, the condition dm.s.config.ReadQuorum >= config.MinimumReplicaCount is always true, so dm.lookupOnReplicas will always be called.
Yeah, this enables the LWW implicitly.
Hey @zhp007,
The setup, configuration, and results seem normal to me.
P99 set is 4.5ms, P99 get is 6ms. These are much higher than we expected.
What are the results for other percentiles? The Go runtime causes latency fluctuations. It might be a good idea to play with the GC parameters.
Will create more than one EmbeddedClient and/or DMap in each server help?
The EmbeddedClient/DMap implementation is thread-safe. With the same client instance, you can create any number of goroutines. Two CPUs should be good enough to get a rough idea of the performance characteristics.
Any other config settings or tunings we need to care about?
Currently, there are no other configuration options to improve performance. Still, you have two replicas, and Olric tries to fetch all accessible values from the cluster before returning the result to the client. This is called the Last Write Wins (LWW) policy: it compares the timestamps in the returned DMap entries, and the most up-to-date result wins. It's not possible to turn off this behavior explicitly. We could implement that quickly, but disabling LWW decreases consistency.
See Line 281 in 81e1254: ReadQuorum is one by default. We can add a boolean flag to turn off fetching the values from replicas, but this will hurt consistency.
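A minimal sketch of the LWW comparison described above, with a hypothetical `dmapEntry` type standing in for the DMap entries olric fetches from owners and replicas:

```go
package main

import "fmt"

// dmapEntry is an illustrative stand-in for a fetched DMap entry; only the
// timestamp matters for the LWW comparison.
type dmapEntry struct {
	timestamp int64
	value     string
}

// lastWriteWins returns the most up-to-date entry: compare timestamps,
// newest value wins. It assumes at least one entry was fetched.
func lastWriteWins(entries []dmapEntry) dmapEntry {
	winner := entries[0]
	for _, e := range entries[1:] {
		if e.timestamp > winner.timestamp {
			winner = e
		}
	}
	return winner
}

func main() {
	got := lastWriteWins([]dmapEntry{
		{timestamp: 5, value: "stale"},
		{timestamp: 9, value: "fresh"},
	})
	fmt.Println(got.value) // fresh
}
```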
The other thing to know is that if you request a key/value pair that does not belong to the node, the node finds the partition owner, fetches the pair from the owner, and returns it to the client. So, there is no redirection message in the protocol. It works as a reverse proxy.
See Line 317 in 81e1254.
@buraksezer Thanks for quick reply!
What are the results for other percentiles?
Set P50 is 0.8ms, P90 is 1.7ms, P95 is 2.3ms
Get P50 is 1.5ms, P90 is 2.4ms, P95 is 3.1ms
We also tried ReplicaCount=1; there is no change for Set, while Get has better performance:
P50 is 0.5ms, P90 is 1.2ms, P95 is 2ms, P99 is 4ms
With the same client instance, you can create any number of goroutines.
If we create a worker pool of goroutines with the same client, with each worker having its own DMap with the same name, and then use the worker pool to process incoming requests rather than using a single DMap, will it help improve performance?
Olric is trying to fetch all accessible values from the cluster before returning the result to the client. This is called the Last Write Wins (LWW) policy. It compares the timestamps in the returned DMap entries, and the most up-to-date result wins.
Won't the owner always win in this case, or do I miss something? It would be great if we could have an option to read from the owner/primary only.
IIUC, the difference is that when the owner is down, LWW can still read from a backup, while with this new option we would need to wait for a new node to come up and have the data propagated from the backup, during which the data is unavailable?
Also, the calls to dm.lookupOnOwners and dm.lookupOnReplicas in getOnCluster are sequential. Can they be parallelized to speed up Get?
Besides, the condition dm.s.config.ReadQuorum >= config.MinimumReplicaCount is always true, so dm.lookupOnReplicas will always be called.
The other thing to know is that if you request a key/value pair that does not belong to the node, the node finds the partition owner, fetches the pair from the owner, and returns it to the client.
Yes, I understand that the additional hop and data transfer can increase overhead/latency.