kentik / patricia Goto Github PK
View Code? Open in Web Editor NEWGarbage collector-sensitive patricia tree for IP/CIDR tagging
License: BSD 2-Clause "Simplified" License
Garbage collector-sensitive patricia tree for IP/CIDR tagging
License: BSD 2-Clause "Simplified" License
We're currently seeing occasional errors trying to delete values at addresses that are known to be in the tree. There's a bug either on insert or delete. This is a high priority, and will be tracked down as soon as possible.
Add a method like FindDeepestTag
, but that returns everything found at that node.
Would be cool to add some performance tests to test suite so we could automatically see if PRs dont make things slower.
Hi - I ran into this error trying to use patricia in a project with dependencies handled by vgo:
vgo: import "github.com/gcla/wiresdeep/godeep" ->
import "github.com/Kentik/patricia/uint32_tree" ->
import "github.com/kentik/patricia": module path of repo is github.com/Kentik/patricia, not github.com/kentik/patricia (wrong case)
I think that if you run sed and replace kentik/patricia with Kentik/patricia, the problem will be resolved - though I'm not sure how this affects current users. I noticed that sirupsen/logrus had a similar issue a while back. I worked around the problem by forking on github to my own account and replacing the author token :-( But I'd much rather the references pointed to your project so you get the cosmic credit!
Graham
Hello,
I wrote a benchmark to see how your wonderful package would perform and it looks like it is too memory intensive: ~400 bytes in average are allocated during the lookup (so added to the size of the tree), and ~200B in average to insert a value. For a uint8 tree, it sounds like a lot and unexpected to me. And for a 1,000,000-node tee, it therefore needs ~400MB for the tree and ~200MB for the lookups.
I was expecting only a few bytes would be added for new entries sharing part of their prefix with existing ones.
My results:
goarch: amd64
pkg: github.com/sqreen/go-agent/agent/internal/actor
BenchmarkTree/IPv4/Lookup/Random_addresses/1-8 200000 5630 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/10-8 500000 4715 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/100-8 500000 5361 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/1000-8 300000 5631 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/10000-8 500000 4433 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/100000-8 300000 4095 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/1000000-8 300000 4654 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/1-8 500000 4986 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/10-8 300000 4978 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/100-8 200000 5711 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/1000-8 200000 5399 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/10000-8 200000 5719 ns/op 412 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/100000-8 300000 4738 ns/op 425 B/op 10 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/1000000-8 100000 13806 ns/op 654 B/op 13 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/1-8 500000 5120 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/10-8 300000 5549 ns/op 408 B/op 7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/100-8 300000 5479 ns/op 415 B/op 8 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/1000-8 200000 7534 ns/op 501 B/op 12 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/10000-8 100000 18245 ns/op 1268 B/op 15 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/100000-8 10000 159465 ns/op 11914 B/op 18 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/1000000-8 500 2589345 ns/op 111124 B/op 19 allocs/op
BenchmarkTree/IPv4/Insertion/Consequitive_addresses/1000000-8 3000000 458 ns/op 262 B/op 0 allocs/op
BenchmarkTree/IPv4/Insertion/Random_addresses/1000000-8 1000000 1287 ns/op 222 B/op 0 allocs/op
BenchmarkTree/IPv4/Insertion/Random_networks/1000000-8 2000000 827 ns/op 111 B/op 0 allocs/op
And here is the benchmark:
func BenchmarkTree(b *testing.B) {
b.Run("IPv4", func(b *testing.B) {
b.Run("Lookup", func(b *testing.B) {
b.Run("Random addresses", func(b *testing.B) {
for n := 1; n <= 1000000; n *= 10 {
n := n
tree, cidrs := RandTreeV4(b, n, RandIPv4)
b.Run(fmt.Sprintf("%d", len(cidrs)), func(b *testing.B) {
b.ReportAllocs()
for n := 0; n < b.N; n++ {
// Pick a random CIDR that was inserted
ix := int(rand.Int63n(int64(len(cidrs))))
cidr := cidrs[ix]
tags, err := tree.FindTags(*cidr)
require.NoError(b, err)
require.NotEqual(b, len(tags), 0)
}
})
}
})
b.Run("Realisitic Random Networks", func(b *testing.B) {
for n := 1; n <= 1000000; n *= 10 {
n := n
tree, cidrs := RandTreeV4(b, n, RandRealisticCIDRv4)
b.Run(fmt.Sprintf("%d", len(cidrs)), func(b *testing.B) {
b.ReportAllocs()
for n := 0; n < b.N; n++ {
// Pick a random CIDR that was inserted
ix := int(rand.Int63n(int64(len(cidrs))))
cidr := cidrs[ix]
tags, err := tree.FindTags(*cidr)
require.NoError(b, err)
require.NotEqual(b, len(tags), 0)
}
})
}
})
b.Run("Random Networks", func(b *testing.B) {
for n := 1; n <= 1000000; n *= 10 {
n := n
tree, cidrs := RandTreeV4(b, n, RandCIDRv4)
b.Run(fmt.Sprintf("%d", len(cidrs)), func(b *testing.B) {
b.ReportAllocs()
for n := 0; n < b.N; n++ {
// Pick a random CIDR that was inserted
ix := int(rand.Int63n(int64(len(cidrs))))
cidr := cidrs[ix]
tags, err := tree.FindTags(*cidr)
require.NoError(b, err)
require.NotEqual(b, len(tags), 0)
}
})
}
})
})
b.Run("Insertion", func(b *testing.B) {
cidr, _, err := patricia.ParseIPFromString("1.2.3.4")
require.NoError(b, err)
b.Run("Consequitive addresses/1000000", func(b *testing.B) {
b.ReportAllocs()
tree := uint8_tree.NewTreeV4()
for n := 0; n < b.N; n++ {
cidr.Address += 1
tree.Set(*cidr, 0)
}
})
b.Run("Random addresses/1000000", func(b *testing.B) {
b.ReportAllocs()
tree := uint8_tree.NewTreeV4()
for n := 0; n < b.N; n++ {
cidr.Address = rand.Uint32()
tree.Set(*cidr, 0)
}
})
b.Run("Random networks/1000000", func(b *testing.B) {
b.ReportAllocs()
tree := uint8_tree.NewTreeV4()
for n := 0; n < b.N; n++ {
cidr.Address = rand.Uint32()
cidr.Length = 1 + (uint(rand.Uint32()) % uint(32))
tree.Set(*cidr, 0)
}
})
})
})
}
func RandIPv4() string {
return net.IPv4(byte(rand.Uint32()), byte(rand.Uint32()), byte(rand.Uint32()), byte(rand.Uint32())).String()
}
func RandCIDRv4() string {
return fmt.Sprintf("%s/%d", RandIPv4(), 1+rand.Uint32()%32)
}
func RandRealisticCIDRv4() string {
return fmt.Sprintf("%s/%d", RandIPv4(), 10+(rand.Uint32()%23))
}
What do you think?
I'm new to go and I'm trying to use this library in my code. I've gone through all the test files, open and closed issues but haven't found anything in the way of an example of how to use this.
Is it possible to show me how to;
Thanks!
hello,
I am testing this module for adding enrichments to data before we write the data to elasticsearch and looking for some guidance on proper usage of the library.
Currently I am using a string tree and storing key:value pairs as tags like "dns-hostname:foo.example.com" and "asn:65535", and then parsing these tags out -- my question is, is this the proper way to go about this? or am I missing something obvious? in the case of integers, would it be better to just duplicate the tree where the only tag represents the ASN Number? and then store another tree containing dns names, and yet another containing other metadata? and if I need to have another integer tag, create another tree for that, and so on?
our tree currently is not expected to store millions and millions of entries, but its feasible it could have 1M+ if we decide to store 'global' data in it such as the internet routing table to for example enrich a log entry IP address with the AS Number, and some number of nodes in the tree may have several tags associated representing key:value pairs
Thanks!
Hello.
Just curious, why offer a string_tree when it contains pointers and creates high GC pressure?
Thanks.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.