Giter VIP home page Giter VIP logo

patricia's Issues

Add performance tests to CI

Would be cool to add some performance tests to test suite so we could automatically see if PRs dont make things slower.

vgo error because of mismatched case kentik v Kentik

Hi - I ran into this error trying to use patricia in a project with dependencies handled by vgo:

vgo: import "github.com/gcla/wiresdeep/godeep" ->
        import "github.com/Kentik/patricia/uint32_tree" ->
        import "github.com/kentik/patricia": module path of repo is github.com/Kentik/patricia, not github.com/kentik/patricia (wrong case)

I think that if you run sed and replace kentik/patricia with Kentik/patricia, the problem will be resolved - though I'm not sure how this affects current users. I noticed that sirupsen/logrus had a similar issue a while back. I worked around the problem by forking on github to my own account and replacing the author token :-( But I'd much rather the references pointed to your project so you get the cosmic credit!

Graham

Memory-usage benchmark

Hello,

I wrote a benchmark to see how your wonderful package would perform and it looks like it is too memory intensive: ~400 bytes in average are allocated during the lookup (so added to the size of the tree), and ~200B in average to insert a value. For a uint8 tree, it sounds like a lot and unexpected to me. And for a 1,000,000-node tee, it therefore needs ~400MB for the tree and ~200MB for the lookups.

I was expecting only a few bytes would be added for new entries sharing part of their prefix with existing ones.

My results:

goarch: amd64
pkg: github.com/sqreen/go-agent/agent/internal/actor
BenchmarkTree/IPv4/Lookup/Random_addresses/1-8            200000              5630 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/10-8           500000              4715 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/100-8          500000              5361 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/1000-8         300000              5631 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/10000-8        500000              4433 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/100000-8       300000              4095 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_addresses/1000000-8      300000              4654 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/1-8          500000              4986 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/10-8         300000              4978 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/100-8        200000              5711 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/1000-8       200000              5399 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/10000-8      200000              5719 ns/op             412 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/100000-8     300000              4738 ns/op             425 B/op         10 allocs/op
BenchmarkTree/IPv4/Lookup/Realisitic_Random_Networks/1000000-8    100000             13806 ns/op             654 B/op         13 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/1-8                     500000              5120 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/10-8                    300000              5549 ns/op             408 B/op          7 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/100-8                   300000              5479 ns/op             415 B/op          8 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/1000-8                  200000              7534 ns/op             501 B/op         12 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/10000-8                 100000             18245 ns/op            1268 B/op         15 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/100000-8                 10000            159465 ns/op           11914 B/op         18 allocs/op
BenchmarkTree/IPv4/Lookup/Random_Networks/1000000-8                  500           2589345 ns/op          111124 B/op         19 allocs/op
BenchmarkTree/IPv4/Insertion/Consequitive_addresses/1000000-8    3000000               458 ns/op             262 B/op          0 allocs/op
BenchmarkTree/IPv4/Insertion/Random_addresses/1000000-8          1000000              1287 ns/op             222 B/op          0 allocs/op
BenchmarkTree/IPv4/Insertion/Random_networks/1000000-8           2000000               827 ns/op             111 B/op          0 allocs/op

And here is the benchmark:

func BenchmarkTree(b *testing.B) {
	b.Run("IPv4", func(b *testing.B) {
		b.Run("Lookup", func(b *testing.B) {
			b.Run("Random addresses", func(b *testing.B) {
				for n := 1; n <= 1000000; n *= 10 {
					n := n
					tree, cidrs := RandTreeV4(b, n, RandIPv4)
					b.Run(fmt.Sprintf("%d", len(cidrs)), func(b *testing.B) {
						b.ReportAllocs()
						for n := 0; n < b.N; n++ {
							// Pick a random CIDR that was inserted
							ix := int(rand.Int63n(int64(len(cidrs))))
							cidr := cidrs[ix]
							tags, err := tree.FindTags(*cidr)
							require.NoError(b, err)
							require.NotEqual(b, len(tags), 0)
						}
					})
				}
			})

			b.Run("Realisitic Random Networks", func(b *testing.B) {
				for n := 1; n <= 1000000; n *= 10 {
					n := n
					tree, cidrs := RandTreeV4(b, n, RandRealisticCIDRv4)
					b.Run(fmt.Sprintf("%d", len(cidrs)), func(b *testing.B) {
						b.ReportAllocs()
						for n := 0; n < b.N; n++ {
							// Pick a random CIDR that was inserted
							ix := int(rand.Int63n(int64(len(cidrs))))
							cidr := cidrs[ix]
							tags, err := tree.FindTags(*cidr)
							require.NoError(b, err)
							require.NotEqual(b, len(tags), 0)
						}
					})
				}
			})

			b.Run("Random Networks", func(b *testing.B) {
				for n := 1; n <= 1000000; n *= 10 {
					n := n
					tree, cidrs := RandTreeV4(b, n, RandCIDRv4)
					b.Run(fmt.Sprintf("%d", len(cidrs)), func(b *testing.B) {
						b.ReportAllocs()
						for n := 0; n < b.N; n++ {
							// Pick a random CIDR that was inserted
							ix := int(rand.Int63n(int64(len(cidrs))))
							cidr := cidrs[ix]
							tags, err := tree.FindTags(*cidr)
							require.NoError(b, err)
							require.NotEqual(b, len(tags), 0)
						}
					})
				}
			})
		})

		b.Run("Insertion", func(b *testing.B) {
			cidr, _, err := patricia.ParseIPFromString("1.2.3.4")
			require.NoError(b, err)

			b.Run("Consequitive addresses/1000000", func(b *testing.B) {
				b.ReportAllocs()
				tree := uint8_tree.NewTreeV4()
				for n := 0; n < b.N; n++ {
					cidr.Address += 1
					tree.Set(*cidr, 0)
				}
			})

			b.Run("Random addresses/1000000", func(b *testing.B) {
				b.ReportAllocs()
				tree := uint8_tree.NewTreeV4()
				for n := 0; n < b.N; n++ {
					cidr.Address = rand.Uint32()
					tree.Set(*cidr, 0)
				}
			})

			b.Run("Random networks/1000000", func(b *testing.B) {
				b.ReportAllocs()
				tree := uint8_tree.NewTreeV4()
				for n := 0; n < b.N; n++ {
					cidr.Address = rand.Uint32()
					cidr.Length = 1 + (uint(rand.Uint32()) % uint(32))
					tree.Set(*cidr, 0)
				}
			})
		})
	})
}

func RandIPv4() string {
	return net.IPv4(byte(rand.Uint32()), byte(rand.Uint32()), byte(rand.Uint32()), byte(rand.Uint32())).String()
}

func RandCIDRv4() string {
	return fmt.Sprintf("%s/%d", RandIPv4(), 1+rand.Uint32()%32)
}

func RandRealisticCIDRv4() string {
	return fmt.Sprintf("%s/%d", RandIPv4(), 10+(rand.Uint32()%23))
}

What do you think?

Usage Examples

I'm new to go and I'm trying to use this library in my code. I've gone through all the test files, open and closed issues but haven't found anything in the way of an example of how to use this.

Is it possible to show me how to;

  1. add a network to a tree and tag it
  2. check if an ip address is in a tree

Thanks!

guidance on usage?

hello,

I am testing this module for adding enrichments to data before we write the data to elasticsearch and looking for some guidance on proper usage of the library.

Currently I am using a string tree and storing key:value pairs as tags like "dns-hostname:foo.example.com" and "asn:65535", and then parsing these tags out -- my question is, is this the proper way to go about this? or am I missing something obvious? in the case of integers, would it be better to just duplicate the tree where the only tag represents the ASN Number? and then store another tree containing dns names, and yet another containing other metadata? and if I need to have another integer tag, create another tree for that, and so on?

our tree currently is not expected to store millions and millions of entries, but its feasible it could have 1M+ if we decide to store 'global' data in it such as the internet routing table to for example enrich a log entry IP address with the AS Number, and some number of nodes in the tree may have several tags associated representing key:value pairs

Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.