I let a 100k-prefix run go as well; here's that result:
NumberEntered: 0
NumberRejected: -100000
NumberAccepted: 100000
NumberMerged: 91192
NumberReturned: 8808
TimeToProcess: 869.71707892418
from aggregator.
We had used a previous implementation to aggregate v6 blocks in another project; I've since replaced that aggregator with this one, because the old one wasn't fully aggregating in all instances. That reference is here: https://github.com/6connect/irrpt/blob/dacc17f6b538ed69c6c486f787034c5aa31cbbd3/inc/aggregate.inc
I've pulled down the latest file and tested, here are the new numbers:
NumberEntered: 1000
TimeToProcess: 0.38982105255127
NumberEntered: 2000
TimeToProcess: 1.193638086319
NumberEntered: 5000
TimeToProcess: 4.0424168109894
NumberEntered: 10000
TimeToProcess: 13.068011045456
NumberEntered: 25000
TimeToProcess: 62.265861034393
NumberEntered: 50000
TimeToProcess: 207.09190392494
NumberEntered: 100000
TimeToProcess: 829.24696993828
So, interesting: the more routes you provide, the bigger the (marginal) improvement over the previous run. :>
Hey, hope all is well with you!
Wondering whether you've had any more time to look at whether anything could be done to improve the performance, or whether the other aggregation file I mentioned sparked any ideas for your version.
Here are some performance numbers using increasing numbers of routes:
NumberEntered: 0
NumberRejected: -1000
NumberAccepted: 1000
NumberMerged: 910
NumberReturned: 90
TimeToProcess: 0.37775897979736
NumberEntered: 0
NumberRejected: -2000
NumberAccepted: 2000
NumberMerged: 1792
NumberReturned: 208
TimeToProcess: 1.204185962677
NumberEntered: 0
NumberRejected: -5000
NumberAccepted: 5000
NumberMerged: 4792
NumberReturned: 208
TimeToProcess: 4.1573090553284
NumberEntered: 0
NumberRejected: -10000
NumberAccepted: 10000
NumberMerged: 9792
NumberReturned: 208
TimeToProcess: 13.36133813858
NumberEntered: 0
NumberRejected: -25000
NumberAccepted: 25000
NumberMerged: 24791
NumberReturned: 209
TimeToProcess: 63.091789007187
NumberEntered: 0
NumberRejected: -50000
NumberAccepted: 50000
NumberMerged: 47414
NumberReturned: 2586
TimeToProcess: 209.73743796349
Cheers.
Hi there,
I'm wondering whether there's any known performance issue when feeding in 100k prefixes? When I enable some extra output on the classes/functions, it seems to stall in stripInvalidRangesAndSubs($this->Output), specifically around the $Line handling, I think.
Any idea?
Yeah, the benchmark numbers you're seeing are pretty much the same at my end too, so I'd suggest they're an accurate reflection of the average performance a user could expect from the Aggregator at this time.
It still works much better and much faster than the very few equivalent alternatives I've seen floating around the internet, IMO, but it definitely isn't ideal yet, so I'd be quite keen to see some refactoring happen at some point to speed it up and improve performance.
Worth noting, too, that the decline in performance as the workload increases looks superlinear rather than linear: doubling the workload tends to quadruple the processing time, or worse, rather than merely doubling it. Conversely, performance seems really, really good when the workload is reasonably small (per my own experience). But yeah; there's some room for improvement, I think.
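To put a rough number on that: fitting the TimeToProcess figures quoted earlier in this thread to t ≈ c·nᵏ on a log-log scale gives an empirical scaling exponent. A minimal sketch in Python (the timings are the ones reported above, rounded to three decimals):

```python
import math

# TimeToProcess values (seconds) reported earlier in this thread, by input size.
timings = {
    1000: 0.390, 2000: 1.194, 5000: 4.042, 10000: 13.068,
    25000: 62.266, 50000: 207.092, 100000: 829.247,
}

# Least-squares fit of log(t) = log(c) + k*log(n); the slope k is the exponent.
xs = [math.log(n) for n in timings]
ys = [math.log(t) for t in timings.values()]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"empirical scaling exponent: {k:.2f}")  # comes out around 1.6-1.7
```

So the observed behaviour is polynomial (somewhere between n^1.5 and n², i.e. superlinear) rather than literally exponential, which matches the "doubling roughly quadruples" observation.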
A public benchmark I did about a month ago (the code hasn't changed much since then, so it should still be a reasonably accurate reflection of current performance expectations), aggregating all the default CIDRAM signature files together (generated using the callable hooks to print the time/date to the CLI at various points during aggregation):
IP Aggregator
===
Parse 1 ... 100.00% (Thu, 12 Dec 2019 15:21:26 +0800) <RAM: 5.29 MB>
Parse 2 ... 100.00% (Thu, 12 Dec 2019 15:26:25 +0800) <RAM: 5.26 MB>
Parse 3 ... 100.00% (Thu, 12 Dec 2019 15:26:29 +0800) <RAM: 5.72 MB>
Parse 4 ... 100.00% (Thu, 12 Dec 2019 15:26:31 +0800) <RAM: 5.70 MB>
Parse 5 ... 100.00% (Thu, 12 Dec 2019 15:26:33 +0800) <RAM: 5.69 MB>
Parse 6 ... 100.00% (Thu, 12 Dec 2019 15:26:35 +0800) <RAM: 5.69 MB>
Parse 7 ... 100.00% (Thu, 12 Dec 2019 15:26:37 +0800) <RAM: 5.69 MB>
Parse 8 ... 100.00% (Thu, 12 Dec 2019 15:26:39 +0800) <RAM: 5.69 MB>
Parse 9 ... 100.00% (Thu, 12 Dec 2019 15:26:41 +0800) <RAM: 5.69 MB>
Parse 10 ... 100.00% (Thu, 12 Dec 2019 15:26:43 +0800) <RAM: 5.69 MB>
Parse 11 ... 100.00% (Thu, 12 Dec 2019 15:26:45 +0800) <RAM: 5.69 MB>
Parse 12 ... 100.00% (Thu, 12 Dec 2019 15:26:47 +0800) <RAM: 5.21 MB>
Results (49,336 in – 12,849 rejected – 36,487 accepted – 5,540 merged – 30,947 out):
(The parse where we see the obvious bottleneck, Parse 2, approximately covers the point in the execution where stripInvalidRangesAndSubs executes.)
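For anyone curious what that step has to do, and why a naive version of it is slow: stripping subordinate ranges means discarding any CIDR wholly contained inside another. Checking every pair against every other is O(n²), which would fit the superlinear timings seen in this thread; sorting by network address first reduces the check to a single linear sweep. This is only an illustrative Python sketch of the sorted-sweep idea (using the stdlib ipaddress module), not the Aggregator's actual code:

```python
import ipaddress

def strip_subordinates(cidrs):
    """Drop any CIDR wholly contained inside another.

    Sorted sweep, O(n log n); assumes a single address family per call.
    """
    nets = sorted((ipaddress.ip_network(c) for c in cidrs),
                  key=lambda n: (int(n.network_address), n.prefixlen))
    kept = []
    for net in nets:
        # After sorting, a subordinate can only be contained in the most
        # recently kept network (kept networks are pairwise disjoint).
        if kept and net.subnet_of(kept[-1]):
            continue
        kept.append(net)
    return [str(n) for n in kept]

print(strip_subordinates(["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/24"]))
# → ['10.0.0.0/8', '192.168.0.0/24']
```

Because any two CIDR blocks either nest or are disjoint, comparing each candidate only against the last kept network is sufficient once the list is sorted.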
I've got some updated results condensed:
NumberEntered: 1000
TimeToProcess: 0.35671091079712
NumberEntered: 2000
TimeToProcess: 0.99436402320862
NumberEntered: 5000
TimeToProcess: 4.0280168056488
NumberEntered: 10000
TimeToProcess: 13.002596139908
NumberEntered: 25000
TimeToProcess: 62.700823068619
NumberEntered: 50000
TimeToProcess: 208.29100298882
NumberEntered: 100000
TimeToProcess: 865.93879389763
Might be worth rounding the number to three decimal places. :>
Where do you think the lag exists in the aggregation?
My current thinking is that the lag lives in the part of the code responsible for stripping out subordinate ranges in the data (in stripInvalidRangesAndSubs, prior to mergeRanges), but I'll need to do a little more tinkering and testing to be sure that's the best part of the code to tackle, and whether there are other parts I could further optimise as well.
When I get some more spare time later this week, I'll see whether I can come up with any reliable shortcuts to reduce the number of calculations needed in some instances to produce the final aggregate. I'm not sure there's anything left we could safely cut from the overall process without compromising the aggregator's reliability, but if I can find good ways to detect the specific instances where we can, and code up shortcuts for them accordingly, that should help optimise it further, I think.
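An aside that may help when testing shortcuts like these: Python's stdlib ipaddress.collapse_addresses performs the same overall kind of merge (it both drops subordinate ranges and joins adjacent siblings), so it can serve as an independent oracle when verifying that an optimisation hasn't changed the final aggregate. A small, hypothetical example, unrelated to the Aggregator's own code:

```python
import ipaddress

# Two adjacent /25s, a /24 that covers them, and the neighbouring /24.
nets = [ipaddress.ip_network(c) for c in
        ["10.0.0.0/25", "10.0.0.128/25", "10.0.0.0/24", "10.0.1.0/24"]]

# collapse_addresses removes contained networks and merges adjacent ones.
collapsed = [str(n) for n in ipaddress.collapse_addresses(nets)]
print(collapsed)  # → ['10.0.0.0/23']
```

Feeding the same input to both implementations and comparing the outputs catches correctness regressions before any performance comparison is made.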
Yep; slowly getting there. :-)
I committed a benchmarking script of my own earlier today, as well: 9d59b87
Seems the results from my own benchmarking script are actually a bit poorer than I'd expected, unfortunately.
Generated: 2020.02.16 22:27 UTC+0800 using PHP 7.4.1
System: Windows 10 Home x64, 8GB RAM, AMD A10-8700P Radeon R6 4C+6G/1.80GHz.
Aggregator Benchmarks.
===
Aggregating 1,000 arbitrary IPv4 CIDRs ...
Iteration 1: 0.30416011810303 seconds
Iteration 2: 0.24873614311218 seconds
Iteration 3: 0.2512800693512 seconds
Average time: 0.26805877685547 seconds
Aggregating 1,000 arbitrary IPv6 CIDRs ...
Iteration 1: 0.47209405899048 seconds
Iteration 2: 0.44880485534668 seconds
Iteration 3: 0.4348258972168 seconds
Average time: 0.45190827051799 seconds
Aggregating 5,000 arbitrary IPv4 CIDRs ...
Iteration 1: 2.9636359214783 seconds
Iteration 2: 2.8205618858337 seconds
Iteration 3: 3.1395859718323 seconds
Average time: 2.9745945930481 seconds
Aggregating 5,000 arbitrary IPv6 CIDRs ...
Iteration 1: 7.2216868400574 seconds
Iteration 2: 6.3239941596985 seconds
Iteration 3: 6.3605780601501 seconds
Average time: 6.6354196866353 seconds
Aggregating 10,000 arbitrary IPv4 CIDRs ...
Iteration 1: 9.107794046402 seconds
Iteration 2: 9.233412027359 seconds
Iteration 3: 10.589750051498 seconds
Average time: 9.6436520417531 seconds
Aggregating 10,000 arbitrary IPv6 CIDRs ...
Iteration 1: 25.068593978882 seconds
Iteration 2: 30.529397010803 seconds
Iteration 3: 32.182762861252 seconds
Average time: 29.260251283646 seconds
Aggregating 20,000 arbitrary IPv4 CIDRs ...
Iteration 1: 58.215924978256 seconds
Iteration 2: 54.149658918381 seconds
Iteration 3: 53.645915985107 seconds
Average time: 55.337166627248 seconds
Aggregating 20,000 arbitrary IPv6 CIDRs ...
Iteration 1: 150.7880718708 seconds
Iteration 2: 152.01976585388 seconds
Iteration 3: 139.97001504898 seconds
Average time: 147.59261759122 seconds
Aggregating 50,000 arbitrary IPv4 CIDRs ...
Iteration 1: 261.9621989727 seconds
Iteration 2: 260.28996515274 seconds
Iteration 3: 282.08100700378 seconds
Average time: 268.11105704308 seconds
Aggregating 50,000 arbitrary IPv6 CIDRs ...
Iteration 1: 915.34364199638 seconds
Iteration 2: 847.58106780052 seconds
Iteration 3: 745.74548888206 seconds
Average time: 836.22339955966 seconds
Aggregating 100,000 arbitrary IPv4 CIDRs ...
Iteration 1: 841.79229807854 seconds
Iteration 2: 861.2343609333 seconds
Iteration 3: 857.35292696953 seconds
Average time: 853.45986199379 seconds
Aggregating 100,000 arbitrary IPv6 CIDRs ...
Iteration 1: 2427.4008698463 seconds
Iteration 2: 2380.1490700245 seconds
Iteration 3: 2446.4187259674 seconds
Average time: 2417.9895552794 seconds
But I've reached out to some other people I know for advice and suggestions; further improvements should be possible, I think.
Did you get a chance to take a quick look at the one I'd previously noted? I'm wondering whether there's an idea there that could help speed things up.
Hey. Yeah, all is well. Sorry about the delayed response; I've been pretty preoccupied with other projects over the past week or so. '^.^
I haven't looked at this much in the past month or so, since our earlier discussion, so I haven't gotten much further with it yet. I still reckon it should be possible to improve it further, though; I just need to make the time for it. I'll look at this more closely again once I get my current work on other projects out of the way (hopefully at some point in the coming week, if all goes well).
Hi! Hope all is well. :)
Just wanted to ping you to find out whether you've had any ideas, or a chance to see how we can speed up the aggregation a bit.
Cheers,
Hi @nistorj,
Haven't had much time recently due to work, unfortunately. Still on the (eventual) to-do list though. :-)
Of course, in the meantime, if anyone else has any ideas, I'm open to listening, reviewing pull requests, and so on.
I had to push out a fix for a bug I'd recently discovered, so I ran the benchmark script again to generate some new numbers.
Generated: 2021.10.23 15:19 UTC+0800 using PHP 8.0.12
System: Windows 10 Home x64, 8GB RAM, AMD A10-8700P Radeon R6 4C+6G/1.80GHz.
Aggregator Benchmarks.
===
Aggregating 1,000 arbitrary IPv4 CIDRs ...
Iteration 1: 0.30308699607849 seconds
Iteration 2: 0.29645705223083 seconds
Iteration 3: 0.2579460144043 seconds
Average time: 0.28583002090454 seconds
Aggregating 1,000 arbitrary IPv6 CIDRs ...
Iteration 1: 0.51753497123718 seconds
Iteration 2: 0.49191689491272 seconds
Iteration 3: 0.44894790649414 seconds
Average time: 0.48613325754801 seconds
Aggregating 5,000 arbitrary IPv4 CIDRs ...
Iteration 1: 2.9791870117188 seconds
Iteration 2: 2.934103012085 seconds
Iteration 3: 2.8410980701447 seconds
Average time: 2.9181293646495 seconds
Aggregating 5,000 arbitrary IPv6 CIDRs ...
Iteration 1: 7.133465051651 seconds
Iteration 2: 6.6941599845886 seconds
Iteration 3: 6.8942091464996 seconds
Average time: 6.9072780609131 seconds
Aggregating 10,000 arbitrary IPv4 CIDRs ...
Iteration 1: 9.5212910175323 seconds
Iteration 2: 9.1984870433807 seconds
Iteration 3: 11.138768911362 seconds
Average time: 9.9528489907583 seconds
Aggregating 10,000 arbitrary IPv6 CIDRs ...
Iteration 1: 27.45821595192 seconds
Iteration 2: 34.600538015366 seconds
Iteration 3: 25.711801052094 seconds
Average time: 29.256851673126 seconds
Comparative average times:
| Measure | 2020.02.16 (PHP 7.4.1, Aggregator v1.3.0) | 2020.12.04 (PHP 7.4.13, Aggregator v1.3.1) | 2021.10.23 (PHP 8.0.12, Aggregator v1.3.3) |
|---|---|---|---|
| 1,000 arbitrary IPv4 CIDRs | 0.26805877685547 | 0.25539104143778 (Best) | 0.28583002090454 (Worst) |
| 1,000 arbitrary IPv6 CIDRs | 0.45190827051799 | 0.42232791582743 (Best) | 0.48613325754801 (Worst) |
| 5,000 arbitrary IPv4 CIDRs | 2.9745945930481 (Worst) | 2.7790486017863 (Best) | 2.9181293646495 |
| 5,000 arbitrary IPv6 CIDRs | 6.6354196866353 | 6.4285720984141 (Best) | 6.9072780609131 (Worst) |
| 10,000 arbitrary IPv4 CIDRs | 9.6436520417531 (Worst) | 9.3462359110514 (Best) | 9.9528489907583 |
| 10,000 arbitrary IPv6 CIDRs | 29.260251283646 (Worst) | Not measured. | 29.256851673126 (Best) |
It seems that for large numbers of CIDRs it has slightly improved, but for lower numbers it has actually become slightly worse (not by a huge amount, but even so, I'd rather see it get better than worse). I guess I'll need to do a lot more work here.