
nistorj commented on June 5, 2024

I also let a 100k-prefix run go; here's that result:

NumberEntered:  0
NumberRejected: -100000
NumberAccepted: 100000
NumberMerged:   91192
NumberReturned: 8808
TimeToProcess:  869.71707892418


nistorj commented on June 5, 2024

We had previously used a different implementation to aggregate v6 blocks in another project, and I replaced it there with this aggregator because the old one wasn't fully aggregating in all instances. That implementation is here: https://github.com/6connect/irrpt/blob/dacc17f6b538ed69c6c486f787034c5aa31cbbd3/inc/aggregate.inc


nistorj commented on June 5, 2024

I've pulled down the latest file and tested it; here are the new numbers:

NumberEntered:  1000
TimeToProcess:  0.38982105255127

NumberEntered:  2000
TimeToProcess:  1.193638086319

NumberEntered:  5000
TimeToProcess:  4.0424168109894

NumberEntered:  10000
TimeToProcess:  13.068011045456

NumberEntered:  25000
TimeToProcess:  62.265861034393

NumberEntered:  50000
TimeToProcess:  207.09190392494

NumberEntered:  100000
TimeToProcess:  829.24696993828

Interestingly, compared with the previous run, the more routes you provide, the bigger the (marginal) speed-up :>


nistorj commented on June 5, 2024

Hey, hope all is well with you!

Wondering if you've had any more time to look into whether anything could be done to improve the performance, or whether the other aggregation file I mentioned gave you any ideas for your version.


nistorj commented on June 5, 2024

Here are some performance numbers for increasing numbers of routes:

NumberEntered:  0
NumberRejected: -1000
NumberAccepted: 1000
NumberMerged:   910
NumberReturned: 90
TimeToProcess:  0.37775897979736

NumberEntered:  0
NumberRejected: -2000
NumberAccepted: 2000
NumberMerged:   1792
NumberReturned: 208
TimeToProcess:  1.204185962677

NumberEntered:  0
NumberRejected: -5000
NumberAccepted: 5000
NumberMerged:   4792
NumberReturned: 208
TimeToProcess:  4.1573090553284

NumberEntered:  0
NumberRejected: -10000
NumberAccepted: 10000
NumberMerged:   9792
NumberReturned: 208
TimeToProcess:  13.36133813858

NumberEntered:  0
NumberRejected: -25000
NumberAccepted: 25000
NumberMerged:   24791
NumberReturned: 209
TimeToProcess:  63.091789007187


NumberEntered:  0
NumberRejected: -50000
NumberAccepted: 50000
NumberMerged:   47414
NumberReturned: 2586
TimeToProcess:  209.73743796349
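As a quick sanity check, the counters in these runs can be tested against the relations implied by the CIDRAM summary line later in the thread (49,336 in – 12,849 rejected = 36,487 accepted; 36,487 accepted – 5,540 merged = 30,947 out). The Python sketch below assumes those two relations hold in general; `check_counters` is a hypothetical name. Every run passes, which suggests (unconfirmed) that NumberRejected is computed as NumberEntered − NumberAccepted, so the negative rejection counts would simply follow from NumberEntered staying at 0 in this test harness.

```python
# Counter values copied from the runs in this thread, in the order
# (NumberEntered, NumberRejected, NumberAccepted, NumberMerged, NumberReturned).
RUNS = [
    (0, -1000, 1000, 910, 90),
    (0, -2000, 2000, 1792, 208),
    (0, -5000, 5000, 4792, 208),
    (0, -10000, 10000, 9792, 208),
    (0, -25000, 25000, 24791, 209),
    (0, -50000, 50000, 47414, 2586),
    (0, -100000, 100000, 91192, 8808),
    # CIDRAM summary line: 49,336 in, 12,849 rejected, 36,487 accepted,
    # 5,540 merged, 30,947 out.
    (49336, 12849, 36487, 5540, 30947),
]


def check_counters(entered, rejected, accepted, merged, returned):
    """Return True if the counters satisfy the assumed relations
    rejected == entered - accepted and returned == accepted - merged."""
    return rejected == entered - accepted and returned == accepted - merged


for run in RUNS:
    print(run, "consistent" if check_counters(*run) else "inconsistent")
```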


Maikuolan commented on June 5, 2024

Cheers. 👍

> Hi there,
>
> I'm wondering if there's any known issue with performance when feeding in 100k prefixes? When I enable some extra output on the classes/functions, it seems to drag along at stripInvalidRangesAndSubs($this->Output), specifically the $Line area, I think.
>
> Any idea?

Yeah, the benchmarks you're seeing are pretty much the same as at my end, so I'd suggest they're an accurate reflection of the average performance a user could expect from the Aggregator at this time.

It still works much better and much faster than the very few equivalent alternatives I've seen floating around the internet, IMO, but it definitely isn't ideal yet, so I'd be quite keen to see some refactoring happen at some point to speed it up and improve performance.


Maikuolan commented on June 5, 2024

Worth noting too that the decline in performance as the workload increases seems to be superlinear (closer to quadratic) rather than linear: doubling the workload is more likely to quadruple the processing time, or worse, than just double it. Conversely, performance seems to be really, really good when the workload is reasonably small (in my own experience). But yeah; there's some room for improvement, I think.
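One way to put a number on the growth rate is the log-log slope between consecutive runs: if t ≈ c·n^k, then k = log(t2/t1) / log(n2/n1), so k ≈ 1 means linear growth and k ≈ 2 quadratic. A minimal Python sketch using nistorj's timings for the latest file from earlier in the thread (the power-law model is an assumption, not something the thread establishes):

```python
import math

# (routes, seconds) from nistorj's run against the latest file.
TIMINGS = [
    (1000, 0.38982105255127),
    (2000, 1.193638086319),
    (5000, 4.0424168109894),
    (10000, 13.068011045456),
    (25000, 62.265861034393),
    (50000, 207.09190392494),
    (100000, 829.24696993828),
]


def scaling_exponents(timings):
    """For each consecutive pair of runs, estimate k in t ~ n**k
    as the log-log slope log(t2 / t1) / log(n2 / n1)."""
    result = []
    for (n1, t1), (n2, t2) in zip(timings, timings[1:]):
        result.append((n1, n2, math.log(t2 / t1) / math.log(n2 / n1)))
    return result


for n1, n2, k in scaling_exponents(TIMINGS):
    print(f"{n1:>6} -> {n2:>6}: k = {k:.2f}")
```

For these numbers k comes out between roughly 1.3 and 2.0, reaching about 2 between 50,000 and 100,000 routes, which fits the observation that doubling the workload roughly quadruples the time.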


Maikuolan commented on June 5, 2024

Here's a public benchmark I did about a month ago, aggregating all the default CIDRAM signature files together (the code hasn't changed much since then, so it should still be a reasonably accurate reflection of current performance). The output was generated using the callable hooks to print the time/date to the CLI at various points during aggregation:

IP Aggregator
===

Parse 1 ... 100.00% (Thu, 12 Dec 2019 15:21:26 +0800) <RAM: 5.29 MB>
Parse 2 ... 100.00% (Thu, 12 Dec 2019 15:26:25 +0800) <RAM: 5.26 MB>
Parse 3 ... 100.00% (Thu, 12 Dec 2019 15:26:29 +0800) <RAM: 5.72 MB>
Parse 4 ... 100.00% (Thu, 12 Dec 2019 15:26:31 +0800) <RAM: 5.70 MB>
Parse 5 ... 100.00% (Thu, 12 Dec 2019 15:26:33 +0800) <RAM: 5.69 MB>
Parse 6 ... 100.00% (Thu, 12 Dec 2019 15:26:35 +0800) <RAM: 5.69 MB>
Parse 7 ... 100.00% (Thu, 12 Dec 2019 15:26:37 +0800) <RAM: 5.69 MB>
Parse 8 ... 100.00% (Thu, 12 Dec 2019 15:26:39 +0800) <RAM: 5.69 MB>
Parse 9 ... 100.00% (Thu, 12 Dec 2019 15:26:41 +0800) <RAM: 5.69 MB>
Parse 10 ... 100.00% (Thu, 12 Dec 2019 15:26:43 +0800) <RAM: 5.69 MB>
Parse 11 ... 100.00% (Thu, 12 Dec 2019 15:26:45 +0800) <RAM: 5.69 MB>
Parse 12 ... 100.00% (Thu, 12 Dec 2019 15:26:47 +0800) <RAM: 5.21 MB>

Results (49,336 in – 12,849 rejected – 36,487 accepted – 5,540 merged – 30,947 out):

(The parse where we see the obvious bottleneck roughly corresponds to the point in execution where stripInvalidRangesAndSubs runs.)
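The gaps between the timestamps printed for each parse make the bottleneck easy to quantify. A small Python sketch, with the timestamps copied from the output above; the output doesn't say whether each stamp marks the start or the end of its parse, so this only measures the intervals between stamps:

```python
from datetime import datetime

# Timestamps printed next to "Parse 1" ... "Parse 12" in the output above.
STAMPS = [
    "Thu, 12 Dec 2019 15:21:26 +0800",
    "Thu, 12 Dec 2019 15:26:25 +0800",
    "Thu, 12 Dec 2019 15:26:29 +0800",
    "Thu, 12 Dec 2019 15:26:31 +0800",
    "Thu, 12 Dec 2019 15:26:33 +0800",
    "Thu, 12 Dec 2019 15:26:35 +0800",
    "Thu, 12 Dec 2019 15:26:37 +0800",
    "Thu, 12 Dec 2019 15:26:39 +0800",
    "Thu, 12 Dec 2019 15:26:41 +0800",
    "Thu, 12 Dec 2019 15:26:43 +0800",
    "Thu, 12 Dec 2019 15:26:45 +0800",
    "Thu, 12 Dec 2019 15:26:47 +0800",
]


def gaps_between(stamps):
    """Return the whole seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(gaps_between(STAMPS))  # [299, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

Almost five minutes pass between the Parse 1 and Parse 2 stamps, against a few seconds for everything after that.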


nistorj commented on June 5, 2024

I've got some updated results condensed:

NumberEntered:  1000
TimeToProcess:  0.35671091079712

NumberEntered:  2000
TimeToProcess:  0.99436402320862

NumberEntered:  5000
TimeToProcess:  4.0280168056488

NumberEntered:  10000
TimeToProcess:  13.002596139908

NumberEntered:  25000
TimeToProcess:  62.700823068619

NumberEntered:  50000
TimeToProcess:  208.29100298882

NumberEntered:  100000
TimeToProcess:  865.93879389763

Might be worth rounding the times to three decimal places :>
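Rounding for display is a one-liner. A Python sketch (in PHP, `sprintf('%.3f', $time)` produces the same string); `format_seconds` is a hypothetical helper name:

```python
def format_seconds(seconds: float, places: int = 3) -> str:
    """Format an elapsed time with a fixed number of decimal places."""
    return f"{seconds:.{places}f}"


print(f"TimeToProcess:  {format_seconds(865.93879389763)}")  # TimeToProcess:  865.939
```

Using a fixed-width format rather than a numeric round keeps trailing zeros, so columns of times line up.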

Where do you think the lag exists in the aggregation?


Maikuolan commented on June 5, 2024

My current thinking is that the lag is in the part of the code responsible for stripping out subordinate ranges (in stripInvalidRangesAndSubs, prior to mergeRanges), but I'll need to do a bit more tinkering and testing to be sure whether that's the best part of the code to tackle, and whether there are other parts I could optimise as well.
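If subordinate ranges are found by comparing ranges against each other pairwise (an assumption; the thread doesn't show the implementation), the work grows roughly with n², which would fit the timings above. A common alternative is to sort once and make a single pass: after sorting by (IP version, network address, prefix length), every covering range comes before all of its subranges. This is a Python sketch using the standard `ipaddress` module (Python 3.7+ for `subnet_of`), not a patch for the PHP code; `strip_invalid_and_subordinates` is a hypothetical name, and treating CIDRs with host bits set as invalid is an assumption:

```python
import ipaddress


def strip_invalid_and_subordinates(cidrs):
    """Return the valid, non-subordinate CIDRs from `cidrs`, sorted.

    Invalid entries (unparseable, or with host bits set) are dropped, as is
    any CIDR that lies entirely inside another CIDR in the list.
    """
    nets = set()
    for cidr in cidrs:
        try:
            # strict=True (the default) rejects e.g. "10.0.0.1/24".
            nets.add(ipaddress.ip_network(cidr.strip()))
        except ValueError:
            continue

    # Version first, so IPv4 and IPv6 addresses are never compared directly;
    # then address, then prefix length, so a covering range precedes its subranges.
    ordered = sorted(nets, key=lambda n: (n.version, n.network_address, n.prefixlen))

    kept = []
    for net in ordered:
        last = kept[-1] if kept else None
        if last is not None and last.version == net.version and net.subnet_of(last):
            continue  # subordinate of the most recently kept range
        kept.append(net)
    return kept


print(strip_invalid_and_subordinates([
    "10.0.0.0/8", "10.1.0.0/16", "10.0.0.1/24", "not-an-ip",
    "11.0.0.0/24", "2001:db8::/32", "2001:db8:1::/48",
]))
# [IPv4Network('10.0.0.0/8'), IPv4Network('11.0.0.0/24'), IPv6Network('2001:db8::/32')]
```

Only the last kept range needs checking: kept ranges never overlap, so a later range can only sit inside the most recent one. The total cost is dominated by the sort, O(n log n).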

When I get some more spare time later this week, I'll see whether I can come up with any reliable short-cuts to reduce the number of calculations needed to produce the final aggregate in some instances. I'm not sure there's anything left that we could safely cut back on without compromising the aggregator's reliability, but if I can find good ways to detect the specific instances where we can, and code up short-cuts for them, that might help optimise it further.
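One short-cut along those lines, for the merging step that we assume mergeRanges handles, is a single stack-based pass. Once ranges are sorted and non-overlapping, two sibling halves of the same supernet always end up adjacent on top of the stack, so merges can cascade upward without rescanning the list. A Python sketch under those assumptions (`merge_siblings` is a hypothetical name):

```python
import ipaddress


def merge_siblings(nets):
    """Merge sibling CIDRs (e.g. two /24 halves of a /23) in a single pass.

    Assumes `nets` contains no duplicates and no overlapping ranges,
    i.e. subordinates have already been stripped.
    """
    ordered = sorted(nets, key=lambda n: (n.version, n.network_address, n.prefixlen))
    stack = []
    for net in ordered:
        stack.append(net)
        # Keep merging while the top two entries are halves of one supernet;
        # each merge may enable another merge one level up.
        while len(stack) >= 2:
            a, b = stack[-2], stack[-1]
            if (
                a.version == b.version
                and a.prefixlen == b.prefixlen
                and a.prefixlen > 0
                and a != b
                and a.supernet() == b.supernet()
            ):
                stack[-2:] = [a.supernet()]
            else:
                break
    return stack


sample = [ipaddress.ip_network(c) for c in (
    "10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/23", "192.168.0.0/24",
    "2001:db8::/33", "2001:db8:8000::/33",
)]
print([str(n) for n in merge_siblings(sample)])
# ['10.0.0.0/22', '192.168.0.0/24', '2001:db8::/32']
```

Each merge shrinks the stack by one, so there are at most n − 1 merges and the pass after the sort is linear. On the same input, the standard library's `ipaddress.collapse_addresses` (one IP version at a time) should give the same result, which makes it a convenient cross-check when testing a faster version.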


Maikuolan commented on June 5, 2024

Yep; slowly getting there. :-)

I committed a benchmarking script of my own earlier today, as well: 9d59b87

Seems the results from my own benchmarking script are actually a bit poorer than I'd expected, unfortunately.

Generated: 2020.02.16 22:27 UTC+0800 using PHP 7.4.1
System: Windows 10 Home x64, 8GB RAM, AMD A10-8700P Radeon R6 4C+6G/1.80GHz.

Aggregator Benchmarks.
===

Aggregating 1,000 arbitrary IPv4 CIDRs ...
Iteration 1: 0.30416011810303 seconds
Iteration 2: 0.24873614311218 seconds
Iteration 3: 0.2512800693512 seconds
Average time: 0.26805877685547 seconds


Aggregating 1,000 arbitrary IPv6 CIDRs ...
Iteration 1: 0.47209405899048 seconds
Iteration 2: 0.44880485534668 seconds
Iteration 3: 0.4348258972168 seconds
Average time: 0.45190827051799 seconds


Aggregating 5,000 arbitrary IPv4 CIDRs ...
Iteration 1: 2.9636359214783 seconds
Iteration 2: 2.8205618858337 seconds
Iteration 3: 3.1395859718323 seconds
Average time: 2.9745945930481 seconds


Aggregating 5,000 arbitrary IPv6 CIDRs ...
Iteration 1: 7.2216868400574 seconds
Iteration 2: 6.3239941596985 seconds
Iteration 3: 6.3605780601501 seconds
Average time: 6.6354196866353 seconds


Aggregating 10,000 arbitrary IPv4 CIDRs ...
Iteration 1: 9.107794046402 seconds
Iteration 2: 9.233412027359 seconds
Iteration 3: 10.589750051498 seconds
Average time: 9.6436520417531 seconds


Aggregating 10,000 arbitrary IPv6 CIDRs ...
Iteration 1: 25.068593978882 seconds
Iteration 2: 30.529397010803 seconds
Iteration 3: 32.182762861252 seconds
Average time: 29.260251283646 seconds


Aggregating 20,000 arbitrary IPv4 CIDRs ...
Iteration 1: 58.215924978256 seconds
Iteration 2: 54.149658918381 seconds
Iteration 3: 53.645915985107 seconds
Average time: 55.337166627248 seconds


Aggregating 20,000 arbitrary IPv6 CIDRs ...
Iteration 1: 150.7880718708 seconds
Iteration 2: 152.01976585388 seconds
Iteration 3: 139.97001504898 seconds
Average time: 147.59261759122 seconds


Aggregating 50,000 arbitrary IPv4 CIDRs ...
Iteration 1: 261.9621989727 seconds
Iteration 2: 260.28996515274 seconds
Iteration 3: 282.08100700378 seconds
Average time: 268.11105704308 seconds


Aggregating 50,000 arbitrary IPv6 CIDRs ...
Iteration 1: 915.34364199638 seconds
Iteration 2: 847.58106780052 seconds
Iteration 3: 745.74548888206 seconds
Average time: 836.22339955966 seconds


Aggregating 100,000 arbitrary IPv4 CIDRs ...
Iteration 1: 841.79229807854 seconds
Iteration 2: 861.2343609333 seconds
Iteration 3: 857.35292696953 seconds
Average time: 853.45986199379 seconds


Aggregating 100,000 arbitrary IPv6 CIDRs ...
Iteration 1: 2427.4008698463 seconds
Iteration 2: 2380.1490700245 seconds
Iteration 3: 2446.4187259674 seconds
Average time: 2417.9895552794 seconds

But I've reached out to some other people I know for advice and suggestions; I think further improvements should be possible.
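For anyone wanting to reproduce this style of measurement outside PHP, here is a minimal Python harness mirroring the output above: three timed iterations per input size, then the average. The random CIDR generator is hypothetical test data (the data used by the 9d59b87 script isn't described here), and `ipaddress.collapse_addresses` stands in for the Aggregator, so the absolute times are not comparable with the PHP figures:

```python
import ipaddress
import random
import time


def random_cidrs(count, version, seed=0):
    """Generate `count` arbitrary CIDRs (hypothetical test data)."""
    rng = random.Random(seed)
    bits = 32 if version == 4 else 128
    cls = ipaddress.IPv4Network if version == 4 else ipaddress.IPv6Network
    # strict=False clears any host bits left by the random address.
    return [cls((rng.getrandbits(bits), rng.randint(8, bits)), strict=False)
            for _ in range(count)]


def benchmark(aggregate, count, version, iterations=3):
    """Time `aggregate` on the same data `iterations` times; return (times, average)."""
    data = random_cidrs(count, version)
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        aggregate(data)
        times.append(time.perf_counter() - start)
    return times, sum(times) / len(times)


def stdlib_aggregate(nets):
    """Stand-in aggregator: the standard library's collapse_addresses."""
    return list(ipaddress.collapse_addresses(nets))


if __name__ == "__main__":
    for count in (1_000, 5_000, 10_000):
        for version in (4, 6):
            times, avg = benchmark(stdlib_aggregate, count, version)
            print(f"Aggregating {count:,} arbitrary IPv{version} CIDRs ...")
            for i, t in enumerate(times, 1):
                print(f"Iteration {i}: {t:.3f} seconds")
            print(f"Average time: {avg:.3f} seconds\n")
```

Seeding the generator keeps the input identical across iterations and runs, so differences between runs reflect the code rather than the data.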


nistorj commented on June 5, 2024

Did you take a quick look at the one I noted previously? Wondering if there's an idea there that could help speed things up.


Maikuolan commented on June 5, 2024

Hey. Yeah, all is well. Sorry about the delayed response; I've been pretty preoccupied working on other projects over the past week or so. '^.^

I haven't looked at this much over the past month or so, since our earlier discussion, so I haven't gotten much further with it yet. I still reckon it should be possible to improve it further, though; I just need to make the time for it. I'll look at this more closely again once I get what I'm currently working on in other projects out of the way (hopefully at some point within the coming week, if all goes well).


nistorj commented on June 5, 2024

hi! Hope all is well :)

Just wanted to ping you to find out if you've had any ideas, or a chance to see how we could speed up the aggregation a bit.

Cheers,


Maikuolan commented on June 5, 2024

Hi @nistorj,

Haven't had much time recently due to work, unfortunately. Still on the (eventual) to-do list though. :-)

Of course, in the meantime, if anyone else has any ideas, I'm open to hearing them, reviewing pull requests, and so on.


Maikuolan commented on June 5, 2024

I had to push out a fix for a bug I'd recently discovered, so I ran the benchmark script again to generate some new benchmarks.

Generated: 2021.10.23 15:19 UTC+0800 using PHP 8.0.12
System: Windows 10 Home x64, 8GB RAM, AMD A10-8700P Radeon R6 4C+6G/1.80GHz.

Aggregator Benchmarks.
===

Aggregating 1,000 arbitrary IPv4 CIDRs ...
Iteration 1: 0.30308699607849 seconds
Iteration 2: 0.29645705223083 seconds
Iteration 3: 0.2579460144043 seconds
Average time: 0.28583002090454 seconds


Aggregating 1,000 arbitrary IPv6 CIDRs ...
Iteration 1: 0.51753497123718 seconds
Iteration 2: 0.49191689491272 seconds
Iteration 3: 0.44894790649414 seconds
Average time: 0.48613325754801 seconds


Aggregating 5,000 arbitrary IPv4 CIDRs ...
Iteration 1: 2.9791870117188 seconds
Iteration 2: 2.934103012085 seconds
Iteration 3: 2.8410980701447 seconds
Average time: 2.9181293646495 seconds


Aggregating 5,000 arbitrary IPv6 CIDRs ...
Iteration 1: 7.133465051651 seconds
Iteration 2: 6.6941599845886 seconds
Iteration 3: 6.8942091464996 seconds
Average time: 6.9072780609131 seconds


Aggregating 10,000 arbitrary IPv4 CIDRs ...
Iteration 1: 9.5212910175323 seconds
Iteration 2: 9.1984870433807 seconds
Iteration 3: 11.138768911362 seconds
Average time: 9.9528489907583 seconds


Aggregating 10,000 arbitrary IPv6 CIDRs ...
Iteration 1: 27.45821595192 seconds
Iteration 2: 34.600538015366 seconds
Iteration 3: 25.711801052094 seconds
Average time: 29.256851673126 seconds

Comparative average times (seconds):

Measure | 2020.02.16 (PHP 7.4.1, Aggregator v1.3.0) | 2020.12.04 (PHP 7.4.13, Aggregator v1.3.1) | 2021.10.23 (PHP 8.0.12, Aggregator v1.3.3)
--- | --- | --- | ---
1,000 arbitrary IPv4 CIDRs | 0.26805877685547 | 0.25539104143778 (Best) | 0.28583002090454 (Worst)
1,000 arbitrary IPv6 CIDRs | 0.45190827051799 | 0.42232791582743 (Best) | 0.48613325754801 (Worst)
5,000 arbitrary IPv4 CIDRs | 2.9745945930481 (Worst) | 2.7790486017863 (Best) | 2.9181293646495
5,000 arbitrary IPv6 CIDRs | 6.6354196866353 | 6.4285720984141 (Best) | 6.9072780609131 (Worst)
10,000 arbitrary IPv4 CIDRs | 9.6436520417531 (Worst) | 9.3462359110514 (Best) | 9.9528489907583
10,000 arbitrary IPv6 CIDRs | 29.260251283646 (Worst) | Not measured. | 29.256851673126 (Best)

Compared with v1.3.0, it seems slightly faster for 5,000 IPv4 and 10,000 IPv6 CIDRs but slightly slower for the other measures, and it's slower than v1.3.1 on everything measured (not by a huge amount, but even so, I'd rather see it get better than worse). I guess I'll need to do a lot more work here.
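To make the comparison explicit, the relative change of each v1.3.3 average against the two earlier columns can be computed straight from the table. A small Python sketch (values copied from the table; a negative change means faster):

```python
# Average times in seconds: (v1.3.0, v1.3.1, v1.3.3); None = not measured.
AVERAGES = {
    "1,000 IPv4": (0.26805877685547, 0.25539104143778, 0.28583002090454),
    "1,000 IPv6": (0.45190827051799, 0.42232791582743, 0.48613325754801),
    "5,000 IPv4": (2.9745945930481, 2.7790486017863, 2.9181293646495),
    "5,000 IPv6": (6.6354196866353, 6.4285720984141, 6.9072780609131),
    "10,000 IPv4": (9.6436520417531, 9.3462359110514, 9.9528489907583),
    "10,000 IPv6": (29.260251283646, None, 29.256851673126),
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = faster)."""
    return (new - old) / old * 100


for measure, (v130, v131, v133) in AVERAGES.items():
    vs130 = f"{percent_change(v130, v133):+.1f}%"
    vs131 = "n/a" if v131 is None else f"{percent_change(v131, v133):+.1f}%"
    print(f"{measure:>11}: v1.3.3 vs v1.3.0 {vs130}, vs v1.3.1 {vs131}")
```

Bear in mind each figure is the mean of only three iterations, and the spread within a single run is sometimes large (the 10,000 IPv6 iterations range from about 25.7 to 34.6 seconds), so differences of a few percent may well be noise.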
