bbkr / geoip2

MaxMind GeoIP v2 libraries reader for Raku language.

License: Artistic License 2.0

Raku 100.00%

geoip2 maxmind raku

geoip2's Issues

Add info on how to get more translations by geoname id.

Reduce amount of read syscalls.

The number of file handle syscalls required to perform a simple lookup is overwhelming.
My idea is to introduce a sequential read cache.

For example, read(1) at position 1000 will create this cache:

{
    1000 => Buf( 1024 bytes )
}

One byte will be shifted from the Buf and the cache will be remapped to the new position key:

{
    1001 => Buf( 1023 bytes )
}

This way all sequences (maps, pointer bytes) will use already fetched data.
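The idea above can be sketched roughly as follows. This is only an illustration, not code from the module: the class name `SequentialReadCache`, the `fetch` callback, and `chunk-size` are hypothetical, and instead of shifting bytes off the Buf it keeps the chunk intact and tracks its start offset, which serves the same sequential reads.

```raku
# Hypothetical sketch: serve small sequential reads from one larger fetched chunk.
class SequentialReadCache {
    has &.fetch is required;        # assumed callback: (Int $pos, Int $count) --> Blob
    has Int  $.chunk-size = 1024;   # bytes fetched per cache miss
    has Int  $!start      = 0;      # file position of the first cached byte
    has Blob $!buffer     = Blob.new;

    method read(Int $pos, Int $count --> Blob) {
        my $end = $pos + $count;
        # Refill only when the requested range is not fully inside the cached chunk.
        unless $!start <= $pos && $end <= $!start + $!buffer.elems {
            $!buffer = &!fetch($pos, max($!chunk-size, $count));
            $!start  = $pos;
        }
        $!buffer.subbuf($pos - $!start, $count);
    }
}

# In-memory stand-in for a file handle, counting how often the "file" is touched.
my $data  = Blob.new(0..255);
my $calls = 0;
my $cache = SequentialReadCache.new(
    chunk-size => 64,
    fetch      => -> $pos, $count { $calls++; $data.subbuf($pos, $count) },
);

my $first  = $cache.read(10, 1);   # miss: fetches 64 bytes starting at position 10
my $second = $cache.read(11, 4);   # hit: served from the cached chunk
say $calls;
```

With a real handle, the `fetch` callback would seek and read once per miss, so maps and pointer bytes that follow each other in the file would reuse already fetched data.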

http://www.maxmind.com/GeoIPLocationCSV-localized.zip no longer available

Awaiting MaxMind response. TRANSLATION section may require refactoring.

Switch to native decoding wherever possible.

rakudo/rakudo@4f14d71 - native decoding should give nice performance boost.

Think about how to maintain backward compatibility with older Rakudo versions.

Test code against memory leaks.

Warnings at zef install GeoIP2

Installing via zef ($ zef install GeoIP2) prints these warnings:

WARNINGS for /home/kostas/.zef/store/GeoIP2.git/40c1b984344b2925bfa768cab247c230d5d4f0c2/lib/GeoIP2.pm (GeoIP2):
Useless use of LOOP_BLOCK_1 symbol in sink context (line 133)
===> Testing [OK] for GeoIP2:ver<1.0.0>
===> Installing: GeoIP2:ver<1.0.0>
WARNINGS for /home/kostas/apache_root/perl6.pheix.org/git/pheix-pool/home#sources/B14446447BBA55BFD32F41138C6E7B843FF09D07 (GeoIP2):
Useless use of LOOP_BLOCK_1 symbol in sink context (line 133)

Only one node from pointer must be decoded.

We always know whether to follow the left or the right branch of the binary tree,
so there is no need to decode both pointers every time.
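A minimal sketch of decoding only the needed record, assuming the 28-bit layout described in the MaxMind DB format (7-byte nodes, where the middle byte contributes its high nibble to the left record and its low nibble to the right record). The function name `decode-record` and the `:right` flag are hypothetical.

```raku
# Hypothetical helper: decode one 28-bit record from a 7-byte search tree node.
sub decode-record(Blob $node, Bool :$right --> Int) {
    $right
        ?? (($node[3] +& 0x0F) +< 24) +| ($node[4] +< 16) +| ($node[5] +< 8) +| $node[6]
        !! (($node[3] +> 4)    +< 24) +| ($node[0] +< 16) +| ($node[1] +< 8) +| $node[2];
}

my $node = Blob.new(0x12, 0x34, 0x56, 0xAB, 0x78, 0x9A, 0xBC);
say decode-record($node).base(16);          # left record
say decode-record($node, :right).base(16);  # right record
```

The caller picks the branch from the current IP bit, so only four of the seven bytes take part in the arithmetic for each step.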

Separate metadata from derived values.

Method read-metadata should return decoded metadata. It should be possible to call it at any time and multiple times.

The %metadata attribute, on the other hand, may contain some precalculated derived values like the IPv4 start node.

The split is there, but tests should reflect it.

Cache nodes and pointer values

An experimental branch that caches binary tree nodes and the data that pointers refer to is ready:

https://github.com/bbkr/GeoIP2/tree/node_cache
(no docs or tests yet)

It gives an excellent boost (up to 400%). However, the random replacement retention policy it uses suffers from the low performance of Hash.pick reported in: rakudo/rakudo#2586

So when retention takes place on a large cache, it can have a very negative impact on overall performance.

Waiting for the Rakudo task to be addressed, then I'll decide if this is suitable for merge.

Benchmark of real traffic baseline.

Various optimization ideas require a better benchmark. For example, 1M IPs from real, international www traffic, with duplicates and with IPv4 and IPv6 mixed together. These should be resolved against the pro version of the city database because it has a huge number of search nodes.

Add protection against unknown extended type.

Optimize types recognition.

The string representation of decoded types is useful for debugging, but it adds 2 additional steps to the decoding process.

Once all the bits are in place, it can be replaced by a direct closure to the decoding methods.

Add IEEE754 decoding

Hacky approach will be to... use CArray :)
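As an alternative to the CArray trick, recent Rakudo releases provide `read-num32` and `read-num64` on Blob, which take an offset and an endianness. A small sketch, assuming a Rakudo version that ships these methods and that the floats are stored big-endian as in the rest of the MaxMind DB data section:

```raku
# 1.5 encoded as IEEE 754 binary64 and binary32, big-endian.
my $double-bytes = Blob.new(0x3F, 0xF8, 0, 0, 0, 0, 0, 0);
my $float-bytes  = Blob.new(0x3F, 0xC0, 0, 0);

my $double = $double-bytes.read-num64(0, BigEndian);
my $float  = $float-bytes.read-num32(0, BigEndian);

say $double;
say $float;
```

This would not help on older Rakudo versions, where a NativeCall-based approach may still be needed.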

Optimize uints

All uint bytes can be fetched at once, which should be faster than reading them byte-by-byte from the handle.

0 size will need special treatment in this case.

Add IPv6 support.

Full form, short form (0000 -> 0 ).

Compact with '::' is to be considered.
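One possible way to handle all three forms is to expand the address into eight 16-bit groups and fold them into a 128-bit integer. The sketch below is hypothetical (the name `ipv6-to-int` is not part of the module) and does no validation beyond the group count.

```raku
# Hypothetical sketch: turn full, short, or '::'-compacted IPv6 text into an Int.
sub ipv6-to-int(Str $ip --> Int) {
    # At most one '::' is allowed; it stands for a run of zero groups.
    my ($head, $tail) = $ip.split('::', 2);
    my @left  = $head.split(':', :skip-empty);
    my @right = $tail.defined ?? $tail.split(':', :skip-empty) !! ();

    my @groups = $tail.defined
        ?? (|@left, |('0' xx (8 - @left - @right)), |@right)
        !! @left;
    die "Invalid IPv6 address: $ip" unless @groups == 8;

    # Short form (0 instead of 0000) needs no special case: parse-base handles it.
    my Int $value = 0;
    $value = ($value +< 16) +| .parse-base(16) for @groups;
    $value;
}

say ipv6-to-int('::1');
say ipv6-to-int('2001:db8::1') == ipv6-to-int('2001:0db8:0000:0000:0000:0000:0000:0001');
```

The resulting integer can then be walked bit by bit from the most significant end when descending the search tree.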

Investigate why byte must be skipped after returning from pointer jump.

Is it documented somewhere, or am I reading fewer bytes than needed when decoding the pointer?

Investigate why arity of subdivisions is skewed.

Probably has something to do with:

$ perl6 -e 'use Test; is-deeply [ { a => "b" } ], [ %{ a => "b" }, ];'
not ok 1 -
# Failed test at -e line 1
# expected: $[{:a("b")},]
#      got: $[:a("b")]
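The failure above looks like the single-argument rule at work: a lone hash inside `[ ]` is iterated into its pairs, while a trailing comma keeps it as one element. A small sketch with a two-key hash makes the difference in element count visible:

```raku
my $flattened = [ { a => "b", c => "d" } ];    # hash is iterated into its pairs
my $kept      = [ { a => "b", c => "d" }, ];   # trailing comma keeps the hash whole

say $flattened.elems;       # 2
say $flattened[0].^name;    # Pair
say $kept.elems;            # 1
say $kept[0].^name;         # Hash
```

If subdivisions are built as an array around a single decoded hash, this could explain why the arity depends on how many keys that hash has.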

Benchmark int decoding methods.

There are a few approaches:

bitshift + bitor
bitshift + add
add bitshifted
zero pad and nativecast
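For reference, the first three approaches might look like the sketch below. The function names are hypothetical, "add bitshifted" is interpreted here as summing individually shifted bytes, and the nativecast variant is left out because it depends on NativeCall details. All three return 0 for an empty Blob, which covers the zero-size case.

```raku
# bitshift + bitor
sub uint-bitor(Blob $bytes --> Int) {
    my Int $value = 0;
    $value = ($value +< 8) +| $_ for $bytes.list;
    $value;
}

# bitshift + add
sub uint-add(Blob $bytes --> Int) {
    my Int $value = 0;
    $value = ($value +< 8) + $_ for $bytes.list;
    $value;
}

# add bitshifted: shift every byte to its final position, then sum
sub uint-sum-shifted(Blob $bytes --> Int) {
    my $last = $bytes.elems - 1;
    [+] $bytes.list.kv.map(-> $i, $byte { $byte +< (8 * ($last - $i)) });
}

my $bytes = Blob.new(0x01, 0x02, 0x03, 0x04);
say uint-bitor($bytes);
say uint-add($bytes);
say uint-sum-shifted($bytes);
```

A benchmark would call each variant in a tight loop over the same input and compare wall-clock times.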

Fix \d matching all Unicode digits

perl6 -e 'say "۳.۳.۳.۳" ~~  / ^ (\d ** 1..3) ** 4 % "." $ /'
｢۳.۳.۳.۳｣
 0 => ｢۳｣
 0 => ｢۳｣
 0 => ｢۳｣
 0 => ｢۳｣
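One possible fix is to replace `\d` with the explicit character class `<[0..9]>`, which only matches ASCII digits. A minimal sketch (the helper name `looks-like-ipv4` is hypothetical, and range checking of each octet is left out):

```raku
# Hypothetical helper: accept only ASCII digits in dotted-quad notation.
sub looks-like-ipv4(Str $ip --> Bool) {
    so $ip ~~ / ^ (<[0..9]> ** 1..3) ** 4 % '.' $ /;
}

say looks-like-ipv4("127.0.0.1");   # True
say looks-like-ipv4("۳.۳.۳.۳");     # False
```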

Implement different size decodings.

Currently only size==28 is supported.

Check where constants 285 and 65821 come from in other sizes.
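As far as I understand the MaxMind DB format, these constants come from the size encoding in the data field control byte: the low 5 bits hold the size directly when below 29, and the values 29, 30 and 31 mean the size continues in the next 1, 2 or 3 bytes. The offsets are cumulative, so 285 = 29 + 256 and 65821 = 285 + 65536. A hypothetical sketch (`decode-size` and its `$extra` parameter are illustrative names):

```raku
# Hypothetical sketch: payload size from a control byte plus its following bytes.
sub decode-size(Int $control, Blob $extra = Blob.new --> Int) {
    my $size = $control +& 0x1F;    # low 5 bits of the control byte
    return $size                                  if $size < 29;
    return 29  + $extra[0]                        if $size == 29;
    return 285 + (($extra[0] +< 8) +| $extra[1])  if $size == 30;
    65821 + (($extra[0] +< 16) +| ($extra[1] +< 8) +| $extra[2]);
}

say decode-size(0b010_11100);                       # size stored inline
say decode-size(0b010_11101, Blob.new(0x01));       # one extra byte
say decode-size(0b010_11110, Blob.new(0x00, 0x00)); # two extra bytes
```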

Do some benchmark to estimate baseline for speed regression tests.

Once all the decoder pieces are in place.

Current speed (without IEEE754 decoding) is about 100 IPv4 searches per second.

This can be improved by using native types, shaped arrays, etc.

Translation hook

Add a translation hook that allows skipping the included translations (usually there is no point in decoding them all) and getting a specific language by geoname id.

Reader interface design.

Assume that some people will use the bare GeoIP2 reader, not wrapped by a specific class such as GeoIP2::City. So the interface must be well described and safe to use.

There are methods that are safe to call at any time: reading metadata, reading a node pointer, reading location info. Those methods position the file cursor on their own.

And there are unsafe methods that can cause unexpected results when called directly: decoding values at the current cursor position can go out of range.

Reading bytes type for empty size crashes.

Out of range: attempted to read 0 bytes from filehandle
