bbkr / geoip2
MaxMind GeoIP v2 libraries reader for Raku language.
License: Artistic License 2.0
The number of file handle syscalls required to perform a simple lookup is overwhelming.
My idea is to introduce a sequential read cache.
For example, read(1) at position 1000 will create this cache:
{
1000 => Buf( 1024 bytes )
}
One byte will be shifted from the Buf and the cache will be remapped to a new position key:
{
1001 => Buf( 1023 bytes )
}
This way all sequences (maps, pointer bytes) will use already fetched data.
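The idea can be sketched in a few lines. The example below is written in Python purely for illustration and is not the library's Raku code; the class name `SequentialReadCache`, the `chunk_size` parameter and the `fetches` counter are assumptions made for this sketch.

```python
import io


class SequentialReadCache:
    """Read-ahead cache keyed by the position of the next unread byte."""

    def __init__(self, handle, chunk_size=1024):
        self.handle = handle
        self.chunk_size = chunk_size
        self.pos = None   # file offset of the first cached byte
        self.buf = b""    # bytes cached from self.pos onward
        self.fetches = 0  # number of real reads issued on the handle

    def read(self, pos, size):
        # A hit requires the whole request to lie inside the cached window.
        hit = (
            self.pos is not None
            and self.pos <= pos
            and pos + size <= self.pos + len(self.buf)
        )
        if not hit:
            self.handle.seek(pos)
            self.buf = self.handle.read(max(size, self.chunk_size))
            self.pos = pos
            self.fetches += 1
        start = pos - self.pos
        data = self.buf[start:start + size]
        # Shift the consumed bytes out and remap the cache to the new position.
        self.buf = self.buf[start + size:]
        self.pos = pos + size
        return data


handle = io.BytesIO(bytes(range(256)) * 8)  # 2048 bytes of sample data
cache = SequentialReadCache(handle)
first = cache.read(1000, 1)   # one real read, 1023 bytes stay cached
rest = cache.read(1001, 4)    # served from the cache
```

Only forward, sequential reads benefit here; a read that jumps backwards or past the cached window simply refills the buffer.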
Awaiting MaxMind response. TRANSLATION section may require refactoring.
rakudo/rakudo@4f14d71 - native decoding should give a nice performance boost.
Need to think about how to maintain backward compatibility with older Rakudo versions.
And maybe AppVeyor...
When installing via zef:
$ zef install GeoIP2
I got this message:
WARNINGS for /home/kostas/.zef/store/GeoIP2.git/40c1b984344b2925bfa768cab247c230d5d4f0c2/lib/GeoIP2.pm (GeoIP2):
Useless use of LOOP_BLOCK_1 symbol in sink context (line 133)
===> Testing [OK] for GeoIP2:ver<1.0.0>
===> Installing: GeoIP2:ver<1.0.0>
WARNINGS for /home/kostas/apache_root/perl6.pheix.org/git/pheix-pool/home#sources/B14446447BBA55BFD32F41138C6E7B843FF09D07 (GeoIP2):
Useless use of LOOP_BLOCK_1 symbol in sink context (line 133)
We always know whether we want to follow the left or the right branch of the binary tree.
There is no need to decode both pointers every time.
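A minimal sketch of decoding only the needed record, following the node layouts described in the MaxMind DB format specification for 24, 28 and 32 bit records. It is written in Python for illustration only; the function name `read_record` and its parameters are assumptions, not the library's API.

```python
def read_record(node, record_size, bit):
    """Decode only the record selected by bit (0 = left, 1 = right)."""
    if record_size == 24:
        # 6-byte node: three bytes per record.
        return int.from_bytes(node[3:6] if bit else node[0:3], "big")
    if record_size == 28:
        # 7-byte node: the middle byte holds the top 4 bits of each record.
        if bit:
            return ((node[3] & 0x0F) << 24) | int.from_bytes(node[4:7], "big")
        return ((node[3] >> 4) << 24) | int.from_bytes(node[0:3], "big")
    if record_size == 32:
        # 8-byte node: four bytes per record.
        return int.from_bytes(node[4:8] if bit else node[0:4], "big")
    raise ValueError(f"unsupported record size: {record_size}")


node28 = bytes([0x01, 0x02, 0x03, 0xAB, 0x04, 0x05, 0x06])
left = read_record(node28, 28, 0)
right = read_record(node28, 28, 1)
```

The caller passes the current bit of the IP address, so half of the decoding work per node is skipped.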
Method read-metadata should return decoded metadata. It should be possible to call it at any time and multiple times.
The %metadata attribute, in turn, may contain some precalculated derived values such as the IPv4 start.
The split is there, but tests should reflect it.
An experimental branch that caches binary tree nodes and the data their pointers refer to is ready:
https://github.com/bbkr/GeoIP2/tree/node_cache
(no docs or tests yet)
It gives an excellent boost (up to 400%). However, the random replacement retention policy it uses suffers from the low performance of Hash.pick reported in rakudo/rakudo#2586.
So when retention takes place on a large cache, it can have a very negative impact on overall performance.
Waiting for the Rakudo issue to be addressed; then I'll decide whether this is suitable for merging.
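For reference, random replacement does not have to pick from the hash itself. The sketch below keeps a dense list of keys next to the hash so that choosing a victim is a constant-time index lookup. It is written in Python for illustration and is not what the node_cache branch does; the class `RandomReplacementCache` and all of its members are hypothetical, and a capacity of at least 1 is assumed.

```python
import random


class RandomReplacementCache:
    """Fixed-capacity cache that evicts a uniformly random entry when full."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity          # assumed to be >= 1
        self.rng = rng or random.Random()
        self.values = {}                  # key -> cached value
        self.keys = []                    # dense key list for O(1) random choice
        self.index = {}                   # key -> position in self.keys

    def get(self, key, default=None):
        return self.values.get(key, default)

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            return
        if len(self.keys) >= self.capacity:
            self._evict()
        self.index[key] = len(self.keys)
        self.keys.append(key)
        self.values[key] = value

    def _evict(self):
        i = self.rng.randrange(len(self.keys))
        victim = self.keys[i]
        last = self.keys.pop()
        if last != victim:
            # Move the last key into the freed slot to keep the list dense.
            self.keys[i] = last
            self.index[last] = i
        del self.index[victim]
        del self.values[victim]


cache = RandomReplacementCache(2, random.Random(0))
for key, value in (("a", 1), ("b", 2), ("c", 3)):
    cache.put(key, value)
```

The cost is one extra list and one extra hash per cache, which may or may not be acceptable for large node caches.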
Various optimization ideas require a better benchmark: for example, 1M real, international web traffic IPs, with duplicates and with IPv4 and IPv6 mixed together. It should be resolved against the pro version of the city database, because that one has a huge number of search nodes.
The string representation of decoded types is useful for debugging, but it adds 2 extra steps to the decoding process.
Once all the bits are in place, it can be replaced by direct closures that call the decoding methods.
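A minimal sketch of such a direct dispatch, mapping the numeric type id straight to a decoder with no intermediate type name. It is in Python for illustration only; the function names and the `DECODERS` table are assumptions, while the type ids used (2 = UTF-8 string, 3 = double, 5 = uint16, 6 = uint32) follow the MaxMind DB format specification.

```python
import struct


def decode_string(payload):
    return payload.decode("utf-8")


def decode_double(payload):
    # IEEE 754 binary64, big-endian.
    return struct.unpack(">d", payload)[0]


def decode_uint(payload):
    return int.from_bytes(payload, "big")


# Type id -> decoder callable; only a few types are shown.
DECODERS = {
    2: decode_string,
    3: decode_double,
    5: decode_uint,
    6: decode_uint,
}


def decode(type_id, payload):
    return DECODERS[type_id](payload)


country = decode(2, b"PL")
port = decode(5, b"\x01\x00")
```

A debug build could still wrap each callable to log a readable type name without touching the fast path.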
A hacky approach would be to... use CArray :)
All uint bytes can be fetched at once, which should be faster than reading them byte by byte from the handle.
A size of 0 will need special treatment in this case.
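A minimal sketch of the single-read approach, in Python for illustration only. The function name `read_uint` is an assumption; the zero-size guard reflects the note above and avoids asking the handle for 0 bytes.

```python
import io


def read_uint(handle, size):
    """Read a big-endian unsigned integer of `size` bytes in one call."""
    if size == 0:
        # A zero-length uint encodes the value 0; skip the read entirely.
        return 0
    data = handle.read(size)
    if len(data) != size:
        raise EOFError(f"wanted {size} bytes, got {len(data)}")
    return int.from_bytes(data, "big")


handle = io.BytesIO(b"\x01\x02\x03\x04")
value = read_uint(handle, 3)
zero = read_uint(handle, 0)
```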
Full form and short form (0000 -> 0).
The compact form with '::' is to be considered.
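The three notations can be compared side by side. The sketch below uses Python's standard `ipaddress` module for illustration only; the sample address and the variable names are assumptions.

```python
import ipaddress

address = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")

# Full form: eight groups of four hex digits.
full = address.exploded

# Short form: leading zeros dropped in each group, no '::' compaction.
short = ":".join(f"{int(group, 16):x}" for group in full.split(":"))

# Compact form: the longest run of zero groups collapsed into '::'.
compact = address.compressed

# All three spellings parse back to the same address.
same = {ipaddress.IPv6Address(text) for text in (full, short, compact)}
```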
Is it documented somewhere, or am I reading fewer bytes than needed when decoding a pointer?
Probably has something to do with:
$ perl6 -e 'use Test; is-deeply [ { a => "b" } ], [ %{ a => "b" }, ];'
not ok 1 -
# Failed test at -e line 1
# expected: $[{:a("b")},]
# got: $[:a("b")]
There are a few approaches:
perl6 -e 'say "۳.۳.۳.۳" ~~ / ^ (\d ** 1..3) ** 4 % "." $ /'
「۳.۳.۳.۳」
0 => 「۳」
0 => 「۳」
0 => 「۳」
0 => 「۳」
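The match above succeeds because \d accepts any Unicode decimal digit, not only 0 to 9. The same behavior can be reproduced in Python, shown here for illustration only; restricting the pattern to ASCII digits rejects the input. In Raku, an explicit character class such as <[0..9]> would be one way to get the same restriction.

```python
import re

pattern = r"\d{1,3}(?:\.\d{1,3}){3}"
candidate = "۳.۳.۳.۳"  # Extended Arabic-Indic digits

# By default \d matches any Unicode decimal digit in a str pattern.
unicode_match = re.fullmatch(pattern, candidate)

# With re.ASCII, \d only matches 0-9, so the candidate is rejected.
ascii_match = re.fullmatch(pattern, candidate, flags=re.ASCII)

# A plain dotted quad still passes the ASCII-only check.
plain_match = re.fullmatch(pattern, "3.3.3.3", flags=re.ASCII)
```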
Currently only size==28 is supported.
Check where constants 285 and 65821 come from in other sizes.
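The two constants appear in the MaxMind DB format specification's rule for extended data field sizes: 285 = 29 + 256 and 65821 = 285 + 65536, so each longer encoding starts where the shorter one ends. Assuming that rule is what this note refers to, a minimal Python sketch (for illustration only, with the hypothetical name `decode_size`) looks like this:

```python
def decode_size(control_byte, following):
    """Return (payload size, number of extra size bytes consumed)."""
    size = control_byte & 0x1F  # low 5 bits of the control byte
    if size < 29:
        return size, 0
    extra = size - 28           # 29 -> 1, 30 -> 2, 31 -> 3 extra bytes
    base = {
        1: 29,                  # sizes 29 .. 284
        2: 29 + 256,            # 285: first size not covered by one extra byte
        3: 29 + 256 + 65536,    # 65821: first size not covered by two extra bytes
    }[extra]
    return base + int.from_bytes(following[:extra], "big"), extra


small = decode_size(0b010_11100, b"")
two_bytes = decode_size(0b010_11110, b"\x00\x00")
three_bytes = decode_size(0b010_11111, b"\x00\x00\x00")
```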
Once all the decoder pieces are in place.
Current speed (without IEEE754 decoding) is about 100 IPv4 searches per second.
This can be improved by using native types, shaped arrays, etc.
Add a translation hook that allows skipping the included translations (usually there is no point in decoding them all) and fetching a specific language by geoname id.
Assume that some people will use the bare GeoIP2 reader rather than one wrapped by a specific class such as GeoIP2::City. So the interface must be well described and safe to use.
Some methods are safe to call at any time: reading metadata, reading a node pointer, reading location info. Those methods position the file cursor on their own.
Other methods are unsafe and can cause unexpected results when called directly; most of them decode values at the current cursor position and can go out of range.
Out of range: attempted to read 0 bytes from filehandle