objectpascal-community / 1brc-objectpascal
1️⃣🐝🏎️ The 1 Billion Row Challenge in Object Pascal
License: MIT License
result values per station in the format `<min>/<mean>/<max>`, rounded to one fractional digit
ROUNDED = truncated, I guess?
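The two readings can produce different output, and therefore a different reference hash. As a sketch only (Python for brevity, helper names hypothetical), here is how truncation and round-half-up differ on a mean kept in tenths of a degree. It assumes Java-style round-half-up, i.e. floor(x + 0.5) as in Java's `Math.round`; whether this challenge expects exactly that is an assumption:

```python
import math
from fractions import Fraction

def mean_truncated(sum_tenths: int, count: int) -> int:
    """Mean in tenths of a degree, truncated toward zero."""
    q = abs(sum_tenths) // count
    return q if sum_tenths >= 0 else -q

def mean_rounded_half_up(sum_tenths: int, count: int) -> int:
    """Mean in tenths, rounded half toward +infinity: floor(x + 0.5).
    Fraction keeps the division exact, so no floating-point error creeps in."""
    return math.floor(Fraction(sum_tenths, count) + Fraction(1, 2))

def fmt_tenths(t: int) -> str:
    """Format an integer number of tenths, e.g. -11 -> "-1.1"."""
    sign = "-" if t < 0 else ""
    return f"{sign}{abs(t) // 10}.{abs(t) % 10}"
```

For example, readings of 1.2 and 1.3 give a mean of 1.2 when truncated but 1.3 when rounded; for readings -1.0, -1.0 and -1.2 the two strategies give -1.0 and -1.1.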
In the implementation, using the "single" type to add up a lot of values is known to be inaccurate.
With millions of values, adding new numbers may just give a wrong result, because the running total exhausts the precision of a 32-bit single (a 24-bit mantissa, roughly 7 significant decimal digits).
This is why there are compensated-summation algorithms like https://en.wikipedia.org/wiki/Kahan_summation_algorithm
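A small simulation makes the precision loss visible. This sketch (Python, rounding every intermediate result to IEEE-754 single precision via `struct` as a stand-in for Pascal's `single`) compares a naive running sum with Kahan's compensated sum; the function names are illustrative only:

```python
import struct
from typing import Iterable

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_single_sum(values: Iterable[float]) -> float:
    """Running sum with every intermediate result rounded to single precision."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total

def kahan_single_sum(values: Iterable[float]) -> float:
    """Kahan compensated sum, also carried out in single precision."""
    total = 0.0
    comp = 0.0  # low-order bits lost by the previous addition
    for v in values:
        y = to_single(to_single(v) - comp)
        t = to_single(total + y)
        comp = to_single(to_single(t - total) - y)
        total = t
    return total
```

Summing 100,000 copies of 0.1 this way, the naive single-precision total ends up visibly away from 10000 (by more than 0.1), while the Kahan total stays within a few single-precision ULPs.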
The implementation should rather use Int64 values and work at a fixed resolution, i.e. store each temperature multiplied by 10 as a 64-bit integer.
And it will be faster.
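Here is a minimal sketch of that fixed-point approach (Python for brevity; in Pascal the accumulator would be an Int64). `parse_tenths` and `Stats` are hypothetical names, and the parser assumes every reading carries exactly one fractional digit:

```python
from dataclasses import dataclass

def parse_tenths(text: str) -> int:
    """Parse a reading such as "-12.3" into tenths of a degree (-123).
    Assumes exactly one fractional digit after the dot."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    whole, _, frac = digits.partition(".")
    value = int(whole) * 10 + int(frac)
    return -value if negative else value

@dataclass
class Stats:
    """Per-station accumulator working purely in integer tenths."""
    min_t: int = 10**9
    max_t: int = -(10**9)
    sum_t: int = 0  # would be an Int64 in Pascal
    count: int = 0

    def add(self, tenths: int) -> None:
        self.min_t = min(self.min_t, tenths)
        self.max_t = max(self.max_t, tenths)
        self.sum_t += tenths
        self.count += 1
```

Assuming readings stay within -99.9..99.9 as in the original challenge, even one billion rows at 99.9 sum to about 10^12 tenths: far beyond Int32 (about 2.1 × 10^9) but well within Int64 (about 9.2 × 10^18).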
BTW, for a few thousand numbers, single is not faster than double, because CPU L1 cache misses will probably be the bottleneck.
Describe the bug
The generator seems to fail with the 400-stations option.
To Reproduce
Steps to reproduce the behavior:
./generator -i ../data/weather_stations.csv -o data400.csv -n 1000000000 -4
Expected behavior
Should generate the file.
Screenshots
$ ./generator -i ../data/weather_stations.csv -o data400.csv -n 1000000000 -4
ERROR: Option at position 7 needs an argument : 4
Additional context
Same issue if --400stations is used.
No problem without the -4 option: the CSV file is correctly generated.
We need to discuss the requirement of a "full station name hash".
In (most of) my entries I use the "perfect hash" trick, i.e. I only compare the 32-bit hash value to identify a given station name. With a good enough hash function (e.g. crc32c), it works perfectly fine with our current dataset of 10K stations and gives the correct output. BUT someone could add a line to the dataset with a forged name triggering a hash collision, and then the results would be inaccurate...
In the original 1BRC challenge, this trick was disallowed, and they rejected any solution not explicitly comparing the station names char by char.
gunnarmorling/1brc#495 (reply in thread)
So in my entry I made both code paths available, and we can compare plain ./abouchez with ./abouchez -f, the latter making a full name comparison, but slower (1.96s vs 1.10s on my Intel PC).
To stay fair with respect to the original challenge, I would recommend requiring a full station name comparison.
It makes the timings slower, but IMHO it is closer to what we would expect in real-world code.
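To make the difference concrete, here is a toy sketch (Python, not the code of any actual entry) of a station table that either trusts a 32-bit hash alone or also compares the full name. `zlib.crc32` is plain CRC-32, used only as a stand-in for crc32c, and `StationTable` is a hypothetical name:

```python
import zlib
from typing import Callable, Dict, List, Tuple

def crc32_hash(name: bytes) -> int:
    """Plain CRC-32 (not crc32c), standing in for the real hash function."""
    return zlib.crc32(name) & 0xFFFFFFFF

class StationTable:
    """Toy table: buckets keyed by a 32-bit hash, each holding (name, [sum, count])."""

    def __init__(self, hash_fn: Callable[[bytes], int], full_compare: bool):
        self.hash_fn = hash_fn
        self.full_compare = full_compare
        self.buckets: Dict[int, List[Tuple[bytes, List[int]]]] = {}

    def slot(self, name: bytes) -> List[int]:
        """Return the [sum, count] accumulator used for this station name."""
        bucket = self.buckets.setdefault(self.hash_fn(name), [])
        for stored_name, acc in bucket:
            # "perfect hash" mode trusts the hash alone and skips the name check
            if not self.full_compare or stored_name == name:
                return acc
        acc = [0, 0]
        bucket.append((name, acc))
        return acc

    def add(self, name: bytes, tenths: int) -> None:
        acc = self.slot(name)
        acc[0] += tenths
        acc[1] += 1

    def station_count(self) -> int:
        return sum(len(b) for b in self.buckets.values())
```

With a deliberately weak hash such as `len(name)`, so that "Oslo" and "Rome" collide, the hash-only table silently merges both stations into one accumulator, while the full-comparison table keeps them apart.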
There seems to be an accepted result, with a published SHA256 value. It would be helpful to get sample values for the text that is provided to the hash routine. For example, my last run (to a file) resulted in the wrong hash, and the output began like this:
{‘Abasān al Kabīrah=-18.2/-59.6/22.8, ‘Adrā=62.2/30.7/93.6, ‘Afrīn=28.7/0.7/56.6, ‘Ajab Shīr=-29.6/-70.7/11.3, ‘Ajlūn=33.4/4.3/62.3, ‘Ajmān=22.7/-9.2/54.5, ‘Akko=38.8/-3.6/80.6, ‘Alavīcheh=-61.7/-79.6/-43.8, ‘Alem T’ēna=6.2/-19.6/31.7, ‘Ālī Shahr=46.6/16.1/77.1, ‘Alīābād-e Katūl=-56.9/-84.1/-29.8, ‘Amrān=-15.4/-45.6/14.7, ‘Āmūdā=-58.1/-85.2/-30.9, ‘Anadān=11.4/-20.2/43.0, ‘Anbarābād=-39.1/-66.3/-12.2, ‘Aqrah=-35.9/-64.2/-7.8, ‘Ayn al ‘Arab=-15.3/-39.8/9.1, ‘Aynkāwah=38.6/5.4/72.2, ‘Ibrī=-50.3/-75.7/-24.9, ‘Izbat al Burj=23.6/-17.5/64.5, ‘Unayzah=56.2/14.0/98.5, ‘Utaybah=72.2/53.9/90.4, ’Aïn Abessa=0.8/-27.4/29.2, ’Aïn Abid=17.9/-3.1/38.9, ’Aïn Arnat=-25.2/-66.0/15.9, ’Aïn Azel=70.1/45.0/95.0, ’Aïn el Hammam=-44.0/-81.4/-7.4, ’Aïn Leuh=-71.3/-91.4/-51.6, ’Aïn Roua=-31.2/-54.3/-8.3, ’Ali Ben Sliman=66.1/40.4/91.9, ’Ayn Bni Mathar=-55.9/-86.9/-25.2,
I suspect I have not implemented rounding correctly, but I may also be missing a BOM at the beginning or an EOL at the end. If possible, please provide an excerpt from the beginning of the correct result in the README file. Thank you!
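One way to rule out BOM and EOL issues is to hash the exact bytes under each hypothesis. This sketch (Python; `format_results` and `sha256_hex` are hypothetical helpers) renders results from integer tenths in the `<min>/<mean>/<max>` order of the original 1BRC and shows that a UTF-8 BOM or a trailing newline each change the digest. Sorting by Unicode code point (equivalent to UTF-8 byte order) is an assumption here, not a confirmed rule of this challenge:

```python
import hashlib
from typing import Dict, Tuple

def fmt_tenths(t: int) -> str:
    """Format an integer number of tenths, e.g. -5 -> "-0.5"."""
    sign = "-" if t < 0 else ""
    return f"{sign}{abs(t) // 10}.{abs(t) % 10}"

def format_results(stats: Dict[str, Tuple[int, int, int]]) -> str:
    """Render {name=min/mean/max, ...}; values are already-rounded tenths."""
    parts = [
        f"{name}={fmt_tenths(lo)}/{fmt_tenths(mean)}/{fmt_tenths(hi)}"
        for name, (lo, mean, hi) in sorted(stats.items())
    ]
    return "{" + ", ".join(parts) + "}"

def sha256_hex(text: str, bom: bool = False, eol: str = "") -> str:
    """SHA-256 of the UTF-8 bytes, optionally with a BOM prefix and an EOL suffix."""
    data = text.encode("utf-8")
    if bom:
        data = b"\xef\xbb\xbf" + data
    return hashlib.sha256(data + eol.encode("utf-8")).hexdigest()
```

Comparing the published value against the digests with and without a trailing "\n" (or "\r\n"), and with or without a BOM, quickly shows which byte-level variant is expected.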
You can only use pure Object Pascal with no calls to any operating system's API
This requirement did not exist in the original Java challenge, and is pointless IMHO.
I would like to focus on FPC Linux x86_64.
Or at least be able to use mORMot 2 as a cross-platform and cross-compiler layer.
by about 4 and a half minutes.