objectpascal-community / 1brc-objectpascal

1️⃣🐝🏎️ The 1 Billion Row Challenge in Object Pascal

License: MIT License

Pascal 99.71% C++ 0.03% Batchfile 0.26%

1brc 1brc-object-pascal challenge delphi free-pascal freepascal lazarus object-pascal objectpascal pascal

1brc-objectpascal's Issues

Clarify average rounding

result values per station in the format <min>/<mean>/<max>, rounded to one fractional digit

ROUNDED = truncated, I guess?
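
To make the question concrete, here is a small sketch of how the candidate rules can disagree on the same mean. It assumes temperatures are held as integer tenths of a degree, and it does not claim which rule the challenge actually requires; `format_tenths` and the sample numbers are made up for illustration.

```python
import math
from fractions import Fraction

def format_tenths(t: int) -> str:
    """Render an integer number of tenths as a one-decimal string."""
    sign = "-" if t < 0 else ""
    return f"{sign}{abs(t) // 10}.{abs(t) % 10}"

# Hypothetical station: the readings sum to -37 tenths over 2 rows,
# so the exact mean is -18.5 tenths (-1.85 degrees).
mean = Fraction(-37, 2)

candidates = {
    "truncate": math.trunc(mean),  # toward zero
    "floor": math.floor(mean),     # toward negative infinity
    "ceil": math.ceil(mean),       # toward positive infinity
    "half_even": round(mean),      # ties go to the even neighbour
}

for rule, tenths in candidates.items():
    print(f"{rule:>9}: {format_tenths(tenths)}")
```

Truncation and floor already differ for this negative mean, which is why the README wording matters for anyone trying to match a reference hash.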

Using "single" for the total value is unsafe

In the implementation, using a "single" type to accumulate a large number of values is known to be inaccurate.
With millions of values, adding a new number may simply give the wrong result, because the running total exhausts the precision of the 32-bit single type.
This is why there are algorithms like https://en.wikipedia.org/wiki/Kahan_summation_algorithm

The implementation should instead use Int64 values and work in fixed point, storing each temperature as value * 10 in a 64-bit integer.
And it will be faster.

BTW for a few thousand numbers, single is not faster than double, because CPU L1 cache misses will probably be the bottleneck.
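
A minimal sketch of the effect, in Python rather than Pascal: `to_f32` emulates a 32-bit single by round-tripping through `struct`, and `sum_tenths` is the fixed-point alternative suggested above. The function names and input values are illustrative assumptions, not code from the repository.

```python
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Accumulate as a 32-bit single would: round after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

def sum_tenths(values):
    """Accumulate exactly as integer tenths (fixed point, value * 10)."""
    total = 0
    for v in values:
        total += round(v * 10)
    return total

# Once the running total reaches 2**24, adding 1.0 no longer changes a single.
big = 16777216.0  # 2**24
print(sum_single([big, 1.0, 1.0]))        # the two additions are lost
print(sum_tenths([1677721.6, 0.1, 0.1]))  # every tenth is kept
```

Python integers are arbitrary precision, so `sum_tenths` only stands in for an Int64 accumulator; the point is that integer addition never rounds.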

Generator seems to fail with 400 stations option

Describe the bug
Generator seems to fail with 400 stations option

To Reproduce
Steps to reproduce the behavior:
./generator -i ../data/weather_stations.csv -o data400.csv -n 1000000000 -4

Expected behavior
Should generate the file.

Screenshots

$ ./generator -i ../data/weather_stations.csv -o data400.csv -n 1000000000 -4
ERROR: Option at position 7 needs an argument : 4

Additional context
Same issue if --400stations is used.
No problem without the -4 option: the csv file is correctly generated.

Full Station Name Hash Requirement?

We need to discuss the requirement of a "full station name hash".

In (most of) my entries I use the "perfect hash" trick, i.e. I compare only the 32-bit hash to check for a given station name. With a good enough hash function (e.g. crc32c), it works perfectly fine with our current dataset of 10K stations, and gives the correct output results. BUT someone could add a line to the dataset with a forged name that triggers a hash collision. Then the results would be inaccurate...

In the original 1BRC challenge, this trick was disallowed, and they rejected any solution not explicitly comparing the station names char by char.
gunnarmorling/1brc#495 (reply in thread)

So in my entry, I made both code paths available, and we can compare plain ./abouchez and ./abouchez -f - the latter making a full name comparison, but slower (1.96s vs 1.10s on my Intel PC).

To be fair to the original challenge, I would recommend requiring a full station name comparison.
It makes the timings slower, but is IMHO closer to what we expect from real-world code.
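
The difference between the two lookups can be sketched as follows. This is an illustration in Python, not the code of the abouchez entry: `toy_hash` is a deliberately weak stand-in for crc32c so that a collision is trivial to forge, and the station names and values are made up.

```python
def toy_hash(name: bytes) -> int:
    """Deliberately weak 32-bit hash (byte sum); b"ab" and b"ba" collide."""
    return sum(name) & 0xFFFFFFFF

def aggregate_hash_only(rows, hash_fn=toy_hash):
    """Key stations by hash alone: colliding names share one accumulator."""
    table = {}
    for name, tenths in rows:
        slot = table.setdefault(hash_fn(name), [name, 0, 0])
        slot[1] += tenths  # running sum in tenths
        slot[2] += 1       # row count
    return {s[0]: (s[1], s[2]) for s in table.values()}

def aggregate_full_compare(rows, hash_fn=toy_hash):
    """Bucket by hash, then compare the full name inside the bucket."""
    table = {}
    for name, tenths in rows:
        bucket = table.setdefault(hash_fn(name), [])
        for slot in bucket:
            if slot[0] == name:
                break
        else:
            slot = [name, 0, 0]
            bucket.append(slot)
        slot[1] += tenths
        slot[2] += 1
    return {s[0]: (s[1], s[2]) for b in table.values() for s in b}

rows = [(b"ab", 100), (b"ba", -50), (b"ab", 20)]
print(aggregate_hash_only(rows))     # b"ba" is silently merged into b"ab"
print(aggregate_full_compare(rows))  # both stations are kept apart
```

The extra name comparison on every row is the cost behind the slower timing; in exchange, a forged name can no longer corrupt another station's totals.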

ReadMe should have partial example of the accepted result

There seems to be an accepted result, with a published SHA256 value. It would be helpful to get sample values for the text that is provided to the hash routine. For example, my last run (to a file) resulted in the wrong hash, with the beginning of the output like:

{‘Abasān al Kabīrah=-18.2/-59.6/22.8, ‘Adrā=62.2/30.7/93.6, ‘Afrīn=28.7/0.7/56.6, ‘Ajab Shīr=-29.6/-70.7/11.3, ‘Ajlūn=33.4/4.3/62.3, ‘Ajmān=22.7/-9.2/54.5, ‘Akko=38.8/-3.6/80.6, ‘Alavīcheh=-61.7/-79.6/-43.8, ‘Alem T’ēna=6.2/-19.6/31.7, ‘Ālī Shahr=46.6/16.1/77.1, ‘Alīābād-e Katūl=-56.9/-84.1/-29.8, ‘Amrān=-15.4/-45.6/14.7, ‘Āmūdā=-58.1/-85.2/-30.9, ‘Anadān=11.4/-20.2/43.0, ‘Anbarābād=-39.1/-66.3/-12.2, ‘Aqrah=-35.9/-64.2/-7.8, ‘Ayn al ‘Arab=-15.3/-39.8/9.1, ‘Aynkāwah=38.6/5.4/72.2, ‘Ibrī=-50.3/-75.7/-24.9, ‘Izbat al Burj=23.6/-17.5/64.5, ‘Unayzah=56.2/14.0/98.5, ‘Utaybah=72.2/53.9/90.4, ’Aïn Abessa=0.8/-27.4/29.2, ’Aïn Abid=17.9/-3.1/38.9, ’Aïn Arnat=-25.2/-66.0/15.9, ’Aïn Azel=70.1/45.0/95.0, ’Aïn el Hammam=-44.0/-81.4/-7.4, ’Aïn Leuh=-71.3/-91.4/-51.6, ’Aïn Roua=-31.2/-54.3/-8.3, ’Ali Ben Sliman=66.1/40.4/91.9, ’Ayn Bni Mathar=-55.9/-86.9/-25.2,

I suspect I have not correctly implemented rounding, but I may be missing a BOM at the beginning, or an EOL at the end. If possible, provide an excerpt from the beginning of the correct result in the ReadMe file. Thank You!
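
As a quick check on the BOM and EOL suspicion, this sketch shows that either one changes the SHA256 digest even when the visible text is identical. The sample line is hypothetical and is not the accepted result.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA256 digest of the exact bytes passed in."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical output text; the real accepted result is much longer.
text = "{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3}"
plain = text.encode("utf-8")

digests = {
    "plain": sha256_hex(plain),
    "with trailing LF": sha256_hex(plain + b"\n"),
    "with UTF-8 BOM": sha256_hex(b"\xef\xbb\xbf" + plain),
}

for label, digest in digests.items():
    print(f"{label:>16}: {digest}")
```

All three digests differ, so a published excerpt alone would not settle the question; the exact byte layout (BOM, final EOL) would also need to be stated.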

...

Get rid of cross-OS and cross-IDE requirement

You can only use pure Object Pascal with no calls to any operating system's API

This requirement did not exist in the original Java challenge, and is pointless IMHO.

I would like to focus on FPC Linux x86_64.
Or at least be able to use mORMot 2 as a cross-platform and cross-compiler layer.

typo in readme

by about 4 amd a half minutes.
