hanabi1224 / programming-language-benchmarks
Yet another implementation of the computer language benchmarks game
Home Page: https://programming-language-benchmarks.vercel.app/
License: MIT License
Could you please let us know why the Lisp benchmarks are not run?
Hi,
I would like to contribute and add Free Pascal to this awesome repository. What do you think about it?
It seems like the Golang version of the HTTP-server problem (1-http2.go) spawns an HTTPS server (with TLS encryption), while the Rust version (1.rs) spawns a regular HTTP server (without encryption).
Hence the performance comparison is not fair.
The benchmarks seem to be run in GitHub Actions on GitHub hosted runners.
Those runners are hosted in the cloud, they are probably shared machines running multiple workloads at the same time, etc.
So the results will likely be very noisy.
Is there a plan to address that?
Until then, I think it would be useful to run each measurement a few times, or to reuse previous runs, and compute the standard deviation or some other estimate of the variance, as this is a big caveat.
I compared results for Golang and C#.
Golang is slower than C# but uses less memory.
This benchmark allocates a large amount of memory, so the result depends heavily on the aggressiveness of the garbage collector.
For Golang you can use the GOGC environment variable; by default it is 100.
If you set GOGC=300, the result becomes close to C#'s.
For a fair test, the garbage collection settings need to be set to approximately the same aggressiveness.
Now that 7.0 has been released, it would be great to show numbers for the source-generated version of regex-redux. It moves regex compilation to compile time.
I see you added one @hanabi1224 here:
but I don't see a result listed here
https://programming-language-benchmarks.vercel.app/problem/regex-redux
Note that in the final 7.0 bits the attribute name is "[GeneratedRegex]" not "[RegexGenerator]"
cc @stephentoub
I am pretty sure you're not accounting for Julia's compilation time or startup time. The memory usage and slow performance reported for Julia are absurd. It doesn't make sense to treat it like Python. For example, using Julia's BenchmarkTools.jl -- push!(ARGS, "4000"); @benchmark(include("pydigits.jl"))
inside a Julia session -- instead of hyperfine "julia -O3 pydigits.jl 4000" gives 379.271 ms and 1.15 GiB, versus 1.333 s ± 0.008 s.
What about making Python benchmarks with the Numba library (maybe separately, as extra benchmarks)? It may not be a standard library, but it is widely used, especially for data science and simulations.
Most programs are compiled targeting the broadwell architecture even though the test hardware is a Xeon(R) Platinum 8272CL, which is cascadelake. The test hardware has the avx512 instruction set (and others), which doesn't usually exist in consumer-level hardware (most dev machines). Is this intended, e.g. to ease testing and development of contributed programs?
I am just sending the link here. Maybe this can be useful
https://discord.com/channels/592103645835821068/630144782370471978/1038751140432592937
Hello, the maintainer of the Computer Language Benchmarks Game is not always able to update the numbers regularly -- he posted in their forum recently that it may be a few months. Would you consider adding to this repo the benchmarks from there that you don't already have?
For example, I work on .NET, we shipped .NET 6 and I'm curious to see updated numbers for regex redux.
Also, it seems likely easier to offer submissions here than over there.
For some reason, the input file was added to the gitignore at bench/algorithm/regex-redux. I'm not quite sure what the purpose of that decision was, but the benchmark cannot be executed without it. I would request that the gitignore file be removed/edited and the input file included in the repository.
EDIT: I noticed afterwards that other benchmarks also make use of similarly named files, so regex-redux is not the only benchmark affected by this (knucleotide also seems to require such absent files).
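If the intent was to keep generated artifacts out of the repository while still tracking checked-in benchmark inputs, a negation pattern in that gitignore would allow it. The file names below are hypothetical, not the repo's actual ones:

```
# Hypothetical bench/algorithm/regex-redux/.gitignore:
# ignore generated output, but keep the checked-in input file
*.out
!regexredux-input.txt
```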
Why isn't the Julia 3.jl pidigits code marked ffi?
Could you please confirm if we could use Boost library for C++?
Hi, I'm one of the Pyston developers and I would really like to see a comparison with Pyston.
It seems like you are using Ubuntu 20.04, so your deb packages should just work: https://github.com/pyston/pyston/releases/tag/pyston_2.3.3
Alternatively, you could also use the "portable" release package, which should work on most Linux distros.
Thanks
The latest commit of sb-simd does not work on SBCL older than 2.2.4.
Please note that sb-simd is planned to be integrated into sbcl-2.2.5, but with one function used in the benchmark codes missing, so most codes using it will also have to be updated when the integration is done.
I did modify the spectral-norm CL codes, but since you are not accepting my PR, I don't know what to do.
I could describe what to change, or you could look at my PR to see what needs to be changed and update the codes accordingly.
One of the requirements for the Computer Language Benchmarks Game is that programs are supposed to be using the same (rough) algorithm. Are these benchmarks going to have a similar requirement?
This is a noble goal but in practice it seems to be pretty hard to enforce. There have been many occasions on the Computer Language Benchmarks Game where a program has been accepted only to find out much later that the program was actually using a considerably different algorithm requiring it (and possibly other programs based on it) to be removed. People need to manually check the programs to see if they are using different algorithms and this is hard and time consuming to do so it isn't done as much as we would like.
Additionally some programming languages such as functional languages (like Haskell) can require considerably different algorithms to solve problems. There are also some grey areas where it doesn't become quite so clear if something is using a different algorithm or not. This might include things like rewriting algorithms to use multithreading or SIMD.
Because of these issues, I personally would suggest giving program authors flexibility with the algorithms they use. Over the long run, people will determine optimal algorithms to solve the problems and those optimal algorithms will be adopted by most of the programs anyway.
As of now, the contributed code doesn't have a consistent naming scheme. While most authors just use the next unused number for their contributed code, this doesn't seem to be strictly followed. There is also a problem when multiple PRs contribute different code under the same name.
What I propose is to use the author's name/pseudonym as the program name, followed by a number as a differentiator for multiple submissions, so that naming is unambiguous when creating a PR. Putting the author's name in the program name itself may also motivate some people to submit, as a form of bragging rights.
It is supposed to compile rather fast
Would it be possible to add SBCL to the language pool?
I am aware that SBCL is included on The Computer Language Benchmarks Game site, but unfortunately the site maintainer refuses to benchmark codes written by certain programmers, so the comparison results there are not particularly valid.
There are codes sitting in the closed issues that are several times faster than the ones included in the result table.
Keep up the good work.
Thanks and Regards
No use of Memory or Span / ReadOnlyMemory etc.
The code is not optimized...
Thanks
It seems that Zig's Vector uses SIMD internally by default.
In that case, could all other implementations also use SIMD for these problems?
Guys, I'm a bit confused by the wiki (please add a n00b-friendly explanation section too xD).
As I understand it, the plan is to compare different languages on build and deploy, right?
Is the purpose of this to be multiplatform? If yes, can I compile it easily on Windows and Linux? I'm trying to find a benchmark tool to check Java performance on different OSes, like Windows, Linux (Ubuntu, Fedora, openSUSE, Puppy, etc.), and macOS. For all of them, are the commands at the front of the README.md sufficient, or are specific changes needed between them before testing?
Please add a test result example to the README.md too :)
Rust has an unstable feature for generators/semicoroutines - https://doc.rust-lang.org/beta/unstable-book/language-features/generators.html
It would be interesting to see how it compares to async-based implementations.
Hi,
I'm willing to look into improving times for the Julia language, but I want to use tricks that are (at least currently) unfairly disallowed at the Debian benchmarks game.
Julia is currently optimized for long-running code and has a) a high startup cost for the runtime itself, plus b) some cost for compiling the benchmarked code. That means Julia on default options can't win some benchmarks, such as "hello world", although a small, fast compiled program of that kind has already been made with Julia.
Would compiled code be OK here, unlike at Debian? PackageCompiler.jl does that, and it seems you're working in that direction; at least I saw a merged "precompile" PR here, but I'm unsure if it's already used.
Another option is a non-default sysimage, but with the benchmark code not in it. OK? That's basically the same as a non-default Julia runtime, or a fork of Julia (keeping compatibility with the same Julia code).
One more option is mimalloc or another malloc, modifying the Julia binary. OK? #257
I see Debian used 50000000 for nbody, while you have 5000000 and 500000, so here Julia is nowhere close to the lead because of the startup overhead, unlike at Debian (where it is at 1.0x). However, there is another category there, and it goes down to 0.5x:
hand-written vector instructions | "unsafe"
https://programming-language-benchmarks.vercel.app/problem/nbody
sudo snap install zig --classic --edge
should have been
sudo snap install zig --classic --beta
for the more stable branch.
Could you please explain why @setFloatMode(.Optimized) is allowed for zig in spectralnorm code?
As far as I know this is “equivalent” to -ffast-math in GCC yet quite rightfully -ffast-math is not allowed for gcc.
I would like to temporarily view the results on a Vercel deployment of my fork. How would I do that? The results don't seem to be updated after the benchmarks are run on GitHub Actions in my fork.
Thanks in advance
Please remove the DMD compiler (it is too slow) and add GDC (the GCC backend) for Dlang.
There is a somewhat clumsy way to install everything we need for D testing (LDC, GDC, and DUB) - please check the comment here: dlang-community/setup-dlang#35 (comment)
Could you also consider upgrading the CI workflow from Ubuntu 20.04 to 22.04?
A fresh GDC is available only starting from that version of Ubuntu.
Some of the current problems being used for benchmarking usually result in program authors using a library to do much of the work. Examples of this are the edigits and pidigits problems that usually require an arbitrary precision math library like GMP, the regex-redux problem that usually requires a regular expression library like PCRE or RE2, and the secp256k1 problem that usually requires a crypto or arbitrary precision math library. It seems like the goal of these benchmarks is to benchmark programming languages and their implementations so it might not be a good idea to have problems that will typically be heavily dependent on libraries to do much of the work.
Many of the libraries will be implemented in a different programming language (like C, C++, or assembly) than the one the program is written in and additionally libraries can have greatly different performance from other libraries. This results in these problems being more of a benchmark of the libraries than the programming languages and their implementations.
Also if there is one highly dominant library (like GMP for arbitrary precision math), this can result in many ties. This was demonstrated about a year ago on the pidigits benchmark on the Computer Language Benchmarks game when there was roughly a 10 way tie for best performance. This is highly indicative of just how much program performance for these problems is dependent on the libraries being used.
Many people probably won't even be aware of this library usage but those who are probably won't find benchmarking of libraries to be quite as interesting as benchmarking programming languages and their implementations. I know at least a couple other people have the same thoughts that I do. I would suggest that the edigits, pidigits, regex-redux, and secp256k1 problems (as well as any others I may be missing) should be removed and future problems should try to avoid the use of libraries.
"Also since it does share a lot functionality with ur site, I didn't really intend to advertise and compete."
In that case, you need to brand your project so that it's distinct from the benchmarks game.
At present all of the URLs say benchmarks-game:
another-benchmarks-game.vercel.app
If the benchmarks game didn't exist what would be a good name for your project?
Is there a way to ask individual contributors directly? How does it work? I'm fairly new to GitHub.
I tried creating a local branch and pushing it to the repo; it says I don't have permission to do that.
Are we allowed to use external C libraries for the secp256k1 codes, or do we need to use language-native libraries only?
Very nice comparisons. Here are some improvements that can be made in C#:
In some cases, use Span or Memory
Link - Span, Memory
In some cases, use ValueTask instead of Task
Link - ValueTask
For C#, use a Source Generator for JSON (example below) and, if possible, bytes instead of string.
[JsonSerializable(typeof(Person))]
internal partial class MyJsonContext : JsonSerializerContext
{
}
Person person = new() { FirstName = "Jane", LastName = "Doe" };
byte[] utf8Json = JsonSerializer.SerializeToUtf8Bytes(person, MyJsonContext.Default.Person);
person = JsonSerializer.Deserialize(utf8Json, MyJsonContext.Default.Person);
Why was this removed?
36160bc#diff-dad64536dad5a5311a7f653d00f5f8e10fb1fa76e473e7abca4c8a1df12680e2
The next commit says, "removed the trick of using struct"!
Can anyone elaborate on this matter? Is using language-specific features bad?
Why not add assembly language for comparison with the C language?
Since assembly (or the machine code of the architecture itself) has no abstraction overhead and can use all of the hardware's power, it would let us measure how much the different levels of abstraction cost: how close the optimizing compilers for other languages get to assembly.
That is, the difference between what high-level languages compile to and what the ideal hand-written machine code would be.
Please add assembly to your benchmark.
Happy New Year to you.
I believe I've clearly explained everything I could with examples and profiling reports, and all I need is a profiler report to prove the fairness, which I believe is clear and actionable.
Have you seen my messages regarding SBCL heap allocation profiling? I am not sure if they went through.
Here are the heap allocation profiling results, one for code using cons cells to store the binary tree and one using a struct to do the same, clearly showing that both cons cells and structs are heap-allocated on each and every instantiation.
1.cl with cons cells
CL-USER> (sb-aprof:aprof-run #'main :arguments 18)
stretch tree of depth 19 check: 1048575
262144 trees of depth 4 check: 8126464
65536 trees of depth 6 check: 8323072
16384 trees of depth 8 check: 8372224
4096 trees of depth 10 check: 8384512
1024 trees of depth 12 check: 8387584
256 trees of depth 14 check: 8388352
64 trees of depth 16 check: 8388544
16 trees of depth 18 check: 8388592
long lived tree of depth 18 check: 524287
553 (of 50000 max) profile entries consumed
% Bytes Count Function
------- ----------- --------- --------
99.5 1087722944 BUILD-TREE
50.5 549453824 34340864 CONS
49.5 538269120 33641820 LIST
0.5 5592320 349520 LOOP-DEPTHS - LIST
00.0 32 2 BINARY-TREES-UPTO-SIZE - LIST
======= ===========
100.0 1093315296
1093315296
CL-USER> ( + 34340864 33641820 349520 2) ;; is exactly equal to the sum of checks
68332206
5.cl with struct
CL-USER> (sb-aprof:aprof-run #'main :arguments 18)
stretch tree of depth 19 check: 1048575
262144 trees of depth 4 check: 8126464
65536 trees of depth 6 check: 8323072
16384 trees of depth 8 check: 8372224
4096 trees of depth 10 check: 8384512
1024 trees of depth 12 check: 8387584
256 trees of depth 14 check: 8388352
64 trees of depth 16 check: 8388544
16 trees of depth 18 check: 8388592
long lived tree of depth 18 check: 524287
562 (of 50000 max) profile entries consumed
% Bytes Count Function
------- ----------- --------- --------
99.5 2175445888 67982684 BUILD-TREE - NODE
0.5 11184640 349520 LOOP-DEPTHS - NODE
00.0 64 2 BINARY-TREES-UPTO-SIZE - NODE
======= ===========
100.0 2186630592
2186630592
CL-USER> (+ 67982684 349520 2) ;; is exactly equal to the sum of checks
68332206
And here is your example where you demonstrated that OCaml is not doing what you expect, according to the callgrind profiling report. Here is the heap allocation profiling report clearly showing that SBCL does it as you expect.
This is with the PR-ed 5.cl
(defun make-tree (n)
(loop for i below n do (build-tree 1)))
CL-USER> (sb-aprof:aprof-run #'make-tree :arguments 10000)
8 (of 50000 max) profile entries consumed
% Bytes Count Function
------- ----------- --------- --------
100.0 960000 30000 BUILD-TREE - NODE
======= ===========
100.0 960000
960000
CL-USER>
Therefore, could you please accept my PR with the latest codes and rerun the benchmark including codes 1.cl, 2.cl, 5.cl, and 6.cl, since SBCL clearly allocates every node on the heap, meeting the requirements?
On the home page
Main goals:
…
Facilitate benchmarking on real server environments as nowadays more and more applications are deployed with docker/k8s. It's likely to get a very different result from what you get on your dev machine.
…
Now that there are plenty of results, are they in fact different?
I can't make sense of the way you execute these benchmarks. Here are a few things I noticed:
https://github.com/hanabi1224/Programming-Language-Benchmarks/blob/56aee38efa03b60ed4d6655cf8935324a90cd65c/bench/bench_c.yaml
No optimization options applied; this does not seem realistic at all.
https://github.com/hanabi1224/Programming-Language-Benchmarks/blob/56aee38efa03b60ed4d6655cf8935324a90cd65c/bench/bench_rust.yaml
Seems to be optimized for the skylake instruction set?
https://github.com/hanabi1224/Programming-Language-Benchmarks/blob/56aee38efa03b60ed4d6655cf8935324a90cd65c/bench/bench_java.yaml
'Properly' benchmarking Java applications is a pain. First of all, you need to set -Xmx, -Xms, -server, -Xbatch (possibly I am missing a few options), then let the JVM warm up by executing the benchmark several times, and only then run the benchmarks (on the same JVM instance, without stopping it). Or at least this is the way of benchmarking that is most accurate for applications running in the cloud. But yes, this does not look easy to change given the current project structure.
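The warmup idea can be sketched without any benchmarking framework; the workload here is a toy placeholder, not one of the repo's benchmarks:

```java
// Sketch: manual JVM warmup before timing, so the JIT compiles the hot
// path before any measurement is taken.
public class WarmupBench {
    // Placeholder workload standing in for a real benchmark body.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (long) i * i;
        return sum;
    }

    public static void main(String[] args) {
        // Warmup phase: same JVM instance, results discarded.
        for (int i = 0; i < 5_000; i++) work(10_000);
        // Measured phase: start the clock only after warmup.
        long start = System.nanoTime();
        long result = work(1_000_000);
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.println("checksum=" + result + " elapsed_ms=" + ms);
    }
}
```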
TruffleRuby 21.1.0 was released: https://github.com/oracle/truffleruby/releases/tag/vm-21.1.0
Could you update to the new version and run the benchmarks?
I think there should be some good improvements there :)
Should be "guarantee" not "garantee" in "currently use CI to generate benchmark results to garantee all the numbers are generated from the same environment at nearly the same time."