juleskouatchou / basic_language_comparison
License: Other
I ran the https://github.com/JulesKouatchou/basic_language_comparison/blob/master/Julia/test_munchausen.jl script on my laptop and got times between 5 s and 7 s for Julia 0.6.4, 1.0 and 1.5. Thus the 100 s on a Xeon node seems off.
Julia is good in that it has nearly identical syntax to MATLAB, which makes it great for converting code. But as far as I've heard, it's a dead language. I worked with it for a while and it was great, but I think the community support for it has also faded, so I guess it would be better not to update it in future runs of the comparisons, unless you know a niche sector that is still using it.
Since some languages may be slower but require fewer lines of code for most tasks, I suggest
adding a chart showing the LOC by language for each task.
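A rough way to gather the data for such a chart (a sketch only; the file layout and names here are my assumptions, not the repo's):

```python
def count_loc(path):
    """Count non-blank lines in a source file -- a rough LOC proxy."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())

# e.g. loc = {lang: count_loc(f"{lang}/test_fibonacci{ext}")
#             for lang, ext in [("Julia", ".jl"), ("Python", ".py")]}
```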
Thanks for this project.
It is really interesting to see that Numba gets faster than Julia. However, while reproducing the results, I found that some benchmarks in this repo do not actually support the conclusions.
TL;DR: After fixing the benchmark, the comparison shows that Julia is >10x faster than Numba for the evaluate_functions case.
Consider the case evaluate_functions: the Numba version uses a parallel for loop. This seems to produce faster programs than the other languages, but it is misleading. A program using prange gives different results from one using range, and the latter is the behaviour of the corresponding Julia program.
In [17]: @njit(parallel=True)
    ...: def evaluate_functions(n):
    ...:     """
    ...:     Evaluate the trigonometric functions for n values evenly
    ...:     spaced over the interval [-1500.00, 1500.00]
    ...:     """
    ...:     vector1 = np.linspace(-1500.00, 1500.0, n)
    ...:     iterations = 10000
    ...:     for i in range(iterations):
    ...:         vector2 = np.sin(vector1)
    ...:         vector1 = np.arcsin(vector2)
    ...:         vector2 = np.cos(vector1)
    ...:         vector1 = np.arccos(vector2)
    ...:         vector2 = np.tan(vector1)
    ...:         vector1 = np.arctan(vector2)
    ...:     return vector1
    ...:

In [18]: evaluate_functions(10)
Out[18]:
array([1.46030424, 1.13579218, 0.81128013, 0.48676808, 0.16225603,
       0.16225603, 0.48676808, 0.81128013, 1.13579218, 1.46030424])
In [19]: @njit(parallel=True)
    ...: def evaluate_functions(n):
    ...:     """
    ...:     Evaluate the trigonometric functions for n values evenly
    ...:     spaced over the interval [-1500.00, 1500.00]
    ...:     """
    ...:     vector1 = np.linspace(-1500.00, 1500.0, n)
    ...:     iterations = 10000
    ...:     for i in prange(iterations):
    ...:         vector2 = np.sin(vector1)
    ...:         vector1 = np.arcsin(vector2)
    ...:         vector2 = np.cos(vector1)
    ...:         vector1 = np.arccos(vector2)
    ...:         vector2 = np.tan(vector1)
    ...:         vector1 = np.arctan(vector2)
    ...:     return vector1
    ...:

In [20]: evaluate_functions(10)
Out[20]:
array([-1500.        , -1166.66666667,  -833.33333333,  -500.        ,
        -166.66666667,   166.66666667,   500.        ,   833.33333333,
        1166.66666667,  1500.        ])
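The root cause is that each iteration reads the vectors written by the previous iteration, so the loop over iterations carries a dependency and is not safe to parallelize. A tiny pure-Python illustration of the same pattern (my own toy update rule, no Numba involved; this only illustrates the dependency, it is not Numba's exact semantics):

```python
def sequential(x, iterations):
    # each step depends on the previous step's result
    for _ in range(iterations):
        x = (x * 2) % 97
    return x

def pseudo_parallel(x, iterations):
    # running the iterations "independently" (what parallelizing the
    # iteration loop effectively does) just repeats the first step
    results = [(x * 2) % 97 for _ in range(iterations)]
    return results[-1]

print(sequential(5, 3))       # 40: three chained doublings
print(pseudo_parallel(5, 3))  # 10: every "iteration" starts from x=5
```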
The Julia behaviour:
function evaluatefunctions(N)
    #x = linspace(-1500.0, 1500.0, N)
    x = collect(range(-1500.0, stop=1500.0, length=N))
    M = 10000
    for i in 1:M
        y = sin.(x)
        x = asin.(y)
        y = cos.(x)
        x = acos.(y)
        y = tan.(x)
        x = atan.(y)
    end
    return x
end
julia> evaluatefunctions(N)
10-element Vector{Float64}:
 1.4603042376686257
 1.135792184853451
 0.8112801320381631
 0.48676807922287524
 0.16225602640761583
 0.16225602640761583
 0.48676807922287524
 0.8112801320381631
 1.135792184853451
 1.4603042376686257
Fixing the incorrect use of prange gives a fair result:
In [16]: %timeit evaluate_functions(10)
55.4 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
julia> N = 10
10
julia> @btime evaluatefunctions(N)
4.224 ms (60001 allocations: 8.24 MiB)
Some benchmarks use quite a bit of printing to stdout during the benchmarks. This is not a good idea, and will in some cases dominate runtime.
In particular, test_count_unique_words.jl prints each line before counting the words. The current implementation of the word count is already very inefficient, but even so, printing each line takes 10-100x as long as the word count itself. File reading will also dominate.
If the purpose of the benchmark is to measure the performance of string manipulation, then reading from file and printing to stdout will make the benchmark meaningless, as it almost only measures the latter two.
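To see how much per-item output distorts a measurement, one can time the same kernel with and without writes. A sketch with made-up data (even writing to an in-memory buffer adds overhead; a real terminal or file is far slower):

```python
import io
import time

words = ("lorem ipsum dolor sit amet " * 1000).split()

def count_unique(ws, out=None):
    seen = set()
    for w in ws:
        if out is not None:
            print(w, file=out)  # per-word output inside the timed region
        seen.add(w)
    return len(seen)

t0 = time.perf_counter()
quiet = count_unique(words)
t1 = time.perf_counter()
noisy = count_unique(words, out=io.StringIO())
t2 = time.perf_counter()
# both runs do identical counting work; any extra time in the second
# run is pure output overhead, not string manipulation
print(quiet, noisy, t1 - t0, t2 - t1)
```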
Rust has a thriving computational community.
When I looked at the benchmarks, I was particularly surprised because I couldn't imagine the recursive fib being that fast compared to C.
The R recursion example is not an apples-to-apples comparison, because it is using two accumulators. In fact the commented out R code at https://github.com/JulesKouatchou/basic_language_comparison/blob/master/R/test_fibonacci.R#L15 is the right one. Or alternatively, all languages should use the one with two accumulators.
For example, here's the Julia version that does what the R version does and runs as fast.
function Fib(N)
    function F(N, a, b)
        N == 0 && return b
        F(N-1, a+b, a)
    end
    F(N, 1, 0)
end
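To see concretely why the two recursions aren't comparable, here is a sketch of both forms in Python (the call counter is my addition): the naive doubly-recursive version makes exponentially many calls, while the two-accumulator version is linear.

```python
calls = {"naive": 0, "acc": 0}

def fib_naive(n):
    # doubly recursive: call count grows exponentially in n
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_acc(n, a=1, b=0):
    # two-accumulator form, like the (commented-out) R code: linear in n
    calls["acc"] += 1
    return b if n == 0 else fib_acc(n - 1, a + b, a)

assert fib_naive(20) == fib_acc(20) == 6765
print(calls)  # naive: 21891 calls, acc: 21 calls
```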
Much of this was done by @syxpek on the Julia slack. I am just reporting it.
You do:

File f = new File(filename);
ArrayList arr = new ArrayList();
HashMap<String, Integer> listOfWords = new HashMap<String, Integer>();
Scanner in = new Scanner(f);
int i = 0;
while (in.hasNext()) {
    String s = in.next();
    arr.add(s.replaceAll("[^a-zA-Z]", "").toLowerCase());
}
Iterator itr = arr.iterator();
while (itr.hasNext()) {
    i++;
    listOfWords.put((String) itr.next(), i);
}
Set<Object> uniqueValues = new HashSet<Object>(listOfWords.values());
This is inefficient in multiple aspects:
String.replaceAll("[^a-zA-Z]", ...) parses and compiles the regex pattern [^a-zA-Z] over and over again. You should call Pattern.compile("[^a-zA-Z]") once before the loop and then reuse it within the loop. If you lower-case first, the pattern can even be reduced to [^a-z], as you will not find uppercase letters any more. So make it this (only the core part here):
HashSet<String> uniqueValues = new HashSet<>();
Pattern pattern = Pattern.compile("[^a-zA-Z]");
while (in.hasNext()) {
    String s = in.next();
    s = pattern.matcher(s.toLowerCase()).replaceAll("");
    uniqueValues.add(s);
}
This is ~3 times faster (after JVM warm-up) than your implementation, with much less and simpler code.
Note that you already have such an optimized implementation for Python: https://github.com/JulesKouatchou/basic_language_comparison/blob/master/Python/test_count_unique_words.py#L29-L31
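The same precompilation principle can be made explicit in Python too. A small sketch (note that CPython's re module does cache compiled patterns internally, so the win is smaller than in Java, but hoisting the compile keeps the loop body cheap and the intent clear):

```python
import re

# compile once outside the loop; after .lower(), [^a-z] is sufficient
NON_ALPHA = re.compile(r"[^a-z]")

def normalize(words):
    return [NON_ALPHA.sub("", w.lower()) for w in words]

print(normalize(["Hello,", "WORLD!!", "a1b2"]))  # ['hello', 'world', 'ab']
```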
It would be great to use the current stable version of Julia (1.3) or at least the long term stable version (1.0) for the results that are published here:
The OCaml language community recently started working hard on providing infrastructure for scientific and domain-specific computing. The OWL framework is one of the most developed ones for OCaml; see their GitHub repositories. And companies like Imandra help bring more correctness (using the sound OCaml type system) into production-level computations.
I compare here the C-language implementation and Julia.
I noticed that the number of sweeps executed by these two implementations was different.
I tracked it down to the return line in timeStep4_Jacobi:
this C function fails to return the square root of the error accumulator, which the Julia functions do. Hence, when the error is compared with the tolerance, the C-language implementation exits the while loop much sooner. The number of sweeps is consequently not equal between the implementations, which invalidates the comparison.
I think it would be best for the benchmark to fix the number of sweeps and not test the error in the loop condition.
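A fixed-sweep harness could look like this rough NumPy sketch (the grid size, boundary setup, stencil choice and function name here are my own, not the repo's):

```python
import numpy as np

def jacobi_fixed_sweeps(u, sweeps):
    """Run exactly `sweeps` 4-point Jacobi updates; return grid and last residual."""
    err = 0.0
    for _ in range(sweeps):
        new = u.copy()
        new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                  u[1:-1, :-2] + u[1:-1, 2:])
        err = float(np.sqrt(np.sum((new - u) ** 2)))
        u = new
    return u, err

u0 = np.zeros((16, 16))
u0[0, :] = 1.0  # fixed hot boundary on one edge
u1, residual = jacobi_fixed_sweeps(u0, 100)
# every implementation now performs exactly 100 sweeps, so runtimes are
# directly comparable regardless of how each language computes the error
```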
Please improve two Java code lines to avoid performance penalties:

result.append(times + "" + repeat); // YOUR code creates (finally millions of) intermediate StringBuilders
result.append(times).append(repeat); // Better: THIS code avoids them - you already have a StringBuilder

StringBuilder result = new StringBuilder(); // YOUR code creates a Builder with a small (default-sized) buffer which must be grown over and over while appending characters
StringBuilder result = new StringBuilder(number.length() * 2); // Better: THIS code directly creates a buffer of the expected size
Please note that you already implemented the C version in this optimized way: https://github.com/JulesKouatchou/basic_language_comparison/blob/master/C/test_look_and_say.c#L18
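For comparison, the equivalent "build once, avoid repeated reallocation" idiom in Python is to collect pieces in a list and join once at the end. A sketch of one look-and-say step (my own implementation, not the repo's):

```python
def look_and_say(seq):
    """One look-and-say step, building the result in a single pass."""
    pieces = []  # collect parts and join once: avoids quadratic
    i = 0        # string concatenation, like presizing a StringBuilder
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        pieces.append(str(j - i))  # run length
        pieces.append(seq[i])      # run character
        i = j
    return "".join(pieces)

print(look_and_say("1"))     # '11'
print(look_and_say("1211"))  # '111221'
```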
The results obtained with the fast-math option are not reliable, at least in general.
Hence I believe this option should be disabled for the purpose of this benchmark.
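A two-line illustration of why the reassociation that fast-math licenses is unsafe: floating-point addition is not associative, so reordering a sum can change the result.

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- b + c rounds back to -1e16, so the 1.0 is lost
```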
This code is row major:
function optimized_time_step(u::Matrix{T}) where {T}
    n, m = size(u)
    error = T(0.0) # not an optimization, but makes it work for all element types
    @inbounds for j = 2:m-1
        for i = 2:n-1 # @simd ivdep, but why is it writing into u[j,i]? That's not the time step.
            temp = u[j, i]
            u[j, i] = (T(4) * (u[j-1, i] + u[j+1, i] +
                               u[j, i-1] + u[j, i+1]) +
                       u[j-1, i-1] + u[j+1, i+1] +
                       u[j+1, i-1] + u[j-1, i+1]) / T(20)
            difference = u[j, i] - temp
            error += difference * difference
        end
    end
    return sqrt(error)
end
is probably better code, though this is a case where most people would do:
using LoopVectorization
using LoopVectorization
function optimized_time_step(u::Matrix{T}) where {T}
    n, m = size(u)
    error = T(0.0) # not an optimization, but makes it work for all element types
    @turbo for j = 2:m-1 # @tturbo for threading
        for i = 2:n-1
            temp = u[j, i]
            u[j, i] = (T(4) * (u[j-1, i] + u[j+1, i] +
                               u[j, i-1] + u[j, i+1]) +
                       u[j-1, i-1] + u[j+1, i+1] +
                       u[j+1, i-1] + u[j-1, i+1]) / T(20)
            difference = u[j, i] - temp
            error += difference * difference
        end
    end
    return sqrt(error)
end
The copyright statement doesn't specify the licensing.
Of course, depending on what you want to show, such a "warm-up" is or is not appropriate.
Still, I propose putting a loop for (int i = 0; i < 100; i++) doTheTest();
around the core test methods.
This gives the Java VM time to identify the method doTheTest()
as a hotspot and then JIT-compile it for the next invocation.
In the end you have 100 measurement results. I propose publishing the first one as e.g. "Java cold" and the average of the last 10 as "Java warm / JIT compiled".
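The cold/warm reporting scheme could be sketched like this (a Python stand-in for the harness; the kernel and names are mine, and CPython of course has no JIT, so this only shows the bookkeeping):

```python
import time

def do_the_test():
    # stand-in for the benchmark kernel being measured
    return sum(i * i for i in range(50_000))

samples = []
for _ in range(100):
    t0 = time.perf_counter()
    do_the_test()
    samples.append(time.perf_counter() - t0)

cold = samples[0]               # first run: the "Java cold" analogue
warm = sum(samples[-10:]) / 10  # mean of last 10: "Java warm / JIT compiled"
print(f"cold={cold:.6f}s warm={warm:.6f}s")
```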