
basic_language_comparison's Introduction

Basic Comparison of Various Computing Languages

Python, Julia, Matlab, IDL, R, Java, Scala, C, Fortran



We use simple test cases to compare various high-level programming languages (Python, Julia, Matlab, IDL, R, Java, Scala, C, Fortran). We implement the test cases from the perspective of a novice programmer who is not familiar with the optimization techniques available in these languages. The goal is to highlight the strengths and weaknesses of each language, not to claim that one language is better than the others.


Background Information

Check the following documents to obtain background information on this project:

List of Test Cases

The test cases are listed in categories that include:

  • Loops and Vectorization
  • String Manipulations
  • Numerical Calculations
  • Input/Output

Each test is "simple" enough to be quickly written in any of the languages and is meant to address issues such as:

  • Access of non-contiguous memory locations
  • Use of recursive functions
  • Utilization of loops or vectorization
  • Opening of a large number of files
  • Manipulation of strings of arbitrary lengths
  • Multiplication of matrices
  • Use of iterative solvers
  • etc.

The source files are contained in the directories:

  C\    Fortran\  IDL\  Java\  Julia\  Matlab\  Python\  R\  Scala\

There is also a directory Data that contains a Python script that generates the NetCDF4 files needed for the test case on reading a large collection of files. It also has sample text files for the Count Unique Words in a File test case.

Loops and Vectorization

  • Copy Multidimensional Arrays

     Given an arbitrary n x n x 3 matrix A, we perform the operations:
      
          A(i,j,1) = A(i,j,2)
          A(i,j,3) = A(i,j,1)
          A(i,j,2) = A(i,j,3)
     
     using loops and vectorization.
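
     A minimal NumPy sketch of the two variants (the size n and the use of
     NumPy are assumptions; the repo's own Python code may differ):

         import numpy as np

         n = 500
         A = np.random.rand(n, n, 3)

         # Loop version (0-based planes 0, 1, 2 stand for 1, 2, 3 above)
         for i in range(n):
             for j in range(n):
                 A[i, j, 0] = A[i, j, 1]
                 A[i, j, 2] = A[i, j, 0]
                 A[i, j, 1] = A[i, j, 2]

         # Vectorized version of the same three assignments
         A[:, :, 0] = A[:, :, 1]
         A[:, :, 2] = A[:, :, 0]
         A[:, :, 1] = A[:, :, 2]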
    

String Manipulations

  • Look and Say Sequence

     The look-and-say sequence starts from a single integer. Each subsequent
     entry is obtained by reading the previous entry aloud: the number of
     appearances of each digit is concatenated in front of that digit. For
     example, an entry of 1223 would be followed by 112213, or "one 1,
     two 2's, one 3."
     Here, we start with the number 1223334444 and determine the look-and-say
     sequence of order n (as n varies).
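
     A short Python sketch of one way to generate the sequence
     (itertools.groupby does the run-length reading; this is illustrative,
     not the repo's implementation):

         from itertools import groupby

         def look_and_say(seed, order):
             """Apply `order` look-and-say steps to the digit string `seed`."""
             s = seed
             for _ in range(order):
                 # For each run of identical digits, emit "<run length><digit>".
                 s = "".join(str(len(list(g))) + d for d, g in groupby(s))
             return s

         print(look_and_say("1223", 1))         # 112213
         print(look_and_say("1223334444", 5))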
    
  • Count Unique Words in a File

     We open an arbitrary file and count the number of unique words in it
     with the assumption that words such as:
     
             ab   Ab   aB    a&*(-b:    17;A#~!b
             
     are the same.
     
     For our tests, we use the four files: 
     
            world192.txt, plrabn12.txt, bible.txt, and book1
            
     taken from the website (The Canterbury Corpus):
     
           http://corpus.canterbury.ac.nz/descriptions/
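
     A minimal Python sketch of the counting rule described above (lowercase
     each word, strip non-letters, collect into a set); the repo's own version
     may differ:

         import re

         def count_unique_words(filename):
             pattern = re.compile(r"[^a-z]")   # compiled once, reused per word
             unique = set()
             with open(filename, errors="ignore") as f:
                 for line in f:
                     for word in line.split():
                         unique.add(pattern.sub("", word.lower()))
             unique.discard("")                # tokens with no letters at all
             return len(unique)

         print(count_unique_words("world192.txt"))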
    

Numerical Computations

  • Fibonacci Sequence

     The Fibonacci Sequence is a sequence of numbers where each successive number
     is the sum of the two that precede it:
    
             F_n = F_{n-1} + F_{n-2} .
    
     Its first entries are
    
             F_0 = 0 ,  F_1 = F_2 = 1 .
    
     We measure the elapsed time when calculating the nth Fibonacci number.
     The calculation times are taken for both the iterative and the recursive
     methods.
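
     A Python sketch of the two methods (the recursive one is deliberately
     naive, since its exponential cost is what the test exercises):

         def fib_iterative(n):
             a, b = 0, 1                  # F_0, F_1
             for _ in range(n):
                 a, b = b, a + b
             return a

         def fib_recursive(n):
             return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

         assert fib_iterative(30) == fib_recursive(30) == 832040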
    
  • Matrix Multiplication

     Two randomly generated n x n matrices A and B are multiplied.
     The time to perform the multiplication is measured. This test highlights the
     importance of using the language's built-in libraries.
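
     A minimal NumPy sketch; the @ operator dispatches to an optimized BLAS
     library, which is exactly the kind of built-in the test highlights:

         import numpy as np
         import time

         n = 1000
         A = np.random.rand(n, n)
         B = np.random.rand(n, n)

         t0 = time.perf_counter()
         C = A @ B                        # BLAS-backed product
         print(time.perf_counter() - t0)  # a naive triple loop would be far slower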
    
  • Belief Propagation Algorithm

     Belief propagation is an algorithm used for inference, often in the fields of
     artificial intelligence, speech recognition, computer vision, image processing,
     medical diagnostics, parity check codes, and others. The algorithm requires
     repeated matrix multiplication and then a normalization of the matrix.
     This test measures the elapsed time when performing n iterations of the
     algorithm.
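
     The repo's exact update is not reproduced here; the following Python
     sketch only illustrates the multiply-then-normalize structure the
     description refers to (the matrix sizes and the normalization used are
     assumptions):

         import numpy as np

         def belief_propagation(A, x, n):
             for _ in range(n):
                 x = A @ x            # repeated matrix multiplication ...
                 x /= x.sum()         # ... followed by a normalization
             return x

         A = np.random.rand(50, 50)
         x = np.random.rand(50)
         print(belief_propagation(A, x, 1000))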
    
  • Metropolis-Hastings Algorithm

     The Metropolis-Hastings algorithm is an algorithm used to draw random
     samples from a probability distribution. This implementation uses a
     two-dimensional distribution (Domke 2012), and measures the elapsed time to
     iterate n times.
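
     A self-contained Python sketch of the algorithm's structure; the target
     density below is a stand-in, not the two-dimensional distribution of
     Domke (2012):

         import numpy as np

         def metropolis_hastings(n, scale=1.0):
             def log_p(x, y):                      # assumed 2-D target density
                 return -0.5 * (x * x + (y - x * x) ** 2)

             rng = np.random.default_rng(0)
             samples = np.empty((n, 2))
             x = y = 0.0
             for i in range(n):
                 xp = x + scale * rng.normal()     # Gaussian random-walk proposal
                 yp = y + scale * rng.normal()
                 if np.log(rng.random()) < log_p(xp, yp) - log_p(x, y):
                     x, y = xp, yp                 # accept; otherwise keep old state
                 samples[i] = x, y
             return samples

         chain = metropolis_hastings(10000)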
    
  • Compute the FFTs

     We create an n x n matrix M of random complex values.
     We then compute the Fast Fourier Transform (FFT) of M and the absolute
     value of the result.
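
     A minimal NumPy sketch, assuming a two-dimensional transform is intended:

         import numpy as np

         n = 1024
         rng = np.random.default_rng()
         M = rng.random((n, n)) + 1j * rng.random((n, n))  # random complex values

         result = np.abs(np.fft.fft2(M))   # FFT of M, then the absolute value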
    
  • Iterative Solver

       We use the Jacobi iterative solver to numerically approximate the
       solution of the two-dimensional Laplace equation, discretized with a
       fourth-order compact scheme. We record the elapsed time as the number
       of grid points varies.
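
       A NumPy sketch of one Jacobi sweep using the fourth-order compact
       nine-point stencil (the stencil weights follow the Julia code quoted
       in the issues below; the boundary values and tolerance are assumptions):

          import numpy as np

          def jacobi_step(u):
              """One Jacobi sweep; returns the 2-norm of the update."""
              new = u.copy()
              new[1:-1, 1:-1] = (4.0 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                        u[1:-1, :-2] + u[1:-1, 2:]) +
                                 u[:-2, :-2] + u[2:, 2:] +
                                 u[2:, :-2] + u[:-2, 2:]) / 20.0
              error = np.sqrt(((new - u) ** 2).sum())
              u[:] = new
              return error

          u = np.zeros((100, 100))
          u[0, :] = 1.0                    # an assumed boundary condition
          while jacobi_step(u) > 1.0e-6:
              pass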
    
  • Square Root of a Matrix

     Given an n x n matrix A, we are looking for the matrix B such that:
     
             B * B = A
             
     In our calculations, we consider A with 6s on the diagonal and 1s elsewhere.
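
     A short Python sketch using SciPy's built-in matrix square root (one of
     several possible approaches; the repo may use an iterative method instead):

         import numpy as np
         from scipy.linalg import sqrtm

         n = 100
         A = np.full((n, n), 1.0)
         np.fill_diagonal(A, 6.0)       # 6s on the diagonal, 1s elsewhere

         B = sqrtm(A)
         print(np.allclose(B @ B, A))   # True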
    
  • Gauss-Legendre quadrature

     Gauss-Legendre quadrature is a numerical method for approximating definite
     integrals. It uses a weighted sum of n values of the integrand function.
     The result is exact if the integrand function is a polynomial of degree 0
     to 2n - 1. Here we consider an exponential function over the interval [-3,3]
     and record the time to perform the integral when n varies.
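
     A NumPy sketch (numpy.polynomial.legendre.leggauss supplies the nodes and
     weights on [-1, 1]; they are then mapped to [-3, 3]):

         import numpy as np

         n = 20
         nodes, weights = np.polynomial.legendre.leggauss(n)

         a, b = -3.0, 3.0                       # map nodes from [-1, 1] to [a, b]
         x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
         integral = 0.5 * (b - a) * np.sum(weights * np.exp(x))

         print(integral, np.exp(3.0) - np.exp(-3.0))   # quadrature vs exact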
    
  • Function evaluations

     We iteratively calculate trigonometric functions on an n-element list of
     values, and then compute inverse trigonometric functions on the same list.
     The time to complete the full operations is measured as n varies.
    
  • Munchausen Numbers

     A Munchausen number is a natural number that is equal to the sum of its
     digits, each raised to its own power. In base 10, there are four such numbers:
     0, 1, 3435 and 438579088. We determine how much time it takes to find them.
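
     A brute-force Python sketch (it adopts the 0^0 = 0 convention, which is
     what lets 0 qualify); the scan over the full range is exactly the work
     the benchmark times:

         def is_munchausen(n):
             return n == sum(0 if d == 0 else d ** d
                             for d in map(int, str(n)))

         found = [n for n in range(440_000_000) if is_munchausen(n)]
         print(found)   # [0, 1, 3435, 438579088]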
    
  • Pernicious Numbers

     A pernicious number is a positive integer that has a prime number of ones
     in its binary representation. The first pernicious number is 3, since
     3 = 11 in binary and 1 + 1 = 2, which is prime.
     The first 10 pernicious numbers are:
         
         3, 5, 6, 7, 9, 10, 11, 12, 13, 14
         
     We determine the 100000th pernicious number, i.e., the integer 323410.
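
     A straightforward Python sketch (trial-division primality is enough here,
     since the popcount of a 32-bit integer is small):

         def is_prime(n):
             if n < 2:
                 return False
             return all(n % d for d in range(2, int(n ** 0.5) + 1))

         def nth_pernicious(k):
             n, count = 0, 0
             while count < k:
                 n += 1
                 if is_prime(bin(n).count("1")):
                     count += 1
             return n

         print(nth_pernicious(10))      # 14
         print(nth_pernicious(100000))  # 323410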
    

Input/Output

  • Reading a Large Collection of Files

    We have a set of 7305 daily NetCDF files covering a period of 20 years.
    The files for a given year are in a sub-directory labeled with the year
    (for instance Y1990, Y1991, Y1992, etc.). We want to write a script that
    opens each file, reads a three-dimensional variable (longitude/latitude/level),
    and manipulates it. Pseudocode for the script reads:
    
        Loop over the years
             Obtain the list of NetCDF files
             Loop over the files
                  Read the variable (longitude/latitude/level)
                  Compute the zonal mean average (new array of latitude/level)
                   Extract the column array at latitude 86 degrees South
                  Append the column array to a "master" array (or matrix)
    
    The goal is to generate the three-dimensional array (year/level/value)
    and produce a contour plot.
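
    A Python sketch of the pseudocode above using the netCDF4 package; the
    variable name, file suffix, dimension order, and the index of the 86S
    latitude are all placeholders:

        import glob
        import numpy as np
        from netCDF4 import Dataset

        columns = []
        for year in range(1990, 2010):                   # 20 years of daily files
            for path in sorted(glob.glob("Y%d/*.nc4" % year)):
                with Dataset(path) as nc:
                    var = nc.variables["var"][:]         # (level, latitude, longitude), assumed
                    zonal_mean = var.mean(axis=-1)       # average over longitude
                    columns.append(zonal_mean[:, 2])     # column at 86S (placeholder index)

        master = np.stack(columns)                       # files x levels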
    

Results

September 2019

Language          Version
Python            3.7
Julia             0.6.2
Java              10.0.2
Scala             2.13.0
IDL               8.5
R                 3.6.1
Matlab            R2017b
GNU Compilers     9.1

Histogram (Sep 2019)

Scatter Plot (Sep 2019)

May 2020

Language          Version
Python            3.7
Numba             0.45.1
Julia             1.2
Java              13.0.1
Scala             2.13.2
IDL               8.5
R                 3.6.3
Matlab            R2020a
GNU Compilers     9.3.0

Histogram (May 2020)

Scatter Plot (May 2020)

August 2021

Language          Version
Python            3.9
Numba             0.53.1
Julia             1.6.2
Java              15
Scala
IDL               8.5
R                 4.1.0
Matlab            R2020a
GNU Compilers     11.1.0
  • The pernicious number test case was added.
  • All the calculations were done on an Intel Xeon Skylake node (40 cores, 2.4 GHz per core, 4 GB of memory per core).

Histogram (Aug 2021)

Scatter Plot (Aug 2021)

basic_language_comparison's People

Contributors

brenhinkeller, juleskouatchou


basic_language_comparison's Issues

Add chart showing lines of code by language

Since some languages may be slower but require fewer lines of code for most tasks, I suggest
adding a chart showing the LOC by language for each task.

Thanks for this project.

Julia is kind of dead? why still include it

Julia is kind of good in that it has nearly identical syntax to Matlab, which makes it great for converting code. But as far as I've heard, it's a dead language. I worked with it for a while and it was great, but I think the community support for it has also faded, so I guess it would be better not to update it in future runs of the comparisons, unless you know of a niche sector that is still using it.

avoid unrelated comparisons

It is really interesting to see that Numba gets faster than Julia. However, during my repro, I found some benchmarks in this repo quite unrelated to the conclusions.

TL;DR: After fixing the benchmark, the examination shows that Julia is >10x faster than Numba for the case evaluate_functions.

Consider the case evaluate_functions: Numba uses a parallel for loop. It seems to produce faster programs than the other languages, but this is quite misleading.

Programs using prange give different results from those using range, while the latter is the behaviour of the corresponding Julia program.

In [17]: @njit(parallel=True)
    ...: def evaluate_functions(n):
    ...:     """
    ...:         Evaluate the trigonometric functions for n values evenly
    ...:         spaced over the interval [-1500.00, 1500.00]
    ...:     """
    ...:     vector1 = np.linspace(-1500.00, 1500.0, n)
    ...:     iterations = 10000
    ...:     for i in range(iterations):
    ...:         vector2 = np.sin(vector1)
    ...:         vector1 = np.arcsin(vector2)
    ...:         vector2 = np.cos(vector1)
    ...:         vector1 = np.arccos(vector2)
    ...:         vector2 = np.tan(vector1)
    ...:         vector1 = np.arctan(vector2)
    ...:     return vector1
    ...:

In [18]: evaluate_functions(10)
Out[18]:
array([1.46030424, 1.13579218, 0.81128013, 0.48676808, 0.16225603,
       0.16225603, 0.48676808, 0.81128013, 1.13579218, 1.46030424])

In [19]: @njit(parallel=True)
    ...: def evaluate_functions(n):
    ...:     """
    ...:         Evaluate the trigonometric functions for n values evenly
    ...:         spaced over the interval [-1500.00, 1500.00]
    ...:     """
    ...:     vector1 = np.linspace(-1500.00, 1500.0, n)
    ...:     iterations = 10000
    ...:     for i in prange(iterations):
    ...:         vector2 = np.sin(vector1)
    ...:         vector1 = np.arcsin(vector2)
    ...:         vector2 = np.cos(vector1)
    ...:         vector1 = np.arccos(vector2)
    ...:         vector2 = np.tan(vector1)
    ...:         vector1 = np.arctan(vector2)
    ...:     return vector1
    ...:

In [20]: evaluate_functions(10)
Out[20]:
array([-1500.        , -1166.66666667,  -833.33333333,  -500.        ,
        -166.66666667,   166.66666667,   500.        ,   833.33333333,
        1166.66666667,  1500.        ])

The Julia behaviour:

function evaluatefunctions(N)
    #x = linspace(-1500.0, 1500.0, N)
    x = collect(range(-1500.0, stop=1500.0, length=N))
    M = 10000
    for i in 1:M
        y = sin.(x)
        x = asin.(y)
        y = cos.(x)
        x = acos.(y)
        y = tan.(x)
        x = atan.(y)
    end
    return x
end

julia> evaluatefunctions(N)
10-element Vector{Float64}:
 1.4603042376686257
 1.135792184853451
 0.8112801320381631
 0.48676807922287524
 0.16225602640761583
 0.16225602640761583
 0.48676807922287524
 0.8112801320381631
 1.135792184853451
 1.4603042376686257

Fixing the incorrect use of prange gives a fair result:

In [16]: %timeit evaluate_functions(10)
55.4 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
julia> N = 10
10

julia> @btime evaluatefunctions(N)
  4.224 ms (60001 allocations: 8.24 MiB)

R recursion implementation is not an apples-to-apples comparison

When I looked at the benchmarks, I was particularly surprised because I couldn't imagine the recursive fib being that fast compared to C.

The R recursion example is not an apples-to-apples comparison, because it is using two accumulators. In fact the commented out R code at https://github.com/JulesKouatchou/basic_language_comparison/blob/master/R/test_fibonacci.R#L15 is the right one. Or alternatively, all languages should use the one with two accumulators.

For example, here's the Julia version that does what the R version does and runs as fast.

function Fib(N)
    function F(N, a, b)
        N == 0 && return b
        F(N-1, a+b, a)
    end
    F(N, 1, 0)
end

Much of this was done by @syxpek on the Julia slack. I am just reporting it.

Add OCaml language for comparison

The OCaml language community recently started to work hard on providing the infrastructure for scientific and domain-specific computing. The OWL framework is one of the most developed ones for OCaml; see their GitHub repositories. And companies like Imandra help to bring more correctness (using the sound OCaml type system) into production-level computations.

cc @mseri @ryanrhymes

Avoid intermediate printing in benchmarks

Some benchmarks use quite a bit of printing to stdout during the benchmarks. This is not a good idea, and will in some cases dominate runtime.

In particular, the test_count_unique_words.jl prints each line before counting the words. Now, the current implementation of word count is very inefficient, but still, the printing of each line takes 10-100x as long as the word count itself. File reading will also dominate.

If the purpose of the benchmark is to measure the performance of string manipulation, then reading from file and printing to stdout will make the benchmark meaningless, as it almost only measures the latter two.

Jacobi iteration benchmark implementation not correct

I compare here the C-language implementation and Julia.

I noticed that the number of sweeps executed by these two implementations was different.
I tracked it down to the return line in timeStep4_Jacobi: this C function fails to return the square root of the error accumulator, which the Julia functions do. Hence, when the error is compared with the tolerance, the C-language implementation exits the while loop much sooner. The number of sweeps is consequently not equal between the implementations, which invalidates the comparison.

I think it would be best for the benchmark to fix the number of sweeps and not test the error in the loop condition.

Java-Implementation for `testCountUniqueWords` is highly inefficient

You do

File f = new File(filename);
ArrayList arr = new ArrayList();
HashMap<String, Integer> listOfWords = new HashMap<String, Integer>(); 
Scanner in = new Scanner(f);
int i = 0;
while(in.hasNext()) {
  String s = in.next();
  arr.add(s.replaceAll("[^a-zA-Z]", "").toLowerCase());
}
Iterator itr=arr.iterator();
while(itr.hasNext()) {
  i++;
  listOfWords.put((String) itr.next(), i);
}
Set<Object> uniqueValues = new HashSet<Object>(listOfWords.values()); 

This is inefficient in multiple aspects:

  1. The String.replaceAll("[^a-zA-Z]", ...) parses and compiles the RegEx pattern [^a-zA-Z] over and over again. You should call Pattern.compile("[^a-zA-Z]") once before the loop and then reuse the compiled pattern within the loop.
  2. You first apply the RegEx (to remove non-alpha chars) and then apply toLowerCase. It's better to apply the lowercase first and then use a simpler RegEx, namely [^a-z], as you will not find UPPERCASE letters any more.
  3. You first collect all words in an ArrayList, then loop over this ArrayList to fill a HashMap (where the unique words are the keys), and finally you create a third collection (the HashSet). None of this is necessary - you can directly fill a single HashSet in the first loop.

So make this (only the core part here):

HashSet<String> uniqueValues = new HashSet<>();
Pattern pattern = Pattern.compile("[^a-zA-Z]");
while (in.hasNext()) {
  String s = in.next();
  s = pattern.matcher(s.toLowerCase()).replaceAll("");
  uniqueValues.add(s);
}

This is ~3 times faster (after JVM warm-up) than your implementation - with much less and simpler code.

Note that you also have such an optimal implementation for Python: https://github.com/JulesKouatchou/basic_language_comparison/blob/master/Python/test_count_unique_words.py#L29-L31

Should respect a "warm-up" for Java-VM to let JIT-Compiler do its work

Of course, depending on what you want to show, such a "warm-up" is or is not appropriate.

Still, I propose to wrap the core test methods in a loop: for (int i = 0; i < 100; i++) doTheTest();
This gives the JVM time to identify the method doTheTest() as a hotspot and to JIT-compile it for the next invocations.

Finally you have 100 measurement results. I propose to publish the first one as e.g. "Java cold" and the average of the last 10 as "Java warm / JIT compiled".

Fast-math option shouldn't be used

The results obtained with the fast-math option are not reliable, at least in general.
Hence I believe this option should be disabled for the purpose of this benchmark.

Julia is column major

This code is row major:

https://github.com/JulesKouatchou/basic_language_comparison/blob/master/Julia/test_laplace_jacobi_4.jl#L40-L55

function optimized_time_step(u::Matrix{T})  where {T}
    n, m = size(u)
    error = T(0.0) # not an optimization, but makes it work for all element types
    @inbounds for j = 2:m-1
        for i = 2:n-1 # @simd ivdep, but why is it writing into u[j,i]? That's not the time step.
            temp = u[j, i];
            u[j, i] = ( T(4) *(u[j-1, i] + u[j+1, i] +
                        u[j, i-1] + u[j, i+1]) +
                        u[j-1, i-1] + u[j+1, i+1] +
                        u[j+1, i-1] + u[j-1, i+1]) / T(20)
            difference = u[j, i] - temp
            error += difference*difference;
        end
    end
    return sqrt(error)
end

is probably better code, though this is a case where most people would do:

using LoopVectorization
function optimized_time_step(u::Matrix{T})  where {T}
    n, m = size(u)
    error = T(0.0) # not an optimization, but makes it work for all element types
    @turbo for j = 2:m-1 # @tturbo for threading
        for i = 2:n-1
            temp = u[j, i];
            u[j, i] = ( T(4) *(u[j-1, i] + u[j+1, i] +
                        u[j, i-1] + u[j, i+1]) +
                        u[j-1, i-1] + u[j+1, i+1] +
                        u[j+1, i-1] + u[j-1, i+1]) / T(20)
            difference = u[j, i] - temp
            error += difference*difference;
        end
    end
    return sqrt(error)
end

Java-Implementation for `testLookAndSay` is inefficient

Please improve two lines of Java code to avoid performance penalties:

result.append(times + "" + repeat);  // YOUR code creates (eventually millions of) intermediate StringBuilders
result.append(times).append(repeat);  // Better: THIS code avoids them - you already have a StringBuilder
StringBuilder result = new StringBuilder(); // YOUR code creates a Builder with a small (default-sized) buffer which must be increased over and over again while appending characters
StringBuilder result = new StringBuilder(number.length() * 2); // Better: THIS code directly creates a buffer with the expected size

Please note that you also implemented it in this optimized way for the C version: https://github.com/JulesKouatchou/basic_language_comparison/blob/master/C/test_look_and_say.c#L18
