
dante.jl's Issues

Overspecifying arguments

This toy code is a classic counterexample of overusing Julia's explicit type restrictions. Since Julia does not require explicit type declarations to generate optimized code, they are overkill here, and they strongly limit how existing functions can be extended. The only good reasons for explicit type annotations are

  1. multiple dispatch,
  2. readability

I will clean up the code based on what I have learned. A good pattern to follow is to annotate only custom types, not built-in types like Array{Float64,3}.
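As a minimal sketch of the difference (the function name and body are hypothetical, not taken from Dante's source):

```julia
# Over-specified: only accepts Array{Float64,3}; rejects views, OffsetArrays, Float32 data, ...
total_density(state::Array{Float64,3}) = sum(state)

# Generic: works for any 3-D real array; the compiler still specializes on the
# concrete argument type at each call site, so the machine code is just as fast
total_density(state::AbstractArray{<:Real,3}) = sum(state)
```

Both methods compile to identical code for an Array{Float64,3} argument; the generic one simply accepts more.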

StridedArrays

StridedArrays.jl is a relatively new package that can replace MArray from StaticArrays.jl. However, due to its dependency on LoopVectorization.jl, the loading time is quite long.
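For context, a minimal sketch of the MArray usage that would be replaced (size and element type are illustrative):

```julia
using StaticArrays

# MVector{3,Float64} is an alias for a 1-D MArray: fixed-size and mutable
v = MVector{3,Float64}(0.0, 0.0, 0.0)
v[1] = 1.5   # can be mutated in place, unlike an SVector
```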

Parallelization

Take a look at ImplicitGlobalGrid.jl. It should be pretty straightforward to parallelize Dante, since it also uses a structured Cartesian mesh.

Note that ImplicitGlobalGrid also supports multi-GPU.
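A rough sketch of what a distributed time loop could look like with ImplicitGlobalGrid.jl (the array, grid size, and kernel are placeholders, not Dante's actual code):

```julia
using ImplicitGlobalGrid

nx, ny, nz = 64, 64, 64                  # local grid size per MPI process
me, dims = init_global_grid(nx, ny, nz)  # set up the Cartesian process topology
A = zeros(nx, ny, nz)

for it in 1:100
    # ... update the interior cells of A with the local stencil kernel ...
    update_halo!(A)                      # exchange ghost layers with neighbor processes
end

finalize_global_grid()
```

The same loop structure carries over to multi-GPU runs by making A a device array.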

High dimensional arrays

Since this code is highly influenced by Batsrus, which is written in Fortran 90, it inherits the usage of high-dimensional arrays. Back in the MATLAB version, I tested the performance of array-of-structures vs. structure-of-arrays, which led to the conclusion that SoA is faster (contrary to Batsrus, according to Gabor's recollection).

I am not satisfied with the 4-D arrays I am using, with the first three being spatial dimensions and the last being the variable index: this gives the wrong impression that all four dimensions are equally important. From the compiler's point of view, a four-level loop nest carries no hint that the last dimension plays a different role from the first three.

Another issue with the kernel loops is that the loop ranges and the arrays are specified separately, so the compiler cannot verify that the code has no erroneous operations such as out-of-bounds access. This becomes the programmer's responsibility, which ends up requiring many additional @inbounds macros that speed up the code 4x. In other words, this design is not compiler-friendly!
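A minimal illustration of the difference (function and variable names are mine, not Dante's): when the loop range is passed in separately, the compiler cannot prove the accesses are safe, whereas a range derived from the array itself lets bounds checks be elided automatically:

```julia
# Range supplied separately from the array: safety is the programmer's burden
function sum_interior(A, iRange, jRange, kRange)
    s = zero(eltype(A))
    for k in kRange, j in jRange, i in iRange
        @inbounds s += A[i, j, k]   # needed for speed; correctness is unchecked
    end
    return s
end

# Range derived from the array: the compiler can prove in-bounds access
function sum_all(A)
    s = zero(eltype(A))
    for I in eachindex(A)
        s += A[I]
    end
    return s
end
```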

Take a look at VectorOfArray in RecursiveArrayTools.jl. Maybe I can come up with a combination of VectorOfArray and OffsetArray to handle the separation of dimensions and ghost cells at the same time.
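A rough sketch of that combination (purely exploratory; the ghost-cell count and variable count are assumptions): each variable gets its own OffsetArray whose axes place ghost cells at indices ≤ 0 and > nI, and VectorOfArray bundles them so the variable index is structurally distinct from the spatial ones:

```julia
using OffsetArrays, RecursiveArrayTools

nI, nG, nVar = 16, 2, 5   # interior cells per direction, ghost layers, variables

# One 3-D OffsetArray per variable; interior is 1:nI, ghosts lie outside
vars = [OffsetArray(zeros(nI + 2nG, nI + 2nG, nI + 2nG),
                    1-nG:nI+nG, 1-nG:nI+nG, 1-nG:nI+nG) for _ in 1:nVar]
state = VectorOfArray(vars)

state.u[1][0, 1, 1] = 1.0   # a ghost cell of the first variable, no index shifting
```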

List of issues and improvements

Learn from other packages and improve! Things may change as I become more efficient at code optimization. (I moved the notes from README.md here.)

  • x^2 is much slower than x*x when evaluated naively in global scope. However, within a function the Julia compiler knows how to optimize the square operation and generates exactly the same machine code as x*x. For better readability, I need to create an inline function in Utility.jl (see the sketch after this list).

  • One issue I encountered is using @view for face values. This greatly slows down the flux calculations because of intermediate heap-allocated memory. As suggested by Roger on Julia's Chinese forum, one workaround is the UnsafeArrays.jl package, which provides stack-friendly views. This is worth trying because in the current implementation, LState_XV and RState_XV are just shifts of the original array State_GV: ideally there is no need to copy the data, and pointers/views should suffice. This part has been improved in later versions.

  • My test shows that using SubArrays inside for loops is close to the performance of regular arrays. The slowdown is possibly caused by discontinuous indexing of the SubArrays; if a view is regularly strided, it should be equivalent to a regular array.

  • I made an experimental choice of adding a SubArray type of FaceState besides the copied array type. Although this causes type instability (calc_face_value returns a union type), union handling has been greatly optimized since Julia 0.7 (https://julialang.org/blog/2018/08/union-splitting). The view type is only used for 1st-order schemes, while the others use the original arrays. Later I realized it was simply a bad design choice to allocate and return during time advance: this has been rewritten into an allocation-free time stepping scheme.

  • Further improvements may be possible for 1D and 2D to reduce unnecessary calculations. For example, in the flux calculations Flux_YV and Flux_ZV are not needed for 1D problems. However, if this code is mainly developed for 3D, the current style may be fine. Also, the compiler may be smart enough to check this?

  • If there are small arrays inside functions, consider using static arrays.

  • To speed up the loops, we need @inbounds, @simd, or the latest @turbo from the LoopVectorization.jl package. Caveat: replacing all the @inbounds and @simd annotations with @avx causes a significant increase in compile time due to the package's heuristics, so it may not be worth the effort.

  • @fastmath is another thing worth trying.
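The inline square function mentioned in the first bullet could be as simple as this sketch (the helper name is my placeholder):

```julia
# In Utility.jl: readable squaring that compiles to the same machine code as x*x
@inline square(x) = x * x
```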

Do I need a general divergence calculation function, or can it be specialized to my grid size?

There are optional parameters in the PARAM.toml file; for instance, PlotVar is not needed if DoPlot=false. How can I handle that?
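One possible pattern, sketched with the TOML standard library (key names follow the example above; the default values are assumptions): parse the file into a Dict and only require optional keys when the controlling flag demands them:

```julia
using TOML

params = TOML.parsefile("PARAM.toml")
doplot = get(params, "DoPlot", false)            # false if the key is absent
plotvar = doplot ? params["PlotVar"] : nothing   # only required when DoPlot=true
```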

  • Analytical solution of shock tube tests
  • Convert into a package
  • Change U_ to M_ to distinguish momentum from velocity
  • Change TOML dependency
  • LaTeX documentation
  • Adopt TimerOutputs.jl for profiling
  • Adopt Parameters.jl for input parsing
  • Remove submodules
  • Add figures to doc
  • Recheck LoopVectorization given the native code caching introduced in Julia 1.9
  • GPU support #5
  • Implicit schemes
  • 2D/3D tests
  • MHD tests

GPU support

This toy model can be a prototype for offloading to the GPU. This is easily achievable in MATLAB, since every kernel operation is vectorized. It would take more effort in Julia, but it is definitely worth attempting for future large projects.
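A minimal sketch of the vectorized style in Julia, using CUDA.jl (the arrays and the operation are placeholders): broadcasting over device arrays mirrors the MATLAB approach, with each fused broadcast becoming a single GPU kernel:

```julia
using CUDA

rho = CUDA.rand(64, 64, 64)   # density on the device
u   = CUDA.rand(64, 64, 64)   # velocity on the device
mom = rho .* u                # fused broadcast launches one GPU kernel
```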

Base.@kwdef

I've noticed that there is a macro @kwdef in Base for defining structs with default field values. It was not a public API, so it was subject to change (as of Julia 1.9 it is exported and documented), but we can use it as a substitute for @with_kw imported from Parameters.jl.

In general, it is more reliable to rely on built-in functionality.
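A sketch of the substitution (the struct and field names are illustrative):

```julia
# Previously with Parameters.jl:
# @with_kw struct Params
#     nStep::Int = 100
#     DoPlot::Bool = false
# end

# With Base only:
Base.@kwdef struct Params
    nStep::Int = 100
    DoPlot::Bool = false
end

p = Params(nStep = 500)   # unspecified fields take their defaults
```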

@inbounds

@inbounds has no effect when applied to an entire function as of Julia 1.7.1. See this issue.

As I have many kernel loops, @inbounds appears frequently. It would be great if I could eliminate some of them. One idea is to avoid explicit indexing ranges and use foreach and for ... in constructs where possible. For handling boundary cells, something like foreach(f, @view vector[3:end-2]) can help.

A quick test shows that the code is 4x slower without @inbounds:

Without @inbounds:
17.096 ms (1186 allocations: 689.38 KiB)
With @inbounds:
4.407 ms (1184 allocations: 689.34 KiB)
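For reference, a sketch of how such a comparison can be made with BenchmarkTools.jl (the kernel is a stand-in, not Dante's actual update loop):

```julia
using BenchmarkTools

function kernel!(out, A)
    for i in 3:length(A)-2              # interior cells, skipping two ghost layers
        @inbounds out[i] = A[i-1] - 2A[i] + A[i+1]
    end
    return out
end

A, out = rand(10^5), zeros(10^5)
@btime kernel!($out, $A)                # rerun with @inbounds removed to compare
```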

Tullio usage

At some point I would like to explore using Tullio to simplify the code and potentially gain some speed. Let's see.
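For instance, a sketch of Tullio's einsum-style syntax (the contraction is illustrative, not one of Dante's kernels):

```julia
using Tullio

A, B = rand(64, 64), rand(64, 64)
@tullio C[i, j] := A[i, k] * B[k, j]   # matrix multiply; Tullio generates the loops
```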

New input system based on pure Julia

Both Oceananigans.jl and Trixi.jl have abandoned the simulation setup style commonly seen in statically compiled languages, where a text file specifies all the parameters, and moved to pure-Julia script setups, i.e. all parameters are set via runnable Julia code.
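A sketch of what that could look like for Dante (all names here are hypothetical; Dante does not necessarily export Params or solve):

```julia
# setup.jl — the entire run configuration is ordinary, runnable Julia code
using Dante

params = Params(      # e.g. the Base.@kwdef struct sketched earlier
    nStep  = 500,
    DoPlot = true,
)
solve(params)
```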
