henry2004y / dante.jl
Finite volume MHD simulation on structured mesh
License: MIT License
This toy code is a classic counterexample of my use of Julia's explicit type restrictions. Since Julia does not require explicit type declarations to generate optimized code, they are overkill here, and they strongly limit possible extensions to existing functions. In terms of optimization, explicit types only matter for dispatch and for keeping struct fields concrete.
I will clean up the code based on the new knowledge I've learned. A good pattern to follow: add type statements only for customized new types, not for intrinsic types like `Array{Float64,3}`.
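A minimal sketch of why this matters (function names are illustrative, not Dante's actual API): a signature restricted to `Array{Float64,3}` rejects views, `OffsetArray`s, and `Float32` data, while a generic signature compiles to the same specialized machine code.

```julia
# Over-restricted: accepts only exactly Array{Float64,3}.
flux_sum_strict(u::Array{Float64,3}) = sum(u)

# Generic: the compiler specializes on the concrete argument type anyway,
# and the function also works for views, OffsetArrays, Float32 data, etc.
flux_sum(u::AbstractArray{<:Real,3}) = sum(u)

u = rand(4, 4, 4)
flux_sum(u) == flux_sum_strict(u)   # same result, same machine code
flux_sum(view(u, :, :, 1:2))        # works; the strict version would error
```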
StrideArrays.jl is a relatively new package that can replace `MArray` from StaticArrays.jl. However, because it depends on LoopVectorization.jl, its loading time is quite long.
Take a look at ImplicitGlobalGrid.jl. It should be pretty straightforward to parallelize Dante with it, since Dante also uses a structured Cartesian mesh. Note that ImplicitGlobalGrid.jl also supports multi-GPU.
Add TimerOutputs.jl to the package, for practice.
Since this code is highly influenced by Batsrus, which is written in Fortran 90, it inherits the use of high-dimensional arrays. Back in the MATLAB version, I tested the performance of array of structures (AoS) vs. structure of arrays (SoA), which led to the conclusion that SoA is faster (contrary to Batsrus, according to Gabor's recollection).
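For reference, a minimal sketch of the two layouts (the type and field names are illustrative, not Dante's actual state variables); the SoA layout keeps each variable contiguous in memory, which is friendlier to SIMD and cache streaming:

```julia
# Array of structures: one struct per cell, variables interleaved in memory.
struct CellAoS
    rho::Float64
    p::Float64
end
cells = [CellAoS(1.0, 0.5) for _ in 1:1000]
sum_rho_aos(c) = sum(x -> x.rho, c)

# Structure of arrays: one contiguous array per variable.
struct StateSoA
    rho::Vector{Float64}
    p::Vector{Float64}
end
state = StateSoA(fill(1.0, 1000), fill(0.5, 1000))
sum_rho_soa(s) = sum(s.rho)
```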
I am not satisfied with the 4-D arrays I am using, where the first three dimensions are spatial and the last indexes the variable: this layout gives the wrong impression that all four dimensions are equally important. From the compiler's point of view, a four-level nested loop carries no hint that the last dimension plays a different role from the first three.
Another issue with the kernel loops is that the loop ranges and the arrays are specified separately, so the compiler cannot verify that the code has no erroneous operations such as out-of-bounds accesses. That responsibility falls on the programmer, which ends up requiring many additional `@inbounds` macros that speed up the code 4x. In other words, this design is not compiler-friendly!
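One compiler-friendly alternative, sketched below: derive the iteration range from the arrays themselves via `eachindex`, so the compiler can prove every access is in bounds without `@inbounds` (the function is a hypothetical example, not Dante's kernel):

```julia
# The iteration range comes from the arrays, not from separately supplied
# bounds, so the compiler can elide bounds checks on its own.
function scale!(out, a, c)
    for i in eachindex(out, a)   # also errors early if the shapes disagree
        out[i] = c * a[i]
    end
    return out
end
```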
Take a look at `VectorOfArray` in RecursiveArrayTools.jl. Maybe I can come up with a combination of `VectorOfArray` and `OffsetArray` to handle the separation of dimensions and ghost cells at the same time.
Learn from other packages and improve! Things may change as I become more efficient at code optimization. (I moved these notes here from README.md.)
`x^2` is much slower than `x*x` when evaluated naively at global scope. Within a function, however, the Julia compiler knows how to optimize the square operation and generates exactly the same machine code as `x*x`. For better readability, I need to create an inline function in `Utility.jl`.
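A one-line sketch of such a helper (the name `square` is hypothetical; inside a function the compiler already lowers `x^2` to `x*x`, so this is purely for readability):

```julia
# Readability helper; semantically identical to x^2 inside a function.
@inline square(x) = x * x
```

Usage: `square(3.0)` in place of `3.0^2` in the kernel expressions.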
One issue I encountered is using `@view` for face values. This greatly slows down the flux calculations because of intermediate heap-allocated memory. As Roger suggested on Julia's Chinese forum, one workaround is the UnsafeArrays.jl package, which allocates `@view` memory on the stack. This is worth trying because in the current implementation, `LState_XV` and `RState_XV` are just shifts of the original array `State_GV`. Ideally there is no need to copy the data: pointers/views should be enough. This part has been improved in later versions.
My tests show that using SubArrays inside for loops is close to the performance of regular arrays. The slowdown is possibly caused by non-contiguous indexing of the SubArrays; if a view is regularly strided, it should be equivalent to a regular array.
I made an experimental choice of adding a `SubArray` type of `FaceState` besides the copied array type. Although this causes type instability by returning a union type from `calc_face_value`, union returns have been greatly optimized since Julia 0.7 (https://julialang.org/blog/2018/08/union-splitting). The view type is only used for 1st-order schemes, while the others use the original arrays. Later I realized it was simply a bad design choice to allocate and return during time advance: this has been rewritten into an allocation-free time-stepping scheme.
Further improvements may be possible in 1D and 2D to reduce unnecessary calculations. For example, in the flux calculations `Flux_YV` and `Flux_ZV` are not needed for 1D problems. However, if this code is mainly developed for 3D, the current style may be fine. Also, the compiler may be smart enough to check this?
If there are small arrays inside functions, consider using static arrays.
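A minimal sketch with StaticArrays.jl (assuming that package; the vectors here are illustrative): fixed-size vectors like a 3-component field live on the stack, and their arithmetic is unrolled at compile time.

```julia
using StaticArrays

# Small fixed-size vectors: stack-allocated, no GC pressure in hot loops.
B = SVector(1.0, 2.0, 3.0)
v = SVector(1.0, 1.0, 1.0)
B + v            # elementwise sum, no heap allocation
2.0 .* B         # broadcasting also stays on the stack
```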
To speed up the loops, we need `@inbounds`, `@simd`, or the newer `@turbo` (formerly `@avx`) from the LoopVectorization.jl package. Caveat: replacing all the `@inbounds`, `@simd` with `@turbo` brings a significant increase in compile time due to its heuristics. It may not be worth the effort.
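The annotations above look like this in a typical kernel (a hypothetical update loop, not Dante's actual flux kernel):

```julia
# @inbounds removes bounds checks, @simd permits vectorized reordering;
# correctness of the indices is now the programmer's responsibility.
function axpy!(y, a, x)
    @inbounds @simd for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end
```

With LoopVectorization.jl loaded, `@inbounds @simd` could be swapped for a single `@turbo` on the same loop.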
`@fastmath` is another thing worth trying.
Do I need a general divergence calculation function? Or it can be specialized to my grid size?
There are optional parameters in the PARAM.toml file; for instance, `PlotVar` is not needed if `DoPlot=false`. How can I handle that?
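One possible approach, sketched with the TOML standard library: treat optional keys with `get` and a default, so absent keys never error (the `"rho"` default and the parsed fragment are assumptions for illustration):

```julia
using TOML  # standard library

# Hypothetical PARAM.toml fragment with the optional key omitted.
param = TOML.parse("""
DoPlot = false
""")

doplot  = get(param, "DoPlot", false)
plotvar = get(param, "PlotVar", "rho")  # default value is an assumption
```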
This toy model can be a prototype for offloading to GPUs. That is easily achievable in MATLAB, since every kernel operation is vectorized. It would take more effort to achieve in Julia, but it is definitely worth attempting for future large projects.
I've noticed that there is a macro `@kwdef` in Base for defining a `struct` with default values. It is not a public API, so it is subject to change, but we can use it as a substitute for `@with_kw` imported from Parameters.jl. In general, it is more reliable to use intrinsic methods.
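A minimal sketch of the substitution (the struct and field names are illustrative, not Dante's actual parameter set):

```julia
# Base.@kwdef generates a keyword constructor with per-field defaults.
Base.@kwdef struct SimParams
    nx::Int = 64
    cfl::Float64 = 0.8
    doplot::Bool = false
end

p = SimParams(nx = 128)   # unspecified fields take their defaults
```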
`@inbounds` has no effect when applied to an entire function as of Julia 1.7.1. See this issue.
Since I have many kernel loops, `@inbounds` appears frequently. It would be great if I could eliminate some of them. One idea is to avoid explicit index ranges and use `foreach` and `in` where possible. For handling boundary cells, something like `foreach(f, @view vector[3:end-2])` can help.
A quick test shows that the code is 4x slower without `@inbounds`:

```
Without @inbounds: 17.096 ms (1186 allocations: 689.38 KiB)
With @inbounds:     4.407 ms (1184 allocations: 689.34 KiB)
```
We can do an experiment with OffsetArrays.jl and check if it's simpler and as efficient as the current implementation.
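A minimal sketch of that experiment (assuming OffsetArrays.jl; the sizes are illustrative): give the array custom axes so ghost cells sit at indices below 1 and above `n`, and interior loops need no index shifting:

```julia
using OffsetArrays

nG, n = 2, 8   # two ghost layers per side, eight physical cells
# Physical cells 1:n, ghost cells at 1-nG:0 and n+1:n+nG.
u = OffsetArray(zeros(n + 2nG), 1-nG:n+nG)
u[1-nG] = 1.0        # outermost left ghost cell
for i in 1:n         # interior loop over the natural cell indices
    u[i] = i
end
```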
In Oceananigans.jl, boundary conditions are handled by the type system, which I think is better by design.
At some point I would like to explore the possibility of using Tullio to simplify the code and potentially get some speedup. Let's see.
Both Oceananigans.jl and Trixi.jl abandon the simulation setup style commonly seen with statically compiled languages, where a text file specifies all the parameters, and move to pure Julia script setups, i.e. all parameters are set via runnable Julia code.