Giter VIP home page Giter VIP logo

batsrus.jl's Introduction

Stats

Hongyang's GitHub stats

Top Langs

batsrus.jl's People

Contributors

dependabot[bot] avatar henry2004y avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

batsrus.jl's Issues

`OutOfMemoryError()` when replacing `matplotlib.tri.LinearTriInterpolator`

matplotlib.tri.LinearTriInterpolator is pretty slow. However, when I tried to replace that with either ScatteredInterpolations.jl or Dierckx.jl Spline2D, I got OOM errors...

spl = Spline2D(vec(X), vec(Y), vec(W))
#Wi = [evaluate(spl, i, j) for j in yi, i in xi]
ERROR: OutOfMemoryError()
Stacktrace:
 [1] Array
   @ .\boot.jl:475 [inlined]
 [2] Spline2D(x::Base.ReshapedArray{…}, y::Base.ReshapedArray{…}, z::Base.ReshapedArray{…}; w::Vector{…}, kx::Int64, ky::Int64, s::Float64)
   @ Dierckx C:\Users\hyzho\.julia\packages\Dierckx\TDOyl\src\Dierckx.jl:775
 [3] Spline2D(x::Base.ReshapedArray{…}, y::Base.ReshapedArray{…}, z::Base.ReshapedArray{…})
   @ Dierckx C:\Users\hyzho\.julia\packages\Dierckx\TDOyl\src\Dierckx.jl:733
 [4] top-level scope
   @ REPL[31]:1

Related to #14

Equally spaced x values in Matplotlib `streamplot`

For some outputs, drawing streamlines may end up with an error in Matplotlib

ValueError("'x' values must be equally spaced")
  File "C:\Users\hyzho\AppData\Local\Programs\Python\Python311\Lib\site-packages\matplotlib\__init__.py", line 1461,

where the actual check in Matplotlib says

if not np.allclose(np.diff(x), self.width / (self.nx - 1)):
    raise ValueError("'x' values must be equally spaced")

A similar check in Julia would be

all(i -> isapprox(xrange[i+1] - xrange[i], (xrange[end] - xrange[1])/(length(xrange)-1)), eachindex(xrange)[1:end-1])

This is clearly a mismatch between Julia and Numpy, probably for Float32 numbers. Not a big deal, but the error messages are annoying. A quick workaround would be to set the plotinterval value together with plotrange.

Animation support

A while ago, the wrapper over Matplotlib makes it difficult to modify the plots afterwards, which especially causes problems when dealing with time series snapshots. The colorbar is so hard to fix. The solution is, instead of using level, provide a range of points.

The current support of animation in Matplotlib is not good enough, especially for interactive plotting and scanning through multiple snapshots. Based on what I've seen, Makie does a better job in this category, so I make a ref to #21.

More recent versions improve on the wrapper dramatically. Animations with fixed colorbars may become feasible, just as I do in Vlasiator.jl where you manually create a colorbar with fixed range. I will work on this if new simulation results require analyzing, and until then we will decide which way to go

Refactoring the package

  • Rename the basic data structures: avoid ambiguous struct names like Data!
  • Use mmap to do lazy data loading.
  • Support colorful metadata output.
  • Refactor the plotting part: remove redundant plotdata pieces.
  • Avoid meshgrid if possible.
  • Remake the plot tests.
  • Remove Requires.jl.
  • Add PrecompileTools.jl.
  • Add performance benchmark.
  • Improve the doc; remove unnecessary parts; add more useful materials

VTK test diff

In WriteVTK v0.14.3, JuliaVTK/WriteVTK.jl#106 the connectivity and offsets for unstructured grids have been changed from Int32 to Int64. This causes test failure because we compare sha256 for the generated VTU files.

I am asking if they want to support the old behavior, or just keep the current behavior.

Ordering of connectivity

Tecplot and VTK unstructured data formats have the same connectivity ordering for hexahedron, but different ordering for voxel (in VTK). A function swaprows is implemented to switch the orderings.

AUXDATA reading error

I encountered a very bad problem of corrupting binary *.vtu files in the early days. It turned out that the issue is the starting position of data is wrong because of the way I skip the header AUXDATA part. Sometimes the binary numbers may contain newline character that confuses the reader. It is now fixed.

Later on the reading of the header part was completely rewritten to provide better support for a variety of Tecplot ASCII headers.
All the AXUDATA information is now stored into global VTK data.

Reading multiple files

So far there is no single use case where I need to read multiple files together. If you want to do so, just feed readdata with one string of filename at a time. By eliminating the support for multiple files, the function's internal structure is greatly simplified.

streamplot for dxSavePlot=0

In BATSRUS,

  • dxSavePlot = -1 => unstructured grid, cuts are saved point-wise with no grid information
  • dxSavePlot = 0 => structured grid at the finest cell level. For AMR 2D cuts, this will introduce many duplicate points such that the output file size is about 3 times larger than the unstructured equivalent.

In our plotting routines, it takes significantly longer time (~20x for ~100 MB data) to reconstruct the mesh given unstructured outputs. Streamline plotting is working, though.

Streamline plotting for AMR structured grid is not working.

VTK support

  1. The VTK files does not have timestep information. To allow for further time series processing in Paraview, a script create_pvd.jl is provided for generating the pvd container that stores the filenames and timestamps calculated from the filenames. This then becomes the new function create_pvd in vtk.jl.
    After dicussing with the author of WriteVTK.jl, the timestamp information is now added to each file through a scalar that is not related to grid in VTK, i.e. "global data" in the VTK world.

  2. Currently, multi-block (VTM), rectilinear (VTR), and unstructured (VTU) conversions are supported.
    By default the file size will be reduced with compression level 6, but the actual compression ratio depends on the original data.

  3. Currently, BATSRUS output contains only cell center coordinates and cell center values (referring as tcp in PARAM.in), or node coordinates and interpolated nodal values (referring as tec). It is recommended to save in the tcp format to avoid interpolation. In principle VTK also supports the combination of node coordinates and cell center values, but it is not necessary here.

  4. The simple multiblock VTK format conversion can only deal with data within a block, but it cannot figure out the connectivities between neighboring blocks. To fill the gap between blocks, we have to retrieve the tree data stored in .tree files. This requires a in-depth understanding of the grid tree structure in BATSRUS, and it took me a whole week to think, write and debug the code!

Information contained in iTree_IA:

  • the status of the node (used, unused, to be refined, to be coarsened, etc.);
  • the current, the maximum allowed and minimum allowed AMR levels for this node;
  • the three integer coordinates with respect to the whole grid;
  • the index of the parent node (if any);
  • the indexes of the children nodes (if any);
  • the processor index where the block is stored for active nodes;
  • the local block index for active nodes.

Basically, it requires several steps:

  1. Obtain the neighbor block indexes.
  2. Obtain the relative AMR level between neighbor blocks.
  3. Calculate the global cell indexes.
  4. Fill in the connectivity list.

Several issues worth noticing:

  • The blocks are load balanced in Morton curve ordering.
  • Rules are needed to decide which block is in charge of writing the connectivities. Following the ModWriteTecplot implementation,
    • connectivities between blocks on the same AMR level are written by the left neighbors.
    • connectivities between blocks on different AMR levels are written by the finer blocks. This choice may simplify the logic in writing.
  • 2D formatted outputs (with dxPlot⩾0) will generate duplicate points for 2D runs after postprocessing. I don't fully understand why.
  • 2D unformatted outputs (with dxPlot<0) can generate correct point coordinates, but the output points in postprocessing are sorted based on x+e*y+e^2*z to allow removing duplicate points. This is mostly useful for 2D cuts of 3D simulations where duplicate points are possible to exist, and should be merged/averaged. This also means that it is literally impossible to figure out the original cell ordering, and impossible to convert to 2D VTU format (unless you construct some new connectivities based on triangulation)! However, it guarantees that the cut outputs from parallel runs are identical.
  • 3D unformatted output as stored as it is.
  • The simulation is typically done with multiple MPI processes. In the tree structure, each node only records the MPI process local block index. What makes it more complicated is that the local block indexes on a given MPI process may have skipped numbers because they are not in the physical simulation domain. This means that in order to get the true global block indexes, one needs to count how many blocks are used on all MPI processes with lower ranks, and also the order of the current node based on local block index on this MPI process.
  • It would be pretty difficult to count the number of elements in the connectivity list. Appending to an array is a very expensive operation: counting the elements in the first loop, preallocating the array and then repeat the same computation to fill in the list results in over 1000 times speedup.
  • The implementation in v0.2.3 inherits the bug from BATSRUS ModWriteTecplot that will generate duplicate voxel grid. For example, imagine a coarse block which is surrounded by refined blocks all around. On edge 8, the two refined face neighbor will both think they are in charge of writing the connectivity for the cells around the edge. This does not cause any serious issues as of now, so I don't fix it. Speaking of fixing that, there are two ways:
    • fillCellNeighbors! not only deals with coarsened neighbors, but also the refined neighbors. This means that literally we need to implement the message_pass_cell completely. We still do a two-round process, write everything without checking, and in the end before IO remove the duplicate rows. This may be even faster then implementing some complicated writing rules to avoid duplicating.
    • Polish the writing rules even further. This requires me to consider all possible cases. As a first step, for the above example case, neither of the two face neighbor should write the connectivity, but instead the edge neighbor should write.
  • Many function names are inherited from BATL. I may consider rename them in the future if more general task is required.

ParaView python reader

ParaView allows for custom Python reader. Examples can be found in Chapter 12.4 in the official manual, and an example of full Python plugins can be found at Kiware's gitlab page.

The XML package not only provide writing into XML files, but also reading XML structures. Therefore, if you want you can create a VTK reader.

However, it would be even better to write VTK directly from BATSRUS! In the future versions of BATSRUS, we should be able to output VTK files directly with VTKFortran. I won't do it now.

IO show bug

When I want to add a test for Data type IO, the returned string is always empty. This must be a bug related to how I handle files, but I have no clue what it is...

Support on derived variables

Right now the derived quantity plots are not supported. In order to achieve this, I may need:

  • A new function getvar(data::Data, var::String) returning the derived variable
  • A new plotting function that understands the derived data type

The first one is achieved by a trick I found on discourse, which basically identifies symbols as names to members in a struct.
This test feature is not ideal and will be dropped in later versions. This looks like the Python Calculator in ParaView.
I don't know how ParaView achieve this, but in Python it is pretty easy to parse a string and execute it as some new codes using exec function, as Yuxi did in his interface to the yt library. Julia has equivalent metaprogramming capabilities, but there are many restrictions to apply it in practice so it is generally adviced to avoid evaluating expressions inside functions during runtime.
Another option is to create functions for derived variables. This is actually pretty good both in terms of performance and conciseness. The idea is to create a dictionary of derived variable names as keys and anonymous functions as values, and if the input string is found in the dictionary, call the corresponding function to obtain the value.

This has been successfully tested in my new scripts for processing Vlasiator outputs, and can be directly ported here for BATSRUS.

Makie recipe support

Makie is still very hard to use with limited examples and resources as of April 2021. I will only put more effort on this if they keep improving the APIs. Currently I only provide a minimal working example without using recipes.

HDF5 support

BATSRUS support HDF5 outputs, so we should also support reading those files.

User Recipe in Plots.jl

There is a extremely powerful user recipe in Plots.

  • Repeatly using the same GKSTerm on Mac will display only white backgrounds in the end.
  • By default Plots uses gr() backend. The GR backend contour plot only accept vector x,y!
  • There is already a UnitfulRecipes.jl that provides the capability of auto-displaying units in figure labels, and it works smoothly with my user recipe. Amazing.
  • I have already built a customized package UnitfulBatsrus.jl and set it as a dependency for VisAna. Instead of the usual u"km/s" notation, we just need to use bu"amucc". Now this part becomes a submodule within Batsrus.jl, mainly because a registered package cannot depend on an unregistered package.

Improve data structure

I was enlightened by the way SpacePy handles files. Instead of a separate header and data array, it may be better to build a more contained struct.
Also, the header could use NamedTuple instead of Dict.

These are all implemented in newer versions.

Interoperability with Python

Demos are provided for calling Matlab/Python directly from Julia for debugging and testing. This part will later be separated out for potential Python and Matlab users. Currently the plotting and interpolation needed during plotting are done in Python. For instance, the 3D scatterred interpolation is done via Interpolate in Scipy. Hopefully these additional dependencies will be cut down.

In the current version of PyCall and PyJulia, there is already direct support for accessing Julia struct objects (noted as jlwrap).

To overcome the dependency issue on Python, I take advantage of Requires.jl. The idea is that the plotting related functions are only loaded when PyPlot is imported.

Use FortranFiles to replace read in IO?

We can refactor the IO part by taking advantage of FortranFiles.jl. Several potential issues:

  • This only covers the binary files, not ascii files.
  • Header handling mighty be tricky.
  • Hard to work together with mmap.

Native triangulation support in Julia

Currently the 2D plot for unstructured data is done through triangulation and interpolation provided by Matplotlib: https://matplotlib.org/stable/api/tri_api.html#matplotlib.tri.TriInterpolator. I want to find a substitution of triangulation and interpolation in Julia.

There are several options in Julia, but I haven't found a good solution yet. I've looked at

  • ScatteredInterpolations.jl
  • BasicIntepolators.jl

These two are pretty similar, but neither of them support linear scatter interpolations, and they both suffer from the OOM issue.


  • Triangulate.jl is a wrapper over another famous triangle mesh generator. I don't know if it supports linear scatter interpolations.

  • GridInterpolation.jl is another option, but the so-called simplex grid implementation suffers from performance issues.

Optimizing read functions

This is the first time I use Julia for reading general ascii/binary files. It was a pain at first due to the lack of examples and documents using any basic function like read/read!, but fortunately I figured them out myself. One trick in reading binary array data is the usage of view, or subarrays, in Julia. In order to achieve that, I have to implement my own read! function in addition to the base ones.

Before v0.5.1 (the tag being used when the package was not registered), readdata function in MATLAB for large data is 2 times faster than that in Julia. The reason is simply using read or unsafe_read in the lower level. The latter one is much faster. After the fix, Julia version performs 5 times faster than the Matlab version in reading binary data.

After Julia 1.4, reading views of an array is supported by the base read function, so there is no need to dispatch for a customized version. This part has been removed from io.jl since I don't expect people use an older version of Julia anymore.

Learning from yt

The yt Project in Python is a much more mature postprocessing package than mine. That's a good place to learn.

  • Off-axis slices: very similar to ParaView and other 3D visualization tools provide, make a 2D cut at any direction. This requires a robust interpolation scheme.

  • Projection: it is like an integral form of a quantity in a certain direction, and can be weighted or unweighted. It is not a necessary feature, as one can easily implement this once he or she knows how to do make off-axis slices.

  • Same interface for both structured and unstructure data: this is important for a larger user base.

  • Mesh plotting: they provide a method for adding mesh on top of plots. I haven't had a good implementation of this yet.

  • Center: it would be good to give the option of choosing the center of plots.

  • Units: they build a unit system using dictionary themselves. I don't want to reinvent the wheel, so I will just keep an eye on Unitful.jl and yt. What is inherited from IDL in Batsrus.jl needs rewritten.

  • Property control: I don't like the way yt handles figure properties like colorbar, axes, etc. Matplotlib already teaches you how to set these things, and yet again in yt they wrap everything with names they come up with. This is a more of a burden instead of benefit. They do certain things right like set_cmap, but not all of them. The section Further Customization via Matplotlib is all I want to see. Using popular libraries generally means that new comers won't take long to get used to the interfaces, and developers can have a eaiser time developing new features.

  • Log scale: symlog for data containing both positive and negative data, and switch to linear scale for small values.

Units

There is a unit package in Julia unitful for handling units.
Take a look at that one if you really want to solve the unit problem.
On top of that, UnitfulRecipes.jl provides integrity with Plots.jl. The ideas of how to make abstractions is more important than the unit conversion itself.

Eventually, I want this to replace the old way of utilizing units. This is now partially complete, but not fully tested.

Makie support

Recipe

Related to Issue #20, we may also want a true user recipe in Makie.

MakieLayout is a nice extension built on top of Makie to create publication quality figures and interactive plots. It basically includes all the funcationalities I want, so definitely worth a try.

Now in recent version this is merged into the main package, and they are working on unified APIs. Things will improve, but at this point I feel that many things are so hard to make them right without the main developers' help. Clearly they are lack of documentations and examples. Let's see how it goes.

GUI

As for the GUI development, GTK seems to be an ideal candidate. However, the GTK interface in Julia lacks full support for the toolkit, which makes it a little bit hard to use. I have only played with it for half a day. You can design the appearance of your window interactively, and save your in an HTML-like file.

Makie is actually good at this, with the underlying OpenGL support.

At this point GUI is not necessarily needed, if it does not speed up my own workflow.

Headless Matplotlib support

When doing processing in batch mode on a cluster, there's usually no need to render the plots on screen. There exists such a backend for this purpose:

using PyPlot
PyPlot.matplotlib.use("Agg")

However, notice that currently Agg backend does not support draw_artist. For example, you cannot add an anchored text to your figure.

Right now I use this in tests.

meshgrid support

I have made several rounds of mistakes related to array storage ordering, specifically meshgrid vs ndgrid. Even though some Matplotlib functions like contourf support 1D grid arrays, it is very easy to mess it up when other conditions are handling 2D ndarrays from the interpolation package Dierckx.

Therefore I decided to eliminate passing 1D arrays and always use 2D grid arrays. The list comprehension version of meshgrid is

X = [x for _ in y, x in x]
Y = [y for y in y, _ in x]

while the matrix multiplication version is

X = x' .* ones(length(y))
Y = ones(length(x))' .* y

Based on my test results, the list comprehension version is consistently faster and consumes less memory than the matrix multiplication version. I have now replaced all the occurrences with a function called meshgrid.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.