
Comments (10)

WeiqunZhang commented on July 30, 2024

amrex::Gpu::streamSynchronize waits for async GPU kernels launched earlier to finish, and then checks whether the CUDA runtime reported any errors in those earlier kernels.

Could you provide an inputs file for C++ that I can test without using python?

from warpx.

archermarx commented on July 30, 2024

Certainly, here's the file generated by the python interface, with some apparently duplicated outputs removed.

max_step = 1000
warpx.verbose = 1
warpx.const_dt = 5e-12
warpx.numprocs = 2 2 2

warpx.do_electrostatic = labframe
warpx.self_fields_required_precision = 1e-05

amr.n_cell = 128 128 128
amr.max_grid_size = 32
amr.blocking_factor = 1
amr.max_level = 0
amrex.use_gpu_aware_mpi = 1

geometry.dims = 3
geometry.prob_lo = -2e-05 -2e-05 -2e-05
geometry.prob_hi = 2e-05 2e-05 2e-05
geometry.is_periodic = 1 1 1
geometry.coord_sys = 0

boundary.field_lo = periodic periodic periodic
boundary.field_hi = periodic periodic periodic
boundary.particle_lo = periodic periodic periodic
boundary.particle_hi = periodic periodic periodic

algo.current_deposition = direct
algo.particle_shape = 1
particles.species_names = electrons

electrons.mass = m_e
electrons.charge = -q_e
electrons.injection_style = nuniformpercell
electrons.initialize_self_fields = 0
electrons.num_particles_per_cell_each_dim = 8 8 8
electrons.xmin = -2e-05
electrons.xmax = 2e-05
electrons.ymin = -2e-05
electrons.ymax = 2e-05
electrons.zmin = -2e-05
electrons.zmax = 2e-05
electrons.momentum_distribution_type = gaussian
electrons.ux_m = 0.0
electrons.uy_m = 0.0
electrons.uz_m = 0.0
electrons.ux_th = 0.01
electrons.uy_th = 0.01
electrons.uz_th = 0.01
electrons.profile = constant
electrons.density = 1e+25

amrex.abort_on_out_of_gpu_memory = 1
amrex.the_arena_is_managed = 0
amrex.omp_threads = nosmt
tiny_profiler.device_synchronize_around_region = 1
particles.do_tiling = 0
amrex.v = 1
amrex.verbose = 1
amrex.max_gpu_streams = 4
device.v = 0
device.verbose = 0
device.numThreads.x = 0
device.numThreads.y = 0
device.numThreads.z = 0
device.numBlocks.x = 0
device.numBlocks.y = 0
device.numBlocks.z = 0
device.graph_init = 0
device.graph_init_nodes = 10000
amrex.regtest_reduction = 0
amrex.signal_handling = 1
amrex.throw_exception = 0
amrex.call_addr2line = 1
amrex.abort_on_unused_inputs = 0
amrex.handle_sigsegv = 1
amrex.handle_sigterm = 0
amrex.handle_sigint = 1
amrex.handle_sigabrt = 1
amrex.handle_sigfpe = 1
amrex.handle_sigill = 1
amrex.fpe_trap_invalid = 0
amrex.fpe_trap_zero = 0
amrex.fpe_trap_overflow = 0
amrex.the_arena_init_size = 63697108992
amrex.the_device_arena_init_size = 8388608
amrex.the_managed_arena_init_size = 8388608
amrex.the_pinned_arena_init_size = 8388608
amrex.the_comms_arena_init_size = 8388608
amrex.the_arena_release_threshold = 9223372036854775807
amrex.the_device_arena_release_threshold = 9223372036854775807
amrex.the_managed_arena_release_threshold = 9223372036854775807
amrex.the_pinned_arena_release_threshold = 42464739328
amrex.the_comms_arena_release_threshold = 9223372036854775807
amrex.the_async_arena_release_threshold = 9223372036854775807
fab.init_snan = 0
DistributionMapping.v = 0
DistributionMapping.verbose = 0
DistributionMapping.efficiency = 0.90000000000000002
DistributionMapping.sfc_threshold = 0
DistributionMapping.node_size = 0
DistributionMapping.verbose_mapper = 0
fab.initval = nan
fab.do_initval = 0
fabarray.maxcomp = 25
amrex.mf.alloc_single_chunk = 0
vismf.v = 0
vismf.headerversion = 1
vismf.groupsets = 0
vismf.setbuf = 1
vismf.usesingleread = 0
vismf.usesinglewrite = 0
vismf.checkfilepositions = 0
vismf.usepersistentifstreams = 0
vismf.usesynchronousreads = 0
vismf.usedynamicsetselection = 1
vismf.iobuffersize = 2097152
vismf.allowsparsewrites = 1
amrex.async_out = 0
amrex.async_out_nfiles = 64
amrex.vector_growth_factor = 1.5
machine.verbose = 0
machine.very_verbose = 0
tiny_profiler.verbose = 0
tiny_profiler.v = 0
tiny_profiler.print_threshold = 1
amrex.use_profiler_syncs = 0

amr.v = 0
amr.n_proper = 1
amr.grid_eff = 0.69999999999999996
amr.refine_grid_layout = 1
amr.refine_grid_layout_x = 1
amr.refine_grid_layout_y = 1
amr.refine_grid_layout_z = 1
amr.check_input = 1
vismf.usesingleread = 1
vismf.usesinglewrite = 1
particles.particles_nfiles = 1024
particles.use_prepost = 0
particles.do_unlink = 1
particles.do_mem_efficient_sort = 1
lattice.reverse = 0

WeiqunZhang commented on July 30, 2024

I can see an issue. For the first multigrid solve, the min and max of the rhs are 1.8095128179727603e+17 and 1.8095128179728016e+17. Since the problem has all-periodic boundaries, the matrix is singular. The solver is struggling with this singular problem because the rhs is almost constant.
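The singularity can be illustrated with a small standalone sketch (plain NumPy, not WarpX/AMReX code): on a fully periodic grid, the discrete Laplacian annihilates constant vectors, so a right-hand side with a large nonzero mean lies (almost entirely) outside its range.

```python
import numpy as np

# 1D periodic Laplacian (circulant): each row is [-2, 1, 0, ..., 0, 1].
n = 8
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
A[0, -1] = A[-1, 0] = 1.0  # periodic wrap-around

# The constant vector is in the nullspace, so A is singular.
ones = np.ones(n)
assert np.allclose(A @ ones, 0.0)
print("rank:", np.linalg.matrix_rank(A))  # n - 1

# A nearly constant rhs is dominated by its mean, which no phi can
# reproduce; only the zero-mean part of the rhs is actually solvable.
rhs = np.full(n, 1.8e17)
rhs[0] += 1.0e4  # tiny variation, like the reported min/max spread
phi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
print("unsolved mean component:", np.mean(A @ phi - rhs))
```

Since A is symmetric, the least-squares residual is exactly the mean component of the rhs, which here is ~1.8e17: essentially the whole right-hand side.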

I don't know if this is the issue you are seeing. I only tested it with a much smaller setup on a single CPU core. Could you try with the following change? If the hack works, we can then discuss how to implement a real fix.

--- a/Source/ablastr/fields/PoissonSolver.H
+++ b/Source/ablastr/fields/PoissonSolver.H
@@ -265,8 +265,19 @@ computePhi (amrex::Vector<amrex::MultiFab*> const & rho,
         mlmg.setAlwaysUseBNorm(always_use_bnorm);
 
         // Solve Poisson equation at lev
-        mlmg.solve( {phi[lev]}, {rho[lev]},
-                    relative_tolerance, absolute_tolerance );
+        auto rhomin = rho[lev]->min(0);
+        auto rhomax = rho[lev]->max(0);
+        {
+            amrex::Print().SetPrecision(17) << "xxxxx " << rhomin
+                                            << ", " << rhomax
+                                            << ", " << (rhomax-rhomin)/rhomin << std::endl;
+        }
+        if (std::abs(rhomax-rhomin) <= 1.e-12_rt * std::abs(rhomax+rhomin)) {
+            phi[lev]->setVal(0.0_rt);
+        } else {
+            mlmg.solve( {phi[lev]}, {rho[lev]},
+                        relative_tolerance, absolute_tolerance );
+        }
 
         // needed for solving the levels by levels:
         // - coarser level is initial guess for finer level

WeiqunZhang commented on July 30, 2024

In this test, there are no initial fields because the particles are uniformly distributed. But the particles have ux_th = uy_th = uz_th = 0.01, and the test uses warpx.const_dt = 5e-12. So in one step, the rms velocity moves a particle by about 1.5e-5 m, while the domain is only 4e-5 m across. Some particles are probably so far out of bounds that a single periodic shift will not bring them back into the domain. This results in out-of-bounds array accesses, hence the invalid memory access.
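The back-of-the-envelope numbers above can be checked directly (plain Python; u_th is in units of c, as in WarpX's momentum inputs, and for u_th << 1 the proper velocity u_th*c is approximately the velocity):

```python
c = 299_792_458.0          # speed of light, m/s
u_th = 0.01                # from electrons.ux_th etc., in units of c
dt = 5e-12                 # warpx.const_dt, in seconds

v_th = u_th * c                     # ~3.0e6 m/s
step_displacement = v_th * dt       # ~1.5e-5 m per step, as stated above

domain = 4e-5                       # prob_hi - prob_lo, m
dx = domain / 128                   # cell size with amr.n_cell = 128
print(step_displacement)            # ~1.5e-5 m
print(step_displacement / dx)       # ~48 cells crossed in a single step
```

So a one-sigma thermal particle crosses more than a third of the domain, and dozens of cells, in every step.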

Maybe you need to use a smaller const_dt. Maybe you need to change the initial setup. Maybe WarpX needs to implement a way to allow for the simulation to start with a smaller dt and then gradually increase to warpx.const_dt. @RemiLehe

archermarx commented on July 30, 2024

Looks like reducing the timestep fixed the primary issue, thanks! I didn't notice much of a change by implementing your precision fix, but I have run into problems with this sort of thing before. It's quite common to initialize domains with uniform plasmas which may then become excited by a perturbation. It would be good if the ES solver could handle uniform plasmas gracefully. I noticed that in the first ten iterations of these uniform plasma tests, each timestep took between 2 and 10 seconds, versus 0.4 seconds per step once the simulation had progressed a bit. I suspect this may be related.

Unfortunately, I am still having issues with the simulation not finalizing. It hangs just before outputting the expected "AMReX finalized" when running the ES simulation, but not when running EM.

WeiqunZhang commented on July 30, 2024

I cannot reproduce the hang before "amrex finalized". Maybe it's in the python part?

archermarx commented on July 30, 2024

Unfortunately not. Running the binaries directly still exhibits this problem on my system. I will try running with CPU only later to see whether the GPU is the issue.

WeiqunZhang commented on July 30, 2024

We need to figure out where it hangs. If your job is interactive, pressing ctrl-c might produce backtrace files. If it's a batch job, you need to send a signal to terminate the hanging job, and hopefully backtrace files are then produced. If it's Slurm, you can use scancel --signal=INT, or when you submit the job use sbatch --signal=INT@300 (where 300 means SIGINT is sent 300 seconds before the time limit).

If you are using CMake to build the code, you probably want to build with RelWithDebInfo rather than Release, so that some debug information goes into the executable.

I don't know how Python handles signals. You may need to run the executable directly, without Python in the middle, for the signal handling to work properly.

archermarx commented on July 30, 2024

OK, I found that reducing the timestep further, to 1e-14 seconds, seemingly fixes all of the problems. It might be nice to emit a warning if the user has picked a timestep that is likely to cause problems. I can make a PR to do that, if that seems reasonable.
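Such a warning could be as simple as comparing the expected thermal displacement per step against the cell size. The sketch below is a hypothetical standalone version of the check (the function name `check_timestep` and the one-cell-per-step threshold are my own choices, not existing WarpX code):

```python
import warnings

C = 299_792_458.0  # speed of light, m/s

def check_timestep(dt, u_th, dx, max_cells_per_step=1.0):
    """Warn if thermal particles are expected to cross more than
    max_cells_per_step cells in one step (hypothetical sanity check)."""
    # For u_th << 1 the proper velocity u_th*C approximates the velocity.
    cells_per_step = u_th * C * dt / dx
    if cells_per_step > max_cells_per_step:
        warnings.warn(
            f"dt={dt:g} s moves thermal particles ~{cells_per_step:.1f} "
            f"cells per step; consider reducing dt."
        )
    return cells_per_step

dx = 4e-5 / 128                   # cell size from this report's setup
check_timestep(5e-12, 0.01, dx)   # warns: ~48 cells per step
check_timestep(1e-14, 0.01, dx)   # silent: ~0.1 cells per step
```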
