Comments (10)
amrex::Gpu::streamSynchronize waits for asynchronous GPU kernels launched earlier to finish, and then checks whether the CUDA runtime has reported any errors from those earlier kernels.
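To illustrate why a crash can surface at amrex::Gpu::streamSynchronize rather than in the kernel that actually faulted, here is the same deferred-error pattern sketched with Python futures (an analogy only, not AMReX code):

```python
from concurrent.futures import ThreadPoolExecutor

def kernel():
    # Stand-in for a GPU kernel that hits an illegal memory access.
    raise RuntimeError("illegal memory access")

with ThreadPoolExecutor() as pool:
    fut = pool.submit(kernel)   # "launch" returns immediately; no error yet
    try:
        fut.result()            # "streamSynchronize": wait, then see the error
    except RuntimeError as exc:
        print("error surfaced only at synchronization:", exc)
```

The launch call succeeds even though the work will fail; the error only becomes visible at the synchronization point, which is why the backtrace points there.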
Could you provide an inputs file for C++ that I can test without using Python?
from warpx.
Certainly, here's the file generated by the Python interface, with some apparently duplicated outputs removed.
max_step = 1000
warpx.verbose = 1
warpx.const_dt = 5e-12
warpx.numprocs = 2 2 2
warpx.do_electrostatic = labframe
warpx.self_fields_required_precision = 1e-05
amr.n_cell = 128 128 128
amr.max_grid_size = 32
amr.blocking_factor = 1
amr.max_level = 0
amrex.use_gpu_aware_mpi = 1
geometry.dims = 3
geometry.prob_lo = -2e-05 -2e-05 -2e-05
geometry.prob_hi = 2e-05 2e-05 2e-05
geometry.is_periodic = 1 1 1
geometry.coord_sys = 0
boundary.field_lo = periodic periodic periodic
boundary.field_hi = periodic periodic periodic
boundary.particle_lo = periodic periodic periodic
boundary.particle_hi = periodic periodic periodic
algo.current_deposition = direct
algo.particle_shape = 1
particles.species_names = electrons
electrons.mass = m_e
electrons.charge = -q_e
electrons.injection_style = nuniformpercell
electrons.initialize_self_fields = 0
electrons.num_particles_per_cell_each_dim = 8 8 8
electrons.xmin = -2e-05
electrons.xmax = 2e-05
electrons.ymin = -2e-05
electrons.ymax = 2e-05
electrons.zmin = -2e-05
electrons.zmax = 2e-05
electrons.momentum_distribution_type = gaussian
electrons.ux_m = 0.0
electrons.uy_m = 0.0
electrons.uz_m = 0.0
electrons.ux_th = 0.01
electrons.uy_th = 0.01
electrons.uz_th = 0.01
electrons.profile = constant
electrons.density = 1e+25
amrex.abort_on_out_of_gpu_memory = 1
amrex.the_arena_is_managed = 0
amrex.omp_threads = nosmt
tiny_profiler.device_synchronize_around_region = 1
particles.do_tiling = 0
amrex.v = 1
amrex.verbose = 1
amrex.max_gpu_streams = 4
device.v = 0
device.verbose = 0
device.numThreads.x = 0
device.numThreads.y = 0
device.numThreads.z = 0
device.numBlocks.x = 0
device.numBlocks.y = 0
device.numBlocks.z = 0
device.graph_init = 0
device.graph_init_nodes = 10000
amrex.regtest_reduction = 0
amrex.signal_handling = 1
amrex.throw_exception = 0
amrex.call_addr2line = 1
amrex.abort_on_unused_inputs = 0
amrex.handle_sigsegv = 1
amrex.handle_sigterm = 0
amrex.handle_sigint = 1
amrex.handle_sigabrt = 1
amrex.handle_sigfpe = 1
amrex.handle_sigill = 1
amrex.fpe_trap_invalid = 0
amrex.fpe_trap_zero = 0
amrex.fpe_trap_overflow = 0
amrex.the_arena_init_size = 63697108992
amrex.the_device_arena_init_size = 8388608
amrex.the_managed_arena_init_size = 8388608
amrex.the_pinned_arena_init_size = 8388608
amrex.the_comms_arena_init_size = 8388608
amrex.the_arena_release_threshold = 9223372036854775807
amrex.the_device_arena_release_threshold = 9223372036854775807
amrex.the_managed_arena_release_threshold = 9223372036854775807
amrex.the_pinned_arena_release_threshold = 42464739328
amrex.the_comms_arena_release_threshold = 9223372036854775807
amrex.the_async_arena_release_threshold = 9223372036854775807
fab.init_snan = 0
DistributionMapping.v = 0
DistributionMapping.verbose = 0
DistributionMapping.efficiency = 0.90000000000000002
DistributionMapping.sfc_threshold = 0
DistributionMapping.node_size = 0
DistributionMapping.verbose_mapper = 0
fab.initval = nan
fab.do_initval = 0
fabarray.maxcomp = 25
amrex.mf.alloc_single_chunk = 0
vismf.v = 0
vismf.headerversion = 1
vismf.groupsets = 0
vismf.setbuf = 1
vismf.usesingleread = 0
vismf.usesinglewrite = 0
vismf.checkfilepositions = 0
vismf.usepersistentifstreams = 0
vismf.usesynchronousreads = 0
vismf.usedynamicsetselection = 1
vismf.iobuffersize = 2097152
vismf.allowsparsewrites = 1
amrex.async_out = 0
amrex.async_out_nfiles = 64
amrex.vector_growth_factor = 1.5
machine.verbose = 0
machine.very_verbose = 0
tiny_profiler.verbose = 0
tiny_profiler.v = 0
tiny_profiler.print_threshold = 1
amrex.use_profiler_syncs = 0
amr.v = 0
amr.n_proper = 1
amr.grid_eff = 0.69999999999999996
amr.refine_grid_layout = 1
amr.refine_grid_layout_x = 1
amr.refine_grid_layout_y = 1
amr.refine_grid_layout_z = 1
amr.check_input = 1
vismf.usesingleread = 1
vismf.usesinglewrite = 1
particles.particles_nfiles = 1024
particles.use_prepost = 0
particles.do_unlink = 1
particles.do_mem_efficient_sort = 1
lattice.reverse = 0
I can see an issue. For the first multigrid solve, the min and max of the rhs are 1.8095128179727603e+17 and 1.8095128179728016e+17. Since the problem has all-periodic boundaries, the matrix is singular, and the solver is having trouble with this singular problem whose rhs is almost constant.
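The singularity being described can be seen in a minimal 1D analogue (a NumPy sketch, not WarpX's actual solver): the periodic Laplacian annihilates constant vectors, so a solution exists only when the rhs has zero mean, and a nearly constant rhs is close to the worst case.

```python
import numpy as np

n = 16
# 1D finite-difference Laplacian with periodic boundaries
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
A[0, -1] = A[-1, 0] = 1.0

# The constant vector lies in the null space, so the matrix is singular.
assert np.allclose(A @ np.ones(n), 0.0)
print(np.linalg.matrix_rank(A))  # prints 15, i.e. n - 1

# Solvability requires the rhs to be orthogonal to the null space
# (zero mean); a constant rhs violates this maximally.
rhs = np.full(n, 1.0)
print(rhs.mean())  # 1.0, far from the required 0
```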
I don't know if this is the issue you are seeing. I only tested it with a much smaller setup on a single CPU core. Could you try with the following change? If the hack works, we can then discuss how to implement a real fix.
--- a/Source/ablastr/fields/PoissonSolver.H
+++ b/Source/ablastr/fields/PoissonSolver.H
@@ -265,8 +265,19 @@ computePhi (amrex::Vector<amrex::MultiFab*> const & rho,
mlmg.setAlwaysUseBNorm(always_use_bnorm);
// Solve Poisson equation at lev
- mlmg.solve( {phi[lev]}, {rho[lev]},
- relative_tolerance, absolute_tolerance );
+ auto rhomin = rho[lev]->min(0);
+ auto rhomax = rho[lev]->max(0);
+ {
+ amrex::Print().SetPrecision(17) << "xxxxx " << rhomin
+ << ", " << rhomax
+ << ", " << (rhomax-rhomin)/rhomin << std::endl;
+ }
+ if (std::abs(rhomax-rhomin) <= 1.e-12_rt * std::abs(rhomax+rhomin)) {
+ phi[lev]->setVal(0.0_rt);
+ } else {
+ mlmg.solve( {phi[lev]}, {rho[lev]},
+ relative_tolerance, absolute_tolerance );
+ }
// needed for solving the levels by levels:
// - coarser level is initial guess for finer level
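The guard in the patch above compares the spread of the rhs to its magnitude. A quick check (a Python sketch of the same condition) confirms that the min/max values reported earlier in the thread would trigger it:

```python
def rhs_nearly_constant(rmin, rmax, rel_tol=1e-12):
    # Same condition as in the patch: |max - min| <= tol * |max + min|
    return abs(rmax - rmin) <= rel_tol * abs(rmax + rmin)

# min/max of the rhs reported above
print(rhs_nearly_constant(1.8095128179727603e+17, 1.8095128179728016e+17))
# prints True: the hack would skip the solve and set phi to zero
```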
In this test, there are no initial fields because all particles are uniformly distributed. But the particles have ux_th = 0.01, uy_th = 0.01 and uz_th = 0.01, and the test uses warpx.const_dt = 5e-12. So in one step the rms velocity will move a particle by about 1.5e-5 m, while the domain size is only 4e-5 m. Some particles are probably so far out of bounds that one periodic shift will not bring them back into the domain. This results in out-of-bounds array accesses, and thus the invalid memory access.
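The displacement estimate can be checked in a few lines (a back-of-the-envelope sketch; u_th is the normalized rms momentum, which for u << 1 is approximately the velocity in units of c):

```python
c = 299_792_458.0     # speed of light, m/s
u_th = 0.01           # normalized rms momentum from the inputs file
dt = 5e-12            # warpx.const_dt, s
step = u_th * c * dt  # rms displacement per step
domain = 4e-5         # prob_hi - prob_lo, m
print(step, step / domain)  # ~1.5e-5 m, ~0.37 of the domain per step
```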
Maybe you need to use a smaller const_dt, or change the initial setup. Or maybe WarpX needs a way to start the simulation with a smaller dt and then gradually increase it to warpx.const_dt. @RemiLehe
Looks like reducing the timestep fixed the primary issue, thanks! I didn't notice much of a change by implementing your precision fix, but I have run into problems with this sort of thing before. It's quite common to initialize domains with uniform plasmas which may then become excited by a perturbation. It would be good if the ES solver could handle uniform plasmas gracefully. I noticed that in the first ten iterations of these uniform plasma tests, each timestep took between 2 and 10 seconds, versus 0.4 seconds per step once the simulation had progressed a bit. I suspect this may be related.
Unfortunately, I am still having issues with the simulation not finalizing. It hangs just before outputting the expected "AMReX finalized" when running the ES simulation, but not when running EM.
I cannot reproduce the hang before "AMReX finalized". Maybe it's in the Python part?
Unfortunately not. Running it with the binaries directly still exhibits this problem on my system. I will try running with CPU only later to see if that is the issue.
We need to figure out where it hangs. If your job is interactive, pressing Ctrl-C might produce backtrace files. If it's a batch job, you need to send a signal to the hanging job to terminate it, and hopefully backtrace files are then produced. With Slurm, you could use scancel --signal=INT, or submit with sbatch --signal=INT@300 (where 300 means SIGINT is sent 300 seconds before the time limit).
If you are using CMake to build the code, you probably want to build with RelWithDebInfo, not Release, so that some debug information gets into the executable.
I don't understand how Python handles signals. You may need to run the executable directly, without Python in the middle, for the signal handling to work properly.
OK, I found that reducing the timestep further, to 1e-14 seconds, seemingly fixes all of the problems. It might be nice to emit a warning if the user has picked a timestep that is likely to cause problems. I can make a PR for that, if that seems reasonable.
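A sanity check of the kind suggested could look like this (a hypothetical helper with made-up names, not WarpX's actual API): warn when the rms thermal motion crosses more than about one cell per step.

```python
def warn_if_dt_too_large(dt, dx, u_th, c=299_792_458.0, max_cells_per_step=1.0):
    """Warn when rms thermal motion crosses more than
    max_cells_per_step cells in a single step."""
    v_rms = u_th * c / (1.0 + u_th**2) ** 0.5  # convert u = gamma*beta to v
    cells = v_rms * dt / dx
    if cells > max_cells_per_step:
        print(f"Warning: rms thermal motion crosses {cells:.1f} cells/step; "
              f"consider dt <= {max_cells_per_step * dx / v_rms:.1e} s")
    return cells

# With the values from this thread: dx = 4e-5 / 128, dt = 5e-12, u_th = 0.01,
# the check fires (tens of cells per step); with dt = 1e-14 it passes silently.
cells = warn_if_dt_too_large(5e-12, 4e-5 / 128, 0.01)
```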