
snap's Introduction

SNAP: SN (Discrete Ordinates) Application Proxy

Description

SNAP serves as a proxy application to model the performance of a modern discrete ordinates neutral particle transport application. SNAP may be considered an update to Sweep3D, intended for hybrid computing architectures. It is modeled off the Los Alamos National Laboratory code PARTISN. PARTISN solves the linear Boltzmann transport equation (TE), a governing equation for determining the number of neutral particles (e.g., neutrons and gamma rays) in a multi-dimensional phase space. SNAP itself is not a particle transport application; SNAP incorporates no actual physics in its available data, nor does it use numerical operators specifically designed for particle transport. Rather, SNAP mimics the computational workload, memory requirements, and communication patterns of PARTISN. The equation it solves has been composed to use the same number of operations, use the same data layout, and load elements of the arrays in approximately the same order. Although the equation SNAP solves looks similar to the TE, it has no real world relevance.

The solution to the time-dependent TE is a "flux" function of seven independent variables: three spatial (3-D spatial mesh), two angular (set of discrete ordinates, directions in which particles travel), one energy (particle speeds binned into "groups"), and one temporal. PARTISN, and therefore SNAP, uses domain decomposition over these dimensions to coherently distribute the data and the tasks associated with solving the equation. The parallelization strategy is expected to be the most efficient compromise between computing resources and the iterative strategy necessary to converge the flux.

The iterative strategy consists of two nested loops. These nested loops are performed for each step of a time-dependent calculation, wherein any particular time step requires information from the preceding one. No parallelization is performed over the temporal domain. However, for time-dependent calculations two copies of the unknown flux must be stored, each copy an array of the six remaining dimensions. The outer iterative loop involves solving for the flux over the energy domain with updated information about coupling among the energy groups. Typical calculations require tens to hundreds of groups, making the energy domain suitable for threading with the node's (or nodes') provided accelerator. The inner loop involves sweeping across the entire spatial mesh along each discrete direction of the angular domain. The spatial mesh may be immensely large. Therefore, SNAP spatially decomposes the problem across nodes and communicates needed information according to the KBA method. KBA is a transport-specific application of general parallel wavefront methods. Nested threads, spawned by the energy group threads, are available for use in one of two ways. In one approach, nested threads further parallelize the work of sweeping the different energy groups assigned to a main-level thread; this option is still experimental and has only been implemented for the case of a single MPI process. Alternatively, nested threads perform "mini-KBA" sweeps by concurrently operating on cells lying on the same diagonal of spatial sub-domains already decomposed across the distributed memory architecture (i.e., different MPI ranks). Lastly, although KBA efficiency is improved by pipelining operations according to the angle, current chipsets operate best with vectorized operations. During a mesh sweep, SNAP operations are therefore vectorized over angles to take advantage of modern hardware.
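The wavefront idea can be illustrated with a minimal sketch over a 2-D tile of cells; this is only an assumption-laden illustration, not SNAP's dim3_sweep or mkba_sweep code, and the array name and update are placeholders. Cells on the same diagonal (i + j = constant) have no mutual dependence once their upstream neighbors are finished, so they can be processed concurrently, while the innermost loop runs over the discrete angles and vectorizes readily:

SUBROUTINE wavefront_sweep ( nx, ny, nang, psi )
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nx, ny, nang
  REAL(KIND=8), DIMENSION(nang,nx,ny), INTENT(INOUT) :: psi
  INTEGER :: d, i, j, m
  DO d = 2, nx + ny                        ! diagonals of the tile, in sweep order
    DO j = MAX( 1, d-nx ), MIN( ny, d-1 )  ! cells on one diagonal are independent
      i = d - j
      DO m = 1, nang                       ! innermost loop over angles vectorizes
        psi(m,i,j) = 0.5_8 * psi(m,i,j)    ! placeholder update, no real physics
      END DO
    END DO
  END DO
END SUBROUTINE wavefront_sweep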

SNAP should be tested with problem sizes that accurately reflect the types of calculations PARTISN frequently handles. The spatial domain shall be decomposed to 2,000--4,000 cells per node (MPI rank). Each node will own all the energy groups and angles for that group of cells; typical calculations feature 10--100 energy groups and as few as 100 to as many as 2,000 angles. Moreover, sufficient memory must be provided to store two full copies of the solution vector for time-dependent calculations. The preceding parameters assume current trends in available per core memory. Significant advances or detriments affecting this assumption shall require reconsideration of appropriate parameters per compute node.
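As a rough, illustrative estimate only (the numbers are assumptions drawn from the ranges above, not a prescribed configuration): a rank owning 4,000 cells, 500 discrete directions, and 50 energy groups carries 4,000 x 500 x 50 = 10^8 flux unknowns per copy of the solution vector. At 8 bytes per double-precision value, the two copies required for a time-dependent calculation occupy roughly 1.6 GB per rank, before accounting for work arrays.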

Compilation

SNAP is written primarily to the Fortran 90/95 standard. The retrieval of command line arguments, which contain the file names, is handled with a standard Fortran 2003 intrinsic subroutine. SNAP has been successfully built with, but is not necessarily limited to, gfortran and ifort. Moreover, the code has been built with the profiling tool Byfl. The accompanying Makefile provides sample build options for gfortran and ifort. The build system depends on the availability of MPI; both example builds assume the usage of mpif90 from an MPI installation. Builds may be selected by switching the COMPILER option in the Makefile or by choosing one with the "make COMPILER=[]" command. The builds also assume the availability of OpenMP. Compiling SNAP without MPI or OpenMP will require modification to the source code to remove related subroutine calls and directives.

MPI implementations typically suggest using a "wrapper" compiler to compile the code. SNAP has been built and tested with OpenMPI and MPICH. OpenMPI allows one to set the underlying Fortran compiler with the environment variable OMPI_FC, where the variable is set to the (path and) compiler of choice, e.g., ifort, gfortran, etc.
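For example, an OpenMPI build that uses ifort underneath might be invoked as follows (illustrative only; the Makefile lists the COMPILER values it actually supports):

export OMPI_FC=ifort
make COMPILER=ifort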

The Makefile is currently set up for several build options using different MPI wrappers and Fortran compilers. One example uses:

FORTRAN = mpif90
COMPILER = ifort

and testing has been performed with

OMPI_FC = [path]/ifort

Fortran compilation flags can be set according to the underlying compiler. The current flags are set for the ifort compiler and using OpenMP for parallel threading.

TARGET = isnap
FFLAGS = -O3 -[q]openmp -align array32byte -fp-model fast -fp-speculation fast -xHost
FFLAG2 =

where FFLAG2 is reserved for additional flags that may need to be applied differently, depending on the compiler. To make SNAP with these default settings, simply type

make

on the command line within the SNAP directory.

A debugging version of SNAP can be built by typing

make OPT=no

on the command line. The unoptimized, debugging version of SNAP features bounds checking, back-tracing an error, and the necessary debug compiler flags. With ifort, these flags appear as:

FFLAGS = -O0 -[q]openmp -g -check bounds -traceback -warn unused
FFLAG2 =

The values for these compilation variables have been modified for various Fortran compilers and the Makefile provides details of what has been used previously. These lines are commented out for clarity at this time and to ensure that changes to the build system are made carefully before attempting to rebuild with a different compiler.

The SNAP directory can be cleaned of its module and object files, if desired, with:

make clean

This removes all the *.mod and *.o files, as well as the *.bc files from Byfl builds. Moreover, it forces complete recompilation of all files upon the next instance of make or make OPT=no. Currently, there is no separate directory for the compilation files of optimized and unoptimized builds, so the user must do a make clean before rebuilding if the previous build used the other setting.
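For example, to switch from the default optimized build to the debugging build, clean first and then rebuild:

make clean
make OPT=no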

Pre-processing has been added for the inclusion/exclusion of MPI and OpenMP. To build without MPI, OpenMP, or both, use the command lines, respectively:

make MPI=no
make OPENMP=no
make MPI=no OPENMP=no

Default make settings will build with MPI and OpenMP included. These options can also be combined with the unoptimized build setting, OPT=no.
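For example, an unoptimized build without MPI can be produced with:

make MPI=no OPT=no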

Lastly, a line count report is generated with:

make count

The line count report excludes blank lines and comments. It counts the number of code lines in all *.f90 files and sums the results. The information is printed to the Lines file.

Usage

When SNAP is built with MPI, execute it with the following command:

mpirun -n [#] [path]/snap [infile] [outfile]

The number of OpenMP threads is set by the input file, overriding any environment variable (such as OMP_NUM_THREADS) that would otherwise set the thread count. Testing has shown that, to ensure proper concurrency of work, the above command can be modified to

mpirun -cpus-per-proc [#threads] -np [#procs] [path]/snap [infile] [outfile]

The command line is read for the input/output file names. If one of the names is missing, the code will not execute. Moreover, the output file overwrites any pre-existing files of the same name.

The specific command to invoke a run with MPI, and the corresponding options, may depend on the machine used for execution. Most notably, the "aprun" command is used on Cray systems.
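As an illustrative example only, the sample input in the next section uses npey=2, npez=2, and nthreads=2, so a matching OpenMPI launch of four ranks with two threads each would be:

mpirun -cpus-per-proc 2 -np 4 [path]/snap [infile] [outfile]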

Sample Input

The following is a sample input of a SNAP job. Several other examples are provided as part of the small set of regression testing. For more information about the valid range of values and descriptions of the input variables, please see the user manual.

! Input from namelist
&invar
  npey=2
  npez=2
  ichunk=2
  nthreads=2
  nnested=1
  ndimen=3
  nx=6
  lx=0.6
  ny=6
  ly=0.6
  nz=6
  lz=0.6
  nmom=1
  nang=10
  ng=4
  epsi=1.0E-4
  iitm=5
  oitm=30
  timedep=0
  tf=1.0
  nsteps=1
  mat_opt=0
  src_opt=0
  scatp=0
  it_det=0
  fluxp=0
  fixup=1
  soloutp=1
  kplane=0
  popout=0
  swp_typ=0
/

Sample Output

The following is the corresponding output to the above case. A brief outline of the output file contents is version and run time information, echo of input (or default) values of the namelist variables, echo of relevant parameters after setup, iteration monitor, mid-plane flux output, and the timing summary. Warning and error messages may be printed throughout the output file to alert the user to some problem with the execution. Unlike errors, warnings do not cause program termination.

 SNAP: SN (Discrete Ordinates) Application Proxy
 Version Number..  1.05
 Version Date..  02-19-2015
 Ran on  2-20-2015 at time 10:53:26

********************************************************************************

          keyword Input Echo - Values from input or default
********************************************************************************

  NML=invar
     npey=     2
     npez=     2
     ichunk=     2
     nthreads=     2
     nnested=   1
     ndimen=  3
     nx=     6
     ny=     6
     nz=     6
     lx=  6.0000E-01
     ly=  6.0000E-01
     lz=  6.0000E-01
     nmom=   1
     nang=   10
     ng=    4
     mat_opt=  0
     src_opt=  0
     scatp=  0
     epsi=  1.0000E-04
     iitm=   5
     oitm=   30
     timedep=  0
     tf=  1.0000E+00
     nsteps=     1
     swp_typ=  0
     it_det=  0
     soloutp=  1
     kplane=    0
     popout=  0
     fluxp=  0
     fixup=  1

********************************************************************************

          keyword Calculation Run-time Parameters Echo
********************************************************************************

  Geometry
    ndimen = 3
    nx =     6
    ny =     6
    nz =     6
    lx =  6.0000E-01
    ly =  6.0000E-01
    lz =  6.0000E-01
    dx =  1.0000E-01
    dy =  1.0000E-01
    dz =  1.0000E-01

  Sn
    nmom = 1
    nang =   10
    noct = 8

    w =  1.2500E-02   ... uniform weights

          mu              eta               xi
     5.00000000E-02   9.50000000E-01   3.08220700E-01
     1.50000000E-01   8.50000000E-01   5.04975247E-01
     2.50000000E-01   7.50000000E-01   6.12372436E-01
     3.50000000E-01   6.50000000E-01   6.74536878E-01
     4.50000000E-01   5.50000000E-01   7.03562364E-01
     5.50000000E-01   4.50000000E-01   7.03562364E-01
     6.50000000E-01   3.50000000E-01   6.74536878E-01
     7.50000000E-01   2.50000000E-01   6.12372436E-01
     8.50000000E-01   1.50000000E-01   5.04975247E-01
     9.50000000E-01   5.00000000E-02   3.08220700E-01

  Material Map
    mat_opt = 0   -->   nmat = 1
    Base material (default for every cell) = 1

  Source Map
    src_opt = 0
    Source strength per cell (where applied) = 1.0
    Source map:
        Starting cell: (     1,     1,     1 )
        Ending cell:   (     6,     6,     6 )

  Pseudo Cross Sections Data
    ng =   4

    Material 1
    Group         Total         Absorption      Scattering
       1       1.000000E+00    5.000000E-01    5.000000E-01
       2       1.010000E+00    5.050000E-01    5.050000E-01
       3       1.020000E+00    5.100000E-01    5.100000E-01
       4       1.030000E+00    5.150000E-01    5.150000E-01

  Solution Control Parameters
    epsi =  1.0000E-04
    iitm =   5
    oitm =   30
    timedep = 0
    swp_typ = 0
    it_det = 0
    soloutp = 1
    kplane =    0
    popout = 0
    fluxp = 0
    fixup = 1


  Parallelization Parameters
    npey =     2
    npez =     2
    nthreads =    2

      Thread Support Level
           0 - MPI_THREAD_SINGLE
           1 - MPI_THREAD_FUNNELED
           2 - MPI_THREAD_SERIALIZED
           3 - MPI_THREAD_MULTIPLE
    thread_level =  0

    .FALSE. nested threading
      nnested =    1

    Parallel Computational Efficiency = 0.8889

********************************************************************************

          keyword Iteration Monitor
********************************************************************************
  Outer
    1    Dfmxo= 3.5528E-01    No. Inners=   17
    2    Dfmxo= 1.7376E-01    No. Inners=   14
    3    Dfmxo= 8.6338E-03    No. Inners=    9

  No. Outers=   3    No. Inners=   40

********************************************************************************

          keyword Scalar Flux Solution
********************************************************************************

 ***********************************
  Group=   1   Z Mid-Plane=    4
 ***********************************

     y    x    1      x    2      x    3      x    4      x    5      x    6
     6  1.8403E-01  2.3461E-01  2.4743E-01  2.4743E-01  2.3461E-01  1.8403E-01
     5  2.3461E-01  2.9818E-01  3.1572E-01  3.1572E-01  2.9818E-01  2.3461E-01
     4  2.4743E-01  3.1572E-01  3.3604E-01  3.3604E-01  3.1572E-01  2.4743E-01
     3  2.4743E-01  3.1572E-01  3.3604E-01  3.3604E-01  3.1572E-01  2.4743E-01
     2  2.3461E-01  2.9818E-01  3.1572E-01  3.1572E-01  2.9818E-01  2.3461E-01
     1  1.8403E-01  2.3461E-01  2.4743E-01  2.4743E-01  2.3461E-01  1.8403E-01

********************************************************************************


 ***********************************
  Group=   2   Z Mid-Plane=    4
 ***********************************

     y    x    1      x    2      x    3      x    4      x    5      x    6
     6  1.8434E-01  2.3510E-01  2.4797E-01  2.4797E-01  2.3510E-01  1.8434E-01
     5  2.3510E-01  2.9891E-01  3.1654E-01  3.1654E-01  2.9891E-01  2.3510E-01
     4  2.4797E-01  3.1654E-01  3.3697E-01  3.3697E-01  3.1654E-01  2.4797E-01
     3  2.4797E-01  3.1654E-01  3.3697E-01  3.3697E-01  3.1654E-01  2.4797E-01
     2  2.3510E-01  2.9891E-01  3.1654E-01  3.1654E-01  2.9891E-01  2.3510E-01
     1  1.8434E-01  2.3510E-01  2.4797E-01  2.4797E-01  2.3510E-01  1.8434E-01

********************************************************************************


 ***********************************
  Group=   3   Z Mid-Plane=    4
 ***********************************

     y    x    1      x    2      x    3      x    4      x    5      x    6
     6  1.8990E-01  2.4282E-01  2.5648E-01  2.5648E-01  2.4282E-01  1.8990E-01
     5  2.4282E-01  3.0956E-01  3.2828E-01  3.2828E-01  3.0956E-01  2.4282E-01
     4  2.5648E-01  3.2828E-01  3.4996E-01  3.4996E-01  3.2828E-01  2.5648E-01
     3  2.5648E-01  3.2828E-01  3.4996E-01  3.4996E-01  3.2828E-01  2.5648E-01
     2  2.4282E-01  3.0956E-01  3.2828E-01  3.2828E-01  3.0956E-01  2.4282E-01
     1  1.8990E-01  2.4282E-01  2.5648E-01  2.5648E-01  2.4282E-01  1.8990E-01

********************************************************************************


 ***********************************
  Group=   4   Z Mid-Plane=    4
 ***********************************

     y    x    1      x    2      x    3      x    4      x    5      x    6
     6  2.2018E-01  2.8475E-01  3.0277E-01  3.0277E-01  2.8475E-01  2.2018E-01
     5  2.8475E-01  3.6725E-01  3.9202E-01  3.9202E-01  3.6725E-01  2.8475E-01
     4  3.0277E-01  3.9202E-01  4.2062E-01  4.2062E-01  3.9202E-01  3.0277E-01
     3  3.0277E-01  3.9202E-01  4.2062E-01  4.2062E-01  3.9202E-01  3.0277E-01
     2  2.8475E-01  3.6725E-01  3.9202E-01  3.9202E-01  3.6725E-01  2.8475E-01
     1  2.2018E-01  2.8475E-01  3.0277E-01  3.0277E-01  2.8475E-01  2.2018E-01

********************************************************************************

          keyword Timing Summary
********************************************************************************

  Code Section                          Time (seconds)
 **************                        ****************
    Parallel Setup                       9.7394E-04
    Input                                4.7112E-04
    Setup                                7.1216E-04
    Solve                                7.6568E-03
       Parameter Setup                   2.8968E-04
       Outer Source                      3.0041E-05
       Inner Iterations                  7.0901E-03
          Inner Source                   5.4121E-05
          Transport Sweeps               6.6264E-03
          Inner Misc Ops                 4.0960E-04
       Solution Misc Ops                 2.4700E-04
    Output                               6.4492E-04
  Total Execution time                   3.4652E-02

  Grind Time (nanoseconds)         1.1078E+01

********************************************************************************

Additional outputs in the form of slgg and flux files are available when requested according to the scatp and fluxp input variables, respectively.

License

Los Alamos National Security, LLC owns the copyright to "SNAP: SN (Discrete Ordinates) Application Proxy, Version 1.x (C13087)". The license is BSD with standard clauses regarding indicating modifications before future redistribution:

Copyright (c) 2013, Los Alamos National Security, LLC All rights reserved.

Copyright 2013. Los Alamos National Security, LLC. This software was produced under U.S. Government contract DE-AC52-06NA25396 for Los Alamos National Laboratory (LANL), which is operated by Los Alamos National Security, LLC for the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this software. NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY, LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY FOR THE USE OF THIS SOFTWARE. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL.

Additionally, redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of Los Alamos National Security, LLC, Los Alamos National Laboratory, LANL, the U.S. Government, nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY LOS ALAMOS NATIONAL SECURITY, LLC AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LOS ALAMOS NATIONAL SECURITY, LLC OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Classification

SNAP is Unclassified and contains no Unclassified Controlled Nuclear Information. It has been assigned Los Alamos Computer Code number LA-CC-13-016.

Authors

Joe Zerr, rzerr _ at _ lanl.gov

Randal Baker, rsb _ at _ lanl.gov

Additional Contacts

Mike Lang, mlang _ at _ lanl.gov

Josip Loncaric, josip _ at _ lanl.gov

Last Modification to this Readme

03/03/2016

snap's People

Contributors

junghans, mewall, spad12, womeld, zerr


snap's Issues

Fails to compile with `MPI=no`

On lines 515 and 516 of thrd_comm.f90, a call to tests is made with one argument. However, if -DMPI is not specified, the tests subroutine defined in plib.f90 takes two arguments. As a result, if the user sets MPI = no in the Makefile, SNAP fails to compile.

CALL tests ( reqs(cor) )

Compilation Error

I am trying to run your code and I am unable to get the right libraries in the makefile.

[cc@kubeib-1 src]$ make
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c global.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c sn.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c data.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c geom.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c control.f90
cpp -P -DMPI -DOPENMP time.F90 >time.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c time.f90
cpp -P -DMPI -DOPENMP plib.F90 >plib.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c plib.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c mms.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c solvar.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c dealloc.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c utils.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c version.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c input.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c setup.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c output.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c snap_main.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c expxs.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c thrd_comm.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c dim1_sweep.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c dim3_sweep.f90
mpif90 -Ofast -funroll-loops -march=native -fopenmp -c mkba_sweep.f90
mkba_sweep.f90:161.8:

!$OMP PARALLEL DO NUM_THREADS(nnstd_used) IF(nnstd_used>1) &
1
Error: Unclassifiable OpenMP directive at (1)
mkba_sweep.f90:446.23:

!$OMP END PARALLEL DO
1
Error: Unexpected !$OMP END PARALLEL DO statement at (1)
make: *** [mkba_sweep.o] Error 1

MPI tag can overflow

The MPI tag value can overflow when using Cray MPI:

Rank 65531 [Thu Jun 22 21:21:27 2017] [c4-5c0s13n0] Fatal error in PMPI_Isend: Invalid tag, error stack:
PMPI_Isend(161): MPI_Isend(buf=0x2aad1623efc0, count=3840, MPI_DOUBLE_PRECISION, dest=256, tag=2097153, comm=0x84000006, request=0x2aad35ffe280) failed
PMPI_Isend(108): Invalid tag, value is 2097153
Rank 65273 [Thu Jun 22 21:21:28 2017] [c4-5c0s3n0] Fatal error in MPI_Recv: Invalid tag, error stack:
MPI_Recv(212): MPI_Recv(buf=0x2aad16c7a000, count=3840, MPI_DOUBLE_PRECISION, src=253, tag=2097153, comm=0xc4000000, status=0x2aad2dffc000) failed
MPI_Recv(118): Invalid tag, value is 2097153
Rank 65530 [Thu Jun 22 21:21:27 2017] [c4-5c0s13n0] Fatal error in MPI_Recv: Invalid tag, error stack:
MPI_Recv(212): MPI_Recv(buf=0x2aad15e03f80, count=3840, MPI_DOUBLE_PRECISION, src=257, tag=2097153, comm=0x84000006, status=0x2aad31ffc000) failed
MPI_Recv(118): Invalid tag, value is 2097153
forrtl: error (76): Abort trap signal

The maximum valid tag in cray-mpich/7.4.4 is 2097151 (which is 2^21 - 1). The MPI standard specifies that the tag upper bound must be at least 32767. Ideally the tag value in SNAP should be kept below the value specified by the MPI standard.

This error happened when running the APEX "Grand Challenge" SNAP problem on 8192 nodes of Cori-KNL at NERSC with 65532 MPI ranks (npey=258, npez=254) and 8 OpenMP threads per MPI rank.
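One possible mitigation, sketched here only as an illustration and not as a patch from the SNAP developers, is to query the implementation's tag upper bound once and fold any computed tag back into the valid range; the raw_tag and safe_tag names below are hypothetical, not SNAP variables:

SUBROUTINE fold_tag ( raw_tag, safe_tag )
  USE mpi
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: raw_tag
  INTEGER, INTENT(OUT) :: safe_tag
  INTEGER :: ierr
  INTEGER(KIND=MPI_ADDRESS_KIND) :: tag_ub
  LOGICAL :: flag
  ! MPI_TAG_UB is a predefined attribute on MPI_COMM_WORLD
  CALL MPI_COMM_GET_ATTR ( MPI_COMM_WORLD, MPI_TAG_UB, tag_ub, flag, ierr )
  IF ( .NOT. flag ) tag_ub = 32767          ! fall back to the standard's guaranteed minimum
  safe_tag = MOD ( raw_tag, INT( tag_ub ) )
END SUBROUTINE fold_tag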

Large stack arrays causing segfaults

In both the inner and outer convergence checking, the df array is allocated on the stack. When we build with the Arm 19.2 compiler, and if this array happens to be fairly large (say >1MB), we see a segmentation fault at runtime.

One fix might be to allocate df on the heap instead:

diff --git a/src/inner.f90 b/src/inner.f90
index fa0e8fe..63877a4 100644
--- a/src/inner.f90
+++ b/src/inner.f90
@@ -275,7 +275,8 @@ MODULE inner_module
 
     INTEGER(i_knd) :: n, g
 
-    REAL(r_knd), DIMENSION(nx,ny,nz,ng_per_thrd) :: df
+    REAL(r_knd), ALLOCATABLE, DIMENSION(:,:,:,:) :: df
+    ALLOCATE(df(nx,ny,nz,ng_per_thrd))
 !_______________________________________________________________________
 !
 !   Thread group loops for computing local difference (df) array.
@@ -330,6 +331,8 @@ MODULE inner_module
 !_______________________________________________________________________
 !_______________________________________________________________________
 
+  DEALLOCATE(df)
+
   END SUBROUTINE inner_conv
 
 
diff --git a/src/outer.f90 b/src/outer.f90
index 849cd95..1fc020b 100644
--- a/src/outer.f90
+++ b/src/outer.f90
@@ -299,7 +299,8 @@ MODULE outer_module
 
     REAL(r_knd) :: dft
 
-    REAL(r_knd), DIMENSION(nx,ny,nz,ng_per_thrd) :: df
+    REAL(r_knd), DIMENSION(:,:,:,:), ALLOCATABLE :: df
+    ALLOCATE(df(nx,ny,nz,ng_per_thrd))
 !_______________________________________________________________________
 !
 !   Thread to speed up computation of df by looping over groups. Rejoin
@@ -347,6 +348,8 @@ MODULE outer_module
       otrdone = .TRUE.
 
   !$OMP END MASTER
+
+  DEALLOCATE(df)
 !_______________________________________________________________________
 !_______________________________________________________________________

A better way would be to rewrite the loops as a reduction to the scalar dfmxi(g) to avoid needing this array entirely.
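A minimal sketch of that reduction, assuming illustrative array and tolerance names (flux_new, flux_old, tolr) that may differ from SNAP's, could look like:

SUBROUTINE group_conv_max ( nx, ny, nz, flux_new, flux_old, tolr, dfmx )
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nx, ny, nz
  REAL(KIND=8), DIMENSION(nx,ny,nz), INTENT(IN) :: flux_new, flux_old
  REAL(KIND=8), INTENT(IN) :: tolr
  REAL(KIND=8), INTENT(OUT) :: dfmx
  INTEGER :: i, j, k
  REAL(KIND=8) :: df
  dfmx = 0.0_8
  DO k = 1, nz
    DO j = 1, ny
      DO i = 1, nx
        IF ( ABS( flux_new(i,j,k) ) > tolr ) THEN
          df = ABS( flux_old(i,j,k)/flux_new(i,j,k) - 1.0_8 )  ! relative change
        ELSE
          df = ABS( flux_old(i,j,k) - flux_new(i,j,k) )        ! absolute change near zero
        END IF
        dfmx = MAX( dfmx, df )   ! scalar running maximum; no (nx,ny,nz) temporary needed
      END DO
    END DO
  END DO
END SUBROUTINE group_conv_max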

The executable hangs with no MPI

I use the following input file:

! Input from namelist
&invar
  npey=1
  npez=1
  ichunk=2
  nthreads=2
  nnested=1
  ndimen=3
  nx=6
  lx=0.6
  ny=6
  ly=0.6
  nz=6
  lz=0.6
  nmom=1
  nang=10
  ng=4
  epsi=1.0E-4
  iitm=5
  oitm=30
  timedep=0
  tf=1.0
  nsteps=1
  mat_opt=0
  src_opt=0
  scatp=0
  it_det=0
  fluxp=0
  fixup=1
  soloutp=1
  kplane=0
  popout=0
  swp_typ=0
/

and I compile with:

--- a/src/Makefile
+++ b/src/Makefile
@@ -7,13 +7,14 @@ FFLAG2 =
 DEFS =
 PP =
 
-MPI = yes
-OPENMP = yes
+MPI = no
+OPENMP = no
 
-FORTRAN = mpif90
+#FORTRAN = mpif90
 #FORTRAN = mpifort
 #FORTRAN = ftn
 #FORTRAN = mpiifort
+FORTRAN = gfortran
 
 TARGET = gsnap
 #TARGET = isnap

I run it with:

$ ./gsnap in1 out1        
          keyword Iteration Monitor
********************************************************************************
  Outer

It just hangs there with no other output produced; the file out1 is also empty. When I compile with MPI it seems to work, even with just one MPI rank (it finishes in under 1 s).

Is it supposed to run without MPI?

Run time failures on Cori supercomputer at NERSC

Hello SNAP developers,

I am using the Cori KNL partition and running the version of SNAP at http://www.nersc.gov/research-and-development/apex/apex-benchmarks/snap/. I have compiled SNAP with Intel compiler version 17.0.3.191 and am using the "extra large" benchmark problem. I have encountered failures when using 41,472 and 82,944 MPI ranks.

My job output shows the following:

MPICH2 ERROR [Rank 1683] [job id 10004815419] [Thu May  4 04:50:24 2017] [c6-1c2s11n1] [nid03629] - xpmem_seglist_lookup(): failed lookup for src rank 2

Rank 1683 [Thu May  4 04:50:24 2017] [c6-1c2s11n1] xpmem_seglist_lookup failed

The core file shows that both xpmem_seglist_lookup and PMPI_Recv are in the call path:

(gdb) bt
#0  0x000000002022a8fb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x00000000203eb945 in abort () at abort.c:99
#2  0x000000002032aa1e in for.issue_diagnostic ()
#3  0x000000002032e8d4 in for.signal_handler ()
#4  <signal handler called>
#5  0x000000002022a8fb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#6  0x00000000203eb888 in abort () at abort.c:78
#7  0x00000000200b2f22 in MPID_Abort ()
#8  0x00000000200ca693 in xpmem_seglist_lookup ()
#9  0x00000000200ca776 in do_xpmem_attach ()
#10 0x00000000200cb88d in MPID_nem_lmt_xpmem_start_recv ()
#11 0x00000000200c91a3 in do_cts ()
#12 0x00000000200c9f58 in pkt_RTS_handler ()
#13 0x00000000200c1f2c in MPIDI_CH3I_Progress ()
#14 0x00000000200814f0 in PMPI_Recv ()
#15 0x000000002006deed in pmpi_recv__ ()
#16 0x000000002000c1a6 in plib_module_mp_precv_d_3d_ ()
#17 0x0000000020058ff1 in thrd_comm_module_mp_sweep_recv_bdry_ ()
#18 0x0000000020048de4 in dim3_sweep_module_mp_dim3_sweep_ ()
#19 0x000000002003be4d in octsweep_module_mp_octsweep_ ()
#20 0x000000002003a966 in sweep_module_mp_sweep_ ()
#21 0x00000000200337be in inner_module_mp_inner_ ()
#22 0x000000002002b8f8 in outer_module_mp_outer_ ()
#23 0x000000002002050b in translv_ ()
#24 0x0000000020295573 in __kmp_invoke_microtask ()
#25 0x000000002024fba0 in __kmp_invoke_task_func ()
#26 0x000000002024ee35 in __kmp_launch_thread ()
#27 0x00000000202959c1 in _INTERNAL_26_______src_z_Linux_util_cpp_47afea4b::__kmp_launch_worker(void*) ()
#28 0x0000000020214134 in start_thread (arg=0x2aab163d5800) at pthread_create.c:309
#29 0x0000000020450f69 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

It looks like memory may be getting corrupted. Have you seen this error before? Any suggestions to fix the error?

Thanks,
Chris

Fails to compile with the NAG compiler

nagfor  -c setup.f90
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7026
Error: setup.f90, line 860: Missing comma in format specification
       detected at 'Group         Total         Absorption      '@'Scattering'
Error: setup.f90, line 945: Missing comma in format specification
       detected at 'Column-order loops:'@' Mats (fastest ), Moments, Groups, Groups (slowest)'
[NAG Fortran Compiler pass 1 error termination, 2 errors]
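The messages suggest format strings that omit the comma between adjacent edit descriptors, which several compilers accept as an extension but NAG rejects. An illustrative example of the pattern and a standard-conforming fix (not the verbatim setup.f90 statements):

PROGRAM fmt_demo
  IMPLICIT NONE
  ! Pattern NAG rejects (comma omitted between edit descriptors, a common extension):
  !   WRITE (*, '(4X, "Group" 9X, "Total")')
  ! Standard-conforming form, with the comma added:
  WRITE (*, '(4X, "Group", 9X, "Total")')
END PROGRAM fmt_demo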
