
Comments (8)

gassmoeller commented on June 20, 2024

Hi @RobbieTheK, sorry for being slow to respond. Yes, both issues seem to be related to the interaction between the images and the system you are running the images on. To say more I would need to know more about the system you are using, but here is some general information about the images:

  • The first error when you used geodynamics/aspect:latest-tacc looks like an MPI error. It looks like you are not using one of the TACC systems to run this, is this correct? In this case the specialized MPI inside this container is likely just not working with your system (e.g. it might try to communicate in a way that is not supported by your cluster).
  • The second image, geodynamics/aspect:latest, uses an unmodified OpenMPI that should work on most normal systems. It is not optimized for InfiniBand or other high-speed interconnects. The error in the second part of your message suggests you are running this on a cluster that uses SLURM. Are you sure your job script is set up so that it can start a program successfully? Check whether you can run something like echo in parallel, e.g. mpirun -np 4 echo hello. If this does not succeed (i.e. you don't see 4 output lines displaying hello), then your problem is not related to ASPECT or its Docker image, but to how you set up your job on the cluster.
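
The echo check described above can be wrapped in a small helper that counts the output lines. This is only a sketch: the function name check_parallel_launch is made up for illustration and is not part of ASPECT or its images, and it assumes the launcher you pass in (for example mpirun -np 4) is on your PATH.

```shell
# Hypothetical helper (not part of ASPECT or its images).
# Usage: check_parallel_launch EXPECTED LAUNCHER [LAUNCHER_ARGS...]
# Runs "LAUNCHER ... echo hello" and checks that exactly EXPECTED
# lines reading "hello" come back.
check_parallel_launch() {
  cpl_expected=$1
  shift
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps that from aborting scripts that run under "set -e".
  cpl_count=$("$@" echo hello 2>/dev/null | grep -c '^hello$' || true)
  if [ "$cpl_count" -eq "$cpl_expected" ]; then
    echo "OK: got $cpl_count lines of hello"
    return 0
  fi
  echo "FAIL: expected $cpl_expected lines of hello, got $cpl_count"
  return 1
}
```

For example, check_parallel_launch 4 mpirun -np 4 should print the OK line if the launcher starts four processes. A FAIL line points at the job setup rather than at ASPECT.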

from aspect.

SomePersonSomeWhereInTheWorld commented on June 20, 2024

Re: 1st error: correct, this is not a TACC system. I was just trying to see if it would work.

This is a Bright Computing 9.1 cluster running RHEL 8 with Slurm 20, openmpi/gcc/64/4.1.5a1

I'm using an interactive srun job -c4 -n4 as options.

mpirun -np 4 echo hello
hello
hello
hello
hello

Same error:

singularity run aspect.sif aspect-release slab_detachment.prm
[g225:3682368] OPAL ERROR: Unreachable in file ext3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[g225:3682368] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!


gassmoeller commented on June 20, 2024

> I'm using an interactive srun job -c4 -n4 as options.

I have not tried using srun directly on this image. Can you instead set up a batch script that you run with sbatch or run the command interactively on a development node? The message seems to say that you need to compile MPI in a specific way to use srun and our docker image was not set up that way.
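
A minimal sketch of such a batch script follows. The file name aspect_job.sh, the job name, the time limit, and the resource requests are placeholders, and this has not been tested on the cluster described in this thread. The script simply runs the same singularity command as above, without going through srun.

```shell
# Write a hypothetical job script; all #SBATCH values are placeholders
# to adapt to your cluster.
cat > aspect_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=aspect-test
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Same command as in the interactive attempt, but started from the
# batch script itself instead of through srun.
singularity run aspect.sif aspect-release slab_detachment.prm
EOF

# Submit it afterwards with:
#   sbatch aspect_job.sh
```

Whether this avoids the PMI error depends on how the OpenMPI inside the image reacts to the SLURM environment of a batch job, so treat it as something to try rather than a confirmed fix.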


bangerth commented on June 20, 2024

@RobbieTheK Can I assume that using a batch script solved the problem?


SomePersonSomeWhereInTheWorld commented on June 20, 2024

No, I ended up building and compiling candi and deal.II. I'd be happy to try again with sbatch, but as I mentioned, I did try with srun to no avail.


bangerth commented on June 20, 2024

What I meant to ask is whether you found a way to make it work for you?


SomePersonSomeWhereInTheWorld commented on June 20, 2024


bangerth commented on June 20, 2024

I don't know that I have anything to offer. I don't know much about singularity (or containers in general) and I don't work on the TACC machines. I'm also not sure we have the resources as a project to really figure this out.

@gassmoeller @tjhei Do you have anything to offer? Or should we just say "We'd love to provide this, but we can't" and close the issue?

