infinityhub-ci's Introduction

InfinityHub-CI

The purpose of this repository is to provide a way to build containers similar to those provided on AMD's Infinity Hub.
Each build provides parameters to specify different source code branches, release versions of ROCm, OpenMPI, and UCX, and the Ubuntu version to build on.
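These parameters are passed to Docker as build arguments. As a minimal sketch, the GROMACS Dockerfile quoted in the issues further down declares ARG IMAGE (the base image) and ARG GROMACS_BRANCH, so a build with explicit values could look like the following. The build_gromacs function and the DRY_RUN switch are hypothetical conveniences, and other applications may use different argument names.

```shell
#!/usr/bin/env bash
# Sketch: build the GROMACS container with explicit build parameters.
# IMAGE and GROMACS_BRANCH are the ARG names declared in gromacs/docker/Dockerfile.
# build_gromacs and DRY_RUN are hypothetical helpers, not part of the repository.
set -euo pipefail

build_gromacs() {
  local base_image="${1:-rocm_gpu:6.0}"               # base image (Dockerfile default)
  local branch="${2:-develop_2023_amd_sprint_rocm6}"  # GROMACS branch (Dockerfile default)
  local tag="${3:-mycontainer/gromacs-hip}"           # name of the resulting image
  local -a cmd
  cmd=(docker build
    --build-arg "IMAGE=${base_image}"
    --build-arg "GROMACS_BRANCH=${branch}"
    -t "${tag}" .)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"   # only print the command
  else
    "${cmd[@]}"        # run it from the directory that holds the Dockerfile
  fi
}
```

For example, DRY_RUN=1 build_gromacs my/base:1 main prints the docker build command without running Docker, which makes it easy to check which parameters will be used.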

Single-Node Server Requirements

  • CPUs: X86_64 CPU(s)
  • GPUs: AMD Instinct MI300A APU(s), AMD Instinct MI200 GPU(s), AMD Instinct MI100 GPU(s)
  • Operating Systems: Ubuntu 20.04, Ubuntu 22.04, RHEL8, RHEL9, SLES 15 SP4
  • ROCm™ Driver: ROCm v5.x compatibility, ROCm v6.x compatibility
  • Container Runtimes: Docker Engine, Singularity
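One quick way to see whether a node's GPUs belong to the supported families is to look at the gfx target names that rocminfo reports. The sketch below assumes the usual target names (gfx908 for MI100, gfx90a for MI200, gfx942 for MI300A); treat that mapping as an assumption and confirm it in the ROCm documentation for your release. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify the gfx targets that rocminfo reports against the GPU list above.
# The gfx-to-product mapping is an assumption; confirm it in the ROCm documentation.
set -euo pipefail

gpu_family() {
  case "$1" in
    gfx908) echo "AMD Instinct MI100" ;;
    gfx90a) echo "AMD Instinct MI200" ;;
    gfx942) echo "AMD Instinct MI300A" ;;  # other MI300-series parts also report gfx942
    *) return 1 ;;
  esac
}

# Read rocminfo-style text on stdin; print one verdict per distinct gfx target.
check_gpus() {
  local target family
  grep -o 'gfx[0-9a-f]*' | sort -u | while read -r target; do
    if family="$(gpu_family "$target")"; then
      echo "${target}: supported (${family})"
    else
      echo "${target}: not in the supported list"
    fi
  done
}
```

On a node with ROCm installed, rocminfo | check_gpus prints one line per distinct GPU target found.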

For ROCm installation procedures and validation checks, see:

Applications (application, ROCm version used for the build, domains):

  • AMD ROCm with GPU-Aware MPI Container (ROCm 6.0): Tools, Libraries
  • AMD's implementation of GROMACS with HIP (ROCm latest): Molecular Dynamics
  • Amber (ROCm 4.5): Molecular Dynamics
  • BabelStream (ROCm 5.3): Benchmark
  • Cholla (ROCm latest): Astrophysics
  • Chroma (ROCm latest): Physics
  • CP2K (ROCm latest): Electronic Structure
  • Grid (ROCm latest): Physics
  • HPCG (ROCm latest): Benchmark
  • HPL-MxP (ROCm 5.3): Benchmark
  • Kokkos (ROCm latest): Tools, Libraries
  • LAMMPS (ROCm latest): Molecular Dynamics
  • MILC (ROCm latest): Physics
  • Mini-HACC (ROCm 4.5): Astrophysics
  • MPAS (ROCm latest): Climate, Weather
  • NAMD (ROCm 4.3/4.5): Molecular Dynamics
  • NEKO (ROCm latest): Computational Fluid Dynamics
  • NWChem (ROCm 5.3): Computational Chemistry
  • OpenFOAM (ROCm 5.7): Computational Fluid Dynamics
  • OpenMM (ROCm 5.7): Molecular Dynamics
  • PETSc (ROCm 5.7): Tools, Libraries
  • PIConGPU (ROCm 5.7): Physics
  • PyFR (ROCm latest): Tools, Libraries
  • QUDA (ROCm latest): Computational Chemistry
  • QMCPACK (ROCm latest): Quantum Monte Carlo Simulation
  • RAJA (ROCm latest): Tools, Libraries
  • RELION (ROCm 5.3): Electronic Structure
  • rocHPL (ROCm latest): Benchmark
  • SPECFEM3D Cartesian (ROCm latest): Geophysics
  • Trilinos (ROCm latest): Tools, Libraries
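Images built with Docker can also be run with Singularity, the other supported runtime. One way to do this, sketched here under the assumption that the image already exists in the local Docker daemon, is Singularity's docker-daemon:// build source. The to_sif helper, the DRY_RUN switch and the default output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert a locally built Docker image into a Singularity SIF file.
# to_sif and DRY_RUN are hypothetical helpers for illustration.
set -euo pipefail

to_sif() {
  local image="$1"                        # local Docker image including its tag, e.g. name:latest
  local name="${image##*/}"               # drop any repository prefix
  local sif="${2:-${name%%:*}.sif}"       # default output: image name without the tag
  local -a cmd
  cmd=(singularity build "$sif" "docker-daemon://${image}")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"   # only print the command
  else
    "${cmd[@]}"
  fi
}
```

For instance, to_sif mycontainer/gromacs-hip:latest would write gromacs-hip.sif in the current directory, which can then be started with singularity run.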

infinityhub-ci's People

Contributors

cmcknigh, doscherda, gsalinaslopez, mexicanwolfhpc, ronniec91, seanofthemillers, sidamd

infinityhub-ci's Issues

hpl-mxp: PMIX ERROR when running with Singularity

Why does the following command:
singularity run --writable-tmpfs --pwd /benchmark ./hpl-ai.sif mpirun -np 8 --map-by node:PE=1 hpl-ai -P 4 -B 2560 -N 332800
fail with the following error?
PMIX ERROR: NO-PERMISSIONS in file ../../../../../../../../../../opal/mca/pmix/pmix3x/pmix/src/mca/common/dstore/dstore_base.c at line 237

My Config:
ROCm 5.7
RHEL8
Singularity 4.0.2
4xMI250
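A workaround that is often suggested for this kind of PMIx dstore permission error, though not confirmed in this issue, is to make PMIx use its hash-based data store by setting PMIX_MCA_gds=hash. A minimal sketch reusing the command from the report; the run_hpl_ai helper and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a possible workaround for "PMIX ERROR: NO-PERMISSIONS ... dstore_base.c".
# Assumption: selecting PMIx's "hash" GDS component instead of the shared-memory
# dstore avoids the permission problem; this is not confirmed in the issue.
# run_hpl_ai and DRY_RUN are hypothetical helpers.
set -euo pipefail

run_hpl_ai() {
  # Singularity passes host environment variables into the container by default,
  # so mpirun inside hpl-ai.sif inherits this setting.
  export PMIX_MCA_gds=hash
  local -a cmd
  cmd=(singularity run --writable-tmpfs --pwd /benchmark ./hpl-ai.sif
    mpirun -np 8 --map-by node:PE=1 hpl-ai -P 4 -B 2560 -N 332800)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "PMIX_MCA_gds=${PMIX_MCA_gds} ${cmd[*]}"   # show what would run
  else
    "${cmd[@]}"
  fi
}
```

If the error persists, the variable has no effect on this setup and the cause lies elsewhere (for example, in how the container's temporary directories are mounted).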

rocm_gpu:6.0 is not found

I am trying to build a Docker image for GROMACS, but it seems rocm_gpu:6.0 doesn't exist. What is the correct image name for ROCm 6?
Dockerfile: https://github.com/amd/InfinityHub-CI/blob/main/gromacs/docker/Dockerfile#L1-L3

Docker build error:

$ docker build -t mycontainer/gromacs-hip . 
[+] Building 4.0s (2/2) FINISHED                                                                                           docker:default
 => [internal] load build definition from Dockerfile                                                                                 0.1s
 => => transferring dockerfile: 1.65kB                                                                                               0.0s
 => ERROR [internal] load metadata for docker.io/library/rocm_gpu:6.0                                                                3.6s
------
 > [internal] load metadata for docker.io/library/rocm_gpu:6.0:
------
Dockerfile:3
--------------------
   1 |     ARG IMAGE="rocm_gpu:6.0"
   2 |     
   3 | >>> FROM ${IMAGE}
   4 |     
   5 |     ARG GROMACS_BRANCH="develop_2023_amd_sprint_rocm6"
--------------------
ERROR: failed to solve: rocm_gpu:6.0: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
$
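The log shows Docker trying to pull docker.io/library/rocm_gpu:6.0, which does not exist in that registry, so the default base image has to exist locally before the build or be overridden with --build-arg IMAGE=... . Presumably it is meant to be built first from the repository's "AMD ROCm with GPU-Aware MPI Container" build; that is an assumption. The sketch below reads the default IMAGE value from a Dockerfile and reports whether that image is present locally; all function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find a Dockerfile's default base image and check whether it exists locally.
# default_base_image, have_local_image and check_base_image are hypothetical helpers.
set -euo pipefail

# Print the value of the first "ARG IMAGE=..." line, with quotes removed.
default_base_image() {
  awk '/^ARG IMAGE=/ { v = substr($0, index($0, "=") + 1); gsub(/"/, "", v); print v; exit }' "$1"
}

# Succeed only if the image is already in the local Docker image store.
have_local_image() {
  docker image inspect "$1" >/dev/null 2>&1
}

check_base_image() {
  local dockerfile="${1:-Dockerfile}" image
  image="$(default_base_image "$dockerfile")"
  if have_local_image "$image"; then
    echo "${image} is available locally"
  else
    echo "${image} is not available locally: build it first or pass --build-arg IMAGE=<existing image>"
    return 1
  fi
}
```

Running check_base_image from gromacs/docker before docker build catches the missing base image before Docker tries to pull it.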

Test

Testing to make sure the config is set up properly and we all get emails

TensorFlow container lacks some Python modules

I have been trying to run a simple TensorFlow + Horovod training task using the rocm-tensorflow-rocm5.5-tf2.11-dev TensorFlow container. Unfortunately, it lacks several Python modules. For instance:

$ python3 01_horovod_mnist.py 
2023-08-10 13:12:18.690627: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/scratch/pawsey0001/cdipietrantonio/cdipietrantonio-machinelearning/models/01_horovod_mnist.py", line 34, in <module>
    import horovod.tensorflow as hvd
  File "/usr/local/lib/python3.9/dist-packages/horovod-0.27.0-py3.9-linux-x86_64.egg/horovod/tensorflow/__init__.py", line 27, in <module>
    from horovod.tensorflow import elastic
  File "/usr/local/lib/python3.9/dist-packages/horovod-0.27.0-py3.9-linux-x86_64.egg/horovod/tensorflow/elastic.py", line 22, in <module>
    from horovod.common.elastic import run_fn, ObjectState
  File "/usr/local/lib/python3.9/dist-packages/horovod-0.27.0-py3.9-linux-x86_64.egg/horovod/common/elastic.py", line 20, in <module>
    from horovod.runner.elastic.worker import HostUpdateResult, WorkerNotificationManager
  File "/usr/local/lib/python3.9/dist-packages/horovod-0.27.0-py3.9-linux-x86_64.egg/horovod/runner/elastic/worker.py", line 21, in <module>
    from horovod.runner.common.util import network, secret
  File "/usr/local/lib/python3.9/dist-packages/horovod-0.27.0-py3.9-linux-x86_64.egg/horovod/runner/common/util/network.py", line 16, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
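To see up front which modules are missing, rather than discovering them one traceback at a time, a small check can be run inside the container. This is a minimal sketch; missing_modules is a hypothetical helper and PYTHON is an optional override for the interpreter.

```shell
#!/usr/bin/env bash
# Sketch: list which Python modules cannot be imported inside a container.
# missing_modules is a hypothetical helper; set PYTHON to pick another interpreter.
set -euo pipefail

missing_modules() {
  local py="${PYTHON:-python3}" mod
  for mod in "$@"; do
    # Print the module name if importing it fails for any reason.
    "$py" -c "import ${mod}" >/dev/null 2>&1 || echo "${mod}"
  done
}
```

Inside the container, missing_modules psutil horovod.tensorflow should list psutil. Assuming pip and network access are available, python3 -m pip install --user psutil is one possible stopgap; adding the package to a derived image is the more durable option.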
