Comments (9)
If you use the Azure Marketplace image CentOS-HPC 7.6 or 7.7 (preferred), it comes with the Mellanox OFED drivers installed and several MPI libraries preinstalled (OpenMPI, HPC-X, Intel MPI and MVAPICH2). This works well with HPC VMs such as HB, HC and HBv2, all of which support InfiniBand via SR-IOV. You can select among the MPI libraries with module commands (use "module av" to see which MPI libraries are available).
from azurehpc.
I'm using CentOS-HPC 7.7, but I don't see an ib0 interface. What am I missing here?
It would be interesting to see whether the device shows up. Are you able to get output from running 'lspci | grep -i Mell'? I assume that 'ifconfig' does not show ib0. This may help determine whether it's a software issue.
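The two checks above can be wrapped in small helpers. This is only a sketch: the function names `find_mellanox` and `has_iface` are made up for illustration, and it assumes that `lspci` is installed and that network interfaces are exposed under /sys/class/net.

```shell
# Sketch only: the helper names are hypothetical, not part of azurehpc.

# Print the Mellanox lines from lspci-style output read on stdin.
# The exit status is non-zero when nothing matches.
find_mellanox() {
    grep -i 'mellanox'
}

# Succeed if interface $1 exists under sysfs root $2 (default /sys/class/net).
has_iface() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# Usage on a VM:
#   lspci | find_mellanox || echo "no Mellanox device on the PCI bus"
#   has_iface ib0 || echo "ib0 is missing"
```

If the first check prints a device but the second reports ib0 as missing, the hardware is visible and the problem is more likely in the driver or module setup.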
This is HB60rs:
# lspci
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0750:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
So the InfiniBand card is present. Does 'lsmod | grep ib' show that the InfiniBand modules are loaded?
This is on Standard_NC24rs_v3:
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0001:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0002:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0003:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0004:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
7cb1:00:02.0 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
lsmod:
# lsmod|grep -i ib|sort
devlink 60067 2 mlx4_ib,mlx4_core
ib_core 255469 2 mlx4_ib,ib_uverbs
ib_uverbs 102208 1 mlx4_ib
libata 243133 3 pata_acpi,ata_generic,ata_piix
libcrc32c 12644 2 xfs,nf_conntrack
mlx4_core 315896 1 mlx4_ib
mlx4_ib 179001 0
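The lsmod listing above does not include `ib_ipoib`, the kernel module that provides IP-over-InfiniBand interfaces such as ib0, which may explain why the interface is absent on this VM. A minimal sketch for testing a single module by exact name, instead of a substring grep that also matches libata and libcrc32c; the function name `module_loaded` is hypothetical.

```shell
# Sketch only: the helper name is hypothetical, not part of azurehpc.

# Succeed if module $1 appears in the first column of lsmod-style
# output read on stdin; fail otherwise.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on a VM:
#   lsmod | module_loaded ib_ipoib || echo "ib_ipoib is not loaded"
```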
I switched to the HC44rs SKU and ib0 has an IP address.
Closing this issue now; please reopen it if you still have any problems.