The error above appears to be related to SSH access to your headnode (possibly a name-resolution glitch).
I would suggest you try either of the following options.
- Log in to your headnode (e.g. azhpc-connect headnode) and manually run the failed script (e.g. beegfspkgs.sh): on the headnode, cd az*/install and run ./*beegfspkgs.sh.
- Re-run the failed install step (e.g. azhpc run_install --step STEP_NUMBER).
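To test the name-resolution theory before re-running anything, a small check like the one below can be run on the headnode. It is a sketch that assumes each hostlist file has one hostname per line and that getent is available (it is on CentOS 7); the hostlist file name in the example comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: check that every host in a hostlist file resolves, to confirm or
# rule out a name-resolution problem before re-running a failed step.
# Assumption: one hostname per line; blank lines are ignored.

# check_resolution FILE
# Prints each host that does not resolve to stderr and returns 1 if any failed.
check_resolution() {
    local hostfile="$1" host failed=0
    # Default IFS trims surrounding whitespace; "|| [ -n ... ]" handles a
    # last line without a trailing newline.
    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue
        if ! getent hosts "$host" > /dev/null; then
            echo "unresolved: $host" >&2
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}

# Example on the headnode ("compute" is a placeholder hostlist name):
#   check_resolution ~/az*/hostlist/compute && cd ~/az*/install && ./*beegfspkgs.sh
```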
from azurehpc.
Also, which OS image are you using? I tested on OpenLogic:CentOS-HPC:7.7:7.7.2020062600.
Thanks.
@garvct I'm using a base image that I built on top of OpenLogic:CentOS:7_7-gen2:latest.
@garvct I created a cluster without BeeGFS and manually installed everything I needed on one compute node. But I realized I can't create an image from that compute node, since it is a scale set instance rather than a standalone VM. Do you know of a way around that?
Thanks for clarifying which OS version you are using. I will run some tests to verify.
I am not aware of a way to create a custom image from a VMSS instance (sorry). Instead, you could put everything you installed manually on the VMSS instance into a script and run it on all your VMSS instances with a parallel shell (pdsh or pssh). Your VMSS host list is in hpcadmin@headnode:~/az*/hostlist/*.
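Here is a minimal sketch of the parallel-shell approach with pdsh. It assumes pdsh and pdcp are installed on the headnode and on every compute node (pdcp must also be present on the target side), that passwordless ssh and passwordless sudo work from the headnode, and that setup.sh is a hypothetical script holding the commands you ran by hand.

```shell
#!/usr/bin/env bash
# Sketch: replay manual setup steps on every VMSS instance with pdsh,
# instead of capturing a custom image from a scale set instance.
# Assumptions: pdsh/pdcp on all nodes, passwordless ssh and sudo,
# one hostname per line in each hostlist file.

# merge_hostlists FILE...
# Prints the unique, non-empty host names from all given hostlist files.
merge_hostlists() {
    awk 'NF { print $1 }' "$@" | sort -u
}

# run_everywhere HOSTFILE SCRIPT
# Copies SCRIPT to /tmp on each host, then runs it with sudo in parallel.
run_everywhere() {
    local hostfile="$1" script="$2" name
    name="$(basename "$script")"
    export PDSH_RCMD_TYPE=ssh                       # use ssh, not rsh
    pdcp -w "^$hostfile" "$script" "/tmp/$name"     # "^file" = read hosts from file
    pdsh -w "^$hostfile" "sudo bash /tmp/$name"
}

# Example on the headnode ("compute" and setup.sh are placeholders):
#   merge_hostlists ~/az*/hostlist/compute > /tmp/vmss_hosts
#   run_everywhere /tmp/vmss_hosts ./setup.sh
```

Merging through sort -u means a host that appears in several hostlist files only runs the script once.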
You should be able to use AzureHPC to deploy without BeeGFS. Alternatively, if you want to install BeeGFS with your own scripts, create a local scripts directory, copy your scripts there, and use tags in your config file to indicate where each script should run.
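A sketch of staging your own scripts. It assumes azurehpc looks for a directory named scripts next to config.json; the install entries and resource tags still have to be added to config.json by hand, and the script names in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stage custom install scripts in a local "scripts" directory next
# to config.json. Assumption: azurehpc picks scripts up from that directory;
# each script is then referenced in the config's install steps with a tag,
# and the same tag is added to the resources it should run on.

# stage_scripts CONFIG_DIR SCRIPT...
# Copies each SCRIPT into CONFIG_DIR/scripts and marks it executable.
stage_scripts() {
    local config_dir="$1"; shift
    local dest="$config_dir/scripts" src
    mkdir -p "$dest"
    for src in "$@"; do
        cp "$src" "$dest/"
        chmod +x "$dest/$(basename "$src")"
    done
}

# Example (my_beegfs_*.sh are hypothetical script names):
#   stage_scripts ~/beegfs_azurehpc ~/my_scripts/my_beegfs_*.sh
```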
I was able to deploy BeeGFS successfully on OpenLogic:CentOS:7_7-gen2:latest:
[2020-11-11 14:30:52] deploying arm template
headnode Microsoft.Compute/virtualMachines OK
headnode_nic Microsoft.Network/networkInterfaces Created
beegfsm Microsoft.Compute/virtualMachines OK
compute Microsoft.Compute/virtualMachineScaleSets OK
beegfssm Microsoft.Compute/virtualMachineScaleSets OK
beegfsm_nic Microsoft.Network/networkInterfaces Created
hpcvnet Microsoft.Network/virtualNetworks OK
headnode_nsg Microsoft.Network/networkSecurityGroups OK
headnode_pip Microsoft.Network/publicIPAddresses OK
[2020-11-11 14:33:36] Provising succeeded
[2020-11-11 14:33:36] re-evaluating the config
[2020-11-11 14:33:36] building host lists
[2020-11-11 14:34:06] building install scripts
[2020-11-11 14:34:40] Step 00 : install_node_setup.sh (jumpbox_script)
[2020-11-11 14:36:08] duration: 88 seconds
[2020-11-11 14:36:35] Step 01 : disable-selinux.sh (jumpbox_script)
[2020-11-11 14:37:02] duration: 27 seconds
[2020-11-11 14:37:29] Step 02 : beegfspkgs.sh (jumpbox_script)
[2020-11-11 14:39:54] duration: 145 seconds
[2020-11-11 14:40:20] Step 03 : beegfsm.sh (jumpbox_script)
[2020-11-11 14:40:54] duration: 34 seconds
[2020-11-11 14:41:20] Step 04 : beegfssd.sh (jumpbox_script)
[2020-11-11 14:42:19] duration: 59 seconds
[2020-11-11 14:42:45] Step 05 : beegfsmd.sh (jumpbox_script)
[2020-11-11 14:43:19] duration: 33 seconds
[2020-11-11 14:43:45] Step 06 : beegfsc.sh (jumpbox_script)
[2020-11-11 14:45:24] duration: 99 seconds
[2020-11-11 14:45:50] Step 07 : cndefault.sh (jumpbox_script)
[2020-11-11 14:46:25] duration: 35 seconds
[2020-11-11 14:46:51] Step 08 : create_raid0.sh (jumpbox_script)
[2020-11-11 14:47:38] duration: 47 seconds
[2020-11-11 14:48:04] Step 09 : make_filesystem.sh (jumpbox_script)
[2020-11-11 14:49:45] duration: 101 seconds
[2020-11-11 14:50:14] Step 10 : install-nfsserver.sh (jumpbox_script)
[2020-11-11 14:50:46] duration: 32 seconds
[2020-11-11 14:51:13] Step 11 : nfsclient.sh (jumpbox_script)
[2020-11-11 14:51:53] duration: 41 seconds
[2020-11-11 14:52:19] Step 12 : localuser.sh (jumpbox_script)
[2020-11-11 14:52:47] duration: 27 seconds
[2020-11-11 14:53:13] Step 13 : pbsdownload.sh (jumpbox_script)
[2020-11-11 14:53:40] duration: 28 seconds
[2020-11-11 14:54:06] Step 14 : pbsserver.sh (jumpbox_script)
[2020-11-11 14:55:39] duration: 92 seconds
[2020-11-11 14:56:05] Step 15 : pbsclient.sh (jumpbox_script)
[2020-11-11 14:56:38] duration: 33 seconds
linuxuser@Surface_laptop:~/beegfs_azurehpc$ vi config.json
linuxuser@Surface_laptop:~/beegfs_azurehpc$ azhpc-connect headnode
[2020-11-11 15:38:51] logging directly into headnode6027f3.southcentralus.cloudapp.azure.com
[hpcadmin@headnode ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 8.6M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda2 30G 2.8G 27G 10% /
/dev/sda1 494M 65M 430M 13% /boot
/dev/sda15 495M 12M 484M 3% /boot/efi
/dev/sdb1 63G 2.1G 58G 4% /mnt/resource
beegfs_nodev 3.5T 134M 3.5T 1% /beegfs
/dev/md10 4.0T 35M 4.0T 1% /share
tmpfs 3.2G 0 3.2G 0% /run/user/0
tmpfs 3.2G 0 3.2G 0% /run/user/1000
[hpcadmin@headnode ~]$
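To double-check that /beegfs is an actual BeeGFS mount on a given node (and not just an empty directory), something like the following works on Linux by reading /proc/mounts. It is a sketch: mount points containing spaces are not handled, and it only checks the local node.

```shell
#!/usr/bin/env bash
# Sketch: verify that a mount point carries a BeeGFS filesystem.
# Reads /proc/mounts (Linux only); paths with spaces, which appear there
# escaped as \040, are not handled.

# mount_fstype MOUNTPOINT
# Prints the filesystem type mounted at MOUNTPOINT, or nothing if none.
# If several mounts stack on the same path, the last (topmost) one wins.
mount_fstype() {
    awk -v m="$1" '$2 == m { t = $3 } END { if (t != "") print t }' /proc/mounts
}

# is_beegfs_mounted [MOUNTPOINT]   (defaults to /beegfs)
is_beegfs_mounted() {
    [ "$(mount_fstype "${1:-/beegfs}")" = "beegfs" ]
}

# Example:
#   is_beegfs_mounted && echo "BeeGFS OK" || echo "BeeGFS not mounted"
```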
@Smahane, can you please give an update? Are you still blocked?
@xpillons Yes, I still couldn't get it to work with OpenLogic:CentOS-HPC:7_7-gen2:latest.
I moved on by creating a cluster without it.
Thanks,