Comments (8)

garvct commented on July 26, 2024

The error above seems to be related to accessing your headnode with ssh. (Maybe a glitch with name resolution?)

I would suggest you try either of the following options.

  • Log in to your headnode (e.g., azhpc-connect headnode) and manually run the failed script (e.g., beegfspkgs.sh): on the headnode, cd az*/install and run ./*beegfspkgs.sh

  • Re-run the failed install step (e.g., azhpc run_install --step STEP_NUMBER)

from azurehpc.

garvct commented on July 26, 2024

Also, could you clarify which OS you are using? I tested on OpenLogic:CentOS-HPC:7.7:7.7.2020062600

Thanks.

Smahane commented on July 26, 2024

@garvct I'm using a base image that I built on top of OpenLogic:CentOS:7_7-gen2:latest

Smahane commented on July 26, 2024

@garvct I created a cluster without BeeGFS and manually installed everything I needed on one compute node. But I realized I can't create an image from the compute node, since it's considered a "Scale set instance" and not a VM. Do you know if there is a way around that?

garvct commented on July 26, 2024

Thanks for clarifying the OS version you are using. I will run some tests to verify.

I am not aware of a way to create a custom image from a VMSS instance (sorry). You could put what you manually installed on a VMSS instance into a script and then use a parallel shell (pdsh or pssh) to run that script across your VMSS. (Your VMSS hostlist is in hpcadmin@headnode:~/az*/hostlist/*)
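
The parallel-shell approach could look roughly like the sketch below. It assumes pdsh is installed on the headnode, that the hostlist file has one hostname per line, and that the setup script sits on a path every node can see (a shared filesystem, for example). The function name run_on_hostlist and the DRY_RUN switch are hypothetical helpers for illustration, not part of AzureHPC.

```shell
# Sketch: run a custom setup script on every VMSS instance with pdsh.
# Assumptions (not from the thread): hostlist has one hostname per line,
# the script path is visible on all nodes, and sudo is available there.
run_on_hostlist() {
    hostlist="$1"   # file listing the target hosts
    script="$2"     # script to execute on each host

    # Refuse to continue if the hostlist is missing or empty.
    if [ ! -s "$hostlist" ]; then
        echo "hostlist '$hostlist' is missing or empty" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo "pdsh -w ^${hostlist} sudo bash ${script}"
    else
        # pdsh -w ^FILE reads the list of target hosts from FILE.
        pdsh -w "^${hostlist}" "sudo bash ${script}"
    fi
}
```

Setting DRY_RUN=1 before calling run_on_hostlist with your hostlist file and script path prints the pdsh command without touching any node, which is a cheap way to check the arguments first. Drop sudo if your script does not need root.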

You should be able to use AzureHPC to deploy without BeeGFS. (Or, if you want to install your own BeeGFS scripts, you can create a local scripts directory, copy your scripts there, and use tags in your config file to indicate where each script should run.)

garvct commented on July 26, 2024

I was able to successfully deploy BeeGFS on OpenLogic:CentOS:7_7-gen2:latest

[2020-11-11 14:30:52] deploying arm template
headnode Microsoft.Compute/virtualMachines OK
headnode_nic Microsoft.Network/networkInterfaces Created
beegfsm Microsoft.Compute/virtualMachines OK
compute Microsoft.Compute/virtualMachineScaleSets OK
beegfssm Microsoft.Compute/virtualMachineScaleSets OK
beegfsm_nic Microsoft.Network/networkInterfaces Created
hpcvnet Microsoft.Network/virtualNetworks OK
headnode_nsg Microsoft.Network/networkSecurityGroups OK
headnode_pip Microsoft.Network/publicIPAddresses OK
[2020-11-11 14:33:36] Provising succeeded
[2020-11-11 14:33:36] re-evaluating the config
[2020-11-11 14:33:36] building host lists
[2020-11-11 14:34:06] building install scripts
[2020-11-11 14:34:40] Step 00 : install_node_setup.sh (jumpbox_script)
[2020-11-11 14:36:08] duration: 88 seconds
[2020-11-11 14:36:35] Step 01 : disable-selinux.sh (jumpbox_script)
[2020-11-11 14:37:02] duration: 27 seconds
[2020-11-11 14:37:29] Step 02 : beegfspkgs.sh (jumpbox_script)
[2020-11-11 14:39:54] duration: 145 seconds
[2020-11-11 14:40:20] Step 03 : beegfsm.sh (jumpbox_script)
[2020-11-11 14:40:54] duration: 34 seconds
[2020-11-11 14:41:20] Step 04 : beegfssd.sh (jumpbox_script)
[2020-11-11 14:42:19] duration: 59 seconds
[2020-11-11 14:42:45] Step 05 : beegfsmd.sh (jumpbox_script)
[2020-11-11 14:43:19] duration: 33 seconds
[2020-11-11 14:43:45] Step 06 : beegfsc.sh (jumpbox_script)
[2020-11-11 14:45:24] duration: 99 seconds
[2020-11-11 14:45:50] Step 07 : cndefault.sh (jumpbox_script)
[2020-11-11 14:46:25] duration: 35 seconds
[2020-11-11 14:46:51] Step 08 : create_raid0.sh (jumpbox_script)
[2020-11-11 14:47:38] duration: 47 seconds
[2020-11-11 14:48:04] Step 09 : make_filesystem.sh (jumpbox_script)
[2020-11-11 14:49:45] duration: 101 seconds
[2020-11-11 14:50:14] Step 10 : install-nfsserver.sh (jumpbox_script)
[2020-11-11 14:50:46] duration: 32 seconds
[2020-11-11 14:51:13] Step 11 : nfsclient.sh (jumpbox_script)
[2020-11-11 14:51:53] duration: 41 seconds
[2020-11-11 14:52:19] Step 12 : localuser.sh (jumpbox_script)
[2020-11-11 14:52:47] duration: 27 seconds
[2020-11-11 14:53:13] Step 13 : pbsdownload.sh (jumpbox_script)
[2020-11-11 14:53:40] duration: 28 seconds
[2020-11-11 14:54:06] Step 14 : pbsserver.sh (jumpbox_script)
[2020-11-11 14:55:39] duration: 92 seconds
[2020-11-11 14:56:05] Step 15 : pbsclient.sh (jumpbox_script)
[2020-11-11 14:56:38] duration: 33 seconds
linuxuser@Surface_laptop:~/beegfs_azurehpc$
linuxuser@Surface_laptop:~/beegfs_azurehpc$
linuxuser@Surface_laptop:~/beegfs_azurehpc$
linuxuser@Surface_laptop:~/beegfs_azurehpc$ vi config.json
linuxuser@Surface_laptop:~/beegfs_azurehpc$ azhpc-connect headnode
[2020-11-11 15:38:51] logging directly into headnode6027f3.southcentralus.cloudapp.azure.com
[hpcadmin@headnode ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 8.6M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda2 30G 2.8G 27G 10% /
/dev/sda1 494M 65M 430M 13% /boot
/dev/sda15 495M 12M 484M 3% /boot/efi
/dev/sdb1 63G 2.1G 58G 4% /mnt/resource
beegfs_nodev 3.5T 134M 3.5T 1% /beegfs
/dev/md10 4.0T 35M 4.0T 1% /share
tmpfs 3.2G 0 3.2G 0% /run/user/0
tmpfs 3.2G 0 3.2G 0% /run/user/1000
[hpcadmin@headnode ~]$

xpillons commented on July 26, 2024

@Smahane Can you please give an update? Are you still blocked?

Smahane commented on July 26, 2024

@xpillons Yes, I still couldn't get it to work with OpenLogic:CentOS-HPC:7_7-gen2:latest.
I moved on by creating a cluster without it.
Thanks,
