Comments (5)

ssenator avatar ssenator commented on August 17, 2024

transcription of timing data...

Some timings: Vagrant 2.2.9, VirtualBox 6.1.6; underlying platform: Fedora 30, kernel 5.6.13-100
VirtualBox VMs file system: xfs over LUKS, underlying SSD/NVMe
tarballs file system: xfs, underlying SSD/NVMe

vcfs: 9m, ingest/createrepo repos.tgz, yum install/timeouts, selinux homedirs
vcsvc: 3m, yum install/timeouts, rsyslog selinux
vcbuild: 11m, yum (epel/priorities) install/timeouts, slurm build - if triggered, prereq timeouts
vcdb: 7m, yum (epel/priorities) install/timeouts, mysql (community mysql repo?) install, prereq timeouts
vcaltdb: 9m, yum (epel/priorities?) install/timeouts, mysql (community mysql repo?) install, prereq timeouts
vcsched: 6m, yum (epel/priorities?) install/timeouts, prereq timeouts
vc1: 13m, yum (epel/priorities?) install/timeouts, prereq timeouts
vc2: 4m, yum (epel/priorities?) install timeouts, prereq timeouts
vclogin: 22m, slurm test jobs, slurm db/config, yum (epel/priorities?) install timeouts
vxsched: 3m, yum install (epel/priorities?) timeouts
vx1: 5m, yum install (epel/priorities?) timeouts
vx2: 4m, yum install (epel/priorities?) timeouts
vxlogin: 10m, slurm test jobs, slurm db/config, yum (epel/priorities?) install timeouts
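
To make the list above easier to rank, here is a minimal Python sketch that parses the per-node minutes and reports the total and the slowest nodes. The helper names (`parse_timings`, `slowest`) are illustrative only, and the total is only meaningful if the nodes are provisioned one after another, which is an assumption rather than something the timings state.

```python
import re

# Per-node provisioning times, transcribed from the timings list (minutes).
RAW = """\
vcfs: 9m
vcsvc: 3m
vcbuild: 11m
vcdb: 7m
vcaltdb: 9m
vcsched: 6m
vc1: 13m
vc2: 4m
vclogin: 22m
vxsched: 3m
vx1: 5m
vx2: 4m
vxlogin: 10m
"""


def parse_timings(text):
    """Return {node: minutes} for every 'name: <N>m' line in text."""
    timings = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(\d+)m\b", line)
        if match:
            timings[match.group(1)] = int(match.group(2))
    return timings


def slowest(timings, n=3):
    """Return the n nodes with the largest times, slowest first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]


timings = parse_timings(RAW)
total = sum(timings.values())
print(f"total: {total}m across {len(timings)} nodes")
for node, minutes in slowest(timings):
    print(f"{node}: {minutes}m ({100 * minutes / total:.0f}% of total)")
```

On these numbers the total is 106 minutes, and vclogin, vc1 and vcbuild together account for 46 of them.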

With muon/centos as the underlying platform: similar, but slower. VirtualBox VMs & tarballs: zfs, raidz1 over HDD

So, first-cut:

  1. cache more from epel into local repositories, with the aim of removing the epel repo from most/all nodes,
  2. find a mechanism other than yum-plugin-priorities to force the local repo first, with remote fallback,
  3. tune prereq timeouts,
  4. everything else, such as vclogin, slurm-test-jobs, selinux
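
One candidate for item 2 is yum's per-repository `cost` option, which makes yum prefer the cheaper repository when the same package is available from several. The sketch below renders such a `.repo` file from Python. The repo ids, both URLs and the cost values are hypothetical placeholders, not hpc-collab's actual configuration. Note the limits: as far as I know, `cost` only decides between identical packages and does not hide newer versions in the remote repo, and it does not by itself stop yum from contacting the remote repo for metadata, so it may not remove the timeouts.

```python
import configparser
import io

# Hypothetical repo definitions: ids, URLs and cost values are placeholders.
REPOS = {
    "local-cache": {
        "name": "Local RPM cache",
        "baseurl": "file:///srv/repos/local",  # assumed local path
        "enabled": 1,
        "gpgcheck": 0,
        "cost": 500,  # lower than yum's default of 1000, so preferred
    },
    "epel-remote": {
        "name": "EPEL (remote fallback)",
        "baseurl": "https://mirror.example.org/epel/",  # placeholder URL
        "enabled": 1,
        "cost": 2000,  # higher cost, used when the local repo lacks a package
        "skip_if_unavailable": 1,  # do not fail the run if it is unreachable
    },
}


def render_repo_file(repos):
    """Render {repo_id: {option: value}} as yum .repo file text."""
    parser = configparser.ConfigParser(interpolation=None)
    for repo_id, options in repos.items():
        parser[repo_id] = {key: str(value) for key, value in options.items()}
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(render_repo_file(REPOS))
```

The output would go into a file under /etc/yum.repos.d/ on each node; whether this is enough to replace yum-plugin-priorities would need to be measured against the timings above.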

from hpc-collab.

ssenator avatar ssenator commented on August 17, 2024

VMtouch, to lock memory pages in the host: https://github.com/hoytech/vmtouch
User-level NFS server, replacing vboxsf

Preliminary timings don't indicate much of a gain, implying the problem isn't host I/O, but rather host-to-VM I/O or in-VM I/O.
CPU load is not high when this is occurring, in either the host or the guests, implying that the system is not thrashing.

ssenator avatar ssenator commented on August 17, 2024

Contacting the epel repository is a source of delay in node provisioning. This is a wait for network I/O, rather than any real productive work. This may be best addressed by a snapshot-checkpoint-suspend/resume cycle.

This has been addressed by careful caching of RPMs to avoid epel and yum search timeouts, especially in the early RPM installation step.

ssenator avatar ssenator commented on August 17, 2024

compute nodes may be unprovisioned and reprovisioned safely, separate from other nodes.

ssenator avatar ssenator commented on August 17, 2024

ESTALE monitoring code added, allowing most nodes (except vcdb) to be reprovisioned. In particular, vcfs may disappear and reappear, much like a traditional NFS server.
