lb-analysis-framework's People

Contributors

bradybray, cwschilly, dependabot[bot], lifflander, marcinwrobel1986, nlslatt, pierrepebay, ppebay, tlamonthezie, yaleleenga

Forkers

cheelee

lb-analysis-framework's Issues

Make sure NodeGossiper does not crash when ParaView is not found

  1. We can safely assume that VTK proper remains a requirement, because the VTK graph visualization features we need are not part of the ParaView distributed by Kitware.
  2. However, we can also assume that not everyone will have ParaView on their system, and in this case the NodeGossiper should still run. It should simply not generate the ParaView visualizations.
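
One way to meet the second point is to probe for the optional module before using it. The following is only a minimal sketch: the function names and the step labels are hypothetical and are not taken from NodeGossiper itself.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def run(generate_visualization=True):
    """Hypothetical driver: always balance, visualize only when possible."""
    steps = ["load balancing"]
    # Skip the optional step quietly instead of crashing on a failed import
    if generate_visualization and module_available("paraview"):
        steps.append("paraview visualization")
    return steps
```

With this shape, a missing ParaView only shortens the list of steps; the load balancing part runs either way.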

Add modeling between LB stats input and strategy evaluation

The vt runtime has a layer of load modeling between the raw instrumented data about each object's workload and the values used in the LB strategy implementations.

We expect this to become more critical with the strongly disparate subphase structure of execution and the load imbalances therein in EMPIRE. The load models are being used for computing a scalar load value to feed into the strategies from the vector of per-subphase loads, and we want to be able to experiment with how this ought to be done.
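
To make the idea concrete, here is a hedged sketch of what interchangeable load models could look like on the LBAF side. The three models and their names are illustrative assumptions, not the models implemented in vt.

```python
def total_load(subphase_loads):
    """Scalar load as the sum over all subphases."""
    return sum(subphase_loads)


def max_load(subphase_loads):
    """Scalar load as the most expensive subphase."""
    return max(subphase_loads)


def weighted_load(weights):
    """Build a model that weights each subphase before summing."""
    def model(subphase_loads):
        return sum(w * l for w, l in zip(weights, subphase_loads))
    return model


# Each model maps the same per-subphase vector to a different scalar
loads = [1.0, 2.0, 3.0]
scalars = [m(loads) for m in (total_load, max_load, weighted_load([1, 0, 2]))]
```

Because every model has the same call signature, a strategy can be evaluated against several of them without being modified.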

Implement LBS reader for VT traces

The goal of this issue is to add a reader to the LBS (in its IO directory) that will be able to ingest VT traces and populate an initial lbsEpoch with them.

Current capability is limited to populating the initial lbsEpoch with pseudo-random sources of objects and processor assignments (uniform or log-normal).
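
For reference, the existing pseudo-random capability can be pictured with the sketch below. It assumes a "name,param1,param2" specification like the one passed to -t and -w on the NodeGossiper command line; the function name and the meaning of the log-normal parameters (mu and sigma of the underlying normal) are assumptions.

```python
import random


def make_sampler(spec, rng=None):
    """Build a zero-argument sampler from a 'name,a,b' specification."""
    rng = rng or random.Random()
    name, a, b = spec.split(",")
    a, b = float(a), float(b)
    if name == "uniform":
        # a and b are the bounds of the interval
        return lambda: rng.uniform(a, b)
    if name == "lognormal":
        # Assumption: a and b are mu and sigma of the underlying normal
        return lambda: rng.lognormvariate(a, b)
    raise ValueError("unsupported sampler: {}".format(name))
```

A trace reader would replace such samplers as the source of object loads while leaving the rest of the epoch initialization untouched.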

Define measure of persistence

The main goal of this issue is to determine when the "persistence" assumption (needed for statistically-based distributed LB) is satisfied, so that such LB can be performed efficiently.

Outputting zoomed in PNG

python ./src/Applications/NodeGossiper.py -o 128 -x 8 -y 8 -z 1 -t uniform,1.0,10.0 -w uniform,1.0,10.0 -k 5 -f 2 -i 3 -p 10 -c 2 -d 3 -e

diff --git a/src/Applications/AnimationViewer.py b/src/Applications/AnimationViewer.py
index ba0dd03..3860279 100644
--- a/src/Applications/AnimationViewer.py
+++ b/src/Applications/AnimationViewer.py
@@ -50,7 +50,7 @@ class AnimationViewer(ParaviewViewer):
         super(AnimationViewer, self).__init__(exodus, file_name, viewer_type)

     ###########################################################################
-    def saveView(self, reader):
+    def saveView(self, reader, view):
         """Save animation
         """

@@ -67,11 +67,14 @@ class AnimationViewer(ParaviewViewer):
             + "[AnimationViewer] "
             + bcolors.END
             + "###  Generating AVI animation...")
-        pv.WriteAnimation(self.file_name+".avi",
-                       Magnification=1,
-                       Quality = 2,
-                       FrameRate=1.0,
-                       Compression=True)
+        filename = "{}.avi".format(self.file_name)
+        pv.SaveAnimation(filename)
+        # pv.WriteAnimation(self.file_name+".avi",
+        #                Magnification=1,
+        #                Quality = 2,
+        #                viewOrLayout=view,
+        #                FrameRate=1.0,
+        #                Compression=True)

True e 0 000000
True e 1 000000
True e 2 000000
True e 3 000000

Update all local information on object migration

When an object is migrated from a sending processor to a receiving one, the former should update all its information about known underloaded potential targets and their respective under-loads. This should drastically improve picking, especially when cached loads are used.
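
A minimal sketch of such an update follows, assuming the sender keeps a dictionary mapping each known target to its remaining under-load. That data structure and the function name are hypothetical.

```python
def update_after_transfer(known_underloads, target, object_load):
    """Return the sender's updated view after sending object_load to target."""
    updated = dict(known_underloads)
    remaining = updated.get(target, 0.0) - object_load
    if remaining > 0.0:
        # The target is still underloaded, but by a smaller amount
        updated[target] = remaining
    else:
        # The target is no longer underloaded: stop considering it
        updated.pop(target, None)
    return updated
```

For example, sending an object of load 2.0 to a target with under-load 5.0 leaves it with 3.0, whereas sending one of load 6.0 removes the target from the sender's list.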

Fix incorrect empirical CMF computation under new criterion

Current code may result in negative CMF values as underloaded ranks that become overloaded with the improved transfer criterion are not removed from the list of possible targets.

We may want to keep them, however, so that they can still offer useful targets with criterion "6 prime". But in that case the CMF computation becomes incorrect.
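
The problem can be reproduced with a small sketch. It assumes that the probability of picking a target is proportional to its under-load (average load minus its own load); the actual LBAF definition may differ, and both function names are illustrative.

```python
def empirical_cmf(underloads):
    """Cumulative mass function over targets, weighted by their under-load."""
    total = sum(underloads)
    cmf, running = [], 0.0
    for u in underloads:
        running += u / total
        cmf.append(running)
    return cmf


def drop_overloaded(underloads):
    """Keep only targets that are still underloaded (positive under-load)."""
    return [u for u in underloads if u > 0.0]


# A target that became overloaded has a negative under-load
broken = empirical_cmf([-1.0, 3.0])
fixed = empirical_cmf(drop_overloaded([-1.0, 3.0]))
```

Here the first entry of `broken` is negative, whereas filtering first restores a valid CMF. Keeping such ranks as targets would therefore require either clamping their weight to zero or defining the CMF differently.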

Thanks @nlslatt for the catch!

Compare relative effects of number of rounds vs. number of iterations

Time to solution is almost identical in both cases (~52s) and iteration indices were renormalized on this basis. Here we can see that, ceteris paribus, 10 LB iterations with 2 gossip rounds each yield a better outcome than 2 LB iterations with 10 gossiping rounds each:

[Figure: nodegossiper-n4096-p16-o10000-uniform-f6-t1_0--i10-k2-vs-i2-k10--0s-52s]

[Figure: nodegossiper-n4096-p16-o10000-uniform-f6-t1_0--i10-k2-vs-i2-k10--25s-52s]

Modify MoveCountsViewer to take parameters as command line arguments

The goal of this issue is to replace all hard-coded parameter settings in this utility, such as:

# Number of processors
n_p = 8

or

file_name = "NodeGossiper-n8-lstats-i5-k4-f4-t1_0.0.{}.vom".format(i)

with command-line arguments (e.g., -i <input-VOM-prefix> -p <number-of-processors>).
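
A possible shape for this, sketched with argparse: the -i and -p flags come from the suggestion above, while the destination names and the default of 8 processors (mirroring the current hard-coded value) are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse MoveCountsViewer options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="MoveCountsViewer (sketch)")
    parser.add_argument("-i", dest="input_prefix", required=True,
                        help="prefix of the input VOM files")
    parser.add_argument("-p", dest="n_processors", type=int, default=8,
                        help="number of processors")
    return parser.parse_args(argv)


args = parse_args(["-i", "NodeGossiper-n8-lstats-i5-k4-f4-t1_0", "-p", "8"])
# One file name per processor, built from the prefix instead of a literal
file_names = ["{}.{}.vom".format(args.input_prefix, i)
              for i in range(args.n_processors)]
```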

Create a new command-line flag to specify the suffix/extension of vt trace files

Currently we only let the user specify the prefix, and we assume that the extension is always ".vom". However vt outputs ".out" stats, which forces us to do file manipulation prior to running LBAF on those.

In order to further automate the process, LBAF should therefore understand one additional, optional flag such as -e <extension>, with a default of '' (because there could also be no extension at all). Note that in this setting we would need to pass, e.g., -e ".out" and not just -e out.

NB: this means that the current -e (for Exodus outputs) must be changed to something else: I suggest -m (for Mesh outputs).
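
The reason the dot must be part of the flag value can be seen in a short sketch of the file name construction. The helper name is hypothetical; only the prefix, index and extension layout is taken from the existing ".vom" naming.

```python
def trace_file_name(prefix, index, extension=""):
    """Build '<prefix>.<index><extension>'; the extension carries its own dot."""
    return "{}.{}{}".format(prefix, index, extension)


names = [trace_file_name("stats", 0, ".out"),   # vt statistics output
         trace_file_name("stats", 0, ".vom"),   # current LBAF assumption
         trace_file_name("stats", 0)]           # no extension at all
```

Since the separator is not added by the code, the empty default yields a name with no trailing dot.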

Setup CI using GitHub Actions for analysis framework

We should set up some docker files to build containers for testing. Then we can launch those containers in GitHub Actions.

There are several levels of testing to accomplish:

  • Obtain all deps (VTK, etc.) for Python scripts
  • Run LB simulator with inputs and check it runs to end without assertion failures
  • Verify simulator correctness with input decks evaluating the quality of distributions produced by LB

Fix import bug

Steps to reproduce:

$ python NodeGossiper.py -l ../../data/dev210112TS4-gossiptrials-printstatslboff-100n4-gossip-full-stats-0/stats  -x 20 -y 20 -z 1 -s 1 -k 2 -f 400 -i 8 -c 1

Traceback (most recent call last):
  File "NodeGossiper.py", line 55, in <module>
    from src.Model.lbsPhase import Phase
ModuleNotFoundError: No module named 'src'
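
One possible workaround, sketched under the assumption that the script lives two directories below the project root (src/Applications/NodeGossiper.py), is to put that root on sys.path before the first `src` import. The helper names are hypothetical.

```python
import os
import sys


def project_root(script_path, levels_up=2):
    """Return the directory levels_up above the one containing script_path."""
    path = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path


def ensure_importable(script_path):
    """Prepend the project root to sys.path if it is not already there."""
    root = project_root(script_path)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling `ensure_importable(__file__)` at the top of the script, before `from src.Model.lbsPhase import Phase`, would then make the import independent of the directory the script is launched from.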

Odd NodeGossiper behavior when VTK missing

When trying to run NodeGossiper without VTK, the following error message is displayed:

*  ERROR: Could not write to ExodusII file by lack of VTK

But execution nevertheless continues, which makes LBAF crash:

Traceback (most recent call last):
  File "NodeGossiper.py", line 580, in <module>
    params.verbose)
  File "C:\dev\git\LBAF\src\IO\lbsWriterExodusII.py", line 118, in write
    n_p = len(self.phase.processors)
AttributeError: WriterExodusII instance has no attribute 'phase'

Such a case should be handled properly, without crashing.
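
A plausible reading of the traceback is that the writer's constructor returns early when VTK is missing, before setting `self.phase`. Under that assumption, the sketch below sets the attribute unconditionally and lets `write` decline gracefully. The class is a stand-in and not the actual WriterExodusII.

```python
class WriterSketch:
    """Stand-in writer illustrating a VTK guard; not the real LBAF class."""

    def __init__(self, phase, has_vtk):
        # Always set attributes, so later calls never hit AttributeError
        self.phase = phase
        self.has_vtk = has_vtk

    def write(self):
        """Return True if output was produced, False if it was skipped."""
        if not self.has_vtk:
            print("*  WARNING: VTK not available, skipping ExodusII output")
            return False
        n_p = len(self.phase.processors)
        # The actual ExodusII writing for n_p processors would happen here
        return n_p >= 0
```

The caller can then test the return value rather than relying on an exception to find out that no file was written.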

Write efficient comm-aware criterion and hybrid load/comm optimizer

The goal of this issue is two-fold:
(1) replace the naive, first implementation of a communication-only criterion (StrictLocalizer) with one that allows the transfer of locally-communicating objects if and only if this results in better locality on the target processor;
(2) extend the main optimizer loop logic to take into account communication costs (and not only loads)
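
As an illustration of point (1), the sketch below accepts a transfer only when the object exchanges more volume with objects on the target than with objects on its current processor. The data layout (dictionaries of communication volumes and of object placements) and the function name are assumptions, not the criterion API used in LBAF.

```python
def improves_locality(obj_comm, assignment, source, target):
    """Decide whether moving an object from source to target helps locality.

    obj_comm maps each peer object to the volume exchanged with it;
    assignment maps each object to the processor currently holding it.
    """
    local_on_source = sum(v for peer, v in obj_comm.items()
                          if assignment.get(peer) == source)
    local_on_target = sum(v for peer, v in obj_comm.items()
                          if assignment.get(peer) == target)
    # Transfer only if more communication becomes processor-local
    return local_on_target > local_on_source
```

A hybrid optimizer, as in point (2), could combine such a test with the existing load-based criterion instead of using either one alone.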

@lifflander @nlslatt

Improve statistics reporting

This is a generic statistics improvement thread.

Several sub-issues can be formulated, in particular:
  • the clear delineation of what pertains to statistics from what does not
  • addition of graph statistics
  • use of theoretical results as comparison baselines
  • other improvements/additions

Abstract out high-level LB algorithm

We want to be able to implement any LB algorithm (centralized, hierarchical, distributed, etc.) with LBAF. So we want to abstract out the actual LB algorithm but keep all the utilities.
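
A hedged sketch of such an abstraction: an abstract base class fixes the interface, concrete algorithms implement it, and utilities such as the imbalance computation stay outside. All names and the dictionary-based interface are illustrative, and the greedy algorithm is only a placeholder example.

```python
from abc import ABC, abstractmethod


class AlgorithmBase(ABC):
    """Interface every LB algorithm would implement."""

    @abstractmethod
    def execute(self, object_loads, n_processors):
        """Return a dict mapping each object to a processor index."""


class GreedyCentralized(AlgorithmBase):
    """Placeholder: heaviest object first, onto the least loaded processor."""

    def execute(self, object_loads, n_processors):
        proc_loads = [0.0] * n_processors
        assignment = {}
        for obj, load in sorted(object_loads.items(), key=lambda kv: -kv[1]):
            p = min(range(n_processors), key=lambda i: proc_loads[i])
            assignment[obj] = p
            proc_loads[p] += load
        return assignment


def imbalance(assignment, object_loads, n_processors):
    """Shared utility: maximum load over average load, minus one."""
    proc_loads = [0.0] * n_processors
    for obj, p in assignment.items():
        proc_loads[p] += object_loads[obj]
    return max(proc_loads) / (sum(proc_loads) / n_processors) - 1.0
```

A hierarchical or distributed algorithm would be another subclass, evaluated with the same `imbalance` utility.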

Backwards propagate underload information

The goal of this issue is to return to all underloaded processors, at the end of the gossiping phase, the complete information as to how many overloaded ones are aware of them.

We will refer to such processors as the "overloaded_viewers" of a given underloaded one.
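
Seen from a global viewpoint, this amounts to inverting the knowledge gathered during gossiping. The sketch below assumes that this knowledge is available as a dictionary mapping each overloaded processor to the underloaded ones it knows; in a distributed setting the same result would have to be obtained by message passing.

```python
def overloaded_viewers(known_underloaded):
    """Invert 'overloaded -> known underloaded' into 'underloaded -> viewers'."""
    viewers = {}
    for overloaded, targets in known_underloaded.items():
        for underloaded in targets:
            viewers.setdefault(underloaded, set()).add(overloaded)
    return viewers


# Processors 0 and 1 are overloaded; 2 and 3 are underloaded
viewers = overloaded_viewers({0: [2, 3], 1: [3]})
counts = {p: len(v) for p, v in viewers.items()}
```

In this example processor 3 is seen by two overloaded processors and processor 2 by only one, which is the count each underloaded processor should receive.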
