darma-tasking / lb-analysis-framework
Analysis framework for exploring, testing, and comparing load balancing strategies
License: Other
This follows #4 in particular.
@lifflander
Currently, if ParaView (PV) is not installed on the system, NodeGossiper fails.
Switch to Paraview-embedded VTK library.
Protect Paraview import against non-availability.
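A minimal sketch of how such an import could be protected. The helper name try_import and the HAS_PARAVIEW flag are illustrative assumptions, not LBAF code; only the module name paraview.simple comes from ParaView itself.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError
        return None


# None on systems where ParaView is not available
pv = try_import("paraview.simple")
HAS_PARAVIEW = pv is not None

if not HAS_PARAVIEW:
    print("* WARNING: ParaView not found, visualization output disabled")
```

Callers can then test HAS_PARAVIEW before touching any viewer code instead of failing at import time.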
This was not explicitly disallowed in the original paper, but we think self-informing is a waste of time.
At least lbsLoadReaderVT
but maybe others too (to be verified).
It seems that there are some backwards compatibility problems
This will result in the removal of the lbsCriterion helper class and its replacement by a call to the factory method of the base class in lbsRuntime.
Currently this reader skips communication lines in VT traces
The vt runtime has a layer of load modeling between the raw instrumented data about each object's workload and the values used in the LB strategy implementations.
We expect this to become more critical with the strongly disparate subphase structure of execution and the load imbalances therein in EMPIRE. The load models are being used for computing a scalar load value to feed into the strategies from the vector of per-subphase loads, and we want to be able to experiment with how this ought to be done.
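To make the idea concrete, here is a sketch of reducing a vector of per-subphase loads to one scalar. The function name and the three reductions ("sum", "max", "norm2") are assumptions chosen for illustration; they are not claimed to be the load models implemented in vt.

```python
import math


def scalar_load(subphase_loads, model="sum"):
    """Reduce per-subphase loads of one object to a single scalar.

    The available models are hypothetical examples of reductions
    one might want to compare experimentally.
    """
    loads = list(subphase_loads)
    if model == "sum":
        # Total work over all subphases
        return sum(loads)
    if model == "max":
        # Load of the most expensive subphase
        return max(loads, default=0.0)
    if model == "norm2":
        # Euclidean norm, which emphasizes dominant subphases
        return math.sqrt(sum(x * x for x in loads))
    raise ValueError("unknown load model: {}".format(model))


print(scalar_load([3.0, 4.0], "sum"))
print(scalar_load([3.0, 4.0], "max"))
print(scalar_load([3.0, 4.0], "norm2"))
```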
The goal of this issue is to add a reader to the LBS (in its IO directory), that will be able to ingest VT traces and populate an initial lbsEpoch with these.
Current capability is limited to populating the initial lbsEpoch with pseudo-random sources of objects and processor assignments (uniform or log-normal).
[Statistics] Descriptive statistics of communication weights:
cardinality: 0 sum: nan imbalance: nan
minimum: nan mean: nan maximum: nan
standard deviation: nan variance: nan
skewness: nan kurtosis excess: nan
Compare to existing LBs (HierarchicalLB, GreedyLB) in VT
The main goal of this issue is to determine when the "persistence" assumption (needed for statistically-based distributed LB) is satisfied, so that such LB can be performed efficiently.
See DARMA-tasking/vt#708 for details of why this is desired, and the code that produces the stats files containing per-subphase timing data.
This is to be done AFTER the subsequent commits have been moved to a WIP branch
python ./src/Applications/NodeGossiper.py -o 128 -x 8 -y 8 -z 1 -t uniform,1.0,10.0 -w uniform,1.0,10.0 -k 5 -f 2 -i 3 -p 10 -c 2 -d 3 -e
diff --git a/src/Applications/AnimationViewer.py b/src/Applications/AnimationViewer.py
index ba0dd03..3860279 100644
--- a/src/Applications/AnimationViewer.py
+++ b/src/Applications/AnimationViewer.py
@@ -50,7 +50,7 @@ class AnimationViewer(ParaviewViewer):
super(AnimationViewer, self).__init__(exodus, file_name, viewer_type)
###########################################################################
- def saveView(self, reader):
+ def saveView(self, reader, view):
"""Save animation
"""
@@ -67,11 +67,14 @@ class AnimationViewer(ParaviewViewer):
+ "[AnimationViewer] "
+ bcolors.END
+ "### Generating AVI animation...")
- pv.WriteAnimation(self.file_name+".avi",
- Magnification=1,
- Quality = 2,
- FrameRate=1.0,
- Compression=True)
+ filename = "{}.avi".format(self.file_name)
+ pv.SaveAnimation(filename)
+ # pv.WriteAnimation(self.file_name+".avi",
+ # Magnification=1,
+ # Quality = 2,
+ # viewOrLayout=view,
+ # FrameRate=1.0,
+ # Compression=True)
When an object is migrated from a sending processor to a receiving one, the former should update all its information about known underloaded potential targets and their respective under-loads. This should drastically improve picking, especially when using cached loads.
Ideally a new subdirectory would be created that would contain all outputs:
Current code may result in negative CMF values as underloaded ranks that become overloaded with the improved transfer criterion are not removed from the list of possible targets.
We may want to keep them, however, so that they can still offer useful targets with criterion "6 prime". But in that case the CMF computation becomes incorrect.
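The following sketch illustrates the first option, dropping ranks that are no longer underloaded before building the CMF. It assumes the usual gossip-style weights proportional to 1 - load / average_load and a positive average load; the function name and data layout are hypothetical and do not reproduce the LBAF code.

```python
def compute_cmf(known_loads, average_load):
    """Build a CMF over the ranks that are still underloaded.

    known_loads maps a rank id to its currently known load.
    Returns (ranks, cmf) with cmf non-decreasing and ending at 1.
    """
    # A rank with load >= average would contribute a non-positive weight,
    # which is what can make CMF values negative, so it is filtered out
    targets = {r: l for r, l in known_loads.items() if l < average_load}
    if not targets:
        return [], []

    weights = [1.0 - l / average_load for l in targets.values()]
    norm = sum(weights)

    cmf, acc = [], 0.0
    for w in weights:
        acc += w / norm
        cmf.append(acc)
    return list(targets), cmf


# Rank 2 has become overloaded and is excluded from the targets
ranks, cmf = compute_cmf({0: 2.0, 1: 6.0, 2: 12.0}, 8.0)
print(ranks, cmf)
```

Keeping such ranks for criterion "6 prime" would need a different weighting, which this sketch does not attempt.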
Thanks @nlslatt for the catch!
The goal of this issue is to replace all hard-coded parameter settings in this utility, such as:
# Number of processors
n_p = 8
or
file_name = "NodeGossiper-n8-lstats-i5-k4-f4-t1_0.0.{}.vom".format(i)
with command-line arguments (e.g., -i <input-VOM-prefix> -p <number-of-processors>).
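A minimal sketch of such a command-line interface using argparse. The -i and -p flags follow the suggestion above; the destination names, the default of 8, and the reading of the index in the file name as a processor index are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the options that replace the hard-coded settings."""
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI replacing hard-coded parameters")
    parser.add_argument("-i", dest="input_prefix", required=True,
                        help="input VOM file prefix")
    parser.add_argument("-p", dest="n_processors", type=int, default=8,
                        help="number of processors")
    return parser.parse_args(argv)


# An explicit argument list is passed here so the example runs standalone
args = parse_args(["-i", "NodeGossiper-n8", "-p", "8"])

# One file per index, mirroring the "<prefix>.<index>.vom" pattern
file_names = ["{}.{}.vom".format(args.input_prefix, i)
              for i in range(args.n_processors)]
print(file_names[0])
```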
Currently this is causing failures and this is not good
Currently we only let the user specify the prefix, and we assume that the extension is always ".vom". However vt outputs ".out" stats, which forces us to do file manipulation prior to running LBAF on those.
To further automate the process, LBAF should therefore understand one additional, optional flag, -e <extension>, with a default of '' (because there could also be no extension at all). Note that in this setting we would need to pass, e.g., -e ".out" and not just -e out.
NB: this means that the current -e flag (for Exodus outputs) must be changed to something else: I suggest -m (for mesh outputs).
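As an illustration of the proposed convention, the helper below builds a file name from a prefix, an index, and an extension that carries its own leading dot. The function name is hypothetical, and the "<prefix>.<index><extension>" layout is an assumption based on the file names quoted in these issues.

```python
def stats_file_name(prefix, index, extension=""):
    """Return "<prefix>.<index><extension>".

    The extension is used verbatim, so it must include its leading dot
    (".out", not "out"); the default '' means no extension at all.
    """
    return "{}.{}{}".format(prefix, index, extension)


print(stats_file_name("stats", 0, ".out"))
print(stats_file_name("stats", 0))
```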
The StrictLocalizingCriterion script has been lost, probably because it was not staged during a commit.
Example in dev210112TS-gossiptrials-100n4-gossip-full-stats
We should set up some docker files to build containers for testing. Then we can launch those containers in GitHub Actions.
There are several levels of testing to accomplish:
This is a generic issue to be used when pushing new experimental data
The goal of this issue is to use the previously implemented #50 computation of "viewers" of underloaded processors from overloaded ones to better steer the picking of candidates for object migration (currently only based on under-load values).
Steps to reproduce:
$ python NodeGossiper.py -l ../../data/dev210112TS4-gossiptrials-printstatslboff-100n4-gossip-full-stats-0/stats -x 20 -y 20 -z 1 -s 1 -k 2 -f 400 -i 8 -c 1
Traceback (most recent call last):
File "NodeGossiper.py", line 55, in <module>
from src.Model.lbsPhase import Phase
ModuleNotFoundError: No module named 'src'
When trying to run NodeGossiper without VTK, the following error message is displayed:
* ERROR: Could not write to ExodusII file by lack of VTK
But execution still continues, which makes LBAF crash:
Traceback (most recent call last):
File "NodeGossiper.py", line 580, in <module>
params.verbose)
File "C:\dev\git\LBAF\src\IO\lbsWriterExodusII.py", line 118, in write
n_p = len(self.phase.processors)
AttributeError: WriterExodusII instance has no attribute 'phase'
This case should be handled properly, without crashing.
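One possible shape for such handling is sketched below. The class SafeWriter and its vtk_module argument are hypothetical stand-ins, not the actual WriterExodusII implementation; the point is only that attributes are always set and that write() returns early when VTK is missing.

```python
class SafeWriter:
    """Hypothetical writer that degrades gracefully without VTK."""

    def __init__(self, phase, file_name, vtk_module=None):
        # Attributes are set unconditionally, so a later call can never
        # hit an AttributeError such as the missing 'phase' above
        self.phase = phase
        self.file_name = file_name
        self.vtk = vtk_module

    def write(self):
        """Return True if output was attempted, False if skipped."""
        if self.vtk is None:
            print("* WARNING: VTK unavailable, skipping ExodusII output")
            return False
        n_p = len(self.phase.processors)
        print("Writing {} processors to {}".format(n_p, self.file_name))
        # The actual VTK/ExodusII calls are omitted from this sketch
        return True


# Without VTK the writer reports the problem and the run continues
written = SafeWriter(phase=None, file_name="out.e").write()
```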
By default it should be .vom
Also clean up legacy Python 2 imports etc.
This is to integrate in the same concept both notions of Load and Communications and provide a unified framework.
This is to allow for a better simulation of asynchronous LB, where aggregated loads are not updated in real-time during the migration phase.
All line preambles (contained between square brackets, e.g. [lbsStatistics]) should appear in a color different from that of the subsequent text.
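A small sketch of how a leading bracketed preamble could be colored with ANSI escape codes. The function name and the choice of blue are assumptions; the codes themselves are standard terminal sequences.

```python
import re

BLUE = "\033[94m"  # ANSI bright blue
END = "\033[0m"    # ANSI reset


def colorize_preamble(line, color=BLUE):
    """Wrap a leading "[...]" preamble in color codes, leaving the rest as is."""
    return re.sub(r"^(\[[^\]]+\])",
                  lambda m: color + m.group(1) + END,
                  line, count=1)


print(colorize_preamble("[lbsStatistics] Descriptive statistics of loads"))
```

Lines that do not start with a bracketed preamble are returned unchanged.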
Add this as a new strategy to the LB suite in VT.
The goal of this issue is two-fold:
(1) replace the naive, first implementation of a communication-only criterion (StrictLocalizer) with one that allows for the transfer of locally-communicating objects iff this results in better locality on the target processor;
(2) extend the main optimizer loop logic to take into account communication costs (and not only loads)
This is a generic statistics improvement thread.
Several sub-issues can be formulated, in particular:
. the clear delineation of what pertains to statistics from what does not
. addition of graph statistics
. use of theoretical results as comparison baselines
. other improvements/additions
This is related to #53
We want to be able to implement any LB algorithm (centralized, hierarchical, distributed, etc.) with LBAF. So we want to abstract out the actual LB algorithm but keep all the utilities.
Add the possibility to specify arguments in a configuration file in yml or json.
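A sketch of the JSON variant using only the standard library. The parameter names in DEFAULTS are hypothetical placeholders rather than actual LBAF options; a YAML variant would need a third-party parser such as PyYAML.

```python
import json

# Hypothetical parameters with their default values
DEFAULTS = {"n_iterations": 1, "fanout": 2, "file_extension": ".vom"}


def load_config(path, defaults=DEFAULTS):
    """Read a JSON configuration file and merge it over the defaults."""
    with open(path) as f:
        user = json.load(f)

    # Reject unknown keys early so that typos do not pass silently
    unknown = set(user) - set(defaults)
    if unknown:
        raise KeyError("unknown parameters: {}".format(sorted(unknown)))

    merged = dict(defaults)
    merged.update(user)
    return merged
```

A file containing {"fanout": 4} would then yield the defaults with only fanout overridden.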
The goal of this issue is to return to all underloaded processors, at the end of the gossiping phase, the complete information as to how many overloaded ones are aware of them.
We will refer to such processors as the "overloaded_viewers" of a given underloaded one.
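The relation can be sketched as an inversion of what each overloaded processor knows at the end of gossiping. The input layout (a mapping from overloaded rank to the set of underloaded ranks it knows) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def compute_overloaded_viewers(known_underloaded):
    """Invert "overloaded rank -> known underloaded ranks".

    Returns a dict mapping each underloaded rank to the set of
    overloaded ranks that are aware of it.
    """
    viewers = defaultdict(set)
    for overloaded, underloaded_ranks in known_underloaded.items():
        for u in underloaded_ranks:
            viewers[u].add(overloaded)
    return dict(viewers)


# Overloaded ranks 0 and 1 both know rank 3; only rank 0 knows rank 2
viewers = compute_overloaded_viewers({0: {2, 3}, 1: {3}})
print(viewers)
```

The size of each set gives the number of overloaded processors aware of a given underloaded one.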