Giter VIP home page Giter VIP logo

unibas-dmi-hpc / dls4lb Goto Github PK

View Code? Open in Web Editor NEW
15.0 3.0 2.0 66 KB

Dynamic Loop Self-scheduling For Load Balancing (DLS4LB) is an MPI-Based load balancing library. It is implemented in C and FORTRAN (F90) programming languages to support scientific applications executed on High Performance Computing (HPC) systems.

Home Page: https://hpc.dmi.unibas.ch/en/scientific_output/software-prototypes-data-sets/

License: GNU General Public License v3.0

C 44.43% Fortran 55.57%
loop-scheduling-technique loadbalancing mpi-library dynamic-scheduling openmp mpi parallel-computing hpc parallel

dls4lb's Introduction

DLS4LB

Dynamic Loop Self-scheduling For Load Balancing (DLS4LB) is an MPI-Based load balancing library. It is implemented in C and FORTRAN (F90) programming languages to support scientific applications executed on High Performance Computing (HPC) systems.

DLS4LB used reproduction as a means for the verification of the implementation of the scheduling techniques in the library [1]. Therefore, scheduling techniques implemented in DLS4LB conform to their original implementation as published decades ago. DLS4LB is developed by Ali Mohammed ([email protected]) and Florina M. Ciorba ([email protected]) It includes 14 scheduling techniques and support for Simulation-assisted scheduling Algorithm Selection (SimAS) as the fifteen DLS technique [2]. That is, the most efficient DLS technique will be selected dynamically during execution based on simulation results. Please read publication [2] for more details.

DLS4LB library is based on the DLB_tool developed by Ricolindo L. Carino ([email protected]) and Ioana Banicescu ([email protected]), see publication [3]. It is modified and extended by Ali Mohammed ([email protected]) to support more scheduling techniques and some bug fixes, see publication [1].

The DLS4LB parallelizes and load balances scientific applications that contain simple parallel loops (1D loops) or nested parallel loops (2D loops). The tool employs a master-worker model where workers request work from the master whenever they become free. The master serves work requests and assigns workers chunks of loop iterations according to the selected DLS technique. The master also doubles as a worker and executes chunks of loop iterations when it is not serving any requests.

The DLS4LB library supports fourteen scheduling techniques:

  1. Straightforward static scheduluing (STATIC)
  2. Self Scheduling (SS)
  3. Fixed size chunk (FSC)
  4. Modified fixed size chhunk (mFSC)
  5. Guided self-scheduling (GSS)
  6. Trapezoid self-scheduling (TSS)
  7. Factoring (FAC)
  8. Weighted factoring (WF)
  9. Adaptive weighted factoring (AWF)
  10. Adaptive weighted factoring - Batch (AWF-B)
  11. Adaptive weighted factoring - Chunk (AWF-C)
  12. Adaptive weighted factoring - Batch with scheduling overhead (AWF-D)
  13. Adaptive weighted factoring - Chunk with scheduling overhead (AWF-E)
  14. Adaptive factoring (AF)

The DLS4LB is designed to load balance scientific applications with minimum code changes. Below is a simple example on how to use the DLS4LB tool for simple 1D loops (Type I) and nested 2D loops (Type II).

Type I ! begin i-loop do i=1,N ... i-iterate end do ! end i-loop

Type II ! begin j-loop do j=1,M ... part of j-iterate ! begin i-loop do i=1,N(j) ... i-iterate of j-iterate end do ! end i-loop ... part of j-iterate end do ! end j-loop

Dynamic load balancing is achieved in the application by invoking DLS routines to manipulate the loop indices.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

For Type I loops, the application must be modified as:

use DLS include 'mpif.h' ... type (infoDLS) info integer method, iStart, iSize integer iIters double precision iTime ... method = ... (choice of loop scheduling technique) call DLS_Setup (MPI_COMM_WORLD, info) call DLS_StartLoop (info, 1, N, method) do while ( .not. DLS_Terminated(info) ) call DLS_StartChunk (info, iStart, iSize) ! begin i-loop, new extents do i=iStart, iStart+iSize-1 ! 1,N ... i-iterate end do ! end i-loop call DLS_EndChunk (info) end do ! while ( .not. DLS_Terminated(info) ) call DLS_EndLoop (info, iIters, iTime)

where:

DLS_Setup (MPI_COMM_WORLD, info) Initializes a dynamic load balancing environment on MPI_COMM_WORLD. Information about this environment is stored in "info" (see type declaration infoDLS below)

DLS_StartLoop (info, 1, N, method) The synchronization point to start loop execution. (1, N) is the loop range, and "method" is a user- specified index [1..9] for the loop scheduling method (see array DLS_LongName() below)

DLS_Terminated(info) returns .TRUE. if all loop iterates have been executed

DLS_StartChunk (info, iStart, iSize) Returns a range for a chunk of iterates, starting at "iStart", with size "iSize"

do i=iStart, iStart+iSize-1 ! 1,N ... i-iterate end do The original "serial" code, but now executing a chunk of iterates instead of "i=1,N"

DLS_EndChunk (info) Indicates the end of execution of a chunk of iterates

DLS_EndLoop (info, iIters, iTime) The synchronization point to end loop execution. "iIters" is the number of iterates done by the calling processor, and "iTime" is its cost (in seconds)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

For Type II loops, the application must be modified as:

use DLS include 'mpif.h' ... type (infoDLS) iInfo, jInfo integer iComm, jComm ! communicators integer iMethod, iStart, iSize, jMethod, jStart, jSize integer iIters, jIters double precision iTime, jTime integer coordinator ... iMethod = ... (choice of loop scheduling technique) jMethod = ... (choice of loop scheduling technique) coordinator = 0 call DLS_GroupSetup (MPI_COMM_WORLD, coordinator, jInfo, iInfo) call DLS_StartLoop (jInfo, 1, M, jMethod) do while ( .not. DLS_Terminated(jInfo) ) call DLS_StartChunk (jInfo, jStart, jSize)

! begin j-loop code, new extents do j=jStart, jStart+jSjze-1 ! 1,M ... part of j-iterate

call DLS_StartLoop (iInfo, 1, N(j), iMethod) do while ( .not. DLS_Terminated(iInfo) ) call DLS_StartChunk (iInfo, iStart, iSize) ! begin i-loop, new extents do i=iStart, iStart+iSize-1 ! 1,N ... i-iterate of j-iterate end do end i-loop call DLS_EndChunk (iInfo) end do ! while ( .not. DLS_Terminated(iInfo) ) call DLS_EndLoop (iInfo, iIters, iTime)

... part of j-iterate end do ! end j-loop

call DLS_EndChunk (jInfo) end do ! while ( .not. DLS_Terminated(jInfo) ) call DLS_EndLoop (jInfo, jIters, jTime)

where:

DLS_GroupSetup (MPI_COMM_WORLD, coordinator, jInfo, iInfo) Splits MPI_COMM_WORLD into non-overlapping "iComm"s and a "jComm" comprised of the "coordinator" and the foremen (rank 0 of the iComms). Similar to DLS_Setup(), DLS_GroupSetup() initializes load balancing environments in "jComm" and "iComm", keeping the data in "jInfo" and "iInfo" respectively. Graphically:

Authors:

[1] Ali Mohammed

[2] Florina M. Ciorba

[3] Ricolindo L. Carino

[4] Ioana Banicescu

Publications:

[1] Ali Mohammed, Ahmed Eleliemy, Florina M. Ciorba, Franziska Kasielke, Ioana Banicescu, “An Approach for Realistically Simulating the Performance of Scientific Applications on High Performance Computing Systems”, Future Generation Computer Systems (FGCS), 2019, https://doi.org/10.1016/j.future.2019.10.007 (Open Access)

[2] Ali Mohammed and Florina M. Ciorba, “SimAS: A simulation-assisted approach for the scheduling algorithm selection under perturbations”, In Concurrency and Computation: Practice and Experience (CPE), 2020, https://onlinelibrary.wiley.com /doi/full/10.1002/cpe.5648 (Open Access)

[3] Ricolindo L. Cariño and Ioana Banicescu. A Tool for a Two-level Dynamic Load Balancing Strategy in Scientific Applications. Scalable Computing: Practice and Experience, 8(3), 2007.

dls4lb's People

Contributors

aoa-mohammed avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.