hpc-carpentry / hpc-parallel-novice

Introductory material on parallelization using python with a focus on HPC platforms

Home Page: https://hpc-carpentry.github.io/hpc-parallel-novice

License: Other

Languages: Makefile 3.87%, Python 64.30%, Shell 1.75%, HTML 12.37%, R 2.90%, Ruby 0.21%, TeX 14.59%
Topics: lesson, hpc-carpentry, english, alpha, hpc-python, hpc-carpentry-lab

hpc-parallel-novice's Introduction

Introduction to Parallelisation on HPC platforms

A novice introduction to parallelisation with high-performance computing. This material was conceived as a sandbox project for hpc-carpentry. It derives from hpc-in-a-day but will not be kept in sync with it.

Material

The material can be viewed at https://hpc-carpentry.github.io/hpc-parallel-novice.

Audience

The material targets future users of an HPC infrastructure from any discipline. Learners are expected to have introductory programming skills, to know how to submit a batch job on an HPC cluster, and to know how to write functions in Python. Basic numpy array commands are beneficial but not required to follow the course.

In short, learners should already have completed an introductory HPC lesson and an introductory Python lesson.

Scheduler

This material tries to be scheduler-agnostic; currently it supports LSF and SLURM. The job scheduler type can be set with the workshop_scheduler variable in _config.yaml.
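
For example (assuming the accepted values simply match the scheduler names, which should be checked against the lesson configuration itself), a SLURM-based workshop would set something like workshop_scheduler: "slurm" in _config.yaml.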

How to Teach

More information on how to teach this material can be found in the instructor notes under the Extras tab.

How to build

Dependencies

The material is based on the Software Carpentry lesson template and hence depends on a fairly recent version of Jekyll. To build it, run make site in the root directory. If you run into any problems, please open an issue.

Local tests

To test the material locally, open a terminal and type:

$ make serve
# ...
    Server address: http://127.0.0.1:4000
  Server running... press ctrl-c to stop.

Once you see the above, open a web browser on the same machine and paste http://127.0.0.1:4000 into the address bar. You should see a local version of the material displayed. Stop the server by pressing ctrl-c.

hpc-parallel-novice's People

Contributors

annajiat, bkmgit, psteinb, tkphd


hpc-parallel-novice's Issues

Add full example multiprocessing code

Possible version

import argparse
import sys
from multiprocessing import Pool

import numpy as np

# NB: every worker process ends up with this same seed, so all partitions draw
# identical samples; seeding per worker would avoid that.
np.random.seed(2021)


def inside_circle(total_count):
    # Count how many of total_count random points land inside the unit circle
    # (first quadrant only, since x and y are drawn from [0, 1)).
    x = np.float32(np.random.uniform(size=total_count))
    y = np.float32(np.random.uniform(size=total_count))
    radii = np.sqrt(x * x + y * y)
    filtered = np.where(radii <= 1.0)
    count = len(radii[filtered])
    return count


def estimate_pi(total_count, n_cores):
    # Split the samples evenly across the workers; integer division may drop a
    # few samples, so the estimate is normalised by the partitioned total.
    partitions = [int(total_count / n_cores) for _ in range(n_cores)]

    with Pool(processes=n_cores) as pool:
        counts = pool.map(inside_circle, partitions)

    total_count = sum(partitions)
    return 4.0 * sum(counts) / total_count


def main():
    parser = argparse.ArgumentParser(
        description='Estimate Pi using a Monte Carlo method.')
    parser.add_argument('n_samples', metavar='N', type=int, nargs=1,
                        default=10000,
                        help='number of times to draw a random number')
    parser.add_argument('n_cores', metavar='N', type=int, nargs=1,
                        default=1,
                        help='number of cores to use')
    args = parser.parse_args()

    n_samples = args.n_samples[0]
    n_cores = args.n_cores[0]
    my_pi = estimate_pi(n_samples, n_cores)

    print("[multiprocessing version] pi is %f from %i samples on %i cores"
          % (my_pi, n_samples, n_cores))
    sys.exit(0)


if __name__ == '__main__':
    main()
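
The script follows a map/reduce-style pattern: the requested sample count is partitioned, Pool.map hands one partition to each worker process, and the per-worker hit counts are summed before normalising by the partitioned total. A hypothetical invocation (the filename here is only for illustration) would be python pi_multiprocessing.py 10000000 4 to draw ten million samples on four cores.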

Update Dask Example

  • It may not be realistic to assume that most clusters will allow setting up a web server for viewing the scheduler.
  • Since most HPC clusters will have schedulers, it may be worth using dask-jobqueue; a possible example script is below.
import argparse
import sys

import numpy as np
import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each dask worker is submitted as a SLURM job with these resources.
cluster = SLURMCluster(cores=4,
                       processes=1,
                       memory="4GB",
                       walltime="00:10:00")
# Request worker jobs (two here, adjust to taste); without this no SLURM jobs
# are submitted and any computation would wait indefinitely for workers.
cluster.scale(jobs=2)

np.random.seed(2021)
da.random.seed(2021)


def inside_circle(total_count, chunk_size=-1):
    # Draw the points as chunked dask arrays so the work is spread across the workers.
    x = da.random.uniform(size=total_count, chunks=chunk_size)
    y = da.random.uniform(size=total_count, chunks=chunk_size)
    radii = da.sqrt(x * x + y * y)
    filtered = da.where(radii <= 1.0)
    # Materialising the indices triggers the distributed computation.
    indices = np.array(filtered[0])
    count = len(radii[indices])
    return count


def estimate_pi(total_count, chunk_size):
    count = inside_circle(total_count, chunk_size)
    return 4.0 * count / total_count


def main():
    parser = argparse.ArgumentParser(
        description='Estimate Pi using a Monte Carlo method.')
    parser.add_argument('n_samples', metavar='N', type=int, nargs=1,
                        default=10000,
                        help='number of times to draw a random number')
    parser.add_argument('chunk_size', metavar='N', type=int, nargs=1,
                        default=1000,
                        help='chunk size')
    args = parser.parse_args()

    n_samples = args.n_samples[0]
    chunk_size = args.chunk_size[0]
    client = Client(cluster)
    my_pi = estimate_pi(n_samples, chunk_size)

    print("[dask version] pi is %f from %i samples with chunk size %i"
          % (my_pi, n_samples, chunk_size))
    sys.exit(0)


if __name__ == '__main__':
    main()
  • It may also be worth considering Ray
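
Such a script would need dask and dask-jobqueue installed and submit access to a SLURM cluster; the worker jobs are then submitted by dask-jobqueue itself, so only the client script needs to be started by hand, e.g. python pi_dask.py 10000000 100000 (hypothetical filename).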

Add some clarification/context to MPI code

We had a workshop yesterday and there were some questions about why things are done in a particular way in the MPI example:

  • Why the use of comm.Barrier()
  • Why do count_item = comm.scatter(counts, root=0) when you know there is no useful information to scatter?

These kinds of lines in the example are done quite deliberately and point to some of the intricacies of MPI; we should probably make more effort to explain why they are needed.
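
To make the reasoning concrete, here is a minimal, annotated mpi4py sketch of the pattern in question. It is not the lesson's exact code; the sample count and variable names (count_item and so on) are only chosen to mirror the snippet quoted above.

import numpy as np
from mpi4py import MPI


def inside_circle(total_count):
    # Count how many of total_count random points land inside the unit circle
    # (first quadrant only, since x and y are drawn from [0, 1)).
    x = np.random.uniform(size=total_count)
    y = np.random.uniform(size=total_count)
    return int(np.sum(x * x + y * y <= 1.0))


comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_samples = 10_000_000  # hypothetical total; the lesson takes this from the command line

if rank == 0:
    # The root rank decides how many samples each rank should draw. Scattering
    # these numbers, even though every rank could compute n_samples // size on
    # its own, demonstrates the root-distributes/workers-receive pattern and
    # keeps working if the partitioning ever becomes uneven.
    counts = [n_samples // size] * size
else:
    counts = None

# Collective calls must be made by every rank, including those that only receive.
count_item = comm.scatter(counts, root=0)

# Barrier synchronises all ranks so that, for example, a timing measurement
# starts from a common point instead of whenever each rank happens to arrive.
comm.Barrier()
start = MPI.Wtime()

local_hits = inside_circle(count_item)

# gather is the mirror image of scatter: the root collects one result per rank.
all_hits = comm.gather(local_hits, root=0)

if rank == 0:
    n_total = count_item * size
    my_pi = 4.0 * sum(all_hits) / n_total
    print("[mpi version] pi is %f from %i samples on %i ranks (%.3f s)"
          % (my_pi, n_total, size, MPI.Wtime() - start))

A sketch like this could be run with, e.g., mpirun -np 4 python pi_mpi.py (hypothetical filename).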

reference to Buffon's needle

While Buffon's needle is an elegant and interesting mathematical problem which produces an estimate of π, it is not at all what we're doing in this episode. Indeed, the algorithm used here is so simple as to be obvious to anyone who knows the formula for the area of a circle.
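
(For context: the lesson's estimator only uses the fact that a quarter of the unit circle covers π/4 of the unit square, so π ≈ 4 × hits / samples, whereas Buffon's needle estimates π from needles dropped across parallel lines.)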

I recommend removing the attribution.

multi-socket image

Learners might not know what a CPU "socket" is -- unless you've built a computer from parts, there's no need to know what's under the heat sink/fan; and if you've only ever had a laptop, tablet, or smartphone, it's all a sleek monolith.

It would help convey the idea to include images of a single-, dual-, and quad-socket motherboard in the lesson material for discussion. When we return to in-person workshops, old motherboards make great props!

(This may fall under "too deep in the weeds" territory, but the concept of sockets was raised by learners in the HEIBRiDS workshop.)

Maintainers needed

If you have the time, please volunteer to help maintain this repository.
The commitment is mostly to reviewing pull requests & providing timely feedback.

CI is not active

This lesson should use continuous integration to check for problems in PRs. Use the .github folder in hpc-intro as a starting point, and configure a GitHub Action to get this going.
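
A minimal workflow along those lines (a sketch, not necessarily what hpc-intro ships) would check out the repository, install Ruby and the Jekyll dependencies, and run make site on every pull request so that broken builds fail the check.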
