
Introduction to parallel computing with R

Instructor: Hana Ševčíková, University of Washington

Where and when: useR!2017 conference, Brussels, July 4th 2017

Keywords: Parallel Computing, Master-Slave Paradigm, Reproducibility, Load Balancing, Distributed Random Number Generators

Goal

The goal of this tutorial is to introduce attendees to the concepts and tools available in R for parallel computing. It is aimed at novice R programmers and seeks to lower the mental hurdle they often perceive when parallelizing code.

Description

Over the past few years, R has become increasingly popular outside of the statistical community, and it is now one of the most popular programming languages among data scientists. With an increasing amount of data and more complex algorithms available to scientists today, parallel processing is almost always a must, and is in fact expected in packages implementing time-consuming methods.

Numerous R packages for parallel computing have been developed over the past two decades, with snow being one of the pioneers in providing a high-level interface for parallel computations on a cluster or in a multicore environment. More recently, most of the snow functionality has been implemented in the R core package parallel.

The main focus of the tutorial will be on the viewpoint implemented in the parallel package, namely the master-slave paradigm. Other viewpoints, such as Single Program/Multiple Data (SPMD), grid computing, and map/reduce, will be briefly introduced only as concepts, without going into detail.
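As a rough illustration of the master-slave paradigm, the following minimal sketch uses only the parallel package. The task function `sample_mean`, the number of workers, and the sample sizes are hypothetical choices made for this example and are not taken from the tutorial material.

```r
library(parallel)

# Hypothetical task: compute the mean of a simulated normal sample of size n.
sample_mean <- function(n) {
  mean(rnorm(n))
}

# The master starts two worker processes (an arbitrary number for this sketch).
cl <- makeCluster(2)

# The master distributes the tasks among the workers and collects the
# results as a list, in the same order as the input.
sizes <- c(1000, 2000, 3000, 4000)
results <- parLapply(cl, sizes, sample_mean)

# Shut the workers down once the work is done.
stopCluster(cl)

length(results)
```

The master process only distributes work and gathers results; the computation itself happens on the workers.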

In a parallel statistical application, a few issues need more attention than in its sequential counterpart. These include reproducibility, random number generation, computation transparency, and load balancing. We will talk about solutions to these and other issues implemented in user-contributed packages.
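To make the reproducibility issue concrete, here is a small sketch that uses `clusterSetRNGStream()` from the parallel package to give each worker its own random number stream derived from one seed. The helper names `draw` and `run_once`, the seed, and the number of workers are assumptions made for illustration only.

```r
library(parallel)

# Hypothetical task: draw a single uniform random number.
draw <- function(i) runif(1)

# Hypothetical helper: start a fresh cluster, seed it, run the tasks, clean up.
run_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  # Initialize an independent L'Ecuyer-CMRG stream on each worker.
  clusterSetRNGStream(cl, iseed = seed)
  # parLapply assigns tasks to workers statically, so the same task
  # always lands on the same worker and therefore on the same stream.
  unlist(parLapply(cl, 1:4, draw))
}

first  <- run_once(123)
second <- run_once(123)
identical(first, second)
```

Under these assumptions, two runs with the same seed and the same number of workers should produce the same numbers. With a load-balanced scheduler such as `parLapplyLB()`, the assignment of tasks to workers can depend on timing, so this simple per-worker seeding may no longer be enough to guarantee reproducible results.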

Outline

  • Paradigms of parallel computing
  • The master-slave paradigm in R
  • Examples of using parallel
  • Random number generation
  • Reproducibility and load balancing
  • Review of useful snow-like R packages with examples (snowFT and foreach)
  • Benchmarking

Pre-requisites

The tutorial targets people relatively new to R, so only basic knowledge of R is required.

Please install the following packages:

install.packages(c("foreach", "doParallel", "doRNG", 
                   "snowFT", "extraDistr", "ggplot2", 
                   "reshape2", "wpp2017"), 
                 dependencies = TRUE)

Technical Note:

RStudio users, please note that RStudio currently contains a bug that prevents one of the packages covered in the tutorial from working correctly. I therefore recommend that you use an R user interface other than RStudio. If you use RStudio, you will not be able to run about 1/4 of the material. (This bug in RStudio has been reported and will hopefully be fixed soon.)

Instructor

Hana Ševčíková is a Senior Research Scientist at the Center for Statistics and the Social Sciences at the University of Washington. She has collaborated on the implementation of R packages for parallel computing and distributed random number generators, such as snowFT, rlecuyer, and snow. More recently, she has been involved in developing demographic R packages as part of a collaborative research project with the United Nations.

Material
