hrolive / upscaling-ai-worflows

How to use Docker and Singularity containers in conjunction with TensorFlow and Horovod to do distributed training and upscale an AI app.

Jupyter Notebook 94.49% Python 5.51%
containers deep-learning docker high-performance-computing horovod machine-learning natural-language-processing notebook singularity tensorflow

upscaling-ai-worflows's Introduction

Workshop

Table of Contents

  1. Description
  2. Instructions
  3. Getting access to HPC resources
  4. Certificate

Description

In this workshop, sponsored by The Swedish EuroCC Hub for High-Performance Computing, we started with an overview of the basics of Docker and Singularity. Then, we used resources from the Slovenian supercomputer, Vega, to do distributed training using the TensorFlow and Horovod frameworks. Finally, we used Docker and Singularity containers in conjunction with TensorFlow and Horovod to upscale an AI app.

Artificial Intelligence (AI) has become a foundational building block of our modern world. Accordingly, a vast effort has been put into bringing AI to researchers and practitioners in a wide range of fields. Nonetheless, the computationally intensive task of training an AI model increasingly requires more computational power than our laptops and PCs can offer. The ability to develop and train a neural network on large clusters is therefore becoming essential. This workshop teaches how to scale AI-powered applications on large clusters, i.e., supercomputers.

The outcomes of the workshop are:

  • Create, deploy, and update containers locally on a supercomputer
  • Upscale the transfer learning of an NLP model in TensorFlow
  • Upscale the transfer learning of an NLP model using Horovod
  • Upscale the transfer learning of a containerized NLP model
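
Data-parallel training with Horovod, as named in the outcomes above, rests on one idea: each worker computes gradients on its own shard of the data, and the gradients are then averaged across workers before the weights are updated. The sketch below illustrates only that averaging step in plain Python on a toy linear model. It does not use Horovod or TensorFlow, and the function names (`gradient`, `allreduce_average`, `shard`, `train`), the data, and the hyperparameters are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def gradient(w: float, b: float, batch: Sequence[Point]) -> Tuple[float, float]:
    """Gradient of the mean squared error of y ~ w*x + b on one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def allreduce_average(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-worker gradients (a stand-in for an allreduce step)."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def shard(data: Sequence[Point], num_workers: int) -> List[Sequence[Point]]:
    """Give worker `rank` every num_workers-th sample, starting at `rank`."""
    return [data[rank::num_workers] for rank in range(num_workers)]


def train(data: Sequence[Point], num_workers: int,
          lr: float = 0.01, steps: int = 3000) -> Tuple[float, float]:
    """Simulated data-parallel gradient descent; all workers share w and b."""
    w, b = 0.0, 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        local = [gradient(w, b, s) for s in shards]  # one gradient per worker
        gw, gb = allreduce_average(local)            # combine across workers
        w -= lr * gw
        b -= lr * gb
    return w, b


# Hypothetical toy data drawn from y = 3x + 1.
DATA = [(float(x), 3.0 * x + 1.0) for x in range(8)]

if __name__ == "__main__":
    print(train(DATA, num_workers=4))
```

When every shard has the same size, the averaged gradient equals the full-batch gradient, so the simulated four-worker run follows the same trajectory as a single-worker run.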

Instructions

All necessary information and links for the workshop (Q&A, exercises, tutorials, etc.) can be found on the workshop website.

Getting access to HPC resources

Detailed instructions on how to access Vega's resources can be found on The Swedish EuroCC Hub for High-Performance Computing website.

Workshop

How to reserve nodes and launch a Jupyter notebook on Vega:

Running Jupyter on Vega and connecting to it

To book a node:

salloc -N 1 --gres=gpu:1 --reservation=enccs-day1 --mem-per-gpu=40GB --ntasks 4 --cpus-per-task 1 -t 01:00:00

You will see something like salloc: Nodes gn38 are ready for job
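
If you script these steps, the node name can be read out of that message. The following is a minimal sketch that assumes the message has exactly the form shown above; the function name `allocated_nodes` is hypothetical.

```python
import re
from typing import Optional


def allocated_nodes(salloc_output: str) -> Optional[str]:
    """Return the node name(s) from salloc's 'ready for job' line, or None."""
    match = re.search(r"salloc: Nodes (\S+) are ready for job", salloc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(allocated_nodes("salloc: Nodes gn38 are ready for job"))
```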

Log in to the compute node from the login node (replace gn38 with your compute node ID):

ssh gn38

Load modules on compute node:

module load Anaconda3/2020.11
module load scikit-learn
module load TensorFlow/2.5.0-fosscuda-2020b

Pick a random port between 8000 and 9000, and replace gnXX with the name of your compute node:

jupyter-notebook --no-browser --port=8123 --ip=gnXX
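
Picking the port by hand can collide with a port that is already in use on the node. As a rough sketch, the helper below tries random ports in the 8000 to 9000 range until one can be bound, then builds the command shown above. The function names `pick_free_port` and `jupyter_command` are hypothetical, and the check only shows that the port was free at the moment of testing.

```python
import random
import socket


def pick_free_port(low: int = 8000, high: int = 9000, attempts: int = 50) -> int:
    """Return a random port in [low, high] that can currently be bound."""
    for _ in range(attempts):
        port = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("", port))
            except OSError:
                continue  # port is taken; try another one
            return port
    raise RuntimeError(f"no free port found in {low}-{high}")


def jupyter_command(node: str, port: int) -> str:
    """Build the jupyter-notebook command line for a given node and port."""
    return f"jupyter-notebook --no-browser --port={port} --ip={node}"


if __name__ == "__main__":
    # Run this on the compute node; gn38 is the example node name from above.
    print(jupyter_command("gn38", pick_free_port()))
```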

From your local machine (replace gnXX with the name of your compute node, and replace 8123 with the port you selected when running jupyter-notebook):

ssh -L localhost:9999:gnXX:8123 [email protected]

Open in your browser (port 9999 should match what you selected in the ssh-tunnel command above):

http://localhost:9999/
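
The three values that have to agree (compute node, remote port, local port) can be kept in one place. This is a minimal sketch that only formats the tunnel command and the browser URL shown above; the function names are hypothetical, and `user@login.example.org` is a placeholder, not the real Vega login address.

```python
def tunnel_command(node: str, remote_port: int, login: str,
                   local_port: int = 9999) -> str:
    """Build the ssh port-forwarding command for the notebook tunnel."""
    return f"ssh -L localhost:{local_port}:{node}:{remote_port} {login}"


def notebook_url(local_port: int = 9999) -> str:
    """URL to open in the local browser once the tunnel is up."""
    return f"http://localhost:{local_port}/"


if __name__ == "__main__":
    # Placeholder login; substitute your own user name and login host.
    print(tunnel_command("gn38", 8123, "user@login.example.org"))
    print(notebook_url())
```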

Detailed instructions can be found on the Vega website.

Certificate

The certificate can be found here.
