horovod-pytorch-tutorial's Introduction

Horovod Tutorial for Pytorch

Introduction

Horovod is an open-source library for distributed deep learning.

It uses the Ring-AllReduce algorithm for efficient distributed training of neural networks.
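To make the idea concrete, here is a minimal single-process simulation of ring all-reduce written with NumPy. It is only a sketch of the communication pattern, not Horovod's implementation (which relies on NCCL or MPI for the actual transfers); the function name `ring_allreduce` and the use of plain arrays as stand-ins for per-worker gradients are assumptions made for illustration.

```python
import numpy as np


def ring_allreduce(worker_grads):
    """Simulate a ring all-reduce (sum) over equal-length 1-D arrays."""
    n = len(worker_grads)
    # Every worker splits its gradient into n chunks of matching sizes.
    chunks = [np.array_split(np.asarray(g, dtype=np.float64), n)
              for g in worker_grads]

    # Phase 1: scatter-reduce. At each step, worker i passes one chunk to
    # worker (i + 1) % n, which adds it to its own copy. After n - 1 steps,
    # worker i holds the complete sum of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot the outgoing data so all sends happen "at the same time".
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for src, idx, data in sends:
            dst = (src + 1) % n
            chunks[dst][idx] = chunks[dst][idx] + data

    # Phase 2: all-gather. Completed chunks travel once more around the ring,
    # overwriting the partial sums left on the other workers.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for src, idx, data in sends:
            dst = (src + 1) % n
            chunks[dst][idx] = data

    return [np.concatenate(c) for c in chunks]
```

Each worker sends and receives only one chunk per step, so the amount of data each worker transfers stays roughly constant as the number of workers grows. Dividing the result by the number of workers turns the sum into an average of the gradients.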

This repository is a very simple hands-on guide for using Horovod-Pytorch with NVIDIA-Docker.

The aim is to provide a template for other projects using Horovod for Pytorch.

It also attempts to provide a more detailed explanation of what is going on.

The Horovod documentation leaves much to the imagination as of February 2020.

Here, I try to explain the details as much as I can.

Please star/fork my repository if you find this tutorial helpful!

Installation

To run this project please install NVIDIA-Docker first.

Unfortunately for Windows users, NVIDIA-Docker is only available for Linux as of the time of writing.

NVIDIA-Docker has many dependencies, such as the NVIDIA driver and Docker.

These are all necessary for this project.
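As a quick sanity check before building the image, the sketch below looks for the `docker` and `nvidia-smi` executables on the PATH. The helper name `missing_tools` is hypothetical, and finding both executables only suggests that Docker and the NVIDIA driver are installed; it does not prove that NVIDIA-Docker itself is configured correctly.

```python
import shutil


def missing_tools(tools=("docker", "nvidia-smi")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("docker and nvidia-smi were both found on PATH.")
```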

I am using Docker because I have found that local installation often fails.

This is likely due to complicated dependency issues.

Also, catastrophic errors are easier to handle in a Docker container than on a local machine.

Please review basic Docker concepts before working with this project.

Don't be afraid! It's not that difficult to understand!

Environment

The Dockerfile builds an Ubuntu 18.04 LTS image with CUDA 10.0, cuDNN 7.6.0.64-1, NCCL 2.4.7-1, and OpenMPI 4.0.2.

The Python version is 3.6.7, Pytorch is 1.4.0, and Torchvision is 0.5.0.
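One way to confirm that a running container matches these versions is sketched below. The helper names `version_mismatches` and `installed_versions` are hypothetical; the comparison ignores local build suffixes such as `+cu100`, which is an assumption about how the installed wheels report their versions.

```python
import platform

# Versions listed in the Environment section above.
EXPECTED = {"python": "3.6.7", "torch": "1.4.0", "torchvision": "0.5.0"}


def version_mismatches(found, expected=EXPECTED):
    """Return {name: (expected, found)} for every entry that does not match."""
    return {
        name: (want, found.get(name))
        for name, want in expected.items()
        # Drop any local build suffix (for example "+cu100") before comparing.
        if (found.get(name) or "").split("+")[0] != want
    }


def installed_versions():
    """Collect the versions present in the running interpreter."""
    # Imported here so the comparison helper works without Pytorch installed.
    import torch
    import torchvision
    return {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "torchvision": torchvision.__version__,
    }
```

Inside the container, `version_mismatches(installed_versions())` should return an empty dictionary if the image was built as described.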

These settings were adapted from the official Horovod Docker image available at the time of writing.

That image has a known problem: Pillow 7 is incompatible with Torchvision 0.4.2.
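The sketch below flags this version combination from two version strings. The function names are hypothetical, and treating every Pillow release from 7 onward together with every Torchvision release before 0.5.0 as conflicting is an assumption that generalizes from the single pairing named above.

```python
def parse_version(text):
    """Turn "0.4.2+cu100" or "7.0.0.post1" into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)


def pillow_conflict(pillow_version, torchvision_version):
    """Return True for Pillow >= 7 combined with Torchvision < 0.5.0."""
    return (parse_version(pillow_version) >= (7,)
            and parse_version(torchvision_version) < (0, 5, 0))
```

In a live environment the inputs would typically be `PIL.__version__` and `torchvision.__version__`.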

Task

The example task is deliberately simple: CIFAR10 classification with ResNet34.

Its main purpose is to explain what is going on.
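For orientation, here is a minimal sketch of how such a task is usually wired up with Horovod's Pytorch API. It is not the code from this repository: the function names `scaled_lr` and `train`, the hyperparameters, the data directory, and the use of Torchvision's stock `resnet34` are all assumptions. Scaling the learning rate by the number of workers is a common convention rather than a requirement.

```python
def scaled_lr(base_lr, world_size):
    """Linear learning-rate scaling with the number of workers (a convention)."""
    return base_lr * world_size


def train(epochs=1, batch_size=128, base_lr=0.1, data_dir="./data"):
    # Imported lazily so this file can be loaded without Horovod installed.
    import torch
    import torch.nn.functional as F
    import torchvision
    from torchvision import transforms
    import horovod.torch as hvd

    hvd.init()                               # one process per GPU
    torch.cuda.set_device(hvd.local_rank())  # pin this process to its GPU

    # Note: concurrent downloads from several processes can race; in a real
    # run, fetch the dataset once beforehand.
    dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True,
        transform=transforms.ToTensor())
    # Each process trains on its own shard of the data.
    sampler = torch.utils.data.distributed.DistributedSampler(
        dataset, num_replicas=hvd.size(), rank=hvd.rank())
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, sampler=sampler)

    model = torchvision.models.resnet34(num_classes=10).cuda()
    optimizer = torch.optim.SGD(
        model.parameters(), lr=scaled_lr(base_lr, hvd.size()), momentum=0.9)
    # The wrapper averages gradients across processes before each step.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Start every process from rank 0's weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    model.train()
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards every epoch
        for images, labels in loader:
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        if hvd.rank() == 0:  # log from a single process only
            print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```

If this were saved in a file that calls `train()` (say, a hypothetical `train.py`), it could be launched on four GPUs with `horovodrun -np 4 python train.py`.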

Contributors

veritas9872
