A Script for PyTorch Multi-GPU Multi-Process Testing

In this repository, we provide a multi-GPU multi-process testing script that enables distributed testing in PyTorch (it should also work for TensorFlow).

Problem

PyTorch distributed training is easy to use. At test time, however, we often have to run the model sample by sample on a single GPU, because test samples often differ in size. When the same multiple GPUs used for training are available, leaving all but one of them idle wastes time.

Solution

We use the multiprocessing package for distributed testing on multiple GPUs. It supports multiple processes on multiple GPUs, and each GPU can run several processes if it has enough memory. Note that each process is an independent execution of the testing function.

Important To Know

1, As each process is independent as far as PyTorch is concerned, we need to load the model into GPU memory again before testing every sample, so our distributed testing script may NOT save you time. It is best suited to cases where testing a single sample takes a long time, for example zero-shot learning tasks and video recognition tasks.

2, When there is only one testing process per GPU (i.e., max_process_per_gpu==1), it always works fine. But when we start multiple processes per GPU (i.e., max_process_per_gpu>=2), it may get stuck on some computers or clusters.

Platform            | max_process_per_gpu==1 | max_process_per_gpu>=2
Standalone computer | pass ✅                | pass ✅
Slurm               | pass ✅                | pass ✅
IBM LSF             | pass ✅                | gets stuck sometimes ❌

3, When a runtime error (e.g., an out-of-memory error) occurs in one testing process, it will NOT print an error message or affect other processes. Be sure to check that the number of outputs equals the number of inputs after all processes have finished.
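One possible way to make the check in point 3 systematic is sketched below. It is not part of the repository's script: `safe_call` and `check_results` are hypothetical helpers, and the sketch assumes each sample is identified by a hashable value such as a file name. The wrapper turns an exception into returned text so the parent process can report it, and the check lists samples that failed or never produced a result.

```python
import traceback


def safe_call(test_fn, sample):
    """Run test_fn(sample) and return (sample, result, error_text)."""
    try:
        return sample, test_fn(sample), None
    except Exception:
        # Return the traceback as text so the parent process can report it.
        return sample, None, traceback.format_exc()


def check_results(inputs, results):
    """Return (missing, failed) for safe_call-style result tuples."""
    failed = [sample for sample, _, error in results if error is not None]
    succeeded = {sample for sample, _, error in results if error is None}
    failed_set = set(failed)
    # A sample is missing if no result, good or bad, came back for it.
    missing = [s for s in inputs
               if s not in succeeded and s not in failed_set]
    return missing, failed
```

With a process pool, the wrapper could be applied as `functools.partial(safe_call, test_fn)`, provided `test_fn` is a picklable top-level function. An empty `missing` list and an empty `failed` list would then indicate that the output count matches the input count.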
