AI Radiologist GPU

GPU version of fast, large-scale training of an AI radiologist using multiple GPUs across multiple nodes.

A chest x-ray identifies a lung mass. Source: NIH

Getting Started

Please refer to this article to install Horovod and TensorFlow.

Hardware & Technology Stack

Hardware configuration

  • 4 PowerEdge C4140 nodes
  • 4 NVIDIA V100 32GB SXM2 GPUs
  • 2 Intel(R) Xeon(R) Gold 6148 CPUs @ 2.40GHz, 20 cores each
  • 384 GB RAM, DDR4 2666MHz
  • Lustre file system

Software configuration

  • Deep learning framework: TensorFlow-GPU 1.12.0
  • Horovod: 0.16.4
  • MPI: 4.0.0 with CUDA and UCX support
  • CUDA: 10.1.105
  • NCCL: 2.4.7
  • Python: 3.6.8
  • OS: RHEL 7.4

Clone this Repo

git clone https://github.com/dellemc-hpc-ai/ai-radiologist-GPU.git
cd ai-radiologist-GPU

Run the following commands from the ai-radiologist-GPU directory.

Download the dataset

./download_dataset.sh

You should now see a tars folder.

Extract the dataset

./extract_images.sh

You should now see all the extracted images inside the images_all/images folder.
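To confirm the extraction finished, a quick count of the files in images_all/images can help. The sketch below is a minimal check, assuming the extracted images are PNG files sitting directly in that folder; the function name count_images is hypothetical and not part of this repo.

```python
from pathlib import Path


def count_images(folder="images_all/images", pattern="*.png"):
    """Count files directly inside `folder` that match `pattern`.

    Hypothetical helper: assumes the extracted images are PNG files.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run ./extract_images.sh first")
    # Only count regular files, not sub-directories that happen to match.
    return sum(1 for p in root.glob(pattern) if p.is_file())
```

Run it from the ai-radiologist-GPU directory, for example with print(count_images()), and compare the result against the number of images you expect from the dataset.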

Running the tests

At this point, you can start training the CheXNet model on the raw images; if you'd like to train with raw images, skip ahead to Run the Job. If you want to train the model with TFRecords instead, first convert the raw images to TFRecords.

Convert to TFRecords

This converts your raw images to TFRecords, which improves training speed by roughly 8-15%.

./write_tfrecords.sh
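A TFRecord file stores many records back to back in one sequential file, so the input pipeline reads a few large files instead of opening every image individually; that is one plausible source of the speedup above. The sketch below writes and reads the TFRecord framing (a little-endian uint64 length, a masked CRC-32C of the length, the payload, and a masked CRC-32C of the payload) using only the Python standard library. It illustrates the file format and is not the repo's write_tfrecords.sh: in practice each payload is usually a serialized tf.train.Example, whereas here the payloads are raw bytes, and all helper names are hypothetical.

```python
import struct

# Reflected CRC-32C (Castagnoli) polynomial, the checksum TFRecord uses.
_POLY = 0x82F63B78
_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ _POLY if _crc & 1 else _crc >> 1
    _TABLE.append(_crc)


def crc32c(data):
    """Plain CRC-32C of a bytes object (table-driven, reflected)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """CRC-32C rotated right by 15 bits plus a constant, as TFRecord stores it."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, payloads):
    """Hypothetical writer: append each payload with TFRecord framing."""
    with open(path, "wb") as f:
        for payload in payloads:
            header = struct.pack("<Q", len(payload))      # uint64 length
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))
            f.write(payload)
            f.write(struct.pack("<I", masked_crc(payload)))


def read_records(path):
    """Hypothetical reader: yield payloads, verifying both checksums."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            (length,) = struct.unpack("<Q", header)
            (len_crc,) = struct.unpack("<I", f.read(4))
            if len_crc != masked_crc(header):
                raise ValueError("corrupt length header")
            payload = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(payload):
                raise ValueError("corrupt payload")
            yield payload
```

For example, write_records("sample.tfrecord", [b"first", b"second"]) followed by list(read_records("sample.tfrecord")) returns the two payloads; a checksum mismatch raises ValueError.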

Run the Job

Please ensure that:

  • You change the conda environment name in your submission script to your own environment name.
  • You change the paths in the submission script to point to your own MPI, CUDA, etc. build locations.

If you're using Slurm as your scheduler, submit the script that matches the data you want to train on (raw images or TFRecords). You can change the values of N (number of nodes) and n (number of tasks) inside the scripts.

sbatch job_submissions/slurm/raw_1gpu.sh      # raw images
sbatch job_submissions/slurm/tfrec_1gpu.sh    # TFRecords
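In Slurm, -N sets the number of nodes and -n the number of tasks. With one process per GPU, as the throughput and timing tables count them, both numbers follow from the GPU count you want. The helper below is a hypothetical sketch that computes them, assuming 4 GPUs per node (consistent with 4 PowerEdge C4140 nodes providing the 16 GPUs of the largest run) and assuming the submission scripts set these values through #SBATCH -N and #SBATCH -n directives; check your script before copying the output.

```python
import math


def slurm_layout(total_gpus, gpus_per_node=4):
    """Return (N, n): nodes needed and one task (process) per GPU.

    Hypothetical helper; gpus_per_node=4 is an assumption about the cluster.
    """
    if total_gpus < 1 or gpus_per_node < 1:
        raise ValueError("GPU counts must be positive")
    nodes = math.ceil(total_gpus / gpus_per_node)  # fill nodes before adding more
    return nodes, total_gpus


def sbatch_directives(total_gpus, gpus_per_node=4):
    """Format the layout as #SBATCH lines (assumes the scripts use -N and -n)."""
    nodes, tasks = slurm_layout(total_gpus, gpus_per_node)
    return [f"#SBATCH -N {nodes}", f"#SBATCH -n {tasks}"]
```

For example, sbatch_directives(16) returns ["#SBATCH -N 4", "#SBATCH -n 16"], and a 3-GPU run fits on a single node (-N 1, -n 3).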

Throughput Numbers

GPUs (total processes)   Images/second
1                        185.12
2                        315.26
3                        421.85
4                        589.36
8                        1116.99
12                       1527.60
16                       1912.82
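The throughput table can be read as speedup (throughput divided by the 1-GPU throughput) and scaling efficiency (speedup divided by the number of GPUs). The short calculation below uses only the numbers in the table; the names THROUGHPUT and scaling are illustrative.

```python
# Throughput from the table above: number of GPUs -> images/second.
THROUGHPUT = {1: 185.12, 2: 315.26, 3: 421.85, 4: 589.36,
              8: 1116.99, 12: 1527.60, 16: 1912.82}


def scaling(table):
    """Return (gpus, speedup, efficiency) rows relative to the 1-GPU run."""
    base = table[1]
    rows = []
    for gpus in sorted(table):
        speedup = table[gpus] / base      # how many times faster than 1 GPU
        efficiency = speedup / gpus       # 1.0 would be perfect linear scaling
        rows.append((gpus, speedup, efficiency))
    return rows


for gpus, speedup, efficiency in scaling(THROUGHPUT):
    print(f"{gpus:>2} GPUs: {speedup:5.2f}x speedup, {efficiency:6.1%} efficiency")
```

With these numbers, 2 GPUs reach about 85% efficiency, and 16 GPUs deliver roughly a 10.3x speedup, about 65% efficiency.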

Time to Solution (10 Epochs)

GPUs (total processes)   Time per epoch (seconds, averaged over 10 epochs)
1                        4206.58
2                        2470.08
3                        1845.96
4                        1321.29
8                        697.15
12                       509.76
16                       407.1
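Multiplying the average epoch time by the number of epochs gives an estimate of total training time. The sketch below does this for the 10 epochs used here; it ignores anything outside the training epochs (such as start-up or evaluation), so treat the result as approximate. The names EPOCH_SECONDS and time_to_solution are illustrative.

```python
# Average seconds per epoch from the table above: number of GPUs -> seconds.
EPOCH_SECONDS = {1: 4206.58, 2: 2470.08, 3: 1845.96, 4: 1321.29,
                 8: 697.15, 12: 509.76, 16: 407.1}


def time_to_solution(epoch_seconds, epochs=10):
    """Estimated wall-clock hours for `epochs` epochs at the average epoch time."""
    return {gpus: secs * epochs / 3600 for gpus, secs in epoch_seconds.items()}


for gpus, hours in sorted(time_to_solution(EPOCH_SECONDS).items()):
    print(f"{gpus:>2} GPUs: {hours:5.2f} h for 10 epochs")
```

This gives about 11.7 hours on 1 GPU versus about 1.1 hours on 16 GPUs, a ratio of roughly 10.3x that agrees with the throughput table.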

AUC-ROC

Pathology                AUC-ROC
Cardiomegaly             0.875733593
Emphysema                0.892907302
Effusion                 0.890032406
Hernia                   0.833090067
Nodule                   0.715641755
Pneumonia                0.83281297
Atelectasis              0.816172298
Pleural Thickening       0.751320603
Mass                     0.823109117
Edema                    0.871388368
Consolidation            0.79861069
Infiltration             0.673332357
Fibrosis                 0.784720033
Pneumothorax             0.695001477
Average AUC-ROC          0.803848074
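AUC-ROC is the probability that the model scores a randomly chosen positive image higher than a randomly chosen negative one for a given pathology, and the average row is the unweighted mean over the 14 pathologies. The sketch below computes AUC-ROC directly from that pairwise definition (ties count one half) and reproduces the average from the per-pathology values. It is an illustrative O(P·N) implementation with hypothetical names, not necessarily how this repo computes the metric.

```python
def auc_roc(labels, scores):
    """AUC-ROC from the pairwise definition: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Per-pathology values from the table above.
PER_PATHOLOGY = {
    "Cardiomegaly": 0.875733593, "Emphysema": 0.892907302,
    "Effusion": 0.890032406, "Hernia": 0.833090067,
    "Nodule": 0.715641755, "Pneumonia": 0.83281297,
    "Atelectasis": 0.816172298, "Pleural Thickening": 0.751320603,
    "Mass": 0.823109117, "Edema": 0.871388368,
    "Consolidation": 0.79861069, "Infiltration": 0.673332357,
    "Fibrosis": 0.784720033, "Pneumothorax": 0.695001477,
}

# Unweighted (macro) average over the 14 pathologies.
macro = sum(PER_PATHOLOGY.values()) / len(PER_PATHOLOGY)
print(f"Average AUC-ROC: {macro:.9f}")
```

For example, auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) returns 0.75, and macro matches the table's average of 0.8038.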

Related Articles/Blogs

Acknowledgments

Contributors

rakshithvasudev
