synthetic-medical-images's Introduction

Generating synthetic medical images

Automated diagnosis of any kind is hampered by the small size, lack of diversity, and high cost of the available medical image datasets. Several approaches based on generative models have been applied to tackle this problem.

In this project we use a conditional Deep Convolutional Generative Adversarial Network (cDCGAN) to generate synthetic medical images; concretely, dermatological images of pigmented skin lesions.
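To make the idea of a conditional DCGAN concrete, here is a minimal sketch of a label-conditioned generator in PyTorch. It is not the repository's actual architecture: the class name `ConditionalGenerator`, the noise dimension `LATENT_DIM = 100`, the channel widths, and the choice to condition by concatenating a one-hot label to the noise vector are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

N_CLASSES = 7      # the seven HAM10000 diagnostic categories
LATENT_DIM = 100   # assumed noise dimension (not taken from the repository)


class ConditionalGenerator(nn.Module):
    """Hypothetical DCGAN-style generator conditioned on a class label."""

    def __init__(self, latent_dim=LATENT_DIM, n_classes=N_CLASSES, width=64):
        super().__init__()
        self.n_classes = n_classes

        def up(c_in, c_out, stride, padding):
            # Transposed convolution + batch norm + ReLU, as in DCGAN.
            return nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, 4, stride, padding, bias=False),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )

        self.net = nn.Sequential(
            up(latent_dim + n_classes, width * 8, 1, 0),  # 1x1 -> 4x4
            up(width * 8, width * 4, 2, 1),               # 4x4 -> 8x8
            up(width * 4, width * 2, 2, 1),               # 8x8 -> 16x16
            up(width * 2, width, 2, 1),                   # 16x16 -> 32x32
            nn.ConvTranspose2d(width, 3, 4, 2, 1, bias=False),  # 32x32 -> 64x64
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise, labels):
        # Condition on the label by appending its one-hot encoding to the noise.
        one_hot = nn.functional.one_hot(labels, self.n_classes).float()
        x = torch.cat([noise, one_hot], dim=1)
        return self.net(x[:, :, None, None])


# Sample a small batch from random noise and arbitrary labels.
generator = ConditionalGenerator(width=16).eval()
noise = torch.randn(8, LATENT_DIM)
labels = torch.randint(0, N_CLASSES, (8,))
with torch.no_grad():
    fake = generator(noise, labels)
print(fake.shape)  # torch.Size([8, 3, 64, 64])
```

The output resolution of 64x64 matches the `IMAGE_SIZE = 64` setting mentioned in this README; a different image size would need a different number of upsampling stages.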

Generated samples during the training process

Original samples

Original samples of the HAM10000 dataset

Generated samples

Generated samples from random noise and arbitrary labels

These were generated after training with IMAGE_SIZE = 64 and the current hyperparameters, which yielded the following losses during training:

Losses during training

The loss curves show that, at some point, the discriminator was too simple to keep improving: its predictive performance eventually reached a standstill, which prevented the generator from improving. This is because no fine-tuning has been carried out yet.
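For context, the two loss curves can be read against the standard conditional GAN objective. This is an assumption: the exact loss used in the repository is not stated in this README, and practical implementations often use a binary cross-entropy variant of it. Here $x$ is a real image, $y$ its label, and $z$ the random noise.

```latex
\min_G \max_D \;
\mathbb{E}_{(x, y) \sim p_{\text{data}}} \big[ \log D(x \mid y) \big]
+ \mathbb{E}_{z \sim p_z,\; y \sim p_y} \big[ \log \big( 1 - D(G(z \mid y) \mid y) \big) \big]
```

The two players are coupled through $D$: the generator only learns from the discriminator's feedback, so a discriminator that stops improving limits what the generator can learn.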

Dataset

We use the HAM10000 dataset ("Human Against Machine with 10000 training images"), published in 2018. In the words of the authors:

We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 11788 dermatoscopic images, of which 10010 will be released as a training set for academic machine learning purposes and will be publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.

The same dataset has been uploaded to several places:

Technical information

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of the available datasets of dermatoscopic images. This problem was tackled by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. The authors collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images, which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions:

  • Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec, label 0)
  • Basal cell carcinoma (bcc, label 1)
  • Benign keratosis-like lesions: solar lentigines / seborrheic keratoses and lichen-planus like keratoses (bkl, label 2)
  • Dermatofibroma (df, label 3)
  • Melanoma (mel, label 4)
  • Melanocytic nevi (nv, label 5)
  • Vascular lesions: angiomas, angiokeratomas, pyogenic granulomas and hemorrhage (vasc, label 6)
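The mapping between diagnostic codes and integer labels listed above can be written down as a small lookup table. This is only a sketch: the names `DX_CODES`, `DX_TO_LABEL`, `LABEL_TO_DX`, and `encode_dx` are hypothetical and may not match the identifiers used in the repository.

```python
# Diagnostic codes in the order listed above; the list index is the label.
DX_CODES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

DX_TO_LABEL = {code: index for index, code in enumerate(DX_CODES)}
LABEL_TO_DX = {index: code for code, index in DX_TO_LABEL.items()}


def encode_dx(dx: str) -> int:
    """Return the integer label for a diagnostic code such as 'mel'."""
    return DX_TO_LABEL[dx.strip().lower()]


print(encode_dx("mel"))  # 4
print(LABEL_TO_DX[5])    # nv
```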

The ground truth of the lesions is confirmed through:

  • Histopathology (histo): more than 50% of lesions
  • Follow-up examination (follow_up)
  • Expert consensus (consensus)
  • Confirmation by in-vivo confocal microscopy (confocal)

These are labeled accordingly in the dx_type column of the HAM10000_metadata.csv file.

Due to upload size limitations, images were stored in two files:

  • HAM10000_images_part_1: with 5000 .jpg files
  • HAM10000_images_part_2: with 5015 .jpg files
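Because the images are split across two folders, any loading code has to look in both. A minimal sketch of such a lookup follows; the function name `find_image` and the `data_root` argument are hypothetical, and it assumes both folders sit directly under one directory and that files are named `<image_id>.jpg`.

```python
from pathlib import Path
from typing import Optional

IMAGE_FOLDERS = ("HAM10000_images_part_1", "HAM10000_images_part_2")


def find_image(image_id: str, data_root: Path) -> Optional[Path]:
    """Return the path of <image_id>.jpg in either folder, or None if absent."""
    for folder in IMAGE_FOLDERS:
        candidate = Path(data_root) / folder / f"{image_id}.jpg"
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` instead of raising lets the caller decide how to handle an image that is listed in the metadata but missing on disk.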

The dataset includes lesions with multiple images, which can be tracked by the lesion_id column within the HAM10000_metadata.csv file. This file contains the following columns:

  • image_id: id of the image, which is also its file name in either the HAM10000_images_part_1 or the HAM10000_images_part_2 folder
  • lesion_id: id of the lesion (note that one lesion can have more than one image)
  • dx_type: procedure through which dx was confirmed (one of four categories)
  • age: age of the patient as integer
  • sex: biological sex of the patient (male or female)
  • localization: body part in which the lesion is found
  • dx: the ground truth, i.e. our label
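As an illustration of how the lesion_id column can be used to track lesions with multiple images, the sketch below groups image ids by lesion using only the standard library. The rows in `SAMPLE_CSV` are made up to mimic the column layout described above; they are not real HAM10000 records, and the function name `group_images_by_lesion` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows that only mimic the columns described above (not real data).
SAMPLE_CSV = """lesion_id,image_id,dx,dx_type,age,sex,localization
HAM_A,IMG_001,nv,follow_up,40,male,back
HAM_A,IMG_002,nv,follow_up,40,male,back
HAM_B,IMG_003,mel,histo,65,female,face
"""


def group_images_by_lesion(csv_file):
    """Map each lesion_id to the list of image_id values that belong to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["lesion_id"]].append(row["image_id"])
    return dict(groups)


groups = group_images_by_lesion(io.StringIO(SAMPLE_CSV))
multi_image = {lesion: ids for lesion, ids in groups.items() if len(ids) > 1}
print(multi_image)  # {'HAM_A': ['IMG_001', 'IMG_002']}
```

With the real file, the same function can be called on `open("HAM10000_metadata.csv", newline="")`. Grouping by lesion is useful whenever near-duplicate views of the same lesion should be kept together, for example when splitting the data.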

Other medical image datasets

There are plenty of medical image datasets. Below we list some websites that may prove useful:

There are also many general dataset repositories that may be worth looking at:

Reproducing the results

Dedicated conda environment

To run the code locally (without any Docker container), I installed PyTorch (with GPU support) in a dedicated conda environment by following this guide.

Note that the cuDNN files have to be copied to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8. Once CUDA 11.8 and the compatible cuDNN version are installed, we set up GPU-enabled PyTorch through conda according to the official website.

conda create -n hamgan python=3.9
conda activate hamgan
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y
conda install jupyter notebook pandas matplotlib seaborn -y
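After the installation, a quick check from inside the hamgan environment can confirm that PyTorch is importable and can see the GPU. The helper below is a sketch rather than part of the repository; the name `describe_torch_setup` is hypothetical, and it reports a missing installation instead of failing.

```python
import importlib.util


def describe_torch_setup():
    """Summarize whether PyTorch is installed and whether CUDA is usable."""
    info = {"installed": False, "version": None, "cuda_available": False, "device": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    cuda = torch.cuda.is_available()
    info.update(
        installed=True,
        version=torch.__version__,
        cuda_available=cuda,
        device=torch.cuda.get_device_name(0) if cuda else "cpu",
    )
    return info


print(describe_torch_setup())
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the CUDA and cuDNN installation described above is the first thing to re-check.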

Set up a dedicated jupyter kernel

To run the repository's .ipynb notebooks, we need to register our conda environment as a Jupyter kernel by typing:

python -m ipykernel install --user --name=hamgan

synthetic-medical-images's People

Contributors

gcastro-98

synthetic-medical-images's Issues

Uncertainty quantification

Implement model-agnostic uncertainty quantification procedures, such as those described in this survey:

Assess interpretability

Enhance the decoder CNN by explaining its decisions through:
