
tcga_segmentation

This repository contains an end-to-end Whole Slide Image (WSI) pre-processing pipeline that starts from The Cancer Genome Atlas (TCGA) download files. It also provides a complete implementation of deep learning tumor segmentation trained from binary WSI labels, as detailed in "Weakly supervised multiple instance learning histopathological tumor segmentation".

Example of Whole Slide Image tumor segmentation (black: background; blue: normal tissue; pink: neoplastic tissue).

Major features

This software is written entirely in Python 3. The repository contains three major parts:

  • a tool that automatically downloads data from the TCGA GDC Data Portal and also handles tile extraction, background removal, and tumor label extraction.
  • an end-to-end PyTorch pipeline that can train many common image classifier architectures for tumor segmentation on WSI, using only weak binary labels that indicate whether each WSI contains tumor.
  • a collection of 6481 semi-automatically generated tumor maps covering all snap-frozen WSI of the TCGA repository for the breast, kidney, and bronchus and lung locations.
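The tumor label extraction step derives a per-slide label from the slide name. The sketch below illustrates one way to do this, assuming the label comes from the standard TCGA barcode sample type code (01-09: tumor, 10-19: normal). The function name `slide_label_from_name` is hypothetical and this is not necessarily the rule used by the repository.

```python
import re
from typing import Optional

# Captures the two-digit sample type code of a TCGA barcode,
# e.g. "TCGA-A1-A0SB-01A-01-TS1..." -> "01".
BARCODE_RE = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-(\d{2})")


def slide_label_from_name(filename: str) -> Optional[int]:
    """Hypothetical helper: 1 for tumor, 0 for normal, None if undetermined.

    Assumption: 01-09 are tumor samples and 10-19 are normal samples;
    any other code (e.g. 20-29 controls) is left unlabeled.
    """
    match = BARCODE_RE.match(filename)
    if match is None:
        return None  # not a TCGA barcode
    sample_type = int(match.group(1))
    if 1 <= sample_type <= 9:
        return 1
    if 10 <= sample_type <= 19:
        return 0
    return None
```

For instance, a slide named "TCGA-A1-A0SB-01A-01-TS1.svs" would be labeled as tumor, and one named "TCGA-A1-A0SB-11A-01-TS1.svs" as normal. Slides whose names cannot be interpreted are returned as None so they can be excluded rather than mislabeled.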

Installation

Use Python 3 and install the required libraries:

virtualenv -p python3 --system-site-packages venv
source venv/bin/activate
pip install -r requirements.txt 

Quick Start

Downloading TCGA cohorts + WSI pre-processing

  1. Download the GDC Data Transfer Tool executable (not included here for licensing reasons).
  2. Build any cohort on the TCGA GDC Data Portal, download the associated manifest file, and place it in a folder (source_folder below).
  3. Launch the download and pre-processing pipeline with:
python -m code.data_processing.main --gdc gdc_executable_path source_folder
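The manifest downloaded in step 2 is a tab-separated file whose header lists, among others, the id, filename, md5, size, and state columns. The sketch below reads it and builds a GDC Data Transfer Tool command line using its `download -m <manifest> -d <directory>` options. The helper names are hypothetical, and the repository's own pipeline may invoke the tool differently.

```python
import csv
from typing import Iterable, List, NamedTuple


class ManifestEntry(NamedTuple):
    file_id: str
    filename: str
    size: int  # bytes


def read_manifest(lines: Iterable[str]) -> List[ManifestEntry]:
    """Hypothetical helper: parse a tab-separated GDC manifest."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [ManifestEntry(row["id"], row["filename"], int(row["size"])) for row in reader]


def gdc_download_command(gdc_executable: str, manifest_path: str, out_dir: str) -> List[str]:
    """Hypothetical helper: argument list for the GDC Data Transfer Tool."""
    return [gdc_executable, "download", "-m", manifest_path, "-d", out_dir]
```

Typical use would be `with open(manifest_path) as f: entries = read_manifest(f)` to inspect the number of slides and their total size before passing `gdc_download_command(...)` to `subprocess.run`.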

This script first downloads all files listed in the manifest, then tiles each WSI, extracts tiles at a given magnification, removes background tiles, and finally attempts to extract per-slide binary labels from the slide file names. Further documentation is under construction.
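The tiling and background removal steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: tiles are non-overlapping squares, the downsample factor is the ratio between the slide's base magnification and the target magnification, and a tile counts as background when most of its pixels are near white. The threshold values and function names are hypothetical. Reading pixels from the slide file (for example with a library such as OpenSlide) is omitted.

```python
from typing import Iterator, Sequence, Tuple

Pixel = Tuple[int, int, int]  # RGB, 0-255


def tile_grid(width: int, height: int, tile_size: int) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of non-overlapping tiles fully inside the image."""
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            yield x, y


def downsample_factor(base_magnification: float, target_magnification: float) -> float:
    """E.g. a 40x scan read at 10x must be downsampled by 4."""
    return base_magnification / target_magnification


def is_background(tile: Sequence[Sequence[Pixel]],
                  brightness_threshold: int = 220,
                  max_bright_fraction: float = 0.5) -> bool:
    """Hypothetical heuristic: background if too many pixels are near white."""
    pixels = [p for row in tile for p in row]
    bright = sum(1 for r, g, b in pixels if (r + g + b) / 3 >= brightness_threshold)
    return bright / len(pixels) > max_bright_fraction
```

A simple brightness rule is used here only because unstained glass appears near white on H&E slides; other criteria (saturation, tissue masks) are equally plausible.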

Training WSI segmentation models

Once data download and pre-processing have been performed, launch the training pipeline with:

python -m code.training --preprocessed-data-folder ./data/preprocessed --alpha 0.1 --beta 0.
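The `--alpha` and `--beta` options control how slide-level labels are turned into tile-level training targets. The sketch below shows one common multiple instance learning scheme, assumed here and not confirmed by this README: in a tumor slide, the top `alpha` fraction of tiles by predicted tumor score is labeled tumor and the bottom `beta` fraction is labeled normal, while every tile of a normal slide is labeled normal. Check `python -m code.training --help` for the actual semantics. The function name is hypothetical.

```python
from typing import Dict, Sequence


def pseudo_labels(scores: Sequence[float], slide_label: int,
                  alpha: float, beta: float) -> Dict[int, int]:
    """Hypothetical sketch: map tile index -> target label; omitted tiles are ignored.

    scores: current model tumor scores for each tile of one slide.
    """
    n = len(scores)
    if slide_label == 0:
        # Normal slide: every tile is assumed normal.
        return {i: 0 for i in range(n)}
    # Tumor slide: rank tiles from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_pos = max(1, round(alpha * n)) if alpha > 0 else 0
    n_neg = round(beta * n)
    labels = {i: 1 for i in order[:n_pos]}
    # Lowest-scoring tiles become normal targets, never overriding positives.
    for i in order[n - n_neg:] if n_neg > 0 else []:
        labels.setdefault(i, 0)
    return labels
```

Under this reading, `--alpha 0.1 --beta 0.` would train on the 10% most tumor-like tiles of each tumor slide and on all tiles of normal slides, using no negative targets from tumor slides.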

Many parameters are tunable; see python -m code.training --help

Further documentation about the training pipeline, including the available imaging models, is under construction.

License

This software is released under the GNU Affero General Public License v3.0.

Citation

If you use this software or any part of this software in your research, please use the following BibTeX entry.

@misc{lerousseau2020weakly,
    title={Weakly supervised multiple instance learning histopathological tumor segmentation},
    author={Marvin Lerousseau and Maria Vakalopoulou and Marion Classe and Julien Adam and Enzo Battistella and Alexandre Carré and Théo Estienne and Théophraste Henry and Eric Deutsch and Nikos Paragios},
    year={2020},
    eprint={2004.05024},
    archivePrefix={arXiv},
    primaryClass={eess.IV}
}

or

Lerousseau, Marvin, Maria Vakalopoulou, Marion Classe, Julien Adam, Enzo Battistella, Alexandre Carré, Théo Estienne, Théophraste Henry, Eric Deutsch, and Nikos Paragios. "Weakly supervised multiple instance learning histopathological tumor segmentation." arXiv preprint arXiv:2004.05024 (2020).

Contributors

  • dgonzmd
  • marvinler