Giter VIP home page Giter VIP logo

nsclc_radiogenomics's Introduction

A Radiogenomics Pipeline for Lung Nodules Segmentation and Prediction of EGFR Mutation Status from CT Scans.

This is the official repository for the Preprint "A Radiogenomics Pipeline for Lung Nodules Segmentation and Prediction of EGFR Mutation Status from CT Scans".

Link: ArXiv

Abstract

Lung cancer is a leading cause of death worldwide. Early-stage detection of lung cancer is essential for a more favorable prognosis. Radiogenomics is an emerging discipline that combines medical imaging and genomics features for modeling patient outcomes non-invasively. This study presents a radiogenomics pipeline that has: 1) a novel mixed architecture (RA-Seg) to segment lung cancer through attention and recurrent blocks; and 2) deep feature classifiers to distinguish Epidermal Growth Factor Receptor (EGFR) mutation status. We evaluate the proposed algorithm on multiple public datasets to assess its generalizability and robustness. We demonstrate how the proposed segmentation and classification methods outperform existing baseline and SOTA approaches (73.54 Dice and 93 F1 scores).

Dependencies

In addition to the python packages, lungmask is required.

pip install -r requirements.txt

Preprocessing

Preprocess argument prepares the data as available from their sources, data is normalized and stored as tensor for training or inference. Preprocessing is available for public dataset: MSD Task6_Lung, NSCLC Radiomics (RAD), and NSCLC Radiogenomics (RADGEN).

For other datasets user must create the corresponding function making sure first axis contains slices, second axis goes from chest to back, and third axis from right to left.

Preprocessing:

  • Voxel Intensity clipped to the range [-200, 250].
  • Min-max normalization to the range [0, 1].
  • Resampling to anisotropic resolution 1x1x1.5 mm^3.
  • Lungs Volume of Interest (VOI) crop utilizing U-Net(R231) pretrained weights.
  • Resize to 256×256×256 using b-spline for the image and nearest interpolation for the segmentation.
  • Generation of 3D bounding boxes using the tumor segmentation mask.
cd radiogenomics

python main.py preprocess RAW_DATA_PATH PREPROCESSED_DATA_OUTPUT DATASET
python main.py preprocess ./data/MSD/Task06_Lung ./data/MSD/processed msd

Illustrated example of data preprocessing: alt text

Segmentation. Lung tumor segmentation.

"Residual-Attention Segmentor" (RA-Seg) architecture combines the core concepts of three models:

The proposed “Recurrence-Attention Segmentor” (RA-Seg) generates a tumor mask with a U-Net like structure, the introduction of an Atenttion block to recalibrate the skip connections, and a Recurrent Block to capture slice interdependencies of the CT.

The architecture of the proposed model: alt text

Architecture outline and details: alt text

Segmentation Inference. Lung tumor segmentation mask.

Segmentation results:

  • Trained on MSD. Average DSC 67.24 (5 fold CV) and 68.65 on RADGEN testset.
  • Trained on RAD. Average DSC 72.36 (5 fold CV) and 66.11 on RADGEN testset.
  • Trained on MSD+RAD. Average DSC 71.42 (5 fold CV) and 73.54 on RADGEN testset.
  • Trained on RADGEN with pretrained weights (MSD+RAD). Average DSC 75.26 (5 fold CV).

Available pretrained weight:

  • Trained on RADGEN with pretrained weights (MSD+RAD).

Inference command will save tumor mask for preprocessed data available on DATA_PATH and extract the high-level raw features (NO dimensionality reduction). File structure:

DATA PATH
|____ CASE1
            required:
|____ |____ INPUT.pt
|____ |____ TUMOR_BBX.pt
            generated:
|____ |____ TUMOR_SEG.npy
|____ |____ RAW_FEATS.npy
|
|____ CASE2
|____ |____ INPUT.pt
|____ |____ TUMOR_BBX.pt
|____ |____ TUMOR_SEG.npy
|____ |____ RAW_FEATS.npy
|____ CASE3
.
.
.

Command Example:

cd radiogenomics # Make sure you are on the right directory.

python main.py inf_seg DATA_PATH WEIGHTS_PATH MODEL
python main.py inf_seg ./data/radgen/processed/lungs_roi ./weights/radgen_finetune.pt RA_Seg

alt text

Classification. EGFR mutation status classification.

Highl-level deep features are extracted from the Decoder 2 step output. Then undergo preprocessing (mean and flatten operation) with LDA dimensionality reduction.

Classification results:

  • Quadratic discriminant analysis (QDA). Average ROC-AUC 0.90 (5 fold CV)
  • Decision Tree (DT). Average ROC-AUC 0.91 (5 fold CV)
  • Random Forest (RF). Average ROC-AUC 0.93 (5 fold CV)
  • C-Support Vector Classification (SVC). Average ROC-AUC 0.83 (5 fold CV)

Classifiers available as SAV files:

  • QDA
  • DT
  • RF
  • SVC

Inference command will return the prediction of EGFR mutation status. Class-negative corresponds to "Wildtype" and Class-positive corresponds to "Mutated".

cd radiogenomics # Make sure you are on the right directory.

python main.py inf_class DATA_PATH SAVED_MODELS_PATH MODEL
python main.py inf_class ./data/radgen/processed/lungs_roi ./classifiers qda

nsclc_radiogenomics's People

Contributors

gollini avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.