rachellea / ct-volume-preprocessing

End-to-end Python CT volume preprocessing pipeline that converts raw DICOMs into clean 3D numpy arrays for ML. From the paper by Draelos et al., "Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes."

License: MIT License

ct-volume-preprocessing

ct-volume-preprocessing includes:

  • preprocess_volumes.py, an end-to-end Python pipeline for complete preprocessing of computed tomography (CT) scans from DICOM format to clean numpy arrays suitable for machine learning.
  • download_volumes.py, an example Python pipeline for downloading CT data in bulk.
  • visualize_volumes.py, which contains Python code for visualizing CT scans in an interactive way, visualizing CT scans as GIFs, and making other figures from CT data.

Requirements

All pipelines are implemented in Python and can be run using the Singularity container or the requirements defined here: https://github.com/rachellea/research-container

I've also included a requirements.txt in this repo. It was obtained by pruning the complete requirements.txt in the research-container repo down to the packages I believe are necessary.

Details: Preprocessing CT Data

The steps of the CT volume preprocessing pipeline are described in detail in "Appendix A.2 CT Volume Preparation" of our Medical Image Analysis paper. The paper is also available on arXiv.

If you find this work useful in your research, please consider citing us:

Draelos R.L., et al. "Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes." Medical Image Analysis (2020).

The CleanCTScans class in preprocess_volumes.py assumes that the CT scans to be processed are saved in one directory and that each CT scan is saved as a pickled Python list. Each element of the list is a pydicom.dataset.FileDataset that represents a DICOM file and thus contains metadata as well as pixel data. Each pydicom.dataset.FileDataset corresponds to one slice of the CT scan. The slices are not necessarily 'in order' in this list.

Note that if your CT scans are instead stored as raw DICOMs with one DICOM per slice, you can easily modify the pipeline to first read each DICOM file into a pydicom.dataset.FileDataset directly using pydicom. Then you can aggregate these into a list to use the pipeline on your data.
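As a rough illustration of the modification described above, the sketch below reads one DICOM file per slice with pydicom.dcmread and pickles the resulting list, one pickle per scan. The function names, the "*.dcm" file pattern, and the optional reader argument are assumptions made for this example; they are not part of preprocess_volumes.py.

```python
import pickle
from pathlib import Path


def read_dicom_slices(dicom_dir, reader=None):
    """Read every *.dcm file in dicom_dir into a list, one entry per slice.

    The "*.dcm" pattern is an assumption; adjust it to your file naming.
    """
    if reader is None:
        # Imported lazily so the helper can be exercised with a stub reader.
        import pydicom
        reader = pydicom.dcmread
    paths = sorted(Path(dicom_dir).glob("*.dcm"))
    # File-name order is not slice order; the pipeline orders slices later.
    return [reader(str(path)) for path in paths]


def save_scan_as_pickle(slices, out_path):
    """Pickle the list of slices so that one file holds one CT scan."""
    with open(out_path, "wb") as f:
        pickle.dump(slices, f)
```

Calling read_dicom_slices on a scan's directory and passing the result to save_scan_as_pickle should give a file in the layout that the CleanCTScans class is described as expecting.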

For each CT scan, preprocess_volumes.py will order the slices and stack them into a volume, rescale pixel values to Hounsfield Units (HU), clip the pixel values to [-1000 HU, +1000 HU], resample to a spacing of 0.8 x 0.8 x 0.8 mm, and save the final CT volume as a zip-compressed numpy array of 16-bit integers. These numpy arrays can then be loaded as input data for machine learning with PyTorch or TensorFlow.
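The steps listed above can be sketched in a few lines of numpy. This is a simplified illustration, not the code in preprocess_volumes.py: it assumes axial slices that can be ordered by the z coordinate of ImagePositionPatient, uniform slice spacing, and at least two slices, and it uses scipy.ndimage.zoom with linear interpolation as a stand-in for whatever resampling the pipeline actually performs. The function name and the "ct" array key are hypothetical.

```python
import numpy as np
from scipy import ndimage


def slices_to_volume(slices, target_spacing=0.8, out_path=None):
    """Turn a list of pydicom-like slices into a clipped, resampled int16 volume."""
    # 1. Order slices along z (simplification: assumes axial acquisition).
    ordered = sorted(slices, key=lambda s: float(s.ImagePositionPatient[2]))

    # 2-3. Rescale each slice to Hounsfield Units, then stack into (z, row, col).
    hu = np.stack([
        s.pixel_array.astype(np.float32) * float(s.RescaleSlope)
        + float(s.RescaleIntercept)
        for s in ordered
    ])

    # 4. Clip to [-1000 HU, +1000 HU].
    hu = np.clip(hu, -1000, 1000)

    # 5. Resample to isotropic spacing (assumes uniform slice spacing).
    z_spacing = abs(float(ordered[1].ImagePositionPatient[2])
                    - float(ordered[0].ImagePositionPatient[2]))
    row_spacing, col_spacing = (float(v) for v in ordered[0].PixelSpacing)
    zoom = (z_spacing / target_spacing,
            row_spacing / target_spacing,
            col_spacing / target_spacing)
    resampled = ndimage.zoom(hu, zoom, order=1)

    # 6. Round to 16-bit integers and optionally save zip-compressed.
    volume = np.clip(np.rint(resampled), -1000, 1000).astype(np.int16)
    if out_path is not None:
        np.savez_compressed(out_path, ct=volume)  # "ct" is a hypothetical key
    return volume
```

Clipping before resampling keeps extreme values (for example metal artifacts) from bleeding into neighboring voxels during interpolation; whether the real pipeline uses the same order beyond what is listed above is not something this sketch confirms.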

(Note that ML-specific preprocessing like normalizing pixel values is not part of this particular pipeline, as those steps are quick and often are customized to a particular data set.)
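For illustration, one common way to handle that downstream step is to load the saved array and scale the clipped HU range to [0, 1]. The sketch below is an assumption about how a user might do this, not part of the pipeline; the function name and the "ct" key inside the .npz file are hypothetical, so check the key names in your own files (for example with np.load(path).files).

```python
import numpy as np


def load_and_normalize(npz_path, key="ct", lower=-1000.0, upper=1000.0):
    """Load a saved CT volume and scale [lower, upper] HU linearly to [0, 1]."""
    with np.load(npz_path) as data:
        volume = data[key].astype(np.float32)
    # Clip defensively in case the file holds values outside the expected range.
    volume = np.clip(volume, lower, upper)
    return (volume - lower) / (upper - lower)
```

The returned float32 array can be wrapped with torch.from_numpy or tf.convert_to_tensor before being fed to a model.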

Details: Downloading CT Data in Bulk

download_volumes.py is included only as an example of a pipeline for downloading CT scans in bulk. It will not run as-is because the required endpoints, IDs, and tokens have all been removed for security reasons.

A CT scan is associated with multiple series. This download pipeline includes logic for selecting the original series with the greatest number of slices.
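The selection rule can be sketched as follows. This is not the logic from download_volumes.py: the dictionary layout (with "image_type" and "num_slices" keys) is hypothetical, and treating a series as original when its DICOM ImageType values include "ORIGINAL" is an assumption about how "original" is determined.

```python
def select_series(series_list):
    """Return the original series with the most slices, or None if there is none.

    Each series is a dict with hypothetical keys:
      "image_type": list of DICOM ImageType values, e.g. ["ORIGINAL", "PRIMARY", "AXIAL"]
      "num_slices": number of slices in the series
    """
    originals = [s for s in series_list if "ORIGINAL" in s["image_type"]]
    if not originals:
        return None
    return max(originals, key=lambda s: s["num_slices"])


candidates = [
    {"name": "axial thin", "image_type": ["ORIGINAL", "PRIMARY", "AXIAL"], "num_slices": 420},
    {"name": "axial thick", "image_type": ["ORIGINAL", "PRIMARY", "AXIAL"], "num_slices": 90},
    {"name": "coronal reformat", "image_type": ["DERIVED", "SECONDARY"], "num_slices": 600},
]
chosen = select_series(candidates)
```

Here chosen is the "axial thin" series: the coronal reformat has more slices but is excluded because it is derived rather than original.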

Credits

The slice ordering step uses code modified from Innolitics' dicom-numpy repo:

https://github.com/innolitics/dicom-numpy/blob/master/dicom_numpy/combine_slices.py

This dicom-numpy code was originally downloaded on September 19, 2019.
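The general idea behind position-based slice ordering can be sketched as below: compute the slice normal from ImageOrientationPatient and sort slices by the projection of ImagePositionPatient onto that normal. This is a minimal illustration written for this README, not the dicom-numpy code or the modified version in preprocess_volumes.py; the function name is hypothetical and it assumes every slice shares the orientation of the first one.

```python
import numpy as np


def sort_slices_by_position(slices):
    """Order slices by their position along the slice normal."""
    orientation = np.asarray(slices[0].ImageOrientationPatient, dtype=float)
    row_cosine, column_cosine = orientation[:3], orientation[3:]
    # The slice normal is perpendicular to both in-plane direction cosines.
    normal = np.cross(row_cosine, column_cosine)

    def position_along_normal(s):
        return float(np.dot(normal, np.asarray(s.ImagePositionPatient, dtype=float)))

    return sorted(slices, key=position_along_normal)
```

Projecting onto the normal, instead of sorting on the z coordinate alone, keeps the ordering correct for scans whose slices are tilted relative to the patient axes.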
