
Self-Supervised driven Consistency Training for Annotation Efficient Histopathology Image Analysis

Overview

We propose a self-supervised driven consistency training paradigm for histopathology image analysis that learns to leverage both task-agnostic and task-specific unlabeled data based on two strategies:

  1. A self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images (WSIs) to learn a powerful supervisory signal for unsupervised representation learning.

  2. A new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data.

We carry out extensive validation experiments on three histopathology benchmark datasets across two classification tasks and one regression task, i.e., tumor metastasis detection (Breast), tissue type classification (Colorectal), and tumor cellularity quantification (Breast). We compare against state-of-the-art self-supervised pretraining methods based on generative and contrastive learning techniques: the Variational Autoencoder (VAE) and Momentum Contrast (MoCo), respectively.

1. Self-Supervised pretext task

2. Consistency training

Results

  • Predicted tumor cellularity (TC) scores on BreastPathQ test set for 10% labeled data


  • Predicted tumor probability on Camelyon16 test set for 10% labeled data

Prerequisites

Core implementation:

  • Python 3.7+
  • PyTorch 1.7+
  • openslide-python 1.1+
  • Albumentations 1.8+
  • scikit-image 0.15+
  • scikit-learn 0.22+
  • Matplotlib 3.2+
  • SciPy, NumPy (any version)

Additional packages can be installed via:

pip install -r requirements.txt
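
For reference, a requirements.txt consistent with the versions listed above might look like the following; the exact pins are illustrative and may differ from the file shipped with the repository:

torch>=1.7
openslide-python>=1.1
albumentations>=1.8
scikit-image>=0.15
scikit-learn>=0.22
matplotlib>=3.2
scipy
numpy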

Datasets

Training

Model training proceeds in three stages:

  1. Task-agnostic self-supervised pretext task (i.e., the proposed Resolution sequence prediction (RSP) task)
  2. Task-specific supervised fine-tuning (SSL)
  3. Task-specific teacher-student consistency training (SSL_CR)

1. Self-supervised pretext task: Resolution sequence prediction (RSP) in WSIs

From the file "pretrain_BreastPathQ.py / pretrain_Camelyon16.py", you can pretrain the network (ResNet18) for predicting the resolution sequence ordering in WSIs on BreastPathQ & Camelyon16 dataset, respectively. This can be easily adapted to any other dataset of choice.

  • The resolution levels used for the RSP task can also be set in dataset.py#L277 when pretraining on other datasets.
  • The argument --train_image_pth is the only required argument and should be set to the directory containing your training WSIs. There are many more arguments that can be set, and these are all explained in the corresponding files.
python pretrain_BreastPathQ.py    // Pretraining on BreastPathQ   
python pretrain_Camelyon16.py    // Pretraining on Camelyon16
  • We also provide pretrained models for BreastPathQ and Camelyon16 in the "Pretrained_models" folder. These models can also be used to study feature transferability (domain adaptation) between datasets with different tissue types/organs.
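
The snippet below is a rough, self-contained sketch of how a resolution sequence prediction task can be set up with openslide and PyTorch. It is not the repository's exact implementation; all function, class, and argument names are illustrative.

# Rough sketch of a resolution sequence prediction (RSP) style pretext task.
# All names and details here are illustrative, not the repository's exact code.
import itertools
import random
import numpy as np
import openslide
import torch
import torch.nn as nn
import torchvision.models as models

LEVELS = (2, 1, 0)                                          # WSI pyramid levels, coarse -> fine
ORDERS = list(itertools.permutations(range(len(LEVELS))))   # 6 possible orderings

def read_sequence(slide_path, coord, size=256):
    """Read the same level-0 location at each resolution level (coarse to fine)."""
    slide = openslide.OpenSlide(slide_path)
    patches = []
    for level in LEVELS:
        img = slide.read_region(coord, level, (size, size)).convert("RGB")
        arr = np.asarray(img, dtype=np.float32).transpose(2, 0, 1) / 255.0
        patches.append(torch.from_numpy(arr))
    slide.close()
    return patches                                          # list of (3, H, W) tensors

def make_rsp_example(patches):
    """Shuffle the resolution sequence and return (patch stack, ordering label)."""
    label = random.randrange(len(ORDERS))
    seq = torch.stack([patches[i] for i in ORDERS[label]])  # (seq_len, 3, H, W)
    return seq, label

class RSPNet(nn.Module):
    """Shared ResNet18 encoder per patch; a linear head predicts the ordering."""
    def __init__(self):
        super().__init__()
        self.encoder = models.resnet18(num_classes=512)
        self.head = nn.Linear(512 * len(LEVELS), len(ORDERS))

    def forward(self, seq):                                 # seq: (B, seq_len, 3, H, W)
        b, s = seq.shape[:2]
        feats = self.encoder(seq.flatten(0, 1))             # (B * seq_len, 512)
        return self.head(feats.reshape(b, s * 512))         # (B, num_orderings)

# Training reduces to cross-entropy between the predicted and true orderings.
criterion = nn.CrossEntropyLoss()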

2. Task-specific supervised fine-tuning on the downstream task

From the file "eval_BreastPathQ_SSL.py / eval_Camelyon_SSL.py / eval_Kather_SSL.py", you can fine-tune the network (i.e., task-specific supervised fine-tuning) on the downstream task with limited label data (10%, 25%, 50%). Refer to, paper for more details.

  • Arguments: --model_path - path to the self-supervised pretrained model (i.e., the trained model from Step 1). Other arguments can be set in the corresponding files.
python eval_BreastPathQ_SSL.py  // Supervised fine-tuning on BreastPathQ   
python eval_Camelyon_SSL.py    // Supervised fine-tuning on Camelyon16
python eval_Kather_SSL.py    // Supervised fine-tuning on Kather dataset (Colorectal)

Note: we did not perform self-supervised pretraining on the Kather (Colorectal) dataset because its WSIs are unavailable. Instead, we performed domain adaptation by pretraining on Camelyon16 and fine-tuning on the Kather dataset. Refer to the paper for more details.
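
As a rough illustration, fine-tuning amounts to loading the pretrained backbone weights and training a task-specific head on the small labeled subset. The checkpoint path, state-dict key names, and head size below are assumptions, not the repository's exact code.

# Rough sketch of supervised fine-tuning from a self-supervised checkpoint.
# The checkpoint path, key names, and head size are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(num_classes=2)        # e.g., tumor vs. normal for Camelyon16
state = torch.load("Pretrained_models/camelyon16_pretrained.pt", map_location="cpu")
# Drop the pretext-task head; its shape does not match the downstream head.
backbone = {k: v for k, v in state.items() if not k.startswith("fc.")}
model.load_state_dict(backbone, strict=False)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()             # a regression loss (e.g., MSE) for BreastPathQ

def finetune_step(images, labels):
    """One supervised update on a batch from the limited labeled subset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()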

3. Task-specific teacher-student consistency training on the downstream task

From the file "eval_BreastPathQ_SSL_CR.py / eval_Camelyon_SSL_CR.py / eval_Kather_SSL_CR.py", you can fine-tune the student network by keeping the teacher network frozen via task-specific consistency training on the downstream task with limited label data (10%, 25%, 50%). Refer to, paper for more details.

  • Arguments: --model_path_finetune - path to the SSL fine-tuned model (i.e., the model from Step 2: self-supervised pretraining followed by supervised fine-tuning), used to initialize the teacher and student networks for consistency training. Other arguments can be set in the corresponding files.
python eval_BreastPathQ_SSL_CR.py  // Consistency training on BreastPathQ   
python eval_Camelyon_SSL_CR.py    // Consistency training on Camelyon16
python eval_Kather_SSL_CR.py    // Consistency training on Kather dataset (Colorectal)
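
The sketch below shows one possible consistency-training update: the frozen teacher provides targets on unlabeled patches while the student is trained on a supervised loss plus a consistency loss. The loss weighting and the choice of augmented views are assumptions, not the repository's exact code.

# Rough sketch of a teacher-student consistency update with a frozen teacher.
# Loss weighting and augmentation details are assumptions.
import copy
import torch
import torch.nn.functional as F

def init_teacher_student(finetuned_model):
    """Both networks start from the Step-2 fine-tuned model; the teacher is frozen."""
    teacher = copy.deepcopy(finetuned_model).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher, finetuned_model               # (frozen teacher, trainable student)

def consistency_step(student, teacher, labeled_batch, unlabeled_views, optimizer, lam=1.0):
    images, labels = labeled_batch
    weak, strong = unlabeled_views                # two augmented views of the same unlabeled patches

    with torch.no_grad():
        targets = teacher(weak)                   # frozen teacher predictions as targets

    sup_loss = F.cross_entropy(student(images), labels)
    cons_loss = F.mse_loss(student(strong), targets)   # prediction consistency on unlabeled data

    loss = sup_loss + lam * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()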

Testing

The test performance is validated at two stages:

  1. Self-supervised pretraining followed by supervised fine-tuning
  • From the files "eval_BreastPathQ_SSL.py / eval_Kather_SSL.py", you can test the model by setting the argument '--mode' to 'evaluation'.
  2. Consistency training
  • From the files "eval_BreastPathQ_SSL_CR.py / eval_Kather_SSL_CR.py", you can test the model by setting the argument '--mode' to 'evaluation'.

Predictions on the Camelyon16 test set can be generated with the "test_Camelyon16.py" file.
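
For example (the '--mode' flag follows the files above; any checkpoint-path arguments are set in the corresponding files):

python eval_BreastPathQ_SSL.py --mode evaluation       // test after supervised fine-tuning (Step 2)
python eval_BreastPathQ_SSL_CR.py --mode evaluation    // test after consistency training (Step 3)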

Citation

If you use significant portions of our code or ideas from our paper in your research, please cite our work:

@article{srinidhi2021self,
  title={Self-supervised driven consistency training for annotation efficient histopathology image analysis},
  author={Srinidhi, Chetan L and Kim, Seung Wook and Chen, Fu-Der and Martel, Anne L},
  journal={arXiv preprint arXiv:2102.03897},
  year={2021}
}

Acknowledgements

We would like to acknowledge the use of Compute Canada facilities for our computing resources. This work was funded by the Canadian Cancer Society (grant #705772), the National Cancer Institute of the National Institutes of Health (grant #U24CA199374-01), and the Canadian Institutes of Health Research.

Questions or Comments

Please direct any questions or comments to me; I am happy to help in any way I can. You can email me directly at [email protected].
