
scdino's Introduction

Self-Supervised Vision Transformers for multi-channel single-cell images

Application of DINO to automated microscopy-derived fluorescent imaging datasets of single cells, with instructions on how to run subsequent downstream analyses of non-RGB multi-channel images using trained Vision Transformers (ViTs). See Emerging Properties in Self-Supervised Vision Transformers for the original DINO implementation and Self-supervised vision transformers accurately decode cellular state heterogeneity for the adaptation described here. [DINO arXiv] [scDINO bioRxiv]

DINO illustration

Check out our recent publication, Cellular Architecture Shapes the Naïve T Cell Response, in Science. We used scDINO to identify distinct T cell phenotypes by examining over 30,000 single-cell crops of CD4 and CD8 T cells from healthy donors. We trained ViT-S/16 models exclusively on CD3 single-channel images; downstream analysis of the phenotypic heterogeneity was performed by clustering the CLS token latent space and visualizing it with the TopOMetry framework [Science].

Science Sfig1A

A further demonstration of the usefulness of the DINO framework for image-based biological discovery is presented in the preprint Unbiased single-cell morphology with self-supervised vision transformers. This work shows that self-supervised vision transformers can encode cellular morphology at various scales, from subcellular to multicellular [bioRxiv].

This codebase provides:

  • Workflow to run analyses of multi-channel image datasets (non-RGB) with publicly available self-supervised Vision Transformers (DINO-ss-ViTs) from [DINO arXiv] and with scDINO (scDINO-ss-ViTs) introduced in our paper [scDINO bioRxiv]
  • Workflow to train ViTs on multi-channel single-cell images generated by automated microscopy using scDINO and subsequently run downstream analyses

Pretrained models

Publicly available ss-ViTs pretrained on ImageNet with DINO

This table is adapted from the official DINO repository. You can choose to download either the weights of the pretrained backbone used for downstream tasks, or the full checkpoint containing backbone and projection-head weights for both the student and teacher networks. Detailed arguments and training/evaluation logs are provided. Note that the names DeiT-S and ViT-S refer to the same architecture.

arch             | download
DINO-ss-ViT-S/16 | backbone only · full ckpt · args · logs
DINO-ss-ViT-S/8  | backbone only · full ckpt · args · logs
DINO-ss-ViT-B/16 | backbone only · full ckpt · args · logs
DINO-ss-ViT-B/8  | backbone only · full ckpt · args · logs

scDINO ss-ViTs pretrained on high-content imaging data of single immune cells

Here you can download the pretrained single-cell DINO (scDINO) ss-ViTs used in our article [scDINO bioRxiv]. The ViTs are pretrained on the Deep phenotyping PBMC Image Set of Y. Severin, a high-content imaging dataset containing labeled single-cell images of 8 different immune cell classes from multiple healthy donors. We provide the scDINO-ss-ViT-S/16 full checkpoint trained for 100 epochs.

arch               | download
scDINO-ss-ViT-S/16 | full ckpt · args · logs
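
The full checkpoint is a standard PyTorch .pth file. As a minimal sketch (assuming the scDINO checkpoint follows the official DINO checkpoint layout, with 'student' and 'teacher' entries whose keys may carry 'module.' and 'backbone.' prefixes; the filename below is hypothetical), the teacher backbone weights can be inspected and extracted like this:

import torch

# Inspect a DINO-style full checkpoint and pull out the teacher backbone weights.
# Filename and key layout are assumptions based on the official DINO checkpoints.
ckpt = torch.load("scdino_vits16_full_ckpt.pth", map_location="cpu")
print(ckpt.keys())  # typically includes 'student', 'teacher', 'optimizer', 'epoch', ...

teacher_backbone = {
    k.replace("module.", "").replace("backbone.", ""): v
    for k, v in ckpt["teacher"].items()
    if "backbone." in k
}
print(len(teacher_backbone), "backbone tensors")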

Requirements

This codebase was developed on a Linux machine with Python 3.8, snakemake 7.20.0, torch 1.8.1 and torchvision 0.9.1, and on an HPC cluster running the SLURM workload manager. All required Python packages and their corresponding versions for this setup are listed in the requirements.txt file.

Analyse non-RGB multi-channel images with pretrained ViTs

In Figure 1 of our manuscript [scDINO bioRxiv] we show how DINO-ss-ViTs can be applied to decipher stem cell heterogeneity using single-cell images derived from high-content imaging. These single-cell images are not RGB-based, but are composed of several separate microscopy-derived greyscale images combined into one multi-channel TIFF image. To use these multi-channel input images with ViTs, we load the values of a TIFF input file as a multidimensional PyTorch tensor in the Multichannel_dataset(datasets.ImageFolder) class in compute_CLS_features.py, which constructs the PyTorch dataset object.
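
The following is a minimal sketch of this idea (an illustrative approximation, not the repository's exact implementation; see pyscripts/compute_CLS_features.py for that): an ImageFolder subclass that reads multi-channel TIFFs with tifffile and returns channels-first float tensors.

import numpy as np
import tifffile
import torch
from torchvision import datasets


class MultichannelTiffFolder(datasets.ImageFolder):
    """ImageFolder variant that yields (C, H, W) float tensors from multi-channel TIFFs."""

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = tifffile.imread(path).astype(np.float32)   # all channels of the TIFF
        tensor = torch.from_numpy(image)
        # Heuristic: treat a small trailing dimension as the channel axis (channels-last TIFF)
        if tensor.ndim == 3 and tensor.shape[-1] < tensor.shape[0]:
            tensor = tensor.permute(2, 0, 1)                # -> (C, H, W)
        return tensor, label


# Usage (hypothetical path): each sample is a multi-channel tensor plus its class index
dataset = MultichannelTiffFolder("path/to/single_cell_crops")
image_tensor, class_idx = dataset[0]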

analysis plots

Run all 3 analyses at once

To send a job to the SLURM cluster that computes the CLS token representations, visualises their embeddings using UMAP and generates example attention images all at once, use the only_downstream_snakefile snakemake file and the only_downstream_analyses.yaml configuration file.

Example submission:

snakemake -s only_downstream_snakefile all \
--configfile="configs/only_downstream_analyses.yaml" \
--keep-incomplete \
--drop-metadata \
--keep-going \
--cores 8 \
--jobs 40 \
--cluster "sbatch --time=01:00:00 \
--gpus=1 \
-n 8 \
--mem-per-cpu=9000 \
--output=slurm_output_evaluate.txt \
--error=slurm_error_evaluate.txt" \
--latency-wait 45

All configurations and parameters (metadata and hyperparameters) of the job can be set in the only_downstream_analyses.yaml file. The results will be saved in the output_dir folder. Instead of running all 3 analyses at once, you can also run them separately by specifying the target rule in the snakemake command.

Compute [CLS] Token representations

The representation of an image is given by the output of the [CLS] token in the form of a numeric vector with dimensionality d = 384 for ViT-S and d = 768 for ViT-B. To compute a [CLS] feature space for a given dataset, prepare the configuration variables in the downstream_analyses: subsection of only_downstream_analyses.yaml.

To learn more about the args in the configuration file for the computation of the features, run:

python pyscripts/compute_CLS_features.py --help
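
Conceptually, the feature computation is a forward pass through the ViT backbone, which returns one [CLS] vector per image. Below is a minimal sketch using the public 3-channel DINO-ss-ViT-S/16 from torch.hub for illustration; the repository's script additionally handles multi-channel inputs and the configuration options listed above.

import torch

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

images = torch.randn(8, 3, 224, 224)    # stand-in batch; real crops come from the dataset object
with torch.no_grad():
    cls_features = model(images)        # the forward pass returns the [CLS] token embedding
print(cls_features.shape)               # torch.Size([8, 384]) for ViT-S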

Visualise CLS token space using UMAP

To get a glimpse of the feature space, we can use the UMAP algorithm to project multidimensional vectors into a 2D embedding. The UMAP parameters can be adjusted in the downstream_analyses: umap_eval: subsection of the config file.
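
For example, with the [CLS] features saved as a NumPy array, a 2D embedding can be produced with umap-learn along the following lines (file paths are hypothetical; the parameters actually used are those set in the umap_eval: subsection):

import numpy as np
import umap
import matplotlib.pyplot as plt

features = np.load("CLS_features.npy")   # (n_cells, 384) CLS vectors, hypothetical path
labels = np.load("labels.npy")           # (n_cells,) integer class labels, hypothetical path

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="euclidean").fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=2, cmap="tab10")
plt.xlabel("UMAP 1")
plt.ylabel("UMAP 2")
plt.savefig("cls_umap.png", dpi=300)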

Run k-NN evaluation

To quantitatively evaluate label-specific clustering, we can run a k-NN evaluation to get a global clustering score across classes. The kNN parameters can be adjusted in the configuration file in the downstream_analyses: kNN: subsection.
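
As an illustration of the idea (a simplified scikit-learn version, not the repository's exact implementation), a k-NN classifier fitted on a training split of the CLS features yields a single accuracy score summarising how well the classes separate in the feature space:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

features = np.load("CLS_features.npy")   # hypothetical paths, as above
labels = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=20, weights="distance")
knn.fit(X_train, y_train)
print("k-NN accuracy:", knn.score(X_test, y_test))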

Visualisation of the CLS Token-based Self-Attention Mechanism of ss-ViTs

DINO illustration

To visualise the CLS token-based self-attention of the ss-ViTs, attention maps can be generated for each image class. Our default settings randomly pick 1 image per image class in the given dataset. The attention maps are saved in the attention_maps subfolder of the output_dir in the results folder, with each attention head saved as a separate image. Additionally, each channel of the original multi-channel input image is saved as a separate image.
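
A minimal sketch of how such attention maps can be produced, assuming a DINO-style ViT that exposes get_last_selfattention() as in the official DINO code (the 3-channel hub model is used here purely for illustration):

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

img = torch.randn(1, 3, 224, 224)              # stand-in image; real crops come from the dataset
with torch.no_grad():
    attn = model.get_last_selfattention(img)   # (1, n_heads, n_tokens, n_tokens)

patch = 16
h_feat, w_feat = img.shape[-2] // patch, img.shape[-1] // patch
n_heads = attn.shape[1]
# Attention of the [CLS] token (query index 0) to all patch tokens, one map per head
cls_attn = attn[0, :, 0, 1:].reshape(n_heads, h_feat, w_feat)
cls_attn = F.interpolate(cls_attn.unsqueeze(0), scale_factor=patch, mode="nearest")[0]

for head in range(n_heads):
    plt.imsave(f"attention_head_{head}.png", cls_attn[head].numpy())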

scDINO training and evaluation on greyscale multi-channel images

immunocell_plot

To train your own Vision Transformers on a given dataset from scratch and subsequently evaluate them on downstream tasks (with an automatic train and test split), use the full_pipeline_snakefile and the scDINO_full_pipeline.yaml configuration file.

Example submission:

snakemake -s full_pipeline_snakefile all \
--configfile="configs/scDINO_full_pipeline.yaml" \
--keep-incomplete \
--drop-metadata \
--cores 8 \
--jobs 40 \
-k \
--cluster "sbatch --time=04:00:00 \
--gpus=2 \
-n 8 \
--mem-per-cpu=9000 \
--output=slurm_output.txt \
--error=slurm_error.txt" \
--latency-wait 45

To reproduce the scDINO-ss-ViT-S/16 used in our manuscript, download the Deep phenotyping PBMC Image Set of Y. Severin and set the path to the dataset in the config file under dataset_dir.

License

This repository is released under the Apache 2.0 license. You can find more information on this in the LICENSE file.

Citation

If you find this adaptation useful for your research, please consider citing us:

@article {Pfaendler2023.01.16.524226,
	author = {Pfaendler, Ramon and Hanimann, Jacob and Lee, Sohyon and Snijder, Berend},
	title = {Self-supervised vision transformers accurately decode cellular state heterogeneity},
	year = {2023},
	doi = {10.1101/2023.01.16.524226},
	URL = {https://www.biorxiv.org/content/early/2023/01/18/2023.01.16.524226},
	eprint = {https://www.biorxiv.org/content/early/2023/01/18/2023.01.16.524226.full.pdf},
	journal = {bioRxiv}
}

scdino's People

Contributors

dineshpalli, jacobhanimann, mikelippincott, pfaendler


scdino's Issues

Question about downstream run yaml config file

Hi Jacob,

First, thank you for your work on scDINO. I have been using it for some of my analysis looking into cell death!

I had some questions about the parameters defined in the only_downstream_analyses.yaml config file.
I might have missed it, but I am having trouble understanding what all of the parameters are for and how to define them.
If I could get some clarification that would be amazing!
Some params in particular that I do not quite understand:

  • selected_channel_combination_per_run
    • The default is ["01234", "0", "1", "2", "3", "4"]. Does this mean certain channels are also run through the model on their own?
  • norm_per_channel
    • How would one go about calculating these normalization values?
  • custom_embedding_map (the default is "{0:2, 1:2, 2:2, 3:2, 4:2}")
    • Does using this embedding map send channels 0-4 to one embedding space? Is this a correct interpretation?

If additional documentation should be added, I can help with a pull request.

Thank you for your help and for creating this amazing tool and model.

Best,
Mike

How do I use mean_and_std_of_dataset.txt without running the full pipeline?

Thank you so much for writing scDINO; it will be of great use to my work on cell morphology! I am currently trying to run the downstream pipeline using the pre-trained scDINO model, and I am unsure which file to include for "mean_std_file_location". I noticed that a mean_and_std_of_dataset.txt file appears to be an output of the full pipeline. Is there a way to access this file without running the full pipeline? I could also set "parse_mean_std_from_file" to False while computing the CLS features, but I am unsure how or if this would affect the overall function. Thank you so much for any insights.

CLS features all NaN when passing in 4-channel images

I am using scDINO on my image sets. I have four-channel fluorescence images. To make my images compatible with the pre-trained model, I added a 5th channel of all zeros.

This is an example of how I make my blank channel(s), where image is the image loaded via the tifffile Python package:

import numpy as np

image_merge = image
if image_merge.shape[-1] < 5:
    channels_to_add = 5 - image_merge.shape[-1]
    for channel in range(channels_to_add):
        # add a new channel of all zeros
        new_channel = np.zeros((image_merge.shape[0], image_merge.shape[1], 1))
        image_merge = np.concatenate((image_merge, new_channel), axis=-1)
print(image_merge.shape)

The issue I am having in scDINO seems to arise in the normalize_numpy_0_to_1 & normalize_tensor_per_channel functions in the pyscripts/utils.py file.
When a channel has a uniform distribution of pixel values (e.g. all zeros), the normalized array values in that channel become NaN due to the division by zero that occurs on lines 78 & 108.

I have a proposed change that would help scDINO handle multi-channel images that contain uniformly distributed channels.
If the maintainers of scDINO agree with the proposed change I can happily open a PR from this issue!
The proposed change is below.

def normalize_numpy_0_to_1(x):
    print("x",x.shape)
    x_min = x.min(axis=(0,1), keepdims=True)
    x_max = x.max(axis=(0,1), keepdims=True)
    diff_min_max = x_max - x_min
    if check_nan(diff_min_max):
        print("diff_min_max is nan")
    if check_nan(x-x_min):
        print("x-x_min is nan:")
    if check_nan(x):
        print("x contains nan before normalization")
    if check_zero(diff_min_max):
        print("diff_min_max has zero")
        print("x_max",x_max)
        print("x_min",x_min)
        print("diff_min_max",diff_min_max)
        print("x",x.shape)
        # replace x_max 0 values with 1
        for i in range(len(x_max[0][0])):
            if x_max[0][0][i] == 0:
                x_max[0][0][i] = 1
    x = (x - x_min)/(x_max-x_min)
    if check_nan(x):
        print("x contains nan after normalization")
    return x

Excellent work on scDINO and implementations! I would love to talk more in the future!

reason: Missing output files: mean_and_std_of_dataset.txt

Hello, super interesting work! I would like to train it on my cell dataset, but I am not familiar with snakemake. I tried to pretrain the model using the command shared in the README.md and got this error (missing output files: mean_and_std_of_dataset.txt). Could you please share some recommendations on how to solve this and obtain the mean and std of my dataset? Thanks in advance!

Calculation of train and val indices in mean_std_dataset.py

Hi, thank you so much for writing scDINO!

I am running the full pipeline, including calculating the mean and std of my training dataset before training the model. I think the train and val indices might be defined the wrong way around in mean_std_dataset.py. Currently the train and val indices are defined as train_indices, val_indices = indices[split:], indices[:split], where split = int(numpy.floor(validation_split * dataset_size)) and validation_split = 1 - float(snakemake.params["fraction_for_mean_std"]). The default value of snakemake.params["fraction_for_mean_std"] is currently 0.2.

This means that split represents 80 % of my full dataset size, so train_indices corresponds to the final 20 % of indices. With a dataset of 1200000 images, I get this output:

length of dataset: 1200000
240000 960000
Train dataset consists of 240000 images.

which I think is the wrong way around?
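
For reference, a minimal sketch of the arithmetic described in this issue (illustrative only, not code from the repository) reproduces the reported split:

import numpy as np

dataset_size = 1_200_000
fraction_for_mean_std = 0.2
validation_split = 1 - fraction_for_mean_std              # 0.8
split = int(np.floor(validation_split * dataset_size))    # 960000

indices = np.arange(dataset_size)
train_indices, val_indices = indices[split:], indices[:split]
print(len(train_indices), len(val_indices))                # 240000 960000, i.e. train gets only 20 %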
