GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging. Genome Biology

Overview

Installation

Create a conda environment:

conda create -n GeneSegNet python=3.8
conda activate GeneSegNet

Install PyTorch (version 1.12.1):

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

The command above may not match your CUDA environment. If it does not, see https://pytorch.org/get-started/previous-versions/#v1121 to find the command that matches your CUDA version.

Clone the repository:

git clone https://github.com/BoomStarcuc/GeneSegNet.git

Install the dependencies:

pip install -r requirement.txt

Datasets and Model

Download the demo training datasets from Google Drive and unzip them into your project directory.
Download the GeneSegNet pre-trained model from Google Drive and put it into your project directory.

Data preprocess

Input

The initial input data should have the following directory structure. See the hippocampus demo datasets on Google Drive.

your raw dataset
 |-images
 |   |-image sample 1
 |   |-image sample 2
 |   |-...
 |-labels
 |   |-label sample 1
 |   |-label sample 2
 |   |-...
 |-spots
 |   |-spot sample 1
 |   |-spot sample 2
 |   |-...
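
Before preprocessing, it can help to confirm that a raw dataset follows the layout above. The following is a minimal sketch, not part of the GeneSegNet code: the function name check_raw_dataset is hypothetical, and it assumes that every sample contributes exactly one file to each of images, labels, and spots.

```python
from pathlib import Path

REQUIRED_SUBDIRS = ("images", "labels", "spots")  # names taken from the tree above


def check_raw_dataset(root):
    """Return a list of problems found in a raw dataset directory (empty if none)."""
    root = Path(root)
    problems = []
    counts = {}
    for name in REQUIRED_SUBDIRS:
        sub = root / name
        if not sub.is_dir():
            problems.append(f"missing directory: {name}")
            continue
        # Count regular files only; hidden files such as .DS_Store are skipped.
        counts[name] = sum(
            1 for p in sub.iterdir() if p.is_file() and not p.name.startswith(".")
        )
    # Assumption: every sample contributes exactly one file to each subdirectory.
    if len(counts) == len(REQUIRED_SUBDIRS) and len(set(counts.values())) > 1:
        problems.append(f"file counts differ: {counts}")
    return problems
```

Calling check_raw_dataset on your raw dataset folder returns an empty list when all three subdirectories exist and hold the same number of files.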

Output

After preprocessing, you will obtain a dataset that is not yet split into training, validation, and test sets, structured as follows:

your preprocessed dataset
 |-sample 1
 |   |-HeatMaps
 |   |   |-HeatMap
 |   |   |-HeatMap_all
 |   |-images
 |   |-labels
 |   |-spots
 |-sample 2
 |   |-HeatMaps
 |   |   |-HeatMap
 |   |   |-HeatMap_all
 |   |-images
 |   |-labels
 |   |-spots
 |-...

Please see preprocessed hippocampus demo datasets at GoogleDrive.

Code run

If you use the demo training dataset we provide, you can skip this section. To train on your own dataset, first run the preprocessing code in the preprocess directory so that your data matches the dataset structure expected during training:

python Generate_Image_Label_locationMap.py

Note: set base_dir and save_crop_dir to your own paths.

Training from scratch

Input

You will need to split the output of the preprocessing step into training, validation, and test sets in reasonable proportions. The structure of the dataset should be as follows:

your split dataset
 |-train
 |   |-sample 1
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 2
 |   |-...
 |-val
 |   |-sample 3
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 4
 |   |-...
 |-test
 |   |-sample 5
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 6
 |   |-...
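
The split itself is left to the user. One possible way to produce the train, val, and test folders is sketched below. This is an illustration rather than the authors' procedure: the function names, the 10% validation and 10% test fractions, and the fixed random seed are all assumptions, so adjust them to your data.

```python
import random
import shutil
from pathlib import Path


def split_samples(names, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample names reproducibly and partition them into three lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_frac)
    n_test = int(len(names) * test_frac)
    return {
        "val": names[:n_val],
        "test": names[n_val:n_val + n_test],
        "train": names[n_val + n_test:],
    }


def copy_split(preprocessed_dir, split_dir, **kwargs):
    """Copy each sample folder into split_dir/train, split_dir/val, or split_dir/test."""
    src = Path(preprocessed_dir)
    samples = [p.name for p in src.iterdir() if p.is_dir()]
    for subset, members in split_samples(samples, **kwargs).items():
        for name in members:
            # copytree creates the missing parent folders (train, val, test).
            shutil.copytree(src / name, Path(split_dir) / subset / name)
```

Splitting at the level of whole sample folders keeps each sample's HeatMaps, images, labels, and spots together in one subset.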

Please see the demo training dataset on Google Drive. You can then train your model using the command in the Code run section below.

Output

After training, the algorithm will save the trained model to your specified path.

Code run

To run the algorithm on your data, use:

python -u GeneSeg_train.py --use_gpu --train_dir <training dataset path> --val_dir <validation dataset path> --test_dir <test dataset path> --pretrained_model None --save_png --save_each --img_filter _image --mask_filter _label --all_channels --verbose --metrics --dir_above --save_model_dir <save model path>

Here:

use_gpu uses the GPU if PyTorch is installed with CUDA support.
train_dir is the folder containing the training data.
val_dir is the folder containing the validation data.
test_dir is the folder containing the test data used to evaluate the training results.
img_filter, mask_filter, and heatmap_filter are the filename end strings for images, cell instance masks, and heat maps.
pretrained_model is the model to use for running or as the starting point for training.
chan sets the number of input channels (default 2 or 4).
verbose shows information about the run and its settings and saves it to a log.
save_each saves the model every n epochs for later comparison.
save_png saves masks as PNG files and outlines as a text file for ImageJ.
metrics computes the segmentation metrics.
save_model_dir is the directory in which the trained model is saved.

To see the full list of command-line options run:

python GeneSeg_train.py --help
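
If you launch training from a Python script, building the command as an argument list avoids shell quoting problems when paths contain spaces. The sketch below only reuses the flags shown in the training command above; the helper name build_train_command and the example paths are hypothetical.

```python
import subprocess
import sys


def build_train_command(train_dir, val_dir, test_dir, save_model_dir):
    """Return the argument list for GeneSeg_train.py using the flags shown above."""
    return [
        sys.executable, "-u", "GeneSeg_train.py",
        "--use_gpu",
        "--train_dir", str(train_dir),
        "--val_dir", str(val_dir),
        "--test_dir", str(test_dir),
        "--pretrained_model", "None",
        "--save_png", "--save_each",
        "--img_filter", "_image",
        "--mask_filter", "_label",
        "--all_channels", "--verbose", "--metrics", "--dir_above",
        "--save_model_dir", str(save_model_dir),
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own split dataset and output folder.
    cmd = build_train_command("data/train", "data/val", "data/test", "checkpoints")
    print(" ".join(cmd))
    # Uncomment to launch training from the repository root:
    # subprocess.run(cmd, check=True)
```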

Test and run a pre-trained model

Input

The input is your test dataset.

your test dataset
 |-test
 |   |-sample 5
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 6
 |   |-...

Output

The output will include the following two images: 1) the predicted cell instance masks; 2) the cell boundary comparison plot between predicted results and training labels.

Code run

To test a model you trained or to run a pre-trained model, use:

python GeneSeg_test.py --use_gpu --test_dir <test dataset path> --pretrained_model <your trained model> --save_png --img_filter _image --mask_filter _label --all_channels --metrics --dir_above --output_filename <a folder name>

Note: to run the pre-trained model, first download the pre-trained model we provide.

Network Inference

Input

The input to network inference is your raw dataset. See the hippocampus demo datasets on Google Drive.

your raw dataset
 |-images
 |   |-image sample 1
 |   |-image sample 2
 |   |-...
 |-labels
 |   |-label sample 1
 |   |-label sample 2
 |   |-...
 |-spots
 |   |-spot sample 1
 |   |-spot sample 2
 |   |-...

Output

The output of network inference includes four items for each sample: a heat map and the predicted full-resolution .mat, .png, and .jpg files, as follows:

|-HeatMap
|   |-sample 1
|   |-sample 2
|- predicted full-resolution .mat file for sample 1
|- predicted full-resolution .png file for sample 1
|- predicted full-resolution .jpg file for sample 1
|- predicted full-resolution .mat file for sample 2
|- predicted full-resolution .png file for sample 2
|- predicted full-resolution .jpg file for sample 2
|-...

Code run

To obtain the final full-resolution segmentation results, run slidingwindows_gradient.py in the Inference directory:

python slidingwindows_gradient.py

Note: set root_dir, save_dir, and model_file to your own paths.

Find the mapping relationships between transcripts and cells

Input

There are two types of input as follows:

1. your raw spot dataset
 |-spots
 |   |-spot sample 1
 |   |-spot sample 2
 |   |-...

2. your output of the network inference
 |-HeatMap
 |   |-sample 1
 |   |-sample 2
 |- predicted full-resolution .mat file for sample 1
 |- predicted full-resolution .png file for sample 1
 |- predicted full-resolution .jpg file for sample 1
 |- predicted full-resolution .mat file for sample 2
 |- predicted full-resolution .png file for sample 2
 |- predicted full-resolution .jpg file for sample 2
 |-...

Output

The output is a .csv file with four columns (cell_id, spotX, spotY, and gene), in which each row assigns one transcript spot, with its gene name, to a single cell.

 cell_id   spotX   spotY   gene
 0         213     419     Pvalb
 0         248     442     Gad1
 1         1212    18      Plp1
 ...       ...     ...     ...
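
A common next step is to summarize this file as gene counts per cell. The sketch below uses only the Python standard library. It assumes the .csv file is comma-delimited and has a header row with the four column names listed above; the function name count_genes_per_cell is hypothetical and is not part of the repository.

```python
import csv
from collections import Counter, defaultdict


def count_genes_per_cell(csv_path):
    """Return {cell_id: Counter({gene: n_spots})} from a mapping .csv file."""
    counts = defaultdict(Counter)
    with open(csv_path, newline="") as handle:
        # Assumption: the header row names the columns cell_id, spotX, spotY, gene.
        for row in csv.DictReader(handle):
            counts[row["cell_id"]][row["gene"]] += 1
    return counts
```

Cell IDs are kept as strings exactly as they appear in the file, so look up a cell with, for example, counts["0"].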

Code run

python generate_MappingRelationships.py

Note: set spot_dir, label_dir, and save_dir to your own paths.

Citation

If you find our work useful for your research, please consider citing the following paper.

@article{wang2023genesegnet,
  title={GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging},
  author={Wang, Yuxing and Wang, Wenguan and Liu, Dongfang and Hou, Wenpin and Zhou, Tianfei and Ji, Zhicheng},
  journal={Genome Biology},
  volume={24},
  number={1},
  pages={235},
  year={2023},
  publisher={Springer}
}
