
GeneSegNet

GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging. Published in Genome Biology.

Overview

Installation

  1. Create a conda environment:
conda create -n GeneSegNet python=3.8
conda activate GeneSegNet
  2. Install PyTorch (version 1.12.1):
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

The command above may not match your CUDA environment; check https://pytorch.org/get-started/previous-versions/#v1121 to find the command appropriate for your CUDA version.

  3. Clone the repository:
git clone https://github.com/BoomStarcuc/GeneSegNet.git
  4. Install dependencies:
pip install -r requirement.txt
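After installing, it can be useful to confirm that the key packages are importable before moving on. A minimal sketch (the helper name `check_environment` is ours, not part of the repository; extend `required` with whatever requirement.txt lists):

```python
import importlib.util
import sys

def check_environment(packages):
    """Return the subset of package names that are NOT importable in this env."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # torch/torchvision per the install step above; add others from requirement.txt
    required = ["torch", "torchvision"]
    missing = check_environment(required)
    if missing:
        print(f"Missing packages: {missing} "
              f"(Python {sys.version_info.major}.{sys.version_info.minor})")
    else:
        print("All required packages found.")
```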

Datasets and Model

  1. Download the demo training datasets at GoogleDrive and unzip them to your project directory.
  2. Download GeneSegNet pre-trained model at GoogleDrive, and put it into your project directory.

Data preprocess

Input

Directory structure of initial input data. See hippocampus demo datasets at GoogleDrive.

your raw dataset
 |-images
 |   |-image sample 1
 |   |-image sample 2
 |   |-...
 |-labels
 |   |-label sample 1
 |   |-label sample 2
 |   |-...
 |-spots
 |   |-spot sample 1
 |   |-spot sample 2
 |   |-...
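A quick way to check your raw dataset against this layout before preprocessing is a small validation helper. This is our own sketch, not part of the repository; it only checks that the three directories exist and hold the same number of samples, since the exact file-naming convention is not specified here:

```python
import os

def validate_raw_dataset(root):
    """Verify the images/labels/spots layout expected by preprocessing.

    Returns a list of problems; an empty list means the layout looks correct.
    Only directory presence and per-directory sample counts are checked.
    """
    problems = []
    counts = {}
    for sub in ("images", "labels", "spots"):
        path = os.path.join(root, sub)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {sub}")
            continue
        counts[sub] = len(os.listdir(path))
    if len(set(counts.values())) > 1:
        problems.append(f"sample counts differ across directories: {counts}")
    return problems
```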

Output

After preprocessing, you will obtain a dataset that has not yet been split into training, validation, and test sets, structured as follows:

your preprocessed dataset
 |-sample 1
 |   |-HeatMaps
 |   |   |-HeatMap
 |   |   |-HeatMap_all
 |   |-images
 |   |-labels
 |   |-spots
 |-sample 2
 |   |-HeatMaps
 |   |   |-HeatMap
 |   |   |-HeatMap_all
 |   |-images
 |   |-labels
 |   |-spots
 |-...

Please see preprocessed hippocampus demo datasets at GoogleDrive.

Code run

If you use the demo training dataset we provide, you can skip this section. To train on your own dataset, first run the preprocessing code in the preprocess directory so that your data matches the dataset structure expected during training.

python Generate_Image_Label_locationMap.py

Note: base_dir and save_crop_dir need to be set to the corresponding paths on your system.
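The HeatMap outputs produced by preprocessing encode spot locations as Gaussian density maps. The following is a simplified pure-Python sketch of that idea; the function name, kernel size, and normalization are illustrative and may differ from what Generate_Image_Label_locationMap.py actually does:

```python
import math

def spots_to_heatmap(spots, height, width, sigma=2.0):
    """Render (x, y) spot coordinates into a Gaussian heat map.

    Each spot contributes a 2D Gaussian bump, truncated at 3 sigma;
    overlapping bumps sum. Returns a height x width list of lists.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at 3 sigma
    for x, y in spots:
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                px, py = x + dx, y + dy
                if 0 <= px < width and 0 <= py < height:
                    heatmap[py][px] += math.exp(
                        -(dx * dx + dy * dy) / (2 * sigma * sigma))
    return heatmap
```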

Training from scratch

Input

You will need to split the output of the preprocessing step into training, validation, and test sets in reasonable proportions. The structure of the dataset should be as follows:

your split dataset
 |-train
 |   |-sample 1
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 2
 |   |-...
 |-val
 |   |-sample 3
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 4
 |   |-...
 |-test
 |   |-sample 5
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 6
 |   |-...

Please see the demo training dataset at GoogleDrive. Then you can start training your model with the command given under Code run.
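The splitting step described above can be sketched as follows. This is a minimal example under our own assumptions (the function name is ours, and the ratios are illustrative): it copies each per-sample folder from the preprocessed dataset into train/, val/, and test/ subfolders:

```python
import os
import random
import shutil

def split_dataset(src_root, dst_root, ratios=(0.7, 0.15, 0.15), seed=0):
    """Randomly partition per-sample folders into train/val/test subfolders.

    src_root contains one folder per preprocessed sample; each folder is
    copied (keeping the source intact) under dst_root/train, dst_root/val,
    or dst_root/test according to the given ratios.
    """
    samples = sorted(d for d in os.listdir(src_root)
                     if os.path.isdir(os.path.join(src_root, d)))
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    split_map = {
        "train": samples[:n_train],
        "val": samples[n_train:n_train + n_val],
        "test": samples[n_train + n_val:],
    }
    for split, names in split_map.items():
        for name in names:
            shutil.copytree(os.path.join(src_root, name),
                            os.path.join(dst_root, split, name))
    return {k: len(v) for k, v in split_map.items()}
```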

Output

After training, the algorithm will save the trained model to your specified path.

Code run

To run the algorithm on your data, use:

python -u GeneSeg_train.py --use_gpu --train_dir <training dataset path> --val_dir <validation dataset path> --test_dir <test dataset path> --pretrained_model None --save_png --save_each --img_filter _image --mask_filter _label --all_channels --verbose --metrics --dir_above --save_model_dir <save model path>

Here:

  • use_gpu uses the GPU if torch is installed with CUDA support.
  • train_dir is a folder containing the training data.
  • val_dir is a folder containing the validation data.
  • test_dir is a folder containing the test data used to validate training results.
  • img_filter, mask_filter, and heatmap_filter are the filename suffixes for images, cell instance masks, and heat maps.
  • pretrained_model is a model to use for running or for starting training.
  • chan sets the number of input channels (default 2 or 4).
  • verbose shows information about the run and settings and saves it to a log.
  • save_each saves the model every n epochs for later comparison.
  • save_png saves masks as png and outlines as a text file for ImageJ.
  • metrics computes the segmentation metrics.
  • save_model_dir is the directory where the trained model is saved.
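The metrics option reports segmentation quality. As an illustration of the core quantity involved, here is a minimal sketch of intersection-over-union between two binary masks; the function name is ours, and the training script may aggregate metrics differently (e.g. per instance or at multiple thresholds):

```python
def mask_iou(pred, target):
    """Intersection-over-union of two binary masks (lists of 0/1 rows)."""
    intersection = sum(p and t for row_p, row_t in zip(pred, target)
                       for p, t in zip(row_p, row_t))
    union = sum(p or t for row_p, row_t in zip(pred, target)
                for p, t in zip(row_p, row_t))
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0
```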

To see the full list of command-line options run:

python GeneSeg_train.py --help

Test and run a pre-trained model

Input

The input is your test dataset.

your test dataset
 |-test
 |   |-sample 5
 |   |   |-HeatMaps
 |   |   |   |-HeatMap
 |   |   |   |-HeatMap_all
 |   |   |-images            
 |   |   |-labels 
 |   |   |-spots
 |   |-sample 6
 |   |-...

Output

The output will include the following two images: 1) the predicted cell instance masks; 2) a plot comparing the predicted cell boundaries against the training labels.

Code run

To test a trained or pre-trained model, use:

python GeneSeg_test.py --use_gpu --test_dir <test dataset path> --pretrained_model <your trained model> --save_png --img_filter _image --mask_filter _label --all_channels --metrics --dir_above --output_filename <a folder name>

Note: to run a pre-trained model, first download the pre-trained model provided above.

Network Inference

Input

The input of the network inference is your raw datasets. See hippocampus demo datasets at GoogleDrive.

your raw dataset
 |-images
 |   |-image sample 1
 |   |-image sample 2
 |   |-...
 |-labels
 |   |-label sample 1
 |   |-label sample 2
 |   |-...
 |-spots
 |   |-spot sample 1
 |   |-spot sample 2
 |   |-...

Output

The output of the network inference includes a HeatMap folder plus three predicted full-resolution files (.mat, .png, and .jpg) for each sample, as follows:

|-HeatMap
|   |-sample 1
|   |-sample 2
|- predicted full-resolution .mat file for sample 1
|- predicted full-resolution .png file for sample 1
|- predicted full-resolution .jpg file for sample 1
|- predicted full-resolution .mat file for sample 2
|- predicted full-resolution .png file for sample 2
|- predicted full-resolution .jpg file for sample 2
|-...

Code run

To obtain the final full-resolution segmentation results, use slidingwindows_gradient.py in the Inference directory:

python slidingwindows_gradient.py

Note: root_dir, save_dir, and model_file need to be set to the corresponding paths on your system.
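Sliding-window inference tiles a large image into overlapping windows, runs the network on each tile, and stitches the predictions back together. The tiling itself can be sketched as below; the function name, window size, and stride are illustrative, not the script's actual values:

```python
def sliding_windows(height, width, window=256, stride=128):
    """Yield (y0, x0, y1, x1) tiles covering an image with overlap.

    The last row/column of tiles is shifted inward so that every tile is
    full-sized (when the image is larger than the window) and the whole
    image is covered.
    """
    ys = list(range(0, max(height - window, 0) + 1, stride))
    xs = list(range(0, max(width - window, 0) + 1, stride))
    if height > window and ys[-1] != height - window:
        ys.append(height - window)
    if width > window and xs[-1] != width - window:
        xs.append(width - window)
    for y0 in ys:
        for x0 in xs:
            yield (y0, x0, min(y0 + window, height), min(x0 + window, width))
```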

Find the mapping relationships between transcripts and cells

Input

There are two types of input as follows:

1. your raw spot dataset
 |-spots
 |   |-spot sample 1
 |   |-spot sample 2
 |   |-...

2. your output of the network inference
 |-HeatMap
 |   |-sample 1
 |   |-sample 2
 |- predicted full-resolution .mat file for sample 1
 |- predicted full-resolution .png file for sample 1
 |- predicted full-resolution .jpg file for sample 1
 |- predicted full-resolution .mat file for sample 2
 |- predicted full-resolution .png file for sample 2
 |- predicted full-resolution .jpg file for sample 2
 |-...

Output

The output is a .csv file with four columns (cell_id, spotX, spotY, and gene), mapping each transcript to its unique corresponding cell.

   cell_id   spotX   spotY   gene
   0         213     419     Pvalb
   0         248     442     Gad1
   1         1212    18      Plp1
   ...       ...     ...     ...

Code run

python generate_MappingRelationships.py

Note: spot_dir, label_dir, and save_dir need to be set to the corresponding paths on your system.
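The core of the mapping step can be sketched as a pixel lookup: each spot takes the cell id of the labeled-mask pixel it falls on. This is our own simplified illustration (function name included); generate_MappingRelationships.py may use a more involved assignment:

```python
import csv

def map_spots_to_cells(label_mask, spots, out_csv):
    """Assign each transcript spot the cell id of the mask pixel under it.

    label_mask: 2D list of integer cell ids (0 = background).
    spots: iterable of (x, y, gene) records.
    Writes a csv with columns cell_id, spotX, spotY, gene; returns the rows.
    """
    rows = []
    for x, y, gene in spots:
        cell_id = label_mask[y][x]  # note: row index is y, column is x
        rows.append({"cell_id": cell_id, "spotX": x, "spotY": y, "gene": gene})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["cell_id", "spotX", "spotY", "gene"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```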

Citation

If you find our work useful for your research, please consider citing the following paper.

@article{wang2023genesegnet,
  title={GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging},
  author={Wang, Yuxing and Wang, Wenguan and Liu, Dongfang and Hou, Wenpin and Zhou, Tianfei and Ji, Zhicheng},
  journal={Genome Biology},
  volume={24},
  number={1},
  pages={235},
  year={2023},
  publisher={Springer}
}
