HIA (Histopathology Image Analysis)

This repository contains the Python version of a general workflow for end-to-end artificial intelligence on histopathology images. It is based on workflows previously described in Kather et al., Nature Medicine 2019 and Echle et al., Gastroenterology 2020. The objective is to predict a given label directly from digitized histological whole slide images (WSI). The label is defined at the level of patients, not at the level of pixels in a given WSI. Thus, the problems addressed by HIA are weakly supervised problems. Common labels are the molecular subtype of a cancer, binarized clinical outcome, or treatment response. Compared to previous MATLAB-based implementations of this framework (e.g. DeepHistology), this version is implemented in Python and PyTorch, is highly scalable, and has been extensively validated on multiple clinically relevant problems. A key feature of HIA is that it provides implementations of multiple artificial intelligence algorithms, including:

Classical resnet-based training (similar to Kather et al., Nature Medicine 2019)
Vision transformers (inspired by [Dosovitskiy et al., conference paper at ICLR 2021](https://arxiv.org/abs/2010.11929))
Multiple instance learning (similar to Campanella et al., Nature Medicine 2019)
CLAM - Clustering-constrained attention multiple instance learning (described in Lu et al., Nature Machine Intelligence 2020)
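To illustrate the weakly supervised setting described above, here is a minimal sketch of one common way to work with patient-level labels: every tile inherits the label of its patient for training, and tile-level scores are averaged back into one score per patient. This is an illustration only, not necessarily how any of the algorithms listed above is implemented in HIA. The function names and the patient IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def tile_labels(tiles_by_patient, patient_labels):
    """Give every tile the label of the patient it came from."""
    return [
        (tile, patient_labels[patient_id])
        for patient_id, tiles in tiles_by_patient.items()
        for tile in tiles
    ]


def patient_scores(tile_scores):
    """Average (patient_id, score) pairs into one score per patient."""
    grouped = defaultdict(list)
    for patient_id, score in tile_scores:
        grouped[patient_id].append(score)
    return {patient_id: mean(scores) for patient_id, scores in grouped.items()}
```

For example, `patient_scores([("P1", 0.75), ("P1", 0.25)])` yields a single score for the hypothetical patient `P1`, which can then be compared against the patient-level label.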

Note that this version contains various changes, but it follows the same steps.

++ These scripts are still under development; please always use the latest version. ++

How to use this repository:

To use this workflow, you need to modify an experiment file for your project. The experiment file is a text file, and an example of it can be found in this repository. In this file you need to fill in the following options:

Input Variable name	Description
-projectDetails	This is an optional string input. In this section you can write down some keywords about your experiment.
-dataDir_train	Path to the directory containing the normalized tiles. For example: ["K:\TCGA-CRC-DX"]. This folder should contain a subfolder of tiles with one of the following names: {BLOCKS_NORM_MACENKO, BLOCKS_NORM_VAHADANE, BLOCKS_NORM_REINHARD or BLOCKS}. The clinical table and the slide table of this data set should also be stored in this folder. This is an example of the structure of this folder: K:\TCGA-CRC-DX: { 1. BLOCKS_NORM_MACENKO 2. TCGA-CRC-DX_CLINI.xlsx 3. TCGA-CRC-DX_SLIDE.csv }
-dataDir_test	If you are planning an external validation for your experiment, this variable is the path to the directory containing the normalized tiles which will be used for external validation. This folder should have the same structure as 'dataDir_train'.
-targetLabels	This is the list of targets which you want to analyze. The clinical data should contain the values for these targets. For example: ["isMSIH", "stage"].
-trainFull	If you are planning to do cross validation, this variable should be defined as False. If you want to use all the data to train and then use the external validation, then this variable should be defined as True.
-maxNumBlocks	This integer variable defines the maximum number of tiles which will be used per slide. Since the number of extracted tiles per slide can vary a lot, we use a limited number of tiles per slide. For more detail, please check the paper.
-epochs	This integer variable defines the number of epochs for training.
-batchSize	This integer variable defines the batch size for training.
-k	This integer variable defines the number of folds (K) for the cross validation experiment. It is considered only if the trainFull variable is False.
-modelName	This is a string variable which can be defined using one of the following neural network models. The script will download the pretrained weights for each of these models. {resnet, alexnet, vgg, squeezenet, densenet, inception, vit, efficient}
-opt	This is a string variable defining the name of optimizer to use for training. {"adam" or "sgd"}
-lr	This float variable defines the learning rate for the optimizer.
-reg	This float variable defines the weight_decay for the optimizer.
-gpuNo	If the computer has more than one GPU, this variable can be set to run the experiment on a specific GPU.
-freezeRatio	This is a float variable in the range [0, 1]. It specifies the ratio of the neural network layers to be frozen during training.
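The options above can be collected into an experiment file along the following lines. This is a sketch under stated assumptions: the file is assumed to use JSON syntax (suggested by the list examples in the table, but not confirmed here), the keys are assumed to be the variable names without the leading hyphen, and all values except the example path and target are placeholders. dataDir_test is left out because this run uses cross validation; whether HIA requires the key is not stated. Please compare with the example experiment file in this repository before relying on it.

```python
import json
from pathlib import Path

# Hypothetical values for a cross validation run; adjust to your project.
experiment = {
    "projectDetails": "MSI prediction, cross validation",  # free-text keywords
    "dataDir_train": ["K:\\TCGA-CRC-DX"],  # example path from the table above
    "targetLabels": ["isMSIH"],
    "trainFull": False,   # False = cross validation
    "maxNumBlocks": 200,  # placeholder
    "epochs": 5,          # placeholder
    "batchSize": 64,      # placeholder
    "k": 3,               # only used because trainFull is False
    "modelName": "resnet",
    "opt": "adam",
    "lr": 1e-4,           # placeholder
    "reg": 1e-5,          # weight_decay, placeholder
    "gpuNo": 0,
    "freezeRatio": 0.5,   # placeholder, must lie in [0, 1]
}


def write_experiment(config, path):
    """Write the experiment options to a text file as JSON (assumed format)."""
    target = Path(path)
    target.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return target
```

Calling `write_experiment(experiment, "my_experiment.txt")` would produce a text file that can be edited by hand afterwards; the file name is arbitrary.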

Run training:

To start training, use the Main.py script. The full path to the experiment file should be passed as an input variable to this script.
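Before starting a long training run, it can help to sanity-check the experiment file. The helper below is a hypothetical sketch and not part of HIA: it assumes the experiment file is JSON with the option names from the table as keys, and the constraints are only those that can be read off the option descriptions above.

```python
import json

# Allowed values as listed in the option table.
MODEL_NAMES = {"resnet", "alexnet", "vgg", "squeezenet",
               "densenet", "inception", "vit", "efficient"}
OPTIMIZERS = {"adam", "sgd"}


def is_positive_int(value):
    """True for integers >= 1 (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def check_experiment(config):
    """Return a list of problems found in an experiment dictionary."""
    problems = []
    if not config.get("targetLabels"):
        problems.append("targetLabels must list at least one target")
    if config.get("modelName") not in MODEL_NAMES:
        problems.append("modelName must be one of: " + ", ".join(sorted(MODEL_NAMES)))
    if config.get("opt") not in OPTIMIZERS:
        problems.append("opt must be 'adam' or 'sgd'")
    ratio = config.get("freezeRatio")
    if not isinstance(ratio, (int, float)) or not 0 <= ratio <= 1:
        problems.append("freezeRatio must lie in [0, 1]")
    for name in ("maxNumBlocks", "epochs", "batchSize"):
        if not is_positive_int(config.get(name)):
            problems.append(name + " must be a positive integer")
    # k is only considered for cross validation (trainFull is False).
    if config.get("trainFull") is False:
        k = config.get("k")
        if not is_positive_int(k) or k < 2:
            problems.append("k must be an integer >= 2 when trainFull is False")
    return problems


def load_and_check(path):
    """Load an experiment file (assumed JSON) and return its problems."""
    with open(path, encoding="utf-8") as handle:
        return check_experiment(json.load(handle))
```

An empty list from `load_and_check` means none of these checks failed; it does not guarantee that Main.py will accept the file.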

External Validation:

If you used trainFull = True in the experiment file and you want to evaluate your model on an external data set, use the script named Deploy_Classic.py. In this script, the following two inputs should be filled in:
{
1. addressExp is the full path to the experiment file created for external validation. This experiment file has the same options as explained above. dataDir_test is the path to the folder of the data set which will be used for external validation. targetLabels is the single target which you want to evaluate.
2. modelAdr is the full path to the model which is saved in the RESULT folder of the experiment in which you set trainFull to True.
}
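Since the external validation experiment file has the same options as the training one, it can be derived from the training configuration. The sketch below is hypothetical and not part of HIA: it assumes the configuration is available as a dictionary keyed by the option names, and that dataDir_test is a list of paths like the dataDir_train example.

```python
import copy


def make_validation_experiment(train_config, test_dir, target):
    """Copy a training config and adapt it for external validation."""
    if target not in train_config["targetLabels"]:
        raise ValueError("target was not among the trained targetLabels")
    config = copy.deepcopy(train_config)
    config["dataDir_test"] = [test_dir]  # assumed to be a list, like dataDir_train
    config["targetLabels"] = [target]    # external validation uses a single target
    return config
```

The returned dictionary can then be written to a new experiment file, whose path is what addressExp should point to.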
