
staintools's Introduction

StainTools

Tools for tissue image stain normalization and augmentation in Python 3.

Install

  1. pip install staintools
  2. Install SPAMS. This is a dependency of staintools and is technically available on PyPI (see here). However, I have personally had some issues with the PyPI install and would instead recommend installing via conda (see here).

Quickstart

Normalization

Original images:

Stain normalized images:

import staintools

# Read data
target = staintools.read_image("./data/my_target_image.png")
to_transform = staintools.read_image("./data/my_image_to_transform.png")

# Standardize brightness (optional, can improve the tissue mask calculation)
target = staintools.LuminosityStandardizer.standardize(target)
to_transform = staintools.LuminosityStandardizer.standardize(to_transform)

# Stain normalize
normalizer = staintools.StainNormalizer(method='vahadane')
normalizer.fit(target)
transformed = normalizer.transform(to_transform)

Augmentation

import staintools

# Read data
to_augment = staintools.read_image("./data/my_image_to_augment.png")

# Standardize brightness (optional, can improve the tissue mask calculation)
to_augment = staintools.LuminosityStandardizer.standardize(to_augment)

# Stain augment
augmentor = staintools.StainAugmentor(method='vahadane', sigma1=0.2, sigma2=0.2)
augmentor.fit(to_augment)
augmented_images = []
for _ in range(100):
    augmented_image = augmentor.pop()
    augmented_images.append(augmented_image)

More examples

For more examples see the files inside the examples directory.

staintools's People

Contributors

peter554


staintools's Issues

rgb-od and od-rgb

Many thanks for your code; it is really useful for me.
I have a question about the color deconvolution.
The Beer-Lambert law is T = \frac{I}{I_0} = e^{-\tau} = 10^{-A}, where \tau is the optical depth and A is the absorbance.
So RGB to OD can be OD = -\log_{10}(\frac{I}{I_0}); why is OD to RGB then RGB = \exp(-OD) rather than RGB = 10^{-OD}?
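For what it's worth, the round trip is self-consistent as long as the same base is used in both directions; natural log and log10 differ only by a constant factor of \ln 10 on the OD values. A quick numpy check (a sketch, not the library's code):

# Round-trip check: RGB -> OD -> RGB is exact as long as the SAME base is
# used in both directions; natural log vs log10 only rescales OD by ln(10).
import numpy as np

I = np.array([40.0, 120.0, 230.0])  # example RGB intensities, I_0 = 255
od_e = -np.log(I / 255)             # natural-log optical density
od_10 = -np.log10(I / 255)          # base-10 optical density (absorbance A)

assert np.allclose(255 * np.exp(-od_e), I)    # exp undoes ln
assert np.allclose(255 * 10 ** (-od_10), I)   # 10** undoes log10
assert np.allclose(od_e, od_10 * np.log(10))  # the bases differ by ln(10)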

No such file or directory: 'VERSION'

I got an error when I tried to install staintools with pip.

$ pip install staintools
Collecting staintools
  Using cached https://files.pythonhosted.org/packages/3d/64/7856a7009d49ab4661abb2618c4deccf8f7330634585408c81db93f55fb2/staintools-0.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-up7ulmum/staintools/setup.py", line 6, in <module>
        with open('VERSION') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'VERSION'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-up7ulmum/staintools/

There is no VERSION file in this tarball. I hope you can fix it.
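A possible fix sketch (an assumption, not the maintainer's patch): resolve VERSION relative to setup.py itself and ship the file in the sdist (e.g. "include VERSION" in MANIFEST.in), so egg_info works regardless of the working directory:

import os

# Read VERSION next to setup.py rather than from the current directory.
here = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(here, 'VERSION')) as f:
    version = f.read().strip()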

AssertionError: Negative optical density.

Had a few issues while using this library. They are as follows:

  • Installation issue (spams error)
  • Running stain normalization on a batch of 15,000 images resulted in the "Negative optical density" error.

Looking forward to a fix. Thanks.

How to save the normalized picture?

I have used your code like this:

i1 = utils.read_image('./data/i1.png')
i2 = utils.read_image('./data/i8.tif')
n = stainNorm_Reinhard.Normalizer()
n.fit(i1)
normalized = n.transform(i2)
cv.imwrite('i8_n.tif', normalized)

But the picture saved as 'i8_n.tif' looks strange. I want to know how to save the normalized picture so that it looks like the one in your example.
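A likely cause (an assumption, but consistent with the library's use of OpenCV): read_image returns images in RGB channel order, while cv.imwrite expects BGR, so the saved channels come out swapped. A minimal sketch of the fix:

import cv2 as cv

# Swap RGB back to BGR before handing the array to OpenCV's writer.
cv.imwrite('i8_n.tif', cv.cvtColor(normalized, cv.COLOR_RGB2BGR))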

Principal Eigen vectors inversion, based on the principal eigen vector

Thank you for open-sourcing your code. I have a question regarding the two lines of code below.
Why did you flip the eigenvectors based on the sign of their first components?

# Make sure vectors are pointing the right way
if V[0, 0] < 0: V[:, 0] *= -1
if V[0, 1] < 0: V[:, 1] *= -1
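The flip matters because eigenvectors are only defined up to sign. A small self-contained check (a sketch, not the library's code):

# np.linalg.eigh returns eigenvectors only up to sign, so two nearly
# identical images could yield v and -v; flipping any column whose first
# entry is negative pins down one consistent orientation for the stain
# vectors.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
_, V = np.linalg.eigh(A)
V[:, V[0] < 0] *= -1  # same convention as the two lines quoted above
assert (V[0] >= 0).all()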

numpy.linalg.LinAlgError when calling transform

/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/site-packages/staintools/stain_normalizer.py:41: RuntimeWarning: divide by zero encountered in true_divide
  source_concentrations *= (self.maxC_target / maxC_source)
/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/site-packages/staintools/stain_normalizer.py:41: RuntimeWarning: invalid value encountered in multiply
  source_concentrations *= (self.maxC_target / maxC_source)
Empty Tissue Mask
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "normalize.py", line 40, in training_slide_range_to_images
    training_slide_to_image(slide_num)
  File "normalize.py", line 31, in training_slide_to_image
    to_transform = normalizer.transform(to_transform)
  File "/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/site-packages/staintools/stain_normalizer.py", line 38, in transform
    stain_matrix_source = self.extractor.get_stain_matrix(I)
  File "/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/site-packages/staintools/stain_extraction/macenko_stain_extractor.py", line 30, in get_stain_matrix
    _, V = np.linalg.eigh(np.cov(OD, rowvar=False))
  File "/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/site-packages/numpy/linalg/linalg.py", line 1444, in eigh
    _assertRankAtLeast2(a)
  File "/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/site-packages/numpy/linalg/linalg.py", line 207, in _assertRankAtLeast2
    'at least two-dimensional' % a.ndim)
numpy.linalg.LinAlgError: 0-dimensional array given. Array must be at least two-dimensional
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "normalize.py", line 105, in <module>
    multiprocess_training_slides_to_images()
  File "normalize.py", line 78, in multiprocess_training_slides_to_images
    (start_ind, end_ind) = result.get()
  File "/users/sli59/anaconda3/envs/torch_lester/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
numpy.linalg.LinAlgError: 0-dimensional array given. Array must be at least two-dimensional

Let me know if any additional information is needed for diagnosing the problem. Thank you for your time.

Copyright and usage question

@Peter554 Thank you so much for open-sourcing your code. The implementations are really nice to follow and are clear. This is immensely helpful to the computational pathology community, myself included. I wanted to ask if you have any specific copyright restrictions on usage and modification of your code with clear and unambiguous attribution of this repository as the source. Specifically, I am one of the people who contribute to and regularly use an open-source package called HistomicsTK:

https://github.com/DigitalSlideArchive/HistomicsTK

I would love to integrate some of the implementations you have here, either as-is or with modifications to integrate well with other aspects of the package. Ideally, if things are modified so that they integrate well with HistomicsTK then some of these pipelines can be run on the server-side on a whole-slide level, which is one reason why I cannot simply add StainTools as a dependency. If that is alright with you, I would love if you:

  1. Confirm whether it is OK to integrate and modify code from this repository into other open-access packages.

  2. If you agree to the previous point, what copyright notice would you like me to show, and where?

Thank you so much again for this wonderful repo.

The software stops when it hits an empty tissue image while processing hundreds of images

I don't think it is a real issue, but when I process lots of images in a Python for loop, one blank image with empty tissue stops the whole process. Could you modify the code so that it does nothing for that image but keeps processing the other images? Thank you!

Here is the error message:

~/staintools/tissue_masks/luminosity_threshold_tissue_locator.py", line 27, in get_tissue_mask
raise TissueMaskException("Empty tissue mask computed")
staintools.miscellaneous.exceptions.TissueMaskException: Empty tissue mask computed
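A workaround sketch until the library changes: catch the exception shown above and skip the blank tile. The exception path below is taken from the traceback; image_paths and normalizer are hypothetical and assumed set up as in the quickstart.

import staintools
from staintools.miscellaneous.exceptions import TissueMaskException

normalized = []
for path in image_paths:  # image_paths: hypothetical list of tile files
    image = staintools.read_image(path)
    try:
        normalized.append(normalizer.transform(image))
    except TissueMaskException:
        continue  # blank tissue: skip this image, keep processing the rest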

spams??

home/liuzhisheng/Stain_Normalization/stain_utils.py in ()
11 import numpy as np
12 import cv2 as cv
---> 13 import spams
14 import matplotlib.pyplot as plt
15
ImportError: No module named spams

How do I install spams? When I run pip install spams I get: No matching distribution found for spams.
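The README above recommends conda instead of pip; one route that has worked for others (the exact channel and package name here are an assumption) is:

$ conda install -c conda-forge python-spams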

Floating point exception (core dumped)

Hello @Peter554,
I get a Floating point exception (core dumped) when using transform(img) from staintools.StainNormalizer(method='vahadane'), with stain_normalizer.fit(i1_standard) applied beforehand, i1_standard being the same image as in the example.
Here is a sample code to reproduce the error:

import staintools
import numpy as np

i1 = staintools.read_image("i1.png")
img = np.load("out.npy")

stain_standardizer = staintools.BrightnessStandardizer()
stain_normalizer = staintools.StainNormalizer(method='vahadane')

i1_standard = stain_standardizer.transform(i1)
stain_normalizer.fit(i1_standard)

print("img shape", img.shape)
print("img dtype", img.dtype)
img = stain_standardizer.transform(img)
img = stain_normalizer.transform(img)

Output:

img shape (128, 128, 3)
img dtype uint8
Floating point exception (core dumped)

Any idea why this happens? I've put the npy array file here.
Thanks a lot for this amazing tool!

Pip install broken

  Collecting staintools
  Downloading https://files.pythonhosted.org/packages/02/09/e6facc38145dc02d521c5233165d65768369b9326a9dde7f032a353f8475/staintools-2.1.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-5kh4awl3/staintools/setup.py", line 4, in <module>
        readme = f.read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3940: ordinal not in range(128)
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-5kh4awl3/staintools/

Abstract class for normalizers?

@Peter554, given that there are other stain normalisation methods that could be added, would it make sense to create an abstract base class? Something like:

from abc import ABCMeta, abstractmethod

class abstract_normalizer(metaclass=ABCMeta):

    @abstractmethod
    def fit(self, target):
        raise NotImplementedError("Should implement fit method")

ModuleNotFoundError after installation

MacOS Mojave 10.14.6
Python 3.7.4

I installed SPAMS via conda. Then, I installed staintools 2.1.2 via pip, which succeeded. Then, I went to import staintools and received a ModuleNotFoundError.

I saw a past issue that was similar. Not sure where to go from here. Any help would be appreciated!

About stain normalization speed.....

Which stain normalization algorithm is the fastest, including methods beyond the ones implemented here?

Using Vahadane's method, it takes 1.4 s for a 512x512 image. However, I need to run stain normalization on many millions of images, and it costs too much time.
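For a concrete comparison within this library, one can time fit plus transform per method; Macenko (an eigendecomposition) is typically much cheaper than Vahadane (dictionary learning via SPAMS). A minimal timing sketch, assuming the quickstart's example image path:

import time
import staintools

image = staintools.read_image("./data/my_target_image.png")
for method in ("macenko", "vahadane"):
    normalizer = staintools.StainNormalizer(method=method)
    t0 = time.time()
    normalizer.fit(image)
    normalizer.transform(image)
    print(method, round(time.time() - t0, 2), "s")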

bad_array_new_length via vahadane/macenko on very large image

I am running on Anaconda (Python 3.5) and getting this error with a large slide image.

terminate called after throwing an instance of 'std::bad_array_new_length'
what(): std::bad_array_new_length
Aborted (core dumped)

Reference Shape
(39060, 17973, 3)

Target Shape
(73838, 27392, 3)

This image is bigger than cv2 can open, so I load it via skimage, which returns the same type of array object that cv2 does; I don't know whether this is unexpected behavior simply because the image is so large.

With Reinhard, I just get memory issues, even on a VM with 240 GB of RAM.
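A possible memory-bounded workaround (a sketch under the assumption that per-tile stain estimates are acceptable): normalize the slide tile by tile, so no single SPAMS or eigh call ever sees the full array. Per-tile estimates can differ slightly, so visible seams are possible, and blank tiles may still need the empty-tissue-mask guard from the issue above.

import numpy as np

def transform_in_tiles(normalizer, image, tile=4096):
    # Walk the slide in tile x tile blocks and normalize each one.
    out = np.empty_like(image)
    for y in range(0, image.shape[0], tile):
        for x in range(0, image.shape[1], tile):
            patch = image[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = normalizer.transform(patch)
    return out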

Question about Normalization for Immunohistochemical Stainings

Hi :)!

I'm trying to use your tool for color normalization of immunohistochemical stains (3 dyes: CD8, PanCk, and a nuclear stain), but I am not getting good results. I was wondering if there is a parameter I need to change, because I have three different dyes and not just the two in H&E.

I've been using the VahadaneStainExtractor with the default settings so far.

Thanks for the help!
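Nothing in this thread confirms a supported option, but the tracebacks in other issues on this page show the extractors are hard-wired to two stains (spams.trainDL(..., K=2) for Vahadane; the top two eigenvectors for Macenko), so a three-dye image likely needs a modified extractor. A rough sketch of a three-stain Vahadane-style dictionary, with a placeholder image and an assumed regularizer:

import numpy as np
import spams

rgb = np.random.randint(1, 255, (64, 64, 3)).astype(np.uint8)  # placeholder image
OD = -np.log(rgb.reshape(-1, 3) / 255.0)  # optical density per pixel
# K=3 mirrors the K=2 trainDL call visible in a traceback elsewhere on this
# page, but for three dyes; lambda1=0.1 is an assumed, untested regularizer.
stains = spams.trainDL(np.asfortranarray(OD.T), K=3, lambda1=0.1, mode=2,
                       modeD=0, posAlpha=True, posD=True, verbose=False).T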

ModuleNotFoundError after installation

Hi Peter,

I got a ModuleNotFoundError when importing staintools.
Here's the traceback:

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
  File "/lib/python3.6/site-packages/staintools/__init__.py", line 1, in <module>
    from .stain_extractors.macenko_stain_extractor import MacenkoStainExtractor
ModuleNotFoundError: No module named 'staintools.stain_extractors'

I installed staintools via pip on Ubuntu 16.04 with Python 3.6, and installed the required libraries as well.
Do you have any idea why this happens and how to make it work properly?

AttributeError: module 'staintools' has no attribute 'BrightnessStandardizer'

I get the following error when I run your quickstart code from the README:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-69abf12806f3> in <module>()
      1 from matplotlib import pyplot as plt
      2 
----> 3 transformed = transform(target,source)
      4 print(type(transformed))
      5 plt.imshow(transformed)

<ipython-input-7-707bae0476cf> in transform(source, target)
     22 
     23     #Standardize brightness (This step is optional but can improve the tissue mask calculation)
---> 24     standardizer = staintools.BrightnessStandardizer()
     25     target = standardizer.transform(target)
     26     to_transform = standardizer.transform(to_transform)

AttributeError: module 'staintools' has no attribute 'BrightnessStandardizer'
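Possibly relevant (an assumption based on this page alone): the quickstart at the top of this README uses LuminosityStandardizer rather than BrightnessStandardizer, which suggests the class was renamed between versions. With a current install the equivalent call would be:

import staintools

# target: an image loaded with staintools.read_image, as in the quickstart
target = staintools.LuminosityStandardizer.standardize(target)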

Error with division of zero in vahadane staining

n=stainNorm_Vahadane.normalizer()
n.fit(i1)
normalized=utils.build_stack((i1,n.transform(i2),n.transform(i3),n.transform(i4),n.transform(i5),n.transform(i6)))
stain_utils.py:143: RuntimeWarning: divide by zero encountered in log
return -1 * np.log(I / 255)
NotImplementedError Traceback (most recent call last)
in ()
1 n=stainNorm_Vahadane.normalizer()
----> 2 n.fit(i1)
3 normalized=utils.build_stack((i1,n.transform(i2),n.transform(i3),n.transform(i4),n.transform(i5),n.transform(i6)))

/home/tuantr/Documents/Thesis/Stain_Normalization/stainNorm_Vahadane.pyc in fit(self, target)
36 def fit(self, target):
37 target = ut.standardize_brightness(target)
---> 38 self.stain_matrix_target = get_stain_matrix(target)
39
40 def target_stains(self):

/home/tuantr/Documents/Thesis/Stain_Normalization/stainNorm_Vahadane.pyc in get_stain_matrix(I, threshold, lamda)
17 OD = ut.RGB_to_OD(I).reshape((-1, 3))
18 OD = OD[mask]
---> 19 dictionary = spams.trainDL(OD.T, K=2, lambda1=lamda, mode=2, modeD=0, posAlpha=True, posD=True, verbose=False).T
20 if dictionary[0, 0] < dictionary[1, 0]:
21 dictionary = dictionary[[1, 0], :]

/home/tuantr/anaconda2/lib/python2.7/site-packages/spams.pyc in trainDL(X, return_model, model, D, numThreads, batchsize, K, lambda1, lambda2, iter, t0, mode, posAlpha, posD, expand, modeD, whiten, clean, verbose, gamma1, gamma2, rho, iter_updateD, stochastic_deprecated, modeParam, batch, log_deprecated, logName)
1978 lambda3 = 0.
1979 regul = None
-> 1980 return __allTrainDL(X,return_model,model,False,D,None,None,numThreads,0.000001,True,False,batchsize,K,lambda1,lambda2,lambda3,iter,t0,mode,regul,posAlpha,posD,expand,modeD,whiten,clean,verbose,gamma1,gamma2,rho,iter_updateD,stochastic_deprecated,modeParam,batch,log_deprecated,logName)
1981
1982 def structTrainDL(

/home/tuantr/anaconda2/lib/python2.7/site-packages/spams.pyc in __allTrainDL(X, return_model, model, in_memory, D, graph, tree, numThreads, tol, fixed_step, ista, batchsize, K, lambda1, lambda2, lambda3, iter, t0, mode, regul, posAlpha, posD, expand, modeD, whiten, clean, verbose, gamma1, gamma2, rho, iter_updateD, stochastic_deprecated, modeParam, batch, log_deprecated, logName)
1826 iter,t0,mode,regul,posAlpha,posD,
1827 expand,modeD,whiten,clean,verbose,gamma1,gamma2,rho,iter_updateD,
-> 1828 stochastic_deprecated,modeParam,batch,log_deprecated,logName)
1829
1830 if return_model:

/home/tuantr/anaconda2/lib/python2.7/site-packages/spams_wrap.pyc in alltrainDL(*args)
262 alltrainDL(Data< float > * X, bool in_memory, bool return_model, Matrix< float > * m_A, Matrix< float > * m_B, int m_iter, Matrix< float > * D1, Vector< float > * eta_g, SpMatrix< bool > * groups, SpMatrix< bool > * groups_var, Vector< int > * own_variables, Vector< int > * N_own_variables, int num_threads, float tol, bool fixed_step, bool ista, int batch_size, int K, double lambda1, double lambda2, double lambda3, int iter, double t0, constraint_type mode, char * name_regul, bool posAlpha, bool posD, bool expand, constraint_type_D modeD, bool whiten, bool clean, bool verbose, double gamma1, double gamma2, float rho, int iter_updateD, bool stochastic, int modeParam, bool batch, bool log, char * logName) -> Matrix< float > *
263 """
--> 264 return _spams_wrap.alltrainDL(*args)
265
266 def archetypalAnalysis(*args):

NotImplementedError: Wrong number or type of arguments for overloaded function 'alltrainDL'.
Possible C/C++ prototypes are:
_alltrainDL< double >(Data< double > *,bool,Matrix< double > **,Matrix< double > **,Vector< int > **,bool,Matrix< double > *,Matrix< double > *,int,Matrix< double > *,Vector< double > *,SpMatrix< bool > *,SpMatrix< bool > *,Vector< int > *,Vector< int > *,int,double,bool,bool,int,int,double,double,double,int,double,constraint_type,char *,bool,bool,bool,constraint_type_D,bool,bool,bool,double,double,double,int,bool,int,bool,bool,char *)
_alltrainDL< float >(Data< float > *,bool,Matrix< float > **,Matrix< float > **,Vector< int > **,bool,Matrix< float > *,Matrix< float > *,int,Matrix< float > *,Vector< float > *,SpMatrix< bool > *,SpMatrix< bool > *,Vector< int > *,Vector< int > *,int,float,bool,bool,int,int,double,double,double,int,double,constraint_type,char *,bool,bool,bool,constraint_type_D,bool,bool,bool,double,double,float,int,bool,int,bool,bool,char *)
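A frequently reported cause of this exact overload error (an assumption here, not confirmed in the thread) is that SPAMS requires Fortran-ordered float arrays; converting the input explicitly before the trainDL call sometimes resolves it:

import numpy as np

# OD as in get_stain_matrix above; make it Fortran-ordered float64 for SPAMS.
X = np.asfortranarray(OD.T, dtype=np.float64)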

MultiProcessing

Hello,

I want to use multiple CPUs to accelerate the function.
When I use your class with multiprocessing, the kernel just freezes and does nothing.

Can you help with this?

Thanks,
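No fix is confirmed in this thread, but a common pattern that avoids sharing native-library state across processes is to build one normalizer per worker via a pool initializer. A minimal sketch, with hypothetical file paths:

import multiprocessing as mp
import staintools

_normalizer = None  # one instance per worker process

def _init(target_path):
    # Runs once in each worker: fit a fresh normalizer there, rather than
    # pickling or forking one across processes.
    global _normalizer
    target = staintools.read_image(target_path)
    _normalizer = staintools.StainNormalizer(method='vahadane')
    _normalizer.fit(target)

def _work(path):
    image = staintools.read_image(path)
    return _normalizer.transform(image)

if __name__ == '__main__':
    paths = ["a.png", "b.png"]  # hypothetical image files
    with mp.Pool(4, initializer=_init, initargs=("target.png",)) as pool:
        results = pool.map(_work, paths)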

Python 2 compatibility

Does anybody want to do some compatibility tests for Python 2? Presently it has only really been checked on Python 3 (3.5). I've added from __future__ import division to most of the files, but this may not be enough.

Make `pip install` great again (2) pip install documentation?

  1. Here's a suggestion: add the following instructions to the README:

    Installing spams from pip:

    $ pip install numpy  # Must be done before installing spams
    $ pip install spams --index-url https://test.pypi.org/simple
  2. Another option for pip users would be adding spams to the install_requires argument of setup() in setup.py, as follows:

    install_requires=[
        'numpy',
        'opencv-python',
        'matplotlib',
        'future',
        'spams',
        ], ...

    One caveat of this option is that installation wouldn't be as simple as
    pip install staintools, because the spams-related issues also apply here:

    $ pip install numpy  # Must be done before because of `spams` dependency
    $ pip install staintools --extra-index-url https://test.pypi.org/simple  # also `spams` related

Negative optical density issue

It seems that on certain images a negative optical density slips through. I ran the documentation code on some proprietary breast cancer H&E stains using Python 3.6 and hit the error multiple times. I tracked it to the remove_zeros function in misc_utils.py, where the check for zeros is

mask = (I == 0)

But NumPy's log function occasionally returns -0.0. I used a hacky fix in convert_RGB_to_OD(I) to prevent the error for now:

return np.maximum(-1 * np.log(I / 255), np.zeros(I.shape) + 0.1)

This seems to be working for the time being, but there may be a better way to patch it up.
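A narrower alternative sketch (an assumption, not a tested patch): clamp only the sign noise, mapping -0.0 and tiny float-error negatives to exactly 0.0, rather than flooring every optical density at 0.1:

import numpy as np

def convert_RGB_to_OD(I):
    I = np.maximum(I, 1)  # guard against log(0) on pure-black pixels
    return np.maximum(-1 * np.log(I / 255), 0.0)  # clip -0.0 and float noise to 0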

GPU accelerate?

Hello! I'm using your code to normalize over 1.6 million patches and find it really time consuming. Do you have any plans to accelerate it using GPUs?

A bug??

LinAlgError Traceback (most recent call last)
in ()
1 n=stainNorm_Macenko.normalizer()
----> 2 n.fit(i1)
3 normalized=utils.build_stack((i1,n.transform(i2),n.transform(i3),n.transform(i4),n.transform(i5),n.transform(i6)))

/home/liuzhisheng/stain_Normalization/stainNorm_Macenko.pyc in fit(self, target)
55 def fit(self, target):
56 target = ut.standardize_brightness(target)
---> 57 self.stain_matrix_target = get_stain_matrix(target)
58 self.target_concentrations = ut.get_concentrations(target, self.stain_matrix_target)
59

/home/liuzhisheng/stain_Normalization/stainNorm_Macenko.pyc in get_stain_matrix(I, beta, alpha)
25 OD = ut.RGB_to_OD(I).reshape((-1, 3))
26 OD = (OD[(OD > beta).any(axis=1), :])
---> 27 _, V = np.linalg.eigh(np.cov(OD, rowvar=False))
28 V = V[:, [2, 1]]
29 if V[0, 0] < 0: V[:, 0] *= -1

/home/liuzhisheng/anaconda2/lib/python2.7/site-packages/numpy/linalg/linalg.pyc in eigh(a, UPLO)
1274
1275 signature = 'D->dD' if isComplexType(t) else 'd->dd'
-> 1276 w, vt = gufunc(a, signature=signature, extobj=extobj)
1277 w = w.astype(_realType(result_t), copy=False)
1278 vt = vt.astype(result_t, copy=False)

/home/liuzhisheng/anaconda2/lib/python2.7/site-packages/numpy/linalg/linalg.pyc in _raise_linalgerror_eigenvalues_nonconvergence(err, flag)
94
95 def _raise_linalgerror_eigenvalues_nonconvergence(err, flag):
---> 96 raise LinAlgError("Eigenvalues did not converge")
97
98 def _raise_linalgerror_svd_nonconvergence(err, flag):

LinAlgError: Eigenvalues did not converge
