
elektronn / elektronn3


A PyTorch-based library for working with 3D and 2D convolutional neural networks, with focus on semantic segmentation of volumetric biomedical image data

License: MIT License

Python 99.84% Shell 0.16%
3d-cnn 3d-convolutional-network biomedical-image-processing convolutional-neural-networks electron-microscopy pytorch semantic-segmentation

elektronn3's Introduction

About ELEKTRONN

ELEKTRONN is a highly configurable toolkit for training 3D/2D CNNs and general Neural Networks.

It is written in Python 2 and based on Theano, which allows CUDA-enabled GPUs to significantly accelerate the pipeline.

The package includes a sophisticated training pipeline designed for classification/localisation tasks on 3D/2D images. Additionally, the toolkit offers training routines for tasks on non-image data.

ELEKTRONN 1.0 and 2.0 were originally created by Marius Killinger and Gregor Urban, with contributions by Sven Dorkenwald, under the supervision of Joergen Kornfeld at the Max Planck Institute for Medical Research in Heidelberg to solve connectomics tasks.

Important Note

ELEKTRONN 1.0 and 2.0 have been superseded by the more flexible, PyTorch-based elektronn3 library. elektronn3 is actively developed and supported, so we encourage you to use it instead of ELEKTRONN 1.0/2.0.

[Image: logo and example predictions]

Membrane and mitochondria probability maps. Predicted with a CNN with recursive training. Data: zebra finch area X dataset j0126 by Jörgen Kornfeld.

Learn More:

Website

Installation instructions

Documentation

Source code

Toy Example

$ elektronn-train MNIST_CNN_warp_config.py

This will download the MNIST data set and run a training defined in an example config file. The plots are saved to ~/CNN_Training/2D/MNIST_example_warp.

File structure

ELEKTRONN
├── doc                     # Documentation source files
├── elektronn
│   ├── examples            # Example scripts and config files
│   ├── net                 #  Neural network library code
│   ├── scripts             #  Training script and profiling script
│   ├── training            #  Training library code
│   └── ...
├── LICENSE.rst
├── README.rst
└── ...

elektronn3's People

Contributors

anushka243, caconi, jmrk84, mdraw, my-tien, optiligence, pschubert, rasaxena, riegerfr


elektronn3's Issues

Offline validation/evaluation of trained models

Validation, preview predictions etc. are currently tied to the Trainer class and are only run periodically during training. This code should be made re-usable for offline (out of the training loop) evaluation of models. It needs to be easy to compare different model snapshots on a user-defined validation data set (calculating metrics and optionally visualizing inference results).
This should also be shown in an example script.
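
A rough sketch of what such a re-usable evaluation helper could look like (hypothetical names, not existing elektronn3 API):

import torch
from torch.utils.data import DataLoader

def evaluate(model_path, dataset, device='cuda'):
    """Compute pixel accuracy of a saved model snapshot on a validation set."""
    model = torch.load(model_path, map_location=device)
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inp, target in DataLoader(dataset, batch_size=1):
            pred = model(inp.to(device)).argmax(dim=1).cpu()
            correct += (pred == target).sum().item()
            total += target.numel()
    return correct / total  # extend with IoU and other metrics as needed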

How to train the model with custom datasets

I wish to use the ADNI 3D MRI dataset. I looked at the training script, but I don't know how to modify it for custom datasets, especially one like ADNI.

I just wish to normally create a model object and then train the model with the 3D images I have inside a directory. I just need to visualise the feature maps at each unit.
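
A minimal sketch of one way to do this with a plain PyTorch Dataset (the file layout and the VolumeDirDataset name are assumptions for illustration, not elektronn3 API):

import glob

import numpy as np
import torch
from torch.utils.data import Dataset

class VolumeDirDataset(Dataset):
    """Loads (volume, label) pairs stored as <name>_img.npy / <name>_label.npy."""
    def __init__(self, directory):
        self.paths = sorted(glob.glob(f'{directory}/*_img.npy'))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = np.load(self.paths[i])                           # (C, D, H, W)
        lab = np.load(self.paths[i].replace('_img', '_label'))
        return (torch.as_tensor(img, dtype=torch.float32),
                torch.as_tensor(lab, dtype=torch.int64))

An instance of this can then be used like any other torch.utils.data.Dataset; visualizing feature maps is a separate concern (e.g. via forward hooks).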

Import error when running in SyConn environment

When running this SyConn command:

python SyConn/examples/semseg_spine.py --kzip=~/1_example.k.zip

the console outputs this error message:

2019-08-15 14:04:40 login2 syconn[4260] INFO EGL rendering enabled.
2019-08-15 14:04:58 login2 syconn[4260] ERROR elektronn3 could not be imported (cannot import name 'DataLoaderIter'). Please see 'https://github.com/ELEKTRONN/elektronn3' for more information.
Traceback (most recent call last):
  File "/home/anaconda3/envs/pysy/lib/python3.6/site-packages/elektronn3/training/train_utils.py", line 18, in <module>
    from torch.utils.data.dataloader import _DataLoaderIter as DataLoaderIter
ImportError: cannot import name '_DataLoaderIter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/SyConn/syconn/handler/prediction.py", line 1021, in get_semseg_spiness_model
    from elektronn3.models.base import InferenceModel
  File "/home/anaconda3/envs/pysy/lib/python3.6/site-packages/elektronn3/models/base.py", line 13, in <module>
    from elektronn3.training.train_utils import pretty_string_time
  File "/home/anaconda3/envs/pysy/lib/python3.6/site-packages/elektronn3/training/__init__.py", line 1, in <module>
    from .trainer import Trainer, Backup
  File "/home/anaconda3/envs/pysy/lib/python3.6/site-packages/elektronn3/training/trainer.py", line 31, in <module>
    from elektronn3.training.train_utils import pretty_string_time
  File "/home/anaconda3/envs/pysy/lib/python3.6/site-packages/elektronn3/training/train_utils.py", line 20, in <module>
    from torch.utils.data.dataloader import DataLoaderIter
ImportError: cannot import name 'DataLoaderIter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "SyConn/examples/semseg_spine.py", line 31, in <module>
    m = get_semseg_spiness_model()
  File "/home/SyConn/syconn/handler/prediction.py", line 1026, in get_semseg_spiness_model
    raise ImportError(msg)
ImportError: elektronn3 could not be imported (cannot import name 'DataLoaderIter'). Please see 'https://github.com/ELEKTRONN/elektronn3' for more information.

Your advice on how to fix it would be appreciated... It seems related to the torch version -- please see attached for a list of packages installed (conda list).

Thanks!
packages_pysy.txt

Input normalization

I wonder why input normalization has so much impact on the training of UNets when the first batch normalization is actually performed very early in the network (and should play a similar role).

I'm mostly asking because making input normalization dispensable could avoid the downsides of each normalization strategy, especially for fluorescence microscopy, where absolute intensity often carries important information.

Sparse annotations for training

Is there currently a way to use sparse annotations for training, i.e. for pixel classification by ignoring all the pixels set to 0 in the target image during loss computation and only considering pixels set to a value >0? If that is not possible, what would be the minimal modification to the code to achieve this (at least for some, or ideally for all existing losses)?
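
In plain PyTorch this is largely covered by ignore_index; for losses without such support, masking the unreduced loss works. A sketch, assuming label 0 marks unannotated pixels:

import torch.nn as nn

# Built-in: pixels with target value 0 contribute neither loss nor gradient.
criterion = nn.CrossEntropyLoss(ignore_index=0)

# Generic variant for losses without ignore_index support:
unreduced = nn.CrossEntropyLoss(reduction='none')

def masked_loss(output, target):
    loss_map = unreduced(output, target)  # per-pixel loss
    mask = (target > 0).float()           # 1 = annotated, 0 = ignore
    return (loss_map * mask).sum() / mask.sum().clamp(min=1)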

Early stopping and dynamic learning rate reduction

  • Support ReduceLROnPlateau in StoppableTrainer
  • Implement an early stopping criterion in addition to maximum time and maximum number of iterations in StoppableTrainer. This could work similarly to ReduceLROnPlateau (except that instead of reducing the learning rate, training is terminated when no improvement is observed for a long time). A rough sketch of both follows below.
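
For example (assuming an existing optimizer; train_one_epoch, validate, max_epochs and the patience values are placeholders):

from torch.optim.lr_scheduler import ReduceLROnPlateau

scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)

best_loss, stale = float('inf'), 0
for epoch in range(max_epochs):
    train_one_epoch()
    val_loss = validate()
    scheduler.step(val_loss)  # reduces the learning rate when val_loss plateaus
    if val_loss < best_loss:
        best_loss, stale = val_loss, 0
    else:
        stale += 1
    if stale >= 30:           # early stopping: no improvement for too long
        break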

Elastic deformations

Elastic deformations, as described in the U-Net paper, are a much-needed addition to the coordinate-based augmentation methods in elektronn3.
This screenshot from the video at https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/ shows the general idea:
[Screenshot: elastic deformation of an input image]

A 2D prototype demo for this augmentation method can be found at https://gist.github.com/mdraw/95e7e204e17eaa08c07735a008318acf (needs to be run locally with jupyter notebook to work).
We need to add 3D support to this and integrate it into the source coordinate warping pipeline.
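
For orientation, the core of the method is small; a 2D numpy/scipy sketch (3D support means adding a third displacement field and coordinate grid):

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform_2d(img, alpha=30.0, sigma=4.0, rng=np.random):
    """Deform img with a Gaussian-smoothed random displacement field."""
    dy = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]),
                       indexing='ij')
    return map_coordinates(img, [y + dy, x + dx], order=1, mode='reflect')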

Refactor augmentations

This is a meta-issue that summarizes the planned data loading/augmentation (elektronn3.data) refactoring.

  • Image augmentation methods should live in their own module (maybe elektronn3.data.augmentations) and should be less entangled with data reading code.
  • We should cleanly separate a) geometric augmentations that work on source coordinates of inputs and targets (i.e. perspective/affine transforms and the upcoming elastic deformations) from b) "non-geometric" augmentations that work on the images themselves (histogram augmentations, random erasing etc.). a) and b) should be in different submodules.
  • We need a good, user-friendly interface for enabling/configuring augmentation methods in PatchCreator.
  • All augmentations should work with 3D and 2D images transparently.
  • Where possible, augmentation methods should be easily usable from external code (without PatchCreator and other elektronn3-specifics).
  • The coordinate based warping code (currently in elektronn3.data.transformations) needs more documentation, comments and more readable implementations - especially the numba-jitted functions.
  • Ensure that all augmentations that depend on random numbers can be run deterministically by providing a fixed random seed.
  • Tests would be nice

Calculating and visualizing effective (empirical) receptive fields of network models

The receptive field of network layers (and of the whole network) informs us how much spatial context information is available to the network when predicting class probabilities. Since high (and anisotropic) spatial context-awareness is especially important when dealing with large high-res anisotropic 3D images, we should have a tool to calculate and visualize receptive fields, so we can evaluate different network architectures better.

The "theoretical" receptive field that e.g. ELEKTRONN2 calculates automatically (where it is called "fov") has been shown to be misleading:

  1. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks
  2. Object Detectors Emerge in Deep Scene CNNs (section 3.2)
  3. ParseNet: Looking Wider to See Better (section 3.1 and figure 2).

2. and 3. suggest that the effective receptive field can be empirically computed and visualized by feeding crafted inputs into the network and analysing the relationship between input pixels and network activations. The method proposed in 2. looks rather laborious, whereas the approach described in 3. (section 3.1) seems easier to implement.
There is also a project (4.), https://github.com/fornaxai/receptivefield, which aims to calculate effective receptive fields with an even simpler approach (for TensorFlow and Keras models). We can't use it directly inside elektronn3 because it is GPL-licensed, so writing our own implementation for PyTorch is probably the best way to go.
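
A gradient-based variant (closest in spirit to 1.) is simple enough to sketch: backpropagate from a single central output activation and inspect which input voxels receive non-zero gradient. The function name and example input shape below are arbitrary:

import torch

def empirical_rf(model, input_shape=(1, 1, 32, 64, 64)):
    model.eval()
    inp = torch.randn(input_shape, requires_grad=True)
    out = model(inp)
    center = (0, 0) + tuple(s // 2 for s in out.shape[2:])
    out[center].backward()
    influenced = inp.grad[0, 0].abs() > 0
    return influenced.nonzero()  # voxel coordinates that affect the central output

Plotting the gradient magnitude instead of the binary mask would also visualize how influence decays toward the borders of the receptive field.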

UNet Implementation details

I noticed some subtle differences in the implementation of your UNet and the one in this repo: https://github.com/johschmidt42/PyTorch-2D-3D-UNet-Tutorial.

Basically, the batch normalization is performed after the ReLU layers instead of directly after the convolutional layers. Also, the weights are initialized with the Xavier uniform method instead of being set to constant values.

For my application this leads to better accuracy and speeds up training by quite a margin.
Have you tested both approaches, and do you have a strong opinion on this?
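
For concreteness, the two orderings and the mentioned initialization as a plain PyTorch sketch (channel counts are arbitrary):

import torch.nn as nn

conv_bn_relu = nn.Sequential(        # BN directly after the convolution
    nn.Conv3d(32, 32, 3, padding=1), nn.BatchNorm3d(32), nn.ReLU())
conv_relu_bn = nn.Sequential(        # BN after the ReLU, as in the linked repo
    nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(), nn.BatchNorm3d(32))

def init_xavier(m):
    if isinstance(m, (nn.Conv2d, nn.Conv3d)):
        nn.init.xavier_uniform_(m.weight)

conv_relu_bn.apply(init_xavier)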

Related to this, I'm wondering if you would consider including some new UNet variants, such as pre-trained encoders or hybrid networks with transformer blocks.

If not, is there a repository that you would particularly recommend for importing PyTorch models for bioimage analysis applications (3D fluorescence LM and 3D EM)?

Modularize and clean up StoppableTrainer

  • Split StoppableTrainer.train() into multiple functions.
  • Progress reporting to terminal and logging to TensorBoard should be less entangled with the actual training
  • The current training loop implementation is memory-inefficient: some tensors are kept alive for too long for logging purposes. They may take away valuable GPU memory. To prevent OOM crashes during validation and preview predictions, it should be rewritten to free resources ASAP when they are no longer needed.

Docstrings and type information

We need docstrings, especially for the two core classes PatchCreator and StoppableTrainer.

While we used NumPy-style docstrings in ELEKTRONN2, I decided to write new docstrings in Google style because they are less cumbersome to write, require significantly fewer lines and are more commonly used (most importantly: PyTorch uses Google style).
IMO they are also more suitable for type-hint-free docstrings. Type hints should not be included in docstrings anymore; they instead belong in the function signatures in a PEP-484-compliant way, so we can eventually use mypy for static type checking.
One example of a class that follows the docstring and type information style that I am proposing here is the UNet class:

class UNet(nn.Module):

Random HDF5 read errors

Once in a while, data loaders (especially the validation data loader) encounter a random read error when slicing from HDF5 files.

The end of the traceback looks like this:

[...]
self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 181, in h5py.h5d.DatasetID.read
File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw
File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
OSError: Can't read data (wrong B-tree signature)

Attempting to read from the same source coordinates again usually works, which is why it's wrapped in a retry-block and doesn't affect training. It's still very annoying to have this issue.
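
The retry idea in spirit (a sketch, not the exact elektronn3 code):

import time

def read_with_retry(h5_dataset, slices, retries=5, delay=0.1):
    for attempt in range(retries):
        try:
            return h5_dataset[slices]
        except OSError:  # e.g. "Can't read data (wrong B-tree signature)"
            if attempt == retries - 1:
                raise
            time.sleep(delay)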

Quoting from 0ed4408:

Since the errors are not deterministic, I guess they are either caused
by a concurrency issue in PyTorch's DataLoader, in HDF5/h5py or maybe it's
even a filesystem issue.
(One of the error messages can be found in the commit message of e1a55ed.)

Add non-geometric augmentation methods

The imgaug library shows many examples of image augmentations. The geometric ones (rotation, scaling, affine etc.) are already implemented in elektronn3 and many of the others are not suitable for biomedical data sets, but some of them, like "ContrastNormalization (per channel)", "Add/Multiply (per channel)" and "SaltAndPepper", could be really helpful for us.
We can't use https://github.com/aleju/imgaug directly because it depends on OpenCV and is not (fully) 3D compatible, but implementing those methods in numpy for 3D/2D ourselves shouldn't be too hard (see the sketches after the list below).

  • Noise, blur and random erasing ("blobs") are already implemented in ELEKTRONN2 and will be ported over to elektronn3.
  • "ElasticTransformation" is already being tracked in #3.
  • Not sure about the broken greyscale augmentation grey_augment(d, channels, rng). It could be fixed by changing how normalization is applied to input data, but maybe it's better to re-implement it in a cleaner way.
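
Two of these as dimension-agnostic numpy sketches, to show that they transfer from 2D to 3D with no extra work (a (C, spatial...) layout is assumed):

import numpy as np

def salt_and_pepper(img, prob=0.01, rng=np.random):
    noisy = img.copy()
    mask = rng.random_sample(img.shape) < prob
    noisy[mask] = rng.choice([img.min(), img.max()], size=int(mask.sum()))
    return noisy

def add_multiply_per_channel(img, add_sigma=0.1, mul_sigma=0.1, rng=np.random):
    shape = (img.shape[0],) + (1,) * (img.ndim - 1)  # broadcast over spatial dims
    return img * rng.normal(1.0, mul_sigma, shape) + rng.normal(0.0, add_sigma, shape)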

train_unet_neurodata.py default seed is a seed of 0, not a random seed

train_unet_neurodata.py has a default of 0 for the --seed option --- meaning, use 0 for the random seed. A seed of 0 is not a seed of None --- it's just like using any other fixed integral seed. This means that, unless you specify a seed, all runs of train_unet_neurodata.py will end up using the same sequence of results from the PRNGs.

Since there is no provision for getting different runs out of the program, I assume this is not what is intended.

One might consider using -1 as the default value for the --seed option, and then using None as a seed if args.seed is -1:

index 0b42797..e9630de 100644
--- a/examples/train_unet_neurodata.py
+++ b/examples/train_unet_neurodata.py
@@ -46,7 +46,8 @@ parser.add_argument(
 "onsave": Use regular Python model for training, but trace it on-demand for saving training state;
 "train": Use traced model for training and serialize it on disk"""
 )
-parser.add_argument('--seed', type=int, default=0, help='Base seed for all RNGs.')
+# Use an illegal seed value to indicate "no seed" (0 is a seed of 0, not random at all)
+parser.add_argument('--seed', type=int, default=-1, help='Base seed for all RNGs.')
 parser.add_argument(
     '--deterministic', action='store_true',
     help='Run in fully deterministic mode (at the cost of execution speed).'
@@ -55,9 +56,14 @@ args = parser.parse_args()
 
 # Set up all RNG seeds, set level of determinism
 random_seed = args.seed
-torch.manual_seed(random_seed)
-np.random.seed(random_seed)
-random.seed(random_seed)
+if random_seed < 0:
+    np.random.seed()
+    random.seed()
+else:
+    torch.manual_seed(random_seed)
+    np.random.seed(random_seed)
+    random.seed(random_seed)
+
 deterministic = args.deterministic
 if deterministic:
     torch.backends.cudnn.deterministic = True

"Invalid" targets with out-of-bounds elements

In some of the batches that are created by PatchCreator, the target tensor contains elements that are not inside the expected value range (which is given by the number of unique classes that exist in the data set).

Quoting a comment from a previous commit message (46d0b2b):

I found that the values of the maximum elements of the
invalid targets are usually quite similar. Here are the last few examples
from the warning message at cnndata:145, collected from a few different
training runs at random steps:

65072
39121
65535
63480
65535 # found directly after the previous value
65509
64205

All of those are below 65536, which is 2**16.
Most are only slightly smaller than 65536.
(Why 2**16? Everything should be 32 bit (float) or 64 bit (int)...)

Such invalid targets are automatically detected by PatchCreator and their batches are discarded as a workaround for this problem, but that's certainly not a good way of dealing with it in the long term. We need to find out what's causing this bug.
We may find the root of the problem somewhere around this code block:

if target_src is not None:
or in the numba-jitted functions that are called from there.
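
The detection itself boils down to a range check along these lines (sketch):

def target_is_valid(target, num_classes):
    # Targets must be integer class indices in [0, num_classes).
    return int(target.min()) >= 0 and int(target.max()) < num_classes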

Missing assignment to mode in train_unet_neurodata.py

Around line 99 of train_unet_neurodata.py, this code appears:

if args.jit == 'onsave':
    # Make sure that tracing works
    tracedmodel = torch.jit.trace(model, example_input.to(device))
elif args.jit == 'train':
    if getattr(model, 'checkpointing', False):
        raise NotImplementedError(
            'Traced models with checkpointing currently don\'t '
            'work, so either run with --disable-trace or disable '
            'checkpointing.')
    tracedmodel = torch.jit.trace(model, example_input.to(device))
    model = tracedmodel

Is a model = tracedmodel assignment missing from the if args.jit == 'onsave' TRUE branch? That is, should the code read:

if args.jit == 'onsave':
    # Make sure that tracing works
    tracedmodel = torch.jit.trace(model, example_input.to(device))
    model = tracedmodel
elif args.jit == 'train':
    if getattr(model, 'checkpointing', False):
        raise NotImplementedError(
            'Traced models with checkpointing currently don\'t '
            'work, so either run with --disable-trace or disable '
            'checkpointing.')
    tracedmodel = torch.jit.trace(model, example_input.to(device))
    model = tracedmodel

Decide what to do with train.py

The file scripts/train.py is currently used as a provisional entry point for quickly testing changes in elektronn3. It supports some configuration via CLI arguments, but there are still many limiting hard-coded values in it, so editing its source code is often necessary. What do we do with it? The following alternatives came to my mind:

  1. Keep it as is, with its small set of options, and declare it as a "template for custom training scripts".
  2. Reduce its options to a bare minimum again and declare it as a minimum training script template. The standard usage scenario of elektronn3 would then be to make a copy of train.py and customize it for your own needs, always running your trainings from your own version of train.py. We could place multiple alternative versions of it in an examples directory, making them take the place of ELEKTRONN2's example configs.
  3. Go the ELEKTRONN{,2} way and create an "official" elektronn3-train entry point that abstracts common training initialization and where all configurable stuff is placed in "training config" .py files. (I don't think this can be implemented in a sensible way.)
  4. Just keep adding more CLI options.

IMO alternatives 3 and 4 would eventually lead to too much bloat and overengineering, because there are just too many possible configurations that would need to be supported. I would go for 2.
Any other suggestions or opinions?

Back up training code to save_path for more reproducible experiments

The code of the training script, of the network model and of elektronn3 itself that was used for running a training experiment should be archived to the save_path.
For reference, here is the equivalent feature implemented in ELEKTRONN2: https://github.com/ELEKTRONN/ELEKTRONN2/blob/27ed6c9a07cdd65c5789697013d568060a392514/elektronn2/training/trainutils.py#L651. Note that we can't implement this in the same way, because the training script has to be included here: the backup routine has to be called from within the training script so it can include itself in addition to the code it uses.

An open question is what exactly the back-up archive should contain. It's clear that the training script, the model code and the current elektronn3 source code (in case of user modification) should be in it, but what if the training script imports code from outside of elektronn3?
It's probably also a good idea to include a summary of the system configuration (installed package versions, host name, name of the GPU that was used etc.).
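
A sketch of the self-inclusion idea (the code_backup layout is an assumption):

import os
import shutil
import sys

import elektronn3

def backup_code(save_path):
    dest = os.path.join(save_path, 'code_backup')
    os.makedirs(dest, exist_ok=True)
    # The running training script archives itself...
    shutil.copy2(os.path.abspath(sys.argv[0]), dest)
    # ...plus the elektronn3 sources it currently imports.
    shutil.copytree(os.path.dirname(elektronn3.__file__),
                    os.path.join(dest, 'elektronn3'))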

Error with 3D convolution on custom Kernels

I wanted to add my own 3D kernel on every layer. I tried using this,

def conv3(in_channels, out_channels, kernel_size=3, stride=1,
          padding=1, bias=True, planar=False, dim=3):
    """Returns an appropriate spatial convolution layer, depending on args.
    - dim=2: Conv2d with 3x3 kernel
    - dim=3 and planar=False: Conv3d with 3x3x3 kernel
    - dim=3 and planar=True: Conv3d with 1x3x3 kernel
    """
    if planar:
        stride = planar_kernel(stride)
        padding = planar_pad(padding)
        kernel_size = planar_kernel(kernel_size)

    weights = torch.tensor([[[4., 1., 4.],
                             [1., 1., 1.],
                             [4., 1., 4.]],
                            [[1., 1., 1.],
                             [1., 10., 1.],
                             [1., 1., 1.]],
                            [[4., 1., 4.],
                             [1., 1., 1.],
                             [4., 1., 4.]]])
    weightsu = weights.view(3, 3, 3).repeat(1, 1, 1, 1, 1)

    kernel = get_conv(dim)(
        in_channels,
        out_channels,
        kernel_size=kernel_size,
        stride=stride,
        padding=padding,
        bias=bias,
    )
    print(kernel.weight.shape)
    with torch.no_grad():
        kernel.weight = nn.Parameter(weightsu)

    return kernel

But I got an error like this,

Testing 3D U-Net with n_blocks = 1, planar_blocks = ()...
torch.Size([32, 1, 3, 3, 3])
torch.Size([32, 32, 3, 3, 3])

RuntimeError: Given weight of size [1, 1, 3, 3, 3], expected bias to be 1-dimensional with 1 elements, but got bias of size [32] instead

I kindly request you to help me fix this!

@Optiligence
@my-tien
@xeray
@mdraw
@jmrk84
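
For what it's worth, the traceback indicates that the assigned weight tensor has shape [1, 1, 3, 3, 3] (the result of repeat(1, 1, 1, 1, 1) on a 3x3x3 tensor), while Conv3d expects a weight of shape (out_channels, in_channels, kD, kH, kW) to match the 32-element bias. A hedged guess at a fix, reusing the function's own arguments:

# Expand the 3x3x3 template to the full conv weight shape before assigning it:
weightsu = weights.expand(out_channels, in_channels, 3, 3, 3).clone()
with torch.no_grad():
    kernel.weight = nn.Parameter(weightsu)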

Inference example script

We should provide an example script that showcases inference with a trained model loaded from disk.
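
Until such a script exists, the core would look roughly like this (the model path and the input tensor inp are placeholders):

import torch

model = torch.load('path/to/model.pt', map_location='cuda')
model.eval()
with torch.no_grad():
    out = model(inp.cuda())          # inp: (N, C, D, H, W) float tensor
    pred = out.argmax(dim=1).cpu()   # hard class predictions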

Reproducibility

  • All PyTorch RNGs should be seeded with a configurable seed in example scripts.
  • NumPy RNGs in worker processes should use deterministic seeds that can be controlled from training scripts. Worker seeds need to be based on the PyTorch seed, but we have to manually make sure they are different for each forked process.

NumPy RNG reproducibility has high priority because the order of training samples for network training can matter, especially if we want to do clean hyperparameter optimization.
We probably don't want to aim for "maximum reproducibility" (torch.backends.cudnn.deterministic etc.). If someone really wants this, they can set it themselves in custom training scripts.
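
A common pattern for deriving distinct yet reproducible per-worker NumPy seeds from the PyTorch base seed (a sketch, not current elektronn3 behavior):

import numpy as np
import torch
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # torch.initial_seed() inside a worker equals base_seed + worker_id,
    # so every forked process gets a different but reproducible NumPy seed.
    np.random.seed(torch.initial_seed() % 2**32)

# loader = DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)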

Support 2D data sets

Both the data loading (PatchCreator etc.) and the training components (StoppableTrainer) currently only support 3D (volumetric) image data sets. Support for 2D images would be nice, as long as it doesn't interfere with 3D support. In many parts of the code, 2D support could be transparently added as a special case of 3D without code duplication.

OOM when using the current PyTorch git master

7b33ef4 works fine, but with newer revisions (I don't know since when exactly), training any network with elektronn3 leads to growing memory consumption with every training iteration until the GPU is out of memory. Maybe some operation inside of StoppableTrainer.train() is now accidentally accumulating gradients? I couldn't yet produce a minimal piece of training code that doesn't slowly eat up all memory.
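
If the cause does turn out to be logging-related tensor retention, the usual fix pattern applies (placeholder names):

running_loss = 0.0
for inp, target in loader:
    optimizer.zero_grad()
    loss = criterion(model(inp), target)
    loss.backward()
    optimizer.step()
    # .item() converts to a plain float; storing the `loss` tensor itself
    # would keep its whole autograd graph (and GPU memory) alive.
    running_loss += loss.item()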

Large-scale prediction using trained models

Currently, only training scenarios are handled by elektronn3. Tools for large-scale model deployment on big data volumes will be needed eventually. This needs some discussion and careful planning first, because there are many open questions in this regard, e.g. whether we should start working on multi-node distributed prediction in PyTorch or instead use another framework such as Caffe2 for this via ONNX export.
