
pytorch-toolbelt's Introduction

Important Update

[image: Ukrainian flag]

On February 24th, 2022, Russia declared war and invaded peaceful Ukraine. After the annexation of Crimea and the occupation of the Donbas region, Putin's regime decided to destroy Ukrainian nationality. Ukrainians show fierce resistance and demonstrate to the entire world what it's like to fight for the nation's independence.

Ukraine's government launched a website to help russian mothers, wives & sisters find their beloved ones killed or captured in Ukraine - https://200rf.com & https://t.me/rf200_now (Telegram channel). Our goal is to inform those still in Russia & Belarus, so they refuse to assault Ukraine.

Help us get maximum exposure to what is happening in Ukraine, violence, and inhuman acts of terror that the "Russian World" has brought to Ukraine. This is a comprehensive Wiki on how you can help end this war: https://how-to-help-ukraine-now.super.site/

Official channels

Glory to Ukraine!

Pytorch-toolbelt

Pytorch-toolbelt is a Python library with a set of bells and whistles for PyTorch, aimed at fast R&D prototyping and Kaggle farming.

What's inside

  • Easy model building using flexible encoder-decoder architecture.
  • Modules: CoordConv, SCSE, Hypercolumn, Depthwise separable convolution and more.
  • GPU-friendly test-time augmentation (TTA) for segmentation and classification.
  • GPU-friendly inference on huge (5000x5000) images.
  • Everyday common routines (fix/restore random seed, filesystem utils, metrics).
  • Losses: BinaryFocalLoss, Focal, ReducedFocal, Lovasz, Jaccard and Dice losses, Wing Loss and more.
  • Extras for the Catalyst library (visualization of batch predictions, additional metrics).

Showcase: Catalyst, Albumentations, Pytorch Toolbelt example: Semantic Segmentation @ CamVid

Why

The honest answer is "I needed a convenient way to re-use code for my Kaggle career". During 2018 I earned a Kaggle Master badge, and it has been a long path. Very often I found myself re-using most of the old pipelines over and over again. At some point it crystallized into this repository.

This library is not meant to replace high-level frameworks like catalyst / ignite / fast.ai. Instead, it's designed to complement them.

Installation

pip install pytorch_toolbelt

How do I ...

Model creation

Create Encoder-Decoder U-Net model

Below is a code snippet that creates a vanilla U-Net model for binary segmentation. By design, both the encoder and the decoder produce a list of tensors, from fine (high-resolution, indexed 0) to coarse (low-resolution) feature maps. Access to all intermediate feature maps is beneficial if you want to apply deep supervision losses on them, or to build an encoder-decoder for an object detection task, where access to intermediate feature maps is necessary.

from torch import nn
from pytorch_toolbelt.modules import encoders as E
from pytorch_toolbelt.modules import decoders as D

class UNet(nn.Module):
    def __init__(self, input_channels, num_classes):
        super().__init__()
        self.encoder = E.UnetEncoder(in_channels=input_channels, out_channels=32, growth_factor=2)
        self.decoder = D.UNetDecoder(self.encoder.channels, decoder_features=32)
        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return self.logits(x[0])
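Because the decoder returns every intermediate feature map, deep supervision is easy to bolt on. Below is a minimal sketch of that idea; the UNetWithDeepSupervision class and its auxiliary heads are illustrative, not part of the library:

from torch import nn

class UNetWithDeepSupervision(UNet):
    def __init__(self, input_channels, num_classes):
        super().__init__(input_channels, num_classes)
        # Hypothetical: one 1x1 auxiliary head per decoder feature map
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, num_classes, kernel_size=1)
            for channels in self.decoder.channels
        )

    def forward(self, x):
        feature_maps = self.decoder(self.encoder(x))
        # Return logits at every scale; a training loop can upsample each
        # to label resolution and sum the per-scale losses.
        return [head(fm) for head, fm in zip(self.heads, feature_maps)]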

Create Encoder-Decoder FPN model with pretrained encoder

Similarly to the previous example, you can change the decoder to an FPN with concatenation.

from torch import nn
from pytorch_toolbelt.modules import encoders as E
from pytorch_toolbelt.modules import decoders as D

class SEResNeXt50FPN(nn.Module):
    def __init__(self, num_classes, fpn_channels):
        super().__init__()
        self.encoder = E.SEResNeXt50Encoder()
        self.decoder = D.FPNCatDecoder(self.encoder.channels, fpn_channels)
        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return self.logits(x[0])

Change number of input channels for the Encoder

All encoders from pytorch_toolbelt support changing the number of input channels. Simply call encoder.change_input_channels(num_channels) and the first convolutional layer will be changed. Whenever possible, the existing weights of the convolutional layer will be re-used (if the new number of channels is greater than the default, the new weight tensor will be padded with randomly-initialized weights). The method returns self, so the call can be chained.

from pytorch_toolbelt.modules import encoders as E

encoder = E.SEResnet101Encoder()
encoder = encoder.change_input_channels(6)
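Since change_input_channels returns self, the two calls above can also be chained into a single line:

encoder = E.SEResnet101Encoder().change_input_channels(6)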

Misc

Count number of parameters in encoder/decoder and other modules

When designing a model and optimizing the number of features in a neural network, I found it quite useful to print the number of parameters in high-level blocks (like the encoder and decoder). Here is how to do it with pytorch_toolbelt:

from torch import nn
from pytorch_toolbelt.modules import encoders as E
from pytorch_toolbelt.modules import decoders as D
from pytorch_toolbelt.utils import count_parameters

class SEResNeXt50FPN(nn.Module):
    def __init__(self, num_classes, fpn_channels):
        super().__init__()
        self.encoder = E.SEResNeXt50Encoder()
        self.decoder = D.FPNCatDecoder(self.encoder.channels, fpn_channels)
        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return self.logits(x[0])

net = SEResNeXt50FPN(1, 128)
print(count_parameters(net))
# Prints {'total': 34232561, 'trainable': 34232561, 'encoder': 25510896, 'decoder': 8721536, 'logits': 129}

Compose multiple losses

There are multiple ways to combine losses, and high-level DL frameworks like Catalyst offer far more flexible ways to achieve this, but here's my 100%-pure PyTorch implementation:

from pytorch_toolbelt import losses as L

# Creates a loss function that is a weighted sum of focal loss
# and lovasz loss with weights 1.0 and 0.5 respectively.
loss = L.JointLoss(L.FocalLoss(), L.LovaszLoss(), 1.0, 0.5)

TTA / Inferencing

Apply Test-time augmentation (TTA) for the model

Test-time augmentation (TTA) can be used in both training and testing phases.

from pytorch_toolbelt.inference import tta

model = UNet(input_channels=3, num_classes=1)  # the UNet defined above

# Truly functional TTA for image classification using horizontal flips:
logits = tta.fliplr_image2label(model, input)

# Truly functional TTA for image segmentation using D4 augmentation:
logits = tta.d4_image2mask(model, input)

Inference on huge images

Quite often there is a need to perform image segmentation on an enormously big image (5000px and more). There are a few problems with such big pixel arrays:

  1. There are limitations on the maximum size of CUDA tensors (the concrete numbers depend on the driver and GPU version).
  2. Heavy CNN architectures may eat up all available GPU memory with ease when inferencing on relatively small 1024x1024 images, leaving no room for bigger image resolutions.

One of the solutions is to slice the input image into (optionally overlapping) tiles, feed each tile through the model, and merge the results back. This way you can guarantee an upper limit on GPU RAM usage while keeping the ability to process arbitrarily-sized images on the GPU.

import numpy as np
from torch.utils.data import DataLoader
import cv2

from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy


image = cv2.imread('really_huge_image.jpg')
model = get_model(...)

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(512, 512), tile_step=(256, 256))

# HWC -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=8, pin_memory=True):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = model(tiles_batch)

    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)

Advanced examples

  1. Inria Satellite Segmentation
  2. CamVid Semantic Segmentation

Citation

@misc{Khvedchenya_Eugene_2019_PyTorch_Toolbelt,
  author = {Khvedchenya, Eugene},
  title = {PyTorch Toolbelt},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BloodAxe/pytorch-toolbelt}},
  commit = {cc5e9973cdb0dcbf1c6b6e1401bf44b9c69e13f3}
}

pytorch-toolbelt's People

Contributors

andreygurevich, anshulrai, bloodaxe, deepsourcebot, ksenobojca, seefun


pytorch-toolbelt's Issues

Question about tiled inference


Hello, thank you for your excellent work. I understand the advantage of tiled inference, but the way it's used confuses me. For each tile, we multiply the result of the inference by the weight. However, at the final step, we divide by the norm mask (in the merge function). In my opinion, dividing the results by the norm mask seems to produce a result without a weighting mechanism. Could you please explain this further? Maybe we would need a norm_mask holding a different weight from the weight applied to the inference result (for example, a norm_mask counting the number of inferences at each pixel, which differs from pyramid_patch_weight_loss) to normalize the result correctly? Thank you in advance!

To Reproduce

for tile, (x, y, tile_width, tile_height) in zip(batch, crop_coords):
    self.image[:, y : y + tile_height, x : x + tile_width] += tile * self.weight
    self.norm_mask[:, y : y + tile_height, x : x + tile_width] += self.weight

def merge(self) -> torch.Tensor:
    return self.image / self.norm_mask
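For what it's worth, the division by norm_mask does not cancel the weighting: where tiles overlap, merge() returns a weighted average, so a tile with a higher weight at a given pixel dominates the result. A toy single-pixel illustration with hypothetical numbers:

# One pixel covered by two overlapping tiles with different weights
pred1, w1 = 0.9, 0.8  # tile 1 sees the pixel near its center (high weight)
pred2, w2 = 0.1, 0.2  # tile 2 sees the pixel near its border (low weight)

accumulated = pred1 * w1 + pred2 * w2  # what the accumulation loop stores
norm = w1 + w2                         # what norm_mask stores
print(accumulated / norm)              # 0.74 -- pulled toward the high-weight tile
print((pred1 + pred2) / 2)             # 0.50 -- unweighted mean, for contrast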

TypeError: object of type 'int' has no len()

I am unable to create a basic UNet model from the library as given in the README. Here's the code:

from torch import nn
from pytorch_toolbelt.modules import encoders as E
from pytorch_toolbelt.modules import decoders as D

class UNet(nn.Module):
    def __init__(self, input_channels, num_classes):
        super().__init__()
        self.encoder = E.UnetEncoder(in_channels=input_channels, out_channels=32, growth_factor=2)
        self.decoder = D.UNetDecoder(self.encoder.channels, decoder_features=32)
        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return self.logits(x[0])
    
model= UNet(input_channels= 3, num_classes= 1)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-4e8064bebb83> in <module>
     15         return self.logits(x[0])
     16 
---> 17 model= UNet(input_channels= 3, num_classes= 1)

<ipython-input-1-4e8064bebb83> in __init__(self, input_channels, num_classes)
      7         super().__init__()
      8         self.encoder = E.UnetEncoder(in_channels=input_channels, out_channels=32, growth_factor=2)
----> 9         self.decoder = D.UNetDecoder(self.encoder.channels, decoder_features=32)
     10         self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)
     11 

~/anaconda3/envs/dl_gpu/lib/python3.7/site-packages/pytorch_toolbelt/modules/decoders/unet.py in __init__(self, feature_maps, decoder_features, unet_block, upsample_block)
     38             decoder_features = [None] * num_blocks
     39         else:
---> 40             if len(decoder_features) != num_blocks:
     41                 raise ValueError(f"decoder_features must have length of {num_blocks}")
     42         in_channels_for_upsample_block = feature_maps[-1]

TypeError: object of type 'int' has no len()

Update for collections.abc in installation

🐛 Bug

Traceback (most recent call last):
  File "/home/sebasmos/Desktop/TRPD/segmentation_models_test.py", line 1, in <module>
    import segmentation_models_pytorch as smp
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/segmentation_models_pytorch/__init__.py", line 1, in <module>
    from .unet import Unet
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/segmentation_models_pytorch/unet/__init__.py", line 1, in <module>
    from .model import Unet
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/segmentation_models_pytorch/unet/model.py", line 3, in <module>
    from ..encoders import get_encoder
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/segmentation_models_pytorch/encoders/__init__.py", line 14, in <module>
    from .timm_efficientnet import timm_efficientnet_encoders
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/segmentation_models_pytorch/encoders/timm_efficientnet.py", line 4, in <module>
    from timm.models.efficientnet import EfficientNet
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/__init__.py", line 2, in <module>
    from .models import create_model, list_models, is_model, list_modules, model_entrypoint,
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/models/__init__.py", line 1, in <module>
    from .cspnet import *
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/models/cspnet.py", line 20, in <module>
    from .helpers import build_model_with_cfg
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/models/helpers.py", line 17, in <module>
    from .layers import Conv2dSame, Linear
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/models/layers/__init__.py", line 7, in <module>
    from .cond_conv2d import CondConv2d, get_condconv_initializer
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/models/layers/cond_conv2d.py", line 16, in <module>
    from .helpers import to_2tuple
  File "/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/timm/models/layers/helpers.py", line 6, in <module>
    from torch._six import container_abcs
ImportError: cannot import name 'container_abcs' from 'torch._six' (/home/sebasmos/anaconda3/envs/sebasmos/lib/python3.9/site-packages/torch/_six.py)

To Reproduce

Steps to reproduce the behavior:

  1. Cloning as of today (12 dec): pip install -U git+https://github.com/jlcsilva/segmentation_models.pytorch

Solution - how it worked for me based on huggingface/pytorch-image-models@94ca140#diff-c7abf83bc43184f6101237b08d7c489c361f3d57b3538d633f6f01d35254b73c

""" Layer/Module Helpers

Hacked together by / Copyright 2020 Ross Wightman
"""
from itertools import repeat
import collections.abc

def _ntuple(n):
def parse(x):
if isinstance(x, collections.abc.Iterable):
return x
return tuple(repeat(x, n))
return parse

to_1tuple = _ntuple(1)
to_2tuple = _ntuple(2)
to_3tuple = _ntuple(3)
to_4tuple = _ntuple(4)
to_ntuple = _ntuple

Error when calling the merge method?

Thank you for the convenient tools, but I don't quite understand in what form I should pass tiles to the merge method. My tiles look like [[512, 512, 1], [512, 512, 1], ...], and when I call merge on them I get the following error:

image[y:y + tile_height, x:x + tile_width] += tile * w
ValueError: non-broadcastable output operand with shape (512,512,1) doesn't match the broadcast shape (512,512,512)

Could you tell me what might be causing this problem?

10 crop TTA

It would be nice to have an option for 10-crop TTA, which is widely used for classification tasks:

The 5 crops are:

  1. left top
  2. right top
  3. left bottom
  4. right bottom
  5. center

And 5 more with a horizontally flipped image.
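A minimal sketch of what such a helper could look like; tencrop_image2label is a hypothetical name, and the function simply averages logits over the four corner crops, the center crop, and the same five crops of the horizontally flipped image:

import torch

def tencrop_image2label(model, image, crop_size):
    # Hypothetical 10-crop TTA; not part of pytorch_toolbelt
    _, _, h, w = image.size()
    ch, cw = crop_size
    # Top-left corners of the four corner crops and the center crop
    offsets = [
        (0, 0), (0, w - cw), (h - ch, 0), (h - ch, w - cw),
        ((h - ch) // 2, (w - cw) // 2),
    ]
    logits = None
    for img in (image, torch.flip(image, dims=[3])):  # original + horizontal flip
        for y, x in offsets:
            output = model(img[:, :, y : y + ch, x : x + cw])
            logits = output if logits is None else logits + output
    return logits / 10.0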

UnetSegmentationModel dimension won't match

I want to try hrnet34_unet64 for image segmentation using:

encoder = E.HRNetV2Encoder34(pretrained=pretrained, layers=[0, 1, 2, 3, 4])
UnetSegmentationModel(encoder, num_classes=num_classes, unet_channels=[64, 128, 256, 512], dropout=dropout)

And got an error:

RuntimeError: Sizes of tensors must match except in dimension 2. Got 128 and 256 (The offending index is 0)

Could you please let me know what is wrong? Thanks!

integrate_batch throws error: RuntimeError: The size of tensor a (6) must match the size of tensor b (928) ...

Hi, I'm trying to use your tiling tools with my yolov5 model, but in the following line I get the following error:

self.image[:, y : y + tile_height, x : x + tile_width] += tile * self.weight

RuntimeError: The size of tensor a (6) must match the size of tensor b (928) at non-singleton dimension 2

The debugger shows a tile tensor size of (52983,6) and a weight tensor size of (1, 928,928). What could be the reason for the difference in the tensor size?

Some more info:
model size: 928x928
image size: 3840x2160
I am loading the model using DetectMultiBackend from yolov5

Dice Loss/Score question

Hey Eugene,

First of all, thank you for this very useful package. I'm transferring my environment from TF to PyTorch now, and having your advanced losses is very helpful. However, when I trained the same model on the same data using the same loss functions in both frameworks, I noticed that I get very different loss numbers (I'm using a multilabel approach). Digging a little deeper into your code, I noticed that when you calculate the Dice Loss you always calculate the per-sample AND per-channel loss and then average it. I don't understand why you do the per-channel calculation and averaging, and not the Dice loss for all classes together. I can show what I mean on a dummy example below:

Let's prepare 2 dummy multilabel matrices - ground truth (d_gt) and prediction (d_pr) - with 3 classes each: 0 Red, 1 Green, and 2 Blue:
d_gt = np.zeros(shape=(20,20,3))
d_gt[5:10,5:10,0] =1
d_gt[10:15,10:15,1] =1
d_gt[:,:,2] = (1 - d_gt.sum(axis=-1, keepdims=True)).squeeze()
plt.imshow(d_gt)

[image: d_gt plot]

d_pr = np.zeros(shape=(20,20,3))
d_pr[4:9,4:9,0] =1
d_pr[11:14,11:14,1] =1
d_pr[:,:,2] = (1 - d_pr.sum(axis=-1, keepdims=True)).squeeze()
plt.imshow(d_pr)

[image: d_pr plot]

One can see that (using Dice Loss = 1 - Dice Score):

  • Dice Loss for Red is 1 - ((16+16) / (25+25)) = 0.36
  • Dice Loss for Green is 1 - ((9+9) / (9+25)) ≈ 0.4706
  • Dice Loss for Blue is 1 - ((341+341) / (350+366)) ≈ 0.0474

However, the total Dice Loss for the whole picture is 1 - (2*(16+9+341) / (2*400)) = 0.085
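For reference, these numbers can be reproduced with a few lines of NumPy, reusing the d_gt and d_pr arrays defined above:

def dice_loss(gt, pr):
    return 1 - 2 * (gt * pr).sum() / (gt.sum() + pr.sum())

per_channel = [dice_loss(d_gt[..., c], d_pr[..., c]) for c in range(3)]
print(per_channel)               # ~[0.36, 0.4706, 0.0474]
print(np.mean(per_channel))      # ~0.2927, the per-channel average
print(dice_loss(d_gt, d_pr))     # ~0.085, the global loss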

After wrapping them into tensors
d_gt_tensor = torch.from_numpy(np.transpose(d_gt,(2,0,1))).unsqueeze(0)
d_pr_tensor = torch.from_numpy(np.transpose(d_pr,(2,0,1))).unsqueeze(0)
what your Dice Loss (with from_logits=False) returns is 0.2927, which is the averaged loss of the individual channels instead of the total loss. The culprit seems to be passing dims=(0,2) to the soft_dice_score function; I think dims=(1,2) should be passed instead to get individual scores for each item in the batch. Unless this behaviour is intended, but then I'd need some more explanation why.

A second, smaller question regarding your Dice Loss: why do you use from_logits=True by default?

Thanks in advance!

Focal loss error

Multiclass Focal loss returns an error:

    loss = criterion(preds, target)
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/pytorch_toolbelt/losses/joint_loss.py", line 32, in forward
    return self.first(*input) + self.second(*input)
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/pytorch_toolbelt/losses/joint_loss.py", line 18, in forward
    return self.loss(*input) * self.weight
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/pytorch_toolbelt/losses/focal.py", line 89, in forward
    loss += self.focal_loss_fn(cls_label_input, cls_label_target)
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/pytorch_toolbelt/losses/functional.py", line 45, in focal_loss_with_logits
    logpt = F.binary_cross_entropy_with_logits(output, target, reduction="none")
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2580, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([5, 1, 256, 256])) must be the same as input size (torch.Size([5, 256, 256]))
Exception ignored in: <function tqdm.__del__ at 0x7fd03260d400>
Traceback (most recent call last):
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/tqdm/std.py", line 1128, in __del__
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/tqdm/std.py", line 1341, in close
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/tqdm/std.py", line 1520, in display
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/tqdm/std.py", line 1131, in __repr__
  File "/Users/vladbahteev/miniconda3/lib/python3.7/site-packages/tqdm/std.py", line 1481, in format_dict
TypeError: cannot unpack non-iterable NoneType object

I think that line 83 in pytorch_toolbelt/losses/focal.py should be changed from

cls_label_input = label_input[:, cls, ...]

to

cls_label_input = label_input[:, cls, ...].unsqueeze(1)

AttributeError: 'MulticlassDiceMetricCallback' object has no attribute 'order'

  File "/home/vladimir/anaconda3/lib/python3.6/site-packages/catalyst/dl/runner/supervised.py", line 197, in train
    monitoring_params=monitoring_params
  File "/home/vladimir/anaconda3/lib/python3.6/site-packages/catalyst/dl/experiment/base.py", line 40, in __init__
    self._callbacks = process_callback(callbacks)
  File "/home/vladimir/anaconda3/lib/python3.6/site-packages/catalyst/dl/utils/callbacks.py", line 23, in process_callback
    result = sorted(callbacks, key=lambda x: x.order)
  File "/home/vladimir/anaconda3/lib/python3.6/site-packages/catalyst/dl/utils/callbacks.py", line 23, in <lambda>
    result = sorted(callbacks, key=lambda x: x.order)
AttributeError: 'MulticlassDiceMetricCallback' object has no attribute 'order'

How to Implement TTA For binary segmentation

Would anyone be kind enough to share code showing how to use TTA for binary segmentation with this library?
I have my trained model weights but can't figure out how to use Pytorch toolbelt.

Thank you.
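For anyone else stuck here, a minimal sketch under the following assumptions: model is a trained binary-segmentation network returning logits of shape [B, 1, H, W], and image_batch is a float tensor of input images (both are placeholders):

import torch
from pytorch_toolbelt.inference import tta

model.eval()
with torch.no_grad():
    # D4 TTA from the README: averages predictions over flips/rotations
    logits = tta.d4_image2mask(model, image_batch)
    mask = (logits.sigmoid() > 0.5).long()  # threshold to a binary mask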

Conda installations

Hello,

Do you have a conda installation? Somehow my Azure VM with Python 3.5 does not load the pip installation on Jupyter kernels.

Please suggest.
Sayak

Dice loss is smaller when computed on entire batch

🐛 Bug

I noticed that when I compute the dice loss on an entire batch, the loss is smaller than when computing it separately for each sample and then averaging. Is this behavior intended?

Expected behavior

Dice loss on a batch should be equivalent to the average of per-sample dice losses.

Environment

Using loss from segmentation_models_pytorch
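This is expected for a "global" soft dice: the score is a nonlinear ratio of sums, so computing it over a whole batch is generally not the same as averaging per-sample scores. A toy 1-D illustration with hypothetical numbers:

import numpy as np

def dice(gt, pr):
    return 2 * (gt * pr).sum() / (gt.sum() + pr.sum())

gt = [np.array([1, 1, 1, 1]), np.array([1, 0, 0, 0])]
pr = [np.array([1, 1, 1, 1]), np.array([0, 0, 0, 0])]

per_sample = np.mean([dice(g, p) for g, p in zip(gt, pr)])  # (1.0 + 0.0) / 2 = 0.5
whole_batch = dice(np.concatenate(gt), np.concatenate(pr))  # 8 / 9 ~ 0.89
# The batch-level dice score is higher, so the batch-level dice loss is smaller.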

Tiled inference potentially generates wrong multi-class predictions

🐛 Bug

I believe the current implementation of the tiled inference could produce erroneous predictions. If I understand it correctly, in your tiled inference approach you accumulate predictions for each pixel and then divide them by a norm_mask (which is the total number of predictions for each pixel). This works well for a binary case, but not for a multi-class classification. For example, if I have 4 classes to predict and I do tiled inference (e.g. tile_size=128, tile_step=64) using your moving window approach I can end up with a mix of predictions for a pixel (e.g. 1,4,4,4), and the final prediction of this pixel (after applying norm_mask) will be 3. Wouldn't it be more appropriate to take mode of all predictions for this pixel to get the final prediction of 4?

To Reproduce

Steps to reproduce the behavior:

for tile, (x, y, tile_width, tile_height) in zip(batch, crop_coords):
    self.image[:, y : y + tile_height, x : x + tile_width] += tile * self.weight
    self.norm_mask[:, y : y + tile_height, x : x + tile_width] += self.weight

def merge(self) -> torch.Tensor:
    return self.image / self.norm_mask
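One way to avoid mixing class indices, sketched against the README tiling example above (it assumes the merger is allocated with one channel per class; tiler, model and the batch loop are as in that example):

import torch

num_classes = 4
merger = CudaTileMerger(tiler.target_shape, num_classes, tiler.weight)
# ... run merger.integrate_batch(model(tiles_batch), coords_batch) per batch ...

# Accumulate per-class scores (logits or probabilities) in the merger and
# take argmax only once, after merging, instead of merging hard labels:
merged_scores = merger.merge()                     # [num_classes, H, W]
final_labels = torch.argmax(merged_scores, dim=0)  # one label per pixel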

Environment

  • Pytorch-toolbelt version: 0.6.2
  • Pytorch version: 2.0.0
  • Python version: 3.10
  • OS: Windows 11

Getting out of memory by using inference on huge images

I have tried pretty small slices but get CUDA out of memory on ---> 23 pred_batch = best_model(tiles_batch)[:, 0:1, :, :]. As I can see, it proceeded a few steps but then failed. I have a GPU with 8 GB; the model is a U-Net but with heavy encoders. Image shape: (6300, 6304, 3).

import numpy as np
import torch
import cv2
from torch.utils.data import DataLoader  # needed for the loop below
from tqdm import tqdm_notebook
from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy


image = img_to_predict

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(64, 64), tile_step=(64, 64), weight='pyramid')

# HCW -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in tqdm_notebook(DataLoader(list(zip(tiles, tiler.crops)), batch_size=1, pin_memory=True)):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = best_model(tiles_batch)[:, 0:1, :,:] # taking only first channel

    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)

SoftCrossEntropyLoss error

When I use the SoftCrossEntropyLoss, I get the error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Could anyone help me? BTW, what paper proposed the SoftCrossEntropyLoss?

Is dependency on `opencv-python` necessary?

Depending on opencv-python makes it difficult to use the library in a docker environment, since there is typically no GUI. Would it be possible to depend on opencv-python-headless instead?

Thanks.

I faced AttributeError: can't set attribute 'channels'

🐛 Bug

When I used pytorch_toolbelt.modules.decoders.FPNCatDecoder, I got an AttributeError.

I think there is duplicate usage of the "channel" variable in the FPNCatDecoder object, which is causing the error. As a workaround, I renamed the channel variable used in FPNCatDecoder to "channel_o", and it executed without any issues. The likely location of the variable duplication is the channel variable of the DecoderModule.

can't install on windows with pip

  Could not find a version that satisfies the requirement torch>=0.4.1 (from pytorch_toolbelt) (from versions: 0.1.2, 0.1.2.post1)
No matching distribution found for torch>=0.4.1 (from pytorch_toolbelt)
