
bonnetal's Introduction

Bonnetal!


Example semantic segmentation of People vs. Background using one of the included real-time architectures (running at 100 FPS).

By Andres Milioto et al. @ University of Bonn.

In early 2018 we released Bonnet, a real-time, robotics-oriented semantic segmentation framework using Convolutional Neural Networks (CNNs). Bonnet provides an easy pipeline to add architectures and datasets for semantic segmentation, in order to train and deploy CNNs on a robot. It contains a full training pipeline in Python using TensorFlow and OpenCV, as well as C++ apps to deploy a CNN in ROS and standalone. The C++ library is designed so that other backends (such as TensorRT) can be added.

Back then, most of my research was in the field of semantic segmentation, so the framework was tailored specifically to that task. Since then, we have found a way to make things even more awesome, allowing for a suite of other tasks, like classification, detection, instance and semantic segmentation, feature extraction, counting, etc. Hence the name of this new framework, "Bonnetal", reflecting that it is nothing but the old Bonnet, and then some. Hopefully, the explicit et al. will also spawn further collaboration and many pull requests! 😄

We've also switched to PyTorch to allow for easier mixing of backbones, decoders, and heads for different tasks. If you are still comfortable with just semantic segmentation, and/or you're a fan of TensorFlow, you can still find the original Bonnet here. Otherwise, keep on reading, and I'll try to explain why Bonnetal rules!

DISCLAIMER: I am currently bringing all the functionality out from a previously closed-source framework, so be patient if the task/weights are a placeholder, and send me an email to ask for a schedule on the particular part that you need.

Description

This code provides a framework to mix and match popular, ImageNet-trained backbones with different decoders to achieve different CNN-enabled tasks. All backbones come with pre-trained ImageNet weights, which are downloaded by default when the conditions are met.
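As a hedged illustration of that mix-and-match pattern (the class and the backbone's return convention are assumptions for this sketch, not Bonnetal's actual API):

import torch
import torch.nn as nn

class SegmentationModel(nn.Module):
    # Illustrative composition: any backbone/decoder/head with matching
    # channel counts can be swapped in.
    def __init__(self, backbone, decoder, head):
        super().__init__()
        self.backbone = backbone  # e.g. an ImageNet-pretrained feature extractor
        self.decoder = decoder    # upsamples features back to input resolution
        self.head = head          # task-specific output layer (e.g. a 1x1 conv)

    def forward(self, x):
        features, skips = self.backbone(x)  # assumed to also return skip features
        features = self.decoder(features, skips)
        return self.head(features)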

The main reason for the "lack" of variety of backbones so far is that imagenet pre-training takes a while, and it is pretty resource intensive. If you want a new backbone implemented we can talk about it, and you can share your resources to pretrain it 😃 (PR's welcome 😉)

  • Tasks included are:

    • Full-image classification: /train, /deploy.
    • Semantic Segmentation: /train, /deploy.
    • More coming (but patience, since development is now a bit stagnant)...

The code is (like the original Bonnet) separated into a training part developed in Python, using PyTorch, and a deployment/inference part, which is fully written in C++ and contains the code to run on the robot, either using ROS or standalone.

Docker!

An nvidia-docker container is provided to run the full framework, to serve as a dependency check, and for continuous integration. You can find the instructions to run the containers in /docker.

Training

/train contains Python code to easily mix and match backbones and decoders in order to train them for different image recognition tasks. It also contains helper scripts for other tasks, such as converting graphs to ONNX for inference, getting image statistics for normalization, computing class statistics in the dataset, inference tests, accuracy assessment, etc.
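To make the ONNX step concrete, here is a minimal, hedged sketch of what such an export helper does with torch.onnx.export; the toy model and the 512x512 input stand in for a trained network and its config:

import torch
import torch.nn as nn

# stand-in for a trained Bonnetal model
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 2, 1)).eval()

dummy = torch.randn(1, 3, 512, 512)  # N,C,H,W must match the deploy resolution
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])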

Deployment

/deploy contains C++ code for deployment on edge devices. Every task has its own library and namespace, organized as catkin packages. Each task therefore has 4 catkin packages:

  • A lib package that contains all inference files for the library.
  • A standalone package that shows how to use the library linked to a standalone C++ application.
  • A ros package that contains a node handler and some nodes to use the library with ROS for the sensor data message-passing, and
  • (optionally) a msg package that defines the messages required for a specific task, should this be required.

Inference is done either:

  • By generating a PyTorch traced model through the Python interface that can be inferred with the libtorch library, both on GPU and CPU (see the sketch after this list), or
  • By generating an ONNX model through the Python interface, which is later picked up by TensorRT, profiled on the individual computer according to the available memory and half-precision capabilities, and inferred with the TensorRT engine. Notice that not all architectures are supported by TensorRT and we cannot take responsibility for this, so when you implement an architecture, do a quick test that it works with TensorRT before training it; it will make your life easier.
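For the libtorch route, a minimal sketch of producing a traced model (the toy model, file name, and input size are placeholders):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 2, 1)).eval()

example = torch.randn(1, 3, 512, 512)
traced = torch.jit.trace(model, example)  # records the graph for libtorch
traced.save("model.pytorch")              # loadable from C++ via torch::jit::load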

Pre-trained models

ImageNet-pretrained weights for the backbones are downloaded on first use, so the backbones never start from scratch. Whenever you use a backbone for a task, if the input is RGB, the ImageNet weights are downloaded into the backbone (unless a specific pretrained model is explicitly stated in the parameters).
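As an illustration of that mechanism (not Bonnetal's exact code; the URL and config keys are assumptions), conditional weight loading might look like:

import torch.hub

def maybe_load_pretrained(backbone, cfg):
    # only RGB inputs match the ImageNet weights, and an explicit pretrained
    # model in the parameters takes precedence
    if cfg["img_prop"]["depth"] == 3 and cfg.get("pretrained") is None:
        url = "https://example.com/weights/backbone_imagenet.pth"  # placeholder
        state = torch.hub.load_state_dict_from_url(url, progress=True)
        backbone.load_state_dict(state, strict=False)
    return backbone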

These are the currently trained models we have:


License

Bonnetal: MIT

Copyright 2019, Andres Milioto, Cyrill Stachniss. University of Bonn.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Pretrained models: Model and Dataset Dependent

The pretrained models with a specific dataset maintain the copyright of such dataset.


Citations

If you use our framework for any academic work, please cite the original paper.

@InProceedings{milioto2019icra,
  author     = {A. Milioto and C. Stachniss},
  title      = {{Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs}},
  booktitle  = {Proc. of the IEEE Intl. Conf. on Robotics \& Automation (ICRA)},
  year       = 2019,
  codeurl    = {https://github.com/Photogrammetry-Robotics-Bonn/bonnet},
  videourl   = {https://www.youtube.com/watch?v=tfeFHCq6YJs},
}

If you use our Instance Segmentation code, please cite its paper:

@InProceedings{milioto2019icra-fiass,
  author     = {A. Milioto and L. Mandtler and C. Stachniss},
  title      = {{Fast Instance and Semantic Segmentation Exploiting Local Connectivity, Metric Learning, and One-Shot Detection for Robotics}},
  booktitle  = {Proc. of the IEEE Intl. Conf. on Robotics \& Automation (ICRA)},
  year       = 2019,
}

Our networks are either built directly on top of, or strongly based on, the following architectures, so if you use them for any academic work, please take a look at their papers and cite them if you think it proper:


Other useful GitHub repositories:

  • Sync Batchnorm. Allows training bigger nets in a multi-GPU setup with larger batch sizes, so that batch norm doesn't diverge to something that doesn't represent the data.
  • Queueing tool: Very nice queueing tool to share GPU, CPU, and memory resources in a multi-GPU environment.
  • Pytorch: The backbone of everything.
  • onnx-tensorrt: ONNX graph to TensorRT engine for fast inference.
  • nvidia-docker: Docker that allows you to also exploit your nvidia GPU.

Internal Contributors (not present in open-source commits)


Acknowledgements

This work has partly been supported by the German Research Foundation under Germany's Excellence Strategy, EXC-2070 - 390732324 (PhenoRob). We also thank NVIDIA Corporation for providing a Quadro P6000 GPU partially used to develop this framework.

bonnetal's People

Contributors

chandrahasjr, tano297


bonnetal's Issues

Training terminates after the first epoch due to excessive RAM usage

I am trying to train a semantic segmentation model from scratch using COCO dataset, and every time I try to run the training script, it is Killed at the validation step after epoch 0.

At first, I got RuntimeError: Dataloader worker (pid xxxx) is killed by signal: Killed. After looking online, I tried setting the number of workers to 0, which caused a similar error at the same stage, but the message just says Killed. Looking at the memory usage just before the process was killed, RAM usage went all the way up to 97%. I have 64 GB of RAM, which is enough to fit the entire training set if needed, so I don't really understand where the issue originates.

I have attached two screenshots showing the errors. The first one suggests that it failed when trying to colourise the images with colorizer.py.

Could you suggest a workaround? I am hoping to train a model on COCO data to understand how it works, and then train it on my own data which I will format to be COCO-like.
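As a possible starting point, here is a hedged sketch of DataLoader settings that commonly reduce host-RAM pressure; the tensor dataset is a stand-in for the COCO parser:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(10, 3, 64, 64),
                        torch.zeros(10, dtype=torch.long))  # stand-in dataset
loader = DataLoader(dataset, batch_size=1,
                    num_workers=2,     # each worker duplicates the dataset object
                    pin_memory=False)  # pinned staging buffers cost extra host RAM
for imgs, lbls in loader:
    pass  # iterate normally; workers are spawned here

Lazily opening images in __getitem__ (rather than caching decoded arrays) also keeps the per-worker footprint bounded.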


Some skip layers have no grad in segmentation

In some segmentation tasks, why do some skip connections from early layers have no gradient information? It seems that in the mobilenetv2 backbone, x.detach() is used.
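For context, x.detach() returns a tensor with the same values but cut out of the autograd graph, so no gradient flows back through that skip connection into the early layers; that is presumably a deliberate memory/stability trade-off. A minimal illustration:

import torch

x = torch.ones(1, requires_grad=True)
skip = x.detach()              # same data, but detached from the autograd graph
y = (x * 2 + skip).sum()
y.backward()
print(x.grad)                  # tensor([2.]): nothing flows through the skip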

libcublas.so.10, needed by libnvinfer.so, not found

Hey,

I am trying to compile the packages without docker (I don't like it, and it is also not working for me).

I am running Ubuntu 18.04 with Cuda 10.1, CUDNN8, TensorRT 5.1.5.

/usr/bin/ld: warning: libcublas.so.10, needed by /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libnvinfer.so, not found (try using -rpath or -rpath-link)
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libnvinfer.so: undefined reference to `cublas…@libcublas.so.10' (repeated for roughly a dozen cuBLAS symbols; the exact symbol names were lost to the page's email obfuscation)
collect2: error: ld returned 1 exit status
make[2]: *** [/home/tobias/src/bonnetal/deploy/devel/.private/bonnetal_segmentation_ros/lib/bonnetal_segmentation_ros/bonnetal_segmentation_node] Error 1
make[1]: *** [CMakeFiles/bonnetal_segmentation_node.dir/all] Error 2
make: *** [all] Error 2

/bin/sh: 1: cannot create /etc/passwd: Permission denied when running nvidia-docker build -t tano297/bonnetal:runtime -f Dockerfile .

Dear maintainers,
I am getting the following error when running nvidia-docker build -t tano297/bonnetal:runtime -f Dockerfile .

---> Running in 8399ad0e2141
/bin/sh: 1: cannot create /etc/passwd: Permission denied
The command '/bin/sh -c export uid=1000 gid=1000 && mkdir -p /home/developer && mkdir -p /etc/sudoers.d && echo "developer:x:${uid}:${gid}:Developer,,,:/home/developer:/bin/bash" >> /etc/passwd && echo "developer:x:${uid}:" >> /etc/group && echo "developer ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/developer && chmod 0440 /etc/sudoers.d/developer && chown ${uid}:${gid} -R /home/developer && adduser developer sudo' returned a non-zero code: 2

Inference uses too much GPU memory

Hi,

I have trained a model on my own data, and am now trying to run inference. When running infer_img.py on one image (640x480), I see my GPU (GeForce GTX 1080) usage jump up to 7.5 GB, which seems excessive for one image only. Is this expected behaviour?

Do you have any suggestions on how to decrease GPU usage during inference? I only have 8 GB of GPU memory, and need it to run a simulation as well as a couple of other inference scripts at the same time, as the data comes in.

I would also just like to say thanks for open-sourcing your work. From what I've seen, this is one of the best detection/segmentation projects out there in terms of code quality and readability, as well as good explanations of how to get everything working.

Error building the base docker due to unspecified dependency versions

When building the base docker, it fails due to this error:
Depends: libnvinfer-dev (= 5.1.5-1+cuda10.1) but 8.2.3-1+cuda11.4 is to be installed
The solution was to replace the line:
apt install tensorrt python3-libnvinfer-dev -yqq
with
apt install tensorrt python3-libnvinfer=5.1.5-1+cuda10.1 python3-libnvinfer-dev=5.1.5-1+cuda10.1 -yqq

in the Dockerfile of the base image!

Error with ROS while building the Docker image.

After running:
nvidia-docker build -t tano297/bonnetal:base -f docker/base/Dockerfile .

I get the following Error message:
Err:12 http://packages.ros.org/ros/ubuntu bionic InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F42ED6FBAB17C654
Reading package lists...
W: GPG error: http://packages.ros.org/ros/ubuntu bionic InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F42ED6FBAB17C654
E: The repository 'http://packages.ros.org/ros/ubuntu bionic InRelease' is not signed.
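For reference, this NO_PUBKEY error matches the well-known rotation of the ROS apt signing key. The commonly documented fix, assuming it applies to this Dockerfile, is to re-import the key before apt-get update:

apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654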

How to fix RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED? Thank you!

INTERFACE:
config yaml:  config/cityscapes/darknet21_aspp.yaml
log dir /home/pc/logs/2019-8-20-16:13/
model path None
eval only False
No batchnorm False
----------

Commit hash (training version):  b'5368eed'
----------

Opening config file config/cityscapes/darknet21_aspp.yaml
No pretrained directory found.
Copying files to /home/pc/logs/2019-8-20-16:13/ for further reference.
WARNING: Logging before flag parsing goes to stderr.
W0820 16:13:16.396194 140436803987200 deprecation_wrapper.py:119] From ../../common/logger.py:16: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Images from:  /home3/data/city/city_selected/leftImg8bit/train
Labels from:  /home3/data/city/city_selected/gtFine/train
Inference batch size:  1
Images from:  /home3/data/city/city_selected/leftImg8bit/val
Labels from:  /home3/data/city/city_selected/gtFine/val
Original OS:  32
New OS:  8
Strides:  [2, 2, 2, 1, 1]
Dilations:  [1, 1, 1, 2, 4]
Trying to get backbone weights online from Bonnetal server.
Using pretrained weights from bonnetal server for backbone
[Decoder] os:  4 in:  128 skip: 128 out:  128
[Decoder] os:  2 in:  128 skip: 64 out:  64
[Decoder] os:  1 in:  64 skip: 32 out:  32
Using normalized weights as bias for head.
No path to pretrained, using bonnetal Imagenet backbone weights and random decoder.
Total number of parameters:  19239412
Total number of parameters requires_grad:  19239412
Param encoder  14920544
Param decoder  4318208
Param head  660
Training in device:  cuda
Ignoring class  19  in IoU evaluation
[IOU EVAL] IGNORE:  tensor([19])
[IOU EVAL] INCLUDE:  tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18])
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [576,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [352,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [353,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [354,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [355,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train.py", line 117, in <module>
    trainer.train()
  File "../../tasks/segmentation/modules/trainer.py", line 302, in train
    scheduler=self.scheduler)
  File "../../tasks/segmentation/modules/trainer.py", line 487, in train_epoch
    loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
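For context, the repeated Assertion `t >= 0 && t < n_classes` failed lines are the real error: some ground-truth pixels carry label ids outside [0, n_classes), and the cuDNN message afterwards is a side effect of the aborted kernel. A minimal sanity check, with a placeholder path and class count, assuming single-channel id labels:

import numpy as np
from PIL import Image

n_classes = 20  # placeholder: number of classes including the ignore class
lbl = np.array(Image.open("gtFine/train/some_city/some_label.png"))  # placeholder path
print(np.unique(lbl))  # any value outside [0, n_classes) triggers the CUDA assert
assert lbl.min() >= 0 and lbl.max() < n_classes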

[Support] Need help reducing GPU memory usage.

Hello!

Nice looking library! I'd like to train mobilenetv2 for semantic segmentation using my coco-like dataset. I've copied the coco dataloader and updated things for my data.

But even though my GPU has 16GB of ram and I've set batch size to 1, I'm still consuming all my GPU memory as soon as training begins, crashing the session.

My config is below, followed by a dump of my terminal output. I'm not sure what I'm doing wrong.

#training parameters
train:
  loss: "xentropy"       # must be either xentropy or iou
  max_epochs: 1000
  max_lr: 0.005          # sgd learning rate max
  min_lr: 0.001          # warmup initial learning rate
  up_epochs: 1           # warmup during first XX epochs (can be float)
  down_epochs:  20       # warmdown during second XX epochs  (can be float)
  max_momentum: 0.7      # sgd momentum max when lr is min
  min_momentum: 0.5      # sgd momentum min when lr is max
  final_decay: 0.95      # learning rate decay per epoch after initial cycle (from min lr)
  w_decay: 0.0001        # weight decay
  batch_size:  1         # batch size
  report_batch: 1        # every x batches, report loss
  report_epoch: 1        # every x epochs, report validation set
  save_summary: False    # Summary of weight histograms for tensorboard
  save_imgs: False        # False doesn't save anything, True saves some
                         # sample images (one per batch of the last calculated batch)
                         # in log folder
  avg_N: 3               # average the N best models
  crop_prop:
    width: 2560
    height: 1440

# backbone parameters
backbone:
  name: "mobilenetv2"
  dropout: 0.01
  bn_d: 0.01
  OS: 8  # output stride
  train: True # train backbone?
  extra:
    width_mult: 1.0
    shallow_feats: True # get features before the last layer (mn2)

decoder:
  name: "aspp_residual"
  dropout: 0.01
  bn_d: 0.01
  train: True # train decoder?
  extra:
    aspp_channels: 64
    skip_os: [4]
    last_channels: 32

# classification head parameters
head:
  name: "segmentation"
  dropout: 0.01

# dataset (to find parser)
dataset:
  name: "rover"
  location: "/home/taylor/datasets/rover"
  workers: 1  # number of threads to get data
  img_means: #rgb
    - 0.47037394
    - 0.44669544
    - 0.40731883
  img_stds: #rgb
    - 0.27876515
    - 0.27429348
    - 0.28861644
  img_prop:
    width: 2560
    height: 1440
    depth: 3
  labels:
    0: 'nothing'
    1: 'trail'
    2: 'terrain'
    3: 'sidewalk'
    4: 'person'
    5: 'traffic_cone'
    6: 'vehicle'
    7: 'private_road'
    8: 'dirt_road'
    9: 'drivable_ground'
    10: 'building'
    11: 'public_street'
  labels_w:
    0: 1.0
    1: 1.0
    2: 1.0
    3: 1.0
    4: 1.0
    5: 1.0
    6: 1.0
    7: 1.0
    8: 1.0
    9: 1.0
    10: 1.0
    11: 1.0
  color_map: # bgr
    0: [0, 0, 0]
    1: [220, 20, 60]
    2: [119, 11, 32]
    3: [0, 0, 142]
    4: [0, 0, 230]
    5: [106, 0, 228]
    6: [0, 60, 100]
    7: [0, 80, 100]
    8: [0, 0, 70]
    9: [0, 0, 192]
    10: [250, 170, 30]
    11: [100, 170, 30]

Here is my terminal output (There's a few extra things being printed due to some debugging):

developer@taylor-desktop:~/bonnetal/train/tasks/segmentation$ ./train.py --cfg config/rover/mobilenetv2_aspp_res.yaml 
----------
INTERFACE:
config yaml:  config/rover/mobilenetv2_aspp_res.yaml
log dir /home/developer/logs/2020-5-01-03:19/
model path None
eval only False
No batchnorm False
----------

Commit hash (training version):  b'5368eed'
----------

Opening config file config/rover/mobilenetv2_aspp_res.yaml
No pretrained directory found.
Copying files to /home/developer/logs/2020-5-01-03:19/ for further reference.
Images from:  /home/taylor/datasets/rover/images/rover_train
Labels from:  /home/taylor/datasets/rover/annotations/rover_train
Inference batch size:  1
Images from:  /home/taylor/datasets/rover/images/rover_test
Labels from:  /home/taylor/datasets/rover/annotations/rover_test
['__add__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'crop', 'crop_param', 'filenames', 'filenamesGt', 'h', 'h_flip', 'images_root', 'jitter', 'labels_root', 'means', 'norm', 'stds', 'subset', 'tensorize_img', 'tensorize_lbl', 'w']
dict_items([(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0), (6, 1.0), (7, 1.0), (8, 1.0), (9, 1.0), (10, 1.0), (11, 1.0)])
Original OS:  32
New OS:  8.0
Trying to get backbone weights online from Bonnetal server.
Using pretrained weights from bonnetal server for backbone
[Decoder] os:  4 in:  32 skip: 24 out:  24
[Decoder] os:  2 in:  24 skip: 16 out:  16
[Decoder] os:  1 in:  16 skip: 3 out:  16
Using normalized weights as bias for head.
No path to pretrained, using bonnetal Imagenet backbone weights and random decoder.
Total number of parameters:  2144900
Total number of parameters requires_grad:  2144900
Param encoder  1812800
Param decoder  331896
Param head  204
Training in device:  cuda
[IOU EVAL] IGNORE:  tensor([], dtype=torch.int64)
[IOU EVAL] INCLUDE:  tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Traceback (most recent call last):
  File "./train.py", line 117, in <module>
    trainer.train()
  File "../../tasks/segmentation/modules/trainer.py", line 303, in train
    scheduler=self.scheduler)
  File "../../tasks/segmentation/modules/trainer.py", line 479, in train_epoch
    output = model(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../../tasks/segmentation/modules/segmentator.py", line 102, in forward
    x = self.decoder(x, skips)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../..//tasks/segmentation/decoders/aspp_residual.py", line 83, in forward
    features = mixconv(features)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../../common/layers.py", line 83, in forward
    return x + self.inv_dwise_conv(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 338.00 MiB (GPU 0; 15.89 GiB total capacity; 11.95 GiB already allocated; 324.06 MiB free; 558.73 MiB cached)

Here is nvidia-smi when the program is not running. Looks like plenty of RAM available:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P5000        On   | 00000000:08:00.0  On |                  Off |
| 32%   52C    P0    45W / 180W |   2694MiB / 16275MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1635      G   /usr/lib/xorg/Xorg                            97MiB |
|    0      1872      G   /usr/bin/gnome-shell                          53MiB |
|    0      3769      G   /usr/lib/xorg/Xorg                           974MiB |
|    0      3910      G   /usr/bin/gnome-shell                         894MiB |
|    0      6807      G   ...quest-channel-token=7931726709970216186   113MiB |
|    0      7241      G   gnome-control-center                          94MiB |
|    0     10884      G   /usr/bin/vlc                                 110MiB |
|    0     12677      G   ...quest-channel-token=7615775727565811985    56MiB |
|    0     21292      G   ...-token=1B3DA049A377FA772C5604DC206A395E   234MiB |
|    0     23554      G   /usr/lib/firefox/firefox                       1MiB |
|    0     23638      G   /usr/lib/firefox/firefox                       1MiB |
|    0     23666      G   /usr/lib/firefox/firefox                       1MiB |
|    0     27000      G   /usr/lib/firefox/firefox                       1MiB |
|    0     30196      G   kicad                                         46MiB |
+-----------------------------------------------------------------------------+

Any help would be appreciated!
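One observation: a 2560x1440 crop is very large for segmentation training, since activation memory grows with height times width, so shrinking crop_prop (and possibly img_prop) is the usual fix. A rough, hedged probe to see which resolutions fit, with a toy model standing in for the real segmenter:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 12, 1)).cuda()  # stand-in for the segmenter
for h, w in [(1440, 2560), (720, 1280), (448, 448)]:
    try:
        x = torch.randn(1, 3, h, w, device="cuda", requires_grad=True)
        model(x).sum().backward()  # forward + backward, like a training step
        print(h, w, "fits, peak:", torch.cuda.max_memory_allocated() // 2**20, "MiB")
    except RuntimeError:
        print(h, w, "out of memory")
    torch.cuda.empty_cache()
    torch.cuda.reset_max_memory_allocated()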

Pytorch 1.6 Issues with OneShot

Hi,

I am trying to use the oneshot scheduler while using PyTorch 1.6; however, it gives me a warning:

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate.

Do you know how I could solve this? Thank you
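For reference, since PyTorch 1.1 the optimizer must step before the scheduler within each iteration, so the warning goes away once the calls are ordered like this. A minimal sketch with a placeholder model and a OneCycleLR schedule:

import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.01, total_steps=100)

for step in range(100):
    opt.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    opt.step()    # optimizer first...
    sched.step()  # ...then the scheduler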

bash: /home/developer/bonnetal/deploy/devel/setup.bash: No such file or directory

Hello everyone! I have been using bonnet for semantic segmentation before and am now switching to bonnetal.

When I try to run the example it gives me the following:

developer@my-pc:/bonnetal/train/tasks/segmentation$ ./train.py -c ./config/coco/mobilenetv2_aspp_res
mobilenetv2_aspp_res.yaml mobilenetv2_aspp_res_attention.yaml
developer@my-pc:/bonnetal/train/tasks/segmentation$ ./train.py -c ./config/coco/mobilenetv2_aspp_res.yaml

INTERFACE:
config yaml: ./config/coco/mobilenetv2_aspp_res.yaml
log dir /home/developer/logs/2019-8-01-09:06/
model path None
eval only False
No batchnorm False

Commit hash (training version): b'5aed807'

Opening config file ./config/coco/mobilenetv2_aspp_res.yaml
./train.py:80: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
CFG = yaml.load(f)
No pretrained directory found.
Copying files to /home/developer/logs/2019-8-01-09:06/ for further reference.
Images from: /cache/datasets/coco/train2017
Labels from: /cache/datasets/coco/annotations/panoptic_train2017_remap
Traceback (most recent call last):
  File "./train.py", line 116, in <module>
    trainer = Trainer(CFG, FLAGS.log, FLAGS.path, FLAGS.eval, FLAGS.no_batchnorm)
  File "../../tasks/segmentation/modules/trainer.py", line 68, in __init__
    workers=self.CFG["dataset"]["workers"])
  File "../..//tasks/segmentation/dataset/coco/parser.py", line 377, in __init__
    drop_last=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

I also got the following error while downloading docker image:

After the command:
sudo nvidia-docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v $HOME/.Xauthority:/home/developer/.Xauthority -v /home/$USER:/home/$USER --net=host --pid=host --ipc=host tano297/bonnetal:runtime /bin/bash

I got the following:
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
bash: /home/developer/bonnetal/deploy/devel/setup.bash: No such file or directory

Can you please help me to solve this problem? Many thanks!

GPU stops working when running inference

Hi,

I am using a GeForce RTX 2060 with bonnetal and it is crashing the GPU. I get the error:

Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU

In this case, I am using my own code for ROS, which uses user.infer. This is the code:

#!/usr/bin/env python3
# Futures
from __future__ import print_function

# STD
import sys
import time
import argparse
import subprocess
import datetime
import os
import shutil

# ROS
import rospy
import roslib
from sensor_msgs.msg import CompressedImage

# numpy and scipy
import numpy as np
from scipy.ndimage import filters

# OpenCV
import cv2
from cv_bridge import CvBridge, CvBridgeError

# For overlaying images
from PIL import Image

import torch
# check if cuda is activated
cuda = torch.cuda.is_available()
if cuda == False:
    print("Model is NOT using GPU")
print ("Cuda:", torch.cuda.is_available())

class BonnetalNode:
    """
    Encapsulates the bonnetal functionality into a ROS node.
    """
    # A ROS subscriber for input images
    img_sub = None
    labelled_img_pub = None
    overlaid_img_pub = None
    # Bonnetal interface
    user = None

    def __init__(self):
        """
        Initializes ROS (pubs and subs) and bonnetal.
        """
        # Initialize ROS
        rospy.init_node("bonnetal_node")
        init = rospy.Time.now()
        # Parameters Config 
        path_model = rospy.get_param("path_model")
        backend = rospy.get_param("backend")
        camera_topic = rospy.get_param("camera_topic")

        # Add path for bonnetal files
        abs_path = rospy.get_param("abs_path")
        print ("Abs path is: ", abs_path)
        sys.path.insert(0, abs_path + "bonnetal/train")

        # Initialize bonnetal
        self.initialize_bonnetal(path=path_model, backend=backend)

        # Initialize publishers and subscribers
        self.overlaid_img_pub = rospy.Publisher("/overlaid_image/compressed",
                CompressedImage, queue_size = 1)
        self.labelled_img_pub = rospy.Publisher("/output_labelled_img/compressed",
                CompressedImage, queue_size = 1)
        # buff size allows callback to get the latest msg instead of queueing them
        self.img_sub = rospy.Subscriber(camera_topic,
                    CompressedImage, self.image_callback,  queue_size = 1, buff_size=2**32)

        rospy.loginfo("Segmentation node initialized in {} seconds!".format(
            (rospy.Time.now()-init).to_sec()))

    def initialize_bonnetal(self, path, backend="native", workspace=8000000000, calib_images=None):
        """
        Initializes bonnetal

        :type path: string
        :param path: full path to pretrained model

        :type backend: string
        :param backend: framework for segmentation task

        :type workspace: int
        :param workspace: max workspace size (only for TensorRT framework)

        :type calib_images: list
        :param calib_images: calibration images, must be a list of images (only for TensorRT framework)
        """
        # create inference context for the desired backend
        if backend == "tensorrt":
            # import and use tensorRT
            try:
                print("Using tensorRT")
                from tasks.segmentation.modules.userTensorRT import UserTensorRT
                self.user = UserTensorRT(path, workspace, calib_images)
            except ImportError as e:
                print ("ERROR:", e)
                sys.exit(0)
            except:
                print('\nERROR:TensorRT needs to use inference model type .onnx. You can make one '
                    'using tasks/segmentation/make_deploy_model.py')
                sys.exit(0)
        elif backend == "caffe2":
            try:
                # import and use caffe2
                print("Using caffe2")
                from tasks.segmentation.modules.userCaffe2 import UserCaffe2
                self.user = UserCaffe2(path)
            except ImportError as e:
                print ("ERROR:", e)
                sys.exit(0)
            except:
                print('\nERROR:Caffe2 needs to use inference model type .onnx. You can make one '
                    'using tasks/segmentation/make_deploy_model.py')
                sys.exit(0)

        elif backend == "pytorch":
            # import and use pytorch
            try:
                print("Using PyTorch")
                from tasks.segmentation.modules.userPytorch import UserPytorch
                self.user = UserPytorch(path)
            except ImportError as e:
                print ("ERROR:", e)
                sys.exit(0)
            except:
                print('\nERROR:PyTorch needs to use inference model type .pytorch. You can make one '
                    'using tasks/segmentation/make_deploy_model.py')
                sys.exit(0)

        else:
            # default to native pytorch
            print("Using native PyTorch")
            from tasks.segmentation.modules.user import User
            self.user = User(path)

    def segment_image(self, cv_img):
        """
        Input should be cv image.

        :type cv_img: int
        :param cv_img: max workspace size (only for TensorRT framework)

        :rtype: numpy.ndarray
        :returns: OpenCV color image with labels of fuel

        :rtype: numpy.ndarray
        :returns: OpenCV color image from the camera with overlay labels of fuel
        """
        # infer
        # print("Inferring ")
        _, lbl_img = self.user.infer(cv_img, False)
        overlay_img = Image.blend(Image.fromarray(cv_img), Image.fromarray(lbl_img), 0.5)

        return lbl_img, overlay_img

    def unpack_image_msg(self, msg):
        """
        Receives a sensor_msgs/CompressedImage and returns a cv image

        :type msg: CompressedImage
        :param msg: CompressedImage ROS message

        :rtype: numpy.ndarray
        :returns: OpenCV color image
        """
        np_arr = np.fromstring(msg.data, np.uint8)
        cv_img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)

        return cv_img

    def re_pack_image_msg(self, cv_img):
        """
        Packing OpenCV image to ROS message CompressedImage

        :type cv_img: CompressedImage
        :param cv_img: CompressedImage ROS message

        :rtype: CompressedImage
        :returns: CompressedImage ROS message in jpeg format
        """
        #img_msg = cv2_to_imgmsg(cv_img, encoding="bgr8")

        img_msg = CompressedImage()
        img_msg.header.stamp = rospy.Time.now()
        img_msg.format = "jpeg"
        img_msg.data = np.array(cv2.imencode('.jpg', np.asarray(cv_img))[1]).tostring()

        return img_msg

    def pub_lbl_img(self, cv_img):
        """
        Publishes the labelled (segmented) images.

        :type cv_img: CompressedImage
        :param cv_img: CompressedImage ROS message
        """
        img_msg = self.re_pack_image_msg(cv_img)
        self.labelled_img_pub.publish(img_msg)

    def pub_overlay_img(self, cv_img):
        """
        Publishes the overlaid images.

        :type cv_img: CompressedImage
        :param cv_img: CompressedImage ROS message
        """
        img_msg = self.re_pack_image_msg(cv_img)
        self.overlaid_img_pub.publish(img_msg)

    def image_callback(self, msg):
        """
        Receives sensor_msgs/CompressedImage and publishes labelled images.

        :type msg: CompressedImage
        :param msg: CompressedImage ROS message
        """
        cv_img = self.unpack_image_msg(msg)

        lbl_img, overlay_img = self.segment_image(cv_img)

        self.pub_lbl_img(lbl_img)
        self.pub_overlay_img(overlay_img)


    def run(self):
        """
        Enters the main loop for processing messages.
        """
        rospy.spin()


def main():
    node = BonnetalNode()
    node.run()


if __name__ == "__main__":
    main()

Do you know what the issue could be?

How to configure cfg.yaml file?

Hello!
I am using bonnetal for a semantic segmentation task. It works, and I would like to ask how to configure the parameters in the cfg.yaml file to make the results better. I have only one class to segment, and the second is background, as in the example you provided.

Is there any documentation on the meaning of each of the parameters?

Semantic Segmentation: Only 2 of 3 classes get trained

Hey there,

I am trying to train different backbone/decoder combinations in a semantic segmentation task. My data is similar to the sugarbeets from the old bonnet, with 3 classes: background, carrot, and weed.

Somehow I always end up training only the first two classes. From how I understand it, the number of classes is defined completely in the cfg file, right? With labels, label weights, and color_map. I would appreciate any help :)

I just double-checked before posting, and noticed that when I run calculate_segmentation_weights, I don't get any frequency for the carrot and weed classes, which is of course weird, but I can't see the reason.

Edit: To be exact, both are zero.
Num of pixels: 681984000
Frequency: [0.89665005 0. 0. ]
I can't figure out why.

I'll attach the cfg I'm using right now.
Best regards,
Olli

P.S. Any chance you got pretrained models for the sugarbeet data in this framework?

backbone:
  OS: 8
  bn_d: 0.001
  dropout: 0.0
  extra:
    shallow_feats: true
    width_mult: 1.0
  name: mobilenetv2
  train: true
dataset:
  color_map:
    0:
    - 0
    - 0
    - 0
    1:
    - 255
    - 0
    - 0
    2:
    - 0
    - 0
    - 255
  img_means:
  - 0.42437694635119927
  - 0.5040352731582203
  - 0.4624443929799188
  img_prop:
    depth: 3
    height: 720
    width: 1280
  img_stds:
  - 0.14995696217848253
  - 0.1564923538805844
  - 0.17123649653037434
  labels:
    0: ground
    1: carrot
    2: weed
  labels_w:
    0: 0.10334995
    1: 1.0
    2: 1.0
  location: dataset/leaf1280/
  name: leaf1280
  workers: 12
decoder:
  bn_d: 0.001
  dropout: 0.0
  extra:
    aspp_channels: 64
    last_channels: 32
    skip_os:
    - 4
    - 2
  name: aspp_residual_attention
  train: true
head:
  dropout: 0.0
  name: segmentation
train:
  avg_N: 3
  batch_size: 3
  crop_prop:
    height: 448
    width: 448
  down_epochs: 0
  final_decay: 0.99
  loss: xentropy
  max_epochs: 1000
  max_lr: 0.001
  max_momentum: 0.9
  min_lr: 0.001
  min_momentum: 0.9
  report_batch: 1
  report_epoch: 1
  save_imgs: false
  save_summary: false
  up_epochs: 0
  w_decay: 1.0e-05
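One hedged way to debug this is to count the raw pixel values across the label images: if the labels are stored as RGB colors instead of single-channel class ids, they need to be remapped through the color_map first, and a missed remap would explain the zero frequencies. The glob pattern below is a placeholder:

import glob
from collections import Counter
import numpy as np
from PIL import Image

counts = Counter()
for path in glob.glob("dataset/leaf1280/labels/*.png"):  # placeholder pattern
    lbl = np.array(Image.open(path))  # assumes single-channel id images
    vals, n = np.unique(lbl, return_counts=True)
    counts.update(dict(zip(vals.tolist(), n.tolist())))
print(counts)  # if ids 1 and 2 never appear here, the weights will come out zero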

FPS

Hello,

I'm using PyTorch GPU and the "MobilenetsV2 ASPP Res - 512px" model with Python and C++, but I get ~94 ms per frame on my PC. Is that expected?

OS: Win 10
GPU: GTX 1080 Ti
Video size: 2160x3840
Video FPS: 60

Thanks
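For reference, published runtimes are typically measured at the network's training resolution (e.g. 512 px), so 2160x3840 frames add resize and copy overhead on top of inference. A hedged timing sketch that isolates inference with proper CUDA synchronization, using a small conv as a stand-in for the real net:

import time
import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, 3, padding=1).cuda().eval()   # stand-in for the network
x = torch.randn(1, 3, 512, 512, device="cuda")

with torch.no_grad():
    for _ in range(10):          # warm-up so one-time setup costs are excluded
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()     # wait for the GPU before stopping the clock
print((time.time() - t0) / 100 * 1000, "ms per frame")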

Platform DOESN'T HAVE fp16 support.

I tried the cityscapes_erfnet_1024_70 pretrained model and got an error on start.

I am running Cuda 10.2 with TensorRT 6 on Ubuntu 18.04.

Specs:
Intel® Xeon(R) CPU E5-1650 v4 @ 3.60GHz × 12
64GB DDR4-RAM
3x Titan X (12GB)

Trying to open model
Trying to deserialize previously stored: /home/tobias/src/bonnetal/cityscapes_erfnet_1024_70/model.trt
Could not deserialize TensorRT engine.
Generating from sratch... This may take a while...
Trying to generate trt engine from : /home/tobias/src/bonnetal/cityscapes_erfnet_1024_70/model.onnx
Platform DOESN'T HAVE fp16 support.
No DLA selected.
Could not open file /home/tobias/src/bonnetal/cityscapes_erfnet_1024_70/model.onnx
Could not open file /home/tobias/src/bonnetal/cityscapes_erfnet_1024_70/model.onnx
Failed to parse ONNX model from file/home/tobias/src/bonnetal/cityscapes_erfnet_1024_70/model.onnx
Success picking up ONNX model
Success adding argmax to trt model
[bonnetal_segmentation_node-2] process has died [pid 9628, exit code -11, cmd /home/tobias/src/bonnetal/deploy/devel/lib/bonnetal_segmentation_ros/bonnetal_segmentation_node __name:=bonnetal_segmentation_node __log:=/home/tobias/.ros/log/2eb40dc4-f663-11ea-893f-38d547c88646/bonnetal_segmentation_node-2.log].
log file: /home/tobias/.ros/log/2eb40dc4-f663-11ea-893f-38d547c88646/bonnetal_segmentation_node-2*.log

terminate called after throwing an instance of 'c10::Error'

Hi,
I am trying to run segmentation using a pretrained model.
I am using docker on Ubuntu 18.04 with a GPU.
nvidia-smi works fine (but the whole GPU memory is already in use by some training running in the background).

nvidia-docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v $HOME/.Xauthority:/home/developer/.Xauthority -v /home/$USER:/home/$USER --net=host --pid=host -v /mnt/Data/dataset002mp4:/home/developer/dataset2 --ipc=host tano297/bonnetal:runtime /bin/bash

In docker:

cd deploy
catkin init
catkin build
cd ~/bonnetal/deploy/devel/lib/bonnetal_segmentation_standalone
./infer_img -p mapillary_darknet53_aspp_res_512_os8_40/ -i ~/dataset2/frames/00000001.jpg -v

I get:

================================================================================
image: /home/developer/dataset2/frames/00000001.jpg
path: mapillary_darknet53_aspp_res_512_os8_40//
backend: pytorch. Using default!
verbose: 1
================================================================================
Trying to open model
Could not send model to GPU, using CPU
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed, file path: mapillary_darknet53_aspp_res_512_os8_40///model.pytorch (FileAdapter at ../caffe2/serialize/file_adapter.cc:11)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6c (0x7f824a5e845c in /usr/local/lib/libc10.so)
frame #1: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x208 (0x7f82c2382538 in /usr/local/lib/libcaffe2.so)
frame #2: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x40 (0x7f824b1f9250 in /usr/local/lib/libtorch.so.1)
frame #3: bonnetal::segmentation::NetPytorch::NetPytorch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3e1 (0x7f82c4e73171 in /home/developer/bonnetal/deploy/devel/.private/bonnetal_segmentation_lib/lib/libbonnetal_segmentation_lib.so)
frame #4: bonnetal::segmentation::make_net(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3a6 (0x7f82c4e71926 in /home/developer/bonnetal/deploy/devel/.private/bonnetal_segmentation_lib/lib/libbonnetal_segmentation_lib.so)
frame #5: <unknown function> + 0x7dfe (0x55dc04edddfe in ./infer_img)
frame #6: __libc_start_main + 0xe7 (0x7f824bd55b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #7: <unknown function> + 0x87ea (0x55dc04ede7ea in ./infer_img)

Aborted (core dumped)

Anything obvious?
Is it related to having no free memory on the GPU?
I also tried with CUDA_VISIBLE_DEVICES=''
I was looking for an example of how to use the pretrained models, but haven't found any instructions.
I am finally going to use these models and present the results on YT.
I will be very grateful for any help.

BTW, I am using docker because I have ROS1 with catkin_make and no catkin command.

Add depth channel

Hi,

I am currently thinking of using bonnetal for a project and I saw that there is a benchmark using the Zed Depth. Which encoder-decoder was it used? Do you have any tips on how to include rgb+depth to the current available backbone and decoders in bonnetal or would I need to create my own in this case? Thanks!
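One hedged approach is to widen the backbone's first convolution from 3 to 4 input channels, keeping the pretrained RGB filters and initializing the depth channel from their mean; the sketch below uses torchvision's MobileNetV2 as a stand-in for a Bonnetal backbone:

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

net = mobilenet_v2(pretrained=True)
old = net.features[0][0]                  # first Conv2d, 3 input channels
new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    new.weight[:, :3] = old.weight        # keep the ImageNet RGB filters
    new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # depth init
net.features[0][0] = new                  # the net now accepts N x 4 x H x W

The decoder and head are unaffected; only the data loader and the normalization statistics need a fourth channel.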

Can't Fix RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED ?

I am finding the same error but do not seem to be able to solve it. I have changed the labels and preprocessed the label file (changed labels.py and ran python createTrainIdLabelImgs.py), but the code still exits before completing, at
File "../../tasks/segmentation/modules/trainer.py", line 488, in train_epoch
    loss.backward()

Do you have any idea what I could do to solve this issue?

My labels.py file in cityscapes:

labels = [
    #       name                     id    trainId   category            catId     hasInstances   ignoreInEval   color
    Label(  'unlabeled'            ,  0 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'ego vehicle'          ,  1 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'rectification border' ,  2 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'out of roi'           ,  3 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'static'               ,  4 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'dynamic'              ,  5 ,      19 , 'void'            , 0       , False        , True         , (111, 74,  0) ),
    Label(  'ground'               ,  6 ,      19 , 'void'            , 0       , False        , True         , ( 81,  0, 81) ),
    Label(  'road'                 ,  7 ,        0 , 'flat'            , 1       , False        , False        , (128, 64,128) ),
    Label(  'sidewalk'             ,  8 ,        1 , 'flat'            , 1       , False        , False        , (244, 35,232) ),
    Label(  'parking'              ,  9 ,      19 , 'flat'            , 1       , False        , True         , (250,170,160) ),
    Label(  'rail track'           , 10 ,      19 , 'flat'            , 1       , False        , True         , (230,150,140) ),
    Label(  'building'             , 11 ,        2 , 'construction'    , 2       , False        , False        , ( 70, 70, 70) ),
    Label(  'wall'                 , 12 ,        3 , 'construction'    , 2       , False        , False        , (102,102,156) ),
    Label(  'fence'                , 13 ,        4 , 'construction'    , 2       , False        , False        , (190,153,153) ),
    Label(  'guard rail'           , 14 ,      19 , 'construction'    , 2       , False        , True         , (180,165,180) ),
    Label(  'bridge'               , 15 ,      19 , 'construction'    , 2       , False        , True         , (150,100,100) ),
    Label(  'tunnel'               , 16 ,      19 , 'construction'    , 2       , False        , True         , (150,120, 90) ),
    Label(  'pole'                 , 17 ,        5 , 'object'          , 3       , False        , False        , (153,153,153) ),
    Label(  'polegroup'            , 18 ,      19 , 'object'          , 3       , False        , True         , (153,153,153) ),
    Label(  'traffic light'        , 19 ,        6 , 'object'          , 3       , False        , False        , (250,170, 30) ),
    Label(  'traffic sign'         , 20 ,        7 , 'object'          , 3       , False        , False        , (220,220,  0) ),
    Label(  'vegetation'           , 21 ,        8 , 'nature'          , 4       , False        , False        , (107,142, 35) ),
    Label(  'terrain'              , 22 ,        9 , 'nature'          , 4       , False        , False        , (152,251,152) ),
    Label(  'sky'                  , 23 ,       10 , 'sky'             , 5       , False        , False        , ( 70,130,180) ),
    Label(  'person'               , 24 ,       11 , 'human'           , 6       , True         , False        , (220, 20, 60) ),
    Label(  'rider'                , 25 ,       12 , 'human'           , 6       , True         , False        , (255,  0,  0) ),
    Label(  'car'                  , 26 ,       13 , 'vehicle'         , 7       , True         , False        , (  0,  0,142) ),
    Label(  'truck'                , 27 ,       14 , 'vehicle'         , 7       , True         , False        , (  0,  0, 70) ),
    Label(  'bus'                  , 28 ,       15 , 'vehicle'         , 7       , True         , False        , (  0, 60,100) ),
    Label(  'caravan'              , 29 ,      19 , 'vehicle'         , 7       , True         , True         , (  0,  0, 90) ),
    Label(  'trailer'              , 30 ,      19 , 'vehicle'         , 7       , True         , True         , (  0,  0,110) ),
    Label(  'train'                , 31 ,       16 , 'vehicle'         , 7       , True         , False        , (  0, 80,100) ),
    Label(  'motorcycle'           , 32 ,       17 , 'vehicle'         , 7       , True         , False        , (  0,  0,230) ),
    Label(  'bicycle'              , 33 ,       18 , 'vehicle'         , 7       , True         , False        , (119, 11, 32) ),
    Label(  'license plate'        , -1 ,       19 , 'vehicle'         , 7       , False        , True         , (  0,  0,142) ),
] 

Traceback:

./train.py -c ~/bonnetal/train/tasks/segmentation/config/cityscapes/ERFNet.yaml -l ~/bonnetal/train/tasks/segmentation/log1
/home/cris/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/cris/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
----------
INTERFACE:
config yaml:  /home/cris/bonnetal/train/tasks/segmentation/config/cityscapes/ERFNet.yaml
log dir /home/cris/bonnetal/train/tasks/segmentation/log1
model path None
eval only False
No batchnorm False
----------

Commit hash (training version):  b'5368eed'
----------

Opening config file /home/cris/bonnetal/train/tasks/segmentation/config/cityscapes/ERFNet.yaml
No pretrained directory found.
Copying files to /home/cris/bonnetal/train/tasks/segmentation/log1 for further reference.
WARNING:tensorflow:From ../../common/logger.py:16: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Images from:  ~/bonnetal/cityscapes/leftImg8bit/train
Labels from:  ~/bonnetal/cityscapes/gtFine/train
LENGTH 2975 2975
Inference batch size:  4
Images from:  ~/bonnetal/cityscapes/leftImg8bit/val
Labels from:  ~/bonnetal/cityscapes/gtFine/val
LENGTH 500 500
Original OS:  8
New OS:  8
Trying to get backbone weights online from Bonnetal server.
Using pretrained weights from bonnetal server for backbone
OS:  1 , channels:  16
OS:  2 , channels:  16
OS:  4 , channels:  64
[Decoder] os:  4 in:  128 skip: 64 out:  64
[Decoder] os:  2 in:  64 skip: 16 out:  16
[Decoder] os:  1 in:  16 skip: 3 out:  16
Using normalized weights as bias for head.
No path to pretrained, using bonnetal Imagenet backbone weights and random decoder.
Total number of parameters:  2252148
Total number of parameters requires_grad:  2252148
Param encoder  1913168
Param decoder  338640
Param head  340
Training in device:  cuda
/home/cris/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Ignoring class  19  in IoU evaluation
[IOU EVAL] IGNORE:  tensor([19])
[IOU EVAL] INCLUDE:  tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18])
Let's see if it finishes this
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [576,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [577,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [578,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [579,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "./train.py", line 117, in <module>
    trainer.train()
  File "../../tasks/segmentation/modules/trainer.py", line 302, in train
    scheduler=self.scheduler)
  File "../../tasks/segmentation/modules/trainer.py", line 488, in train_epoch
    loss.backward()
  File "/home/cris/.local/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/cris/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
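
For what it's worth, the CUDA assertion `t >= 0 && t < n_classes` above fires when a ground-truth label pixel lies outside the valid class range; the cuDNN error afterwards is just fallout from the aborted kernel. With Cityscapes, a common cause is feeding the raw labelIds images (values up to 33, as in the Label table above) to a 20-class head instead of the remapped trainIds. A minimal sanity-check sketch, assuming labels are single-channel PNGs of class indices; the glob pattern is an assumption, point it at whatever files the dataloader actually reads:

import glob
import os

import numpy as np
from PIL import Image

N_CLASSES = 20  # 19 evaluated classes plus the ignored class 19, as in the log above

label_dir = os.path.expanduser("~/bonnetal/cityscapes/gtFine/train")
for path in sorted(glob.glob(os.path.join(label_dir, "**", "*labelTrainIds.png"), recursive=True)):
    lbl = np.array(Image.open(path))
    bad = np.unique(lbl[lbl >= N_CLASSES])
    if bad.size:
        print(path, "contains out-of-range label values:", bad.tolist())

Any value reported here (e.g. the 255 "ignore" index of trainIds, or raw ids above 19) has to be remapped or marked as ignore before it reaches the loss.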

Error using standalone image inference

Hey there,

Training worked fine, and making the inference models was no problem either. I then built for standalone use, which gave me no errors.

But when I try using standalone inference on images as described in deploy/segmentation, I get the following C++ error:

Predicting image: ../train/tasks/segmentation/dataset/504-896/rgb/rgb-cropped/
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 140) > this->size() (which is 0)
Aborted (core dumped)

Since I don't yet have any real idea of C++, I thought maybe someone has a chance to spot the problem directly. Below I'll add the whole console output for my infer_img execution, as well as the config of the model. I would appreciate any clue. Thanks for looking!

My example is called with the TensorRT backend, but I get the same error with PyTorch as well. I first thought it was about the image dimensions, and remade the inference models with new dimensions etc., but that didn't get me anywhere yet.

The full console output:

developer@olli:/home/olli/Code/XxxXxxx/bonnetal/deploy$ ./devel/lib/bonnetal_segmentation_standalone/infer_img -p ../train/tasks/segmentation/mystuff/deployed/best_darknet_leaf -i ../train/tasks/segmentation/dataset/504-896/rgb/rgb-cropped/ -b tensorrt
================================================================================
image: ../train/tasks/segmentation/dataset/504-896/rgb/rgb-cropped/
path: ../train/tasks/segmentation/mystuff/deployed/best_darknet_leaf/
backend: tensorrt
verbose: 0
================================================================================
Setting verbosity to: true
Trying to open model
Trying to deserialize previously stored: ../train/tasks/segmentation/mystuff/deployed/best_darknet_leaf//model.trt
Successfully found TensorRT engine file ../train/tasks/segmentation/mystuff/deployed/best_darknet_leaf//model.trt
Successfully created inference runtime
No DLA selected.
Successfully allocated 122542320 for model.
Successfully read 122542320 to modelmem.
INFO: Glob Size is 122257600 bytes.
INFO: Added linear block of size 173408256
INFO: Added linear block of size 173408256
INFO: Added linear block of size 28901376
INFO: Added linear block of size 28901376
INFO: Added linear block of size 14450688
INFO: Added linear block of size 7225344
INFO: Deserialize required 1790698 microseconds.
Created engine!
Successfully deserialized Engine from trt file
Binding: 0, type: 0
[Dim 3][Dim 504][Dim 896]
Binding: 1, type: 3
[Dim 1][Dim 504][Dim 896]
Successfully create binding buffer
================================================================================
Predicting image: ../train/tasks/segmentation/dataset/504-896/rgb/rgb-cropped/
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 140) > this->size() (which is 0)
Aborted (core dumped)
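
For what it's worth, the substr exception on an empty string is at least consistent with -i being handed a directory: the path ends in '/', so any basename or extension parsing yields an empty string, and a substr at a stale offset then throws exactly this. Whether infer_img is meant to accept directories is unclear; a simple way to side-step it is to loop over the files from outside, e.g. (paths taken from the report above; the loop itself is a hypothetical workaround, not part of bonnetal):

import glob
import subprocess

model = "../train/tasks/segmentation/mystuff/deployed/best_darknet_leaf"
images = sorted(glob.glob("../train/tasks/segmentation/dataset/504-896/rgb/rgb-cropped/*.png"))
for img in images:
    subprocess.run(
        ["./devel/lib/bonnetal_segmentation_standalone/infer_img",
         "-p", model, "-i", img, "-b", "tensorrt"],
        check=True,  # stop at the first failing image
    )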

And the config:

backbone:
  OS: 8
  bn_d: 0.01
  dropout: 0.01
  extra:
    darknet: darknet53
  name: darknet
  train: true
dataset:
  color_map:
    0:
    - 0
    - 0
    - 0
    1:
    - 255
    - 0
    - 0
    2:
    - 0
    - 0
    - 255
  img_means:
  - 0.4148446751432425
  - 0.5053385609354691
  - 0.45907718096515426
  img_prop:
    depth: 3
    height: 504
    width: 896
  img_stds:
  - 0.14722864867881855
  - 0.16334236156069035
  - 0.17758600209156641
  labels:
    0: ground
    1: carrot
    2: weed
  labels_w:
    0: 1.0
    1: 1.0
    2: 1.0
  location: dataset/leaf_moreweed/
  name: leaf_moreweed
  workers: 12
decoder:
  bn_d: 0.01
  dropout: 0.01
  extra:
    aspp_channels: 256
    last_channels: 32
    skip_os:
    - 4
    - 2
  name: aspp_residual
  train: true
head:
  dropout: 0.01
  name: segmentation
train:
  avg_N: 3
  batch_size: 2
  crop_prop:
    height: 480
    width: 480
  down_epochs: 100
  final_decay: 0.995
  loss: xentropy
  max_epochs: 100
  max_lr: 0.0001
  max_momentum: 0.95
  min_lr: 1.0e-05
  min_momentum: 0.9
  report_batch: 1
  report_epoch: 1
  save_imgs: false
  save_summary: true
  up_epochs: 0.5
  w_decay: 0.0001

Best regards, olli

Error building PyTorch in the base Docker

When executing the PyTorch build in the base Dockerfile, it fails and outputs the following:
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_1.cpp.o
/usr/bin/c++ [... several hundred include paths, defines, and warning flags elided; identical for all three compiles ...] -c ../torch/csrc/autograd/generated/VariableType_1.cpp
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report, with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
[3565/4925] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_4.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_4.cpp.o
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report, with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
[3566/4925] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_2.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_2.cpp.o
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report, with preprocessed source if appropriate.

The actual summary is this: The command '/bin/sh -c cd pytorch && python3 setup.py install && cd ..' returned a non-zero code: 1

If someone ran into a similar issue, please let me know.

Thanks!
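
For what it's worth, "c++: internal compiler error: Killed (program cc1plus)" is the classic signature of the kernel OOM killer reaping a compiler process: several VariableType_*.cpp translation units compiling in parallel can exhaust the container's memory. PyTorch's setup.py honors the MAX_JOBS environment variable, so capping parallelism (or giving the Docker daemon more RAM/swap) usually gets past it. A hedged sketch of a capped build step, mirroring the failing Dockerfile command:

import os
import subprocess

# Cap build parallelism to trade build time for peak memory; MAX_JOBS is the
# knob PyTorch's build system reads. "2" is an arbitrary conservative choice.
env = dict(os.environ, MAX_JOBS="2")
subprocess.run(["python3", "setup.py", "install"], cwd="pytorch", env=env, check=True)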

Worse accuracy with ONNX version (inference in C++ via OpenCV)

Hello,

I used your script "make_deploy_model.py" to create ONNX versions of the person segmentation models (for all three variants).
Then I ran inference on them using OpenCV:

cv::dnn::Net net = cv::dnn::readNetFromONNX("my_path/erf/model.onnx");
cv::Mat inpBlob = cv::dnn::blobFromImage(image, ...);  // necessary blob image
net.setInput(inpBlob);
cv::Mat output = net.forward();
cv::Mat mask(H, W, CV_32F, output.ptr(0, 1));
and then I applied some post-processing to the mask obtained from the inference output, based on your source code (the necessary information, like img_means for extracting the proper pixels, I took from cfg.yaml).

I would like to ask whether the ONNX version can have worse accuracy, or whether I'm doing something wrong in the post-processing step on the network output. I was also wondering whether the blob-image step can cause some information loss.

Please let me know your opinion.

Sample results comparison: [image]
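
A hedged note on this one: a PyTorch-vs-ONNX accuracy gap in OpenCV is more often caused by preprocessing drift than by the export itself. cv::dnn::blobFromImage computes scalefactor * (image - mean), optionally swapping B/R channels, but it has no per-channel std division, so the (x / 255 - mean) / std normalization from cfg.yaml has to be reproduced by hand, and the channel order checked. A minimal sketch via Python's cv2 that builds the blob explicitly; the mean/std values are placeholders to replace with the img_means/img_stds from the model's cfg.yaml, and the output layout is an assumption:

import cv2
import numpy as np

means = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # placeholder: cfg.yaml img_means
stds = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # placeholder: cfg.yaml img_stds

img = cv2.imread("test.png")                   # HxWx3, BGR, uint8; resize to the net's input size first
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # training normalization assumed to be in RGB order
x = (rgb.astype(np.float32) / 255.0 - means) / stds
blob = x.transpose(2, 0, 1)[None]              # 1 x 3 x H x W

net = cv2.dnn.readNetFromONNX("my_path/erf/model.onnx")
net.setInput(blob)
out = net.forward()                            # assumed 1 x n_classes x H x W logits
mask = out[0].argmax(axis=0).astype(np.uint8)  # per-pixel class indices

If this manually built blob gives the same mask as the PyTorch model, the remaining gap is in the post-processing rather than in the ONNX export.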

int64 not supported for some operations

I have installed all the pip packages in a venv, and when I run pip list, everything matches up. I also installed PyTorch from source. When I attempt to run:

python3 train.py -c /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml --log /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log -p /dev/null

INTERFACE:
config yaml: /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml
log dir /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log
model path /dev/null
eval only False
No batchnorm False

Commit hash (training version): b'5368eed'

Opening config file /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/config/persons/mobilenetv2_test.yaml
model folder doesnt exist! Start with random weights...
Copying files to /tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/log for further reference.
Images from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/train/img
Labels from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/train/lbl
Inference batch size: 3
Images from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/valid/img
Labels from: /tank/home/xury1/segmentation_data/persons/roads_annotated/ds1/valid/lbl
Original OS: 32
New OS: 16.0
[Decoder] os: 8 in: 32 skip: 32 out: 32
[Decoder] os: 4 in: 32 skip: 24 out: 24
[Decoder] os: 2 in: 24 skip: 16 out: 16
[Decoder] os: 1 in: 16 skip: 3 out: 16
Using normalized weights as bias for head.

Couldn't load backbone, using random weights. Error: [Errno 20] Not a directory: '/dev/null/backbone'
Couldn't load decoder, using random weights. Error: [Errno 20] Not a directory: '/dev/null/segmentation_decoder'
Couldn't load head, using random weights. Error: [Errno 20] Not a directory: '/dev/null/segmentation_head'
Total number of parameters: 2154794
Total number of parameters requires_grad: 2154794
Param encoder 1812800
Param decoder 341960
Param head 34
Training in device: cuda
/tank/home/xury1/segmentation/bonnetal/train/tasks/segmentation/bonnetal/lib/python3.5/site-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[IOU EVAL] IGNORE: tensor([], dtype=torch.int64)
[IOU EVAL] INCLUDE: tensor([0, 1])
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    trainer.train()
  File "../../tasks/segmentation/modules/trainer.py", line 302, in train
    scheduler=self.scheduler)
  File "../../tasks/segmentation/modules/trainer.py", line 494, in train_epoch
    evaluator.addBatch(output.argmax(dim=1), target)
  File "../../tasks/segmentation/modules/ioueval.py", line 42, in addBatch
    tuple(idxs), self.ones, accumulate=True)
RuntimeError: "embedding_backward" not implemented for 'Long'
