
InverseForm

This repository contains a version of the InverseForm module.

Shubhankar Borse, Ying Wang, Yizhe Zhang, Fatih Porikli, "InverseForm: A Loss Function for Structured Boundary-Aware Segmentation", CVPR 2021. [arXiv]

Qualcomm AI Research (Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.)

Reference

If you find our work useful for your research, please cite:

```
@inproceedings{borse2021inverseform,
  title={InverseForm: A Loss Function for Structured Boundary-Aware Segmentation},
  author={Borse, Shubhankar and Wang, Ying and Zhang, Yizhe and Porikli, Fatih},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
```

Method

InverseForm is a novel boundary-aware loss term for semantic segmentation, which efficiently learns the degree of parametric transformations between estimated and target boundaries.


This plug-in loss term complements the cross-entropy loss in capturing boundary transformations and allows consistent and significant performance improvement on segmentation backbone models without increasing their size and computational complexity.
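As a rough illustration of how such a plug-in term could be combined with cross-entropy (the names below are illustrative, not this repository's actual API):

```python
import torch.nn.functional as F

# Hypothetical sketch: combining a boundary-transform loss with cross-entropy.
# `inverseform_loss` is assumed to map (predicted, target) boundary maps to a
# scalar measuring the parametric transform between them.
def segmentation_loss(logits, labels, pred_boundary, gt_boundary,
                      inverseform_loss, lam=1.0):
    ce = F.cross_entropy(logits, labels, ignore_index=255)
    boundary = inverseform_loss(pred_boundary, gt_boundary)
    return ce + lam * boundary
```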

*(Example demo: output from our state-of-the-art model trained on the Cityscapes benchmark.)*

This repository contains the implementation of the InverseForm module presented in the paper. It can also run inference on the Cityscapes validation set with models trained using the InverseForm framework. The same models can be validated with the InverseForm block removed, so that no additional compute is added during inference. Here are some of the models you can run inference on, with and without the InverseForm block (checkpoints in the right-most column of the table below):

| Model | mIoU (trained w/o InverseForm) | mIoU (trained w/ InverseForm) | Checkpoint |
| --- | --- | --- | --- |
| HRNet-18 | 77.0% | 77.6% | hrnet18_IF_checkpoint.pth |
| HRNet-16-Slim | 76.1% | 77.8% | hr16s_4k_slim.pth |
| OCRNet-48 | 86.0% | 86.3% | hrnet48_OCR_IF_checkpoint.pth |
| OCRNet-48-HMS | 86.7% | 87.0% | hrnet48_OCR_HMS_IF_checkpoint.pth |

Setup environment

The code has been tested with PyTorch 1.3 and NVIDIA Apex. A Dockerfile is available under the docker/ folder.

Cityscapes path

utils/config.py contains the dataset/directory configuration. Please update CITYSCAPES_DIR to point to your Cityscapes directory. You can download the dataset from https://www.cityscapes-dataset.com/.
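For example, the relevant line in utils/config.py might look like this (the path shown is illustrative):

```python
# utils/config.py -- point this at your local Cityscapes root (illustrative path)
CITYSCAPES_DIR = '/data/datasets/cityscapes'
```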

Inference on Cityscapes

To run inference, this directory needs to be added to your PYTHONPATH:

```bash
export PYTHONPATH="${PYTHONPATH}:/path/to/this/dir"
```

Here are commands to run inference on the models shown above. These examples use 8 GPUs; you can also run with 1, 2, or 4 GPUs by updating the --nproc_per_node argument.

Our pretrained InverseForm module (distance_measures_regressor.pth) can be downloaded from here and should be placed inside the checkpoints/ directory. See usage below.

- HRNet-18-IF

  ```bash
  python -m torch.distributed.launch --nproc_per_node=8 experiment/validation.py --output_dir "/path/to/output/dir" --model_path "checkpoints/hrnet18_IF_checkpoint.pth" --has_edge True
  ```

- HRNet-16-Slim-IF

  ```bash
  python -m torch.distributed.launch --nproc_per_node=8 experiment/validation.py --output_dir "/path/to/output/dir" --model_path "checkpoints/hr16s_4k_slim.pth" --hrnet_base "16" --arch "lighthrnet.HRNet16" --has_edge True
  ```

- OCRNet-48-IF

  ```bash
  python -m torch.distributed.launch --nproc_per_node=8 experiment/validation.py --output_dir "/path/to/output/dir" --model_path "checkpoints/hrnet48_OCR_IF_checkpoint.pth" --arch "ocrnet.HRNet" --hrnet_base "48" --has_edge True
  ```

- HMS-OCRNet-48-IF

  ```bash
  python -m torch.distributed.launch --nproc_per_node=8 experiment/validation.py --output_dir "/path/to/output/dir" --model_path "checkpoints/hrnet48_OCR_HMS_IF_checkpoint.pth" --arch "ocrnet.HRNet_Mscale" --hrnet_base "48" --has_edge True
  ```

To remove the InverseForm operation during inference, simply run without the --has_edge flag. You will notice no drop in performance compared to running with the operation.

Acknowledgements

This repository shares code with other open-source repositories; we would like to acknowledge the researchers who made those repositories open-source.


Issues

I have trained the STN with ResNet on ImageNet; now how do I train the InverseForm net?

Hi! I have read your awesome paper and the appendices, and I have a few questions:

1. Which dataset is used to train the STN: ImageNet or MNIST? If I use ImageNet to train the STN, how many times should I downsample before flattening the feature, and what learning rate should I choose for the STN? Is it the same as for the classification net?

2. If I have trained the STN with ResNet on ImageNet, how do I then train the InverseForm net? In my opinion it looks like this (see the sketch below):
   a) First get the RGB input image from ImageNet, then get the gray edges using a Sobel filter.
   b) Freeze the STN, use the RGB input image to get theta and the affine matrix, and use the gray edges to get the affine edges via the affine matrix.
   c) Get theta hat from the InverseForm net using the gray edges and the affine gray edges, then compute the loss between theta and theta hat.
   Is that correct?

3. Does it matter if I use a different net (ResNet or MobileNet) to train the STN?

4. What's the loss function between the theta from the STN and the theta hat from InverseForm: L2 or L1?
Looking forward to your reply. Thank you!
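A minimal PyTorch sketch of the pipeline described in steps (a)-(c); `stn`, `inverse_net`, `sobel_edges`, and `loader` are hypothetical stand-ins, not part of this repository's API:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of steps (a)-(c); stn, inverse_net, sobel_edges and
# loader are assumed to exist and are not part of this repository.
optimizer = torch.optim.Adam(inverse_net.parameters(), lr=1e-4)
for images in loader:
    edges = sobel_edges(images)                     # (a) gray edge maps, (B, 1, H, W)
    with torch.no_grad():                           # (b) the STN stays frozen
        theta = stn(images).view(-1, 2, 3)          # affine parameters
    grid = F.affine_grid(theta, edges.size(), align_corners=False)
    warped = F.grid_sample(edges, grid, align_corners=False)
    theta_hat = inverse_net(edges, warped)          # (c) predict the transform
    loss = F.mse_loss(theta_hat, theta.flatten(1))  # e.g. L2 between theta and theta hat
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```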

Design question

In your paper you describe InverseForm as being used to compare b_pred with b_gt. The latter is obtained by running a Sobel filter on the ground-truth segmentation. Is there any particular reason we don't do the same to obtain b_pred? Meaning, we could have just taken the predicted segmentation map, run a Sobel operation on that, and fed that in as b_pred.

Put formally in your notation:

`b_gt = sobel(y_gt)`

So why not also

`b_pred = sobel(y_pred)`?
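A minimal sketch of the Sobel-based boundary extraction being proposed, assuming a `(B, 1, H, W)` label tensor (the helper name is illustrative):

```python
import torch
import torch.nn.functional as F

def sobel_boundaries(label_map):
    """Illustrative: binary boundary map from a (B, 1, H, W) label tensor."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                       # Sobel kernel for the y-axis
    x = label_map.float()
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return ((gx.abs() + gy.abs()) > 0).float()    # nonzero gradient = boundary
```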

How to visualize the result?

Thank you for your great work, but could you please tell me how to visualize the results and save them? Thank you.

Question about the code

Hi, I have a question about the loss code. In the ImageBasedCrossEntropyLoss2d function, the resulting nll_loss.weight is no different whether batch_weights is set to True or False. Was the original intent that the i-th target (target[i]) should be the input to the weight calculation when batch_weights is set to False?
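An illustrative sketch of the distinction the question is getting at, one weight vector for the whole batch versus weights recomputed per image (`calculate_weights` is a hypothetical stand-in, not the repository's function):

```python
import torch.nn.functional as F

def weighted_nll(log_probs, targets, calculate_weights, batch_weights=True):
    # batch_weights=True: one weight vector from the whole batch's histogram.
    if batch_weights:
        w = calculate_weights(targets)
        return F.nll_loss(log_probs, targets, weight=w)
    # batch_weights=False: recompute the weights from each image's own
    # histogram, i.e. target[i] feeds the weight calculation.
    loss = 0.0
    for i in range(targets.size(0)):
        w = calculate_weights(targets[i])
        loss = loss + F.nll_loss(log_probs[i:i + 1], targets[i:i + 1], weight=w)
    return loss / targets.size(0)
```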

Two confusions about the code

Hi, I have two points of confusion about the code:

1. According to Section 3.3 of your paper, the InverseNet should predict 8 or 6 values, but the InverseNet in your code only outputs 4 values. Why? What do these 4 elements represent?
2. Is the calculation of the InverseForm loss in your code based on Euclidean distance or geodesic distance?

A question about InverseTransform2D

Hi,
In your code, the class InverseTransform2D calculates the boundary-distance loss. You use `inputs = torch.ge(inputs, 0.5).float()` to get the boundary map, but this seems to break backpropagation. Please make sure this loss term can actually be backpropagated; I believe the requires_grad attribute of mean_square_inverse_loss is False.
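A quick way to check the concern: comparison ops such as `torch.ge` produce tensors that are detached from the autograd graph.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = torch.ge(x, 0.5).float()   # hard threshold: bool tensor cast to float
print(y.requires_grad)         # False -- gradients cannot flow through the
                               # comparison, so anything downstream is cut off
```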

Usage in TensorFlow

How can I use this loss function in TensorFlow? Which portions need to be recoded?

Confusion about the predicted scales and shifts, and the loss

Hi, in another issue you said that InverseForm outputs 4 values which stand for scales and shifts, so if I want to minimize the InverseForm loss, the scales should be close to 1 and the shifts close to 0. But in your code the loss is `(((distance_coeffs*distance_coeffs).sum(dim=1))**0.5).mean()`, which pushes all 4 values toward 0. Why?
Looking forward to your reply. Thank you!

Training the Inverseform Net on custom dataset

Hi! Based on your paper, I understand that you have retrained the IFLoss separately for each dataset (as specified in your supplementary materials). I am presently using a custom aerial imagery dataset for my experiments and would like to follow the same protocol of retraining IFLoss on this dataset. I hope you can offer some insights on how I could go about doing this. Even better, I would appreciate it if you could share the training script, which would save a great deal of time for me.

I'm also curious why you retrained the IFLoss for each dataset separately. Since the InverseForm network is trained only on the GT seg masks, are the structure and shape of the GT seg masks so drastically different between datasets that IFLoss requires retraining on each one?

Looking forward to your reply. Thank you!

How can I use this work, trained for Cityscapes segmentation, on the KITTI dataset?

Thanks for your great work! Since you report excellent performance on the Cityscapes benchmark, I wonder whether I can use your pretrained checkpoints for semantic segmentation on the KITTI dataset. If that is supported, I'd greatly appreciate your advice on how to adjust the dataset config and so on.

Not able to run the given validation/test code

While using the command given in the README:

```bash
python -m torch.distributed.launch --nproc_per_node=8 experiment/validation.py --output_dir "/path/to/output/dir" --model_path checkpoints/hrnet48_OCR_IF_checkpoint.pth --arch "ocrnet.HRNet" --hrnet_base "48" --has_edge True
```

I get the error below:

```
usage: validation.py [-h] [--tag TAG] [--no_run] [--interactive] [--no_cooldir] [--farm FARM] exp_yml
```

All command-line arguments are reported as "unrecognized arguments".

Edge Head Weights

Hello,

I wanted to ask if you would be willing to release the weights of the edge-head that is added to the default semantic segmentation architecture.

Where can I find the appendix materials?

Dear authors,

Thanks for your great work! I am interested in how you conducted your tiling operations and selected your hyperparameters for training the net. You mentioned that these are discussed in the appendix. Could you kindly point me to where I can find the appendix materials for this paper? Thanks!

Question about visualization

Hi, from your code I guess the visualization code is:

```python
import os

import cv2
import numpy as np

# Map train IDs back to label IDs, then save the prediction as a PNG.
prediction = assets['predictions']
prediction_cpu = np.array(prediction)
label_out = np.zeros_like(prediction_cpu)
submit_fn = '{}.png'.format(img_name)
for label_id, train_id in cfg.DATASET.id_to_trainid.items():
    label_out[np.where(prediction_cpu == train_id)] = label_id
cv2.imwrite(os.path.join(self.save_dir, submit_fn), label_out)
```

But when I execute it, no picture is saved, even though the save path is correct. Could you help me? Thanks!
