vinairesearch / magnet

Progressive Semantic Segmentation (CVPR-2021)

License: GNU Affero General Public License v3.0

Shell 2.86% Python 97.14%

segmentation memory-efficiency python3 high-resolution-image

magnet's Issues

Some problems about Gleason dataset

Hi, thanks a lot for sharing your interesting and great work. I downloaded the Gleason dataset, but I have some questions.
1. How did you vote for the final labels in your work? By majority vote?
2. The labels seem to contain more than four classes (0, 1, 3, 4, 5, 6). How do you map them to four classes (benign, Grade 3, Grade 4, and Grade 5)? Do you only calculate metrics for classes 1, 3, 4, and 5?
3. Could you provide the training and testing filenames?
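
For reference, one common reading of "majority vote" over several annotators can be sketched as below. This is only an illustration in plain Python, not the authors' procedure: the function name `majority_vote`, the nested-list label maps, and the `tie_value` marker for pixels without a clear winner are all assumptions.

```python
from collections import Counter


def majority_vote(annotations, tie_value=-1):
    """Fuse per-pixel labels from several annotators (hypothetical helper).

    annotations: list of equally sized 2D label maps (lists of lists of ints).
    tie_value:   assumed marker written where no single label wins.
    """
    height, width = len(annotations[0]), len(annotations[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = Counter(a[y][x] for a in annotations)
            (top, top_count), *rest = votes.most_common(2)
            # A tie exists when the runner-up has as many votes as the winner.
            tied = bool(rest) and rest[0][1] == top_count
            row.append(tie_value if tied else top)
        fused.append(row)
    return fused


if __name__ == "__main__":
    a = [[1, 3], [4, 5]]
    b = [[1, 3], [4, 4]]
    c = [[3, 3], [5, 6]]
    print(majority_vote([a, b, c]))
```

How ties and the extra label values are handled is exactly what the questions above ask, so those choices are left as parameters here.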

Can this code run on Windows?

RuntimeError: shape '[1, 1, -1, 508, 508]' is invalid for input of size 16451136
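
The numbers in this error message can be checked by hand. The sketch below uses plain Python and no MagNet code. It shows that 16451136 is not a multiple of 508 × 508, so the reshape cannot succeed, and that the count factors exactly as 64 × 507 × 507. One possible reading, offered as an assumption rather than a diagnosis, is that the patches reaching the reshape are 507 × 507 instead of 508 × 508.

```python
total_elements = 16451136      # element count reported by the RuntimeError
requested_patch = 508 * 508    # trailing dimensions requested in the reshape

# A reshape to [1, 1, -1, 508, 508] needs the total to be divisible by 508 * 508.
remainder = total_elements % requested_patch

# The same total is an exact multiple of 507 * 507.
patches_if_507 = total_elements // (507 * 507)
exact_with_507 = patches_if_507 * 507 * 507 == total_elements

if __name__ == "__main__":
    print("remainder for 508x508:", remainder)
    print("patches if 507x507:", patches_if_507, "exact:", exact_with_507)
```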

Questions about Binary Semantic Segmentation

Hi, thank you for sharing your code, it works well on the public datasets.
Now I want to train and test this network with my own dataset, which is a binary semantic segmentation task.
So I changed Class_Num to 2, but I ran into some problems I cannot solve.
While training the backbone, I get the following RuntimeError:

File "train.py", line 331, in <module>
main()
File "train.py", line 297, in main
writer_dict,
File "/home/yuming/Documents/MagNet-main/backbone/lib/core/function.py", line 49, in train
losses, _ = model(images, labels)
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yuming/Documents/MagNet-main/backbone/lib/utils/utils.py", line 34, in forward
loss = self.loss(outputs, labels)
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yuming/Documents/MagNet-main/backbone/lib/core/criterion.py", line 37, in forward
return sum([w * self._forward(x, target) for (w, x) in zip(weights, score)])
File "/home/yuming/Documents/MagNet-main/backbone/lib/core/criterion.py", line 37, in <listcomp>
return sum([w * self._forward(x, target) for (w, x) in zip(weights, score)])
File "/home/yuming/Documents/MagNet-main/backbone/lib/core/criterion.py", line 25, in _forward
loss = self.criterion(score.contiguous(), target)
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1121, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/opt/conda_envs/yuming/envs/hu/lib/python3.7/site-packages/torch/nn/functional.py", line 2824, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: weight tensor should be defined either for all or no classes at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:27
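
PyTorch's cross-entropy loss typically raises this error when the number of entries in the `weight` tensor differs from the number of classes in the model output. A minimal sketch of a pre-flight check follows. `check_class_weights` is a hypothetical helper, not part of MagNet, and the traceback does not show where the backbone code defines its weights.

```python
def check_class_weights(num_classes, class_weights=None):
    """Validate per-class loss weights before building the criterion.

    Hypothetical helper: returns the weights as floats, or None when no
    weights are given (the "no classes" case in the error message).
    """
    if class_weights is None:
        return None
    weights = [float(w) for w in class_weights]
    if len(weights) != num_classes:
        raise ValueError(
            "expected %d class weights, got %d" % (num_classes, len(weights))
        )
    return weights


if __name__ == "__main__":
    # Binary segmentation: either two weights or none at all.
    print(check_class_weights(2, [1.0, 1.0]))
    print(check_class_weights(2))
```

Under this reading, switching to two classes also means supplying exactly two weights, or dropping the weights entirely.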

While training MagNet, I get the following IndexError:
File "train.py", line 193, in <module>
main()
File "train.py", line 38, in main
dataset = get_dataset_with_name(opt.dataset)(opt)
File "/home/yuming/Documents/MagNet-main/magnet/dataset/cityscapes.py", line 11, in __init__
super().__init__(opt)
File "/home/yuming/Documents/MagNet-main/magnet/dataset/base.py", line 44, in __init__
self.data += [self.parse_info(line)]
File "/home/yuming/Documents/MagNet-main/magnet/dataset/base.py", line 84, in parse_info
info["label"] = os.path.join(self.root, tokens[1])
IndexError: list index out of range

I've tried a lot, but the situation has not improved.
I would appreciate it if you could help me!
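
The IndexError above comes from `tokens[1]` in `parse_info`, which suggests that at least one line of the datalist file has no second field for the label path. A small sketch for locating such lines is given below. `find_malformed_lines` is a hypothetical helper, and splitting on whitespace is an assumption, since the traceback does not show how MagNet splits a line.

```python
def find_malformed_lines(lines, sep=None):
    """Return (line_number, text) for lines with fewer than two fields.

    sep=None splits on any whitespace (an assumption about the file format).
    Blank lines are reported too, since they also lack a label field.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        tokens = line.strip().split(sep)
        if len(tokens) < 2:
            bad.append((number, line.rstrip("\n")))
    return bad


if __name__ == "__main__":
    sample = ["img/a.jpg\tlabel/a.png\n", "img/b.jpg\n", "\n"]
    for number, text in find_malformed_lines(sample):
        print("line %d is missing a label path: %r" % (number, text))
```

To check a real file, pass an open file object, for example `find_malformed_lines(open(path))` with `path` pointing at the datalist given to `--datalist`.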

Some questions about the training process

How do I train the segmentation model and the refinement model?
I am trying to retrain MagNet on the DeepGlobe dataset, but readme.md gives no example of training MagNet without pretrained backbone parameters. In train.py, the segmentation model is set to eval mode, so its parameters are not updated during training.
For this reason, I changed model.eval() to model.train() on line 46 of train.py. However, the IoU fluctuates during training and shows only a tiny increase after 100 epochs.
Therefore, I would like to know how to train the segmentation model and the refinement model. Are the two models trained separately?

The epoch_IoU of the retrained refinement network only reaches 0.35 on the DeepGlobe dataset

I tried to retrain the segmentation backbone and refinement network following the guideline in readme https://github.com/VinAIResearch/MagNet#training-backbone-networks.
The best_mIoU of the retrained FPN backbone is 0.6363, which is close to the baseline IoU of 0.6722 shown in the readme.

Given this, the performance of the refinement network retrained with the retrained backbone should be close to its performance with the pretrained backbone.
When retraining the refinement network, the epoch_IoU with the pretrained backbone changed as in the following image,

and the epoch_IoU with the retrained backbone changed as in the following image.

With the retrained backbone, the epoch_IoU only reaches 0.35.
I tried to find the difference between the pretrained backbone and the retrained backbone.
I separated the validation part from backbone/train.py to evaluate the performance of the pretrained backbone: https://github.com/DwRolin/temp_code/blob/main/eval_pretrain.py
Strangely, the MeanIU of the pretrained backbone is only 0.07.
I would like to know what causes this contradiction and how to make the retrained refinement network work well.

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I ran the script given in the section "To test with a Deepglobe image", using python demo.py ......, and got the following error:

"RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)"
Could you please help me with that?
Here is the command I ran; I added the --sub_batch_size part:
python demo.py \
--dataset deepglobe \
--image data/639004_sat.jpg \
--scales 612-612,1224-1224,2448-2448 \
--crop_size 612 612 \
--input_size 508 508 \
--model fpn \
--pretrained checkpoints/deepglobe_fpn.pth \
--pretrained_refinement checkpoints/deepglobe_refinement.pth \
--num_classes 7 \
--n_points 0.75 \
--n_patches -1 \
--smooth_kernel 11 \
--save_pred \
--save_dir test_results/demo \
--sub_batch_size 1

Why is the backbone input_size set to 508×508 in the DeepGlobe experiment?

In the Cityscapes experiment, the input size of the backbone network is set to 256×128 and the original image size is 2048×1024, so the input samples are obtained by integer downsampling. In the DeepGlobe experiment, however, the input size is set to 508×508 and the original image size is 2448×2448, so the input samples are not obtained by integer downsampling.

In the refinement stage, the coarse segmentation map is interpolated to the original size. Will the non-integer interpolation cause a loss of object edge details, and is the 508×508 input size necessary?
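
The downsampling factors mentioned in this question can be worked out exactly. The sketch below uses only the sizes quoted above; `downsample_ratio` is a hypothetical helper name.

```python
from fractions import Fraction


def downsample_ratio(original, target):
    """Exact ratio between an original side length and the network input side."""
    return Fraction(original, target)


# Cityscapes: 2048x1024 image, 256x128 backbone input.
cityscapes = (downsample_ratio(2048, 256), downsample_ratio(1024, 128))

# DeepGlobe: 2448x2448 image, 508x508 backbone input.
deepglobe = downsample_ratio(2448, 508)

if __name__ == "__main__":
    print("Cityscapes factors:", cityscapes[0], cityscapes[1])
    print("DeepGlobe factor:", deepglobe, "=", float(deepglobe))
    print("DeepGlobe factor is an integer:", deepglobe.denominator == 1)
```

The Cityscapes factor is 8 in both directions, while the DeepGlobe factor reduces to 612/127, which is about 4.82 and therefore not an integer.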

For Medical Image segmentation

Hi, thanks for your wonderful work! Is MagNet suitable for medical image segmentation? Thanks!

a small question about Deepglobe dataset

Hi,
First of all, thank you for your excellent work. However, I have a small question about Deepglobe dataset.

As far as I know, the original validation and test sets do not provide labels. Are all the val and test sets in the current data split taken from the original training set?
According to your split file and GLNet's (.txt), the total number of train samples is 454 rather than 455. Why do your paper and previous related papers state 455?

Thank you for your attention!

the inputs of the "refinement model" are different between train.py and test.py

Hello @hmchuong,
Thank you for your source code. I read your code and noticed that the inputs of the "refinement model" differ between train.py and test.py. In more detail, the two inputs of the "refinement model" in train.py (crop_preds, fine_pred) are both derived from the backbone model, while the two inputs of the "refinement model" in test.py (scale_early_preds, coarse_preds) are derived from the backbone model and from the refinement model of the previous stage, respectively.

Can you explain why the inputs of the "refinement model" are different between train.py and test.py? Thank you very much!

In train.py

coarse_pred = model(coarse_image).softmax(1) # "model" is backbone model
fine_pred = model(fine_image).softmax(1)
crop_preds = roi_align(coarse_pred, coords, output_size=(opt.input_size[1], opt.input_size[0]))
logits = refinement_model(crop_preds, fine_pred) # --> "crop_preds" and "fine_pred" derived from the backbone model

#--------------------

In test.py

scale_early_preds = get_batch_predictions(model, sub_batch_size, scale_image_patches.to(device))
coarse_preds = roi_align(final_output, [coords[selected_patch_ids]], output_size=(opt.input_size[1], opt.input_size[0])) # "final_output" derived from the refinement model --> "coarse_preds" derived from the refinement model
.
.
fine_pred = get_batch_predictions(
refinement_models[min(len(refinement_models), idx) - 1],
sub_batch_size,
scale_early_preds, # --> "scale_early_preds" from backbone model
coarse_preds, #---> "coarse_preds" from the refinement model of the previous stage
)
.
.
final_output = (
final_output.reshape(1, opt.num_classes, scale[0] * scale[1])
.scatter_(2, error_point_indices, fine_pred) # "fine_pred" derived from the refinement model
.view(1, opt.num_classes, scale[1], scale[0])
)

error about get_gaussian_kernel2d

When I run the demo bash script you provided, I get the following error:

Traceback (most recent call last):
  File "demo.py", line 12, in <module>
    from magnet.utils.blur import MedianBlur
  File "/MagNet/magnet/utils/blur.py", line 125
    kernel: torch.Tensor = get_gaussian_kernel2d(kernel_size, sigma).repeat(channel, 1, 1, 1)
          ^
SyntaxError: invalid syntax
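
The caret in this traceback points at a variable annotation (`kernel: torch.Tensor = ...`), a syntax introduced in Python 3.6 (PEP 526). A SyntaxError at that position is what an older interpreter, such as Python 2.7 or 3.5, would report. The check below is a small sketch for confirming which interpreter runs demo.py; `supports_variable_annotations` is a hypothetical helper name.

```python
import sys


def supports_variable_annotations(version_info=sys.version_info):
    """True when the interpreter accepts `name: Type = value` (Python 3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("running on Python %d.%d" % tuple(sys.version_info[:2]))
    if not supports_variable_annotations():
        print("this interpreter predates variable annotations; use Python 3.6+")
```

If the demo script invokes `python` and that name resolves to an older interpreter on the machine, calling a newer interpreter explicitly is one thing to try.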

some details about the results of experiment

Thank you for sharing your work.
I'm confused by the FPN result reported in your experiments (Table 8, the results on the DeepGlobe dataset). I used your test set and model parameters, but only got 62.86, lower than 67.86.
Did you use any data augmentation when testing the other networks?
Could you please give some more details about that? Thanks.

prepare_cityscapes.sh

The repository has no prepare_cityscapes.sh or prepare_deepglobe.sh.

About the Gleason dataset

My problem is the same as in the linked issue: ##11

Could you provide the train and test filenames and share the labels you used?

Thank you so much!

How to apply parameters trained with train.py to test.py?

Hello! How do I apply the refinement module parameters obtained from running train.py to test.py? I see that you provide refinement module parameters for three scales in the Cityscapes test.

Patches and refined locations

Hi!
If we use 256x128 patches and refine 32768 locations in them, doesn't this mean that we only use the output of the refinement network, overwriting all the pixels predicted by the backbone? Am I missing something? Doesn't "locations" mean pixels? Thank you in advance.
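
The arithmetic behind this question can be written out. A 256x128 patch has exactly 32768 pixels, so refining 32768 locations would cover every pixel of a patch if "locations" means pixels within one patch. The post does not say whether the 32768 figure is counted per patch or over a larger image, so the sketch treats that as an assumption.

```python
patch_width, patch_height = 256, 128
pixels_per_patch = patch_width * patch_height

refined_locations = 32768  # figure quoted in the question

# Assumption: the refined locations are counted within a single patch.
fraction_refined = refined_locations / pixels_per_patch

if __name__ == "__main__":
    print("pixels per patch:", pixels_per_patch)
    print("fraction of the patch refined:", fraction_refined)
```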

How to set parameter 'sub_batch_size'?

Hi, thanks for contributing the code. I found that the parameter 'sub_batch_size' was added in 'magnet/options/test.py', but the demo instructions in README.md do not mention it. How should I set it correctly for this model?

# of required GPUs to reproduce Best outputs

Hello !
Thanks for your great contribution in this field.

I'm setting up to follow your work (MagNet) and wonder how many GPUs are required to run your code.
Specifically, I want to work on the DeepGlobe dataset first with the following command.
Please tell me the number and memory size of the GPUs you used in these experiments!

Best regards,
Yooseung

========================================================================
python train.py --dataset deepglobe \
--root data/deepglobe \
--datalist data/list/deepglobe/train.txt \
--scales 612-612,1224-1224,2448-2448 \
--crop_size 612 612 \
--input_size 508 508 \
--num_workers 8 \
--model fpn \
--pretrained checkpoints/deepglobe_fpn.pth \
--num_classes 7 \
--batch_size 8 \
--task_name deepglobe_refinement \
--lr 0.001

or in short, run the script below
sh scripts/deepglobe/train_magnet.sh

input to the model

Does your model require an input image of size 1024*2048?

About the result of deepglobe dataset

Hi!
For the DeepGlobe dataset, I have some questions about reproducing the results.
This is the result of training the backbone network and refinement modules myself:

I have run many experiments and still cannot reproduce the results of the original paper.

I would like to know what is wrong. Also, there are two groups of Coarse iou and Refinement iou. What does each represent?
I hope to get your answer. Thank you very much!

missing file "hrnet_ocr_w18_train_256x128_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml"

Hi @hmchuong ,
I tried to train the backbone networks following the instructions:

In ./backbone

python train.py --cfg experiments/cityscapes/hrnet_ocr_w18_train_256x128_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml

Can you upload the file "hrnet_ocr_w18_train_256x128_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml"? Or is there any other way to run the above command? (I don't see the --cfg argument in the code.)
Thank you.

demo.py: error: --sub_batch_size

usage: demo.py [-h] --dataset DATASET [--root ROOT] [--datalist DATALIST] --scales SCALES --crop_size N
[N ...] --input_size N [N ...] [--num_workers NUM_WORKERS] --model MODEL --num_classes
NUM_CLASSES --pretrained PRETRAINED
[--pretrained_refinement PRETRAINED_REFINEMENT [PRETRAINED_REFINEMENT ...]] [--image IMAGE]
--sub_batch_size SUB_BATCH_SIZE [--n_patches N_PATCHES] --n_points N_POINTS
[--smooth_kernel SMOOTH_KERNEL] [--save_pred] [--save_dir SAVE_DIR]
demo.py: error: --sub_batch_size
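
In the usage text above, `--sub_batch_size SUB_BATCH_SIZE` appears without square brackets, which is how argparse marks a required option. The sketch below reproduces that behaviour with a stand-in parser. It is not MagNet's option code: `build_parser` is a hypothetical helper, and the default of 1 in the optional variant is an assumed value for illustration only.

```python
import argparse


def build_parser(required=True):
    """Stand-in parser with a single --sub_batch_size option (hypothetical)."""
    parser = argparse.ArgumentParser(prog="demo.py")
    parser.add_argument(
        "--sub_batch_size",
        type=int,
        required=required,
        # Assumed default, used only when the option is made optional.
        default=None if required else 1,
        help="number of patches sent through the network at once",
    )
    return parser


if __name__ == "__main__":
    # Passing the flag explicitly satisfies the required option.
    print(build_parser().parse_args(["--sub_batch_size", "1"]))
    # With a default, the flag can be omitted.
    print(build_parser(required=False).parse_args([]))
```

With the parser as shipped, adding `--sub_batch_size` followed by an integer to the demo command avoids the error.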

AttributeError in test_magnet.sh Script

I encountered an AttributeError while running the test_magnet.sh script in the MagNet project. The issues seem to come from deprecated use of PyTorch and NumPy functions.

Steps to Reproduce:

Run the test_magnet.sh script.
Observe output:
/MagNet/magnet/utils/metrics.py:22: FutureWarning: In the future np.bool will be defined as the corresponding NumPy scalar.
AttributeError: module 'numpy' has no attribute 'bool'.

Proposed Solution
Replace np.bool with bool in metrics.py. Suggested change in line 22:
k = (x >= 0) & (y < n) & (x != ignore_label) & (mask.astype(bool))

NumPy version: 1.26.4, PyTorch version: 1.12.1

How to train without using pretrained weights?

Hi, thanks for contributing the code. How can I train without using pretrained weights?
Thank you!!
I love you!

RuntimeError: CUDA error: out of memory

Traceback (most recent call last):
File "train.py", line 331, in <module>
main()
File "train.py", line 120, in main
scale_factor=config.TRAIN.SCALE_FACTOR,
File "/home/cv428/Students/LH/MagNet-main/backbone/lib/datasets/cityscapes.py", line 118, in __init__
1.0507,
RuntimeError: CUDA error: out of memory

I used a 3080 with 12 GB.
I set:
BASE_SIZE: 8
BATCH_SIZE_PER_GPU: 1
SCALE_FACTOR: 1
but it still runs out of memory.
Help me!

Training details on methods in Table 4

Hi,

Thanks for your interesting work.

I'm curious about some details regarding comparison methods in table 4 from your paper. Are all methods compared in the table trained and tested on 256x128 images as you mentioned in sec. 4.2? Could you provide more details on how you trained your model compared to the baseline "downsample" and "patching" methods?

Minor side note, from your backbone training config it seems that you are using HRNetv2-W18s rather than HRNetv2-W18. Am I missing something?

Thanks again.