kidswithtokens / medsegdiff
Medical Image Segmentation with Diffusion Model
License: MIT License
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.
I would like to ask which part of the path is extracted as the slice ID here. My data may be organized differently from yours, and the error is reported when the code reaches this line.
elif args.data_name == 'BRATS':
    # slice_ID = path[0].split("_")[2] + "_" + path[0].split("_")[4]
    slice_ID = path[0].split("_")[-3] + "_" + path[0].split("slice")[-1].split('.nii')[0]
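The extraction can be traced with a hypothetical BRATS-style filename (an assumption; the real paths depend on your dataset layout):

```python
# Hypothetical BRATS-style filename (an assumption; real paths depend on the dataset layout).
path = "BraTS20_Training_001_flair_slice75.nii.gz"

# Old scheme from the commented-out line: fixed positions after splitting on "_".
parts = path.split("_")
old_id = parts[2] + "_" + parts[4]  # with this name it keeps the extension

# Current scheme: third-from-last "_" field plus the slice number after "slice".
slice_id = path.split("_")[-3] + "_" + path.split("slice")[-1].split(".nii")[0]
print(slice_id)  # -> 001_75
```

If your filenames do not contain "slice" or use different "_" positions, this line will produce wrong IDs or crash, which matches the symptom described above.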
(1) My understanding is that mse_diff here predicts the noise: target (the added noise) has shape [b,1,h,w], but model_output has shape [b,2,h,w]. In the last issue you answered that the two channels represent the mean and variance; can you explain the significance of taking the MSE between them?
(2) In loss_cal the target is the segmentation GT; does the model's cal output represent the predicted segmentation result? Can the cal output be used directly to represent the segmentation accuracy of the model in the inference stage?
(3) Can you explain the meanings of sample, x_noisy, org, cal, and cal_out respectively?
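For question (1), a minimal numpy sketch of the channel split, assuming the guided_diffusion convention that with learn_sigma=True the model outputs 2C channels, of which only the first half enters the MSE against the added noise (shapes here are illustrative):

```python
import numpy as np

b, h, w = 2, 8, 8
model_output = np.random.randn(b, 2, h, w)  # [b, 2C, h, w] with C = 1

# Split along the channel axis: the first half is the noise (mean) prediction,
# the second half parameterises the learned variance.
eps_pred, var_raw = np.split(model_output, 2, axis=1)

# The MSE term compares only the noise prediction against the added noise.
target = np.random.randn(b, 1, h, w)  # the noise that was added
mse = np.mean((target - eps_pred) ** 2)
```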
This error occurs:
Original Traceback (most recent call last):
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/ppw/works/MedSegDiff/guided_diffusion/unet.py", line 775, in forward
uemb, cal = self.highway_forward(c, [hs[3],hs[6],hs[9],hs[12]])
File "/data1/ppw/works/MedSegDiff/guided_diffusion/unet.py", line 744, in highway_forward
return self.hwm(x,hs)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/ppw/works/MedSegDiff/guided_diffusion/unet.py", line 2152, in forward
h = self.ffparserd
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/ppw/works/MedSegDiff/guided_diffusion/unet.py", line 479, in forward
x = x * weight
RuntimeError: The size of tensor a (129) must match the size of tensor b (65) at non-singleton dimension 3
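The 129-vs-65 mismatch is consistent with the FF-Parser weight being sized for the half-spectrum of a different resolution: for a real input of width W, rfft2 keeps only W // 2 + 1 frequencies in the last axis (256 → 129, 128 → 65). A minimal numpy sketch of the shape relationship:

```python
import numpy as np

# An FF-Parser-style module stores a learnable weight sized for the half-spectrum
# of a fixed training resolution: for a real input of width W, rfft2 keeps only
# W // 2 + 1 frequencies in the last dimension.
for size in (256, 128):
    x = np.random.randn(1, 1, size, size)
    spec = np.fft.rfft2(x, axes=(-2, -1))
    print(size, "->", spec.shape[-1])  # 256 -> 129, 128 -> 65
```

So a weight built for image_size 256 cannot multiply the spectrum of a 128-sized input, and vice versa; the image_size argument and the actual data size must match.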
Thanks for the great work; I have a problem. My dataset has four parts: train_images, train_mask, test_images, test_mask. There are no JSON or CSV documents. Should I create a JSON file in COCO format for my dataset, or just use the mask images? Thank you!
Hi Wu, I am a beginner to the medical imaging processing. Could you share the DDTI dataset and example cases? Thanks a lot.
I encountered the following problem when training with the BRATS dataset! Can you help me? Thanks!
File "D:\jace\pythonProject\MedSegDiff-master\MedSegDiff-master\guided_diffusion\train_util.py", line 83, in init
self._load_and_sync_parameters()
File "D:\jace\pythonProject\MedSegDiff-master\MedSegDiff-master\guided_diffusion\train_util.py", line 139, in _load_and_sync_parameters
dist_util.sync_params(self.model.parameters())
File "D:\jace\pythonProject\MedSegDiff-master\MedSegDiff-master\guided_diffusion\dist_util.py", line 76, in sync_params
dist.broadcast(p, 0)
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
I ran segmentation_sample.py and met this problem:
Logging to /root/autodl-tmp/MedSegDif/med_results/img_out/
creating model and diffusion...
sampling...
no dpm-solver
/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py:1709: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "MedSegDif/med_scripts/segmentation_sample.py", line 163, in
main()
File "MedSegDif/med_scripts/segmentation_sample.py", line 109, in main
sample, x_noisy, org, cal, cal_out = sample_fn(
File "/root/autodl-tmp/./MedSegDif/med_guided_diffusion/gaussian_diffusion.py", line 553, in p_sample_loop_known
for sample in self.p_sample_loop_progressive(
File "/root/autodl-tmp/./MedSegDif/med_guided_diffusion/gaussian_diffusion.py", line 624, in p_sample_loop_progressive
out = self.p_sample(
File "/root/autodl-tmp/./MedSegDif/med_guided_diffusion/gaussian_diffusion.py", line 435, in p_sample
out = self.p_mean_variance(
File "/root/autodl-tmp/./MedSegDif/med_guided_diffusion/respace.py", line 90, in p_mean_variance
return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
File "/root/autodl-tmp/./MedSegDif/med_guided_diffusion/gaussian_diffusion.py", line 319, in p_mean_variance
model_mean, _, _ = self.q_posterior_mean_variance(
File "/root/autodl-tmp/./MedSegDif/med_guided_diffusion/gaussian_diffusion.py", line 219, in q_posterior_mean_variance
assert x_start.shape == x_t.shape
AssertionError
I know that it is because x_start.shape is not equal to x_t.shape. However, my dataset is similar to ISICDataset, so this seems very strange.
Thanks a lot if you can reply.
Hello,
I'm currently training this model on my own dataset. I created a separate dataloader Python file; the file and folder structure of the dataset is the same as ISIC. No code other than segmentation_train.py and segmentation_sample.py was changed, and only to load the data. The model has been trained for 30,000 steps so far. But when I tried to use segmentation_sample.py on the test images, I got these masks.
Are these mask outputs normal for this model?
python 3.8.16
torch 1.13.1
torchvision 0.14.1
torchsummary 1.5.1
opencv 4.7.0.68
scikit-image 0.19.3
Thank you for you great job!
Can I use the parameters "--diffusion_steps 50 --dpm_solver True" during the training process, or can they only be used for sampling?
Hi! Thanks for your excellent work. I successfully trained a MedSegDiff-B model on my dataset but have trouble sampling.
Specifically, while using DPM-Solver to sample, GPU memory usage grows with the num_ensemble parameter. On each ensemble run (1/5), GPU memory increases by around 2 GB and finally collapses with a "CUDA out of memory" error.
This lets me sample only one image before the inference process collapses. Is this a normal phenomenon? If not, how can I deal with it?
PS: the original inference process can sample images without increasing GPU memory.
Thanks for your great work and your effort in sharing this code. I am wondering: is it stable to use MSE loss for training segmentation tasks? Usually we use cross-entropy loss for this task, and that is what I am curious about.
Thanks for reading this issue and I am looking forward to your reply!
I looked at others' issues; someone said to change the PyTorch version to 1.8.1, but when I tried it, it didn't work. I also tried other versions of PyTorch, and they still didn't work.
Hi, could you please provide your pre-trained models? I trained a model, but the sampling result is not right. The maximum pixel value is about 10, so the pictures are all black.
Hi,
It is an excellent project to share. I have a question when running the program: is the number of steps set to 1000 during training, with only 100 steps used during inference?
Thanks if the question can be answered~
Best,
CaviarLover
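The 1000-vs-100 question above corresponds to timestep respacing. A minimal numpy sketch, assuming an evenly spaced subsequence in the spirit of guided_diffusion's space_timesteps:

```python
import numpy as np

# Respacing sketch (an assumption: an evenly spaced subsequence, in the spirit
# of guided_diffusion's space_timesteps): train with 1000 steps, sample with 100.
train_steps = 1000
sample_steps = 100

# Pick 100 of the 1000 trained timesteps, evenly spaced from 0 to 999.
use_timesteps = np.linspace(0, train_steps - 1, sample_steps).round().astype(int)
```

At sampling time the model is evaluated only at these retained timesteps, with the noise schedule recomputed for the shorter chain.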
May I ask how many epochs do you train to obtain the result in this paper?
Hi Junde Wu,
I have some questions for you.
The hyper-parameter in_ch=2 is fixed regardless of binary or multi-class task, where the two channels are the image and the mask.
For a multi-class task, if all we are supposed to change is the calibration output, i.e. sigmoid to softmax, then we get a [1, 3, H, W] calibration and a [1, 2, H, W] model_output. Is that correct?
If we change in_ch = 3 + 1 (one-hot plus the image condition), we can get the [1, 3, H, W] calibration; however, I do not know what the model_output would be. Is it something like [1, 3, 2, H, W], or still [1, 2, H, W]? If the latter, using the mask rather than one-hot as the input to the diffusion model seems more meaningful?
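A minimal numpy sketch of the one-hot layout discussed above, with a hypothetical 3-class mask (shapes illustrative, not the repo's exact ones):

```python
import numpy as np

# Hypothetical 3-class mask; the shapes are illustrative, not the repo's exact ones.
mask = np.array([[0, 1],
                 [2, 1]])  # [H, W] with class indices
num_classes = 3

# One-hot encoding to [C, H, W]; concatenated with a 1-channel image this
# would give in_ch = C + 1.
one_hot = (np.arange(num_classes)[:, None, None] == mask[None]).astype(np.float32)
```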
I grouped a 5-class task into a binary case to check the results. Here is one visualization; is it correct? From top to bottom: image, recovery from the diffusion model, calibration, and the linear combination of the recovery and calibration.
Thanks!
Ping
Hello, may I know how you sliced the 3D BraTS data into 2D data in order to put it in the directory?
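One common approach (an assumption, not necessarily the authors' exact pipeline) is to load each volume and save every 2D slice along the last axis; a minimal numpy sketch, using a random stand-in for nibabel.load(...).get_fdata():

```python
import numpy as np

# Random stand-in for a BraTS volume; with real data this would be
# nibabel.load("case.nii.gz").get_fdata() (nibabel is an assumption here).
volume = np.random.rand(240, 240, 155)

# Keep every 2D slice along the last axis; saving/naming is up to your pipeline.
slices = [volume[:, :, i] for i in range(volume.shape[2])]
```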
defaults.update({k: v for k, v in model_and_diffusion_defaults().items() if k not in defaults})
Hi, I believe this is what you want to have; otherwise the values will be overwritten by the predefined defaults.
Hello, I ran scripts/segmentation_train.py on my own dataset and I met this problem:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
Thank you !
Are the scores in Table I of MedSegDiff the official test scores or 5-fold cross-validation scores? nii.gz files need to be uploaded to the BraTS2020 website to get the official test score, but I cannot find a nii.gz-generating section in the source code, so I don't know how to get the test score of a trained model.
Hi, thanks for your great work! 😊 Could you tell me how to use this part of the code? Can I just use "return v[(...,) + (None,)*(dims - 1)]" to replace lines 1257-1261?
My programming ability is not very good, so I cannot understand this part. Why did you write it like this? Looking forward to your reply!
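The indexing trick in the question appends singleton axes so a per-batch scalar broadcasts against a full tensor; it is a dimension-agnostic version of v[:, None, None, None]. A minimal numpy sketch:

```python
import numpy as np

v = np.array([0.1, 0.2])   # one coefficient per batch element, shape [B]
x = np.ones((2, 3, 4, 4))  # a [B, C, H, W] tensor
dims = x.ndim

# Append (dims - 1) singleton axes so v broadcasts over C, H, W.
# Equivalent to v[:, None, None, None] when dims == 4, but works for any dims.
expanded = v[(...,) + (None,) * (dims - 1)]
```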
When I run scripts/segmentation_train.py I have a problem.
Traceback (most recent call last):
File "D:\jace\pythonProject\MedSegDiffv2-master\scripts\segmentation_train.py", line 110, in
main()
File "D:\jace\pythonProject\MedSegDiffv2-master\scripts\segmentation_train.py", line 62, in main
TrainLoop(
File "D:\jace\pythonProject\MedSegDiffv2-master\guided_diffusion\train_util.py", line 83, in init
self._load_and_sync_parameters()
File "D:\jace\pythonProject\MedSegDiffv2-master\guided_diffusion\train_util.py", line 139, in _load_and_sync_parameters
dist_util.sync_params(self.model.parameters())
File "D:\jace\pythonProject\MedSegDiffv2-master\guided_diffusion\dist_util.py", line 78, in sync_params
dist.broadcast(p, 0)
File "C:\SoftWare\python 3.10\lib\site-packages\torch\distributed\distributed_c10d.py", line 1408, in broadcast
work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Hello, I found that the DPM solver appears only in the function p_sample_loop_known, but I cannot find it in the training code. Can you tell me why?
Can you provide the environment.yaml or requirements.txt file?
What parameters or arguments should I revise, or what else can I do?
In the MedSegDiff + TransUNet row, the avg should be 0.849, not 0.762.
Hi!
Thanks for this repo, really exciting stuff!
I have a sagittal MRI dataset that has the following dimensions: (512,512,7)
(H, W, Slice)
in NIFTI format. What should the input be for the network to train? In my understanding, since the autoencoder is a 2D U-Net, the network will be trained on each slice of each patient individually; however, I'm a bit confused about what the input to the network should be.
Thank you for your excellent work. I wonder how many iterations are used for training, since I cannot find a condition to stop training. Thank you.
Hi there, nice work.
Can you provide your training and testing split for the BRATS21 dataset? I am trying to reproduce your work, so I would like to know how to create the samples I need to train and infer on. In the paper you wrote "Train/validation/test sets are split following the default settings of the dataset", but their validation and test splits don't have labels. Can you tell me how to find them?
Also, did you do any preprocessing other than slicing the images from 3D to 2D?
I downloaded the BraTS dataset and renamed the folder as the README instructed,
I have a question regarding loss calculation: for training,
loss = (losses["loss"] * weights + losses['loss_cal'] * 10).mean()
is used.
Why do you weight the direct prediction of the ground truth higher compared to the comparison with a less noisy version?
Also, is there a reason that, at inference, a different composition of cal and sample is used depending on the Dice score?
Thanks in advance!
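The combined objective quoted above can be sketched with placeholder numbers; only the factor 10 on loss_cal comes from the quoted line, everything else is illustrative:

```python
import numpy as np

# Placeholder values; only the factor 10 on loss_cal comes from the quoted line.
weights = np.ones(4)                           # per-sample loss weights
loss = np.array([0.30, 0.25, 0.40, 0.35])      # diffusion (noise-prediction) MSE
loss_cal = np.array([0.02, 0.03, 0.01, 0.02])  # calibration loss vs. the GT mask

total = (loss * weights + loss_cal * 10).mean()
```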
Excellent work!
I'm a beginner in the field of deep learning.
I have a question: when I run segmentation_sample.py, what's the difference between savedmodel_XXXX.pt, optsavedmodel_XXXX.pt, and emasavedmodel_XXXX.pt?
Thanks a lot.
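For context on the three checkpoint types: in guided_diffusion-style training, savedmodel_*.pt holds the raw model weights, optsavedmodel_*.pt the optimizer state, and emasavedmodel_*.pt an exponential moving average of the weights, which is usually the copy to sample with. A minimal sketch of the EMA update, assuming the conventional rate of 0.9999:

```python
import numpy as np

# EMA update sketch (assumption: guided_diffusion-style EMA with rate 0.9999).
rate = 0.9999
param = np.array([1.0, 2.0])  # current model weights
ema = np.array([0.5, 1.5])    # running EMA copy of the weights

ema = ema * rate + param * (1 - rate)
```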
For one image, the model needs to run 1000 timesteps of inference, and one step takes about 2 minutes; how can I speed up the inference process?
Dear authors:
I have some questions about the function of the highway_forward (Generic_UNet). Detailed as follows:
I hope for your response sincerely. Thanks a lot!
I had high hopes for diffusion-based semantic segmentation, but the results of the V1 code on my own dataset are not ideal. I hope V2 can be released as soon as possible.
Original Traceback (most recent call last):
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/ppw/works/MedSegDiff/guided_diffusion/unet.py", line 773, in forward
h = module(h, emb)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/ppw/works/MedSegDiff/guided_diffusion/unet.py", line 86, in forward
x = layer(x)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/data1/ppw/anaconda3/envs/PyTorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [128, 5, 3, 3], expected input[8, 4, 256, 256] to have 5 channels, but got 4 channels instead
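The mismatch (5 channels expected vs. 4 given) suggests the noisy mask channel was not concatenated with the 4 BRATS modalities. A minimal numpy sketch of the expected concatenation; the channel order here is an assumption:

```python
import numpy as np

# The 4 BRATS modalities plus the 1-channel noisy mask give the 5 input
# channels the first conv expects; the channel order here is an assumption.
image = np.random.randn(8, 4, 256, 256)  # 4 MRI modalities
x_t = np.random.randn(8, 1, 256, 256)    # noisy segmentation mask

model_input = np.concatenate([x_t, image], axis=1)
```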
Error using official run mode.
mentation_sample.py --data_dir /home/yp/diskdata/workspace/medsegdiff/dataset/ISIC --model_path /home/yp/diskdata/workspace/medsegdiff/results/savedmodel020000.pt --image_size 256 --num_channels 128 --class_cond False --num_res_blocks 2 --num_heads 1 --learn_sigma True --use_scale_shift_norm False --attention_resolutions 16 --diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False --rescale_timesteps False --num_ensemble 5