
A question related to batch size and training speed about denoising-diffusion-pytorch HOT 2 OPEN

paidaxinbao commented on June 26, 2024

A question related to batch size and training speed

from denoising-diffusion-pytorch.

Comments (2)

gggah commented on June 26, 2024

Hi!

The code in this repository has helped me a lot!

I found that as the batch size increases, the training time increases dramatically. When I set the batch size to 4 (the dataset has 25k images) the training time is about 2 days, but when the batch size is set to 128, the training time increases to 800 hours!

I don't know much about this.

My training configuration is as follows:

    model = Unet(
        dim=64,
        out_dim=1,
        dim_mults=(1, 2, 4, 8),
        channels=2
    )

    diffusion = GaussianDiffusion(
        model,
        image_size=128,
        timesteps=1000,             # number of steps
        sampling_timesteps=250,     # number of sampling timesteps (using ddim for faster inference [see citation for ddim paper])
    )

    trainer = Trainer(
        diffusion,
        '/home/pxy/ML_work/train_picset/',
        train_batch_size=4,
        train_lr=8e-5,
        train_num_steps=700000,         # total training steps
        gradient_accumulate_every=4,    # gradient accumulation steps
        ema_decay=0.995,                # exponential moving average decay
        amp=True,                       # turn on mixed precision
        calculate_fid=False
    )

    trainer.train()
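One possible reading of the timings, sketched as arithmetic: the configuration above fixes `train_num_steps`, so the number of images processed over the whole run grows in proportion to the batch size. The sketch below assumes that only `train_batch_size` was changed (4 to 128) while `train_num_steps=700000` and `gradient_accumulate_every=4` stayed the same, and that each step consumes `train_batch_size * gradient_accumulate_every` images. The helper `images_processed` is a hypothetical name for illustration, not part of the library.

```python
def images_processed(train_num_steps, train_batch_size, gradient_accumulate_every):
    """Total images seen over a run, assuming every step consumes
    train_batch_size * gradient_accumulate_every images (an assumption)."""
    return train_num_steps * train_batch_size * gradient_accumulate_every


DATASET_SIZE = 25_000   # from the post: "the dataset has 25k images"
TRAIN_NUM_STEPS = 700_000
ACCUMULATE = 4          # assumed unchanged between the two runs

small = images_processed(TRAIN_NUM_STEPS, 4, ACCUMULATE)
large = images_processed(TRAIN_NUM_STEPS, 128, ACCUMULATE)

# How much more data the batch-128 run has to push through the model.
ratio = large / small

# Passes over the 25k-image dataset in each run.
epochs_small = small / DATASET_SIZE
epochs_large = large / DATASET_SIZE

# Step count at batch size 128 that would process the same number of
# images as 700000 steps at batch size 4.
equivalent_steps = TRAIN_NUM_STEPS * 4 // 128

print(f"work ratio: {ratio:.0f}x")
print(f"epochs at batch 4: {epochs_small:.0f}, at batch 128: {epochs_large:.0f}")
print(f"equal-work steps at batch 128: {equivalent_steps}")
```

Under these assumptions the batch-128 run processes 32 times as many images, so a much longer estimate is expected rather than a sign that something is broken. The reported jump from about 2 days (roughly 48 hours) to 800 hours is closer to 17 times, which would be consistent with larger batches being somewhat more efficient per image. If the goal is comparable total training rather than a comparable number of optimizer steps, lowering `train_num_steps` along these lines may be worth trying.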

May I ask what your data format is? Mine is not recognized: the error says there should be more than 100 images, but I have 1200 pictures.
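A quick way to investigate a "not enough images" complaint is to count what is actually in the folder by file extension. The sketch below is a diagnostic only: the extension list `ASSUMED_EXTS` is an assumption about what the data loader accepts (check the library source for the real list), and `count_images` is a hypothetical helper, not a library function. Matching here is case-sensitive on purpose, so files such as `.JPG` or `.bmp` show up as unmatched.

```python
from collections import Counter
from pathlib import Path

# Assumption: the loader only picks up files with these lowercase extensions.
ASSUMED_EXTS = ("jpg", "jpeg", "png", "tiff")


def count_images(folder, exts=ASSUMED_EXTS):
    """Return (number of files whose extension is in exts,
    Counter of every file suffix found under folder)."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    matched = [p for p in files if p.suffix.lstrip(".") in exts]
    all_suffixes = Counter(p.suffix for p in files)
    return len(matched), all_suffixes


# Example usage (the path is a placeholder):
# n, suffixes = count_images("path/to/your/images")
# print(n, suffixes.most_common())
```

If the matched count is far below the number of pictures in the folder, the suffix counts should show which extensions are being skipped.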


paidaxinbao commented on June 26, 2024

Hi, my data consists of grayscale images. Following the conditioning strategy in SR3, I concatenate the original grayscale image with the input to the Unet. I didn't understand what you mean by "error"; do you mean the loss?
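For readers unfamiliar with this kind of conditioning, here is a minimal shape-level sketch of channel concatenation, using NumPy arrays in place of real tensors. It assumes single-channel 128x128 images and that the noisy target comes first in the channel order; both the ordering and the variable names are illustrative assumptions, not the poster's actual code. The resulting two-channel input and one-channel output would line up with `channels=2` and `out_dim=1` in the configuration.

```python
import numpy as np

batch, height, width = 4, 128, 128
rng = np.random.default_rng(0)

# Conditioning image: the original grayscale image, one channel.
condition = rng.random((batch, 1, height, width), dtype=np.float32)

# Stand-in for the noisy target at some diffusion timestep, one channel.
noisy_target = rng.standard_normal((batch, 1, height, width)).astype(np.float32)

# Concatenate along the channel axis (axis=1 in N, C, H, W layout),
# giving a two-channel input for the network.
unet_input = np.concatenate([noisy_target, condition], axis=1)

print(unet_input.shape)
```

The network then only has to predict a single channel for the target, while the conditioning channel passes through unchanged as extra input.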
