
Comments (12)

xichenpan commented on August 26, 2024

Yes, we tried freezing CLIP, BLIP, and ResNet in our very early experiments and the performance was still acceptable, but we did not run the whole experiment with this setting or test FID scores.
By the way, for acceptable performance the model does not need to be trained for 50 epochs; 3-5 epochs are enough, which also reduces your compute cost.

from arldm.
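The freezing described above can be sketched in plain PyTorch. This is a minimal, hypothetical illustration (the `nn.Linear` stand-ins are not the real ARLDM modules): set `requires_grad` to `False` on the frozen encoders and hand the optimizer only the parameters that still require gradients.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the CLIP/BLIP/ResNet encoders and the
# diffusion backbone; the real ARLDM modules are not shown in this thread.
clip_encoder = nn.Linear(8, 8)
blip_encoder = nn.Linear(8, 8)
resnet = nn.Linear(8, 8)
unet = nn.Linear(8, 8)

# Freeze the three encoders: no gradients are computed or stored for them.
for module in (clip_encoder, blip_encoder, resnet):
    module.requires_grad_(False)
    module.eval()  # also stop dropout/batch-norm statistics updates

# Give the optimizer only the parameters that still require gradients,
# so no optimizer state is allocated for the frozen weights.
trainable = [p for m in (clip_encoder, blip_encoder, resnet, unet)
             for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Besides skipping the encoders' backward pass, this also avoids allocating Adam's momentum/variance state for them, which is where much of the memory saving comes from.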

xichenpan commented on August 26, 2024

Yes, we tried freezing CLIP, BLIP, and ResNet in our very early experiments and the performance was still acceptable, but we did not run the whole experiment with this setting or test FID scores.

Hi, I want to know approximately how long it will take if I freeze all three and train on 4 A100s? I really appreciate this great open-source project :)

@bibisbar Hi, with the unfrozen setting, the forward pass for batch_size=1 on a single A100 GPU takes 0.5s, and freezing will not change this much. The backward pass also takes 0.5s, and freezing the gradients will accelerate it; I guess it may reduce that cost by 50% at most. So freezing only slightly shortens training time, but it does reduce memory usage (which I think is more important, as in parameter-efficient tuning).


skywalker00001 commented on August 26, 2024

@skywalker00001 Hi, I am sorry, it seems PyTorch Lightning does not support this setting. Lightning-AI/lightning#49

Thanks. The other approaches (freezing ResNet, the CLIP embedding, and the BLIP embedding), together with amp and the 8-bit optimizer, successfully reduced VRAM usage to about 40GB on my A6000 for batch_size = 1.


xichenpan commented on August 26, 2024

It seems to be a PyTorch Lightning problem. We trained the model with batch_size=1 on A100 GPUs with 80GB of VRAM (using around 70+GB). Gradient checkpointing, amp, and the 8-bit optimizer can greatly reduce the VRAM requirement. You can also set freeze_clip=True, freeze_blip=True, freeze_resnet=True to reduce VRAM usage;
see https://github.com/huggingface/diffusers/tree/main/examples/dreambooth and https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html. It is still not possible to run the model on a 3080 Ti, but I guess it could be trained on a V100 (32GB) or even a 3090 (24GB) after the above modifications.

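The amp part of the recipe above can be sketched with PyTorch's built-in autocast. This is a generic illustration, not ARLDM's actual training loop (the `nn.Linear` model is a hypothetical stand-in); the forward pass runs in reduced precision while parameters and the optimizer step stay in float32.

```python
import torch
from torch import nn

model = nn.Linear(16, 4)  # hypothetical stand-in for the diffusion model
x = torch.randn(2, 16)

# Autocast runs matmuls in reduced precision, roughly halving activation
# memory. On GPU you would use device_type="cuda" with float16 plus a
# torch.cuda.amp.GradScaler; bfloat16 on CPU needs no loss scaling.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)              # computed in bfloat16
    loss = y.float().mean()   # reduce in float32 for stability
loss.backward()               # gradients accumulate in float32
```

The 8-bit optimizer mentioned in the thread (e.g. `bitsandbytes.optim.AdamW8bit`) is complementary: it stores the Adam momentum/variance state in 8 bits instead of 32, shrinking optimizer memory roughly 4x.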

kriskrisliu commented on August 26, 2024

Nice!
Well, gradient checkpointing, amp, and the 8-bit optimizer seem to be optional choices; I'll give them a try, or maybe use more powerful GPUs instead...
Actually, I'm curious about freezing CLIP, BLIP, and ResNet, which means freezing most of the parameters. Can that still produce comparable results?


bibisbar commented on August 26, 2024

Yes, we tried freezing CLIP, BLIP, and ResNet in our very early experiments and the performance was still acceptable, but we did not run the whole experiment with this setting or test FID scores.

Hi, I want to know approximately how long it will take if I freeze all three and train on 4 A100s? I really appreciate this great open-source project :)


skywalker00001 commented on August 26, 2024

It seems to be a PyTorch Lightning problem. We trained the model with batch_size=1 on A100 GPUs with 80GB of VRAM (using around 70+GB). Gradient checkpointing, amp, and the 8-bit optimizer can greatly reduce the VRAM requirement. You can also set freeze_clip=True, freeze_blip=True, freeze_resnet=True to reduce VRAM usage; see https://github.com/huggingface/diffusers/tree/main/examples/dreambooth and https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html. It is still not possible to run the model on a 3080 Ti, but I guess it could be trained on a V100 (32GB) or even a 3090 (24GB) after the above modifications.

Hi, but how do I enable gradient checkpointing in a PyTorch Lightning model? In a Hugging Face model it's easy to call model.enable_gradient_checkpointing(), but that does not seem to work for the ARLDM model...


xichenpan commented on August 26, 2024

@skywalker00001 Hi, I am sorry, it seems PyTorch Lightning does not support this setting. Lightning-AI/pytorch-lightning#49

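Even without a Lightning-level flag, gradient checkpointing can be applied manually with `torch.utils.checkpoint` inside any module's forward. A minimal sketch (the two `nn` stages are hypothetical stand-ins, not ARLDM's real submodules):

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# Hypothetical two-stage model; ARLDM's real forward is more involved.
stage1 = nn.Sequential(nn.Linear(8, 32), nn.ReLU())
stage2 = nn.Linear(32, 1)

x = torch.randn(4, 8)
# Activations inside stage1 are NOT kept for backward; they are
# recomputed during the backward pass, trading compute for memory.
h = checkpoint(stage1, x, use_reentrant=False)
loss = stage2(h).sum()
loss.backward()
```

Because this lives in the module's own `forward`, it works the same whether the module is wrapped in a LightningModule or used in plain PyTorch.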

xichenpan commented on August 26, 2024

@skywalker00001 Great!


FlamingJay commented on August 26, 2024

It seems to be a PyTorch Lightning problem. We trained the model with batch_size=1 on A100 GPUs with 80GB of VRAM (using around 70+GB). Gradient checkpointing, amp, and the 8-bit optimizer can greatly reduce the VRAM requirement. You can also set freeze_clip=True, freeze_blip=True, freeze_resnet=True to reduce VRAM usage; see https://github.com/huggingface/diffusers/tree/main/examples/dreambooth and https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html. It is still not possible to run the model on a 3080 Ti, but I guess it could be trained on a V100 (32GB) or even a 3090 (24GB) after the above modifications.

freeze_clip=True, freeze_blip=True, freeze_resnet=True on a V100 doesn't work; still CUDA out of memory.



Echo411 commented on August 26, 2024

@skywalker00001 Hi, I am sorry, it seems PyTorch Lightning does not support this setting. Lightning-AI/lightning#49

Thanks. The other approaches (freezing ResNet, the CLIP embedding, and the BLIP embedding), together with amp and the 8-bit optimizer, successfully reduced VRAM usage to about 40GB on my A6000 for batch_size = 1.

Hi, I'm a deep learning beginner, and I would appreciate it if you could tell me how to use amp and the 8-bit optimizer to reduce VRAM usage. Also, can I run this project on two 24GB 3090s after these optimizations? Looking forward to hearing from you.

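For reference, wiring amp and an 8-bit optimizer into a Lightning setup usually looks like the following configuration sketch. This is untested and hypothetical (it assumes pytorch-lightning and bitsandbytes are installed, `LitARLDM` is a placeholder class name, and exact flag names vary between Lightning versions), not ARLDM's actual code:

```python
import pytorch_lightning as pl
import bitsandbytes as bnb

class LitARLDM(pl.LightningModule):
    ...  # model definition / training_step omitted

    def configure_optimizers(self):
        # 8-bit AdamW keeps optimizer state in int8,
        # cutting optimizer VRAM roughly 4x.
        return bnb.optim.AdamW8bit(self.parameters(), lr=1e-4)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,       # e.g. two 24GB 3090s
    precision=16,    # automatic mixed precision
    strategy="ddp",
)
```

Whether two 3090s are actually enough still depends on per-GPU memory, since DDP replicates the full model on each device; the freezing flags discussed above would also be needed.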
