Comments (8)

xichenpan commented on August 26, 2024

What I mean is that I have already saved a ckpt while the training process is still running. I have started another job to sample from the saved ckpt, but I am worried that the ckpt will be overwritten when the training process saves a new one. @Flash-321

@pokameng It doesn't matter: once your ckpt is loaded, the sample job does not rely on the file anymore. You can also copy it to another folder to avoid this situation.
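
Copying the checkpoint before sampling could look roughly like the sketch below. It is not part of ARLDM: the helper name `snapshot_checkpoint` and the file names are hypothetical, and it assumes the checkpoint is a single file that the training job has already finished writing.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_checkpoint(ckpt_path, snapshot_dir):
    """Copy a checkpoint file into snapshot_dir and return the new path.

    Hypothetical helper, not part of ARLDM. It assumes the training job has
    finished writing ckpt_path; copying a file mid-write could give a
    corrupt copy.
    """
    src = Path(ckpt_path)
    dst_dir = Path(snapshot_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)  # copy2 also preserves file timestamps
    return dst


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a real checkpoint.
    with tempfile.TemporaryDirectory() as tmp:
        fake_ckpt = Path(tmp) / "last.ckpt"
        fake_ckpt.write_bytes(b"weights")
        copied = snapshot_checkpoint(fake_ckpt, Path(tmp) / "for_sampling")
        # Overwriting the original, as a later training save would,
        # leaves the copy untouched.
        fake_ckpt.write_bytes(b"newer weights")
        print(copied.read_bytes())
```

The sampling job would then be pointed at the copied file rather than the one the training job keeps writing to.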

from arldm.

pokameng commented on August 26, 2024

ok thanks!!!

xichenpan commented on August 26, 2024

@pokameng Hi, that is too slow. In our original implementation, sampling each story takes around 30-40 seconds. I wonder whether the time for the first batch includes the prefetch time and the HDF5 loading time. What is the time cost for the other batches?
Also, in sample mode, you could use DDP and increase the batch size. For acceptable sample quality, you could try setting the guidance scale to 7.5 and the number of steps to 50 with a PNDM scheduler, which can greatly reduce the sampling time.
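
To check whether only the first batch is slow, one option is to time every batch separately. The sketch below is a generic illustration, not ARLDM code: `time_batches` and `summarize` are hypothetical helpers, and the sleeping lambda stands in for the real sampling call. On a GPU the sampling call may need to finish its queued work (for example by moving results to the CPU) before the clock is read, otherwise the timings could be misleading.

```python
import time


def time_batches(batches, sample_fn):
    """Return the wall-clock seconds spent on each batch.

    `batches` is any iterable and `sample_fn` any callable; both are
    placeholders for the real dataloader and sampling call. Fetching the
    batch is timed too, so loader warm-up shows up in the first entry.
    """
    durations = []
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            break
        sample_fn(batch)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Split timings into the first batch and the mean of the remaining ones."""
    if not durations:
        raise ValueError("no batches were timed")
    rest = durations[1:]
    steady = sum(rest) / len(rest) if rest else None
    return durations[0], steady


if __name__ == "__main__":
    # Stand-in workload: sleep instead of running a diffusion model.
    timings = time_batches(range(4), lambda _: time.sleep(0.01))
    first, steady = summarize(timings)
    print(f"first batch: {first:.3f}s, later batches (mean): {steady:.3f}s")
```

If the first entry is much larger than the mean of the rest, the extra time is more likely start-up cost such as prefetching or file loading than the sampling itself.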

pokameng commented on August 26, 2024

@Flash-321
Can I train and sample at the same time?

xichenpan commented on August 26, 2024

@pokameng Sure, you can simply modify the code in
https://github.com/Flash-321/ARLDM/blob/eb907e3717ac20f82dfba8e67fd55d95127de098/main.py#L309-L311

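import torchvision

def validation_step(self, batch, batch_idx):
    # Sample images for this batch and log them as one grid (assumes a TensorBoard logger).
    original_images, images = self.sample(batch)
    # make_grid expects a tensor batch or a list of tensors; convert first if sample() returns another type.
    grid = torchvision.utils.make_grid(images)
    self.logger.experiment.add_image('generated_images', grid, 0)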

Also make sure to enable this module during training:
https://github.com/Flash-321/ARLDM/blob/eb907e3717ac20f82dfba8e67fd55d95127de098/main.py#L82-L97
However, this will slow down training, so we recommend manually running a separate job to sample images.

pokameng commented on August 26, 2024

What I mean is that I have already saved a ckpt while the training process is still running. I have started another job to sample from the saved ckpt, but I am worried that the ckpt will be overwritten when the training process saves a new one.
@Flash-321

rehammsalah commented on August 26, 2024

What is test_model_file?

xichenpan commented on August 26, 2024

@rehammsalah here, https://github.com/xichenpan/ARLDM/blob/main/config.yaml#L25
