
Comments (5)

mreso commented on May 29, 2024

Hi @emilwallner,
thanks for the extensive issue report.

My thoughts on this are:

  1. You're looking at the server after the crash, right? Meaning that the worker process has died, gets restarted, and thus memory is back to normal.
  2. I can't find the line from your stack trace in your code, but I assume it's basically the next line from your code. Detach does not create a copy of the data, so you should still have a single batch on device.
  3. You're resizing the images with a resolution coming from the requests and then re-resizing the tensor in preprocess_and_stack_images to (3,768,768). Then you're stacking them along the channel dimension, creating e.g. (6,768,768), before you add a batch dimension with unsqueeze. Not sure about your model, but maybe it does something funky when it gets (1,6,768,768) instead of (2,3,768,768) (see the shape sketch after this list).
  4. What is your batch size? Did you try using batch_size=1 for some time?
  5. In the video there are multiple processes on the GPU, do you use multiple workers for the same model?
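
A quick way to see the difference with a couple of dummy tensors (just a sketch to illustrate the shapes, not your actual preprocessing):

    import torch

    a = torch.zeros(3, 768, 768)
    b = torch.zeros(3, 768, 768)

    print(torch.stack([a, b], dim=0).shape)             # torch.Size([2, 3, 768, 768])
    print(torch.cat([a, b], dim=0).shape)               # torch.Size([6, 768, 768])
    print(torch.cat([a, b], dim=0).unsqueeze(0).shape)  # torch.Size([1, 6, 768, 768])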

That's all I have for now, but happy to continue spitballing and iterating over this until you find a solution!

Best
Matthias


emilwallner commented on May 29, 2024

Really, really appreciate your input, @mreso!

  1. The worker crashes, returns 507, and doesn't recover.
  2. Yeah, I added detach to make sure requires_grad is set to False (quick check after this list).
  3. Yeah, that could be it.
  4. I switched the batch size to 1 following your suggestion. I also check that it has the correct type and final batch shape.
  5. Yes, multiple workers per model.
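
On 2, here's a quick way to confirm that detach only drops requires_grad and doesn't copy the data (dummy tensor, just to illustrate):

    import torch

    x = torch.rand(3, 768, 768, requires_grad=True)
    y = x.detach()

    print(y.requires_grad)               # False
    print(y.data_ptr() == x.data_ptr())  # True -- same storage, no copy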

I also realized CUDA_LAUNCH_BLOCKING=1 reduces performance by about 70%, so I'll turn it off for now.

Here's my updated check:

    def preprocess_and_stack_images(self, images):
        preprocessed_images = []

        for i, img in enumerate(images):
            try:
                preprocessed_img = self.resize_tensor(img)

                if (preprocessed_img.shape != (3, 768, 768)
                        or preprocessed_img.min() < 0
                        or preprocessed_img.max() > 1
                        or preprocessed_img.dtype != torch.float32):
                    # Log information about the image that doesn't meet the requirements
                    logger.info(f"Image {i} does not meet the requirements. Replacing with a blank image.")
                    preprocessed_img = torch.zeros((3, 768, 768))
            except Exception as e:
                # Log the error message and load a blank image
                logger.error(f"Error occurred while processing Image {i}: {str(e)}. Loading a blank image.")
                preprocessed_img = torch.zeros((3, 768, 768))

            preprocessed_images.append(preprocessed_img)

        images_batch = torch.stack(preprocessed_images, dim=0)

        if len(images_batch.shape) == 3:
            images_batch = images_batch.unsqueeze(0)

        # Second check: the final batch must have shape (1, 3, 768, 768)
        if images_batch.shape != (1, 3, 768, 768):
            # Log information about the batch that doesn't meet the requirements
            logger.info(f"Batch shape {images_batch.shape} does not match the required shape (1, 3, 768, 768). Replacing with a blank batch.")
            images_batch = torch.zeros((1, 3, 768, 768))

        return images_batch

Again, really appreciate the brainstorming — let’s keep at it until we crack this!


mreso commented on May 29, 2024

Yeah, performance will suffer significantly from CUDA_LAUNCH_BLOCKING as kernels will not run asynchronously, so only activate it if it's really necessary for debugging.
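
For what it's worth, the variable is picked up when CUDA initializes, so it needs to be in the environment before the worker does any GPU work. Export it wherever you launch torchserve, or set it very early in Python; a minimal sketch of the latter:

    import os

    # Debugging only: kernel launches become synchronous, so errors point at the launching op.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # imported afterwards; the variable must be set before any CUDA call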

You could try to run the model in a notebook with a (1,6,768,768) input and observe the memory usage compared to (2,3,768,768). Wondering why this actually seems to work in the first place.
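
Something along these lines should do it (just a rough sketch; model being whatever you load in the notebook):

    import torch

    def peak_memory_mb(model, shape, device="cuda"):
        # Run one forward pass with a random input of the given shape and report peak memory in MB.
        torch.cuda.reset_peak_memory_stats(device)
        x = torch.rand(*shape, device=device)
        with torch.no_grad():
            model(x)
        torch.cuda.synchronize(device)
        return torch.cuda.max_memory_allocated(device) / 1024**2

    # with the model loaded onto the GPU:
    # print(peak_memory_mb(model, (1, 6, 768, 768)))
    # print(peak_memory_mb(model, (2, 3, 768, 768)))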


emilwallner commented on May 29, 2024

I haven't tried the (1,6,768,768) input yet, but since our model expects three-channel input, it should throw an error during execution.

Now I double-check the shape (1,3,768,768) and dtype, and ensure the values are in the correct range. Despite that, I'm still hitting a CUDA error: device-side assert triggered when moving the batch with images_batch = images_batch.to(self.device).detach()
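
One thing I plan to try: device-side asserts are reported asynchronously, so the error often surfaces at the next sync point rather than at the op that caused it. Forcing a sync right after inference should show whether the copy or the model itself is to blame. Rough sketch, with illustrative names rather than the exact handler code:

    import torch

    def debug_inference(model, images_batch, device):
        images_batch = images_batch.to(device).detach()
        with torch.no_grad():
            output = model(images_batch)
        torch.cuda.synchronize(device)  # any pending device-side assert is raised here
        return output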

Got any more suggestions on what might be causing this?

