
Comments (4)

mreso commented on May 31, 2024

Hi @Twinparadox
Whether multiple workers per GPU make sense depends on your use case.

In general, it's not a problem to have multiple workers on the same GPU, but your mileage may vary. Keep in mind that each worker holds its own copy of the model weights, so you might fill up your GPU memory pretty quickly.
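To make the memory point concrete, here is a rough back-of-the-envelope sketch. The function name and all numbers are hypothetical, and the per-worker footprint is something you would have to measure yourself (weights plus the CUDA context plus activation memory for your batch size).

```python
def max_workers_that_fit(gpu_memory_gib, per_worker_gib, reserved_gib=1.0):
    """Estimate how many worker processes fit on one GPU.

    per_worker_gib is the measured footprint of ONE worker, since every
    worker loads its own copy of the model. reserved_gib is headroom
    left free for other processes (an assumed value, tune as needed).
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    usable = gpu_memory_gib - reserved_gib
    return max(0, int(usable // per_worker_gib))


# Hypothetical example: a 24 GiB card and a worker that needs about 5 GiB.
print(max_workers_that_fit(24, 5))
```

This only bounds the worker count by memory. Whether that many workers actually improves throughput still has to be measured.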

Low GPU usage can have multiple causes. Here are the two major ones I can think of right now:

  1. Do you use extensive pre- or post-processing in your handler? - As the GPU is mostly idle during these times the usage might be lower than expected.
  2. Is your batch size configured correctly to maximize usage?
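For the batch size question, the sketch below only builds a model registration URL and does not send anything. It assumes the server is TorchServe (this thread comes from the serve repository) with its management API on the default address `http://localhost:8081`, and that batching is set through the `batch_size` and `max_batch_delay` registration parameters. The helper name, the archive name `my_model.mar`, and the values are placeholders.

```python
from urllib.parse import urlencode


def build_register_url(mar_file, batch_size, max_batch_delay_ms,
                       initial_workers=1,
                       management_address="http://localhost:8081"):
    """Build the URL for a POST request that registers a model with batching.

    batch_size is the maximum number of requests grouped into one batch;
    max_batch_delay_ms is how long the server waits to fill that batch.
    """
    query = urlencode({
        "url": mar_file,
        "batch_size": batch_size,
        "max_batch_delay": max_batch_delay_ms,
        "initial_workers": initial_workers,
    })
    return f"{management_address}/models?{query}"


# Placeholder values: start small and raise batch_size while watching latency.
print(build_register_url("my_model.mar", batch_size=8, max_batch_delay_ms=50))
```

The handler also has to process the whole list of requests it receives as one batch, otherwise a larger `batch_size` will not help.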

Another issue might be GPU context switching, but the NVIDIA driver is usually pretty good at scheduling the work from multiple GPU contexts (each process has its own NVIDIA context). You can have a look into this doc to see if you can squeeze out more performance in your environment by utilizing NVIDIA MPS, which shares a single context between worker processes.

If you're in an offline batch processing scenario, you can have a look into micro batching, which usually boosts performance for big batch sizes.
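The idea behind micro batching can be sketched in plain Python: a big batch is split into small chunks, and each processing stage runs in its own thread so that one chunk can be pre-processed while another is in inference. This is a minimal illustration under those assumptions, not the library's own implementation. All names are hypothetical and there is no error handling.

```python
import queue
import threading

_DONE = object()  # sentinel that tells each stage to shut down


def split_into_micro_batches(batch, micro_batch_size):
    """Cut one big batch into consecutive chunks of at most micro_batch_size."""
    return [batch[i:i + micro_batch_size]
            for i in range(0, len(batch), micro_batch_size)]


def run_pipeline(batch, stages, micro_batch_size):
    """Push micro batches through the stages, one thread per stage.

    Each stage is a function taking and returning a list. With a single
    thread per stage and FIFO queues, the output order matches the input.
    """
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # pass the shutdown signal downstream
                return
            q_out.put(stage(item))

    threads = [threading.Thread(target=worker,
                                args=(stage, queues[i], queues[i + 1]))
               for i, stage in enumerate(stages)]
    for t in threads:
        t.start()

    for micro_batch in split_into_micro_batches(batch, micro_batch_size):
        queues[0].put(micro_batch)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.extend(out)
    for t in threads:
        t.join()
    return results


# Stand-ins for pre-processing, inference, and post-processing.
stages = [
    lambda mb: [x * 2 for x in mb],
    lambda mb: [x + 1 for x in mb],
    lambda mb: [str(x) for x in mb],
]
print(run_pipeline(list(range(5)), stages, micro_batch_size=2))
```

The overlap only pays off when the stages use different resources, for example CPU-bound image decoding next to GPU-bound inference.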

If you can share some details on your use case and model, we might be able to help you better.

from serve.

mreso commented on May 31, 2024


Twinparadox commented on May 31, 2024

@mreso
Thank you for all of your kindness. :)

I'm going to serve a model that receives image data.
The inputs are high-resolution images, so the handler includes pre-processing for image inquiry and resizing.

> Do you use extensive pre- or post-processing in your handler? - As the GPU is mostly idle during these times the usage might be lower than expected.

Well, the pre-processing tasks in the handler take longer than the model inference itself, so I think this applies to my case.
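A very rough way to reason about this: if a worker runs pre-processing, inference, and post-processing one after another, the GPU is only busy for the inference share of each request. The sketch below uses that simplified model, assuming several workers overlap perfectly and the CPU is not a bottleneck. The function name and timings are made up for illustration.

```python
def estimated_gpu_busy_fraction(pre_ms, infer_ms, post_ms, workers=1):
    """Estimate the fraction of time the GPU is busy.

    Simplified model: each worker handles requests strictly in sequence,
    so per worker the GPU works infer_ms out of every
    (pre_ms + infer_ms + post_ms). Several workers are assumed to overlap
    ideally, capped at 100 percent.
    """
    total = pre_ms + infer_ms + post_ms
    if total <= 0:
        raise ValueError("total time per request must be positive")
    return min(1.0, workers * infer_ms / total)


# Made-up timings: 30 ms of image resizing, 10 ms of inference.
print(estimated_gpu_busy_fraction(30, 10, 0))             # one worker
print(estimated_gpu_busy_fraction(30, 10, 0, workers=4))  # four workers
```

Under this model, shortening the pre-processing or moving it off the critical path raises GPU usage more directly than adding workers does.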

> Is your batch size configured correctly to maximize usage?

I didn't consider this part. It would be a good idea to check it out.

Based on your advice, I will check if NVIDIA MPS or micro-batching is appropriate for our use case as well.


Twinparadox commented on May 31, 2024

@mreso
Thank you so much.
I will also consider using NVIDIA DALI for image processing.

