Can I use multiple workers on a single GPU? (serve, closed)

Twinparadox commented on May 31, 2024

Can I use multiple workers on a single GPU?

from serve.

Comments (4)

mreso commented on May 31, 2024 1

Hi @Twinparadox
Whether multiple workers per GPU make sense depends on your use case.

In general, it's not a problem to have multiple workers on the same GPU, but your mileage may vary. Keep in mind that each worker holds its own copy of the model weights, so you might fill up your GPU memory pretty quickly.
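As a rough illustration of the memory point, here is a minimal sketch that estimates how many workers fit on one GPU. The function name `max_workers_for_gpu`, the 1 GiB headroom, and the example numbers are all assumptions for illustration; the real per-worker footprint (weights, activations, CUDA context) has to be measured for your model.

```python
def max_workers_for_gpu(gpu_mem_gib: float,
                        per_worker_gib: float,
                        reserve_gib: float = 1.0) -> int:
    """Upper bound on the number of workers that fit on one GPU.

    Assumes every worker needs the same amount of memory because each
    one holds its own full copy of the model. `reserve_gib` is headroom
    left for the driver and other processes (an assumed value).
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    usable = gpu_mem_gib - reserve_gib
    return max(0, int(usable // per_worker_gib))


# Hypothetical numbers: a 24 GiB GPU and 5 GiB measured per worker.
print(max_workers_for_gpu(24, 5))  # -> 4
```

This only gives a ceiling based on memory. Whether throughput actually improves with that many workers still has to be benchmarked.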

Low GPU usage can have multiple causes. Here are the two major ones I can think of right now:

Do you use extensive pre- or post-processing in your handler? - As the GPU is mostly idle during these times the usage might be lower than expected.
Is your batch size configured correctly to maximize usage?
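To check the first point, it can help to measure how the time in a handler splits between stages. The following is a minimal, framework-independent sketch; `StageTimer` is a hypothetical helper, not part of TorchServe, and the `time.sleep` calls are stand-ins for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Fraction of the total measured time spent in each stage."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


timer = StageTimer()
with timer.stage("preprocess"):
    time.sleep(0.03)  # stand-in for image decoding and resizing
with timer.stage("inference"):
    time.sleep(0.01)  # stand-in for the model forward pass
print(timer.shares())
```

If the inference stage runs on the GPU with PyTorch, call `torch.cuda.synchronize()` before leaving the timed block, because CUDA kernels are launched asynchronously and the measured time would otherwise be too short. A large `preprocess` share suggests the GPU is waiting on CPU work.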

Another issue might be GPU context switching, but the NVIDIA driver is usually pretty good at scheduling the work from multiple GPU contexts (each process has its own NVIDIA context). You can have a look into this doc to see if you can squeeze out more performance in your environment by using NVIDIA MPS, which shares a single context between worker processes.

If you're in an offline batch processing scenario, you can have a look into micro-batching, which usually boosts performance for big batch sizes.
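The idea behind micro-batching can be sketched as follows: a large batch is split into smaller chunks, and pre-processing, inference, and post-processing run in separate threads so that the stages overlap. This is a generic illustration under simplifying assumptions, not TorchServe's implementation; `run_pipelined` and the toy stage functions are hypothetical.

```python
import queue
import threading


def split_into_micro_batches(batch, micro_batch_size):
    """Split a list into consecutive chunks of at most micro_batch_size."""
    return [batch[i:i + micro_batch_size]
            for i in range(0, len(batch), micro_batch_size)]


def run_pipelined(batch, micro_batch_size, preprocess, infer, postprocess):
    """Run the three stages in separate threads, one micro-batch at a time.

    While one micro-batch is in inference, the next one can already be
    pre-processed. Order is preserved because each stage is a single
    thread reading from a FIFO queue. Error propagation is omitted.
    """
    done = object()  # sentinel marking the end of the stream
    to_infer, to_post = queue.Queue(), queue.Queue()
    results = []

    def pre_worker():
        try:
            for mb in split_into_micro_batches(batch, micro_batch_size):
                to_infer.put(preprocess(mb))
        finally:
            to_infer.put(done)

    def infer_worker():
        try:
            while (item := to_infer.get()) is not done:
                to_post.put(infer(item))
        finally:
            to_post.put(done)

    def post_worker():
        while (item := to_post.get()) is not done:
            results.extend(postprocess(item))

    threads = [threading.Thread(target=f)
               for f in (pre_worker, infer_worker, post_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Toy stages standing in for real image pre-processing and a model call.
out = run_pipelined(
    list(range(5)), 2,
    preprocess=lambda mb: [x * 2 for x in mb],
    infer=lambda mb: [x + 1 for x in mb],
    postprocess=lambda mb: [str(x) for x in mb],
)
print(out)  # -> ['1', '3', '5', '7', '9']
```

With Python threads, the overlap only pays off when the stages release the GIL, which is typically the case for I/O, native image libraries, and GPU work, but not for pure-Python loops like the toy stages above.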

If you can share some details on your use case and model, we might be able to help you better.


mreso commented on May 31, 2024 1

@Twinparadox For image preprocessing you can also have a look at our NVIDIA DALI example: https://github.com/pytorch/serve/tree/master/examples/nvidia_dali


Twinparadox commented on May 31, 2024

@mreso
Thank you for all of your kindness. :)

I'm going to serve a model that receives image data.
The input consists of high-resolution images, so the handler includes pre-processing steps for image retrieval and resizing.

Do you use extensive pre- or post-processing in your handler? - As the GPU is mostly idle during these times the usage might be lower than expected.

Well, the pre-processing tasks in the handler take longer than the model inference itself, so I think this applies to my case.

Is your batch size configured correctly to maximize usage?

I didn't consider this part. It would be a good idea to check it out.

Based on your advice, I will check if NVIDIA MPS or micro-batching is appropriate for our use case as well.


Twinparadox commented on May 31, 2024

@mreso
Thank you so much.
I will also consider using NVIDIA DALI for image processing.

