
Comments (7)

agunapal avatar agunapal commented on May 31, 2024

@IonBoleac Yes, these are Python processes. If you have a multi-GPU instance, you can assign each GPU to a worker. On an Intel CPU-based instance, you can use core pinning to distribute the physical cores evenly across all the workers.
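As a rough sketch of the setup described above (key names follow TorchServe's documented `config.properties` options, but treat the exact values as placeholders):

```properties
# config.properties -- illustrative fragment, not taken from this thread.

# Let TorchServe use 4 GPUs; workers are assigned to GPUs round-robin.
number_of_gpu=4
default_workers_per_model=4

# On an Intel CPU instance, enable the CPU launcher so physical cores
# are pinned and distributed evenly across workers.
cpu_launcher_enable=true
```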

from serve.

IonBoleac avatar IonBoleac commented on May 31, 2024

@agunapal thanks for your answer. In that case, do the workers of a model work in parallel? If the server receives an inference request, can that request be split across all the workers assigned to the model, so that it is processed in parallel by all of them? Or does one worker handle exactly one request, even if the request is very large? Is that right?
I'm unsure because the workers are processes, so in theory they don't share resources with each other and each worker process runs on one core. But they are managed by the frontend, which consists of Java threads, and those can run in parallel on multiple cores. So in theory the workers can run in parallel. But do workers ever cooperate with other workers to serve a single request, or can they? Can you help me clarify this?

IonBoleac avatar IonBoleac commented on May 31, 2024

@agunapal thanks very much in advance for any help

agunapal avatar agunapal commented on May 31, 2024

@IonBoleac No, each worker handles a single request from the frontend at a time. Depending on the config, that request may or may not be dynamically batched.
If multiple clients are sending requests at the same time, having multiple workers lets these be processed in parallel instead of sequentially.
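A hedged sketch of the dynamic batching mentioned above: when registering a model through the management API, you can pass `batch_size` and `max_batch_delay` so the frontend aggregates concurrent requests into one batch before handing it to a single worker (parameter names follow TorchServe's documented management API; the model URL, name, and port are placeholders, and a TorchServe instance must be running):

```shell
# Register a model with dynamic batching: the frontend collects up to
# 8 concurrent requests (waiting at most 50 ms) and passes the batch
# to ONE worker as a single call.
curl -X POST "http://localhost:8081/models?url=my_model.mar&batch_size=8&max_batch_delay=50&initial_workers=2"
```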

IonBoleac avatar IonBoleac commented on May 31, 2024

@agunapal thanks very much. I have another question: what is the point of minWorker and maxWorker if there is no autoscaling of the workers? The only way to increase or decrease the number of workers is through the management API.

lxning avatar lxning commented on May 31, 2024

@IonBoleac It is expensive for worker processes to auto scale. Currently TorchServe only supports increasing or decreasing the number of workers via the management API.
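For reference, scaling workers through the management API looks roughly like this (`min_worker` and `max_worker` are the documented query parameters; "my_model" and port 8081 are placeholders, and this requires a running TorchServe instance):

```shell
# Scale the number of workers for an already-registered model at runtime.
curl -X PUT "http://localhost:8081/models/my_model?min_worker=2&max_worker=4&synchronous=true"

# Inspect the model to confirm the new worker count.
curl "http://localhost:8081/models/my_model"
```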

IonBoleac avatar IonBoleac commented on May 31, 2024

@lxning thanks for your help...
