Comments (12)
@fbbradheintz: `batch_size` and `max_batch_delay` configuration is not supported through `config.properties`. For details on the supported `config.properties` parameters, please refer to the configuration documentation.

These parameters can be configured through the management API while registering the model, as documented here. The default values for these params are:

- batchSize - 1
- maxBatchDelay - 100 (milliseconds)

These parameters are only used in the case of batch inferencing. For a usage example, refer to: Batch inferencing example with Resnet 152 model
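A minimal sketch of that registration call, assuming TorchServe is running locally with the management API on its default port 8081 and a `resnet-152.mar` in the model store:

```python
import requests

# Register a model and set its batch parameters at registration time.
# Assumes the management API on the default port 8081 and resnet-152.mar
# present in the model store.
resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "resnet-152.mar",
        "initial_workers": 1,
        "batch_size": 8,        # batchSize (default: 1)
        "max_batch_delay": 50,  # maxBatchDelay in ms (default: 100)
    },
)
print(resp.status_code, resp.text)
```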
If we can't put the batch config info per model into a configuration file, that's going to be a problem for the server saving its state. I've flagged this as v1.0, but there should be a discussion about this soon.
@fbbradheintz: While saving the snapshot for registered models, we do save the `batchSize` and `maxBatchDelay` params for every model version, and the models are restored with their corresponding `batchSize` and `maxBatchDelay` when we restart TorchServe with a snapshot config.

However, these parameters are not available as generic config parameters, and the default values are used for:

- models passed in the `models` param (the list of models to be loaded while starting TorchServe) in the config file
- models registered through the management API without these params in the argument

Note that if the config file supplied with the `--ts-config` param while starting TorchServe is a snapshot config file, then the `models` param is ignored and the models in the `model_snapshot` parameter are registered.
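To make the snapshot behavior concrete, here is an illustrative (not verbatim) sketch of the per-model entry persisted under `model_snapshot`, parsed with Python; the `batchSize`/`maxBatchDelay` fields follow the comment above, while the surrounding structure and values are assumptions:

```python
import json

# Illustrative model_snapshot entry from a snapshot config file:
# batchSize and maxBatchDelay are persisted per model version and
# restored on restart.
snapshot = json.loads("""
{
  "name": "startup.cfg",
  "modelCount": 1,
  "models": {
    "resnet-152": {
      "1.0": {
        "defaultVersion": true,
        "marName": "resnet-152.mar",
        "minWorkers": 1,
        "maxWorkers": 1,
        "batchSize": 8,
        "maxBatchDelay": 50
      }
    }
  }
}
""")
print(snapshot["models"]["resnet-152"]["1.0"]["batchSize"])  # -> 8
```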
I understand that the option is available in the command line and Management API; the concern is that it is not available in config files, and generally we don't have parity among the multiple configuration methods.
This doesn't block launch, but I'm leaving open for discussion.
Batch inferencing is not supported by TorchServe default handlers and is only supported through custom handlers. If we make this configurable through `config.properties` at the global level, it will apply to models using default handlers as well, and they will break. Thus, it makes sense to keep this configurable at the model level only, at the time of model registration.

However, TorchServe should also provide a way to update these parameters through the API after registration.

@mycpuorg, @fbbradheintz, @dhaniram-kshirsagar thoughts?
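For context on why this is handler-level, a minimal sketch of a batch-aware custom handler entry point, assuming the module-level `handle(data, context)` style where `data` arrives as a list of up to `batch_size` requests:

```python
# Minimal batch-aware custom handler sketch (module-level entry point).
# In batch mode TorchServe passes a list of up to batch_size requests and
# expects one response per request, in the same order.
def handle(data, context):
    if data is None:  # worker warm-up call, nothing to process
        return None
    # Each element of `data` is one request in the batch.
    inputs = [row.get("data") or row.get("body") for row in data]
    # ... run the model once over the whole batch here ...
    return [f"received {len(x) if x is not None else 0} bytes" for x in inputs]
```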
Agreed with @harshbafna that batch params should be set at the model level. The point here is consistency and parity among the multiple configuration channels.

Are we limited here by the `.properties` format?

Also, can we agree on a definition for this feature before writing more code for it?
It may be a good idea to discuss the approach taken and its pros/cons. Some of my thoughts are as follows:

- Ideally we shouldn't add model-level configuration to `config.properties`. We could look into adding these options to `MANIFEST.json` in MAR files, or look into having a separate configuration file (analogous to `log4j.properties`) which contains all the model-specific configuration. The concerns with both options are:
  - Adding all the model-level configuration to a single file could make that file hard to modify manually. We may have to look into tooling to generate this configuration.
  - If we go the MANIFEST route, we will be mandating that customers who want to use this feature MUST provide a MAR file. And every time customers want to change this configuration, they would have to regenerate their MAR files. This might not be feasible, as model creation and deployment might not happen in the same pipeline.
- I don't think we need a new API for this. We currently support `PUT /models`, which already modifies the number of workers (see the sketch after this list). Why can't we add options to this API?
- We could take the environment-variables route. But defining a good namespace and not polluting the environment variables might be a discussion of its own.
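A minimal sketch of the existing worker-scaling call referenced above, assuming a local management API on port 8081 and an already-registered `resnet-152` model; the proposal is to accept batch params on this same endpoint:

```python
import requests

# Existing management call that already modifies a registered model's
# workers; the suggestion above is to extend it with batch parameters.
resp = requests.put(
    "http://localhost:8081/models/resnet-152",
    params={"min_worker": 2, "max_worker": 4},
)
print(resp.status_code, resp.text)
```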
> I don't think we need a new API for this. We currently support `PUT /models`, which already modifies the number of workers. Why can't we add options to this API?
I like #2 above, i.e. modify the PUT API to support batch params.
What about checking whether there is a `MODELNAME.properties` file in the model store during start? If so, load `batch_size` and `max_batch_delay` from it per model.
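A rough sketch of that proposal; `load_model_batch_config` and the per-model file layout are hypothetical, not an existing TorchServe API:

```python
from pathlib import Path

# Hypothetical startup hook for the proposal above: look for a
# <model_name>.properties file next to each MAR in the model store and
# read batch_size / max_batch_delay from it.
def load_model_batch_config(model_store: str) -> dict:
    config = {}
    for props in Path(model_store).glob("*.properties"):
        values = {}
        for line in props.read_text().splitlines():
            if "=" in line and not line.lstrip().startswith("#"):
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
        config[props.stem] = {
            "batch_size": int(values.get("batch_size", 1)),
            "max_batch_delay": int(values.get("max_batch_delay", 100)),
        }
    return config
```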
Model-level config was added in v0.4.1.
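For reference, an illustrative sketch of the shape this takes: `config.properties` accepts a JSON value for its `models` key, which can carry per-version batch settings (the model name and values here are assumptions):

```python
import json

# Illustrative value for the `models` key in config.properties (written
# there as a JSON string); per-version batch settings can be declared
# alongside the MAR name.
models_value = json.loads("""
{
  "resnet-152": {
    "1.0": {
      "defaultVersion": true,
      "marName": "resnet-152.mar",
      "batchSize": 8,
      "maxBatchDelay": 50
    }
  }
}
""")
assert models_value["resnet-152"]["1.0"]["batchSize"] == 8
```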
@harshbafna If batchSize and max_batch_delay can only be configured through the management API, what is the recommendation from the TorchServe team for configuring these values on container start/restart when using multiple replicas in Kubernetes?
I found an example for that in the TorchServe GitHub repo: https://github.com/pytorch/serve/blob/master/kubernetes/EKS/config.properties

I hope the above link is helpful.