Comments (5)
Hi @emilwallner,
thanks for the extensive issue report.
My thoughts on this are:
- You're looking at the server after the crash, right? Meaning that the worker process has died, was restarted, and thus memory is back to normal.
- I can't find the line from your stack trace in your code, but I assume it's basically the next line after your code. detach() does not create a copy of the data, so you should still have only a single batch on the device.
- You're resizing the images with a resolution coming from the requests and then re-resizing the tensor in preprocess_and_stack_images to (3,768,768). Then you're stacking them along the channel dimension, creating e.g. (6,768,768), before you add a batch dimension with unsqueeze. Not sure about your model, but maybe it does something funky when it gets (1,6,768,768) instead of (2,3,768,768).
- What is your batch size? Did you try using batch_size=1 for some time?
- In the video there are multiple processes on the GPU. Do you use multiple workers for the same model?
That's all I have for now, but I'm happy to keep spitballing and iterating on this until you find a solution!
Best
Matthias
from serve.
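To make the third bullet concrete, here is a minimal sketch (shapes only, independent of the actual model) of how concatenating two 3-channel images along the channel dimension differs from stacking them into a batch:

```python
import torch

imgs = [torch.zeros(3, 768, 768), torch.zeros(3, 768, 768)]

# Concatenating along dim 0 merges the channel dimensions of both images,
# then unsqueeze adds a batch dimension of 1:
cat_batch = torch.cat(imgs, dim=0).unsqueeze(0)
print(cat_batch.shape)    # torch.Size([1, 6, 768, 768])

# Stacking introduces a new leading batch dimension instead:
stack_batch = torch.stack(imgs, dim=0)
print(stack_batch.shape)  # torch.Size([2, 3, 768, 768])
```

A model expecting 3-channel inputs would see one 6-channel sample in the first case but two 3-channel samples in the second.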
Really, really appreciate your input, @mreso!
- The worker crashes and returns 507 and doesn't recover.
- Yeah, I added detach() to make sure requires_grad is set to False.
- Yeah, that could be it
- I switched the batch size to 1 following your suggestion. I also check that the batch has the correct dtype and final shape.
- Yes, multiple workers per model.
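On the detach() point: it returns a view that shares storage with the original tensor, so it drops autograd tracking without copying any data. A quick sketch confirming both properties:

```python
import torch

x = torch.zeros(2, 3, 768, 768, requires_grad=True)
y = x.detach()

# detach() drops autograd tracking...
print(y.requires_grad)               # False
# ...but shares the same underlying storage, so no extra memory is used:
print(y.data_ptr() == x.data_ptr())  # True
```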
I also realized CUDA_LAUNCH_BLOCKING=1 reduces performance by about 70%, so I'll turn it off for now.
Here's my updated check:
def preprocess_and_stack_images(self, images):
    preprocessed_images = []
    for i, img in enumerate(images):
        try:
            preprocessed_img = self.resize_tensor(img)
            # Validate shape, value range, and dtype of each image
            if (
                preprocessed_img.shape != (3, 768, 768)
                or preprocessed_img.min() < 0
                or preprocessed_img.max() > 1
                or preprocessed_img.dtype != torch.float32
            ):
                # Log information about the image that doesn't meet the requirements
                logger.info(f"Image {i} does not meet the requirements. Replacing with a blank image.")
                preprocessed_img = torch.zeros((3, 768, 768))
        except Exception as e:
            # Log the error message and load a blank image
            logger.error(f"Error occurred while processing Image {i}: {str(e)}. Loading a blank image.")
            preprocessed_img = torch.zeros((3, 768, 768))
        preprocessed_images.append(preprocessed_img)
    images_batch = torch.stack(preprocessed_images, dim=0)
    if len(images_batch.shape) == 3:
        images_batch = images_batch.unsqueeze(0)
    # Final check: the batch must have shape (1, 3, 768, 768)
    if images_batch.shape != (1, 3, 768, 768):
        # Log information about the batch that doesn't meet the requirements
        logger.info(f"Batch shape {images_batch.shape} does not match the required shape (1, 3, 768, 768). Replacing with a blank batch.")
        images_batch = torch.zeros((1, 3, 768, 768))
    return images_batch
Again, really appreciate the brainstorming — let’s keep at it until we crack this!
Yeah, performance will suffer significantly with CUDA_LAUNCH_BLOCKING, as kernels will no longer run asynchronously. So only activate it when it's really necessary for debugging.
You could try to run the model in a notebook with a (1,6,768,768) input and observe the memory usage compared to (2,3,768,768). I'm wondering why this actually seems to work in the first place.
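As a starting point for that experiment (a sketch, no GPU required): the two input layouts occupy identical memory on their own, so any divergence observed in the notebook would come from the model's intermediate activations, not the inputs themselves.

```python
import torch

a = torch.zeros(1, 6, 768, 768)  # channels concatenated into one sample
b = torch.zeros(2, 3, 768, 768)  # a proper batch of two images

# Both layouts hold the same number of float32 elements, hence the same
# input memory; any difference in torch.cuda.memory_allocated() during a
# forward pass would come from per-layer activations.
bytes_a = a.numel() * a.element_size()
bytes_b = b.numel() * b.element_size()
print(bytes_a == bytes_b)  # True
```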
I haven’t tried the (1,6,768,768) input yet, but since our model is based on three channels, it should throw an error during execution.
Now, I double-check the size (1,3,768,768) and the dtype, and I ensure the values are in the correct range. Despite that, I'm still hitting a CUDA error: device-side assert triggered when moving the batch with images_batch = images_batch.to(self.device).detach().
Got any more suggestions on what might be causing this?
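That expectation can be checked without the full model: any first layer with in_channels=3 (the Conv2d below is a hypothetical stand-in, not the actual model) rejects a 6-channel input outright, so if a (1,6,768,768) batch ever ran to completion, something unexpected must have happened to the extra channels.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a model's first layer expecting 3 channels.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

out = conv(torch.zeros(2, 3, 32, 32))  # proper batch of two 3-channel images
print(out.shape)                       # torch.Size([2, 8, 30, 30])

try:
    conv(torch.zeros(1, 6, 32, 32))    # channels merged into one sample
except RuntimeError as e:
    print("rejected:", type(e).__name__)
```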