
Comments (11)

nairbv commented on May 15, 2024

Filed a JIT ticket for potential improvements: pytorch/pytorch#33354


fbbradheintz commented on May 15, 2024

Thanks to @nairbv for the tandem diagnosis on this.

The issue only shows up on the model's first forward pass. There's a fair amount of precompilation that needs to happen before TorchScript can execute an inference; after that happens, things get much faster. I'll verify that once a worker has been hit once, performance improves.

At this time, the only way to kick off this precompilation is to perform a forward pass. We discussed different ways to accommodate this:

  • One option is for TorchServe to generate a valid (or valid-looking) input for the model. This would require a fair amount of new code and configuration describing what constitutes a valid input, which seems like a heavy lift.
  • Another option is for the user to (optionally) include a serialized PyTorch tensor that contains a valid input. If such a tensor exists, TorchServe could load it and pass it through the model as part of initialization.

The latter could be exposed as a single, optional flag on torchserve, something like:

torchserve --start --blahblah --sample_input=valid_input.pt

For the time being, this doesn't need to block launch, but we should make a plan to improve this in future revs.
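
A rough sketch of how such a warm-up could look inside a custom handler's initialize(), assuming a serialized sample tensor is packaged alongside the model (the sample_input.pt file name and the handler wiring are hypothetical, not an existing TorchServe feature):

# Hypothetical warm-up sketch for a custom TorchServe handler.
# Assumes a serialized sample input ("sample_input.pt") is packaged in the
# model archive; the file name and class are illustrative only.
import os
import torch

class WarmupImageClassifier:
    def initialize(self, context):
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        gpu_id = properties.get("gpu_id")
        use_gpu = torch.cuda.is_available() and gpu_id is not None
        self.device = torch.device("cuda:{}".format(gpu_id) if use_gpu else "cpu")

        # Load the TorchScript model packaged in the archive.
        self.model = torch.jit.load(os.path.join(model_dir, "model.pt"),
                                    map_location=self.device)
        self.model.eval()

        # Warm-up: if a sample input was shipped with the model, run one
        # forward pass now so the JIT's first-pass optimization cost is paid
        # during initialization rather than on the first real request.
        sample_path = os.path.join(model_dir, "sample_input.pt")
        if os.path.exists(sample_path):
            sample = torch.load(sample_path, map_location=self.device)
            with torch.no_grad():
                self.model(sample)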


harshbafna commented on May 15, 2024

Validated this on the latest master with PyTorch 1.7 on a p3.8xlarge instance, with 4 model workers each loaded on a different GPU device. Response time is around 1.4 seconds:

ubuntu@ip-172-31-73-130:~$ time curl -X POST http://localhost:8080/predictions/densenet161_scripted -T serve/examples/image_classifier/kitten.jpg 
{
  "tiger_cat": 0.46933576464653015,
  "tabby": 0.463387668132782,
  "Egyptian_cat": 0.06456146389245987,
  "lynx": 0.0012828221078962088,
  "plastic_bag": 0.00023323048662859946
}
real	0m1.344s
user	0m0.000s
sys	0m0.006s
ubuntu@ip-172-31-73-130:~$ 
ubuntu@ip-172-31-73-130:~$ time curl -X POST http://localhost:8080/predictions/densenet161_scripted -T serve/examples/image_classifier/kitten.jpg 
{
  "tiger_cat": 0.46933576464653015,
  "tabby": 0.463387668132782,
  "Egyptian_cat": 0.06456146389245987,
  "lynx": 0.0012828221078962088,
  "plastic_bag": 0.00023323048662859946
}
real	0m1.347s
user	0m0.000s
sys	0m0.006s
ubuntu@ip-172-31-73-130:~$ 
ubuntu@ip-172-31-73-130:~$ time curl -X POST http://localhost:8080/predictions/densenet161_scripted -T serve/examples/image_classifier/kitten.jpg 
{
  "tiger_cat": 0.46933576464653015,
  "tabby": 0.463387668132782,
  "Egyptian_cat": 0.06456146389245987,
  "lynx": 0.0012828221078962088,
  "plastic_bag": 0.00023323048662859946
}
real	0m1.394s
user	0m0.000s
sys	0m0.006s
ubuntu@ip-172-31-73-130:~$ 
ubuntu@ip-172-31-73-130:~$ time curl -X POST http://localhost:8080/predictions/densenet161_scripted -T serve/examples/image_classifier/kitten.jpg 
{
  "tiger_cat": 0.46933576464653015,
  "tabby": 0.463387668132782,
  "Egyptian_cat": 0.06456146389245987,
  "lynx": 0.0012828221078962088,
  "plastic_bag": 0.00023323048662859946
}
real	0m1.374s
user	0m0.000s
sys	0m0.006s
ubuntu@ip-172-31-73-130:~$ 

Closing the ticket.


ozancaglayan commented on May 15, 2024

Sorry, ignore this. I didn't notice that the model was getting deployed onto the GPU without any further setup, so that overhead is probably due to the model being on the GPU, i.e. some CUDA cache coldness. There still seems to be a slight lag on the first calls on CPU, though it's probably negligible.

Thanks!


fbbradheintz commented on May 15, 2024

Note that this isn't a "lag time loading the model" issue - repeated attempts give similar results.

I'll try it with some other models as well, to see how consistent the issue is.


harshbafna commented on May 15, 2024

@fbbradheintz This seems like a PyTorch-specific issue.

Please find attached sample prediction code for the densenet161 model in eager and TorchScript modes.

test_torchscript.txt
test_eager.txt

  • TorchScript mode execution time:
(base) USL07109 harsh_bafna$ time python test_torchscript.py 
['n02123045', 'tabby']
--- 114.89286518096924 seconds ---

real	1m56.721s
user	1m56.270s
sys	0m0.950s
  • Eager mode execution time:
(base) USL07109 harsh_bafna$ time python test_eager.py 
['n02123045', 'tabby']
--- 1.1276381015777588 seconds ---

real	0m1.974s
user	0m1.975s
sys	0m0.409s

We also found the following open issue in PyTorch related to the performance of TorchScript mode:
pytorch/pytorch#30365
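
For reference, a minimal sketch of the kind of eager vs. TorchScript timing comparison the attached scripts perform (this is not the attached code; the random stand-in input and the pretrained densenet161 download are assumptions):

# Illustrative timing of a single densenet161 forward pass in eager vs.
# TorchScript mode; a stand-in for the attached test scripts.
import time
import torch
import torchvision.models as models

def time_first_forward(model, x):
    model.eval()
    with torch.no_grad():
        start = time.time()
        model(x)
        return time.time() - start

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed kitten.jpg

eager = models.densenet161(pretrained=True)
print("eager first forward: %.3f s" % time_first_forward(eager, x))

scripted = torch.jit.script(models.densenet161(pretrained=True))
print("scripted first forward: %.3f s" % time_first_forward(scripted, x))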


fbbradheintz commented on May 15, 2024

Ah - I hadn't seen that issue. Will investigate from my side. I'll keep this issue open in the meantime.


fbbradheintz commented on May 15, 2024

@harshbafna Can you share the scripts you used for that test?


jeremiahschung commented on May 15, 2024

Confirmed this is fixed in the latest 1.7 RC, thanks to pytorch/pytorch#33354.

Followed the steps in the original issue description to create a TorchScripted densenet model and served it with TorchServe:

time curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg
{
  "282": 0.4693361222743988,
  "281": 0.4633875787258148,
  "285": 0.06456127017736435,
  "287": 0.0012828144244849682,
  "728": 0.00023322943889070302
}
real	0m0.496s
user	0m0.004s
sys	0m0.004s
time curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg
{
  "282": 0.4693361222743988,
  "281": 0.4633875787258148,
  "285": 0.06456127017736435,
  "287": 0.0012828144244849682,
  "728": 0.00023322943889070302
}
real	0m0.049s
user	0m0.008s
sys	0m0.000s

@chauhang, can we close this issue now, or wait until 1.7 is out?


ozancaglayan commented on May 15, 2024

Hi,

Sorry to bring this up again, but I thought this may be the right place.

I'm also having a similar issue with torchserve-nightly, but interestingly with an eager model. After launching the server, the forward pass for the very first HTTP request takes around 320 ms, whereas subsequent ones take around 9 ms. I've measured the time taken by different snippets, and it is indeed the forward call that takes 99% of this time.

Do you have any ideas?


nairbv commented on May 15, 2024

@ozancaglayan That sounds like a distinct issue, so you might want to file a separate one. Is this difference TorchServe-specific, or is it something you can reproduce without TorchServe?
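
One way to check whether the first-call overhead is TorchServe-specific would be a standalone timing loop like the sketch below (the model and input shape are placeholders; substitute your own eager model):

# Standalone check of first-call vs. steady-state latency for an eager model,
# outside TorchServe; model and input shape are placeholders.
import time
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    for i in range(5):
        start = time.time()
        model(x)
        print("call %d: %.1f ms" % (i, (time.time() - start) * 1000))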

