
Comments (4)

zhangxy1234 commented on September 25, 2024

@cadedaniel


cadedaniel commented on September 25, 2024

Hi @zhangxy1234, can you confirm something for me -- what is the max context length supported by your draft model?


zhangxy1234 commented on September 25, 2024

> Hi @zhangxy1234, can you confirm something for me -- what is the max context length supported by your draft model?

The draft model's max context length is 2048 and the base model's is 4096.

When tp = 1, generation stops at 2048 as expected, but when tp > 1 it does not stop at 2048 and instead raises this error:

2024-06-15 10:58:41.179 | CRITICAL | vllm.worker.worker:_execute_model_non_driver:303 - data {'num_seq_groups': 1, 'blocks_to_swap_in': tensor([], size=(0, 2), dtype=torch.int64), 'blocks_to_swap_out': tensor([], size=(0, 2), dtype=torch.int64), 'blocks_to_copy': tensor([], device='cuda:1', size=(0, 2), dtype=torch.int64)}
(RayWorkerWrapper pid=7582) 2024-06-15 10:58:41.180 | INFO | vllm.worker.worker:cache_swap:220 - cache_swap blocks_to_swap_in tensor([], size=(0, 2), dtype=torch.int64)
(RayWorkerWrapper pid=7582) 2024-06-15 10:58:41.193 | CRITICAL | vllm.worker.worker:_execute_model_non_driver:303 - data {'num_lookahead_slots': 5, 'disable_all_speculation': False}
(RayWorkerWrapper pid=7582) 2024-06-15 10:58:41.193 | INFO | vllm.worker.worker:cache_swap:220 - cache_swap blocks_to_swap_in None
(RayWorkerWrapper pid=7582) 2024-06-15 10:58:41.193 | ERROR | vllm.worker.worker_base:execute_method:148 - Error executing method start_worker_execution_loop. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=7582) Traceback (most recent call last):
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/ray/_private/workers/default_worker.py", line 289, in
(RayWorkerWrapper pid=7582) worker.main_loop()
(RayWorkerWrapper pid=7582) β”‚ β”” <function Worker.main_loop at 0x7fd2a5357040>
(RayWorkerWrapper pid=7582) β”” <ray._private.worker.Worker object at 0x7fd2a5350670>
(RayWorkerWrapper pid=7582) File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/ray/_private/worker.py", line 876, in main_loop
(RayWorkerWrapper pid=7582) self.core_worker.run_task_loop()
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” <method 'run_task_loop' of 'ray._raylet.CoreWorker' objects>
(RayWorkerWrapper pid=7582) β”‚ β”” <ray._raylet.CoreWorker object at 0x7fd2a42f5220>
(RayWorkerWrapper pid=7582) β”” <ray._private.worker.Worker object at 0x7fd2a5350670>
(RayWorkerWrapper pid=7582) File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/ray/_private/function_manager.py", line 691, in actor_method_executor
(RayWorkerWrapper pid=7582) return method(__ray_actor, *args, **kwargs)
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” {}
(RayWorkerWrapper pid=7582) β”‚ β”” ('start_worker_execution_loop',)
(RayWorkerWrapper pid=7582) β”” <function WorkerWrapperBase.execute_method at 0x7fd2045aaa60>
(RayWorkerWrapper pid=7582) File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 467, in _resume_span
(RayWorkerWrapper pid=7582) return method(self, *_args, **_kwargs)
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”‚ β”” {}
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” ('start_worker_execution_loop',)
(RayWorkerWrapper pid=7582) β”‚ β”” <vllm.executor.ray_utils.RayWorkerWrapper object at 0x7fd2045ab760>
(RayWorkerWrapper pid=7582) β”” <function WorkerWrapperBase.execute_method at 0x7fd204720820>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) > File "vllm-main/vllm/worker/worker_base.py", line 140, in execute_method
(RayWorkerWrapper pid=7582) return executor(*args, **kwargs)
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” {}
(RayWorkerWrapper pid=7582) β”‚ β”” ()
(RayWorkerWrapper pid=7582) β”” <bound method SpecDecodeWorker.start_worker_execution_loop of <vllm.spec_decode.spec_decode_worker.SpecDecodeWorker object at...
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=7582) return func(*args, **kwargs)
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” {}
(RayWorkerWrapper pid=7582) β”‚ β”” (<vllm.spec_decode.spec_decode_worker.SpecDecodeWorker object at 0x7fa40838c6a0>,)
(RayWorkerWrapper pid=7582) β”” <function SpecDecodeWorker.start_worker_execution_loop at 0x7fa4083899d0>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "vllm-main/vllm/spec_decode/spec_decode_worker.py", line 300, in start_worker_execution_loop
(RayWorkerWrapper pid=7582) while self._run_non_driver_rank():
(RayWorkerWrapper pid=7582) β”‚ β”” <function SpecDecodeWorker._run_non_driver_rank at 0x7fa408389d30>
(RayWorkerWrapper pid=7582) β”” <vllm.spec_decode.spec_decode_worker.SpecDecodeWorker object at 0x7fa40838c6a0>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "vllm-main/vllm/spec_decode/spec_decode_worker.py", line 369, in _run_non_driver_rank
(RayWorkerWrapper pid=7582) self.proposer_worker.execute_model()
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” <function Worker.execute_model at 0x7fa4083863a0>
(RayWorkerWrapper pid=7582) β”‚ β”” <vllm.spec_decode.multi_step_worker.MultiStepWorker object at 0x7fa4096df610>
(RayWorkerWrapper pid=7582) β”” <vllm.spec_decode.spec_decode_worker.SpecDecodeWorker object at 0x7fa40838c6a0>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=7582) return func(*args, **kwargs)
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” {}
(RayWorkerWrapper pid=7582) β”‚ β”” (<vllm.spec_decode.multi_step_worker.MultiStepWorker object at 0x7fa4096df610>,)
(RayWorkerWrapper pid=7582) β”” <function Worker.execute_model at 0x7fa408386310>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File " vllm-main/vllm/worker/worker.py", line 236, in execute_model
(RayWorkerWrapper pid=7582) self._execute_model_non_driver()
(RayWorkerWrapper pid=7582) β”‚ β”” <function Worker._execute_model_non_driver at 0x7fa408386550>
(RayWorkerWrapper pid=7582) β”” <vllm.spec_decode.multi_step_worker.MultiStepWorker object at 0x7fa4096df610>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "vllm-main/vllm/worker/worker.py", line 311, in _execute_model_non_driver
(RayWorkerWrapper pid=7582) self.cache_swap(blocks_to_swap_in, blocks_to_swap_out, blocks_to_copy)
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”‚ β”‚ β”” None
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”‚ β”” None
(RayWorkerWrapper pid=7582) β”‚ β”‚ β”” None
(RayWorkerWrapper pid=7582) β”‚ β”” <function Worker.cache_swap at 0x7fa408386280>
(RayWorkerWrapper pid=7582) β”” <vllm.spec_decode.multi_step_worker.MultiStepWorker object at 0x7fa4096df610>
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) File "vllm-main/vllm/worker/worker.py", line 223, in cache_swap
(RayWorkerWrapper pid=7582) if blocks_to_swap_in.numel() > 0:
(RayWorkerWrapper pid=7582) β”” None
(RayWorkerWrapper pid=7582)
(RayWorkerWrapper pid=7582) AttributeError: 'NoneType' object has no attribute 'numel'
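
For what it's worth, the broadcast payload logged on the non-driver rank ({'num_lookahead_slots': 5, 'disable_all_speculation': False}) contains no swap/copy tensors, so cache_swap ends up being called with None and fails on .numel(). A minimal defensive sketch of that guard is below; it only mirrors the cache_swap signature shown in the traceback and is a hypothetical illustration, not the actual upstream fix:

```python
# Hypothetical guard sketch for Worker.cache_swap, mirroring the signature in
# the traceback above. Treat a missing (None) tensor the same as an empty one
# so the non-driver proposer step does not crash on .numel().
def cache_swap(self, blocks_to_swap_in, blocks_to_swap_out, blocks_to_copy):
    if blocks_to_swap_in is not None and blocks_to_swap_in.numel() > 0:
        self.cache_engine.swap_in(blocks_to_swap_in)
    if blocks_to_swap_out is not None and blocks_to_swap_out.numel() > 0:
        self.cache_engine.swap_out(blocks_to_swap_out)
    if blocks_to_copy is not None and blocks_to_copy.numel() > 0:
        self.cache_engine.copy(blocks_to_copy)
```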

@cadedaniel


njhill commented on September 25, 2024

@zhangxy1234 could you confirm whether you still encounter this error with the latest version of vLLM? (0.5.3.post1)
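
In case it helps with re-checking on a newer version, a minimal repro along the lines of the reported setup might look like this (model paths, prompt, and token counts are placeholders standing in for the 2048-token draft / 4096-token base pair, not the exact models from the report):

```python
from vllm import LLM, SamplingParams

# Placeholder model paths; the report used a draft model with a 2048-token
# context and a base model with a 4096-token context.
llm = LLM(
    model="path/to/base-model",               # max context 4096
    speculative_model="path/to/draft-model",  # max context 2048
    num_speculative_tokens=5,                 # matches num_lookahead_slots=5 in the log
    tensor_parallel_size=2,                   # the error only appeared with tp > 1
    use_v2_block_manager=True,                # required for speculative decoding here
)

# Ask for enough tokens that generation would have to run past the draft
# model's 2048-token limit.
params = SamplingParams(temperature=0.0, max_tokens=4096)
outputs = llm.generate(["<long prompt here>"], params)
print(outputs[0].outputs[0].text)
```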

