Comments (5)
Could you do a bisection of the commits between 0.4.0 and 0.4.1 to pinpoint which is the commit that caused the issue? There is a possibility that this is fixed by #4463, but if it isn't, knowing the exact commit will help us to figure out what is going on :)
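The bisection workflow requested above can be sketched with `git bisect`; the tag names and the reproduction step are assumptions, not taken from the thread:

```shell
# Sketch of bisecting vLLM between v0.4.0 and v0.4.1 (tag names assumed).
git clone https://github.com/vllm-project/vllm.git
cd vllm
git bisect start
git bisect bad v0.4.1     # version where the regression appears
git bisect good v0.4.0    # last known-good version
# git now checks out a midpoint commit; rebuild, rerun your reproduction,
# then mark the result and repeat until the culprit commit is printed:
#   git bisect good       # if inference output is correct at this commit
#   git bisect bad        # if inference output is broken at this commit
git bisect reset          # return to the original branch when done
```

Each round halves the remaining commit range, so even a few dozen commits between the two releases need only a handful of rebuild-and-test cycles.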
from vllm.
Hi, I will pinpoint the commit later. In the meantime, here is another piece of information.
My launch command is:
python3 -m vllm.entrypoints.openai.api_server --port 8000 --model /models/mixtral-8x7b/ --tensor-parallel-size 8 --max-num-batched-tokens 32768 --max-model-len 8192 --gpu-memory-utilization 0.9 --dtype float16
However, if I change --dtype float16 to --dtype float32, the inference results are fine.
I also tested #4463 and #4517; neither fixed the issue.
#3805 is the commit that causes the issue.
Further experiments show the issue occurs with PyTorch versions > 2.1.2.
Related Issues (20)
- [Usage]: how should I do data parallelism using vLLM?
- [Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when handling inference requests
- [Misc]: Should inference with temperature 0 generate the same results for a LoRA adapter and equivalent merged model?
- [Bug]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
- [Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference.
- [Feature]: Linear adapter support for Mixtral
- [Feature]: vLLM support for function calling in Mistral-7B-Instruct-v0.3
- [Bug]: Issue with Token Processing Efficiency and Key-Value Cache Utilization in AsyncLLMEngine
- [Bug]: WSL2 (including Docker) 2-GPU problem with --tensor-parallel-size 2
- [Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
- [Performance]: What can we learn from OctoAI
- [Bug]: Model Launch Hangs with 16+ Ranks in vLLM
- [Usage]: Prefix caching in vLLM
- [Bug]: Incorrect Example for the Inference with Prefix
- [Feature]: BERT models for embeddings
- [Bug]: The Offline Inference Embedding Example Fails
- [Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary `asyncio.exceptions.CancelledError`
- [Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only?
- [Bug]: vLLM api_server.py when used with prompt_token_ids causes an error.
- [Bug]: loading squeezellm model