Comments (13)
Hello! Thank you for reporting!
We will resolve this issue quickly.
from petals.
We resolved this issue in a recent master update; just pull the latest changes.
Thank you for noticing the issue and for waiting for the fixes.
Thank you for the information. It seems the only change required is this: #574.
We will merge it into main soon.
Hi!
How is work on the fix going? Is everything all right?
We are really looking forward to the merge.
Sorry for taking so long; the fix is now merged into master.
Hello!
I observe the same problem. I have tried to diagnose the issue a bit myself.
As I understand it (in case you haven't found it already), the problem is in calculating the block size (its parameters). The layer_idx mentioned above is used in load_pretrained_block, but it is not used when calculating block_size or when calculating the RPS in the throughput measurement.
Very much looking forward to a fix.
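The diagnosis above can be sketched roughly as follows. The function names and config fields here are simplified stand-ins for illustration, not petals' real API: the point is simply that the throughput-measurement path should thread the same layer_idx through to the block that load_pretrained_block already uses.

```python
# Hedged sketch of the idea in the comment above (illustrative names only).

def load_block(config, layer_idx):
    # stand-in for load_pretrained_block: the real code passes layer_idx
    # into the block so its attention cache indexes the right layer
    return {"layer_idx": layer_idx, "hidden_size": config["hidden_size"]}

def measure_compute_rps(config, layer_idx=0):
    # the reported problem: the throughput path built its dummy block
    # without a layer_idx; forwarding it keeps both paths consistent
    block = load_block(config, layer_idx)
    return block

config = {"hidden_size": 4096}
block = measure_compute_rps(config, layer_idx=5)
print(block["layer_idx"])  # 5
```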
Thank you for your quick response!
Hi!
The original error of this issue no longer appears, but I get another error when I try to launch a private swarm with Mixtral on a GPU (on CPU it is OK). The error also does not appear when I do the same with StableBeluga2.
System:
- Python 3.10
- Torch 2.2.2
- CUDA 12.3
- Ubuntu 22.04
Reproduce:
python3 -m petals.cli.run_server SanjiWatsuki/TinyMixtral-32x248M --new_swarm
Error:
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in <module>
    main()
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
    server = Server(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/server.py", line 237, in __init__
    throughput_info = get_server_throughput(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
    cache[cache_key] = measure_throughput_info(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
    "inference_rps": measure_compute_rps(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
    cache = step(cache)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 215, in step
    outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache_ if inference else None)
  File "/home/qessia/.local/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
    return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/models/mixtral/block.py", line 74, in forward
    outputs = super().forward(
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 934, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 356, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/cache_utils.py", line 131, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
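The failure mode in the traceback above can be shown in isolation: torch.cat requires all input tensors to be on the same device, so concatenating a CPU-resident cache entry with CUDA key states raises exactly this RuntimeError. A minimal, hedged sketch of a defensive cache update (a hypothetical helper, not transformers' actual code) would align devices before concatenating:

```python
import torch

def cache_update(cached, new_states):
    # Align devices first: on a GPU server this would move a stale
    # CPU-resident cache entry to the device of the incoming key states
    # (e.g. cuda:0), so torch.cat cannot raise the "Expected all tensors
    # to be on the same device" RuntimeError seen above.
    cached = cached.to(new_states.device)
    # Concatenate along the sequence dimension, as in cache_utils.update
    return torch.cat([cached, new_states], dim=-2)

cached = torch.zeros(1, 8, 4, 64)  # (batch, heads, seq_len, head_dim)
new = torch.zeros(1, 8, 1, 64)     # one new token's key states
merged = cache_update(cached, new)
print(tuple(merged.shape))  # (1, 8, 5, 64)
```

This only demonstrates the device invariant; the real fix in petals was about constructing the throughput-measurement block and its cache on consistent devices, not patching transformers itself.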
Hello!
This is a strange error. Can you also provide your transformers version?
Can you also provide your transformers version?
4.38.2
I had the same error on master as well and have a ticket open for it: #575.
I was able to get the branch mentioned above running and rebased my Docker work onto it.
I now have TinyMixtral running locally on GPU.
https://github.com/meta-introspector/petals
Thank you for the fixes! It works.