
Comments (13)

artek0chumak commented on June 15, 2024

Hello! Thank you for reporting!
We will quickly resolve this issue.

from petals.

artek0chumak commented on June 15, 2024

We resolved this issue in the recent master update; just pull the latest changes.
Thank you for noticing the issue and for waiting for the fix.


artek0chumak commented on June 15, 2024

Thank you for the information. It seems the only change required is this: #574.
We will merge it into main soon.


mprishchepo commented on June 15, 2024

Hi!
How is the work on the fix going? Is everything on track?
We are really looking forward to the merge.


artek0chumak commented on June 15, 2024

Sorry for taking so long; the fix is merged into master.


mprishchepo commented on June 15, 2024

Hello!

I am observing the same problem and have tried to diagnose the issue a bit myself.

As I understand it (in case you haven't found it already), the problem is in calculating the block size (its parameters). The layer_idx mentioned above is used in load_pretrained_block, but it is not used when calculating block_size or when calculating rps in the throughput measurement.

Very much waiting for a solution.
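The mismatch described above can be sketched as follows. This is a hypothetical illustration of the pattern, not Petals' actual code; the dictionary and function names are invented for the example:

```python
# Sketch of the reported bug: per-layer parameters are consulted when
# LOADING a block, but ignored when ESTIMATING its size/throughput, so
# models whose layers differ in size get wrong estimates.

LAYER_PARAMS = {0: {"hidden_size": 1024}, 1: {"hidden_size": 2048}}

def load_pretrained_block(layer_idx: int) -> dict:
    # loading DOES use layer_idx, so each block gets its own parameters
    return LAYER_PARAMS[layer_idx]

def get_block_size_buggy() -> int:
    # the reported bug: layer_idx is ignored, so every block is sized
    # as if it were layer 0
    return LAYER_PARAMS[0]["hidden_size"]

def get_block_size_fixed(layer_idx: int) -> int:
    # the fix: thread layer_idx through the size calculation as well
    return LAYER_PARAMS[layer_idx]["hidden_size"]
```

With per-layer parameters like these, the buggy estimate disagrees with the loaded block for every layer other than layer 0.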


Qessia commented on June 15, 2024

Thank you for your quick response!


mprishchepo commented on June 15, 2024

Hi!
The original error in this issue no longer appears, but I get another error when I try to launch a private swarm with Mixtral (on GPU; CPU is fine). It also doesn't appear when I do the same with StableBeluga2.

System:

  • Python 3.10
  • Torch 2.2.2
  • CUDA 12.3
  • Ubuntu 22.04

Reproduce

python3 -m petals.cli.run_server SanjiWatsuki/TinyMixtral-32x248M --new_swarm

Error

File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in <module>
    main()
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
    server = Server(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/server.py", line 237, in __init__
    throughput_info = get_server_throughput(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
    cache[cache_key] = measure_throughput_info(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
    "inference_rps": measure_compute_rps(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
    cache = step(cache)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 215, in step
    outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache_ if inference else None)
  File "/home/qessia/.local/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
    return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/models/mixtral/block.py", line 74, in forward
    outputs = super().forward(
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 934, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 356, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/cache_utils.py", line 131, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
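For context, this RuntimeError is the generic PyTorch complaint raised when torch.cat receives tensors on different devices; per the traceback, the attention cache apparently stays on CPU while the dummy activations live on cuda:0. Below is a minimal, dependency-free sketch of that pattern and of the usual fix (moving the cache to the incoming tensor's device before concatenating). All names here are illustrative stand-ins, not Petals' or transformers' actual code:

```python
# Model "tensors" as plain objects carrying a device tag, so the
# device-mismatch failure mode can be shown without requiring a GPU.
from dataclasses import dataclass

@dataclass
class FakeTensor:
    data: list
    device: str

    def to(self, device: str) -> "FakeTensor":
        # analogous to torch.Tensor.to(device)
        return FakeTensor(self.data, device)

def cat(a: FakeTensor, b: FakeTensor) -> FakeTensor:
    # analogous to torch.cat: refuses mixed-device inputs
    if a.device != b.device:
        raise RuntimeError(
            f"Expected all tensors to be on the same device, "
            f"but found {a.device} and {b.device}"
        )
    return FakeTensor(a.data + b.data, a.device)

def update_cache(cache: FakeTensor, new_states: FakeTensor) -> FakeTensor:
    # the usual fix: align the cached tensor with the incoming
    # states' device before concatenating
    return cat(cache.to(new_states.device), new_states)
```

In the real stack, the equivalent fix is to ensure the cache tensors are allocated on (or moved to) the same device as key_states before the torch.cat in cache_utils.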


artek0chumak commented on June 15, 2024

Hello!
This is a strange error. Could you also provide your transformers version?


mprishchepo commented on June 15, 2024

Can you also provide the transformers version?

4.38.2


jmikedupont2 commented on June 15, 2024

I had that same error on master as well and had a ticket open for it, #575


jmikedupont2 commented on June 15, 2024

I was able to get the branch mentioned above running, and I rebased my Docker work. Screenshot_20240416_140359_Termux.jpg

TinyMixtral is now running locally on GPU:
https://github.com/meta-introspector/petals


Qessia commented on June 15, 2024

Thank you for the fixes! It works.

