Comments (13)
Hello! Thank you for reporting!
We will resolve this issue quickly.
from petals.
We resolved this issue in a recent master update; just pull the latest changes.
Thank you for noticing the issue and for waiting for the fixes.
Thank you for the information. It seems the only change required is this: #574.
We will merge it into main soon.
Hi!
How is work on the fix going? Is everything all right?
We are really looking forward to the merge.
Sorry for taking so long; the fix is now merged into master.
Hello!
I observe the same problem. I have tried to diagnose the issue a bit myself.
As I understand it (in case you haven't found it already), the problem is in calculating the block size (its parameters). The layer_idx mentioned above is used in load_pretrained_block, but it is not used when calculating block_size or when calculating the RPS in the throughput measurement.
Very much looking forward to a fix.
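The diagnosis above can be sketched roughly as follows. The function names and config fields here are simplified stand-ins for illustration, not petals' real API: the point is simply that the throughput-measurement path should thread the same layer_idx through to the block that load_pretrained_block already uses.

```python
# Hedged sketch of the idea in the comment above (illustrative names only).

def load_block(config, layer_idx):
    # stand-in for load_pretrained_block: the real code passes layer_idx
    # into the block so its attention cache indexes the right layer
    return {"layer_idx": layer_idx, "hidden_size": config["hidden_size"]}

def measure_compute_rps(config, layer_idx=0):
    # the reported problem: the throughput path built its dummy block
    # without a layer_idx; forwarding it keeps both paths consistent
    block = load_block(config, layer_idx)
    return block

config = {"hidden_size": 4096}
block = measure_compute_rps(config, layer_idx=5)
print(block["layer_idx"])  # 5
```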
Thank you for your quick response!
Hi!
The original error of this issue no longer appears, but I get another error when I try to launch a private swarm with Mixtral on a GPU (on CPU it is OK). The error also does not appear when I do the same with StableBeluga2.
System:
- Python 3.10
- Torch 2.2.2
- CUDA 12.3
- Ubuntu 22.04
Reproduce:
python3 -m petals.cli.run_server SanjiWatsuki/TinyMixtral-32x248M --new_swarm
Error:
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in <module>
    main()
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
    server = Server(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/server.py", line 237, in __init__
    throughput_info = get_server_throughput(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
    cache[cache_key] = measure_throughput_info(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
    "inference_rps": measure_compute_rps(
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
    cache = step(cache)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 215, in step
    outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache_ if inference else None)
  File "/home/qessia/.local/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
    return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/petals/models/mixtral/block.py", line 74, in forward
    outputs = super().forward(
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 934, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 356, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/home/qessia/.local/lib/python3.10/site-packages/transformers/cache_utils.py", line 131, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
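The failure mode in the traceback above can be shown in isolation: torch.cat requires all input tensors to be on the same device, so concatenating a CPU-resident cache entry with CUDA key states raises exactly this RuntimeError. A minimal, hedged sketch of a defensive cache update (a hypothetical helper, not transformers' actual code) would align devices before concatenating:

```python
import torch

def cache_update(cached, new_states):
    # Align devices first: on a GPU server this would move a stale
    # CPU-resident cache entry to the device of the incoming key states
    # (e.g. cuda:0), so torch.cat cannot raise the "Expected all tensors
    # to be on the same device" RuntimeError seen above.
    cached = cached.to(new_states.device)
    # Concatenate along the sequence dimension, as in cache_utils.update
    return torch.cat([cached, new_states], dim=-2)

cached = torch.zeros(1, 8, 4, 64)  # (batch, heads, seq_len, head_dim)
new = torch.zeros(1, 8, 1, 64)     # one new token's key states
merged = cache_update(cached, new)
print(tuple(merged.shape))  # (1, 8, 5, 64)
```

This only demonstrates the device invariant; the real fix in petals was about constructing the throughput-measurement block and its cache on consistent devices, not patching transformers itself.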
Hello!
This is a strange error. Can you also provide your transformers version?
Can you also provide your transformers version?
4.38.2
I had the same error on master as well and have a ticket open for it: #575.
I was able to get the branch mentioned above running and rebased my Docker work onto it.
I now have TinyMixtral running locally on GPU.
https://github.com/meta-introspector/petals
Thank you for the fixes! It works.