Comments (5)
Hi @apoorvkh, while in the past we have not required torch to be installed before installing DeepSpeed, it has always been required to run DeepSpeed. This "bug" is from a recent PR: #4547
Adding torch as a requirement to the project is problematic because pip will often install the CPU build rather than the CUDA or ROCm build.
We are discussing internally what the proper action is, but I suspect we will restore the previous behavior of not requiring torch at install time and do a 0.13.1 patch release with this update.
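As a rough illustration of checking for a dependency at run time instead of declaring it as an install-time requirement, here is a minimal sketch. The helper names `is_installed` and `require_at_runtime` are hypothetical, and this is not how DeepSpeed's own setup or import logic is written.

```python
import importlib.util


def is_installed(module_name):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_at_runtime(module_name, hint):
    """Raise a clear error at run time if a dependency is missing.

    Hypothetical helper: the package installs without the dependency,
    and the user only sees an error when the dependency is really needed.
    """
    if not is_installed(module_name):
        raise ImportError(f"{module_name} is required at run time. {hint}")


# 'json' ships with Python, so this call passes silently.
require_at_runtime("json", "It is part of the standard library.")
```

With this pattern the user stays free to pick the torch build (CPU, CUDA, or ROCm) that matches their hardware before first use.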
from deepspeed.
Understood, that would be great -- thank you!
from deepspeed.
This issue is caused by the following two pieces of reflection code. The problem is that the reflection code should not iterate into directories that belong to other accelerators.
https://github.com/microsoft/DeepSpeed/blob/master/op_builder/__init__.py#L46
https://github.com/microsoft/DeepSpeed/blob/master/op_builder/all_ops.py#L22
There are two possible fixes. One is to skip the directories that belong to other accelerators, with code like:
if module_name not in ['cpu', 'hpu', 'mps', 'npu', 'xpu']:
Another, more graceful fix would be to move the CUDA OpBuilders into a cuda/ directory in op_builder, but this might need more global code changes.
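The first fix can be sketched roughly as follows. This is a standalone illustration, not the actual code in op_builder/__init__.py or all_ops.py: the function `discover_builder_modules`, the helper `make_demo_package`, and the demo file names are all hypothetical, and only the list of accelerator directory names comes from the comment above.

```python
import pkgutil
import tempfile
from pathlib import Path

# Directory names taken from the comment above; each one holds the
# builders for a single non-CUDA accelerator.
ACCELERATOR_DIRS = {"cpu", "hpu", "mps", "npu", "xpu"}


def discover_builder_modules(package_dir):
    """Return top-level module names in package_dir, skipping accelerator subpackages."""
    names = []
    for _, module_name, _ in pkgutil.iter_modules([str(package_dir)]):
        if module_name in ACCELERATOR_DIRS:
            continue  # do not descend into another accelerator's directory
        names.append(module_name)
    return sorted(names)


def make_demo_package(root):
    """Create a throwaway layout (hypothetical file names) to exercise the scan."""
    root = Path(root)
    for filename in ("fused_adam.py", "transformer.py"):
        (root / filename).write_text("")
    for subdir in ("cpu", "npu", "xpu"):
        (root / subdir).mkdir()
        (root / subdir / "__init__.py").write_text("")
        (root / subdir / "fused_adam.py").write_text("")
    return root


with tempfile.TemporaryDirectory() as tmp:
    demo_result = discover_builder_modules(make_demo_package(tmp))
    print(demo_result)  # ['fused_adam', 'transformer']
```

The skip list has to be updated whenever a new accelerator directory is added, which is one reason the second fix (a dedicated cuda/ directory) may be cleaner in the long run.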
from deepspeed.
Everything is working well on my end -- thanks again @mrwyattii!
from deepspeed.