Comments (6)
The wrong triton version was installed. You need to remove deepspeed and newt, then try this:
!rm -rf /tmp/DeepSpeed
!pip install triton==0.2.3
cd /tmp && git clone https://github.com/microsoft/DeepSpeed.git && cd DeepSpeed/ && git checkout ff58fa7e5a4f637a21d11daad0192683fe50ed15 && pip uninstall -y typing && pip install cpufeature && DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 /tmp/DeepSpeed/install.sh -n && pip install typing
pip install transformers==3.5.1
from ru-gpts.
Hi.
In Colab the default versions are:
- torch 1.8.1+cu101
- CUDA Version: 11.2
But if we want to build the extensions with DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1, the CUDA version that torch was compiled with and the CUDA version installed on the system must be the same.
See this code:
!git clone https://github.com/microsoft/DeepSpeed.git
%cd DeepSpeed/
!DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install -v --disable-pip-version-check --no-cache-dir ./
f"Installed CUDA version {sys_cuda_version} does not match the "
Exception: Installed CUDA version 11.0 does not match the version torch was compiled with 10.1, unable to compile cuda/cpp extensions without a matching cuda version.
DS_BUILD_OPS=0
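The mismatch reported in the exception can be checked before starting the build. The following is only a minimal sketch, not DeepSpeed's own check: the helper names `parse_nvcc_release` and `cuda_versions_match` are hypothetical, and it assumes that `nvcc --version` prints a line containing `release X.Y`.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(torch_cuda: Optional[str], system_cuda: Optional[str]) -> bool:
    """Compare major.minor only; a missing version (None or '') never matches."""
    if not torch_cuda or not system_cuda:
        return False
    return torch_cuda.split(".")[:2] == system_cuda.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    try:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # nvcc is not on PATH
    system_cuda = parse_nvcc_release(out)
    print(f"torch CUDA: {torch_cuda}, nvcc CUDA: {system_cuda}")
    print("match" if cuda_versions_match(torch_cuda, system_cuda) else "MISMATCH")
```

If the two versions differ, either torch or the CUDA toolkit that the build picks up has to change before the extensions will compile.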
We can install the required package versions:
pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html
- and change this line:
export CUDA_HOME=/usr/local/cuda-11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.7.0+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
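To confirm programmatically that the ops you need were pre-built, the report text can be parsed. This is a rough sketch that assumes the line layout shown in the report above (`name .... [YES|NO] .... [OKAY]`); `parse_op_report` is a hypothetical helper, not part of DeepSpeed.

```python
import re
from typing import Dict, Tuple

# Matches rows such as: "cpu_adam ............... [YES] ...... [OKAY]"
OP_LINE = re.compile(r"^(\w+)\s*\.+\s*\[(YES|NO)\]\s*\.+\s*\[(\w+)\]\s*$")


def parse_op_report(report: str) -> Dict[str, Tuple[bool, bool]]:
    """Map op name -> (installed, compatible) for every op row in the report."""
    ops = {}
    for line in report.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            name, installed, compatible = match.groups()
            ops[name] = (installed == "YES", compatible == "OKAY")
    return ops


sample = """\
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
"""
ops = parse_op_report(sample)
required = ["cpu_adam", "sparse_attn"]
missing = [name for name in required if not ops.get(name, (False, False))[0]]
print("missing pre-built ops:", missing)
```

Rows that do not have both bracketed columns, such as the ninja line and the table header, are skipped.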
@Ulitochka
But then this line: !DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed==0.3.7
reinstalls torch:
Found existing installation: torch 1.7.0+cu110
Uninstalling torch-1.7.0+cu110:
Successfully uninstalled torch-1.7.0+cu110
Successfully installed deepspeed-0.3.7 ninja-1.10.2 tensorboardX-1.8 torch-1.9.0
As a result:
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.9.0+cu102
torch cuda version ............... 10.2
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
And the line import deepspeed.ops.sparse_attention.sparse_attn_op
fails:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-21-2d9098395ca5> in <module>()
1 # And this cell should be run without errors
----> 2 import deepspeed.ops.sparse_attention.sparse_attn_op
ModuleNotFoundError: No module named 'deepspeed.ops.sparse_attention.sparse_attn_op'
In fact, the torch reinstallation is not a big deal. You can pass the --no-dependencies flag to pip when installing deepspeed, and install any dependencies that are still missing separately.
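As a sketch of that suggestion, the install command can be built so that pip leaves the existing torch alone. `--no-deps` is the short form of pip's `--no-dependencies`; the helper names below are hypothetical, and the actual install call is left commented out so that running the snippet changes nothing.

```python
import os
import sys
from typing import Dict, List


def deepspeed_install_command(version: str = "0.3.7") -> List[str]:
    """pip command that installs deepspeed without touching its dependencies."""
    return [sys.executable, "-m", "pip", "install", "--no-deps", f"deepspeed=={version}"]


def build_env(base: Dict[str, str]) -> Dict[str, str]:
    """Copy the environment and enable the two op builds used in this thread."""
    env = dict(base)
    env.update({"DS_BUILD_CPU_ADAM": "1", "DS_BUILD_SPARSE_ATTN": "1"})
    return env


if __name__ == "__main__":
    cmd = deepspeed_install_command()
    print("Would run:", " ".join(cmd))
    # Uncomment to actually install:
    # import subprocess
    # subprocess.run(cmd, env=build_env(os.environ), check=True)
```

After installing this way, it is worth checking that torch.__version__ still reports the build you installed on purpose.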
Regarding the "ImportError: cannot import name 'SAVE_STATE_WARNING'":
This may be useful for someone. As I understand it, this error is related to newer video cards and can be fixed by editing the trainer_pt_utils.py file with:
sudo nano /usr/local/lib/python3.7/dist-packages/transformers/trainer_pt_utils.py
and changing:
SAVE_STATE_WARNING = ""
else:
from torch.optim.lr_scheduler import SAVE_STATE_WARNING
logger = logging.get_logger(__name__)
to:
try:
    from torch.optim.lr_scheduler import SAVE_STATE_WARNING
except ImportError:
    # Fall back to an empty warning when torch does not provide the name
    SAVE_STATE_WARNING = ""
logger = logging.get_logger(__name__)
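An alternative that avoids editing the installed file is to define the missing name on the torch module before transformers is imported. This is only a sketch under that assumption; `ensure_attr` is a hypothetical helper, and it has to run before the first `import transformers` in the session.

```python
import types


def ensure_attr(module: types.ModuleType, name: str, default: object) -> bool:
    """Set module.<name> to default if it is missing; return True if patched."""
    if hasattr(module, name):
        return False
    setattr(module, name, default)
    return True


if __name__ == "__main__":
    try:
        import torch.optim.lr_scheduler as lr_scheduler
    except ImportError:
        lr_scheduler = None  # torch is not installed, nothing to patch
    if lr_scheduler is not None:
        patched = ensure_attr(lr_scheduler, "SAVE_STATE_WARNING", "")
        print("patched" if patched else "already defined")
```

Because the attribute is set only when absent, the snippet is harmless on torch versions that still define SAVE_STATE_WARNING.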
Add fixes for last updates on colab for rugpt3xl