Comments (6)

king-menin commented on July 24, 2024

The triton version was installed incorrectly. You need to remove DeepSpeed and then try this:

!rm -rf /tmp/DeepSpeed

!pip install triton==0.2.3

!cd /tmp && git clone https://github.com/microsoft/DeepSpeed.git && cd DeepSpeed/ && git checkout ff58fa7e5a4f637a21d11daad0192683fe50ed15 && pip uninstall -y typing && pip install cpufeature && DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 /tmp/DeepSpeed/install.sh -n && pip install typing

!pip install transformers==3.5.1

from ru-gpts.

Ulitochka commented on July 24, 2024

Hi.

In Colab the default versions are:

  • torch 1.8.1+cu101
  • CUDA Version: 11.2

But if we want to build the extensions with DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1, the CUDA version that torch was compiled with must match the CUDA version installed on the system.

See this code:

!git clone https://github.com/microsoft/DeepSpeed.git
%cd DeepSpeed/
!DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install -v --disable-pip-version-check --no-cache-dir ./

        f"Installed CUDA version {sys_cuda_version} does not match the "
    Exception: Installed CUDA version 11.0 does not match the version torch was compiled with 10.1, unable to compile cuda/cpp extensions without a matching cuda version.
    DS_BUILD_OPS=0
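
The mismatch reported above can be checked before starting a build. The following is only a minimal sketch, not DeepSpeed's own check: the helper names (`major_minor`, `parse_nvcc_release`, `cuda_versions_match`, `system_cuda_version`) are hypothetical, and comparing on major.minor is an assumption based on the wording of the error message.

```python
import re
import shutil
import subprocess
from typing import Optional


def major_minor(version: str) -> str:
    """Reduce a version such as '11.0.221' or '10.1' to 'major.minor'."""
    parts = version.strip().split(".")
    return ".".join(parts[:2])


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the release number from `nvcc --version` output, if present."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: Optional[str], system_cuda: Optional[str]) -> bool:
    """True only when both versions are known and agree on major.minor."""
    if not torch_cuda or not system_cuda:
        return False
    return major_minor(torch_cuda) == major_minor(system_cuda)


def system_cuda_version() -> Optional[str]:
    """Ask nvcc for its release; returns None when nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    return parse_nvcc_release(out)


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    sys_cuda = system_cuda_version()
    print(f"torch CUDA: {torch_cuda}, system CUDA: {sys_cuda}, "
          f"match: {cuda_versions_match(torch_cuda, sys_cuda)}")
```

With the values from the exception above (torch built for 10.1, system CUDA 11.0), `cuda_versions_match` returns `False`.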

We can install the required package versions:

  • pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html
  • and set CUDA_HOME accordingly: export CUDA_HOME=/usr/local/cuda-11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.7.0+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
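
A report like the one above can also be checked programmatically, for example to confirm that `cpu_adam` and `sparse_attn` were actually pre-built. This is a rough sketch that assumes the text layout shown above; `parse_op_report` and `SAMPLE` are hypothetical names, and the regular expression is not part of DeepSpeed.

```python
import re
from typing import Dict, Tuple

# Matches rows such as "cpu_adam ............... [YES] ...... [OKAY]"
OP_LINE = re.compile(r"^(\w+)\s*\.+\s*\[(YES|NO)\]\s*\.+\s*\[(\w+)\]\s*$")

SAMPLE = """\
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
"""


def parse_op_report(report: str) -> Dict[str, Tuple[bool, bool]]:
    """Map op name -> (installed, compatible) for every op row in the report."""
    ops = {}
    for line in report.splitlines():
        m = OP_LINE.match(line.strip())
        if m:
            name, installed, compatible = m.groups()
            ops[name] = (installed == "YES", compatible == "OKAY")
    return ops


if __name__ == "__main__":
    ops = parse_op_report(SAMPLE)
    missing = [name for name in ("cpu_adam", "sparse_attn") if not ops.get(name, (False, False))[0]]
    print("missing pre-built ops:", missing)
```

Rows that are not op rows, such as the `ninja` line and the column header, are skipped because they do not contain both bracketed fields.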

Artyrm commented on July 24, 2024

@Ulitochka
But then this line, !DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed==0.3.7, reinstalls torch:

Found existing installation: torch 1.7.0+cu110
    Uninstalling torch-1.7.0+cu110:
      Successfully uninstalled torch-1.7.0+cu110
Successfully installed deepspeed-0.3.7 ninja-1.10.2 tensorboardX-1.8 torch-1.9.0

As a result:

DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.9.0+cu102
torch cuda version ............... 10.2
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0

And the line import deepspeed.ops.sparse_attention.sparse_attn_op fails:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-21-2d9098395ca5> in <module>()
      1 # And this cell should be run without errors
----> 2 import deepspeed.ops.sparse_attention.sparse_attn_op

ModuleNotFoundError: No module named 'deepspeed.ops.sparse_attention.sparse_attn_op'
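
To test for the compiled module without triggering a traceback in the notebook, something like the following could be used. It is only a sketch: `module_available` is a hypothetical helper built on the standard library's `importlib.util.find_spec`, which imports the parent packages while it looks for the submodule.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the module can be located, False otherwise."""
    try:
        # find_spec imports parent packages, so a missing parent raises
        # ModuleNotFoundError (a subclass of ImportError).
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    name = "deepspeed.ops.sparse_attention.sparse_attn_op"
    print(name, "available:", module_available(name))
```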

Artyrm commented on July 24, 2024

In fact, the torch uninstallation is no big deal. You can pass the --no-dependencies option when installing deepspeed, and install any missing dependencies separately.
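
As an illustration of that suggestion, the install command can be assembled from Python so that pip does not touch the existing torch. This is a sketch under assumptions: `deepspeed_install_command` is a hypothetical helper, `--no-deps` is pip's short spelling of `--no-dependencies`, and the version and build flags are the ones quoted earlier in this thread.

```python
import os
import sys
from typing import Dict, List, Tuple


def deepspeed_install_command(version: str = "0.3.7") -> Tuple[List[str], Dict[str, str]]:
    """Build the pip command and environment without running anything."""
    # Copy the current environment and add the DeepSpeed build flags.
    env = dict(os.environ, DS_BUILD_CPU_ADAM="1", DS_BUILD_SPARSE_ATTN="1")
    # --no-deps keeps pip from replacing the already installed torch.
    cmd = [sys.executable, "-m", "pip", "install", "--no-deps", f"deepspeed=={version}"]
    return cmd, env


if __name__ == "__main__":
    cmd, env = deepspeed_install_command()
    print(" ".join(cmd))
    # To actually install: subprocess.run(cmd, env=env, check=True)
```

Packages that deepspeed would normally pull in (ninja and tensorboardX appear in the pip log above) then have to be installed by hand.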

ITV1 commented on July 24, 2024

Regarding the "ImportError: cannot import name 'SAVE_STATE_WARNING'":
Maybe this will be useful for someone. As I understand it, this error is related to newer video cards and can be fixed by editing the trainer_pt_utils.py file:

sudo nano /usr/local/lib/python3.7/dist-packages/transformers/trainer_pt_utils.py

and changing the:

       SAVE_STATE_WARNING = ""
else:
       from torch.optim.lr_scheduler import SAVE_STATE_WARNING

logger = logging.get_logger(__name__)

to:

try:
      from torch.optim.lr_scheduler import SAVE_STATE_WARNING
except ImportError:
      SAVE_STATE_WARNING = ""
      
logger = logging.get_logger(__name__)
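
If editing the installed file is not convenient, a possible alternative (not the approach described above, just a sketch) is to define the missing name on the torch module before transformers is imported. `ensure_attr` is a hypothetical helper; it only adds the attribute when it is absent.

```python
import types


def ensure_attr(module: types.ModuleType, name: str, default: object) -> bool:
    """Set module.<name> = default only if it is missing; True if patched."""
    if hasattr(module, name):
        return False
    setattr(module, name, default)
    return True


if __name__ == "__main__":
    try:
        import torch.optim.lr_scheduler as lr_scheduler
    except ImportError:
        lr_scheduler = None  # torch is not installed in this environment
    if lr_scheduler is not None:
        # Must run before `import transformers` for the import to succeed.
        patched = ensure_attr(lr_scheduler, "SAVE_STATE_WARNING", "")
        print("patched:", patched)
```

The patch lasts only for the current Python session, so it has to be repeated in every notebook run.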

king-menin commented on July 24, 2024

Added fixes for the latest Colab updates for rugpt3xl.
