Comments (6)
@will-cromar FYI, and @wonjoolee95 too, since you are fixing a similar issue for our GPU wheels.
from xla.
This is helpful, thanks for the info! I'm able to reproduce:
# Fails
wonjoo@t1v-n-b72eb559-w-0:~$ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-nightly-cp311-cp311-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
ERROR: Invalid requirement: 'torch-xla==nightly': Expected end or semicolon (after name and no valid version specifier)
torch-xla==nightly
^
# Works
wonjoo@t1v-n-b72eb559-w-0:~$ pip install "pip<24"
Defaulting to user installation because normal site-packages is not writeable
Collecting pip<24
Downloading pip-23.3.2-py3-none-any.whl.metadata (3.5 kB)
Downloading pip-23.3.2-py3-none-any.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 34.0 MB/s eta 0:00:00
WARNING: Error parsing dependencies of distro-info: Invalid version: '1.1build1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.43ubuntu1'
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 24.1.2
Uninstalling pip-24.1.2:
Successfully uninstalled pip-24.1.2
WARNING: The scripts pip, pip3 and pip3.10 are installed in '/home/wonjoo/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-23.3.2
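The failure above comes down to `nightly` not being a version string that newer pip releases accept, while older pip still tolerated it. A minimal Python sketch of that kind of check is below. It uses a simplified subset of the PEP 440 version grammar; the regex and the `looks_like_pep440` helper are illustrative assumptions, not pip's actual parser.

```python
import re

# Simplified subset of the PEP 440 version grammar (illustrative only):
# release segment, optional pre/post/dev segments, optional local label.
# Epochs and alternative spellings are deliberately left out.
_PEP440_SUBSET = re.compile(
    r"""
    ^\d+(\.\d+)*                     # release, e.g. 2.5.0
    ((a|b|rc)\d+)?                   # optional pre-release, e.g. rc1
    (\.post\d+)?                     # optional post-release
    (\.dev\d+)?                      # optional dev release, e.g. .dev20240701
    (\+[a-z0-9]+([.][a-z0-9]+)*)?    # optional local label, e.g. +git41d998d
    $
    """,
    re.VERBOSE,
)


def looks_like_pep440(version: str) -> bool:
    """Return True if `version` matches the simplified grammar above."""
    return _PEP440_SUBSET.match(version.lower()) is not None


if __name__ == "__main__":
    for candidate in ("nightly", "2.5.0+git41d998d", "2.5.0.dev20240701", "1.1build1"):
        print(candidate, looks_like_pep440(candidate))
```

Under this simplified grammar, `nightly` and the Ubuntu-style `1.1build1` from the warnings above are rejected, while `2.5.0+git41d998d` passes.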
I think it's better if we do pip install "pip<24" to fix our GPU wheels ASAP, and then come up with a more long-term solution. @will-cromar, do you know where the correct place would be to put this pip install "pip<24" command in our /infra files?
from xla.
Is this issue actually what's causing our build breakage? Why are the TPU builds passing but not the GPU builds? The most recent failure I see there is this:
Step #2 - "build_xla_docker_image": ERROR: An error occurred during the fetch of repository 'go_sdk':
Step #2 - "build_xla_docker_image": Traceback (most recent call last):
Step #2 - "build_xla_docker_image": File "/root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/io_bazel_rules_go/go/private/sdk.bzl", line 101, column 16, in _go_download_sdk_impl
Step #2 - "build_xla_docker_image": _remote_sdk(ctx, [url.format(filename) for url in ctx.attr.urls], ctx.attr.strip_prefix, sha256)
Step #2 - "build_xla_docker_image": File "/root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/io_bazel_rules_go/go/private/sdk.bzl", line 209, column 21, in _remote_sdk
Step #2 - "build_xla_docker_image": ctx.download(
Step #2 - "build_xla_docker_image": Error in download: java.io.IOException: Error downloading [https://dl.google.com/go/go1.18.4.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/go_sdk/go_sdk.tar.gz: Bytes read 127925296 but wanted 141812725
Step #2 - "build_xla_docker_image": ERROR: /src/pytorch/xla/WORKSPACE:136:15: fetching _go_download_sdk rule //external:go_sdk: Traceback (most recent call last):
Step #2 - "build_xla_docker_image": File "/root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/io_bazel_rules_go/go/private/sdk.bzl", line 101, column 16, in _go_download_sdk_impl
Step #2 - "build_xla_docker_image": _remote_sdk(ctx, [url.format(filename) for url in ctx.attr.urls], ctx.attr.strip_prefix, sha256)
Step #2 - "build_xla_docker_image": File "/root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/io_bazel_rules_go/go/private/sdk.bzl", line 209, column 21, in _remote_sdk
Step #2 - "build_xla_docker_image": ctx.download(
Step #2 - "build_xla_docker_image": Error in download: java.io.IOException: Error downloading [https://dl.google.com/go/go1.18.4.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/go_sdk/go_sdk.tar.gz: Bytes read 127925296 but wanted 141812725
Step #2 - "build_xla_docker_image": ERROR: Analysis of target '//:_XLAC.so' failed; build aborted: java.io.IOException: Error downloading [https://dl.google.com/go/go1.18.4.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/2ba57cc32d8c1f12152416615363d16d/external/go_sdk/go_sdk.tar.gz: Bytes read 127925296 but wanted 141812725
Even if we can hack our build, this is a client issue. Nobody who updated their pip recently would be able to install our wheels, because the rename we're doing is no longer actually valid.
The build version we set is defined by some combination of these environment variables: https://github.com/pytorch/xla/blob/master/infra/ansible/config/env.yaml
I think TORCH_XLA_VERSION and GIT_VERSIONED_XLA_BUILD are the important ones, but you'll have to review setup.py to see how we set version exactly. That version name is probably still valid, like torch_xla-2.5.0+git41d998d. The problem is, we rename the wheels with the nightly date here:
xla/infra/ansible/roles/build_srcs/tasks/main.yaml
Lines 74 to 89 in 44f88a9
We need to at least change that rename to one of the valid patterns, like @fellhorn suggested, or copy the pattern used by torch (e.g. torch-X.Y.Z.devYYYYMMDD).
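As a rough illustration of the rename suggested above, the sketch below rewrites the version field of a wheel filename into the torch-style X.Y.Z.devYYYYMMDD form. The `nightly_wheel_name` helper, the example filename, and the example date are hypothetical. The real rename lives in the Ansible task referenced above and may need to differ; in particular, this only touches the filename, not the version recorded inside the wheel's metadata.

```python
import re
from datetime import date


def nightly_wheel_name(wheel_name: str, build_date: date) -> str:
    """Replace the version field of a wheel filename with X.Y.Z.devYYYYMMDD."""
    # Wheel filenames look like '{name}-{version}-{python}-{abi}-{platform}.whl'.
    # This sketch assumes there is no optional build tag.
    name, version, rest = wheel_name.split("-", 2)
    # Keep only the leading numeric release (drops e.g. '+git41d998d').
    release = re.match(r"\d+(\.\d+)*", version)
    if release is None:
        raise ValueError(f"no numeric release in version {version!r}")
    return f"{name}-{release.group(0)}.dev{build_date:%Y%m%d}-{rest}"


if __name__ == "__main__":
    original = "torch_xla-2.5.0+git41d998d-cp310-cp310-linux_x86_64.whl"
    print(nightly_wheel_name(original, date(2024, 7, 1)))
```

A filename whose version field is just `nightly` has no numeric release to keep, so the helper raises `ValueError` rather than guessing one.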
You can dry run the ansible workflow with a command like this one:
Anything that gets written to /dist is what we will upload to GCS.
from xla.
@zpcore can you make the rename logic change that @will-cromar mentioned above, since you are off-call this week? It should just be a one-line change, but then we need to update the README to reflect the new format.
from xla.
Hello all
As a general comment:
When users find errors in pytorch-xla, developers fix them in nightly releases and ask the users to test them.
But generating an environment with compatible torch-xla, torch, and torchvision is not straightforward, as described here.
This issue is one example of it.
I hope you can provide a better way for users to test the nightly updates easily.
from xla.
Hello all
As a general comment:
When users find errors in pytorch-xla, developers fix them in nightly releases and ask the users to test them. But generating an environment with compatible torch-xla, torch, and torchvision is not straightforward, as described here.
This issue is one example of it. I hope you can provide a better way for users to test the nightly updates easily.
Thanks for the feedback. I think we are missing example commands for installing compatible torch, torch[vision,audio], and torch_xla builds for CUDA. We will update the documentation. For now, you can use, e.g.:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-nightly-cp310-cp310-linux_x86_64.whl
In general, these should be compatible.
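After running commands like these, one rough way to sanity-check the environment is to compare the nightly dates embedded in the installed version strings. The sketch below is only a heuristic under stated assumptions: `dev_date` and `installed_dev_dates` are hypothetical helpers, they assume a `.devYYYYMMDD` segment in the version, and a package whose version carries no such segment (for example one versioned like 2.5.0+git41d998d) simply reports `None`.

```python
import re
from importlib import metadata


def dev_date(version: str):
    """Return the YYYYMMDD part of a '.devYYYYMMDD' version, or None."""
    found = re.search(r"\.dev(\d{8})", version)
    return found.group(1) if found else None


def installed_dev_dates(packages=("torch", "torchvision", "torchaudio", "torch_xla")):
    """Map each package to its nightly date, or None if unknown or not installed."""
    dates = {}
    for name in packages:
        try:
            dates[name] = dev_date(metadata.version(name))
        except metadata.PackageNotFoundError:
            dates[name] = None
    return dates


if __name__ == "__main__":
    print(installed_dev_dates())
```

Matching dates do not guarantee compatibility; they only make an obviously mismatched set of nightlies easier to spot.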
from xla.