aclex / pytorch-ebuild
Ebuild infrastructure files for PyTorch and some related projects
License: GNU General Public License v2.0
Hi, thanks a lot for the ebuild of torchvision!
When trying to emerge, I am getting this error during the compile phase:
/var/tmp/portage/sci-libs/torchvision-0.8.1/temp/environment: line 2310: 332 Segmentation fault "${@}"
* ERROR: sci-libs/torchvision-0.8.1::aclex-pytorch failed (compile phase):
* (no error message)
*
* Call stack:
* ebuild.sh, line 125: Called src_compile
* environment, line 3936: Called distutils-r1_src_compile
* environment, line 1839: Called _distutils-r1_run_foreach_impl 'distutils-r1_python_compile'
* environment, line 652: Called python_foreach_impl 'distutils-r1_run_phase' 'distutils-r1_python_compile'
* environment, line 3481: Called multibuild_foreach_variant '_python_multibuild_wrapper' 'distutils-r1_run_phase' 'distutils-r1_python_compile'
* environment, line 2947: Called _multibuild_run '_python_multibuild_wrapper' 'distutils-r1_run_phase' 'distutils-r1_python_compile'
* environment, line 2945: Called _python_multibuild_wrapper 'distutils-r1_run_phase' 'distutils-r1_python_compile'
* environment, line 1069: Called distutils-r1_run_phase 'distutils-r1_python_compile'
* environment, line 1830: Called distutils-r1_python_compile
* environment, line 1699: Called esetup.py 'build' '-j' '1'
* environment, line 2317: Called die
* The specific snippet of code:
* "${@}" || die "${die_args[@]}";
The full log can be found here: build.log
When emerging torchvision-0.12.0, I get the following error:
RuntimeError: Unable to find torch_shm_manager at /usr/lib/python3.9/site-packages/torch/bin/torch_shm_manager
* ERROR: sci-libs/torchvision-0.12.0::aclex-pytorch failed (compile phase)
Indeed, torch_shm_manager is not at that location:
# updatedb
# locate torch_shm_manager
/usr/bin/torch_shm_manager
# equery b torch_shm_manager
* Searching for torch_shm_manager ...
sci-libs/pytorch-1.11.0 (/usr/bin/torch_shm_manager)
I do not know whether this is an issue on my side or with one of the ebuilds from this repository.
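The mismatch described above can be checked with a short script. This is only a minimal sketch and not part of any ebuild in this repository: the function name `find_shm_manager` is hypothetical, and the expected location is simply the `torch/bin` path quoted in the error message.

```python
import os
import shutil


def find_shm_manager(torch_dir, search_path=None):
    """Report where torch_shm_manager is expected and where it actually is.

    torch_dir   -- the installed torch package directory, e.g.
                   /usr/lib/python3.9/site-packages/torch
    search_path -- PATH-style string to search; defaults to $PATH
    """
    # Location named in the RuntimeError: <torch_dir>/bin/torch_shm_manager
    expected = os.path.join(torch_dir, "bin", "torch_shm_manager")
    # shutil.which returns None when no matching executable is found.
    found = shutil.which("torch_shm_manager", path=search_path)
    return {
        "expected": expected,
        "expected_exists": os.path.isfile(expected),
        "found_on_path": found,
    }


if __name__ == "__main__":
    info = find_shm_manager("/usr/lib/python3.9/site-packages/torch")
    for key, value in info.items():
        print(f"{key}: {value}")
```

If `expected_exists` is false while `found_on_path` points to `/usr/bin/torch_shm_manager`, the binary is installed but not where the torchvision build looks for it, which matches the `locate` and `equery` output above.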
I get this error when I try to build any version of torchvision (tried with 0.6.1 and 0.7.0):
`hackenherr:torchvision jorgicio% sudo ebuild torchvision-0.6.1.ebuild compile [11:27:33]
Contraseña:
Appending /home/jorgicio/EbuildsGentoo/pytorch to PORTDIR_OVERLAY...
Unpacking source...
Unpacking torchvision-0.6.1.tar.gz to /var/calculate/tmp/portage/sci-libs/torchvision-0.6.1/work
Source unpacked in /var/calculate/tmp/portage/sci-libs/torchvision-0.6.1/work
Preparing source in /var/calculate/tmp/portage/sci-libs/torchvision-0.6.1/work/vision-0.6.1 ...
Source prepared.
Configuring source in /var/calculate/tmp/portage/sci-libs/torchvision-0.6.1/work/vision-0.6.1 ...
-- Configuring done
CMake Error in CMakeLists.txt:
Imported target "torch" includes non-existent path
"/var/calculate/tmp/portage/sci-libs/pytorch-1.6.0/work/pytorch-1.6.0/cmake/../third_party/pybind11/include"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
The path was deleted, renamed, or moved to another location.
An install or uninstall procedure did not complete successfully.
The installation package was faulty and references files it does not
provide.
-- Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.
ebuild.sh, line 125: Called src_configure
"${CMAKE_BINARY}" "${cmakeargs[@]}" "${CMAKE_USE_DIR}" || die "cmake failed";
emerge --info '=sci-libs/torchvision-0.6.1::pytorch', emerge -pqv '=sci-libs/torchvision-0.6.1::pytorch'.
This happens with pytorch 1.6.0 using the same patches as you used.
Thanks in advance.
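The CMake message above says that the imported target "torch" lists an include directory that no longer exists. A rough way to find such entries is to scan an exported CMake targets file for literal absolute paths. The sketch below is an assumption-laden illustration: the helper name `stale_include_dirs` is hypothetical, it only understands the `INTERFACE_INCLUDE_DIRECTORIES "a;b;c"` form, and which file to feed it depends on how pytorch was installed.

```python
import os
import re

# Matches: INTERFACE_INCLUDE_DIRECTORIES "dir1;dir2;..."
_INCLUDE_RE = re.compile(r'INTERFACE_INCLUDE_DIRECTORIES\s+"([^"]*)"')


def stale_include_dirs(cmake_text):
    """Return absolute include directories named in cmake_text that do not exist."""
    stale = []
    for match in _INCLUDE_RE.finditer(cmake_text):
        for entry in match.group(1).split(";"):
            # Only literal absolute paths are checked; entries that start
            # with a CMake variable or generator expression are skipped.
            if not entry.startswith("/"):
                continue
            if not os.path.isdir(entry) and entry not in stale:
                stale.append(entry)
    return stale
```

A path under the Portage build directory of pytorch, like the one in the error, would show up in the returned list once that build directory has been cleaned.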
When compiling pytorch with the USE flag 'rocm', there is an invalid reference to 'dev-util/amd-rocm-meta'. Perhaps that should be rocm-opencl-runtime?
https://packages.gentoo.org/packages/dev-libs/rocm-opencl-runtime
Hi @aclex, have you tried compiling pytorch with a newer gcc version?
For me it fails with x86_64-pc-linux-gnu-11.2.0:
Hi!
I tried your ebuild; it builds fine and the pytorch module imports without any issue.
However, when I try either of the convert tools (onnx-to-caffe2 and caffe2-to-onnx), this happens:
``hackenherr ~ # convert-onnx-to-caffe2 [12:06:03]
Traceback (most recent call last):
File "/usr/bin/convert-onnx-to-caffe2", line 11, in <module>
load_entry_point('torch==1.5.0a0+3c31d73', 'console_scripts', 'convert-onnx-to-caffe2')()
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/lib/python3.7/site-packages/caffe2/python/__init__.py", line 2, in <module>
from caffe2.proto import caffe2_pb2
File "/usr/lib/python3.7/site-packages/caffe2/proto/__init__.py", line 11, in <module>
from caffe2.proto import caffe2_pb2, metanet_pb2, torch_pb2
ImportError: cannot import name 'caffe2_pb2' from 'caffe2.proto' (/usr/lib/python3.7/site-packages/caffe2/proto/__init__.py)
hackenherr ~ # convert-caffe2-to-onnx [12:06:06]
Traceback (most recent call last):
File "/usr/bin/convert-caffe2-to-onnx", line 11, in <module>
load_entry_point('torch==1.5.0a0+3c31d73', 'console_scripts', 'convert-caffe2-to-onnx')()
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/lib/python3.7/site-packages/caffe2/python/__init__.py", line 2, in <module>
from caffe2.proto import caffe2_pb2
File "/usr/lib/python3.7/site-packages/caffe2/proto/__init__.py", line 11, in <module>
from caffe2.proto import caffe2_pb2, metanet_pb2, torch_pb2
ImportError: cannot import name 'caffe2_pb2' from 'caffe2.proto' (/usr/lib/python3.7/site-packages/caffe2/proto/__init__.py)
``
I tried both your ebuild and one based on the science overlay, and the issue is the same.
Am I missing something?
Thanks!
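The ImportError above names three generated protobuf modules that `caffe2/proto/__init__.py` tries to import. One possible cause, which this report does not confirm, is that those generated files are absent from the installed package directory. A minimal sketch for checking that follows; the helper name `missing_proto_modules` is hypothetical, and the module names are taken from the traceback.

```python
import os


def missing_proto_modules(proto_dir, names=("caffe2_pb2", "metanet_pb2", "torch_pb2")):
    """Return the generated protobuf modules that are absent from proto_dir."""
    present = set()
    if os.path.isdir(proto_dir):
        for entry in os.listdir(proto_dir):
            stem, ext = os.path.splitext(entry)
            # Only plain .py modules are considered in this sketch.
            if ext == ".py":
                present.add(stem)
    return [name for name in names if name not in present]


if __name__ == "__main__":
    # Directory quoted in the ImportError message.
    print(missing_proto_modules("/usr/lib/python3.7/site-packages/caffe2/proto"))
```

An empty list would point away from missing files and towards some other cause, such as a protobuf version problem.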
PYTHON_COMPAT=( python3_{7,8,9} ) in some of the ebuilds needs updating to include the latest Python versions used in Gentoo.
I get an error building sci-libs/onnx-1.7.0::aclex-pytorch. It is not exactly clear to me what the underlying issue is, but it seems related to a protobuf upgrade in ::gentoo. I have attached the usual log files. I hope you are able to find out what is wrong.
Hi Alexey,
I have read your ebuild and it looks great. Are you interested in maintaining it in the Gentoo Science Overlay to let more users benefit from your work? If you are interested, please start by sending us a pull request. Your ebuild quality is high. I believe it won't take too long to grant you commit access to the science overlay.
I regret that I made a pytorch ebuild myself just before discovering yours. Luckily mine is focused on CUDA, so our work is complementary.
Yours,
Benda
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/include/g++-v9/x86_64-pc-linux-gnu/bits/c++config.h:273:27: error: #if with no expression
273 | #if _GLIBCXX_USE_CXX11_ABI
| ^
Feels bad. Are there patches for it in the upstream repo?
Hi @aclex, would you like to add your ebuilds to the Gentoo science overlay? They could simply be proposed via a pull request. This way your ebuilds would gain visibility and support.
Also, the pytorch ebuild in the Gentoo science overlay seems to have some issues.
Please add the ebuild for pytorch 2.0
What does this mean?
emerge: there are no ebuilds built with USE flags to satisfy "dev-python/typing[python_targets_python3_7(-)?,-python_single_target_python3_7(-)]".
!!! One of the following packages is required to complete your request:
- sci-libs/pytorch-1.4.0::aclex-pytorch (Change USE: -python_targets_python3_7, this change violates use flag constraints defined by sci-libs/pytorch-1.4.0: 'python? ( any-of ( python_targets_python3_7 ) ) numpy? ( python ) atlas? ( !eigen !mkl !openblas ) eigen? ( !atlas !mkl !openblas ) mkl? ( !atlas !eigen !openblas ) openblas? ( !atlas !eigen !mkl ) rocm? ( !mkldnn !cuda )')
(dependency required by "sci-libs/pytorch-1.4.0::aclex-pytorch" [ebuild])
(dependency required by "=pytorch-1.4.0" [argument])
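The constraint quoted in the message above can be read as a handful of simple rules. The sketch below is a simplified, hand-written reading of that one string, not how Portage evaluates REQUIRED_USE; the function name `required_use_violations` is hypothetical.

```python
BLAS_FLAGS = ("atlas", "eigen", "mkl", "openblas")


def required_use_violations(enabled):
    """List violations of the pytorch-1.4.0 constraint for a set of enabled flags."""
    enabled = set(enabled)
    problems = []
    # python? ( any-of ( python_targets_python3_7 ) )
    if "python" in enabled and "python_targets_python3_7" not in enabled:
        problems.append("python requires python_targets_python3_7")
    # numpy? ( python )
    if "numpy" in enabled and "python" not in enabled:
        problems.append("numpy requires python")
    # atlas?/eigen?/mkl?/openblas? each exclude the other three
    blas = [flag for flag in BLAS_FLAGS if flag in enabled]
    if len(blas) > 1:
        problems.append("conflicting BLAS flags: " + ", ".join(blas))
    # rocm? ( !mkldnn !cuda )
    if "rocm" in enabled:
        for flag in ("mkldnn", "cuda"):
            if flag in enabled:
                problems.append(f"rocm conflicts with {flag}")
    return problems
```

With `python` enabled, dropping `python_targets_python3_7` (the change Portage suggests) trips the first rule, which is why the message says the change violates the constraints.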
Hi,
When installing the latest version, I eventually get an error that the doc flag is expected in IUSE. After adding it back in (I saw you actually removed it a week ago), I could emerge pytorch as expected. If need be, I can provide error logs.
Also, I saw that in the official repository the pytorch package has been added, and has been worked on recently (https://github.com/gentoo/gentoo/search?q=pytorch&type=commits). Perhaps a good time to reach out to the maintainer to see if you could contribute to the main repo as well? Looks like your ebuild has more features so they may be interested.
Trying to compile pytorch-1.11.0, I get the errors below.
My USE flags are: fbgemm ffmpeg gloo mpi nnpack numpy observers opencv openmp python qnnpack rocm tools -asan -atlas -caffe2 -cuda -doc -eigen -gflags -glog -leveldb -lmdb -mkl -mkldnn -namedtensor -numa -openblas -opencl -redis -static -tbb -test -zeromq
Trying to compile with rocm for an AMD card, with gcc-11.3.0p4 on an amd64 system.
Do you see anything obvious that'd explain the errors?
Thanks,
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.cpp: In member function ‘bool torch::jit::Node::hasSideEffects() const’:
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.cpp:1194:16: error: ‘set_stream’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::set_stream’?
1194 | case cuda::set_stream:
| ^~~~~~~~~~
In file included from /mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.h:18,
from /mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.cpp:1:
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/aten/src/ATen/core/interned_strings.h:208:11: note: ‘c10::cuda::set_stream’ declared here
208 | _(cuda, set_stream) \
| ^~~~~~~~~~
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/aten/src/ATen/core/interned_strings.h:321:35: note: in definition of macro ‘DEFINE_SYMBOL’
321 | namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
| ^
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/aten/src/ATen/core/interned_strings.h:322:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
322 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)
| ^~~~~~~~~~~~~~~~~
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.cpp:1195:16: error: ‘_set_device’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::_set_device’?
1195 | case cuda::_set_device:
| ^~~~~~~~~~~
In file included from /mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.h:18,
from /mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/torch/csrc/jit/ir/ir.cpp:1:
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/aten/src/ATen/core/interned_strings.h:207:11: note: ‘c10::cuda::_set_device’ declared here
207 | _(cuda, _set_device) \
| ^~~~~~~~~~~
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/aten/src/ATen/core/interned_strings.h:321:35: note: in definition of macro ‘DEFINE_SYMBOL’
321 | namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
| ^
/mnt/t/tmp-portage/portage/sci-libs/pytorch-1.11.0/work/pytorch-1.11.0/aten/src/ATen/core/interned_strings.h:322:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
322 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)
| ^~~~~~~~~~~~~~~~~
…