
Comments (5)

ertza commented on August 22, 2024

I can get further by adding SKIP_DPDK, but then the PyTorch build fails and says it cannot find Gloo.

AT_INSTALL_INCLUDE_DIR include/ATen/core
core header install: /root/tmp/omnireduce/omnireduce-DPDK/pytorch/build/aten/src/ATen/core/TensorBody.h
-- /usr/bin/c++ /root/tmp/omnireduce/omnireduce-DPDK/pytorch/torch/abi-check.cpp -o /root/tmp/omnireduce/omnireduce-DPDK/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Gloo (missing: Gloo_INCLUDE_DIR Gloo_LIBRARY)
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
cmake/Modules/FindGloo.cmake:44 (find_package_handle_standard_args)
torch/lib/c10d/CMakeLists.txt:76 (find_package)

This happens even though Gloo builds and installs correctly before this step:

-- Configuring done
-- Generating done
-- Build files have been written to: /root/tmp/omnireduce/omnireduce-DPDK/gloo/build

  • make install
    Consolidate compiler generated dependencies of target gloo
    [100%] Built target gloo
    Install the project...


ChenYuHo commented on August 22, 2024

Are you building DPDK inside a Docker container?
We install DPDK on the host machine and haven't seen this problem before.
Your NIC supports RoCE, so you may try the RDMA version instead.


ertza commented on August 22, 2024

Yes, I am building inside Docker. That should not be the issue here, as the newer DPDK version compiles fine; it might be some incompatibility with the newer OS or architecture. But anyway, even when skipping DPDK, how do I solve the Gloo issue?

For my particular use case, I need to run PyTorch/Gloo/DAIET/DPDK, so I have to make this one work. Right now the only blocking issue is that the PyTorch build cannot find Gloo: Could NOT find Gloo (missing: Gloo_INCLUDE_DIR Gloo_LIBRARY)


ChenYuHo commented on August 22, 2024

If you were following the doc, you should have a conda environment. Check whether Gloo is installed in your ${CONDA_PREFIX}.
If it is, try passing Gloo_INCLUDE_DIR=${CONDA_PREFIX}/include and Gloo_LIBRARY=${CONDA_PREFIX}/lib when building PyTorch, since the error message says those are missing.
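
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the omnireduce docs: the helper name `locate_gloo` is hypothetical, and it assumes that an installed Gloo places its headers under `include/gloo/` and a library file named `libgloo.*` under `lib/`. How the two values are handed to the PyTorch build (environment variables or CMake `-D` options) is not stated in this thread, so the sketch only prints them.

```python
import os
from pathlib import Path


def locate_gloo(prefix):
    """Return (include_dir, lib_dir) under prefix; an entry is None if not found.

    Hypothetical helper. Assumes headers live in <prefix>/include/gloo/
    and the library file is named libgloo.* inside <prefix>/lib/.
    """
    prefix = Path(prefix)
    include_dir = prefix / "include"
    lib_dir = prefix / "lib"
    has_headers = (include_dir / "gloo").is_dir()
    has_library = lib_dir.is_dir() and any(lib_dir.glob("libgloo.*"))
    return (
        str(include_dir) if has_headers else None,
        str(lib_dir) if has_library else None,
    )


# Report what was found in the active conda environment, if there is one.
conda_prefix = os.environ.get("CONDA_PREFIX")
if conda_prefix:
    found_include, found_lib = locate_gloo(conda_prefix)
    print(f"Gloo_INCLUDE_DIR={found_include or '<not found>'}")
    print(f"Gloo_LIBRARY={found_lib or '<not found>'}")
else:
    print("CONDA_PREFIX is not set; activate the conda environment first")
```

If either value comes back as not found, Gloo was probably installed under a different prefix, and that prefix would be the one to pass instead.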


ertza commented on August 22, 2024

Fixed! I wasn't sure what to pass as the paths for the include and lib directories; this worked.

