Comments (8)

nareshganesan commented on August 19, 2024

I think we should use the ALIYUN_COM_GPU_MEM_IDX environment variable inside the container to get the index of the GPU assigned to the current container.

References: here and here
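As a rough illustration of the suggestion above, a process inside the container could read the variable like this. This is only a sketch: it assumes the value is a single integer string, and the helper name `assigned_gpu_index` is made up for this example, not part of gpushare-scheduler-extender.

```python
import os
from typing import Mapping, Optional

# Environment variable named in this thread.
ENV_NAME = "ALIYUN_COM_GPU_MEM_IDX"


def assigned_gpu_index(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the GPU index found in the environment.

    Returns None when the variable is unset or is not an integer.
    Assumption: the value is a single integer string such as "3".
    """
    raw = env.get(ENV_NAME)
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        return None


if __name__ == "__main__":
    # Prints the index, or None when run outside such a container.
    print(assigned_gpu_index())
```

Passing the environment as a parameter keeps the helper easy to check outside a GPU node, for example `assigned_gpu_index({"ALIYUN_COM_GPU_MEM_IDX": "3"})`.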

from gpushare-scheduler-extender.

cheyang commented on August 19, 2024

Given a server with 8 GPUs, if we start a pod with "aliyun.com/gpu-count:2", and the scheduler assigns GPU3 and GPU7 to this pod, what are the GPU numbers for these 2 GPU cards in the pod? 0 and 1?

aliyun.com/gpu-count is not used for scheduling; it is only used to calculate the number of GPUs. If you are concerned about the GPU ID, as @nareshganesan said, it is in the environment variable ALIYUN_COM_GPU_MEM_IDX.

tjliupeng commented on August 19, 2024

So the value of ALIYUN_COM_GPU_MEM_IDX should be the actual GPU ID on the host server, such as 3 or 7 in my example, right? And in the container, the application will access the GPU according to its real ID on the server.

nareshganesan commented on August 19, 2024

@tjliupeng, yes, exactly, as per the code. I will verify it inside a container (on a multi-GPU host machine) and confirm here 👍

kmac8361 commented on August 19, 2024

Great work... We have servers with 8 GPUs, which can be a mix of models (e.g. P100, V100, RTX 6000). It would be a nice enhancement to be able to specify a GPU model preference. Since the P100 is not an MPS architecture, by default we want the pod to choose a V100 or RTX 6000.

cheyang commented on August 19, 2024

I think you can add node label to specify the GPU model, and use node selector to choose the node.
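A minimal sketch of that idea, building a pod manifest with a node selector. Several names here are assumptions for illustration only: the label key `gpu-model`, the helper `pod_manifest`, the image name, and the resource name `aliyun.com/gpu-mem`, which does not appear in this thread.

```python
import json
from typing import Any, Dict

# Hypothetical label key; use whatever key the nodes were labeled with.
GPU_MODEL_LABEL = "gpu-model"

# Assumed GPU memory resource name (not taken from this thread).
GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"


def pod_manifest(name: str, image: str, gpu_model: str, gpu_mem: int) -> Dict[str, Any]:
    """Build a pod manifest that only schedules onto nodes labeled with gpu_model."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Restricts scheduling to nodes carrying this label value.
            "nodeSelector": {GPU_MODEL_LABEL: gpu_model},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"limits": {GPU_MEM_RESOURCE: gpu_mem}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(pod_manifest("demo", "example/image:latest", "V100", 4), indent=2))
```

The nodes would need the matching label first, for example with `kubectl label nodes <node-name> gpu-model=V100`. This selects by node, so it only helps when every GPU on a node is the same model.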

alasdairtran commented on August 19, 2024

I think you can add node label to specify the GPU model, and use node selector to choose the node.

This would work if the different GPU models reside in different nodes. If two types of GPU reside in the same node/machine, is there currently any way to tell the pod which GPU to pick? Can we request a GPU by its ID (ALIYUN_COM_GPU_MEM_IDX)?

cheyang commented on August 19, 2024

I think the current implementation doesn't work for your scenario. The design assumes that all GPUs in the same node are of the same type.
