
dailong commented on August 29, 2024

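@jiarunying Thank you! I found the problem. The NM (NodeManager) log showed that the memory configuration was too small. The NM's memory usage suddenly jumped from 2G to 6G and then dropped back down. I didn't expect it to consume so much memory.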
[screenshot]
[screenshot]

from hbox.

dailong commented on August 29, 2024

[screenshot]
Can't two containers on the same machine communicate with each other? The worker keeps hanging and produces no log output.

jiarunying commented on August 29, 2024

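Normally, once the workers have finished, the AM sends a job-completion signal to the PS container, and the PS exits on its own after receiving it. Why it did not exit here has to be investigated through the logs.
Two containers can run on the same machine. As for the worker hanging, the specific logic needs to be examined. Did this problem occur while running the demo?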
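The shutdown order described above (workers finish, the AM signals the PS container, the PS exits on its own) can be sketched in Python with threads standing in for containers. This is a minimal illustration under stated assumptions, not hbox's implementation: `run_job`, the `threading.Event` used as the completion signal, and the log strings are all hypothetical.

```python
import threading
from typing import List


def run_job(num_workers: int) -> List[str]:
    """Simulate the shutdown order: workers finish, AM signals the PS, PS exits."""
    events: List[str] = []
    lock = threading.Lock()
    job_done = threading.Event()  # hypothetical stand-in for the AM's completion signal

    def log(msg: str) -> None:
        with lock:
            events.append(msg)

    def worker(i: int) -> None:
        log(f"worker-{i} finished")

    def ps() -> None:
        # The PS never decides to exit by itself; it blocks until the AM signals.
        job_done.wait()
        log("ps exited")

    ps_thread = threading.Thread(target=ps)
    ps_thread.start()

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:  # the AM waits until every worker has finished...
        t.join()

    log("am sent job-complete signal")
    job_done.set()  # ...then signals the PS, which exits on receipt
    ps_thread.join()
    return events


if __name__ == "__main__":
    for line in run_job(num_workers=2):
        print(line)
```

If any worker never finishes, the AM's `join()` never returns, `job_done.set()` is never reached, and the PS thread stays blocked in `wait()`, which mirrors a PS container that does not exit.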

dailong commented on August 29, 2024

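Yes, the problem occurs when running the TensorFlow demo. Now that I'm using NodeManagers on two machines, it simply won't run to completion: one of the containers stays stuck in this state. My guess is that the other worker finished early, so this worker keeps waiting to communicate with it.
[screenshot]
Why does the container reach COMPLETE so quickly?
[screenshot]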

dailong commented on August 29, 2024

Also, the demo itself works fine: I ran demo.py both on a single machine and across different machines, and it worked in both cases.

jiarunying commented on August 29, 2024

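Please first find out why the container that reached COMPLETE earliest exited (in the screenshot, containerxxx_000003 exited first). Check whether it ever reached RUNNING: if it did, look at the final exit messages in that container's log; if it did not, check the NM-side log to determine the exit reason. Since a task time out message appears later on, the container may have exited abnormally, for example because it was killed partway through, and its heartbeat to the AM then timed out.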
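The triage order above can be written as a small decision helper. This is a hypothetical sketch: `ContainerInfo`, its fields, the container IDs, and the 30-second timeout are illustrative assumptions, not hbox or YARN APIs. It finds the container that completed first, picks which log to read depending on whether that container ever reached RUNNING, and flags containers the AM still considers alive whose heartbeats have gone stale, the pattern behind a task time out message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerInfo:
    # Hypothetical record of what the AM knows about one container.
    container_id: str
    reached_running: bool          # did the container ever enter RUNNING?
    completed_at: Optional[float]  # seconds since job start; None if not marked COMPLETE
    last_heartbeat: float          # seconds since job start of the last AM heartbeat


def first_completed(containers: List[ContainerInfo]) -> Optional[ContainerInfo]:
    """Return the container that reached COMPLETE earliest, if any."""
    done = [c for c in containers if c.completed_at is not None]
    return min(done, key=lambda c: c.completed_at) if done else None


def where_to_look(c: ContainerInfo) -> str:
    # Reached RUNNING: the container's own log holds its last exit messages.
    # Never reached RUNNING: the NodeManager log explains why it exited.
    return "container log" if c.reached_running else "NodeManager log"


def stale_heartbeats(containers: List[ContainerInfo], now: float, timeout: float) -> List[str]:
    """IDs of containers not marked COMPLETE whose last heartbeat is older than `timeout`."""
    return [
        c.container_id
        for c in containers
        if c.completed_at is None and now - c.last_heartbeat > timeout
    ]


if __name__ == "__main__":
    containers = [
        ContainerInfo("container_02", True, None, 95.0),
        ContainerInfo("container_03", True, 40.0, 40.0),
        ContainerInfo("container_04", True, None, 50.0),
    ]
    first = first_completed(containers)
    if first is not None:
        print(first.container_id, "->", where_to_look(first))
    print(stale_heartbeats(containers, now=100.0, timeout=30.0))
```

Checking the earliest COMPLETE container first matters because a peer that exits early can leave the remaining workers blocked, so later timeouts are often a symptom rather than the cause.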
