Comments (16)
Hi, is anyone there?
from euler.
Does the pip-installed Euler not support distributed training?
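- The command-line arguments include a --model_dir option. By default it points to the ckpt folder under the current local directory (on worker 0), and the model is saved there. It is in effect the TensorFlow checkpoint path.
- This is currently supported.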
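Since --model_dir is described above as a TensorFlow checkpoint path, one way to check what actually got saved there is to read the small text file named checkpoint that TensorFlow writes next to the checkpoint data. The sketch below uses only the Python standard library. The helper name latest_checkpoint_prefix is made up for illustration and is not part of Euler or TensorFlow, and the snippet assumes the directory is on the local filesystem of worker 0.

```python
import os
import re

# Matches the line TensorFlow writes for the most recent checkpoint, e.g.
#   model_checkpoint_path: "model.ckpt-1000"
# Lines starting with "all_model_checkpoint_paths:" are deliberately not matched.
_LATEST_RE = re.compile(r'\s*model_checkpoint_path:\s*"(.*)"\s*$')


def latest_checkpoint_prefix(model_dir):
    """Return the newest checkpoint prefix recorded in model_dir, or None.

    Hypothetical helper for illustration only; it just parses the text file
    named "checkpoint" inside model_dir.
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file, encoding="utf-8") as f:
        for line in f:
            match = _LATEST_RE.match(line)
            if match:
                path = match.group(1)
                # Relative entries are interpreted relative to model_dir.
                if os.path.isabs(path):
                    return path
                return os.path.join(model_dir, path)
    return None


if __name__ == "__main__":
    # "ckpt" is the default folder name mentioned in the reply above.
    print(latest_checkpoint_prefix("ckpt"))
```

If this prints None on worker 0 after training, nothing has been written to that directory yet, or a different --model_dir was in effect.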
Hi, if the pip-installed Euler cannot support HDFS, how does it support distributed training?
@lixusign The 0.1.0 package on PyPI does support HDFS. For newer versions you need to build from source with the HDFS option enabled.
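Thanks a lot, everyone. I will try the 0.1.0 PyPI install first. Building from source is a hassle, and many of the system dependencies need specific versions.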
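Hello, below is the log from a distributed ppi-graphSage training run. What does it mean when it hangs at this point? I am using pip-installed Euler 0.1.0 + HDFS 2.9.2 + zk + TensorFlow 1.12, currently with 2 workers + 1 ps.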
I0218 11:22:33.431810 19041 graph_builder.cc:84] Thread 98, job size: 0
I0218 11:22:33.432725 19041 graph_builder.cc:84] Thread 99, job size: 0
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
19/02/18 11:22:33 WARN hdfs.DFSClient: zero
I0218 11:22:33.970232 19075 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi_train.id
I0218 11:22:34.135093 19070 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi-walks.txt
I0218 11:22:34.300680 19069 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi-id_map.json
I0218 11:22:34.424460 19067 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi-class_map.json
I0218 11:22:34.482018 19074 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi_test.id
I0218 11:22:34.498394 19076 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi_val.id
I0218 11:22:36.113641 19073 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi_meta.json
I0218 11:22:37.140648 19068 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi-feats.npy
I0218 11:22:37.287214 19066 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi-G.json
I0218 11:22:37.952857 19072 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi_data.json
I0218 11:22:38.380856 19071 graph_builder.cc:59] Load Done: hdfs://xxx:9000/user/euler/ppi/ppi_data.dat
I0218 11:22:38.395088 19041 graph_builder.cc:102] Done: build node sampler
I0218 11:22:38.395136 19041 graph_builder.cc:112] Graph build finish
I0218 11:22:38.395155 19041 graph_service.cc:179] service init finish
I0218 11:22:38.396541 19041 graph_service.cc:131] bound port: xxx:32804
W0218 11:22:38.448446 19041 graph.h:148] global sampler is not ok
I0218 11:22:38.451355 19041 graph_service.cc:146] service start
I0218 11:22:38.463107 19045 zk_server_monitor.cc:238] Online node: 0#ip:32804.
I0218 11:22:38.463485 18999 remote_graph.cc:106] Retrieve meta info success, shard number: 2
I0218 11:22:38.463508 18999 remote_graph.cc:119] Retrieve meta info success, partition number: 1
I0218 11:22:38.463533 18999 remote_graph.cc:190] Retrieve Shard Meta Info successfully, shard: 0, Key: node_sum_weight, Meta Info: 44906.000000,6514.000000,5524.000000
I0218 11:22:38.463547 18999 remote_graph.cc:190] Retrieve Shard Meta Info successfully, shard: 0, Key: edge_sum_weight, Meta Info:
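One thing that stands out in the log above: the client reports "shard number: 2", while only one "Online node" line (0#ip:32804) appears. A quick way to compare the two is to scan the log for both patterns. The sketch below assumes that the number before the # in an "Online node" line is the shard index, and that a missing shard is a plausible reason for a hang. Neither assumption is confirmed in this thread, and the helper name missing_shards is made up for illustration.

```python
import re

# Patterns taken from the log lines quoted in this thread.
SHARD_COUNT_RE = re.compile(r"shard number:\s*(\d+)")
# Assumption: the integer before '#' is the shard index of the registered node.
ONLINE_NODE_RE = re.compile(r"Online node:\s*(\d+)#")


def missing_shards(log_text):
    """Return shard indices the log expects but never reports as online.

    Returns None when the log contains no "shard number" line.
    """
    expected = None
    online = set()
    for line in log_text.splitlines():
        count_match = SHARD_COUNT_RE.search(line)
        if count_match:
            expected = int(count_match.group(1))
        node_match = ONLINE_NODE_RE.search(line)
        if node_match:
            online.add(int(node_match.group(1)))
    if expected is None:
        return None
    return sorted(set(range(expected)) - online)


# Two lines copied from the log above.
SAMPLE_LOG = """\
I0218 11:22:38.463107 19045 zk_server_monitor.cc:238] Online node: 0#ip:32804.
I0218 11:22:38.463485 18999 remote_graph.cc:106] Retrieve meta info success, shard number: 2
"""

if __name__ == "__main__":
    print(missing_shards(SAMPLE_LOG))  # [1]
```

On the two sample lines this reports shard 1 as never having come online, which may be worth checking against the shard count in the data and the number of processes that actually start a graph service.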
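Hello, here is my situation: I have only one shard and 2 workers, so only one worker will load a shard. Is that not allowed?
Also, about saving the trained model as mentioned above: with the setting --model_dir:"/model", will it be saved to that directory on worker 0?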
Thanks a lot. One more small question: after training finishes, the ps service does not exit.
OK, thanks a lot. I will look into it first. This issue can be closed.
ok