Comments (24)
Thank you, I will try it later.
from graphscope.
error log
/opt/graphscope/include/graphscope/apps/pregel/louvain/louvain.h:215:35: error: invalid 'static_cast' from type 'grape::EmptyType' to type 'gs::PregelLouvain<gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false> >::edata_t' {aka 'float'}
from graphscope.
Does the Louvain algorithm not support custom string IDs as the oid type?
graph = session.g(oid_type="string",generate_eid=True,directed=False)
from graphscope.
The error means the graph has no edge data, but Louvain needs it.
from graphscope.
The graph has edge data:
## GIE basic graph operations
# Get the entrypoint for submitting Gremlin queries on graph g.
interactive = session.gremlin(graph)
edgenum = interactive.execute(
"g.E().count()").one()
print("edgenum", edgenum)
vertexnum = interactive.execute(
"g.V().count()").one()
print("vertexnum", vertexnum)
edgenum [30]
vertexnum [18]
from graphscope.
Sorry, I meant a property on the edges. For example, Louvain takes a graph with a schema of (person, knows), where knows has the properties (id: string, weight: double).
from graphscope.
After adding the property ('weight', 'double'), the same error is still reported:
from graphscope.framework.loader import Loader
graph = (
graph.add_vertices(Loader("file:///user/jacky.yang/graphcomputer/louvain/vertex_address.csv",filetype="csv")
,label="user_id"
,vid_field="id"
)
.add_edges(Loader("file:///user/jacky.yang/graphcomputer/louvain/edge_transaction.csv",filetype="csv")
,label="login"
,src_label="user_id"
,dst_label="user_id"
,src_field='from'
,dst_field='to'
,properties=[('weight', 'double')]
)
)
error log
Starting the Louvain algorithm
I1227 03:00:59.000000 506 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:268] Projecting graph graph_ipnqLnvB to simple graph: graph_projected_DYlg2S3a, type sig: 76170b1daab8ede76f000284df3134d76e2f0be377b9f3824b6501d4e33267c2
2023-12-27 03:01:00,315 [INFO][utils:243]: app type: gs::LouvainAppBase<_GRAPH_TYPE> (apps/pregel/louvain/louvain_app_base.h), graph type: gs::ArrowProjectedFragment<std::string,uint64_t,grape::EmptyType,grape::EmptyType,vineyard::ArrowVertexMap<vineyard::arrow_string_view,uint64_t>,false> (core/fragment/arrow_projected_fragment.h)
2023-12-27 03:01:00,318 [INFO][utils:452]: Building app library...
2023-12-27 03:01:00,421 [INFO][utils:469]: Codegened application type: cpp_pie, app header: apps/pregel/louvain/louvain_app_base.h, app_class: gs::LouvainAppBase<_GRAPH_TYPE>, vd_type: None, md_type: None, pregel_combine: None, java_jar_path: None, java_app_class: None
2023-12-27 03:01:00,422 [INFO][utils:377]: compile on kubernetes, ["cmake . -DNETWORKX=ON -DCMAKE_PREFIX_PATH='/opt/graphscope;'", 'make -j2'], /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a, 5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a, gs-engine-leahic-0, engine
2023-12-27 03:01:00,422 [INFO][utils:321]: Running command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "test -f /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a/lib5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a.so", cwd: None
2023-12-27 03:01:00,794 [ERROR][utils:325]: Failed to run command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "test -f /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a/lib5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a.so", error message is: command terminated with exit code 1
2023-12-27 03:01:00,794 [INFO][utils:321]: Running command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "mkdir -p /tmp/gs/builtin", cwd: None
2023-12-27 03:01:01,089 [DEBUG][utils:398]:
2023-12-27 03:01:01,089 [INFO][utils:321]: Running command: kubectl cp /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a gs-engine-leahic-0:/tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a -c engine --retries=5, cwd: None
2023-12-27 03:01:01,359 [DEBUG][utils:399]:
2023-12-27 03:01:01,360 [INFO][utils:321]: Running command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "cd /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a && cmake . -DNETWORKX=ON -DCMAKE_PREFIX_PATH='/opt/graphscope;'", cwd: None
2023-12-27 03:01:05,369 [DEBUG][utils:402]: -- The C compiler identification is GNU 11.4.0
2023-12-27 03:01:14,865 [ERROR][utils:325]: Failed to run command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "cd /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a && make -j2", error message is: In file included from /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:31,
from /opt/graphscope/include/graphscope/frame/app_frame.cc:43:
/opt/graphscope/include/graphscope/apps/pregel/louvain/louvain.h: In instantiation of 'void gs::PregelLouvain<FRAG_T>::Init(gs::PregelLouvain<FRAG_T>::pregel_vertex_t&, gs::PregelLouvain<FRAG_T>::compute_context_t&) [with FRAG_T = gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false>; gs::PregelLouvain<FRAG_T>::pregel_vertex_t = gs::LouvainVertex<gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false>, std::__cxx11::basic_string<char>, gs::LouvainMessage<long unsigned int> >; gs::PregelLouvain<FRAG_T>::compute_context_t = gs::PregelComputeContext<gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false>, std::__cxx11::basic_string<char>, gs::LouvainMessage<long unsigned int> >]':
from graphscope.
ArrowProjectedFragment<std::string,uint64_t,grape::EmptyType,grape::EmptyType
but there's still no edge data there. Could you paste the code from load graph to run app, and several lines of your data so I can have a try of it?
from graphscope.
load graph and run louvain
# enabled_engines="analytical,interactive",
import graphscope
graphscope.set_option(log_level='DEBUG')
graphscope.set_option(show_log=True)
from graphscope.config import Config
config = Config()
config.coordinator.monitor=True
config.log_level="debug"
config.show_log=True
# Create GraphScope client session, the 'cluster_type' is k8s by default.
session = graphscope.session(
k8s_coordinator_cpu=1,
k8s_coordinator_mem="1Gi",
k8s_vineyard_cpu=4,
k8s_vineyard_mem="5Gi",
vineyard_shared_mem="5Gi",
k8s_engine_cpu=2,
k8s_namespace='gs-new-orc-jacky',
k8s_engine_mem="5Gi",
num_workers=1,
enabled_engines="analytical,interactive",
k8s_client_config='~/.kube/config')
print('========= Session created. ==========')
graph = session.g(oid_type="string",generate_eid=True,directed=False)
from graphscope.framework.loader import Loader
graph = (
graph.add_vertices(Loader("file:///user/jacky.yang/graphcomputer/louvain/vertex_address.csv",filetype="csv")
,label="user_id"
,vid_field="id"
)
.add_edges(Loader("file:///user/jacky.yang/graphcomputer/louvain/edge_transaction.csv",filetype="csv")
,label="login"
,src_label="user_id"
,dst_label="user_id"
,src_field='from'
,dst_field='to'
,properties=[('weight', 'double')]
)
)
# Goal: demo all of the graph analytics algorithms
import numpy as np
import pandas as pd
####################################################################
# Louvain algorithm
print("Starting the Louvain algorithm")
result_louvain = graphscope.louvain(graph, min_progress=1000, progress_tries=1)
# Format the results
dataframe_louvain = result_louvain.to_dataframe({"node": "v.id", "result": "r"})
print("Algorithm finished!")
vertex_address.csv
id
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
edge_transaction.csv
from,to,weight
2,6,1
0,16,1
0,11,1
0,4,1
3,13,1
0,7,1
0,14,1
0,10,1
1,17,1
2,5,1
3,9,1
2,8,1
9,15,1
2,12,1
16,5,1
from graphscope.
Does the GraphScope Louvain algorithm not support running on macOS?
2023-12-27 11:55:30,468 [ERROR][rpc:188]: Runstep failed with code: ANALYTICAL_ENGINE_INTERNAL_ERROR, message: Error occurred during RunStep, The traceback is: Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gscoordinator/utils.py", line 417, in _compile_on_local
logger.debug(run_command(command, cwd=workdir))
File "/usr/local/lib/python3.10/site-packages/gscoordinator/utils.py", line 322, in run_command
cp = subprocess.run(shlex.split(args), capture_output=True, cwd=cwd, **kwargs)
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/subprocess.py", line 1847, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmake'
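The last frame of the traceback shows that the coordinator tried to launch `cmake` and the operating system could not find it. That points at a build tool missing from the local PATH rather than at Louvain itself. A minimal sketch for checking this up front, assuming `cmake` and `make` are the tools needed (the Kubernetes log earlier in the thread shows both being invoked); `missing_build_tools` is a hypothetical helper, not part of GraphScope:

```python
import shutil


def missing_build_tools(tools=("cmake", "make")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Install these before running the app:", ", ".join(missing))
    else:
        print("All build tools found on PATH.")
```

If the list is non-empty, installing the named tools and making sure they are on the PATH of the process that starts the coordinator would be the first thing to try.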
from graphscope.
FileNotFoundError: [Errno 2] No such file or directory: 'cmake'
from graphscope.
Project the graph to the correct schema first. With the projection below, compilation passed for me.
pg = graph.project(vertices={'user_id': ['id']}, edges={'login': ['weight']})
graphscope.louvain(pg, ...)
from graphscope.
pg = graph.project(vertices={'user_id': ['id']}, edges={'login': ['weight']})
Running successfully, thank you very much for your help!
from graphscope.
You are welcome
from graphscope.
By the way, I want to adapt an algorithm that currently uses nx:
from community import community_louvain
com = community_louvain.best_partition(G)
How can we get the equivalent of best_partition with GraphScope's Louvain?
print("Starting the Louvain algorithm")
result_louvain = graphscope.louvain(graph, min_progress=1000, progress_tries=1)
# Format the results
dataframe_louvain = result_louvain.to_dataframe({"node": "v.id", "result": "r"})
With the default arguments min_progress=1000, progress_tries=1, the result is not the best community partition:
result
node result
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
Expected: 18 nodes in 4 communities!
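For a side-by-side comparison with `community_louvain.best_partition(G)`, which returns a dict mapping each node to a community ID, the (node, result) rows above can be folded into the same shape. A minimal sketch in plain Python, assuming the rows have already been pulled out of the dataframe (for example with `dataframe_louvain.itertuples(index=False)`); `to_partition` and `community_sizes` are hypothetical helper names:

```python
from collections import Counter


def to_partition(rows):
    """Map node -> community label, the shape best_partition returns."""
    return {node: community for node, community in rows}


def community_sizes(partition):
    """Count how many nodes ended up in each community."""
    return Counter(partition.values())


# The result posted above: every node is labelled with its own ID.
rows = [(n, n) for n in range(18)]
partition = to_partition(rows)
sizes = community_sizes(partition)
print(len(sizes), "communities; largest has", max(sizes.values()), "node(s)")
```

Eighteen communities of one node each suggests that no node was ever moved out of its initial community.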
from graphscope.
Try tuning the parameters and see if it helps. For example, min_progress=1000
means that fewer than 1000 node changes is not counted as progress, which is too large for your 18-node graph.
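Read literally, the description above gives a stopping rule along the following lines. This is only an illustration of that reading, not GraphScope's actual code: `passes_until_halt` is a hypothetical function, the per-pass change counts are made up, and the real implementation may measure progress differently.

```python
def passes_until_halt(changes_per_pass, min_progress, progress_tries):
    """Return how many passes run before the loop gives up.

    A pass counts as "no progress" when fewer than `min_progress` nodes
    changed community; the loop stops once that has happened
    `progress_tries` times.
    """
    stalled = 0
    for passes, changed in enumerate(changes_per_pass, start=1):
        if changed < min_progress:
            stalled += 1
        if stalled >= progress_tries:
            return passes
    return len(changes_per_pass)


# Made-up change counts for a small graph such as the 18-node example.
changes = [12, 7, 3, 1, 0]
print(passes_until_halt(changes, min_progress=1000, progress_tries=1))
print(passes_until_halt(changes, min_progress=2, progress_tries=2))
```

Under this reading, a threshold of 1000 stops a small graph after its first pass, because no pass on 18 nodes can ever change 1000 of them.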
from graphscope.
print("Starting the Louvain algorithm")
result_louvain = graphscope.louvain(graph, min_progress=2, progress_tries=2)
# Format the results
dataframe_louvain = result_louvain.to_dataframe({"node": "v.id", "result": "r"})
I get the same result:
node result
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
In #183, the Louvain test results are:
dataset: p2p (62586 nodes, 147892 edges)
graphscope louvain: 0.571986 find community num: 103
python-louvain: 0.578780 find community num: 110
I would like to use the p2p dataset (62586 nodes, 147892 edges). Where can I find it?
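One way to compare two partitions of the same graph is to compute a quality score yourself. The sketch below computes Newman modularity for an unweighted, undirected edge list in plain Python, using the 15 edges of edge_transaction.csv posted earlier (all weights are 1). It assumes the scores quoted above are modularity values, which the thread does not state; `modularity` is a hypothetical helper, not a GraphScope or python-louvain function.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman modularity Q = sum_c [L_c/m - (D_c/(2m))^2], unweighted.

    L_c is the number of edges inside community c, D_c the total degree
    of its nodes, and m the number of edges in the graph.
    """
    m = len(edges)
    intra = defaultdict(int)   # L_c per community
    degree = defaultdict(int)  # D_c per community
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            intra[partition[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


edges = [(2, 6), (0, 16), (0, 11), (0, 4), (3, 13), (0, 7), (0, 14), (0, 10),
         (1, 17), (2, 5), (3, 9), (2, 8), (9, 15), (2, 12), (16, 5)]

# Every node in its own community, as in the result posted above.
singletons = {n: n for n in range(18)}

# One community per connected component, as a simple reference partition.
components = {n: "a" for n in (0, 2, 4, 5, 6, 7, 8, 10, 11, 12, 14, 16)}
components.update({n: "b" for n in (3, 9, 13, 15)})
components.update({n: "c" for n in (1, 17)})

print(round(modularity(edges, singletons), 4))
print(round(modularity(edges, components), 4))
```

On this graph the all-singletons partition scores below zero, while simply grouping by connected component already scores well above it, so a result where every node keeps its own ID is far from a good partition by this measure.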
from graphscope.
Hi @JackyYangPassion, for the p2p dataset you can use p2p-31.v and p2p-31.e in gstest.
from graphscope.
Thanks for the reply. To reproduce those results, which values should I use for the two Louvain parameters? Are they the defaults?
min_progress=2, progress_tries=2
from graphscope.
@acezen
For a small dataset like p2p, you can try:
min_progress=0
progress_tries=1
refer to: https://sotera.github.io/distributed-graph-analytics/louvain/
from graphscope.
I took the dataset mentioned above and modified the test_app.py:test_louvain_on_projected_graph test case locally,
but I couldn't reproduce the results of the earlier PR #183. The group_id column is empty. Is there a logic error somewhere?
The modified test_app.py:test_louvain_on_projected_graph code is as follows:
# TODO: validate on the p2p dataset: performance and result accuracy
def test_louvain_on_p2p_projected_graph(p2p_project_undirected_graph_string, graphscope_session):
    g = p2p_project_undirected_graph_string
    interactive = graphscope_session.interactive(g)
    edgeNum = interactive.execute("g.E().count()").one()
    vertexNum = interactive.execute("g.V().count()").one()
    ctx = louvain(g, min_progress=0, progress_tries=1)
    df_com_louvain = ctx.to_dataframe({"node": "v.id", "r": "r"})
    logger.info(df_com_louvain.head(100))
    ctx.output_to_client(f'/home/graphscope/result/p2p_louvain_result.csv', selector={'id': 'v.id', 'dist': 'r'})
In the resulting p2p_louvain_result.csv, the dist column is empty:
id,dist
31993,
32003,
32004,
32005,
The number of groups obtained through community_louvain.best_partition(G) in nx is 62.
@acezen @siyuan0322 Can you help me take a look at it?
from graphscope.
I0103 09:19:59.000000 418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:237] super step: 56 decided to halt, ACTUAL QUALITY: 0.875365 previous QUALITY: 0
I0103 09:19:59.000000 418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:155] current super step: -1 current minor step: -1 current iteration: 0
I0103 09:19:59.000000 418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:155] current super step: 0 current minor step: 0 current iteration: 0
I0103 09:19:59.000000 418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:155] current super step: 1 current minor step: 1 current iteration: 0
I0103 09:19:59.000000 418 /opt/graphscope/include/graphscope/core/app/app_invoker.h:196] Query time: 1.05882 seconds
from the log ΔQ = ACTUAL QUALITY: 0.875365
from graphscope.
That's odd. The Louvain logic may have changed since the PR was merged. I will try to reproduce the result and report back to you ASAP.
from graphscope.
Thank you for your reply. @acezen
As our graph data grows, we are migrating from nx to GraphScope's Louvain to speed up our runs. During result verification, we found that the clustering results do not meet expectations.
from graphscope.