Comments (24)

siyuan0322 commented on May 24, 2024

Thank you, I will try it later.

from graphscope.

JackyYangPassion commented on May 24, 2024

error log

/opt/graphscope/include/graphscope/apps/pregel/louvain/louvain.h:215:35: error: invalid 'static_cast' from type 'grape::EmptyType' to type 'gs::PregelLouvain<gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false> >::edata_t' {aka 'float'}

JackyYangPassion commented on May 24, 2024

Does the louvain algorithm not support custom string IDs as oid?

graph = session.g(oid_type="string",generate_eid=True,directed=False)

siyuan0322 commented on May 24, 2024

It means the graph has no edge data, but it needs it.

JackyYangPassion commented on May 24, 2024

The graph does have edge data:

## GIE basic graph operations
# Get the entrypoint for submitting Gremlin queries on graph g.
interactive = session.gremlin(graph)

edgenum = interactive.execute(
    "g.E().count()").one()
print("edgenum", edgenum)


vertexnum = interactive.execute(
    "g.V().count()").one()
print("vertexnum", vertexnum)

edgenum [30]
vertexnum [18]

siyuan0322 commented on May 24, 2024

Sorry, I meant a property on those edges. For example, louvain takes a graph with the schema (person, knows), where knows has the properties (id: string, weight: double).
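Some context on why the edge property is required: Louvain decides vertex moves by modularity, and modularity is computed from edge weights (each vertex's weighted degree and the total edge weight m). The compile error earlier in this thread is consistent with that: `edata_t` is `float`, but the projected fragment's edge data type is `grape::EmptyType`. Below is a minimal plain-Python sketch, not GraphScope code, that assumes the `from,to,weight` CSV layout used later in this thread and no self-loops; the helper name `load_weighted_edges` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_weighted_edges(csv_text):
    """Read 'from,to,weight' rows into an undirected weighted adjacency map.

    Returns (adjacency, total_weight), where total_weight is m, the sum of
    edge weights counted once per undirected edge. Assumes no self-loops.
    """
    adjacency = defaultdict(dict)
    total_weight = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        src, dst = row["from"], row["to"]
        weight = float(row["weight"])  # plays the role of the 'double' edge property
        # Store the edge in both directions because the graph is undirected.
        adjacency[src][dst] = adjacency[src].get(dst, 0.0) + weight
        adjacency[dst][src] = adjacency[dst].get(src, 0.0) + weight
        total_weight += weight
    return dict(adjacency), total_weight


adj, m = load_weighted_edges("from,to,weight\n2,6,1\n0,16,1\n16,5,1\n")
print(m)                        # 3.0
print(sum(adj["16"].values()))  # weighted degree of vertex "16": 2.0
```

Without a weight column there is nothing to fill `weight` with, which mirrors why the engine refuses to compile when the edges carry no data.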

JackyYangPassion commented on May 24, 2024

After adding the property ('weight', 'double'), the same error is still reported:

from graphscope.framework.loader import Loader
graph = (
    graph.add_vertices(
        Loader("file:///user/jacky.yang/graphcomputer/louvain/vertex_address.csv", filetype="csv"),
        label="user_id",
        vid_field="id",
    )
    .add_edges(
        Loader("file:///user/jacky.yang/graphcomputer/louvain/edge_transaction.csv", filetype="csv"),
        label="login",
        src_label="user_id",
        dst_label="user_id",
        src_field="from",
        dst_field="to",
        properties=[("weight", "double")],
    )
)

error log

Start running the Louvain algorithm
I1227 03:00:59.000000   506 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:268] Projecting graph graph_ipnqLnvB to simple graph: graph_projected_DYlg2S3a, type sig: 76170b1daab8ede76f000284df3134d76e2f0be377b9f3824b6501d4e33267c2
2023-12-27 03:01:00,315 [INFO][utils:243]: app type: gs::LouvainAppBase<_GRAPH_TYPE> (apps/pregel/louvain/louvain_app_base.h), graph type: gs::ArrowProjectedFragment<std::string,uint64_t,grape::EmptyType,grape::EmptyType,vineyard::ArrowVertexMap<vineyard::arrow_string_view,uint64_t>,false> (core/fragment/arrow_projected_fragment.h)
2023-12-27 03:01:00,318 [INFO][utils:452]: Building app library...
2023-12-27 03:01:00,421 [INFO][utils:469]: Codegened application type: cpp_pie, app header: apps/pregel/louvain/louvain_app_base.h, app_class: gs::LouvainAppBase<_GRAPH_TYPE>, vd_type: None, md_type: None, pregel_combine: None,             java_jar_path: None, java_app_class: None
2023-12-27 03:01:00,422 [INFO][utils:377]: compile on kubernetes, ["cmake . -DNETWORKX=ON -DCMAKE_PREFIX_PATH='/opt/graphscope;'", 'make -j2'], /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a, 5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a, gs-engine-leahic-0, engine
2023-12-27 03:01:00,422 [INFO][utils:321]: Running command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "test -f /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a/lib5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a.so", cwd: None
2023-12-27 03:01:00,794 [ERROR][utils:325]: Failed to run command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "test -f /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a/lib5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a.so", error message is: command terminated with exit code 1
2023-12-27 03:01:00,794 [INFO][utils:321]: Running command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "mkdir -p /tmp/gs/builtin", cwd: None
2023-12-27 03:01:01,089 [DEBUG][utils:398]:
2023-12-27 03:01:01,089 [INFO][utils:321]: Running command: kubectl cp /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a gs-engine-leahic-0:/tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a -c engine --retries=5, cwd: None
2023-12-27 03:01:01,359 [DEBUG][utils:399]:
2023-12-27 03:01:01,360 [INFO][utils:321]: Running command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "cd /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a && cmake . -DNETWORKX=ON -DCMAKE_PREFIX_PATH='/opt/graphscope;'", cwd: None
2023-12-27 03:01:05,369 [DEBUG][utils:402]: -- The C compiler identification is GNU 11.4.0
2023-12-27 03:01:14,865 [ERROR][utils:325]: Failed to run command: kubectl exec -c engine gs-engine-leahic-0 -- bash -c "cd /tmp/gs/builtin/5bd82f1167356c336b6d76a64a6391048f8a717871f46092e024df869e6f2c3a && make -j2", error message is: In file included from /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:31,
                 from /opt/graphscope/include/graphscope/frame/app_frame.cc:43:
/opt/graphscope/include/graphscope/apps/pregel/louvain/louvain.h: In instantiation of 'void gs::PregelLouvain<FRAG_T>::Init(gs::PregelLouvain<FRAG_T>::pregel_vertex_t&, gs::PregelLouvain<FRAG_T>::compute_context_t&) [with FRAG_T = gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false>; gs::PregelLouvain<FRAG_T>::pregel_vertex_t = gs::LouvainVertex<gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false>, std::__cxx11::basic_string<char>, gs::LouvainMessage<long unsigned int> >; gs::PregelLouvain<FRAG_T>::compute_context_t = gs::PregelComputeContext<gs::ArrowProjectedFragment<std::__cxx11::basic_string<char>, long unsigned int, grape::EmptyType, grape::EmptyType, vineyard::ArrowVertexMap<std::basic_string_view<char>, long unsigned int>, false>, std::__cxx11::basic_string<char>, gs::LouvainMessage<long unsigned int> >]':

siyuan0322 commented on May 24, 2024

ArrowProjectedFragment<std::string,uint64_t,grape::EmptyType,grape::EmptyType
There is still no edge data there. Could you paste the code from loading the graph to running the app, plus several lines of your data, so I can try it?

JackyYangPassion commented on May 24, 2024

Load the graph and run louvain:

# enabled_engines="analytical,interactive",
import graphscope
graphscope.set_option(log_level='DEBUG')
graphscope.set_option(show_log=True)
from graphscope.config import Config

config = Config()
config.coordinator.monitor=True
config.log_level="debug"
config.show_log=True

# Create GraphScope client session, the 'cluster_type' is k8s by default.
session = graphscope.session(
                             k8s_coordinator_cpu=1,
                             k8s_coordinator_mem="1Gi",
                             k8s_vineyard_cpu=4,
                             k8s_vineyard_mem="5Gi",
                             vineyard_shared_mem="5Gi",
                             k8s_engine_cpu=2,
                             k8s_namespace='gs-new-orc-jacky',
                             k8s_engine_mem="5Gi",
                             num_workers=1,
                             enabled_engines="analytical,interactive",
                             k8s_client_config='~/.kube/config')
print('========= Session created. ==========')


graph = session.g(oid_type="string",generate_eid=True,directed=False)
from graphscope.framework.loader import Loader
graph = (
    graph.add_vertices(
        Loader("file:///user/jacky.yang/graphcomputer/louvain/vertex_address.csv", filetype="csv"),
        label="user_id",
        vid_field="id",
    )
    .add_edges(
        Loader("file:///user/jacky.yang/graphcomputer/louvain/edge_transaction.csv", filetype="csv"),
        label="login",
        src_label="user_id",
        dst_label="user_id",
        src_field="from",
        dst_field="to",
        properties=[("weight", "double")],
    )
)

# Goal: demo all the graph analytics algorithms
import numpy as np
import pandas as pd
####################################################################
# Louvain algorithm
print("Start running the Louvain algorithm")
result_louvain = graphscope.louvain(graph, min_progress=1000, progress_tries=1)
# Format the result
dataframe_louvain = result_louvain.to_dataframe({"node": "v.id", "result": "r"})
print("Algorithm finished!")

vertex_address.csv

id
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

edge_transaction.csv

from,to,weight
2,6,1
0,16,1
0,11,1
0,4,1
3,13,1
0,7,1
0,14,1
0,10,1
1,17,1
2,5,1
3,9,1
2,8,1
9,15,1
2,12,1
16,5,1

JackyYangPassion commented on May 24, 2024

Does the GraphScope louvain algorithm not support running on macOS?

2023-12-27 11:55:30,468 [ERROR][rpc:188]: Runstep failed with code: ANALYTICAL_ENGINE_INTERNAL_ERROR, message: Error occurred during RunStep, The traceback is: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gscoordinator/utils.py", line 417, in _compile_on_local
    logger.debug(run_command(command, cwd=workdir))
  File "/usr/local/lib/python3.10/site-packages/gscoordinator/utils.py", line 322, in run_command
    cp = subprocess.run(shlex.split(args), capture_output=True, cwd=cwd, **kwargs)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmake'

siyuan0322 commented on May 24, 2024

FileNotFoundError: [Errno 2] No such file or directory: 'cmake'

siyuan0322 commented on May 24, 2024

Project the graph to the correct schema first; with that, the compilation passed for me.

pg = graph.project(vertices={'user_id': ['id']}, edges={'login': ['weight']})
graphscope.louvain(pg, ...)

JackyYangPassion commented on May 24, 2024

pg = graph.project(vertices={'user_id': ['id']}, edges={'login': ['weight']})

It runs successfully now. Thank you very much for your help!

siyuan0322 commented on May 24, 2024

You are welcome

JackyYangPassion commented on May 24, 2024

By the way, I want to adapt an algorithm from nx:

from community import community_louvain
com = community_louvain.best_partition(G) 

How can we get the equivalent of best_partition with GraphScope louvain?

print("Start running the Louvain algorithm")
result_louvain = graphscope.louvain(graph, min_progress=1000, progress_tries=1)
# Format the result
dataframe_louvain = result_louvain.to_dataframe({"node": "v.id", "result": "r"})
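`community_louvain.best_partition(G)` returns a dict mapping each node to a community id. A rough equivalent can be built from the `node`/`result` columns of the dataframe above. The sketch below uses a hypothetical helper `to_partition` and made-up rows; it also renumbers community ids to 0..k-1 in order of first appearance, which is similar in spirit to python-louvain's output but should be treated as an assumption rather than an exact match.

```python
def to_partition(rows):
    """Build a {node: community} dict from (node, community) pairs.

    Community ids are renumbered 0..k-1 in order of first appearance so the
    result is easier to compare with community_louvain.best_partition(G).
    """
    renumber = {}
    partition = {}
    for node, community in rows:
        if community not in renumber:
            renumber[community] = len(renumber)
        partition[node] = renumber[community]
    return partition


# Made-up rows; with pandas one could pass
# zip(dataframe_louvain["node"], dataframe_louvain["result"]) instead.
partition = to_partition([(0, 12), (1, 12), (2, 3), (3, 12)])
print(partition)                     # {0: 0, 1: 0, 2: 1, 3: 0}
print(len(set(partition.values())))  # number of communities: 2
```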

With the default arguments min_progress=1000, progress_tries=1, the result is not the best community partition.

result

node result
0     0       
1     1       
2     2       
3     3       
4     4       
5     5       
6     6       
7     7       
8     8       
9     9       
10   10       
11   11       
12   12       
13   13       
14   14       
15   15       
16   16       
17   17   

I expect the 18 nodes to fall into 4 communities!
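To see why the all-singletons output is a poor partition, one can compare modularity values by hand. The sketch below is plain Python, not GraphScope output: it computes Newman modularity for the 15 unweighted edges in edge_transaction.csv, once for the singleton partition returned above and once for a hypothetical 4-community split chosen for illustration (not a verified optimum).

```python
from collections import defaultdict

# Edge list copied from edge_transaction.csv above (all weights are 1).
EDGES = [
    (2, 6), (0, 16), (0, 11), (0, 4), (3, 13), (0, 7), (0, 14), (0, 10),
    (1, 17), (2, 5), (3, 9), (2, 8), (9, 15), (2, 12), (16, 5),
]


def modularity(edges, partition):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted,
    undirected graph: L_c counts intra-community edges, d_c is the total
    degree of community c, and m is the number of edges."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


singletons = {n: n for n in range(18)}  # every node in its own community
# Hypothetical 4-community split, for illustration only.
groups = [{0, 4, 7, 10, 11, 14, 16}, {2, 5, 6, 8, 12}, {3, 9, 13, 15}, {1, 17}]
four_way = {n: i for i, g in enumerate(groups) for n in g}

print(round(modularity(EDGES, singletons), 4))  # -0.0889
print(round(modularity(EDGES, four_way), 4))    # 0.6111
```

Under these assumptions the singleton partition scores about -0.089 and the 4-way split about 0.611, so a run that stops before moving any vertex leaves most of the achievable modularity unused.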

siyuan0322 commented on May 24, 2024

Try tuning the parameters and see if it helps. For example, min_progress=1000 means that fewer than 1000 node changes are not counted as progress, which is far too large a threshold for your 18-node graph.
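The stopping rule described above can be modelled roughly as follows. This is a plain-Python sketch under stated assumptions, loosely based on the sotera distributed Louvain description: a pass counts as progress only if at least `min_progress` vertices change community, and the phase stops once `progress_tries` passes have failed to make progress (failures are not reset in this sketch). It is not GraphScope's implementation; the function name and the per-pass change counts are hypothetical.

```python
def passes_until_halt(changes_per_pass, min_progress, progress_tries):
    """Return how many passes run before the (assumed) stopping rule fires.

    changes_per_pass: number of vertices that changed community in each pass.
    """
    failures = 0
    for pass_index, changed in enumerate(changes_per_pass, start=1):
        if changed < min_progress:  # not enough movement to count as progress
            failures += 1
            if failures >= progress_tries:
                return pass_index
    return len(changes_per_pass)


# Hypothetical change counts for a tiny graph: 10, 4, 1, then 0 vertices moved.
changes = [10, 4, 1, 0]
print(passes_until_halt(changes, min_progress=1000, progress_tries=1))  # 1
print(passes_until_halt(changes, min_progress=2, progress_tries=2))     # 4
```

In this model an 18-vertex graph can never reach a 1000-vertex threshold, so with progress_tries=1 the first pass is also the last.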

JackyYangPassion commented on May 24, 2024
print("Start running the Louvain algorithm")
result_louvain = graphscope.louvain(graph, min_progress=2, progress_tries=2)
# Format the result
dataframe_louvain = result_louvain.to_dataframe({"node": "v.id", "result": "r"})

I get the same result:

node result
0     0       
1     1       
2     2       
3     3       
4     4       
5     5       
6     6       
7     7       
8     8       
9     9       
10   10       
11   11       
12   12       
13   13       
14   14       
15   15       
16   16       
17   17   

In #183, the louvain test results are:

dataset: p2p (62586 nodes, 147892 edges)
graphscope louvain: 0.571986 find community num: 103
python-louvain: 0.578780 find community num: 110

I want to use the p2p dataset (62586 nodes, 147892 edges). Where can I find it?

acezen commented on May 24, 2024

Hi @JackyYangPassion, for the p2p dataset you can use p2p-31.v and p2p-31.e in gstest.

JackyYangPassion commented on May 24, 2024

Thanks for the reply. To reproduce those results, what values were used for the two louvain configuration parameters? Were they the defaults?

min_progress=2, progress_tries=2

@acezen

acezen commented on May 24, 2024

For a small dataset like p2p, you can try:

min_progress=0
progress_tries=1

Refer to: https://sotera.github.io/distributed-graph-analytics/louvain/

JackyYangPassion commented on May 24, 2024

Using the dataset you mentioned, I modified the test_app.py:test_louvain_on_projected_graph test case locally, but I couldn't reproduce the results of the earlier PR #183: the group_id column is empty. Is there a logic error somewhere?

The modified test_app.py:test_louvain_on_projected_graph code is as follows:

# TODO: verify on the p2p dataset: performance and result accuracy
def test_louvain_on_p2p_projected_graph(p2p_project_undirected_graph_string, graphscope_session):
    g = p2p_project_undirected_graph_string
    interactive = graphscope_session.interactive(g)
    edgeNum = interactive.execute("g.E().count()").one()
    vertexNum = interactive.execute("g.V().count()").one()

    ctx = louvain(g, min_progress=0, progress_tries=1)
    df_com_louvain = ctx.to_dataframe({"node": "v.id", "r": "r"})
    logger.info(df_com_louvain.head(100))
    ctx.output_to_client("/home/graphscope/result/p2p_louvain_result.csv", selector={"id": "v.id", "dist": "r"})

In the resulting p2p_louvain_result.csv, the dist column is empty:

id,dist
31993,
32003,
32004,
32005,

The number of groups obtained through community_louvain.best_partition(G) in nx is 62.
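Before comparing against the 62 groups from nx, it can help to quantify the problem in the exported file. A small stdlib-only sketch (the helper name `summarize_communities` is hypothetical) that counts empty `dist` cells and distinct non-empty community ids in a CSV shaped like p2p_louvain_result.csv:

```python
import csv
import io


def summarize_communities(csv_text, community_col="dist"):
    """Return (rows with an empty community value, number of distinct communities)."""
    empty = 0
    communities = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(community_col) or "").strip()
        if value:
            communities.add(value)
        else:
            empty += 1
    return empty, len(communities)


sample = "id,dist\n31993,\n32003,\n32004,7\n32005,7\n32006,9\n"
print(summarize_communities(sample))  # (2, 2)
```

For the real file one would pass `open(path).read()` instead of the made-up sample; a result of (number_of_rows, 0) would confirm that no community ids were written at all.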

@acezen @siyuan0322 Can you help me take a look at it?

JackyYangPassion commented on May 24, 2024
I0103 09:19:59.000000   418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:237] super step: 56 decided to halt, ACTUAL QUALITY: 0.875365 previous QUALITY: 0
I0103 09:19:59.000000   418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:155] current super step: -1 current minor step: -1 current iteration: 0
I0103 09:19:59.000000   418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:155] current super step: 0 current minor step: 0 current iteration: 0
I0103 09:19:59.000000   418 /opt/graphscope/include/graphscope/apps/pregel/louvain/louvain_app_base.h:155] current super step: 1 current minor step: 1 current iteration: 0
I0103 09:19:59.000000   418 /opt/graphscope/include/graphscope/core/app/app_invoker.h:196] Query time: 1.05882 seconds

From the log, ΔQ = ACTUAL QUALITY = 0.875365.

acezen commented on May 24, 2024

That's weird. The louvain logic may have changed since the PR was merged. I'll try to reproduce the result and report back to you ASAP.

JackyYangPassion commented on May 24, 2024

Thank you for your reply, @acezen.
As our graph data grows, we are migrating from nx to GraphScope louvain to speed things up. During result verification, we found that the clustering results do not meet expectations.
