CompNode

CompNode, an effective and efficient node ranking framework for billion-scale heterogeneous graphs. It can accurately rank over 500 million nodes within a day by leveraging rich path-based information relevant to target conditions. The more relevant paths are used, the higher the ranking accuracy, and the addition of more path types does not significantly impact computation time. For detailed design and analysis, please refer to our paper "Complex-Path: Effective and Efficient Node Ranking with Paths in Billion-Scale Heterogeneous Graphs" (currently under review).

Three stages in the workflow of our CompNode framework.

Complex-Path

Our framework is mainly based on a new generalized path schema called 'Complex-path' for defining non-linear conditional path types in a standard form. In real-world scenarios, we find that many non-linear conditional path types represent highly effective information. For example, when searching for potential high-value customers, if a customer can be associated with many existing high-value customers through the paths illustrated in the figure below, they are often likely to be a potential high-value customer themselves.

An example of nonlinear conditional path type. (Link each customer to existing high-value customers via co-industry and co-region relations.)

However, the commonly used path schema ’Meta-path‘ in heterogeneous graph neural networks can only represent linear path types. Therefore, we refine the Meta-path definition to convey nonlinear conditional paths while maintaining a generalized path schema suitable for heterogeneous graph neural networks. We propose four improvements based on the four situations shown below.

The final definition of Complex-path is:

A complex-path is defined on a heterogeneous graph HG = (V, E) and is represented as: p = { [ (o1, L1, o2), ..., (oi, L(j-1), oj), ..., (on, L(m-1), om) ] | C }. Each tuple (oi, L(j-1), oj) connects the j-th node oj to the prior i-th node oi through any edge or complex-path in L(j-1) = [..., ek, ...], where ek = lk or pk, i < j. C is the conditional constraints on the attributes of all nodes and edges.

Real-world Deployment

This framework has been effectively deployed in JD Logistics for finding top potential high-value customers. The following figure illustrates that, based on our prediction results, we made over 262,720 phone calls in China before October 2023. This strategy outperforms the previously deployed method by 252% in success rate. We are still using this framework for recommending customers with more potential value. As of December 1, 2023, more than 50,000 phone calls have been made in the last week alone. For more comprehensive details about our framework, please refer to our paper.

Brief Description for Our Code

The 'Utils' folder includes our methods for extracting information from heterogeneous graphs, developed using Spark and complex-path. This folder also includes baseline methods like meta-path and k-hop sampling.

The 'Model' folder contains our model implemented with PyTorch, along with the baseline models referenced in our paper.

The 'Main' folder includes code that utilizes the functions from the aforementioned folders to implement node ranking.

We are continuously refining and optimizing our code, and will keep updating it.

Requirements

pyspark==3.0.0
pytorch==1.8.0

jqhang / complex-path Goto Github PK

complex-path's Introduction

CompNode

Complex-Path

Real-world Deployment

Brief Description for Our Code

Requirements

complex-path's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent