Giter VIP home page Giter VIP logo

indexed_ppr's Introduction

##Indexed PPR

This repository contains the code for investigating the effect of using indexed proximity vectors for speeding the computation of Personalized PageRank (PPR) queries as well as top-K query results.

PPR is an algorithm for computing the network proximity of all nodes in a network to a set of query nodes provided by the user.

Although there are several ways of computing the results of PPR queries, the most commonly used is the Power Iteration method. In the Power Iteration method, the network in question is represented as a column normalized matrix W of dimension (|V|, |V|) (where nodes are represented by the indices of the rows and columns of the matrix, and edges weights are represented by entries in the matrix). A "restart vector" of dimensions (|V|,1) is denoted as rq, where each query node q in the set of query nodes Q provided by the user gets an entry 1 / |Q| at location q in the vector. In the standard implementation of PPR, a "start vector" xq0 that is identical to the restart vector is used as a starting point for the computation. In addition, the user provides a value for alpha (unfortunately github markdown does not support greek letters) that is the "restart probability" of the calculation. It affects how "global" our computation is, or in other words how detailed we want our computation to be. Lower values of alpha correspond to more global computations.

Now that we have the initizations of all the variables, we can use the following iterative formula to compute the final node scores.

We continue to compute xqt+1 until the L1-distance between xqt+1 and xqt is less than a threshold epsilon (usually set at 1E-10). At this point we say that xqt+1 has converged to xq*, and we return this vector as the final scores for every node in the network.

Our contribution is speeding the PPR query computation by using indexed proximity vectors. Essentially we move some of the computation of the PPR calculation to a pre-computation step, so when a user submits a query it is proceesed more quickly.

In the pre-computation (or indexing step), we calculate partial PPR query results for every node in the network. When a user submits a query set Q, we build a start vector from the partial proximity vectors we stored. The goal is to build a start vector that is closer to the final converged vector than the standard start vector would be.

The results of this investigation have been quite positive. We have applied our method to the standard PPR implementation as well as a novel method for computing PPR queries, CHOPPER. A selection of results are shown below.

We are currently working to apply our indexing method to several PPR top-k query algorithms. These include SQUEEZE and the top-k version of CHOPPER.

indexed_ppr's People

Contributors

joewledger avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.