pagerank's Introduction

Page Rank

This project, developed for the university course "Cloud Computing", reproduces the PageRank algorithm, the first algorithm Google used to rank web pages in its search engine results. PageRank is implemented with both Spark and Hadoop, two of the most widely used analytics engines for large-scale data processing.
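For readers new to the algorithm, here is a minimal single-machine sketch of the iterative computation in plain Python. It is not the code from this repository: the function name `pagerank`, the dictionary-based graph, and the handling of pages without outgoing links are all assumptions made for illustration. It also assumes that `alpha` is the random jump probability, matching the parameter names used in the commands in this README.

```python
def pagerank(graph, iterations, alpha):
    """Iterative PageRank on a dict mapping each page to its list of outgoing links.

    alpha is treated as the random jump probability (an assumption of this sketch).
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page first receives its random jump share.
        new_ranks = {page: alpha / n for page in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # Split the remaining rank evenly across the outgoing links.
                share = (1.0 - alpha) * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling page: spread its rank over all pages
                # (one common choice, assumed here).
                share = (1.0 - alpha) * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page graph, used only to show the call.
    toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    result = pagerank(toy_graph, iterations=20, alpha=0.15)
    for page, rank in sorted(result.items(), key=lambda item: -item[1]):
        print(page, round(rank, 4))
```

The Spark and Hadoop versions distribute the same per-iteration step (each page sends shares of its rank to the pages it links to, then the shares are summed per page) across a cluster. How they parse the input and treat dangling pages may differ from this sketch.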

How to execute

First, make sure that 'wiki-micro.txt' is present in HDFS:

hadoop fs -ls

How to execute the Spark Page Rank application

Then, in the folder that contains 'PageRank_Spark.py', execute:

$SPARK_HOME/bin/spark-submit PageRank_Spark.py <input> <output> <iteration> <alpha>

Here the input is 'wiki-micro.txt' and the output is the folder where the program writes its results, split into several files named 'part-0000x'. To check the results:

hadoop fs -cat output/part-0000x 

How to execute the Hadoop Page Rank application

In the PageRank folder, execute:

mvn clean package

Then:

hadoop jar target/PageRank-1.0-SNAPSHOT.jar it.unipi.hadoop.PageRank <input> <output directory> <# of iterations> <# of reducers> <random jump probability>

You will find the results in the output directory.

pagerank's People

Contributors

bicchie, lorebianchi98
