
mapreduce-pr's Introduction

An implementation of the PageRank algorithm using MapReduce.
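As background, one PageRank iteration in the MapReduce style can be sketched in a few lines of Python. This is a minimal illustration of the textbook formulation, not the repository's Java code: the names `map_page`, `reduce_page` and `iterate` are hypothetical, the damping factor 0.85 is an assumed conventional value, and the sketch assumes every page has its own record and ignores the rank mass lost at pages with no outgoing links.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

DAMPING = 0.85  # assumed value; the repository's setting is not stated here

# page id -> (current PageRank, ids of pages it links to)
Graph = Dict[str, Tuple[float, List[str]]]


def map_page(src: str, pr: float, dests: List[str]) -> Iterator[Tuple[str, object]]:
    """Map step: emit the link structure plus one rank share per outgoing link."""
    # Pass the adjacency list along so the reducer can rebuild the record.
    yield src, dests
    # Split the current rank evenly across the outgoing links.
    for dest in dests:
        yield dest, pr / len(dests)


def reduce_page(values: List[object], n_pages: int) -> Tuple[float, List[str]]:
    """Reduce step: sum incoming shares and apply the damping factor."""
    dests: List[str] = []
    incoming = 0.0
    for value in values:
        if isinstance(value, list):
            dests = value      # the adjacency list emitted by the mapper
        else:
            incoming += value  # a rank share from a page linking here
    return (1 - DAMPING) / n_pages + DAMPING * incoming, dests


def iterate(graph: Graph) -> Graph:
    """Run one full map, shuffle, reduce round over an in-memory graph."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for src, (pr, dests) in graph.items():
        for key, value in map_page(src, pr, dests):
            grouped[key].append(value)
    return {page: reduce_page(vals, len(graph)) for page, vals in grouped.items()}


if __name__ == "__main__":
    graph = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
    for page, (pr, dests) in sorted(iterate(graph).items()):
        print(page, round(pr, 4), dests)
```

Calling `iterate` repeatedly on its own output corresponds to running the job for several rounds, since each round's output has the same shape as its input.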

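1. Runtime requirements

Software dependencies: Hadoop 2.7.3, JDK 1.8

Environment setup: Docker is recommended for building the runtime environment. See https://github.com/ruoyu-chen/hadoop-docker for details.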

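2. Test data format

The test data was downloaded from http://www.limfinity.com/ir/data/hollins.dat.gz.

After preprocessing, the raw data is converted into the following adjacency-list format:

src, pr, dest1, dest2, ..., destn

Here src is the id of the source page, pr is the PageRank value of src in the current iteration, and destx is the id of a target page that the source page links to.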
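To make the record layout concrete, here is a minimal Python sketch that splits one line of this format into its three parts. The function name `parse_line` is hypothetical, and the sketch assumes fields are separated by commas with optional surrounding whitespace, as in the template line; the actual delimiter in pr.dat should be checked against the file.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, float, List[str]]:
    """Split one 'src, pr, dest1, ..., destn' record into (src, pr, dests)."""
    # Assumption: comma-separated fields; whitespace around each field is ignored.
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 2:
        raise ValueError(f"expected at least src and pr, got: {line!r}")
    src = fields[0]
    pr = float(fields[1])
    # A page with no outgoing links simply has an empty destination list.
    dests = [field for field in fields[2:] if field]
    return src, pr, dests


if __name__ == "__main__":
    # Illustrative record, not taken from the real data set.
    print(parse_line("1, 0.25, 2, 3, 4"))
```

Keeping the current rank and the outgoing links in the same record means a single line carries everything needed to process that page in one iteration.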

The preprocessed test data (the pr.dat file) can be downloaded from Baidu Cloud:

Link: https://pan.baidu.com/s/1jI82WU6 Password: vaef

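3. Preparing the test data

Once the Hadoop environment is installed and configured, upload pr.dat to the /pr directory in HDFS with the following commands:

# Create the /pr directory
hadoop fs -mkdir /pr
# Upload the test file
hadoop fs -put /code/pr.dat /pr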

4. Submitting the job

Run the following command to submit the job to the cluster:

hadoop jar MapReducePR.jar cn.edu.bistu.mrpr.PageRankJob /pr/ 

When the program finishes, the results are stored in the most recently created directory under /output in HDFS.

5. References

Data-Intensive Text Processing with MapReduce https://lintool.github.io/MapReduceAlgorithms/index.html

Contributors

ruoyu-chen
