Giter VIP home page Giter VIP logo

csci3390-project3's Introduction

Large Scale Data Processing: Project 3

Authors:

  • Ilan Valencius (valencig)
  • Steven Roche (sroche14)
  • Jason Adhinarta (jasonkena)

Getting started

Head to Project 1 if you're looking for information on Git, template repositories, or setting up your local/remote environments.

  • Steven Roche (sroche14)
  • Ilan Valencius (valencig)
  • Jason Adhinarta (jasonkena)

1 - verifyMIS

// Linux
spark-submit --class project_3.main --master local[*] target/scala-2.12/project_3_2.12-1.0.jar verify [path_to_graph] [path_to_MIS]

// Unix
spark-submit --class "project_3.main" --master "local[*]" target/scala-2.12/project_3_2.12-1.0.jar verify [path_to_graph] [path_to_MIS]

Apply verifyMIS locally with the parameter combinations listed in the table below and fill in all blanks.

Graph file MIS file Is an MIS?
small_edges.csv small_edges_MIS.csv
small_edges.csv small_edges_non_MIS.csv
line_100_edges.csv line_100_MIS_test_1.csv
line_100_edges.csv line_100_MIS_test_2.csv
twitter_10000_edges.csv twitter_10000_MIS_test_1.csv
twitter_10000_edges.csv twitter_10000_MIS_test_2.csv

2 - LubyMIS

Implement the LubyMIS function. The function accepts a Graph[Int, Int] object as its input. You can ignore the two integers associated with the vertex RDD and the edge RDD as they are dummy fields. LubyMIS should return a Graph[Int, Int] object such that the integer in a vertex's data field denotes whether or not the vertex is in the MIS, with 1 signifying membership and -1 signifying non-membership. The output will be written as a CSV file to the output path you provide. To execute the function, run the following:

// Linux
spark-submit --class project_3.main --master local[*] target/scala-2.12/project_3_2.12-1.0.jar compute [path_to_input_graph] [path_for_output_graph]

// Unix
spark-submit --class "project_3.main" --master "local[*]" target/scala-2.12/project_3_2.12-1.0.jar compute [path_to_input_graph] [path_for_output_graph]

Apply LubyMIS locally on the graph files listed below and report the number of iterations and running time that the MIS algorithm consumes for each file. You may need to include additional print statements in LubyMIS in order to acquire this information. Finally, verify your outputs with verifyMIS.

Graph file Running time [s] Iterations verifyMIS
small_edges.csv 0 1
line_100_edges.csv 1 2
twitter_100_edges.csv 1 2
twitter_1000_edges.csv 1 3
twitter_10000_edges.csv 1 3
twitter_original_edges.csv 3694 4

3 - LubyMIS (GCP)

a. Run LubyMIS on twitter_original_edges.csv in GCP with 3x4 cores. Report the number of iterations, running time, and remaining active vertices (i.e. vertices whose status has yet to be determined) at the end of each iteration. You may need to include additional print statements in LubyMIS in order to acquire this information. Finally, verify your outputs with verifyMIS.
b. Run LubyMIS on twitter_original_edges.csv with 4x2 cores and then 2x2 cores. Compare the running times between the 3 jobs with varying core specifications that you submitted in 3a and 3b.

Configuration Running time [s] Iterations Remaining active vertices verifyMIS
3x4 cores 953 5 6584804, 32309, 427, 5, 0
4x2 cores 1466 5 6513956, 34359, 434, 1, 0
2x2 cores 2257 5 7082835, 33850, 451, 2, 0

Here we see that having more cores means that the wall time is reduced. This is because the algorithm is parallelizable and can be split into multiple tasks that can be run concurrently. This means that the more cores we have, the more tasks can be run concurrently, and the faster the algorithm will run.

csci3390-project3's People

Contributors

ivalencius avatar sroche14 avatar jasonkena avatar hsinhaosu avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.