Scalable Processing of Dominance-Based Queries

What is this project?

This is an exercise for the course “Decentralized Technologies”, which is part of the master’s degree program "Data and Web Science" of the Aristotle University of Thessaloniki.

Exercise description

Given a potentially large set of d-dimensional points, where each point is represented as a d-dimensional vector, we need to detect interesting points. The project is based on the concept of dominance. We say that a point p dominates another point q when p is at least as good as q in all dimensions and strictly better in at least one dimension. We assume that smaller values are preferable. For example, the point p(1, 2) dominates q(3, 4) since 1 < 3 and 2 < 4. Also, p(1, 2) dominates q(1, 3): although they have the same x coordinate, the y coordinate of p is smaller than that of q. There are three different tasks you need to complete:

  • Task1. Given a set of d-dimensional points, return the set of points that are not dominated. This is also known as the skyline set.
  • Task2. Given a set of d-dimensional points, return the k points with the highest dominance score. The dominance score of a point p is defined as the total number of points dominated by p.
  • Task3. Given a set of d-dimensional points, return the k points from the skyline with the highest dominance score.
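The dominance test and the three tasks can be sketched in a few lines of Python. This is only a minimal brute-force illustration of the definitions above, not the project's Spark implementation; the function names and the sample points are hypothetical, and ties in the dominance score are broken by input order, which is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Smaller values are preferable, as in the exercise description.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Task 1: the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominance_score(p: Point, points: Sequence[Point]) -> int:
    """Number of points in the dataset that p dominates."""
    return sum(1 for q in points if dominates(p, q))


def top_k(candidates: Sequence[Point], points: Sequence[Point],
          k: int) -> List[Tuple[Point, int]]:
    """Rank the candidates by dominance score over the whole dataset."""
    scored = [(p, dominance_score(p, points)) for p in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable sort
    return scored[:k]


def top_k_dominating(points: Sequence[Point], k: int) -> List[Tuple[Point, int]]:
    """Task 2: the k points with the highest dominance score."""
    return top_k(points, points, k)


def top_k_skyline(points: Sequence[Point], k: int) -> List[Tuple[Point, int]]:
    """Task 3: the k skyline points with the highest dominance score."""
    return top_k(skyline(points), points, k)


points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 6)]
print(skyline(points))              # [(1, 5), (2, 2), (5, 1)]
print(top_k_dominating(points, 2))  # [((2, 2), 3), ((3, 3), 2)]
print(top_k_skyline(points, 1))     # [((2, 2), 3)]
```

Note that (3, 3) appears in the Task 2 result but not in the Task 3 result: it dominates two points, yet it is itself dominated by (2, 2) and so is not on the skyline.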

Generate the datasets

The datasets are generated by the Python scripts that can be found here

How to pass parameters to the DominanceQueries script

The program reads the file settings.json, which must be at the root of the project. This JSON file contains all the parameters needed and has the format below:

{
    "description":"Distribution: Corelated, Points: 1000, Dimensions: 2, Generated with entropy: 0.5",
    "cores":4,
    "testName":"anti-corelated.s1000.e0.5.d2",
    "dataFile":"datasets/&NAME&.csv",
    "task1ResultsOutput":"results/&NAME&/task1.csv",
    "task2ResultsOutput":"results/&NAME&/task2.csv",
    "task3ResultsOutput":"results/&NAME&/task3.csv",
    "topKpoints":10,
    "executeTask2":true,
    "executeTask3":true
}

The placeholder &NAME& in the paths is replaced with the value of the testName property. The properties "cores", "testName" and "topKpoints" can also be provided as arguments to the DominanceQueries script. To execute the DominanceQueries script:

    java -jar DominanceQueries.jar settings_json_path test_case_index_of_json_file test_name top_k_points cpu_cores

All the arguments are optional. By default, the program loads the settings.json file from the execution path, takes the first test case (index 0) and reads the remaining arguments from the properties of that test case.
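The override and placeholder behavior described above can be illustrated with a short Python sketch. It is not the code of the Java program: the helper names are hypothetical, and it assumes that command-line values are applied before &NAME& is substituted, so that an overridden test name also changes the resolved paths.

```python
from typing import Any, Dict, List

PLACEHOLDER = "&NAME&"


def apply_cli_overrides(test_case: Dict[str, Any],
                        args: List[str]) -> Dict[str, Any]:
    """Override testName, topKpoints and cores from optional positional values.

    `args` holds the values that follow the settings path and the test case
    index on the command line: [test_name, top_k_points, cpu_cores].
    Missing values fall back to the properties of the test case.
    """
    merged = dict(test_case)
    if len(args) > 0:
        merged["testName"] = args[0]
    if len(args) > 1:
        merged["topKpoints"] = int(args[1])
    if len(args) > 2:
        merged["cores"] = int(args[2])
    return merged


def resolve_placeholders(test_case: Dict[str, Any]) -> Dict[str, Any]:
    """Replace &NAME& in every string property with the value of testName."""
    name = test_case["testName"]
    return {
        key: value.replace(PLACEHOLDER, name) if isinstance(value, str) else value
        for key, value in test_case.items()
    }


test_case = {
    "cores": 4,
    "testName": "anti-corelated.s1000.e0.5.d2",
    "dataFile": "datasets/&NAME&.csv",
    "task1ResultsOutput": "results/&NAME&/task1.csv",
    "topKpoints": 10,
}

defaults = resolve_placeholders(apply_cli_overrides(test_case, []))
print(defaults["dataFile"])  # datasets/anti-corelated.s1000.e0.5.d2.csv

custom = resolve_placeholders(apply_cli_overrides(test_case, ["my-test", "5"]))
print(custom["dataFile"], custom["topKpoints"], custom["cores"])
# datasets/my-test.csv 5 4
```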

Visualize script

A simple Python script, visualize.py, plots two datasets together in the same scatter plot with different colors. This makes it possible to visualize the results of the different tasks against the full dataset.

Usage:

Arguments:
    -h, --help
        show this help message and exit
    -d DATA, --data DATA
        Data to plot
    -l HIGHLIGHT, --highlight HIGHLIGHT
        Data to highlight
    -s SAMPLES, --samples SAMPLES
        Samples to visualise. Set 0 to use all of them.
    -o OUTPUT, --output OUTPUT
        Define where to save the plot, if not provided it is not saved
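For orientation, the options listed above could be declared with argparse roughly as follows. This is a sketch and not the actual source of visualize.py: the integer type and default of 0 for --samples, the random sampling strategy, and the helper names are assumptions.

```python
import argparse
import random
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line options listed in the usage text."""
    parser = argparse.ArgumentParser(
        description="Plot two datasets in the same scatter plot")
    parser.add_argument("-d", "--data", help="Data to plot")
    parser.add_argument("-l", "--highlight", help="Data to highlight")
    # Assumption: SAMPLES is an integer and defaults to 0 (use everything).
    parser.add_argument("-s", "--samples", type=int, default=0,
                        help="Samples to visualise. Set 0 to use all of them.")
    parser.add_argument("-o", "--output", default=None,
                        help="Where to save the plot; not saved if omitted")
    return parser


def pick_samples(rows: Sequence, samples: int,
                 seed: Optional[int] = None) -> List:
    """Return all rows when samples is 0, else a random subset of that size.

    Random sampling is an assumption; the real script may select differently.
    """
    if samples <= 0 or samples >= len(rows):
        return list(rows)
    return random.Random(seed).sample(list(rows), samples)


args = build_parser().parse_args(["-d", "data.csv", "-l", "task1.csv", "-s", "100"])
print(args.data, args.highlight, args.samples, args.output)
# data.csv task1.csv 100 None
```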

Bruteforce script

The Python script, bruteforce.py, computes the results of the required tasks using a brute-force method. We use it to validate the results of the Spark implementation. Please don't run this script on large datasets: brute force compares every pair of points, so the running time grows quadratically with the number of points.

Usage:

Arguments:
    -h, --help
        show this help message and exit
    -d DATA, --data DATA
        Input data to process
    -t TOP, --top TOP
        Number of top points in terms of dominations