The dominancequeries from mnappsnet

Scalable Processing of Dominance-Based Queries

What is this project?

This is an exercise for the course “Decentralized Technologies”, which is part of the master’s degree program "Data and Web Science" of Aristotle University of Thessaloniki.

Exercise description

Given a potentially large set of d-dimensional points, where each point is represented as a d-dimensional vector, we need to detect interesting points. The project is based on the concept of dominance. We say that a point p dominates another point q when p is at least as good as q in all dimensions and strictly better in at least one dimension. We assume that small values are preferable. For example, the point p(1, 2) dominates q(3, 4) since 1 < 3 and 2 < 4. Also, p(1, 2) dominates q(1, 3): although they have the same x coordinate, the y coordinate of p is smaller than that of q. There are three different tasks you need to complete:

Task 1. Given a set of d-dimensional points, return the set of points that are not dominated. This is also known as the skyline set.
Task 2. Given a set of d-dimensional points, return the k points with the highest dominance score. The dominance score of a point p is defined as the total number of points dominated by p.
Task 3. Given a set of d-dimensional points, return the k points from the skyline with the highest dominance score.
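
The definitions above can be made concrete with a small brute-force sketch in Python. This is not the project's Spark implementation. The function names (dominates, skyline, dominance_score, top_k_dominating, top_k_skyline) are hypothetical, points are assumed to be tuples of numbers, and smaller values are preferred, as stated above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every dimension and strictly better in one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Task 1: the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominance_score(p: Point, points: List[Point]) -> int:
    """Number of points in the dataset that p dominates."""
    return sum(1 for q in points if dominates(p, q))


def top_k_dominating(points: List[Point], k: int) -> List[Point]:
    """Task 2: the k points with the highest dominance score."""
    ranked = sorted(points, key=lambda p: dominance_score(p, points), reverse=True)
    return ranked[:k]


def top_k_skyline(points: List[Point], k: int) -> List[Point]:
    """Task 3: the k skyline points with the highest dominance score."""
    ranked = sorted(skyline(points), key=lambda p: dominance_score(p, points), reverse=True)
    return ranked[:k]


points = [(1, 2), (3, 4), (1, 3), (2, 1), (4, 4)]
print(skyline(points))           # [(1, 2), (2, 1)]
print(top_k_skyline(points, 2))  # [(1, 2), (2, 1)]
```

In this example (1, 3) has the same dominance score as (2, 1) but is not on the skyline, so Task 2 and Task 3 can return different points. Ties are broken by input order in this sketch, which is an arbitrary choice.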

Generate the datasets

The datasets are generated by the Python scripts that can be found here

How to pass parameters to the DominanceQueries script

The program reads the file settings.json, which must be at the root of the project. This JSON file contains all the parameters needed and has the format below:

{
    "description":"Distribution: Corelated, Points: 1000, Dimensions: 2, Generated with entropy: 0.5",
    "cores":4,
    "testName":"anti-corelated.s1000.e0.5.d2",
    "dataFile":"datasets/&NAME&.csv",
    "task1ResultsOutput":"results/&NAME&/task1.csv",
    "task2ResultsOutput":"results/&NAME&/task2.csv",
    "task3ResultsOutput":"results/&NAME&/task3.csv",
    "topKpoints":10,
    "executeTask2":true,
    "executeTask3":true
}

The placeholder &NAME& in the paths is replaced by the value of the testName property. The properties "cores", "testName" and "topKpoints" can also be provided as parameters to the DominanceQueries script. To execute the DominanceQueries script:

    java -jar DominanceQueries.jar settings_json_path test_case_index_of_json_file test_name top_k_points cpu_cores

All the arguments are optional. By default, the program loads the settings.json file from the execution path, takes the first test case (index 0), and reads the remaining arguments from the properties of that test case.
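
As an illustration of the &NAME& substitution, the following Python sketch resolves the path properties of one test case. It is not the project's own code: the function name resolve_paths and the PATH_KEYS tuple are hypothetical, and it assumes the placeholder is replaced only in the four path properties shown above.

```python
import json

PLACEHOLDER = "&NAME&"
# Assumption: only these path properties contain the placeholder.
PATH_KEYS = ("dataFile", "task1ResultsOutput", "task2ResultsOutput", "task3ResultsOutput")


def resolve_paths(test_case: dict) -> dict:
    """Return a copy of the test case with &NAME& replaced by testName in the paths."""
    resolved = dict(test_case)
    for key in PATH_KEYS:
        if key in resolved:
            resolved[key] = resolved[key].replace(PLACEHOLDER, test_case["testName"])
    return resolved


test_case = json.loads("""
{
    "testName": "anti-corelated.s1000.e0.5.d2",
    "dataFile": "datasets/&NAME&.csv",
    "task1ResultsOutput": "results/&NAME&/task1.csv"
}
""")

resolved = resolve_paths(test_case)
print(resolved["dataFile"])            # datasets/anti-corelated.s1000.e0.5.d2.csv
print(resolved["task1ResultsOutput"])  # results/anti-corelated.s1000.e0.5.d2/task1.csv
```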

Visualize script

A simple Python script, visualize.py, plots two datasets together in the same scatter plot with different colors. It is used to visualize the results of the different tasks.

Usage:

Arguments:
    -h, --help
        show this help message and exit
    -d DATA, --data DATA
        Data to plot
    -l HIGHLIGHT, --highlight HIGHLIGHT
        Data to highlight
    -s SAMPLES, --samples SAMPLES
        Samples to visualise. Set 0 to use all of them.
    -o OUTPUT, --output OUTPUT
        Define where to save the plot; if not provided, the plot is not saved
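
The option list above can be reconstructed as an argparse parser, which also shows what an invocation might look like. This is a sketch of the command-line interface only, not the contents of visualize.py: the function name build_parser, the integer type and default of 0 for --samples, and the file paths in the example are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented options; types and defaults are assumptions."""
    parser = argparse.ArgumentParser(
        description="Plot two datasets in the same scatter plot"
    )
    parser.add_argument("-d", "--data", help="Data to plot")
    parser.add_argument("-l", "--highlight", help="Data to highlight")
    parser.add_argument(
        "-s", "--samples", type=int, default=0,
        help="Samples to visualise. Set 0 to use all of them.",
    )
    parser.add_argument(
        "-o", "--output",
        help="Define where to save the plot; if not provided, the plot is not saved",
    )
    return parser


# Hypothetical invocation: plot a dataset and highlight the Task 1 results.
args = build_parser().parse_args(
    ["-d", "datasets/points.csv", "-l", "results/points/task1.csv", "-s", "500"]
)
print(args.data, args.highlight, args.samples, args.output)
```

Because -o is omitted in this invocation, args.output is None, which corresponds to the documented behavior of not saving the plot.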

Bruteforce script

The Python script bruteforce.py computes the results of the required tasks with a brute-force method. It is used to validate the results of the Spark implementation. Avoid running it on large datasets: a brute-force approach compares points pairwise, so the running time grows quickly with the input size.

Usage:

Arguments:
    -h, --help
        show this help message and exit
    -d DATA, --data DATA
        Input data to process
    -t TOP, --top TOP
        Number of top points in terms of dominations
