crowd-source-microwork-platform-oversight-algorthim's Introduction

dataminingProject: Samasource Gold Gating Algorithm

Our gating algorithm requires a Samasource-generated JSON file.

To run the script, enter:

python runner.py json_data_file.json

##Data

Our data files are:

data/Getty_Training1.json
data/Getty_Training2.json
data/Getty_Validation.json

These files were generated by running Training_Validation_Generator.py on a master download from Samasource's data warehouse. The training datasets are included in this repository, while the validation dataset can be downloaded here.

##Parameters

An optional batch number may be passed on the command line after the keyword "batch":

python runner.py data/Getty_Training1.json batch 892

If there is no batch with that number, or no batch is specified, the script defaults to the batch with the most tasks in the branch.
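The fallback rule above can be illustrated with a minimal sketch. This is not the code in runner.py: the function name `choose_batch`, the `batch_id` field, and the sample tasks are all hypothetical, since the layout of the Samasource JSON file is not described here.

```python
from collections import Counter


def choose_batch(tasks, requested=None, key="batch_id"):
    """Return the requested batch if it exists, else the batch with the most tasks.

    `key` is a hypothetical field name; the real JSON schema may differ.
    """
    counts = Counter(task[key] for task in tasks)
    if not counts:
        raise ValueError("no tasks to choose a batch from")
    if requested is not None and requested in counts:
        return requested
    # No batch requested, or the requested number is unknown:
    # fall back to the batch that contains the most tasks.
    return counts.most_common(1)[0][0]


# Made-up sample data for illustration only.
tasks = [
    {"batch_id": 892, "task": "a"},
    {"batch_id": 901, "task": "b"},
    {"batch_id": 901, "task": "c"},
]

print(choose_batch(tasks, requested=892))
print(choose_batch(tasks, requested=5))
print(choose_batch(tasks))
```

With this sample data, a request for batch 892 is honored, while an unknown or missing batch number falls back to batch 901, which holds two of the three tasks.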

You can also run the centroid selection in a loop of 30 iterations to estimate the profit gains over the course of a full project:

python runner.py data/Getty_Training1.json run-loop

If this parameter is on, the centroid selection runs 30 times. The 30 iterations represent the number of gold batches all users must complete to finish an entire project of 300,000 tasks. Repeating the run 30 times also reduces the variability of the k-means function, which generates random centroids. Running the loop estimates the amount of gold saved, the cost savings, and the increased profit Samasource would receive if it implemented the smart gating algorithm.
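To show why repeating a randomly initialized k-means run helps, here is a minimal sketch using scikit-learn's `KMeans`. It is an illustration under assumptions, not the project's loop: the synthetic points, the choice of two clusters, and the use of inertia as the summary value are all made up for the example, whereas the real loop reports gold saved, cost savings, and profit.

```python
import numpy as np
from sklearn.cluster import KMeans

N_RUNS = 30  # matches the 30 loop iterations described above


def repeated_kmeans(points, n_clusters=2, n_runs=N_RUNS):
    """Fit k-means n_runs times with different random seeds; return each run's inertia."""
    inertias = []
    for seed in range(n_runs):
        # init="random" with n_init=1 means each fit depends on one random
        # centroid draw, so individual runs can differ from each other.
        model = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=seed)
        model.fit(points)
        inertias.append(model.inertia_)
    return np.array(inertias)


# Synthetic stand-in data: two loose groups of 2-D points (an assumption).
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
])

inertias = repeated_kmeans(points)
print(f"runs: {len(inertias)}, mean inertia: {inertias.mean():.2f}, std: {inertias.std():.2f}")
```

Averaging over the 30 runs gives a steadier estimate than any single run, because one unlucky centroid draw no longer dominates the result.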

The complete loop file runs in under 2 minutes for the training sets.
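The command-line forms shown in this section could be parsed along the following lines. This is a hedged sketch rather than the parser in runner.py: the function name `parse_args` and the returned dictionary keys are hypothetical, and whether the real script accepts "batch" and "run-loop" together is not stated.

```python
def parse_args(argv):
    """Parse arguments of the form: <json_file> [batch <number>] [run-loop]."""
    if not argv:
        raise ValueError("usage: python runner.py json_data_file.json [batch N] [run-loop]")
    options = {"json_file": argv[0], "batch": None, "run_loop": False}
    rest = argv[1:]
    i = 0
    while i < len(rest):
        if rest[i] == "batch" and i + 1 < len(rest):
            try:
                options["batch"] = int(rest[i + 1])
            except ValueError:
                # Not a number: leave batch as None so the caller can
                # fall back to the default batch.
                pass
            i += 2
        elif rest[i] == "run-loop":
            options["run_loop"] = True
            i += 1
        else:
            i += 1  # ignore anything unrecognized
    return options


print(parse_args(["data/Getty_Training1.json", "batch", "892"]))
print(parse_args(["data/Getty_Training1.json", "run-loop"]))
```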

Plottings contains a number of functions that can be called to plot graphs of the existing data.

##Dependencies

Our runner script has the following dependencies:

NumPy
SciPy
scikit-learn
Matplotlib
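Before running the script, it may help to confirm that these packages are importable. The sketch below uses only the standard library; the helper name `missing_dependencies` is hypothetical, and the import names (`numpy`, `scipy`, `sklearn`, `matplotlib`) are the standard module names for the four packages listed above.

```python
import importlib.util

# Display name -> importable module name.
REQUIRED = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
}


def missing_dependencies(required=REQUIRED):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_dependencies()
print("Missing:", ", ".join(missing) if missing else "none")
```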

##Old Code: the project's graveyard

A Monte Carlo simulation was developed for the threshold parameters; although it is not currently used, it can be run to inspect its output. Other unused code can be found in the `old` folder.
