
# dataminingProject: Samasource Gold Gating Algorithm

Our gating algorithm requires a JSON file generated by Samasource.

To run the algorithm, enter:

python runner.py json_data_file.json

## Data

Our data files are:

data/Getty_Training1.json
data/Getty_Training2.json
data/Getty_Validation.json

These files were generated by running Training_Validation_Generator.py on a master download from Samasource's data warehouse. The training datasets are included in this repository, while the validation dataset can be downloaded here.

## Parameters

An optional "batch" parameter, followed by a batch number, may be passed on the command line:

python runner.py data/Getty_Training1.json batch 892

If no batch number is given, or no batch with that number exists, the script defaults to the batch with the most tasks in the branch.
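The batch fallback described above can be sketched as follows. This is a minimal illustration, not the code in runner.py: the helper names (`load_tasks`, `batch_from_argv`, `choose_batch`) and the assumption that the JSON file is a list of task objects carrying a "batch" key are hypothetical.

```python
import json
from collections import Counter


def load_tasks(path):
    """HYPOTHETICAL: assumes the Samasource JSON file is a top-level list of task dicts."""
    with open(path) as f:
        return json.load(f)


def batch_from_argv(argv):
    """Return the token after "batch", e.g. runner.py data.json batch 892 -> "892"."""
    args = argv[2:]  # skip the script name and the JSON path
    if "batch" in args:
        i = args.index("batch")
        if i + 1 < len(args):
            return args[i + 1]
    return None


def choose_batch(tasks, requested=None):
    """Use the requested batch if it exists, otherwise the batch with the most tasks.

    HYPOTHETICAL: each task is assumed to have a "batch" key.
    """
    counts = Counter(str(task["batch"]) for task in tasks)
    if requested is not None and str(requested) in counts:
        return str(requested)
    # most_common(1) returns [(batch_id, task_count)] for the largest batch
    return counts.most_common(1)[0][0]
```

For example, `choose_batch(load_tasks(sys.argv[1]), batch_from_argv(sys.argv))` would pick batch 892 if it is present and fall back to the largest batch otherwise.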

You can also pass "run-loop" to repeat the centroid selection 30 times and estimate the profit gains over the course of a full project:

python runner.py data/Getty_Training1.json run-loop

With this option, the centroid selection runs in a loop 30 times. The 30 iterations represent the number of gold batches all users must complete to finish an entire project of 300,000 tasks. Repeating the run 30 times also reduces the variability introduced by the k-means function, which generates random initial centroids. The loop estimates the amount of gold saved, the cost savings, and the increased profit Samasource would receive if it implemented the smart gating algorithm.
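To illustrate why repeating a randomly initialised k-means reduces variability, here is a minimal sketch in plain Python. It is not the project's implementation: it assumes a single hypothetical feature per worker (accuracy on gold tasks), k = 2, and a hypothetical gating rule in which workers nearest the higher centroid may skip gold tasks. Each run starts from different random centroids, and averaging over 30 runs damps the run-to-run noise.

```python
import random
import statistics


def kmeans_1d(values, k, rng, max_iter=100):
    """Lloyd's k-means on a list of floats, starting from random centroids."""
    # Random initial centroids (distinct values) are the source of run-to-run variability.
    centroids = rng.sample(sorted(set(values)), k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # New centroid = cluster mean; keep the old centroid if a cluster is empty.
        new = [statistics.fmean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def trusted_count(accuracies, rng):
    """HYPOTHETICAL gating rule: workers closest to the highest centroid skip gold."""
    centroids = kmeans_1d(accuracies, 2, rng)
    top = max(centroids)
    return sum(1 for a in accuracies if min(centroids, key=lambda c: abs(a - c)) == top)


def averaged_trusted_count(accuracies, runs=30, seed=0):
    """Average the gated-worker count over repeated runs with different random starts."""
    rng = random.Random(seed)
    return statistics.fmean(trusted_count(accuracies, rng) for _ in range(runs))


accuracies = [0.10, 0.12, 0.15, 0.90, 0.92, 0.95]
print(averaged_trusted_count(accuracies))  # 3.0
```

On well-separated toy data every run agrees, so the average is exact; on real, noisier data individual runs can differ, which is what the averaging is meant to smooth out.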

The full loop runs in under 2 minutes on the training sets.

Plottings contains a number of functions that can be called to plot graphs of the existing data.

## Dependencies

Our runner script has the following dependencies:

  • NumPy
  • SciPy
  • scikit-learn
  • Matplotlib

## Old Code: the project's graveyard

A Monte Carlo simulation was developed for the threshold parameters; although it is not currently used, it can be run to inspect its output. Other unused code can be found in the `old` folder.


## Contributors

cjnayak, lauraggit
