Efficient Annotation of Scalar Labels (EASL)

Last update: June 5, 2018

preprint


Pre-requisites

  • Python 3.x
  • numpy (pip install numpy)

Procedure

  1. Create your project by running sh create_project.sh YOUR_PROJECT_NAME

    Let's assume our project name is political.

    (e.g., sh create_project.sh political)

  2. Prepare data (csv file) in ./experiments/political/XYZ.csv

    Your data should be a csv file with (at least) the columns id and sent.

          e.g., 
          id,sent
          1,obama is a legend in his own mind
          2,conservatives are racists
          3,cruz is correct
          4,romney is president
          5,obama thinks there are 57 states
          ...
    

    Let's assume the file name is political.csv

    Note: You can add additional columns. For example, if you want to annotate a pair of sentences such as a premise and a hypothesis, the columns will look like id, premise, hypothesis.
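
    Before initializing, it can help to confirm that the file has the expected header. The following is a minimal sketch using only the standard library; check_columns is a hypothetical helper and not part of EASL, and the required column names are passed in so that a premise/hypothesis layout can be checked as well.

```python
import csv
import io

REQUIRED = ("id", "sent")  # the columns this README lists as the minimum

def check_columns(csv_file, required=REQUIRED):
    """Return the required column names that are missing from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]

# In-memory example; for a real file use open(path, newline="").
sample = io.StringIO("id,sent\n1,cruz is correct\n")
missing = check_columns(sample)
print("missing columns:", missing)
```

    An empty list means every required column was found.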

    Run python initialize.py ./experiments/political/political.csv to set initial parameters (alpha, beta, mu, sigma)

    The resulting csv file (political_0.csv) should look as follows.

      e.g., 
      id,sent,alpha,beta,mu,sigma
      1,obama is a legend in his own mind,1,1,0.5,0.0833
      2,conservatives are racists,1,1,0.5,0.0833
      3,cruz is correct,1,1,0.5,0.0833
      4,romney is president,1,1,0.5,0.0833
      5,obama thinks there are 57 states,1,1,0.5,0.0833
      ...
    

    It has additional columns: alpha, beta, mu, sigma.

    In this example, we place the file at ./experiments/political/political_0.csv.
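
    The initial values above are consistent with a Beta(1, 1) prior, whose mean is 0.5 and whose variance is 1/12, or about 0.0833. The sketch below assumes, based only on those numbers, that mu and sigma hold the mean and variance of Beta(alpha, beta). beta_moments is a hypothetical helper, not a function from initialize.py.

```python
def beta_moments(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, variance

# Assumed interpretation: mu = mean, sigma = variance of Beta(alpha, beta).
mu, sigma = beta_moments(1, 1)
print(round(mu, 4), round(sigma, 4))  # 0.5 0.0833
```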

    Prepare your HIT template in ./templates/political/

    See an example at ./templates/political/template_political.html.

    Now, we are ready to start annotation with EASL!

  3. Generate HITs

    We generate our HITs by running the following command.

     python main.py --operation generate --model ./experiments/political/political_0.csv --hits 25
    

    This generates political_hit_1.csv, which contains 25 HITs.

    The number of HITs (per iteration) should depend on your data size. (See python main.py --help for more details.)

  4. Publish the HITs (with the template file).

  5. Collect the result and name it ./experiments/political/political_result_1.csv.

  6. Update the model

     python main.py --operation update --model ./experiments/political/political_0.csv
    

    This takes political_0.csv and political_result_1.csv, and generates political_1.csv.
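
    How main.py updates the parameters is not described in this README. As a rough illustration only, the sketch below assumes each item is modeled as Beta(alpha, beta) and that a rating s normalized to [0, 1] adds s to alpha and 1 - s to beta. Both the update rule and the name update_item are assumptions; consult main.py for the actual behavior.

```python
def update_item(alpha, beta, ratings):
    """Fold ratings in [0, 1] into Beta parameters (assumed update rule)."""
    for s in ratings:
        if not 0.0 <= s <= 1.0:
            raise ValueError("ratings must be normalized to [0, 1]")
        alpha += s        # evidence toward the high end of the scale
        beta += 1.0 - s   # evidence toward the low end of the scale
    total = alpha + beta
    mu = alpha / total
    sigma = (alpha * beta) / (total ** 2 * (total + 1))  # Beta variance
    return alpha, beta, mu, sigma

# One item starting from the initial values, with two hypothetical ratings.
print(update_item(1, 1, [1.0, 0.5]))
```

    Under this assumption, each rating shifts the mean toward the observed value and shrinks the variance, which is what lets later iterations focus on items that are still uncertain.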

  7. Go back to step 3 (Generate HITs) and repeat the procedure.

    (e.g., if this is the first iteration with political_0.csv, use political_1.csv to generate HITs in the next iteration.)
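
The file naming across iterations can be sketched as follows. iteration_files is a hypothetical helper, not part of EASL; it only builds the paths and commands implied by the steps above (model k produces HIT file k+1, which pairs with result file k+1 and yields model k+1) and does not run anything. The HIT file is returned as a bare name because this README does not state where it is written.

```python
def iteration_files(project, k, root="./experiments", hits=25):
    """Paths and commands for the iteration that starts from model k."""
    base = f"{root}/{project}"
    model = f"{base}/{project}_{k}.csv"
    return {
        "model": model,
        "hits_name": f"{project}_hit_{k + 1}.csv",
        "result": f"{base}/{project}_result_{k + 1}.csv",
        "next_model": f"{base}/{project}_{k + 1}.csv",
        # Argument lists mirror the commands shown in steps 3 and 6.
        "generate_cmd": ["python", "main.py", "--operation", "generate",
                         "--model", model, "--hits", str(hits)],
        "update_cmd": ["python", "main.py", "--operation", "update",
                       "--model", model],
    }

plan = iteration_files("political", 0)
for key, value in plan.items():
    print(key, value)
```

Calling iteration_files("political", 1) gives the names for the second round, starting from political_1.csv.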
