
open-eval's Introduction

OpenEval


About

Data scientists working on machine learning problems have historically had several issues relating to evaluating their systems: spending time individually developing evaluation frameworks for tasks, comparing results over time, and keeping evaluations consistent among teams. OpenEval is a system designed to address these problems.

In developing this system we set out to build a centralized, easy-to-use platform for groups to evaluate their models. All the user needs to do to evaluate their solver is host it on a thin server, which we provide. Then, on the web interface, they need to select their desired task and dataset to test their solver. After their solver finishes processing the dataset, the user can view the results.

Modules

The project contains two main modules.

  • The OpenEval core, which contains the main functionality and the web app.
  • A Learner, which acts as a toy system to be evaluated by the core.

Quick Guide on Running the Apps

You will need Java 8 in order to run the apps; the OpenJDK build on Ubuntu seems to have issues.

You will also need sbt.

First, run sbt from the parent directory.

  • projects will list the names of the existing modules.
    • project core will take you inside the core package.
    • project learner will take you inside the examples package.
  • Inside each project you can compile it or run it.

If you run the core, you can browse to localhost:9000. To run on a specific port, simply add the port number after run (e.g. run 8080). You should not have to restart it after code changes: just save your code and refresh the page.

Note: the OpenEval server needs to store its backend information in a SQL database. If you want to run it on your machine, first create a SQL database, rename core/conf/application.conf.sample to core/conf/application.conf, and add the DB information (URL, username, and password).
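Assuming the sample file follows Play Framework's standard JDBC configuration keys, the DB section of core/conf/application.conf would look roughly like the sketch below; the driver, URL, and credentials are placeholders for your own setup.

```
# Placeholder values -- replace with your own database's details
db.default.driver   = com.mysql.jdbc.Driver
db.default.url      = "jdbc:mysql://localhost/openeval"
db.default.username = "openeval"
db.default.password = "changeme"
```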

open-eval's People

Contributors

dhruvvajpeyi, dshine2, heglertissot, joshuacamp, paultgibbons, ryannk


open-eval's Issues

readme for core

I think we should have a README for the core that explains how to start using it (e.g. setting up the DB), as well as its internal structure: evaluators, DB connections, the Learner, etc.

change team name

change "team name" to "configuration name" on add configuration page

Fix TextAnnotation clone

    @Test
    public void cloneTest() {
        String[] views = new String[] {ViewNames.POS, ViewNames.SENTENCE};
        TextAnnotation ta = DummyTextAnnotationGenerator.generateAnnotatedTextAnnotation(views, false);
        Assert.assertTrue(ta.hasView(ViewNames.SENTENCE));
        Assert.assertTrue(ta.hasView(ViewNames.POS));
        TextAnnotation copy = null;
        try {
            copy = (TextAnnotation) ta.clone();
        } catch (CloneNotSupportedException te) {
            return;
        }

        copy.removeView(ViewNames.SENTENCE);
        Assert.assertFalse(copy.hasView(ViewNames.SENTENCE));
        Assert.assertTrue(ta.hasView(ViewNames.SENTENCE)); // FAILS ASSERTION
    }
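The failing assertion suggests that clone() shares the underlying views collection between the original and the copy, so removing a view from the copy also removes it from the original. A minimal, self-contained sketch of the fix pattern, using a toy stand-in class (the real TextAnnotation lives in cogcomp's core-utilities): a correct clone() must copy the views container itself.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for TextAnnotation; names here are illustrative, not the real API.
class Annotation implements Cloneable {
    private final Map<String, Object> views = new HashMap<>();

    void addView(String name, Object view) { views.put(name, view); }
    void removeView(String name) { views.remove(name); }
    boolean hasView(String name) { return views.containsKey(name); }

    @Override
    public Annotation clone() {
        Annotation copy = new Annotation();
        copy.views.putAll(this.views); // fresh map: views are no longer shared
        return copy;
    }
}

public class CloneDemo {
    public static void main(String[] args) {
        Annotation ta = new Annotation();
        ta.addView("SENTENCE", new Object());
        Annotation copy = ta.clone();
        copy.removeView("SENTENCE");
        System.out.println(ta.hasView("SENTENCE"));   // true: original unaffected
        System.out.println(copy.hasView("SENTENCE")); // false
    }
}
```

Note this copies the map, not the views themselves; if the views hold mutable state that must also be independent, a deeper copy of each view is needed.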

ordering configurations

Is there a way to sort the configurations shown on the landing page? Say, by the last time they were used or updated?

Initiate the core

Proposal:
Add a class (in app/controllers, with a package name like edu.illinois.cs.cogcomp.core) for the core.
The core is supposed to contain the interface of the solver and the evaluation system. The internal data structure of the system is supposed to be a [TextAnnotation](https://github.com/IllinoisCogComp/illinois-cogcomp-nlp/blob/master/core-utilities/src/main/java/edu/illinois/cs/cogcomp/core/datastructures/textannotation/TextAnnotation.java#L21). So we need to add core-utilities as a dependency ([example here](https://github.com/IllinoisCogComp/saul/blob/master/build.sbt#L14)).

The evaluation framework is supposed to send a TextAnnotation object to the solver, and will receive a modified TextAnnotation.
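A minimal sketch of that round trip, with a toy stand-in for TextAnnotation and an illustrative Solver interface (both are hypothetical names, not the actual OpenEval API):

```java
import java.util.ArrayList;
import java.util.List;

public class RoundTrip {
    // Toy stand-in: the real TextAnnotation comes from cogcomp's core-utilities.
    static class TextAnnotation {
        final String text;
        final List<String> views = new ArrayList<>();
        TextAnnotation(String text) { this.text = text; }
    }

    // Illustrative interface: the evaluator sends a TextAnnotation and
    // receives it back with the solver's view(s) added.
    interface Solver {
        TextAnnotation solve(TextAnnotation input);
    }

    public static void main(String[] args) {
        Solver posTagger = input -> {
            input.views.add("POS"); // a real solver would add a populated POS view
            return input;
        };
        TextAnnotation ta = new TextAnnotation("The cat sat.");
        System.out.println(posTagger.solve(ta).views); // [POS]
    }
}
```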

Define CHUNK size

Retrieving and cleansing the TextAnnotations is very computationally expensive, so I propose we define a variable CHUNK_SIZE in a config file specifying the number of TAs to send to the solver at a time.
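A hedged sketch of what that batching could look like; CHUNK_SIZE's value and the helper method are illustrative, not existing OpenEval code.

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Illustrative value; the proposal is to read this from a config file.
    static final int CHUNK_SIZE = 100;

    /** Split a list into sublists of at most `size` elements each. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> taIds = new ArrayList<>();
        for (int i = 0; i < 250; i++) taIds.add(i);
        // 250 TAs in chunks of 100 -> batches of 100, 100, and 50
        System.out.println(chunk(taIds, CHUNK_SIZE).size()); // 3
    }
}
```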

Storing / Reading datasets

@mssammon @christos-c @cogcomp-dev and the whole open-eval team!

We have discussed a couple of times (beyond open-eval) how to access datasets programmatically without adding them directly to our code repositories. This is something that could be very useful for any project, but it is essential for open-eval, since we want to load the data programmatically inside the evaluation system.

I want to bring AI2's datastore to your attention. This package saves all data files/folders on Amazon S3 (which is very cheap). I did a little experimentation with this package (a deploy is here).

Using it is very easy; for example, here I am uploading the dataset we use for training/testing POS tagging from my computer:

    Datastore.publishDirectory(
      "/home/daniel/ideaProjects/saul/data/POS",
      "edu.illinois.cs.cogcomp",
      "POS-tagging-data",
      1,
      false)

Later, to access the data (on any machine, without having it locally), I can do:

    val dataPath: java.nio.file.Path =
      Datastore.directoryPath(
        "edu.illinois.cs.cogcomp",
        "POS-tagging-data",
        1)

Running the above will download the data from S3 to a folder (in the home dir) and return its path (subsequent runs just read it from the cache). Building on top of this, I can evaluate the Saul POS tagger in a few simple steps:

  def testPOSTagger() = {
    val dataPath: java.nio.file.Path =
      Datastore.directoryPath(
        "edu.illinois.cs.cogcomp",
        "POS-tagging-data",
        1)

    /** Read your data from datastore. */
    lazy val testData = {
      val testDataReader = new PennTreebankPOSReader("testData")
      testDataReader.readFile(dataPath.toString + "/22-24.br")

      var sentenceId = 0
      testDataReader.getTextAnnotations.flatMap(p => {
        val cons = commonSensors.textAnnotationToTokens(p)
        sentenceId += 1
        //      Adding a dummy attribute so that hashCode is different for each constituent
        cons.foreach(c => c.addAttribute("SentenceId", sentenceId.toString))
        cons
      }).toList
    }

    /** Populate your data in the model */
    POSDataModel.tokens.populate(testData, train = false)

    /** Load the models for the POS classifier */
    POSClassifiers.loadModelsFromPackage()

    testPOSTagger(testData)
  }


  def testPOSTagger(testData: List[Constituent]): Unit = {
    val tester = new TestDiscrete
    val testReader = new LBJIteratorParserScala[Constituent](testData)
    testReader.reset()

    testReader.data.foreach(cons => {
      val gold = POSDataModel.POSLabel(cons)
      val predicted = POSClassifiers.POSClassifier(cons)
      tester.reportPrediction(predicted, gold)
    })

    tester.printPerformance(System.out)
  }

which gives me the following results:

...
[info] VBG         91.483  92.240  91.860   1933   1949
[info] VBN         86.041  90.395  88.164   2707   2844
[info] VBP         93.894  91.374  92.617   1565   1523
[info] VBZ         96.941  96.059  96.498   2639   2615
[info] WDT         97.967  90.753  94.222    584    541
[info] WP          98.596  99.293  98.944    283    285
[info] WP$        100.000 100.000 100.000     37     37
[info] WRB         99.671  99.671  99.671    304    304
[info] ``         100.000 100.000 100.000   1074   1074
[info] ------------------------------------------------
[info] Accuracy    96.439    -       -      -    129654

So what do you think about adopting this project for our work?

Side note: here are the same steps repeated in Java:

        java.nio.file.Path dataPath = Datastore$.MODULE$.directoryPath(
                "edu.illinois.cs.cogcomp", "POS-tagging-data", 1);

        PennTreebankPOSReader testDataReader = new PennTreebankPOSReader("testData");
        testDataReader.readFile(dataPath.toString() + "/22-24.br");

        java.util.List<Constituent> testData = testDataReader.getTextAnnotations()
                .get(0).getView(ViewNames.TOKENS).getConstituents();

        scala.collection.Iterable<Constituent> testDataInScalaCollection = scala.collection.JavaConversions.asScalaBuffer(testData);

        /* Populate your data in the model */
        POSDataModel.tokens().populate(testDataInScalaCollection, false);

        /* Load the models for the POS classifier */
        POSClassifiers.loadModelsFromPackage();

        /* Make prediction on the input instances */
        for(Constituent constituent : testData ) {
            String predicted = POSClassifiers.POSClassifier(constituent);
            System.out.println(constituent + "  ->  " +  predicted);
        }

Renaming module names

When we deploy the project as jar files, the name will be learner_xx.jar, which is not very intuitive.
Let's rename the module to something else, say openeval-learner?

Following that, for consistency, let's rename the others as well, say to openeval-core.

Also, let's make sure only the learner gets published, not the other modules.

Fixing Tasks and Task Variants

Task                       -> Task Variants
Part of Speech Tagging     -> Raw Text, Gold Token, Sentence Boundaries
Named Entity Recognition   -> Raw Text, Gold Token, Sentence Boundaries
Parsing                    -> Raw Text, Gold Token, Sentence Boundaries
Co-reference               -> Raw Text, Gold Token, Sentence Boundaries

Import POS tagging dataset

Since we are setting our learner to work based on POS tagging, let's import the POS tagging dataset into our DB so that we can run our system against this data.
Since the data is proprietary, I am not sharing it here; instead, I will share it personally.

Here is how to read the data via a reader inside CoreUtils:

    PennTreebankPOSReader testDataReader = new PennTreebankPOSReader("testData");
    testDataReader.readFile(dataPath.toString() + "/22-24.br");
