Question Answering Evaluation Tools

The project targets question answering models for the 2020 Formosa Grand Challenge. It provides tools for correctness evaluation, analytics, and visualization.

Prerequisites

The project only supports Python 3. You need to install the following packages via pip or conda:

  • json-lines
  • pandas
  • tabulate
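
For example, you can install them all at once with pip:

pip3 install json-lines pandas tabulate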

Usage

Example Usage

Run the following command and you will see the results:

python3 eval.py \
  ./examples/official_1_questions.json \
  ./examples/official_1_answers.json \
  ./examples/predictions.jsonl
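
The predictions file contains one JSON object per line. Its exact schema is defined by the example files in ./examples; purely as a hypothetical illustration (the field names below are assumptions, not the tool's documented format), a line might look like:

{"qid": "Q001", "prediction": "..."}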

Detailed Usage

Run the following command to see the detailed usage:

python3 eval.py -h

The usage is shown as follows:

usage: eval.py [-h] [-o OUTPUT]
               [-f {plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,psql,rst,mediawiki,moinmoin,youtrack,html,latex,latex_raw,latex_booktabs,textile}]
               questions_path answers_path predictions_path

Evaluation tool of question answering model for 2020 Formosa Grand Challenge

positional arguments:
  questions_path        Path to questions JSON file (.json)
  answers_path          Path to ground truth answers JSON file (.json)
  predictions_path      Path to predictions JSON Lines file (.jsonl)

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Path to directory that saves the results (default:
                        output)
  -f {plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,psql,rst,mediawiki,moinmoin,youtrack,html,latex,latex_raw,latex_booktabs,textile}, --format {plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,psql,rst,mediawiki,moinmoin,youtrack,html,latex,latex_raw,latex_booktabs,textile}
                        Format to print the result tables (default: github)
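
For example, to save the results to a directory named my_output and print the tables in grid format (reusing the example files from above):

python3 eval.py -o my_output -f grid \
  ./examples/official_1_questions.json \
  ./examples/official_1_answers.json \
  ./examples/predictions.jsonl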

Outputs

After running the tool, you will get four kinds of results: details, correctness, error reasons, and scores.

Details

The result shows details for each of your model's predictions. You can see a compact version of the report on stdout and the full version in output/details.csv.

Meaning of the fields:

  • PID: Passage ID
  • QID: Question ID
  • Passage: Passage text (only in full version)
  • Question: Question text (only in full version)
  • Question Type: The question type, which can be one of:
    • passage_span: The answer should be extracted from passage
    • question_span: The answer should be extracted from question
    • multiple_spans: The answer should be extracted from passage (more than one span)
    • yesno: The answer should be yes or no
    • math: The answer is a math problem (arithmetic or counting)
  • Answer: Ground truth text (only in full version)
  • Prediction: Your model prediction text (only in full version)
  • Prediction Type: The question type your model predicted
  • Correct?: Whether your model's prediction is correct
  • Score: The score your model got for this question
  • Non-Correct Reason: The reason your model's prediction is not correct, if any. It can be:
    • right_type_wrong_prediction: Your model predicted the correct question type, but its prediction is wrong
    • wrong_type: Your model predicted a wrong question type

An example of the compact details report is shown as follows:
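
(The table below is a hypothetical illustration in the default github format; the values are not real output.)

| PID | QID | Question Type | Prediction Type | Correct? | Score | Non-Correct Reason          |
|-----|-----|---------------|-----------------|----------|-------|-----------------------------|
| 1   | 1   | passage_span  | passage_span    | True     | 1.0   |                             |
| 1   | 2   | yesno         | passage_span    | False    | 0.0   | wrong_type                  |
| 2   | 1   | math          | math            | False    | 0.0   | right_type_wrong_prediction |

Because the full report is a plain CSV file, you can analyze it further with pandas. A minimal sketch, assuming output/details.csv uses the column names listed above:

import pandas as pd

# Load the full details report written by eval.py
details = pd.read_csv("output/details.csv")

# Count non-correct predictions per error reason
# (column names are assumptions based on the field list above)
wrong = details[details["Correct?"] == False]
print(wrong["Non-Correct Reason"].value_counts())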

Correctness

The result shows the correctness count and rate for each question type. You can see the report on stdout and in output/correctness.csv.

Meaning of the fields:

  • Type: Question type, it can be:
    • passage_span: The answer should be extracted from passage
    • question_span: The answer should be extracted from question
    • multiple_spans: The answer should be extracted from passage (more than one span)
    • yesno: The answer should be yes or no
    • math: The answer is a math problem (arithmetic or counting)
    • overall: All question types
  • Question Count: Total count of that question type
  • Correct Count: Correct count of that question type from your predictions
  • Correct Rate: Correct rate of that question type from your predictions

An example is shown as follows:
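
(Hypothetical values in the default github format, for illustration only.)

| Type           | Question Count | Correct Count | Correct Rate |
|----------------|----------------|---------------|--------------|
| passage_span   | 50             | 40            | 0.8          |
| question_span  | 10             | 7             | 0.7          |
| multiple_spans | 10             | 5             | 0.5          |
| yesno          | 20             | 15            | 0.75         |
| math           | 20             | 10            | 0.5          |
| overall        | 110            | 77            | 0.7          |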

Error Reasons

The result shows the count and rate of each reason for non-correct predictions. You can see the report on stdout and in output/error_reasons.csv.

Meaning of the fields:

  • Reason: The reason that your model predictions are not correct. It can be:
    • right_type_wrong_prediction: Your model predicted the correct question type, but its prediction is wrong
    • wrong_type: Your model predicted a wrong question type
  • Count: Count of that reason from your non-correct predictions
  • Rate: Rate of that reason from your non-correct predictions

An example is shown as follows:
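
(Hypothetical values consistent with the correctness table above, for illustration only.)

| Reason                       | Count | Rate   |
|------------------------------|-------|--------|
| right_type_wrong_prediction  | 22    | 0.6667 |
| wrong_type                   | 11    | 0.3333 |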

Scores

You will see the total score your model got.
