PermTest

General information

We provide software for statistical significance testing. It was originally designed for standard IR evaluation, where each of one or more methods is represented by a vector of real-valued performance scores. However, it can be used to compare any equal-length series of performance measurements.

This utility consumes matrix input. Each row represents a single evaluation event. Each row element is an event-specific value of an effectiveness or efficiency metric such as classification accuracy, retrieval time, etc. In IR, we commonly use the following metrics: ERR, NDCG, or MAP. We provide a Python3 wrapper for this utility.

Our software employs permutation algorithms both for unadjusted pairwise significance testing and for testing with adjustment for multiple comparisons. The advantage of permutation algorithms is that they make relatively mild assumptions about the statistical nature of the data. In particular, they do not assume that observations are normally distributed i.i.d. variables.
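
To illustrate the idea behind unadjusted pairwise testing, here is a minimal Python sketch of a two-sided paired permutation (sign-flip) test on the mean score difference. It is not the code of permtest: the function name `paired_permutation_test`, its parameters, and the choice of test statistic are assumptions made for this example.

```python
import random


def paired_permutation_test(scores_a, scores_b, num_permutations=10000, seed=0):
    """Estimate a two-sided p-value for the mean difference of paired scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score vectors must have equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        # Under the null hypothesis the two method labels are exchangeable
        # for every query, so each difference keeps or flips its sign
        # with equal probability.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(permuted) >= observed:
            at_least_as_extreme += 1
    # Counting the observed assignment itself keeps the estimate above zero.
    return (at_least_as_extreme + 1) / (num_permutations + 1)


# Hypothetical per-query scores for two methods.
method_a = [0.5 + 0.01 * i for i in range(20)]
method_b = [0.3 + 0.01 * i for i in range(20)]
p_value = paired_permutation_test(method_a, method_b)
```

The sketch only relies on exchangeability of the paired scores under the null hypothesis, which is the mild assumption mentioned above; no normality is required.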

The code is released under the Apache License Version 2.0 http://www.apache.org/licenses/.

For technical/theoretical details see:

Leonid Boytsov, Anna Belova, Peter Westfall, 2013, Deciding on an Adjustment for Multiplicity in IR Experiments. In Proceedings of SIGIR 2013.

If you use our software, please consider citing this paper.

Software description

EvalUtil:

  • The test program itself: permtest. It accepts a matrix of performance scores (ERR, MAP, etc). We provide a Python3 wrapper for this test program.
  • Each row of the matrix represents one retrieval method (called a run in TREC terminology).
  • Column I represents performance scores for the I-th query.
  • In the case of binary classification, all values are 0s and 1s. The first row represents ground truth labels.
  • An R script, SignTest.R, that carries out a sign test for binary classification. The input format is the same as for the utility permtest (in the case of binary classification). SignTest.R can compare only two outputs/systems at a time, but it can handle multiple classes. To this end, it relies on the SignTest.
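
The binary-classification layout described above (first row: ground truth labels, each further row: one system's predictions) can be sketched in Python together with an exact two-sided sign test. This is an illustration only, not the logic of SignTest.R: the function name `sign_test`, the in-memory list-of-lists matrix, and the sample labels are assumptions.

```python
from math import comb


def sign_test(truth, pred_a, pred_b):
    """Exact two-sided sign test on instances where exactly one system is correct."""
    a_only = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if b == t and a != t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the systems never differ in correctness
    k = min(a_only, b_only)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical matrix: one column per classified instance.
matrix = [
    [1, 0, 1, 1, 0, 1, 0, 1],  # ground truth labels
    [1, 0, 1, 1, 0, 1, 0, 0],  # predictions of system A
    [0, 1, 0, 0, 1, 0, 0, 1],  # predictions of system B
]
p_value = sign_test(matrix[0], matrix[1], matrix[2])
```

Because the test only compares each prediction with the true label, the same sketch works when labels take more than two values.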

ConvScripts:

  • Scripts that convert TREC output files to the matrix format.
  • Each script accepts a registry file listing the names of files that contain the output of a TREC evaluation utility, e.g., trec_eval.
  • Each such file should represent a single run.
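
As a rough picture of what such a conversion involves, the Python sketch below builds a score matrix from per-query trec_eval output. It is not one of the ConvScripts: it assumes the three-column layout (measure, query id, value) that trec_eval prints with the `-q` option, takes the lines of each run as in-memory lists instead of reading a registry file, and the names `parse_trec_eval` and `build_matrix` are hypothetical.

```python
def parse_trec_eval(lines, metric):
    """Return {query_id: score} for one metric, skipping the 'all' summary row."""
    scores = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        name, query_id, value = fields
        if name == metric and query_id != "all":
            scores[query_id] = float(value)
    return scores


def build_matrix(runs, metric):
    """Build one row per run and one column per query, in sorted query order."""
    per_run = [parse_trec_eval(lines, metric) for lines in runs]
    query_ids = sorted(per_run[0])
    for scores in per_run:
        if sorted(scores) != query_ids:
            raise ValueError("all runs must cover the same queries")
    return query_ids, [[scores[q] for q in query_ids] for scores in per_run]


# Hypothetical output of two runs.
run_a = ["map\t301\t0.2500", "map\t302\t0.5000", "map\tall\t0.3750"]
run_b = ["map\t302\t0.4000", "map\t301\t0.1000", "map\tall\t0.2500"]
query_ids, matrix = build_matrix([run_a, run_b], "map")
```

Rejecting runs with mismatched query sets keeps every column aligned to the same query, which the paired tests require.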

A working example:

  1. Compile EvalUtil
  2. Go to the directory SampleData
  3. Run the shell script sample_run.sh
  4. Read the comments inside the script
