General-purpose Unmasking Framework

This is a general-purpose unmasking framework, primarily (but not only) developed for authorship verification. For the original unmasking paper, see Koppel and Schler (2004).

The framework is released under the Apache 2.0 license.

Requirements

The framework needs Python >= 3.6. Dependencies are installed via:

pip3 install -r requirements.txt

By default, plots are rendered using the Qt5Agg Matplotlib backend if run in a graphical environment or Agg otherwise. The graphical backend can be changed by editing the matplotlibrc file (e.g., if you have no Qt on your system).
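The backend fallback described above can be pictured roughly as follows. This is only a sketch: the function name `pick_backend` is hypothetical, and detecting a graphical session through the `DISPLAY` or `WAYLAND_DISPLAY` environment variables is an assumption, not necessarily how the framework decides.

```python
import os
import sys


def pick_backend(environ=None, platform=None):
    """Return 'Qt5Agg' when a display appears to be available, else 'Agg'.

    Assumption: on Linux, a graphical session is recognised by the DISPLAY
    or WAYLAND_DISPLAY environment variables; other platforms are treated
    as graphical. The framework's own check may differ.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        has_display = bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
        return "Qt5Agg" if has_display else "Agg"
    return "Qt5Agg"


if __name__ == "__main__":
    # The returned name could be handed to matplotlib.use(...) before
    # matplotlib.pyplot is imported.
    print(pick_backend())
```

If Qt is not installed, forcing the non-interactive `Agg` backend in the matplotlibrc file avoids the detection question entirely.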

Usage

Unmasking experiments are defined in YAML configuration files. The configuration defaults can be found in authorship_unmasking/etc/defaults.yml (for unmasking) and authorship_unmasking/etc/defaults_meta.yml (for the meta classifier). Specific job configurations can override all or part of these default configurations. A commented example configuration with a small test corpus can be found in examples/gutenberg_test. The default configuration files are also commented, so please have a look at their contents as well.
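To illustrate what overriding "all or part" of the defaults can mean, here is a minimal sketch of a recursive merge over plain dictionaries, as they would look after the YAML files are parsed. The helper `merge_config` and every key in the example are hypothetical and are not taken from defaults.yml. Whether the framework merges nested sections this way is an assumption.

```python
import copy


def merge_config(defaults, overrides):
    """Recursively overlay `overrides` on a copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, chosen only for illustration.
defaults = {"job": {"output_dir": "out", "iterations": 10}, "plot": {"show": True}}
job = {"job": {"iterations": 25}}

effective = merge_config(defaults, job)
print(effective)
```

With this approach, a job file only needs to list the settings it changes. Everything else falls through from the defaults.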

Unmasking

To generate unmasking curve plots, use the unmask tool with the run command. This will parse the given job configuration and generate unmasking curves on the provided input data. The job's output is saved to an out folder next to the job configuration.

Example:

./unmask run examples/gutenberg_test/job.yml

Output will be saved to examples/gutenberg_test/out.

The output consists of individual curve plots generated with various parameters, the raw numbers as JSON, as well as an average aggregation of all individual parameter configurations (see the example job configuration file for more details). If you only want to aggregate existing runs, use

./unmask aggregate JSON_FILE [JSON_FILE ...]

where JSON_FILE is the generated raw JSON file of one or more existing runs.
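Conceptually, an average aggregation combines several curves point by point. The sketch below shows that idea only. It assumes each run reduces to a flat list of accuracy values of equal length, which is a simplification: the actual JSON schema written by the unmask tool is not described here, and `average_curves` is a hypothetical helper.

```python
import json
from statistics import mean


def average_curves(curves):
    """Average several equal-length curves point by point."""
    if not curves:
        raise ValueError("need at least one curve")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must have the same length")
    return [mean(points) for points in zip(*curves)]


# Hypothetical stand-ins for the parsed contents of two run files.
run_a = json.loads("[1.0, 0.75, 0.5]")
run_b = json.loads("[0.5, 0.25, 0.0]")

averaged = average_curves([run_a, run_b])
print(averaged)  # [0.75, 0.5, 0.25]
```

For real runs, use the aggregate command above, which understands the tool's own output format.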

For a full list of all parameters, specify the -h flag:

./unmask run -h
./unmask aggregate -h

Meta Classification

After you have generated your unmasking curves, you can train and evaluate meta classification models on them by using the classify tool. It comes with the train, apply, eval, and model_select commands for training a new model, applying a previously-trained model, evaluating a model on a corpus with a ground truth, and selecting the best-performing model of a series of pre-trained models.

The classify tool also takes a YAML job configuration via the --config flag, although in most cases it shouldn't be necessary to write a custom configuration. If no configuration is given, the defaults from authorship_unmasking/etc/defaults_meta.yml will be used.

Examples:

Train a model on the JSON output of an unmasking job:

./classify train INPUT_JSON

Apply a pre-trained model on unlabeled unmasking JSON:

./classify apply MODEL INPUT_JSON

Train and evaluate a classifier on two labeled unmasking JSON dumps:

./classify eval INPUT_JSON_TRAIN INPUT_JSON_TEST

Select the best-performing model from the input directory (one sub directory per model):

./classify model_select INPUT_DIR

For a full list of all parameters, use the help parameter -h on a command.
