
Recluse

Author: L. Amber Wilcox-O'Hearn

Contact: [email protected]

Released under the GNU AFFERO GENERAL PUBLIC LICENSE, see COPYING file for details.

Introduction

recluse (Reproducible Experimentation for Computational Linguistics USE) is a set of tools for running computational linguistics experiments reproducibly.

This version contains:

  • utils, which has four functions:
      ◦ open_with_unicode, for reading and writing Unicode in regular or compressed text files.
      ◦ split_file_into_chunks, for splitting a file into smaller pieces. This is needed for tools that load everything into RAM, or that train on all the data when training on part of it would be satisfactory.
      ◦ partition_by_list, which works like a combination of the string methods partition and split: it keeps the separators, but partitions into a list.
      ◦ precision_recall_f_measure, which calculates precision, recall, and F-measure.
  • article_selector (intended to replace article_randomiser, below), which reproducibly and randomly selects a portion of a large corpus for the experiment, divides it into training, development, and test sets, and returns an article index to those sets.
  • article_randomiser, which reproducibly and randomly divides a corpus into training, development, and test sets.
  • nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation. It is optimised for Wikipedia-type text.
  • vocabulary_generator, with its helper class vocabulary_cutter. It wraps srilm to make unigram counts, and then selects the most frequent words.
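The README does not say how open_with_unicode decides between regular and compressed files. As a rough illustration only, the sketch below assumes the choice is made from the file extension (.gz or .bz2) and that the encoding is UTF-8; the helper name open_text is hypothetical and is not recluse's API.

```python
import bz2
import gzip
import os
import tempfile


def open_text(path, mode="r", encoding="utf-8"):
    """Open `path` as Unicode text, compressing or decompressing by extension.

    Hypothetical helper illustrating the idea behind open_with_unicode;
    recluse's real function may use a different name, signature or rule.
    """
    if mode not in ("r", "w", "a"):
        raise ValueError("mode must be 'r', 'w' or 'a'")
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding=encoding)  # text mode over gzip
    if path.endswith(".bz2"):
        return bz2.open(path, mode + "t", encoding=encoding)  # text mode over bzip2
    return open(path, mode, encoding=encoding)  # plain, uncompressed file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")
        with open_text(path, "w") as out:
            out.write("naïve café\n")
        with open_text(path) as src:
            print(src.read(), end="")  # naïve café
```

Callers then read and write str objects the same way regardless of whether the file on disk is compressed.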
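One way to split a file so that no more than one chunk is ever held in memory is to stream it line by line. The sketch assumes chunks are measured in lines and written as numbered files next to a chosen output directory; both choices, and the function name split_file_into_line_chunks, are hypothetical rather than taken from recluse's split_file_into_chunks.

```python
import itertools
import os
import tempfile


def split_file_into_line_chunks(path, lines_per_chunk, out_dir):
    """Write consecutive blocks of `lines_per_chunk` lines from `path` to
    numbered files in `out_dir` and return the chunk paths in order.

    Hypothetical sketch; recluse may split on another unit or naming scheme.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be positive")
    base = os.path.basename(path)
    chunk_paths = []
    with open(path, encoding="utf-8") as src:
        for index in itertools.count():
            # islice resumes from the file iterator's current position
            block = list(itertools.islice(src, lines_per_chunk))
            if not block:
                break
            chunk_path = os.path.join(out_dir, f"{base}.{index:04d}")
            with open(chunk_path, "w", encoding="utf-8") as dst:
                dst.writelines(block)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Because each islice call keeps consuming the same open file, every block starts where the previous one stopped, and a tool that needs only partial data can simply be given the first chunk.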
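Read literally, the description of partition_by_list suggests a result that alternates between pieces and separators: like str.split it finds every occurrence, and like str.partition it keeps the separator. The sketch below is a guess at that behaviour, assuming a single string separator; the name partition_all is hypothetical, and recluse's own argument order and edge cases may differ.

```python
def partition_all(text, separator):
    """Split `text` on every occurrence of `separator`, keeping separators.

    Hypothetical sketch of the behaviour described for partition_by_list.
    """
    if not separator:
        raise ValueError("empty separator")
    parts = []
    head, sep, tail = text.partition(separator)
    while sep:  # sep is "" once no further separator is found
        parts.extend([head, sep])
        head, sep, tail = tail.partition(separator)
    parts.append(head)
    return parts
```

For example, partition_all("one two three", " ") returns ['one', ' ', 'two', ' ', 'three'], and joining the result reproduces the original string.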
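For reference, the standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F-measure as their weighted harmonic mean, F_beta = (1 + beta²)·P·R / (beta²·P + R). The sketch assumes raw counts as input and names the function prf_from_counts; the real precision_recall_f_measure may instead take sets of items or return a different structure.

```python
def prf_from_counts(true_positives, false_positives, false_negatives, beta=1.0):
    """Return (precision, recall, F_beta) computed from raw counts.

    Hypothetical helper; empty denominators yield 0.0 instead of raising.
    """
    tp, fp, fn = true_positives, false_positives, false_negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure
```

With 8 true positives, 2 false positives and 4 false negatives this gives precision 0.8, recall 2/3, and F1 = 16/22 ≈ 0.727.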
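The essence of reproducible random selection is a seeded, private random number generator applied to input in a fixed order. The sketch below shows that idea with Python's random.Random; the function name select_and_split, the 80/10/10 default proportions and the dictionary output are assumptions for illustration, not the format of article_selector's article index.

```python
import random


def select_and_split(article_ids, seed, sample_size=None, proportions=(0.8, 0.1, 0.1)):
    """Reproducibly sample articles and divide them into train/dev/test.

    Hypothetical sketch; recluse's tools may differ in inputs and output.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    rng = random.Random(seed)  # private generator: same seed, same split
    ids = sorted(article_ids)  # fixed order, independent of how input was listed
    if sample_size is not None:
        ids = rng.sample(ids, sample_size)  # select a portion of the corpus
    else:
        rng.shuffle(ids)  # use the whole corpus, in random order
    n_train = int(len(ids) * proportions[0])
    n_dev = int(len(ids) * proportions[1])
    return {
        "train": sorted(ids[:n_train]),
        "dev": sorted(ids[n_train:n_train + n_dev]),
        "test": sorted(ids[n_train + n_dev:]),
    }
```

Sorting the IDs first means the split depends only on the seed and the set of articles, not on the order in which the corpus happened to be listed; the test set receives any remainder left by rounding down.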
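The cutting step amounts to ranking unigram counts. The sketch assumes the counts arrive as lines of the form word<TAB>count, the layout of the count file SRILM's ngram-count writes when run with, for example, `ngram-count -order 1 -text corpus.txt -write counts.txt`. The function name cut_vocabulary is hypothetical, and this is not recluse's vocabulary_cutter.

```python
import heapq


def cut_vocabulary(count_lines, size):
    """Return the `size` most frequent words from unigram count lines.

    Assumes each non-empty line is "word<TAB>count". Ties are broken
    alphabetically so that the cut is reproducible. Hypothetical sketch.
    """
    counts = {}
    for line in count_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        counts[word] = counts.get(word, 0) + int(count)  # merge duplicate entries
    # Highest count first; alphabetical order among equal counts
    return heapq.nsmallest(size, counts, key=lambda w: (-counts[w], w))
```

For instance, cut_vocabulary(["the\t10\n", "cat\t3\n", "a\t7\n", "dog\t3\n"], 3) returns ['the', 'a', 'cat']: the tie between cat and dog is resolved alphabetically rather than by file order.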

Dependencies

recluse depends on the PyPI package regex, which, unlike the standard-library re module, supports Unicode character categories such as \p{L}.

sudo pip install regex

Installing

recluse is registered with PyPI, so it can be installed with pip:

sudo pip install recluse
