GSpell Spelling Suggester 
=================================

This plugin wraps the GSpell API (http://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/gSpell/current/GSpell.html) to add spelling suggestions as features on the configured input or output annotations.

The 2006Lexicon dictionary from the NLM's SPECIALIST toolset is provided by default. See the GSpell project page for information on how to create your own dictionaries, and further configuration options.


Parameters
==========

- Init-time
-----------
configURL: path to the directory containing the GSpellRegistry.cfg file

dictionaryURL: path to the directory containing compiled dictionary data


- Run-time
----------
candidatesFeature: Create new feature with this name on inputASTypes (or outputASType in outputASName, if set) to hold all spelling candidates returned by GSpell.

dumpOutputFeature: Name of the feature to use to hold the raw GSpell output. Optional

excludeIfContains: If an entry within inputASTypes contains any of the annotations in this list,
then do not spellcheck this entry.

excludeIfWithin: List of input annotations within which spellchecking should not occur.

filters: List of regular expressions to use to filter the input to the spell checker. Two filters are provided by default: ignore capitalised abbreviations and words in all caps, and words starting or ending with a digit. Optional
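
As a rough illustration of what such filters might look like, the sketch below uses two hypothetical patterns and skips any string that matches one of them. The patterns are not copied from the plugin's defaults, which may be written differently, and `should_spellcheck` is an illustrative name. It is written in Python for brevity; these simple patterns read the same in most regex dialects.

```python
import re

# Hypothetical filter patterns, for illustration only; the plugin's
# built-in defaults may be written differently.
FILTERS = [
    # All-caps words and capitalised abbreviations, e.g. "DNA", "IL-2"
    re.compile(r"^[A-Z][A-Z0-9.\-]*$"),
    # Words starting or ending with a digit, e.g. "5HT", "p53"
    re.compile(r"^\d.*$|^.*\d$"),
]


def should_spellcheck(text):
    """Return False if any filter matches, i.e. the string is skipped."""
    return not any(pattern.search(text) for pattern in FILTERS)
```

With these patterns, "protein" and "Protein" would be passed to the spell checker, while "DNA", "IL-2", "5HT" and "p53" would be skipped.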

inputASName: Input AnnotationSet name. Optional, leave blank for default annotation set.

inputASTypeFeature: Name of the feature on inputASTypes from which to extract strings for input to the spell-checker. Optional, leave blank to use the string content of inputASTypes.

inputASTypes: List of input annotations from which to extract strings for input to the spell-checker. Default is Token. 
This parameter also accepts entries in the form Annotation.feature == value so that you can filter your input annotations
according to feature value (although regexes for value are not currently allowed).
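
The entry syntax above can be pictured with a small parser. This is only a sketch of how such entries might be split and matched; the names `parse_input_type` and `matches`, and the representation of an annotation as a type plus a feature dictionary, are illustrative assumptions rather than the plugin's code.

```python
def parse_input_type(entry):
    """Split 'Annotation' or 'Annotation.feature == value' into parts.

    Returns (annotation_type, feature, value); feature and value are
    None when the entry is a bare annotation name.
    """
    if "==" not in entry:
        return entry.strip(), None, None
    left, value = entry.split("==", 1)
    ann_type, _, feature = left.strip().partition(".")
    return ann_type, feature or None, value.strip()


def matches(entry, ann_type, features):
    """Check an annotation (type plus feature dict) against one entry.

    The comparison is exact string equality, mirroring the note that
    regular expressions are not accepted for the value.
    """
    want_type, feature, value = parse_input_type(entry)
    if ann_type != want_type:
        return False
    if feature is None:
        return True
    return feature in features and str(features[feature]) == value
```

For example, the entry `Token.category == NN` would select only Token annotations whose `category` feature equals `NN`, whereas the bare entry `Token` would select every Token.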

maxCandidates: Maximum number of dictionary candidates to consider.

maxEditDistance: Limit candidates to those with edit distance <= N. Default is 2.

mode: Either WholePhrase (spell check the whole string as a single phrase, e.g. 'blood morphagenic protein') or PhraseTokens (spell check individual tokens within the string).

outputASName: Output AnnotationSet name. Optional. Only used if outputASType is set.

outputASType: Create new annotations with this name, to hold the spell-corrected text. Optional - if not specified,
the spellSuggestFeature is added to entries in inputASTypes.

outputListFormat: Set to 'String' so that GSpell ArrayList<String> output can be matched with JAPE LHS expressions. Set to 'List' so that GSpell output can be iterated over with JAPE RHS expressions.

shortestWord: Ignore words shorter than N, and ignore spelling suggestions shorter than N. Default is 4.

spellSuggestFeature: Create new feature with this name on entries within inputASTypes (or outputASType in outputASName, if set) to hold the spell-corrected text.

tokASName: AnnotationSet containing Tokens. Used if mode is set to PhraseTokens. Leave blank for default annotation set.

tokType: Token annotation name and feature that contains the Token string. Default is Token.string

truncateSize: Only return the top N candidates. Default is 4.

useWordLengthHeuristic: Prune potential dictionary candidates that differ in length from the query term by +/- 4 characters. Default is true.
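
To show how maxEditDistance, shortestWord, truncateSize and useWordLengthHeuristic might interact, here is a self-contained sketch. It uses plain Levenshtein distance as a stand-in for GSpell's own candidate ranking, and both the order in which the limits are applied and the reading of the length heuristic as "more than 4 characters" are assumptions. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_edit_distance=2, shortest_word=4,
            truncate_size=4, use_word_length_heuristic=True):
    """Return up to truncate_size candidates, closest first."""
    if len(word) < shortest_word:
        return []  # query too short to spell check
    scored = []
    for cand in dictionary:
        if len(cand) < shortest_word:
            continue  # suggestion too short
        # Assumption: "+/- 4" is read here as "differs by more than 4".
        if use_word_length_heuristic and abs(len(cand) - len(word)) > 4:
            continue  # cheap length-based pruning
        distance = edit_distance(word, cand)
        if distance <= max_edit_distance:
            scored.append((distance, cand))
    scored.sort()  # by distance, then alphabetically
    return [cand for _, cand in scored[:truncate_size]]
```

The length check acts as a cheap pre-filter in this sketch: two strings whose lengths differ by more than 4 necessarily have an edit distance above 4, so they can be discarded without computing the distance.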

Contributors
============

philgooch
