Giter VIP home page Giter VIP logo

lexicalrichness's Introduction

LexicalRichness

Documentation Status

A small python module to compute textual lexical richness measures

Installation

$ pip install lexicalrichness

Quickstart

>>> from lexicalrichness import LexicalRichness

# text example
>>> text = """Measure of textual lexical diversity, computed as the mean length of sequential words in
                a text that maintains a minimum threshold TTR score.

                Iterates over words until TTR scores falls below a threshold, then increase factor
                counter by 1 and start over. McCarthy and Jarvis (2010, pg. 385) recommends a factor
                threshold in the range of [0.660, 0.750].
                (McCarthy 2005, McCarthy and Jarvis 2010)"""

# instantiate new text object (use the tokenizer=blobber argument to use the textblob tokenizer)
>>> lex = LexicalRichness(text)

# Return word count.
>>> lex.words
57

# Return (unique) term count.
>>> lex.terms
39

# Return type-token ratio (TTR) of text.
>>> lex.ttr
0.6842105263157895

# Return root type-token ratio (RTTR) of text.
>>> lex.rttr
5.165676192553671

# Return corrected type-token ratio (CTTR) of text.
>>> lex.cttr
3.6526846651686067

# Return mean segmental type-token ratio (MSTTR).
>>> lex.msttr(segment_window=25)
0.88

# Return moving average type-token ratio (MATTR).
>>> lex.mattr(window_size=25)
0.8351515151515151

# Return Measure of Textual Lexical Diversity (MTLD).
>>> lex.mtld(threshold=0.72)
46.79226361031519

# Return hypergeometric distribution diversity (HD-D) measure.
>>> lex.hdd(draws=42)
0.7468703323966486

Attributes and properties

Methods

msttr Mean segmental TTR (Johnson 1944)
mattr Moving average TTR (Covington 2007, Covington and McFall 2010)
mtld Measure of Lexical Diversity (McCarthy 2005, McCarthy and Jarvis 2010)
hdd HD-D (McCarthy and Jarvis 2007)

Assessing method docstrings

>>> import inspect

# docstring for hdd (HD-D)
>>> print(inspect.getdoc(LexicalRichness.hdd))

Hypergeometric distribution diversity (HD-D) score.

For each term (t) in the text, compute the probabiltiy (p) of getting at least one appearance
of t with a random draw of size n < N (text size). The contribution of t to the final HD-D
score is p * (1/n). The final HD-D score thus sums over p * (1/n) with p computed for
each term t. Described in McCarthy and Javis 2007, p.g. 465-466.
(McCarthy and Jarvis 2007)

Parameters
__________
draws: int
    Number of random draws in the hypergeometric distribution (default=42).

Returns
_______
float

lexicalrichness's People

Contributors

davidlesieur avatar lsys avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.