FST Lookup

Implements lookup for Foma finite state transducers.

Supports Python 3.5 and up.

Install

pip install fst-lookup

Usage

Import the library, and load an FST from a file:

Hint: Test this module by downloading the eat FST!

>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')

Assumed format of the FSTs

fst_lookup assumes that the lower label corresponds to the surface form, while the upper label corresponds to the lemma plus its linguistic tags and features. For example, your LEXC will look something like this (note what is on each side of the colon):

Multichar_Symbols +N +Sg +Pl
Lexicon Root
    cow+N+Sg:cow #;
    cow+N+Pl:cows #;
    goose+N+Sg:goose #;
    goose+N+Pl:geese #;
    sheep+N+Sg:sheep #;
    sheep+N+Pl:sheep #;
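
To make the orientation concrete, here is a toy sketch in plain Python. It does not use fst_lookup or a real transducer, and the names PAIRS, toy_analyze, and toy_generate are made up for illustration. It only mirrors the upper:lower pairs from the LEXC above and shows which side each lookup direction reads.

```python
# Each entry is (upper, lower): the analysis on the left of the colon,
# the surface form on the right, exactly as in the LEXC above.
PAIRS = [
    ("cow+N+Sg", "cow"),
    ("cow+N+Pl", "cows"),
    ("goose+N+Sg", "goose"),
    ("goose+N+Pl", "geese"),
    ("sheep+N+Sg", "sheep"),
    ("sheep+N+Pl", "sheep"),
]


def toy_analyze(surface_form):
    """Match on the lower side, return the upper side."""
    return [upper for upper, lower in PAIRS if lower == surface_form]


def toy_generate(analysis):
    """Match on the upper side, return the lower side."""
    return [lower for upper, lower in PAIRS if upper == analysis]


print(toy_analyze("sheep"))        # ['sheep+N+Sg', 'sheep+N+Pl']
print(toy_generate("goose+N+Pl"))  # ['geese']
```

If the two columns were swapped, the same two functions would return results for the wrong direction, which is the situation the labels option described next is meant to handle.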

If your FST has labels on the opposite sides (that is, the upper label corresponds to the surface form and the lower label corresponds to the lemma and linguistic tags), then instantiate the FST by providing the labels="invert" keyword argument:

fst = FST.from_file('eat-inverted.fomabin', labels="invert")

Hint: FSTs originating from the HFST suite are often inverted, so try loading the FST inverted first if .generate() or .analyze() aren't working correctly!
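
The hint above can be turned into a small helper. This is only a sketch under assumptions: load_guessing_orientation is a hypothetical name, not part of fst_lookup; it assumes a wrongly oriented FST yields no analyses for a known word form rather than raising an error; and it tries the default orientation before the inverted one. The loader is passed in as an argument so the helper does not depend on how the library is imported.

```python
from typing import Any, Callable


def load_guessing_orientation(path: str, probe: str, loader: Callable[..., Any]) -> Any:
    """Load an FST, retrying with labels="invert" if the probe yields nothing.

    `loader` is expected to behave like FST.from_file, and `probe` is a
    surface form that you already know the FST should be able to analyze.
    """
    fst = loader(path)
    # Pull at most one analysis; an empty result suggests the wrong orientation.
    if next(iter(fst.analyze(probe)), None) is not None:
        return fst
    return loader(path, labels="invert")
```

With the library installed, a call might look like load_guessing_orientation('eat.fomabin', 'eats', FST.from_file).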

Analyze a word form

To analyze a form (take a word form and get its linguistic analyses), call the analyze() method:

def analyze(self, surface_form: str) -> Iterator[Analysis]

This will yield all possible linguistic analyses produced by the FST.

An analysis is a tuple of strings. The strings are either linguistic tags, or the lemma (base form of the word).

FST.analyze() is a generator, so you must call list() to get a list.

>>> list(sorted(fst.analyze('eats')))
[('eat', '+N', '+Mass'),
 ('eat', '+V', '+3P', '+Sg')]
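
Since each analysis is a plain tuple of strings, separating the lemma from the tags takes only a few lines. The sketch below assumes that every tag starts with "+", as in the eat FST output above. Other FSTs may mark tags differently, and split_analysis is a hypothetical helper, not part of the library.

```python
from typing import List, Tuple


def split_analysis(analysis: Tuple[str, ...]) -> Tuple[str, List[str]]:
    """Return (lemma, tags), treating any element starting with '+' as a tag."""
    tags = [part for part in analysis if part.startswith("+")]
    # Anything that is not a tag is taken to be (part of) the lemma.
    lemma = "".join(part for part in analysis if not part.startswith("+"))
    return lemma, tags


# The two analyses shown above for 'eats'.
for analysis in [("eat", "+N", "+Mass"), ("eat", "+V", "+3P", "+Sg")]:
    lemma, tags = split_analysis(analysis)
    print(lemma, tags)
```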

Generate a word form

To generate a form (take a linguistic analysis and get its concrete word forms), call the generate() method:

def generate(self, analysis: str) -> Iterator[str]

FST.generate() is a Python generator, so you must call list() to get a list.

>>> list(fst.generate('eat+V+Past'))
['ate']
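
Note that analyze() yields tuples while generate() takes a single string. A rough way to feed one into the other is to concatenate the tuple, on the assumption that joining the elements reproduces the string form generate() expects (as 'eat' + '+V' + '+Past' gives 'eat+V+Past'). Both function names below are hypothetical, and fst stands for any object with analyze() and generate() methods like those described above.

```python
from typing import Any, Dict, List, Tuple


def analysis_to_string(analysis: Tuple[str, ...]) -> str:
    """Concatenate the lemma and tags of an analysis into one string."""
    return "".join(analysis)


def regenerate(fst: Any, surface_form: str) -> Dict[str, List[str]]:
    """Analyze a form, then generate word forms from each analysis found.

    Returns a mapping from each analysis string to its generated forms.
    """
    result = {}
    for analysis in fst.analyze(surface_form):
        key = analysis_to_string(analysis)
        result[key] = list(fst.generate(key))
    return result
```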

Contributing

If you plan to contribute code, it is recommended you use Pipenv. Fork and clone this repository, then install development dependencies by typing:

pipenv install

Then, do all your development within a virtual environment, managed by Pipenv:

pipenv shell

Type-checking

This project uses mypy to check static types. To invoke it on this package, type the following:

mypy -p fst_lookup

Running tests

To run this project's tests, we use py.test:

py.test

Fixtures

If you are creating or modifying test fixtures (mostly pre-built FSTs used for testing), you will need make, plus the Foma tools to compile the FSTs.

Fixtures are stored in tests/data/. Here, you will use make to compile all pre-built FSTs from source:

make

License

Copyright © 2019 Eddie Antonio Santos. Released under the terms of the Apache license. See LICENSE for more info.
