
hi-engmorph-foma's Introduction

The goal of this project is to create a usable English morphological analyser for the Alice project.

The whole project is based on the following resources:

University of Pennsylvania, XTAG Project: https://www.cis.upenn.edu/~xtag/

morph-1.5: https://www.cis.upenn.edu/~xtag/swrelease.html

original text file: morph-1.5\data\morph_english.flat

I have committed all intermediate artifacts (source code, db files, fst, build scripts, etc.) that emerged during the conversion of the text file morph_english.txt.

A short description:
-createdb.sql was used to create the empty db files
-upenn_morphtxt2db.cpp was used to create converted_engmorph.db from comment_free_morph_english.txt
-adjust_upenn_morphdb.cpp was used to adjust it, producing engmorph.db, which was used for all subsequent steps
-the various build*.sh scripts were used to build the programs that convert the relevant information from engmorph.db into the corresponding *.lexc files
-english.foma was taken from the foma site and enhanced manually
-finally, some manual adjustments were applied to the lexc files
-createfst.sh was used to create english.fst in the end
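
The project performs the conversion with the C++ tools and db files listed above. To illustrate only the core idea, the Python sketch below turns flat-file entries directly into a lexc lexicon and skips the database step. It assumes, without verification here, that each line of morph_english.flat holds an inflected form followed by "#"-separated analyses of the form "root POS FEATURES...". The functions parse_flat_line and to_lexc and the lexicon name Root are hypothetical and are not part of the repository.

```python
# Hypothetical sketch: convert XTAG-style flat morphology lines into lexc.
# ASSUMPTION: each input line looks like
#   <surface><whitespace><root> <POS> <feat>...[#<root> <POS> <feat>...]...
# Verify against morph_english.flat before using this on real data.
# Note: lexc special characters (%, !, ;, 0, ...) are NOT escaped here.


def parse_flat_line(line):
    """Return (surface, [(root, pos, [features]), ...]) for one assumed line."""
    fields = line.strip().split(None, 1)
    if len(fields) < 2:
        # Blank line or a form without analyses: nothing to convert.
        return (fields[0] if fields else ""), []
    surface, rest = fields
    analyses = []
    for chunk in rest.split("#"):
        parts = chunk.split()
        if len(parts) < 2:
            continue  # skip malformed analyses (need at least root and POS)
        root, pos, *feats = parts
        analyses.append((root, pos, feats))
    return surface, analyses


def to_lexc(lines, lexicon="Root"):
    """Build a minimal lexc source mapping 'root+POS+FEAT...' to surface forms."""
    symbols = set()
    entries = []
    for line in lines:
        surface, analyses = parse_flat_line(line)
        for root, pos, feats in analyses:
            tags = ["+" + t for t in [pos, *feats]]
            symbols.update(tags)
            # Upper side: lemma plus tags; lower side: inflected surface form.
            entries.append(f"{root}{''.join(tags)}:{surface} # ;")
    # Declare tags as multichar symbols so each tag is a single FST label.
    header = "Multichar_Symbols " + " ".join(sorted(symbols))
    return "\n".join([header, "", f"LEXICON {lexicon}", *entries]) + "\n"


if __name__ == "__main__":
    sample = ["reduced\treduce V PAST WK#reduce V PPART WK"]
    print(to_lexc(sample))
```

For the made-up sample line, the sketch declares +PAST +PPART +V +WK as multichar symbols and emits entries such as reduce+V+PAST+WK:reduced # ;. Without that declaration, lexc would split each tag into separate characters. The real lexc files in this repository use their own tag layout, so treat this only as an outline of the approach.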

Note: This is not a 1:1 transformation of the original, and it is still in development, so expect many bugs and mistakes.

hi-engmorph-foma's People

Contributors

r0ller


hi-engmorph-foma's Issues

"reduced" not recognised as a verb

Currently, the result is:
reduced +swConsonant+reduced[stem]+CON

However, "reduce" is correctly analysed as a verb as well:
reduce +swConsonant+reduce[stem]+CON
reduce reduce[stem]+V+PRES

So the problem probably lies in the E-deletion rule in the phonological rules file (.foma).
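
To check the E-deletion hypothesis, it helps to spell out what the rule must do in the generation direction. The intermediate past-tense form of "reduce" should surface as "reduced", not "reduceed". The Python sketch below models only that behaviour with a regular expression and is not foma code. It assumes an intermediate form reduce^ed, with ^ as the morpheme boundary, and a rule shaped like the E-deletion rule in the foma English tutorial. The function names are hypothetical.

```python
import re

# Hypothetical Python model of a foma-style E-deletion rule. ASSUMPTION: the
# lexicon emits intermediate forms with "^" as a morpheme boundary, e.g.
# "reduce^ed", and the .foma file contains a rule along the lines of
#   define EDeletion e -> 0 || _ "^" [ i n g | e d ] ;
# followed by a cleanup rule that deletes "^".

# Delete an "e" directly before the boundary when "-ing" or "-ed" follows.
E_DELETION = re.compile(r"e(?=\^(?:ing|ed))")


def e_deletion(form):
    """Apply the E-deletion rewrite to an intermediate form."""
    return E_DELETION.sub("", form)


def cleanup(form):
    """Remove morpheme boundaries, like a foma rule "^" -> 0."""
    return form.replace("^", "")


def generate(intermediate, rules=(e_deletion,)):
    """Apply the given rewrite rules in order, then strip boundaries."""
    for rule in rules:
        intermediate = rule(intermediate)
    return cleanup(intermediate)


if __name__ == "__main__":
    print(generate("reduce^ed"))            # reduced
    print(generate("reduce^ing"))           # reducing
    print(generate("reduce^ed", rules=()))  # reduceed (rule not applied)
```

If the rule does not fire in the "^ed" context, the verb path would only produce "reduceed". In that case "reduced" can match only the other paths, such as the +CON analysis shown above, which is one possible explanation of the reported output. In foma, generating the past-tense analysis of "reduce" with apply down would show which surface form english.fst actually produces.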
