
plains-cree-fsts's Introduction

Plains Cree FSTs

No longer maintained: please see https://github.com/giellalt/lang-crk


kîkwây ôma? (What is this?)

This is a mirror of the Plains Cree morphological finite-state transducers (FSTs) source code. The FSTs can analyze and generate nêhiyawêwin word forms.

âh?

You can use the FSTs to explain the grammar of a nêhiyawêwin word (analysis):

kohkom -> nôhkom+N+A+D+Px2Sg+Sg

And you can use the FSTs to generate a word form from a grammatical description (generation):

nôhkom+N+A+D+Px1Pl+Sg -> nôhkominân

The canonical source code for the FSTs, including the derivational FSTs and more, is available at https://gtsvn.uit.no/langtech/trunk/langs/crk/.

Download the FSTs

Download compiled FSTs from the releases page!

You can use *.hfstol files with hfst-optimized-lookup and *.fomabin files with flookup. You can also use the *.fomabin and *.hfstol files in Python, using fst-lookup and hfstol, respectively.

Usage

Using the HFST application suite:

$ echo "ewapamat" | hfst-optimized-lookup -q crk-descriptive-analyzer.hfstol
ewapamat	PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO+Err/Orth
ewapamat	PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO+Err/Orth

$ echo "PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO" | hfst-optimized-lookup crk-normative-generator.hfstol
PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO	ê-wâpamât

Using Foma:

$ echo "ewapamat" | flookup crk-descriptive-analyzer.fomabin
ewapamat	PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO+Err/Orth
ewapamat	PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO+Err/Orth

$ echo "PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO" | flookup crk-normative-generator.fomabin
PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO	ê-wâpamât

Using fst-lookup:

from fst_lookup import FST

analyzer = FST.from_file('crk-descriptive-analyzer.fomabin')
for analysis in analyzer.analyze('ewapamat'):
    print(analysis)
# prints: ('PV/e+', 'wâpamêw', '+V', '+TA', '+Cnj', '+Prs', '+2Sg', '+3SgO', '+Err/Orth')
#         ('PV/e+', 'wâpamêw', '+V', '+TA', '+Cnj', '+Prs', '+3Sg', '+4Sg/PlO', '+Err/Orth')

# NB: You must invert the labels on the generator because this FST is "upside-down"!
generator = FST.from_file('crk-normative-generator.fomabin', labels='invert')
for wordform in generator.generate('PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO'):
    print(wordform)
# prints: ê-wâpamât
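fst-lookup returns each analysis as a tuple of symbols, which `"".join(...)` turns back into the string form used by the command-line tools. To pull such a string apart into prefix tags, lemma, and grammatical tags, here is a minimal sketch. `split_analysis` is a hypothetical helper (not part of fst-lookup), and it assumes that the only tags appearing before the lemma are those starting with `PV/` plus `IC`. That covers the examples on this page but may not cover every prefix tag in the FSTs.

```python
from __future__ import annotations

from typing import NamedTuple


class Analysis(NamedTuple):
    prefixes: list[str]  # e.g. ["PV/e"] or ["IC", "PV/miyo"]
    lemma: str           # e.g. "wâpamêw"
    tags: list[str]      # e.g. ["V", "TA", "Cnj", "Prs", "3Sg", "4Sg/PlO"]


def is_prefix_tag(symbol: str) -> bool:
    # Assumption: only PV/... preverb tags and IC (initial change) precede the lemma.
    return symbol.startswith("PV/") or symbol == "IC"


def split_analysis(analysis: str) -> Analysis:
    """Split an analysis such as 'PV/e+wâpamêw+V+TA+...' into its parts (hypothetical helper)."""
    # Dropping empty pieces also tolerates a stray trailing '+'.
    symbols = [s for s in analysis.split("+") if s]
    i = 0
    while i < len(symbols) and is_prefix_tag(symbols[i]):
        i += 1
    if i == len(symbols):
        raise ValueError(f"no lemma found in {analysis!r}")
    return Analysis(prefixes=symbols[:i], lemma=symbols[i], tags=symbols[i + 1:])


if __name__ == "__main__":
    # The same string fst-lookup's tuple yields after "".join(...).
    print(split_analysis("PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO"))
```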

Bulk lookups

If you want to generate a large number of word forms all at once, it is recommended that you use the hfst-optimized-lookup command, as this is the fastest way to perform lookups. Provide the analyses one per line. For example, say I want to conjugate mîcisow, and I have a file of analyses called conjugations.txt:

mîcisow+V+AI+Ind+Prs+1Sg
mîcisow+V+AI+Ind+Prs+2Sg
mîcisow+V+AI+Ind+Prs+3Sg
PV/e+mîcisow+V+AI+Cnj+Prs+1Sg
PV/e+mîcisow+V+AI+Cnj+Prs+2Sg
PV/e+mîcisow+V+AI+Cnj+Prs+3Sg

You can pipe this into hfst-optimized-lookup:

$ cat conjugations.txt | hfst-optimized-lookup crk-normative-generator.hfstol
mîcisow+V+AI+Ind+Prs+1Sg	nimîcison

mîcisow+V+AI+Ind+Prs+2Sg	kimîcison

mîcisow+V+AI+Ind+Prs+3Sg	mîcisow

PV/e+mîcisow+V+AI+Cnj+Prs+1Sg	ê-mîcisoyân

PV/e+mîcisow+V+AI+Cnj+Prs+2Sg	ê-mîcisoyan

PV/e+mîcisow+V+AI+Cnj+Prs+3Sg	ê-mîcisot

You can use the two-column output to map the input to the generated word form. This is useful, since some analyses have multiple possible word forms (e.g., cactus+Pl in English can be "cactuses" or "cacti").
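The same batch workflow can be driven from Python with a single hfst-optimized-lookup process. This is a sketch under stated assumptions. It assumes hfst-optimized-lookup is on your PATH, that output lines are tab-separated as shown above, and that a failed lookup ends in a `+?` column, as in the `12P` example further down this page. `conjugation_analyses`, `parse_lookup_output`, and `generate_all` are hypothetical names, not part of this repository.

```python
from __future__ import annotations

import subprocess


def conjugation_analyses(lemma: str, persons: tuple[str, ...] = ("1Sg", "2Sg", "3Sg")) -> list[str]:
    """Build VAI present-tense analyses like those in conjugations.txt (hypothetical helper)."""
    independent = [f"{lemma}+V+AI+Ind+Prs+{p}" for p in persons]
    conjunct = [f"PV/e+{lemma}+V+AI+Cnj+Prs+{p}" for p in persons]
    return independent + conjunct


def parse_lookup_output(output: str) -> dict[str, list[str]]:
    """Map each input line to all of its output forms, keeping every alternative."""
    results: dict[str, list[str]] = {}
    for line in output.splitlines():
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # blank separator lines and "!! Warning" lines contain no tab
        source, form = columns[0], columns[1]
        forms = results.setdefault(source, [])
        if len(columns) >= 3 and columns[2] == "+?":
            continue  # failed lookup: keep the input, record no forms
        forms.append(form)
    return results


def generate_all(analyses: list[str],
                 fst_path: str = "crk-normative-generator.hfstol") -> dict[str, list[str]]:
    """Generate word forms for many analyses with one hfst-optimized-lookup call."""
    completed = subprocess.run(
        ["hfst-optimized-lookup", "-q", fst_path],
        input="\n".join(analyses) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_lookup_output(completed.stdout)
```

For example, `generate_all(conjugation_analyses("mîcisow"))` should map each of the six analyses above to its word form. Collecting forms into lists, rather than a single string per analysis, keeps both "cactuses" and "cacti" in cases like the one described above.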

Working on the FSTs

The following instructions assume you're working in a Linux/macOS/Unix command line.

Dependencies

You'll need GNU Make and HFST. If you're on macOS or Linux, you probably already have make installed. HFST can be installed on macOS with Homebrew by typing:

brew install ualbertaaltlab/hfst/hfst

Building

To build the FSTs from scratch, type the following in the root directory:

make -j fsts

The resultant *.hfstol and *.foma files will be placed in src/.

Explanation:

  • make: run GNU Make
  • -j: run jobs in parallel; with no number given, Make places no limit on the number of simultaneous jobs
  • fsts: the target to build, namely the *.hfstol and *.foma FSTs.

If you see the message

make[1]: Nothing to be done for `fsts'.

then the FSTs are up to date, so there's no need to remake them. If you want to remake them anyway, add the -B flag when using make:

make -j -B fsts

Modifying

Change the *.lexc, *.regexp, and *.twolc files in src/, then run make -j fsts to see the changes.

Citation

If you use this work in an academic context, use this to cite the morphological FST:

@misc{arppe2019finite,
    Author={Arppe, Antti and Harrigan, Atticus and Schmirler, Katherine and Antonsen, Lene and Trosterud, Trond and N{\o}rsteb{\o} Moshagen, Sjur and Silfverberg, Miikka and Wolvengrey, Arok and Snoek, Conor and Lachler, Jordan and Santos, Eddie Antonio and Okim{\=a}sis, Jean and Thunder, Dorothy},
    Howpublished={\url{https://gtsvn.uit.no/langtech/trunk/langs/crk/}},
    Title={Finite-state transducer-based computational model of {Plains Cree} morphology},
    Year={2014--2019}
}

You may also cite these publications:

@inproceedings{snoek2014modeling,
  title={Modeling the noun morphology of Plains Cree},
  author={Snoek, Conor and Thunder, Dorothy and Loo, Kaidi and Arppe, Antti and Lachler, Jordan and Moshagen, Sjur and Trosterud, Trond},
  booktitle={Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages},
  pages={34--42},
  year={2014}
}

@article{harrigan2017learning,
  title={Learning from the computational modelling of Plains Cree verbs},
  author={Harrigan, Atticus G and Schmirler, Katherine and Arppe, Antti and Antonsen, Lene and Trosterud, Trond and Wolvengrey, Arok},
  journal={Morphology},
  volume={27},
  number={4},
  pages={565--598},
  year={2017},
  publisher={Springer}
}

Maintainer tools

To sync the FST sources with the upstream SVN repository, re-download the sources list:

make -B src/morphological-fst-sources.mk

Then download all the sources again:

make -j -B download

Then build the FSTs as usual:

make -j fsts

License

The FST and its sources are distributed under the terms of the GNU Affero General Public License:

Copyright (C) 2015–2019 Alberta Language Technology Lab (ALTLab)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

plains-cree-fsts's People

Contributors

acl-sigel, atticusha, eddieantonio, madoshakalaka


plains-cree-fsts's Issues

New Foma FST build script

We should try a new, less error-prone Foma build script. Here's @aarppe:


I left out the proper names and abbreviations from the catenation of the LEXC source, leaving the compilation script otherwise the same and adding FOMA compilation at the end. Direct conversion of the HFST descriptive analyzer to FOMA format with hfst-fst2fst results in an abort, but the following scheme seems to work:

hfst-fst2fst -b -F -i crk-gen-norm-dict.hfst -o crk-gen-norm-dict.fomabin

hfst-fst2fst -b -F -i crk-orth.hfst -o crk-orth.fomabin

foma -e"load crk-gen-norm-dict.fomabin" -e"define M" -e"load crk-orth.fomabin" -e"invert net" -e"define O" -e"regex [ M .o. O ];" -e"save stack crk-anl-desc-dict.fomabin" -s
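One way to make this step less error-prone is to build the argument list in code rather than as one long quoted shell line. The sketch below assumes foma is on your PATH and uses the file names above. `foma_compose_args` and `build_descriptive_analyzer` are hypothetical names, and the command sequence is copied from the one-liner, not from the repository's Makefile.

```python
from __future__ import annotations

import subprocess


def foma_compose_args(generator: str = "crk-gen-norm-dict.fomabin",
                      orthography: str = "crk-orth.fomabin",
                      output: str = "crk-anl-desc-dict.fomabin") -> list[str]:
    """Return the argv of the foma one-liner above, one -e flag per command."""
    commands = [
        f"load {generator}",
        "define M",              # M: the normative dictionary generator
        f"load {orthography}",
        "invert net",
        "define O",              # O: the orthography FST, inverted
        "regex [ M .o. O ];",    # compose M with O
        f"save stack {output}",
    ]
    args = ["foma"]
    for command in commands:
        args += ["-e", command]
    args.append("-s")            # exit once the -e commands have run
    return args


def build_descriptive_analyzer(**paths: str) -> None:
    """Run foma; raises CalledProcessError if foma exits with an error status (e.g. an abort)."""
    subprocess.run(foma_compose_args(**paths), check=True)
```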

Testing with a few examples, we seem to get results we expect:

flookup -q crk-anl-desc-dict.fomabin
nepat
nipayan
meyonipat
nepat	IC+nipâw+V+AI+Cnj+Prs+3Sg

nipayan	pê-ayâw+V+AI+Ind+Prs+1Sg
nipayan	nipâw+V+AI+Cnj+Prs+2Sg
nipayan	nipâw+V+AI+Cnj+Prs+1Sg

meyonipat	IC+PV/miyo+nipâw+V+AI+Cnj+Prs+3Sg

That hfst-fst2fst produces an imperfect FST is something that we ought to bring to the attention of the Helsinki folks. Nevertheless, the HFST and FOMA FSTs seem to work, so I can't easily judge if there's something disagreeable in the source code.

Regardless, I don't know if this works with the Python FST lookup code.

Getting the new FST working is important, as it fixes some key errors in the affixation implemented last week (Atticus is working on the remaining issues, namely unspecified actors for VAIs and VTIs) and incorporates all of Arok's new dictionary entries.

Originally posted by @aarppe in UAlbertaALTLab/morphodict#261 (comment)

Question: paradigm strings for forms in FSTs, but not on itwêwina site

Sorry for the long title. The layout files do not include the paradigm strings (e.g., PV/ta+*+Cnj+Prs+3Pl) for the following forms:

ka-kî- (independent) eg: nika-kî-itwân "I could say (thus)"
ta-kî- (conjunct) eg: ta-kî-itweyân "I should say (thus)"
kita- (conjunct) eg: kita-mosiwit "he became a moose". I understand this may be the same as "ta" but please correct me if I'm wrong
kâ- (conjunct) eg: kâ-nêhiyawêcik "when they speak Cree/those who speak Cree"

I assume the FSTs can handle these, but I didn't see the paradigm IDs (what are we calling them?) listed in the layout files. I was curious whether they are going to be added, or whether someone could help me generate the list.

Thanks!

Discrepancy using `kâ-` in layouts

I believe I've found a discrepancy in the layouts: the INFINITIVE form for kâ- is represented as ka- in the .layout files:

|      | "FUTURE/INFINITIVE"        |                           |
|      | : "Conjunct: ka-"          | : "Conjunct: ta-"         |
| "1s" | PV/ka+*+Cnj+Prs+1Sg        | PV/ta+*+Cnj+Prs+1Sg       |
| "2s" | PV/ka+*+Cnj+Prs+2Sg        | PV/ta+*+Cnj+Prs+2Sg       |
| "3s" | PV/ka+*+Cnj+Prs+3Sg        | PV/ta+*+Cnj+Prs+3Sg       |
| "1p" | PV/ka+*+Cnj+Prs+1Pl        | PV/ta+*+Cnj+Prs+1Pl       |
| "21" | PV/ka+*+Cnj+Prs+12Pl       | PV/ta+*+Cnj+Prs+12Pl      |
| "2p" | PV/ka+*+Cnj+Prs+2Pl        | PV/ta+*+Cnj+Prs+2Pl       |
| "3p" | PV/ka+*+Cnj+Prs+3Pl        | PV/ta+*+Cnj+Prs+3Pl       |
| "4"  | PV/ka+*+Cnj+Prs+4Sg/Pl     | PV/ta+*+Cnj+Prs+4Sg/Pl    |

I believe they should be:

|      | "FUTURE/INFINITIVE"         |                           |
|      | : "Conjunct: kaa-"          | : "Conjunct: ta-"         |
| "1s" | PV/kaa+*+Cnj+Prs+1Sg        | PV/ta+*+Cnj+Prs+1Sg       |
| "2s" | PV/kaa+*+Cnj+Prs+2Sg        | PV/ta+*+Cnj+Prs+2Sg       |
| "3s" | PV/kaa+*+Cnj+Prs+3Sg        | PV/ta+*+Cnj+Prs+3Sg       |
| "1p" | PV/kaa+*+Cnj+Prs+1Pl        | PV/ta+*+Cnj+Prs+1Pl       |
| "21" | PV/kaa+*+Cnj+Prs+12Pl       | PV/ta+*+Cnj+Prs+12Pl      |
| "2p" | PV/kaa+*+Cnj+Prs+2Pl        | PV/ta+*+Cnj+Prs+2Pl       |
| "3p" | PV/kaa+*+Cnj+Prs+3Pl        | PV/ta+*+Cnj+Prs+3Pl       |
| "4"  | PV/kaa+*+Cnj+Prs+4Sg/Pl     | PV/ta+*+Cnj+Prs+4Sg/Pl    |

VTA-1 with glides

y and w in miyêw and ayâwêw should not collapse with i-initial suffixes. mowêw seems to be working, so it can be used as an example. This may also have to do with <i> vs <i2> (or whatever characters we are using for PA *i and PA *e; I may have them wrong).

But Bloomfield kika-ayâtin ayâwêw+V+TA+Ind+Fut+Def+1Sg+2SgO <-- hmm

Imperatives for the VTAti subclass

[AEW says:] Upon checking, I found that the 2s-3s command form for this verb is currently generated as "iti", which is incorrect; it should be /isi/. This goes along with the general t > s rule that takes place for all 2s > 3... command forms.

Currently, this happens to be by design, as is exemplified by the attached YAML file, also to be found via the following link:

https://victorio.uit.no/langtech/trunk/langs/crk/test/src/gt-norm-yamls/V-TA-itew_gt-norm.yaml

I've located where this change can be made, either in how the affixation is described in verb_affixes.lexc:

+Imm+2Sg+3SgO:i2 VERB_ENDLEX ;

... or in the morphophonological description on when -t- turns into -s-

"t2sVTA4Rule"
t3:s <=> _ [ Bx: [ i: | ii2: ] ] | .#. ;

We'd need to add i2 to the context when the t>s change happens (which might have ramifications elsewhere), or adjust the affixation in src/morphology/affixes/verb_affixes.lexc so that a usual <i> is affixed rather than <i2> (also with ramifications that need to be checked). In all cases, the corresponding YAML file needs to be revised.

Normative analyser misses correctly spelled particles (and one verb)

Forms from A-W MGS

Particles that look fine but aren't caught by the normative analyser (descriptive catches them):

ma ma+Ipc 0
aniyê aniyê+Ipc 0
waniyaw waniyaw+Ipc 0
nitaka nitaka+Ipc 0
ô ô+Ipc+Interj 0
yôhô yôhô+Ipc+Interj 0

And one verb:

kâ-pê-nayawacikicik PV/kaa+PV/pe+nayawacikiwak+V+AI+Cnj+3Pl 0

Inflection identifiers list

It would be great to have a JSON lookup (or similar) of identifiers for the various inflections, e.g.:

{
  "PV/e+{{ lemma }}+V+AI+Cnj+Prs+1Sg": {
    "mode": "Conjunct",
    "type": "VAI",
    "variation": "VAI1",
    "tempus": "Present",
    "actor": "1Sg",
    "etc": "..."
  }
}

Some discussion may be required to make sure that all inflections are identifiable using a format such as this. The goal would be to render a list of inflected forms (similar to the "linguistic" or "nêhiyawêwin" tab on itwêwina).

I've made a couple of attempts at this but struggle to encompass ideas like "future intentional", "infinitive (ta-)", or "should (ta-kî-)". At the least, a full list of the inflection identifiers would be great so they can be mapped to templates for inflection. Note that some of these deviate from the identifiers you are using, as I was taking creative liberties; however, it would be great to have a 1-1 agreement on how to identify forms moving forward.
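Here is a sketch of how such identifiers could be derived from the FST tags themselves. The tag-to-label tables are illustrative assumptions that cover only tags appearing on this page. The labels "Past" and "Future Definite" and the "object" key are chosen here for illustration, not an agreed scheme, and `describe` is a hypothetical helper.

```python
from __future__ import annotations

import json
import re

# Illustrative tables: only tags that appear on this page are covered.
MODES = {"Ind": "Independent", "Cnj": "Conjunct", "Imp": "Imperative"}
TYPES = {"AI": "VAI", "II": "VII", "TA": "VTA", "TI": "VTI"}
TENSES = {"Prs": "Present", "Prt": "Past", "Fut": "Future"}
ACTOR = re.compile(r"^\d+(?:Sg|Pl)(?:/Pl)?$")    # e.g. 1Sg, 12Pl, 4Sg/Pl
OBJECT = re.compile(r"^\d+(?:Sg|Pl)(?:/Pl)?O$")  # e.g. 3SgO, 4Sg/PlO


def describe(analysis: str) -> dict[str, str]:
    """Turn an FST analysis string into a flat feature dictionary (hypothetical)."""
    features: dict[str, str] = {}
    tokens = [t for t in analysis.split("+") if t]
    preverbs = [t for t in tokens if t.startswith("PV/")]
    if preverbs:
        features["preverbs"] = " ".join(preverbs)
    for tag in tokens:
        if tag in MODES:
            features["mode"] = MODES[tag]
        elif tag in TYPES:
            features["type"] = TYPES[tag]
        elif tag in TENSES:
            features["tempus"] = TENSES[tag]
        elif tag == "Def" and features.get("tempus") == "Future":
            features["tempus"] = "Future Definite"
        elif ACTOR.match(tag):
            features["actor"] = tag
        elif OBJECT.match(tag):
            features["object"] = tag
    return features


if __name__ == "__main__":
    key = "PV/e+{{ lemma }}+V+AI+Cnj+Prs+1Sg"
    print(json.dumps({key: describe(key)}, ensure_ascii=False, indent=2))
```

Because the labels come from the tags, a template keyed on the analysis string never drifts from what the FST actually produces. Harder categories such as "should (ta-kî-)" would still need their own entries, keyed on the preverb tags.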

Bonus points if we can include obscure forms like "weak reduplication" such as

wayâpamêw

@eddieantonio looking for your thoughts on this one.

Missing alternative form for `V+AI+Ind+Fut+Def+3Sg`

I believe there may be an incorrect form within the FSTs, or possibly a missing "alternative" (like V+AI+Ind+Prs+12Pl), for the FUTURE DEFINITE TENSE. In the 3rd person, these words are prefixed with ka- by the FSTs; however, we observe locally (ôta amiskiwâciy wâskahikanihk, "here in Edmonton") that in the non-SAP forms the words are prefixed with ta-, for instance:

❯ echo itwêw+V+AI+Ind+Fut+Def+3Sg | hfst-optimized-lookup --silent crk-normative-generator.hfstol
itwêw+V+AI+Ind+Fut+Def+3Sg	ka-itwêw

❯ echo kimiwan+V+II+Ind+Fut+Def+3Sg | hfst-optimized-lookup --silent crk-normative-generator.hfstol
kimiwan+V+II+Ind+Fut+Def+3Sg	ka-kimiwan

I believe these examples should be ta-itwêw and ta-kimiwan respectively. Could this just be a local variation, or is there possibly a mistake here?

Unspecified actors for AI verbs

[AEW notes] ... some options being given for the VAI unspecified actors. When I search for /nîmihitow/, the paradigms are currently giving these options:

nîmihitonâniwan
nîmihitoniwan

ê-nîmihitohk
ê-nîmihitonâniwahk
ê-nîmihitoniwahk

The ones I had marked in red (the forms without -nâ-: nîmihitoniwan and ê-nîmihitoniwahk) are incorrect. I assume this comes from treating -(nâ)niwan / -(nâ)niwahk as if the -(nâ) part were always optional. It is not. It is morpho-phonologically/contextually conditioned: it is only absent for /â/ and /ê > â/ final stems. For all others, the full forms including the -nâ- must be used, i.e.:

nipâ*niwan* and
mêtawâ*niwan*

... but ...

api*nâniwan*
tapasî*nâniwan*
nikamo*nâniwan*
pasikô*nâniwan*

This will need to be cleaned up in the paradigms if this is pervasive rather than an odd occurrence in the /nîmihito-/ paradigm.
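The conditioning described above can be stated as a small function. This is a minimal sketch of AEW's rule only, not of the FST's actual lexc/twolc implementation. `unspecified_actor_forms` is a hypothetical name, and the stem is assumed to be given without the final -w of the citation form (e.g. nipâ for nipâw).

```python
from __future__ import annotations


def unspecified_actor_forms(stem: str) -> tuple[str, str]:
    """Return (independent, conjunct) unspecified-actor forms for a VAI stem."""
    if stem.endswith("â"):
        base, suffix = stem, "niwan"             # nipâ -> nipâniwan
    elif stem.endswith("ê"):
        base, suffix = stem[:-1] + "â", "niwan"  # mêtawê -> mêtawâniwan (ê > â)
    else:
        base, suffix = stem, "nâniwan"           # api -> apinâniwan
    independent = base + suffix
    # The conjunct replaces the final -n of the suffix with -hk and takes the ê- preverb.
    conjunct = "ê-" + base + suffix[:-1] + "hk"
    return independent, conjunct
```

Applied to nîmihito-, this yields only nîmihitonâniwan and ê-nîmihitonâniwahk, which matches the correction above.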

Incorrect tense marker in FST implementation

I'm going to carry over our conversation from #6 and open this up as a bug:

I believe the FST analysis is incorrect for the form kâ-ki-:

❯ echo "PV/kaa_ki+ohkomiw+V+AI+Cnj+Prs+1Sg" | hfst-optimized-lookup crk-normative-generator.hfstol
PV/kaa_ki+ohkomiw+V+AI+Cnj+Prs+1Sg	kâ-ki-ohkomiyân

The analysis marks this as Prs (past), but the implementation is -ki- when the past marker in Plains Cree is -kî-. (Correction, my mistake: Prt is the 'past' analysis marker in the FSTs.) Even so, I am still concerned that this analysis is incorrect:

I've double-checked several references to be certain. I cannot find kâ-ki- in any of my references; however, there are several examples of kâ-kî- in both Freda Ahenakêw's works and Arok Wolvengrey's thesis, for example:

p312 ex(23)
tānisi kā-kī-isi-nikamoyan?
tānisi kā-  kī-   isi-  nikamo -yan
IPC    IPV  IPV   IPV   VAI    2s
how    CNJ  PST   thus  sing
“How did you sing?”
p312 ex(24)
tānēhki kā-kī-sipwēhtēt?
tānēhki kā- kī-  sipwēhtē -t
IPC     IPV IPV  VAI      3s
why     CNJ PST  leave
“Why did s/he leave?”
p316 ex(31)
kā- kī- wāpam -iko -t
IPV IPV VTA   INV  3s
CNJ PST see   3’-3s

There are 16 examples in total that I could find in that paper alone.

Please let me know if you need references to further examples.

Possible typo for `langs/crk/inc/paradigms/verb-ai-full.layout`

I believe the FUTURE DEFINITE TENSE form for Ind+Fut+Def+12P should actually be Ind+Fut+Def+12Pl; attempting to inflect with the current value results in an error:

$ echo nikamow+V+AI+Ind+Fut+Def+12P | hfst-optimized-lookup --silent crk-normative-generator.hfstol
!! Warning: file contains more than one transducer          !!
!! This is currently not handled - using only the first one !!
nikamow+V+AI+Ind+Fut+Def+12P	nikamow+V+AI+Ind+Fut+Def+12P	+?

Creates superfluous + at the end of a Roman numeral analysis

From eddieantonio/fst-lookup#5:

This also appears to be an issue with the FST — at least the one built in UAlbertaALTLab/plains-cree-fst

$ hfst-lookup crk-descriptive-analyzer.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups
> I
I	I+Num+Rom+	0.000000
> II
II	II+Num+Rom+	0.000000
> III
III	III+Num+Rom+	0.000000
> IV
IV	IV+Num+Rom+	0.000000
> V
V	V+Num+Rom+	0.000000
> VI
VI	VI+Num+Rom+	0.000000
> VII
VII	VII+Num+Rom+	0.000000
> VIII
VIII	VIII+Num+Rom+	0.000000
> IX
IX	IX+Num+Rom+	0.000000
> X
X	X+Num+Rom+	0.000000
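Until the numerals source is fixed, a client can strip the stray symbol itself. Here is a minimal sketch of such a workaround, assuming hfst-lookup's tab-separated input/analysis/weight lines as shown above. `clean_lookup_line` is a hypothetical helper, and the proper fix belongs in the FST source.

```python
def clean_lookup_line(line: str) -> str:
    """Drop a superfluous trailing '+' from the analysis column of a lookup line."""
    columns = line.split("\t")
    # Column 0 is the input and column 1 the analysis; lines without a tab pass through.
    if len(columns) >= 2 and columns[1].endswith("+"):
        columns[1] = columns[1].rstrip("+")
    return "\t".join(columns)
```

Prefix tags such as PV/e+ never end an analysis, so stripping only at the end of column 1 leaves them intact.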

Split numerals file?

The numerals.lexc file includes both Cree words for numbers, as well as legacy stuff for Arabic and Roman numerals. For a dictionary FST, I think having Arabic and Roman numerals is silly.

For a general acceptor FST, perhaps recognizing Arabic numerals makes sense. When will people legitimately use Roman numerals in Cree text? 😂

Therefore, I think this file should be split and only Cree numerals will be built into the dictionary FSTs.


Among the various LEXC files selected for the dictionary-only FSTs, I would nevertheless still include numerals.lexc, as that is where this subclass of Indeclinable Particles (Ipc) is enumerated, and our dictionary sources do contain number words, e.g. pêyak.

I have... issues... with pêyak being in the same file as Arabic and Roman numerals.

Can I split the legacy lexica into their own file (something like numerals-other.lexc), include it in the normal FST, and intentionally exclude it from the dictionary FSTs?

Originally posted by @eddieantonio in #20 (comment)

-n/h alternation for II verbs

[AEW notes] ê-isimâkwahk, with the /h/ in the inflection, was flagged, and the suggested correction was ê-isimâkwak. Is this an issue with the paradigm in general? I see that the paradigms for a number of -mâkwan, -nâkwan, and -spakwan forms are all incorrect in leaving the h out of the /...ahk/ ending that should be there.

As we already reviewed this, this is a question of -n final VII verbs currently allowing the h/n alternation (specified with a stem-final n3) only for a handful of verbs.

[AEW further says:] Here is likely the problem. Verbs like mâyâtan are given correctly as ê-mâyâtahk, but many of the other VIIs ending in /an/ are given incorrect /ak/ endings in the paradigms. They should all end in 0s /ahk/ and 0p /ahki/. VII stems that end in /an/ always change to /ahk/ in the Conjunct. It is the /in/ and /on/ endings that are more unpredictable and need to be marked lexically.

We can implement this for all -an final II verbs, by changing their final -n into n3. For the -in and -on final verbs, this needs to be specified per lexeme, e.g. at the very least for pipon and its compounds.
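AEW's generalization for -an final stems is regular enough to state directly. This is a minimal sketch of the intended surface pattern only, not of the n3-based implementation the FST would use. `conjunct_an_vii` and `CONJUNCT_ENDINGS` are hypothetical names, and the ê- preverb is assumed as in the examples above.

```python
# Per AEW: -an final VIIs always take -ahk (0s) / -ahki (0p) in the Conjunct.
CONJUNCT_ENDINGS = {"0s": "hk", "0p": "hki"}


def conjunct_an_vii(stem: str, actor: str = "0s") -> str:
    """Conjunct form of an -an final VII, e.g. isimâkwan -> ê-isimâkwahk (hypothetical helper)."""
    if not stem.endswith("an"):
        # -in and -on final stems are unpredictable and must be specified per lexeme.
        raise ValueError(f"{stem!r} is not -an final; specify its conjunct lexically")
    # Replace the stem-final n with the conjunct ending: ...an -> ...ahk(i).
    return "ê-" + stem[:-1] + CONJUNCT_ENDINGS[actor]
```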

(Descriptive) Fomabin no longer analyzes "nipa"

"nipa" should be analyzed as "nipâ" or "nipâw+V+AI+Imp+Imm+2Sg". This works in HFST:

$ echo "nipa" | hfst-lookup -q crk-descriptive-analyzer.hfst
nipa	nipâw+V+AI+Imp+Imm+2Sg	0.000000

And the optimized lookup:

$ echo "nipa" | hfst-optimized-lookup -q crk-descriptive-analyzer.hfstol
!! Warning: file contains more than one transducer          !!
!! This is currently not handled - using only the first one !!
nipa	nipâw+V+AI+Imp+Imm+2Sg

However, this CRASHES flookup!

$ echo "nipa" | flookup -q crk-descriptive-analyzer.fomabin
[1]    12880 done       echo "nipa" |
       12881 abort      flookup -q crk-descriptive-analyzer.fomabin

Trying to load the FST into Foma also crashes it!

$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"

Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.

foma[0]: load stack crk-descriptive-analyzer.fomabin
[1]    12891 abort      foma

However, the strict analyzer (spell relax not applied) still works...?

$ echo "nipâ" | flookup -q crk-strict-analyzer.fomabin
nipâ	nipâw+V+AI+Imp+Imm+2Sg

Note that fst-lookup does not crash on this Fomabin, and is actually usable for some analyses, but returns 0 results for nipa.

Possible sources of the bug:

  • The latest spell relax rules
  • The inversion of crk-orth.hfst
  • hfst-fst2fst
  • Foma itself

Productive recognition of diminutives

If no diminutive is listed, allow only the productive generation of the short diminutive -is (rather than both -is and -isis). If a diminutive is listed (-is or -isis), disallow the other.
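The proposed rule amounts to a small decision table. Here is a minimal sketch, assuming a lexicon entry records at most one listed diminutive suffix. `allowed_diminutives` is a hypothetical name, and the sketch models only the policy, not the lexc continuation classes that would implement it.

```python
from __future__ import annotations

from typing import Optional

DIMINUTIVE_SUFFIXES = {"-is", "-isis"}


def allowed_diminutives(listed: Optional[str]) -> set[str]:
    """Diminutive suffixes the FST should accept for a noun (hypothetical policy sketch)."""
    if listed is None:
        return {"-is"}  # nothing listed: only the productive short diminutive
    if listed not in DIMINUTIVE_SUFFIXES:
        raise ValueError(f"unexpected diminutive suffix: {listed!r}")
    return {listed}  # a listed diminutive blocks the other one
```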
