
Comments (6)

eddieantonio commented on August 15, 2024

@aarppe I can't replicate this using the build system within Giella SVN r187484.

Here's me running quick.mk:

Concatenating LEXC code.
cat morphology/root.lexc morphology/affixes/noun_affixes.lexc morphology/affixes/verb_affixes.lexc morphology/stems/derivation_stems.lexc morphology/stems/noun_stems.lexc morphology/stems/particles.lexc morphology/stems/pronouns.lexc morphology/stems/verb_stems.lexc > crk.lexc
Compiling LEXC code.
hfst-lexc --format=openfst-tropical --output=crk-morph.hfst crk.lexc
Root...10 Propernouns...2 NOUN_PREFIXES...3 NOUN_INDEP_PREFIXES...4 NOUN_DEP_KINSHIP_PREFIXES...3 NOUN_DEP_NONKINSHIP_PREFIXES...4 NA...1 NAD...1 NI...1 NID...1 NA_POSS_IM_0_SUFFIX...3 NA_DIM_SUFFIXES...5 NI_POSS_IM_0_SUFFIX...3 NI_DIM_SUFFIXES...5 NA_POSS_SUFFIXES...11 NA_POSS_LOC_COMBO_SUFFIXES...10 NA_NUMBER_OBV_LOC_SUFFIXES...8 NA_NUMBER_SUFFIXES_SG...5 NI_POSS_SUFFIXES...11 NI_POSS_LOC_COMBO_SUFFIXES...9 NI_NUMBER_LOC_SUFFIXES...3 NI_NUMBER_SUFFIXES_SG...5 NOUN_IRREGULARS...16 OSI_SUFFIXES...2 NA_SG/A_POSS/IM...1 NA_SG/A_POSS/IM_DIM/IS...1 NA_POSS/IM...1 NA_DIM/IS...1 NA_DIM/ISIS...1 NA_POSS/IM_DIM/IS...1 NA_POSS/IM_DIM/ISIS...1 NI_SG/I_POSS/IM...1 NI_SG/I_POSS/IM_DIM/IS...1 NI_POSS/IM...1 NI_DIM/IS...1 NI_DIM/ISIS...1 NI_POSS/IM_DIM/IS...1 NI_POSS/IM_DIM/ISIS...1 NAD_SG/A...1 NAD_SG/A_DIM/IS...1 NAD_POSS/IM...1 NAD_DIM/IS...1 NAD_DIM/ISIS...1 NAD_POSS/IM_DIM/IS...1 NAD_POSS/IM_DIM/ISIS...1 NID_SG/I...1 NID_SG/I_DIM/IS...1 NID_POSS/IM...1 NID_DIM/IS...1 NID_DIM/ISIS...1 NID_POSS/IM_DIM/IS...1 NID_POSS/IM_DIM/ISIS...1 NOUN_ENDLEX...2 VerbPrefixes...4 INDEPENDENT...3 IND_TENSE...7 FUTURE_CONDITIONAL...1 CONJUNCT...10 CNJ_TENSE...5 IMPERATIVE...1 VERBPREFIXES...4 REDUPLICATION...4 REDUPL_BOUND...4 REDUPL_CONT...3 PREVERBS...251 PREVERBS_BOUND...4 WICI...4 VIIn...1 VIIn_SG...1 VIIw_PL...1 VIIw...1 VIIw_SG...1 VIIn_PL...1 VIIw_SGPL_WICI...1 VIIw_SG_WICI...1 VIIw_PL_WICI...1 VIIn_SGPL_WICI...1 VIIn_SG_WICI...1 VIIn_PL_WICI...1 VIIw_SGPL_ORDER...3 VIIw_SG_ORDER...3 VIIw_PL_ORDER...3 VIIn_SGPL_ORDER...3 VIIn_SG_ORDER...3 VIIn_PL_ORDER...3 VIIw_SGPL_IND_TENSE...4 VIIw_SGPL_CNJ_TENSE...3 VIIw_SG_IND_TENSE...4 VIIw_SG_CNJ_TENSE...3 VIIw_PL_IND_TENSE...4 VIIw_PL_CNJ_TENSE...3 VIIn_SGPL_IND_TENSE...4 VIIn_SGPL_CNJ_TENSE...3 VIIn_SG_IND_TENSE...4 VIIn_SG_CNJ_TENSE...3 VIIn_PL_IND_TENSE...4 VIIn_PL_CNJ_TENSE...3 VIIw_SGPL_IND_PERSON...1 VIIw_SGPL_CNJ_PERSON...1 VIIw_SGPL_FUT_CON_PERSON...1 VIIw_SG_IND_PERSON...1 VIIw_SG_CNJ_PERSON...1 VIIw_SG_FUT_CON_PERSON...1 
VIIw_PL_IND_PERSON...1 VIIw_PL_CNJ_PERSON...1 VIIw_PL_FUT_CON_PERSON...1 VIIn_SGPL_IND_PERSON...1 VIIn_SGPL_CNJ_PERSON...1 VIIn_SGPL_FUT_CON_PERSON...1 VIIn_SG_IND_PERSON...1 VIIn_SG_CNJ_PERSON...1 VIIn_SG_FUT_CON_PERSON...1 VIIn_PL_IND_PERSON...1 VIIn_PL_CNJ_PERSON...1 VIIn_PL_FUT_CON_PERSON...1 VIIn_SGPL_IND_NULL...2 VIIn_SG_IND_SUFFIX...2 VIIn_PL_IND_SUFFIX...2 VIIw_SGPL_IND_NULL...2 VIIw_SG_IND_SUFFIX...2 VIIw_PL_IND_SUFFIX...2 VIIn_SGPL_CNJ_NULL...2 VIIn_SG_CNJ_SUFFIX...2 VIIn_PL_CNJ_SUFFIX...2 VIIw_SGPL_CNJ_NULL...2 VIIw_SG_CNJ_SUFFIX...2 VIIw_PL_CNJ_SUFFIX...2 VIIn_SGPL_FUT_CON_NULL...2 VIIn_SG_FUT_CON_SUFFIX...2 VIIn_PL_FUT_CON_SUFFIX...2 VIIw_SGPL_FUT_CON_NULL...2 VIIw_SG_FUT_CON_SUFFIX...2 VIIw_PL_FUT_CON_SUFFIX...2 VAIw...1 VAIw_PL...1 VAIn...1 VAIn_PL...1 VAIw_WICI...2 VAIw_PL_WICI...2 VAIn_WICI...2 VAIn_PL_WICI...2 VAIw_ORDER...4 VAIw_PL_ORDER...4 VAIn_ORDER...4 VAIn_PL_ORDER...4 VAIw_IND_TENSE...4 VAIw_CNJ_TENSE...3 VAIw_PL_IND_TENSE...4 VAIw_PL_CNJ_TENSE...3 VAIn_IND_TENSE...4 VAIn_CNJ_TENSE...3 VAIn_PL_IND_TENSE...4 VAIn_PL_CNJ_TENSE...3 VAIw_IND_PERSON...3 VAIw_CNJ_PERSON...1 VAIw_FUT_CON_PERSON...1 VAIw_IMP_PERSON...1 VAIw_PL_IND_PERSON...3 VAIw_PL_CNJ_PERSON...1 VAIw_PL_FUT_CON_PERSON...1 VAIw_PL_IMP_PERSON...1 VAIn_IND_PERSON...3 VAIn_CNJ_PERSON...1 VAIn_FUT_CON_PERSON...1 VAIn_IMP_PERSON...1 VAIn_PL_IND_PERSON...3 VAIn_PL_CNJ_PERSON...1 VAIn_PL_FUT_CON_PERSON...1 VAIn_PL_IMP_PERSON...1 VAIw_IND_NI...2 VAIw_IND_NI_SG_SUFFIX...1 VAIw_IND_NI_PL_SUFFIX...1 VAIw_IND_KI...2 VAIw_IND_KI_SG_SUFFIX...1 VAIw_IND_KI_PL_SUFFIX...3 VAIw_IND_NULL...2 VAIw_IND_NULL_SG_SUFFIX...4 VAIw_IND_NULL_PL_SUFFIX...2 VAIn_IND_NI...2 VAIn_IND_NI_SG_SUFFIX...1 VAIn_IND_NI_PL_SUFFIX...1 VAIn_IND_KI...2 VAIn_IND_KI_SG_SUFFIX...1 VAIn_IND_KI_PL_SUFFIX...3 VAIn_IND_NULL...2 VAIn_IND_NULL_SG_SUFFIX...5 VAIn_IND_NULL_PL_SUFFIX...3 VAIw_CNJ_NULL...2 VAIw_CNJ_NULL_SG_SUFFIX...6 VAIw_CNJ_NULL_PL_SUFFIX...5 VAIn_CNJ_NULL...2 VAIn_CNJ_NULL_SG_SUFFIX...6 VAIn_CNJ_NULL_PL_SUFFIX...6 
VAIw_IMP_NULL...2 VAIw_IMP_SG_SUFFIX...2 VAIw_IMP_NULL_PL_SUFFIX...4 VAIn_IMP_NULL...2 VAIn_IMP_SG_SUFFIX...2 VAIn_IMP_NULL_PL_SUFFIX...4 VAIw_FUT_CON_NULL...2 VAIw_FUT_CON_NULL_SG_SUFFIX...4 VAIw_FUT_CON_NULL_PL_SUFFIX...7 VAIn_FUT_CON_NULL...2 VAIn_FUT_CON_NULL_SG_SUFFIX...5 VAIn_FUT_CON_NULL_PL_SUFFIX...8 VTIm...1 VTIm_PL...1 VTIw...1 VTIm_WICI...2 VTIm_PL_WICI...2 VTIm_ORDER...4 VTIm_PL_ORDER...4 VTIm_IND_TENSE...4 VTIm_CNJ_TENSE...3 VTIm_PL_IND_TENSE...4 VTIm_PL_CNJ_TENSE...3 VTIm_IND_PERSON...3 VTIm_CNJ_PERSON...1 VTIm_FUT_CON_PERSON...1 VTIm_IMP_PERSON...1 VTIm_PL_IND_PERSON...3 VTIm_PL_CNJ_PERSON...1 VTIm_PL_FUT_CON_PERSON...1 VTIm_PL_IMP_PERSON...1 VTIm_IND_NI...2 VTIm_IND_NI_SG_SUFFIX...1 VTIm_IND_NI_PL_SUFFIX...1 VTIm_IND_KI...2 VTIm_IND_KI_SG_SUFFIX...1 VTIm_IND_KI_PL_SUFFIX...3 VTIm_IND_NULL...2 VTIm_IND_NULL_SG_SUFFIX...2 VTIm_IND_NULL_PL_SUFFIX...3 VTIm_CNJ_NULL...2 VTIm_CNJ_NULL_SG_SUFFIX...4 VTIm_CNJ_NULL_PL_SUFFIX...6 VTIm_IMP_NULL...2 VTIm_IMP_SG_SUFFIX...2 VTIm_IMP_NULL_PL_SUFFIX...4 VTIm_FUT_CON_NULL...2 VTIm_FUT_CON_NULL_SG_SUFFIX...4 VTIm_FUT_CON_NULL_PL_SUFFIX...6 VTA...1 VTA_PL...1 VTAt...1 VTAi...1 VTA_WICI...1 VTA_PL_WICI...1 VTAt_WICI...1 VTAi_WICI...1 VTA_ORDER...4 VTA_PL_ORDER...4 VTAi_ORDER...4 VTAt_ORDER...4 VTA_IND_TENSE...4 VTA_CNJ_TENSE...3 VTA_PL_IND_TENSE...4 VTA_PL_CNJ_TENSE...3 VTA_IND_PERSON...3 VTA_CNJ_PERSON...1 VTA_FUT_CON_PERSON...1 VTA_IMP_PERSON...1 VTA_PL_IND_PERSON...3 VTA_PL_CNJ_PERSON...1 VTA_PL_FUT_CON_PERSON...1 VTA_PL_IMP_PERSON...1 VTAt_IMP_PERSON...1 VTAi_IMP_PERSON...1 VTA_IND_NI...2 VTA_IND_NI_SG_SUFFIX...5 VTA_IND_NI_PL_SUFFIX...10 VTA_IND_KI...2 VTA_IND_KI_SG_SUFFIX...7 VTA_IND_KI_PL_SUFFIX...24 VTA_IND_NULL...2 VTA_IND_NULL_SG_SUFFIX...8 VTA_IND_NULL_PL_SUFFIX...5 VTA_CNJ_NULL...2 VTA_CNJ_NULL_SG_SUFFIX...19 VTA_CNJ_NULL_PL_SUFFIX...35 VTA_IMP_NULL...2 VTA_IMP_NULL_SG_SUFFIX...4 VTA_IMP_NULL_PL_SUFFIX...16 VTAt_IMP_NULL...2 VTAt_IMP_NULL_SG_SUFFIX...4 VTAt_IMP_NULL_PL_SUFFIX...16 VTAi_IMP_NULL...2 
VTAi_IMP_NULL_SG_SUFFIX...4 VTAi_IMP_NULL_PL_SUFFIX...16 VTA_FUT_CON_NULL...2 VTA_FUT_CON_NULL_SG_SUFFIX...15 VTA_FUT_CON_NULL_PL_SUFFIX...43 VERB_ENDLEX...4 DERIVATION_NOUN_STEMS...1 DERIVATION_VERB_STEMS...1 DERIVATION_NOUN_INFLECTION_SUFFIXES...19 DERIVATION_VERB_INFLECTION_SUFFIXES...12 NOUN_INDEP_STEMS...5111 NOUN_DEP_KINSHIP_STEMS...276 NOUN_DEP_NONKINSHIP_STEMS...135 pcle...1 fpcle...1 pcle/ns...1 Particles...1287 Pronoun...4 Personal...21 Interrogative...6 Indefinite...2 Demonstrative...18 VERBSTEMS...Warning: Sublexicon is mentioned but not defined. (Abbreviation)
Warning: Sublexicon is mentioned but not defined. (Numerals)
Warning: Sublexicon is mentioned but not defined. (ProperNoun-crk)
Warning: Sublexicon is mentioned but not defined. (ProperNoun-eng)
Warning: Sublexicon is mentioned but not defined. (Punctuation)
Warning: Sublexicon is mentioned but not defined. (Symbols)
Warning: Sublexicons defined but not used
DERIVATION_NOUN_STEMS NAD_DIM/ISIS NAD_POSS/IM NAD_POSS/IM_DIM/IS NAD_POSS/IM_DIM/ISIS NA_DIM/ISIS NA_POSS/IM_DIM/ISIS NID_DIM/ISIS NID_POSS/IM NID_POSS/IM_DIM/IS NID_POSS/IM_DIM/ISIS NI_DIM/ISIS NI_POSS/IM_DIM/ISIS

Compiling TWOLC code.
hfst-twolc -i phonology/crk-phon.twolc -o crk-phon.hfst
Reading input from phonology/crk-phon.twolc.
Writing output to crk-phon.hfst.
Reading alphabet.
Reading sets.
Reading rules and compiling their contexts and centers.
Compiling rules.
Storing rules.
Composing and intersecting LEXC and TWOLC transducers.
hfst-compose-intersect -1 crk-morph.hfst -2 crk-phon.hfst | hfst-minimize - -o crk-normative-generator.hfst
hfst-compose-intersect: warning:
Found output multi-char symbols ("DRV-FST") in
transducer in file crk-morph.hfst which are not found on the
input tapes of transducers in file crk-phon.hfst.
Inverting normative generator tranducer into a normative analyzer transducer.
hfst-invert crk-normative-generator.hfst -o crk-strict-analyzer.hfst
hfst-invert -i crk-strict-analyzer.hfst -o - |\
		hfst-fst2fst --foma --use-backend-format -i - -o - |\
		gzip > crk-strict-analyzer.fomabin
Compiling regular expression implementing spelling-relaxation.
hfst-regexp2fst -S -i orthography/spellrelax.regex | hfst-invert -o crk-orth.hfst
hfst-invert -i crk-orth.hfst -o - |\
		hfst-fst2fst --foma --use-backend-format -i - -o - |\
		gzip > crk-orth.fomabin
foma\
		-e "load crk-strict-analyzer.fomabin" \
		-e "define M" \
		-e "load crk-orth.fomabin" \
		-e "invert net" \
		-e "define O" \
		-e "regex [ M .o. O ];" \
		-e "save stack crk-descriptive-analyzer.fomabin" \
		-s
1.9 MB. 69465 states, 126889 arcs, Cyclic.
defined M: 1.9 MB. 69465 states, 126889 arcs, Cyclic.
10.3 kB. 18 states, 584 arcs, Cyclic.
10.3 kB. 18 states, 584 arcs, Cyclic.
defined O: 10.3 kB. 18 states, 584 arcs, Cyclic.
11.3 MB. 181027 states, 741084 arcs, Cyclic.
Writing to file crk-descriptive-analyzer.fomabin.

I get the following:

flookup -q crk-descriptive-analyzer.fomabin
nepat
nipayan
meyonipat
nepat	+?

nipayan	nipâw+V+AI+Cnj+Prs+2Sg
nipayan	nipâw+V+AI+Cnj+Prs+1Sg

meyonipat	+?

What could I be doing wrong?
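When checking a longer word list, it can help to pull out just the inputs the analyzer rejected. The following is a minimal sketch, assuming flookup's tab-separated output as shown above, where an unrecognized word is echoed back with `+?` in the second column. The function name `unanalyzed_words` is made up for illustration.

```shell
# Print each input word for which flookup returned no analysis.
# Assumes tab-separated flookup output: word<TAB>analysis, with "+?"
# in the analysis column when the word is not recognized.
unanalyzed_words() {
    awk -F '\t' '$2 == "+?" { print $1 }'
}
```

Piping the flookup output above through `unanalyzed_words` would list `nepat` and `meyonipat`, the two forms that failed here.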

from plains-cree-fsts.

eddieantonio commented on August 15, 2024

Never mind! As is often the case, some of the FSTs were upside down. I adjusted the build script appropriately:

Concatenating LEXC code.
cat morphology/root.lexc morphology/affixes/noun_affixes.lexc morphology/affixes/verb_affixes.lexc morphology/stems/derivation_stems.lexc morphology/stems/noun_stems.lexc morphology/stems/particles.lexc morphology/stems/pronouns.lexc morphology/stems/verb_stems.lexc > crk.lexc
Compiling TWOLC code.
hfst-twolc -i phonology/crk-phon.twolc -o crk-phon.hfst
Compiling regular expression implementing spelling-relaxation.
hfst-regexp2fst -S -i orthography/spellrelax.regex | hfst-invert -o crk-orth.hfst
Reading input from phonology/crk-phon.twolc.
Writing output to crk-phon.hfst.
Reading alphabet.
Reading sets.
Reading rules and compiling their contexts and centers.
Compiling LEXC code.
hfst-lexc --format=openfst-tropical --output=crk-morph.hfst crk.lexc
Root...10 Propernouns...2 NOUN_PREFIXES...3 NOUN_INDEP_PREFIXES...4 NOUN_DEP_KINSHIP_PREFIXES...3 NOUN_DEP_NONKINSHIP_PREFIXES...4 NA...1 NAD...1 NI...1 NID...1 NA_POSS_IM_0_SUFFIX...3 NA_DIM_SUFFIXES...5 NI_POSS_IM_0_SUFFIX...3 NI_DIM_SUFFIXES...5 NA_POSS_SUFFIXES...11 NA_POSS_LOC_COMBO_SUFFIXES...10 NA_NUMBER_OBV_LOC_SUFFIXES...8 NA_NUMBER_SUFFIXES_SG...5 NI_POSS_SUFFIXES...11 NI_POSS_LOC_COMBO_SUFFIXES...9 NI_NUMBER_LOC_SUFFIXES...3 NI_NUMBER_SUFFIXES_SG...5 NOUN_IRREGULARS...16 OSI_SUFFIXES...2 NA_SG/A_POSS/IM...1 NA_SG/A_POSS/IM_DIM/IS...1 NA_POSS/IM...1 NA_DIM/IS...1 NA_DIM/ISIS...1 NA_POSS/IM_DIM/IS...1 NA_POSS/IM_DIM/ISIS...1 NI_SG/I_POSS/IM...1 NI_SG/I_POSS/IM_DIM/IS...1 NI_POSS/IM...1 NI_DIM/IS...1 NI_DIM/ISIS...1 NI_POSS/IM_DIM/IS...1 NI_POSS/IM_DIM/ISIS...1 NAD_SG/A...1 NAD_SG/A_DIM/IS...1 NAD_POSS/IM...1 NAD_DIM/IS...1 NAD_DIM/ISIS...1 NAD_POSS/IM_DIM/IS...1 NAD_POSS/IM_DIM/ISIS...1 NID_SG/I...1 NID_SG/I_DIM/IS...1 NID_POSS/IM...1 NID_DIM/IS...1 NID_DIM/ISIS...1 NID_POSS/IM_DIM/IS...1 NID_POSS/IM_DIM/ISIS...1 NOUN_ENDLEX...2 VerbPrefixes...4 INDEPENDENT...3 IND_TENSE...7 FUTURE_CONDITIONAL...1 CONJUNCT...10 CNJ_TENSE...5 IMPERATIVE...1 VERBPREFIXES...4 REDUPLICATION...4 REDUPL_BOUND...4 REDUPL_CONT...3 PREVERBS...251 PREVERBS_BOUND...4 WICI...4 VIIn...1 VIIn_SG...1 VIIw_PL...1 VIIw...1 VIIw_SG...1 VIIn_PL...1 VIIw_SGPL_WICI...1 VIIw_SG_WICI...1 VIIw_PL_WICI...1 VIIn_SGPL_WICI...1 VIIn_SG_WICI...1 VIIn_PL_WICI...1 VIIw_SGPL_ORDER...3 VIIw_SG_ORDER...3 VIIw_PL_ORDER...3 VIIn_SGPL_ORDER...3 VIIn_SG_ORDER...3 VIIn_PL_ORDER...3 VIIw_SGPL_IND_TENSE...4 VIIw_SGPL_CNJ_TENSE...3 VIIw_SG_IND_TENSE...4 VIIw_SG_CNJ_TENSE...3 VIIw_PL_IND_TENSE...4 VIIw_PL_CNJ_TENSE...3 VIIn_SGPL_IND_TENSE...4 VIIn_SGPL_CNJ_TENSE...3 VIIn_SG_IND_TENSE...4 VIIn_SG_CNJ_TENSE...3 VIIn_PL_IND_TENSE...4 VIIn_PL_CNJ_TENSE...3 VIIw_SGPL_IND_PERSON...1 VIIw_SGPL_CNJ_PERSON...1 VIIw_SGPL_FUT_CON_PERSON...1 VIIw_SG_IND_PERSON...1 VIIw_SG_CNJ_PERSON...1 VIIw_SG_FUT_CON_PERSON...1 
VIIw_PL_IND_PERSON...1 VIIw_PL_CNJ_PERSON...1 VIIw_PL_FUT_CON_PERSON...1 VIIn_SGPL_IND_PERSON...1 VIIn_SGPL_CNJ_PERSON...1 VIIn_SGPL_FUT_CON_PERSON...1 VIIn_SG_IND_PERSON...1 VIIn_SG_CNJ_PERSON...1 VIIn_SG_FUT_CON_PERSON...1 VIIn_PL_IND_PERSON...1 VIIn_PL_CNJ_PERSON...1 VIIn_PL_FUT_CON_PERSON...1 VIIn_SGPL_IND_NULL...2 VIIn_SG_IND_SUFFIX...2 VIIn_PL_IND_SUFFIX...2 VIIw_SGPL_IND_NULL...2 VIIw_SG_IND_SUFFIX...2 VIIw_PL_IND_SUFFIX...2 VIIn_SGPL_CNJ_NULL...2 VIIn_SG_CNJ_SUFFIX...2 VIIn_PL_CNJ_SUFFIX...2 VIIw_SGPL_CNJ_NULL...2 VIIw_SG_CNJ_SUFFIX...2 VIIw_PL_CNJ_SUFFIX...2 VIIn_SGPL_FUT_CON_NULL...2 VIIn_SG_FUT_CON_SUFFIX...2 VIIn_PL_FUT_CON_SUFFIX...2 VIIw_SGPL_FUT_CON_NULL...2 VIIw_SG_FUT_CON_SUFFIX...2 VIIw_PL_FUT_CON_SUFFIX...2 VAIw...1 VAIw_PL...1 VAIn...1 VAIn_PL...1 VAIw_WICI...2 VAIw_PL_WICI...2 VAIn_WICI...2 VAIn_PL_WICI...2 VAIw_ORDER...4 VAIw_PL_ORDER...4 VAIn_ORDER...4 VAIn_PL_ORDER...4 VAIw_IND_TENSE...4 VAIw_CNJ_TENSE...3 VAIw_PL_IND_TENSE...4 VAIw_PL_CNJ_TENSE...3 VAIn_IND_TENSE...4 VAIn_CNJ_TENSE...3 VAIn_PL_IND_TENSE...4 VAIn_PL_CNJ_TENSE...3 VAIw_IND_PERSON...3 VAIw_CNJ_PERSON...1 VAIw_FUT_CON_PERSON...1 VAIw_IMP_PERSON...1 VAIw_PL_IND_PERSON...3 VAIw_PL_CNJ_PERSON...1 VAIw_PL_FUT_CON_PERSON...1 VAIw_PL_IMP_PERSON...1 VAIn_IND_PERSON...3 VAIn_CNJ_PERSON...1 VAIn_FUT_CON_PERSON...1 VAIn_IMP_PERSON...1 VAIn_PL_IND_PERSON...3 VAIn_PL_CNJ_PERSON...1 VAIn_PL_FUT_CON_PERSON...1 VAIn_PL_IMP_PERSON...1 VAIw_IND_NI...2 VAIw_IND_NI_SG_SUFFIX...1 VAIw_IND_NI_PL_SUFFIX...1 VAIw_IND_KI...2 VAIw_IND_KI_SG_SUFFIX...1 VAIw_IND_KI_PL_SUFFIX...3 VAIw_IND_NULL...2 VAIw_IND_NULL_SG_SUFFIX...4 VAIw_IND_NULL_PL_SUFFIX...2 VAIn_IND_NI...2 VAIn_IND_NI_SG_SUFFIX...1 VAIn_IND_NI_PL_SUFFIX...1 VAIn_IND_KI...2 VAIn_IND_KI_SG_SUFFIX...1 VAIn_IND_KI_PL_SUFFIX...3 VAIn_IND_NULL...2 VAIn_IND_NULL_SG_SUFFIX...5 VAIn_IND_NULL_PL_SUFFIX...3 VAIw_CNJ_NULL...2 VAIw_CNJ_NULL_SG_SUFFIX...6 VAIw_CNJ_NULL_PL_SUFFIX...5 VAIn_CNJ_NULL...2 VAIn_CNJ_NULL_SG_SUFFIX...6 VAIn_CNJ_NULL_PL_SUFFIX...6 
VAIw_IMP_NULL...2 VAIw_IMP_SG_SUFFIX...2 VAIw_IMP_NULL_PL_SUFFIX...4 VAIn_IMP_NULL...2 VAIn_IMP_SG_SUFFIX...2 VAIn_IMP_NULL_PL_SUFFIX...4 VAIw_FUT_CON_NULL...2 VAIw_FUT_CON_NULL_SG_SUFFIX...4 VAIw_FUT_CON_NULL_PL_SUFFIX...7 VAIn_FUT_CON_NULL...2 VAIn_FUT_CON_NULL_SG_SUFFIX...5 VAIn_FUT_CON_NULL_PL_SUFFIX...8 VTIm...1 VTIm_PL...1 VTIw...1 VTIm_WICI...2 VTIm_PL_WICI...2 VTIm_ORDER...4 VTIm_PL_ORDER...4 VTIm_IND_TENSE...4 VTIm_CNJ_TENSE...3 VTIm_PL_IND_TENSE...4 VTIm_PL_CNJ_TENSE...3 VTIm_IND_PERSON...3 VTIm_CNJ_PERSON...1 VTIm_FUT_CON_PERSON...1 VTIm_IMP_PERSON...1 VTIm_PL_IND_PERSON...3 VTIm_PL_CNJ_PERSON...1 VTIm_PL_FUT_CON_PERSON...1 VTIm_PL_IMP_PERSON...1 VTIm_IND_NI...2 VTIm_IND_NI_SG_SUFFIX...1 VTIm_IND_NI_PL_SUFFIX...1 VTIm_IND_KI...2 VTIm_IND_KI_SG_SUFFIX...1 VTIm_IND_KI_PL_SUFFIX...3 VTIm_IND_NULL...2 VTIm_IND_NULL_SG_SUFFIX...2 VTIm_IND_NULL_PL_SUFFIX...3 VTIm_CNJ_NULL...2 VTIm_CNJ_NULL_SG_SUFFIX...4 VTIm_CNJ_NULL_PL_SUFFIX...6 VTIm_IMP_NULL...2 VTIm_IMP_SG_SUFFIX...2 VTIm_IMP_NULL_PL_SUFFIX...4 VTIm_FUT_CON_NULL...2 VTIm_FUT_CON_NULL_SG_SUFFIX...4 VTIm_FUT_CON_NULL_PL_SUFFIX...6 VTA...1 VTA_PL...1 VTAt...1 VTAi...1 VTA_WICI...1 VTA_PL_WICI...1 VTAt_WICI...1 VTAi_WICI...1 VTA_ORDER...4 VTA_PL_ORDER...4 VTAi_ORDER...4 VTAt_ORDER...4 VTA_IND_TENSE...4 VTA_CNJ_TENSE...3 VTA_PL_IND_TENSE...4 VTA_PL_CNJ_TENSE...3 VTA_IND_PERSON...3 VTA_CNJ_PERSON...1 VTA_FUT_CON_PERSON...1 VTA_IMP_PERSON...1 VTA_PL_IND_PERSON...3 VTA_PL_CNJ_PERSON...1 VTA_PL_FUT_CON_PERSON...1 VTA_PL_IMP_PERSON...1 VTAt_IMP_PERSON...1 VTAi_IMP_PERSON...1 VTA_IND_NI...2 VTA_IND_NI_SG_SUFFIX...5 VTA_IND_NI_PL_SUFFIX...10 VTA_IND_KI...2 VTA_IND_KI_SG_SUFFIX...7 VTA_IND_KI_PL_SUFFIX...24 VTA_IND_NULL...2 VTA_IND_NULL_SG_SUFFIX...8 VTA_IND_NULL_PL_SUFFIX...5 VTA_CNJ_NULL...2 VTA_CNJ_NULL_SG_SUFFIX...19 VTA_CNJ_NULL_PL_SUFFIX...35 VTA_IMP_NULL...2 VTA_IMP_NULL_SG_SUFFIX...4 VTA_IMP_NULL_PL_SUFFIX...16 VTAt_IMP_NULL...2 VTAt_IMP_NULL_SG_SUFFIX...4 VTAt_IMP_NULL_PL_SUFFIX...16 VTAi_IMP_NULL...2 
VTAi_IMP_NULL_SG_SUFFIX...4 VTAi_IMP_NULL_PL_SUFFIX...16 VTA_FUT_CON_NULL...2 VTA_FUT_CON_NULL_SG_SUFFIX...15 VTA_FUT_CON_NULL_PL_SUFFIX...43 VERB_ENDLEX...4 DERIVATION_NOUN_STEMS...1 DERIVATION_VERB_STEMS...1 DERIVATION_NOUN_INFLECTION_SUFFIXES...19 DERIVATION_VERB_INFLECTION_SUFFIXES...12 NOUN_INDEP_STEMS...5111 NOUN_DEP_KINSHIP_STEMS...276 NOUN_DEP_NONKINSHIP_STEMS...135 pcle...1 fpcle...1 pcle/ns...1 Particles...1287 Pronoun...4 Personal...21 Interrogative...6 Indefinite...2 Demonstrative...18 VERBSTEMS...hfst-invert -i crk-orth.hfst -o - |\
		hfst-fst2fst --foma --use-backend-format -i - -o - |\
		gzip > crk-orth.fomabin
Warning: Sublexicon is mentioned but not defined. (Abbreviation)
Warning: Sublexicon is mentioned but not defined. (Numerals)
Warning: Sublexicon is mentioned but not defined. (ProperNoun-crk)
Warning: Sublexicon is mentioned but not defined. (ProperNoun-eng)
Warning: Sublexicon is mentioned but not defined. (Punctuation)
Warning: Sublexicon is mentioned but not defined. (Symbols)
Warning: Sublexicons defined but not used
DERIVATION_NOUN_STEMS NAD_DIM/ISIS NAD_POSS/IM NAD_POSS/IM_DIM/IS NAD_POSS/IM_DIM/ISIS NA_DIM/ISIS NA_POSS/IM_DIM/ISIS NID_DIM/ISIS NID_POSS/IM NID_POSS/IM_DIM/IS NID_POSS/IM_DIM/ISIS NI_DIM/ISIS NI_POSS/IM_DIM/ISIS
Compiling rules.

Storing rules.
Composing and intersecting LEXC and TWOLC transducers.
hfst-compose-intersect -1 crk-morph.hfst -2 crk-phon.hfst | hfst-minimize - -o crk-normative-generator.hfst
hfst-compose-intersect: warning:
Found output multi-char symbols ("DRV-FST") in
transducer in file crk-morph.hfst which are not found on the
input tapes of transducers in file crk-phon.hfst.
hfst-invert -i crk-normative-generator.hfst -o - |\
		hfst-fst2fst --foma --use-backend-format -i - -o - |\
		gzip > crk-normative-generator.fomabin
foma\
		-e "load crk-normative-generator.fomabin" \
		-e "invert net" \
		-e "define M" \
		-e "load crk-orth.fomabin" \
		-e "define O" \
		-e "regex [ M .o. O ];" \
		-e "save stack crk-descriptive-analyzer.fomabin" \
		-s
1.9 MB. 69465 states, 126889 arcs, Cyclic.
1.9 MB. 69465 states, 126889 arcs, Cyclic.
defined M: 1.9 MB. 69465 states, 126889 arcs, Cyclic.
10.3 kB. 18 states, 584 arcs, Cyclic.
defined O: 10.3 kB. 18 states, 584 arcs, Cyclic.
7.1 MB. 115365 states, 466043 arcs, Cyclic.
Writing to file crk-descriptive-analyzer.fomabin.
nepat	IC+nipâw+V+AI+Cnj+Prs+3Sg

nipayan	pê-ayâw+V+AI+Ind+Prs+1Sg
nipayan	nipâw+V+AI+Cnj+Prs+2Sg
nipayan	nipâw+V+AI+Cnj+Prs+1Sg

meyonipat	IC+PV/miyo+nipâw+V+AI+Cnj+Prs+3Sg
meyonipat	IC+PV/miyo+nipâw+V+AI+Cnj+Prs+3Sg+Err/Orth


aarppe commented on August 15, 2024

I notice that I forgot to invert the generative FST when transforming from HFST to FOMA.

Anyhow, the following code (at the tail end of giella/langs/crk/inc/crk-dict.sh) properly creates a descriptive FOMA analyzer and a normative FOMA generator:

hfst-invert crk-gen-norm-dict.hfst | hfst-fst2fst -b -F -i - -o crk-gen-norm-dict.fomabin

hfst-fst2fst -b -F -i crk-orth.hfst -o crk-orth.fomabin

foma -e"load crk-gen-norm-dict.fomabin" -e"invert net" -e"define Morph" -e"load crk-orth.fomabin" -e"invert net" -e"define Orth" -e"regex [ Morph .o. Orth ];" -e"save stack crk-anl-desc-dict.fomabin" -s

Test cases:

echo meyonipat | flookup -b -q inc/crk-anl-desc-dict.fomabin 
meyonipat	IC+PV/miyo+nipâw+V+AI+Cnj+Prs+3Sg

echo nipayan | flookup -b -q inc/crk-anl-desc-dict.fomabin 
nipayan	pê-ayâw+V+AI+Ind+Prs+1Sg
nipayan	nipâw+V+AI+Cnj+Prs+2Sg
nipayan	nipâw+V+AI+Cnj+Prs+1Sg

echo pê-ayâw+V+AI+Ind+Prs+1Sg | flookup -b -q inc/crk-gen-norm-dict.fomabin 
pê-ayâw+V+AI+Ind+Prs+1Sg	nipê-ayân

echo IC+PV/miyo+nipâw+V+AI+Cnj+Prs+3Sg | flookup -b -q inc/crk-gen-norm-dict.fomabin 
IC+PV/miyo+nipâw+V+AI+Cnj+Prs+3Sg	mêyo-nipât
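The test cases above can be chained into a round trip: analyze a surface form, then feed every analysis back into the generator. This is a sketch only. The helper name `analyses_only` is hypothetical, and it assumes the tab-separated flookup output shown in this thread.

```shell
# Keep only the analysis column from flookup output, skipping blank
# separator lines and unrecognized words (marked "+?").
analyses_only() {
    awk -F '\t' 'NF >= 2 && $2 != "+?" { print $2 }'
}
```

With the files from the test cases, a round trip would look like `echo meyonipat | flookup -b -q inc/crk-anl-desc-dict.fomabin | analyses_only | flookup -b -q inc/crk-gen-norm-dict.fomabin`. Because the generator is normative, the result should be the standard spelling (mêyo-nipât in the test case above) rather than the relaxed input. If either FST is inverted, one of the two lookups returns `+?` instead.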

Note that the current scheme no longer produces +Err/Orth analyses for wrongly hyphenated words, since all +Err/Orth forms are filtered out. At the same time, the latest spellrelax.regex allows hyphens to be inserted anywhere (without emitting the +Err/Orth tag), which is the expected behavior for a descriptive analyzer. The drawback is that filtering out +Err/Orth forms also excludes known misspellings, e.g. mân for mâna, across the board. Nevertheless, the normative generator outputs the forms with the hyphens in all the right places.
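One possible way around this trade-off is to keep +Err/Orth analyses in the FST and filter them after lookup instead. This is not what the scheme described here does (it filters at build time). The sketch below is a swapped-in alternative: it drops +Err/Orth analyses for a word only when that word also has a clean analysis. The function name `prefer_clean_analyses` is hypothetical, and the input is assumed to be tab-separated flookup output.

```shell
# For each word, print its analyses without +Err/Orth if any exist;
# otherwise fall back to the +Err/Orth analyses so that known
# misspellings are still recognized. Blank separator lines are dropped.
prefer_clean_analyses() {
    awk -F '\t' '
        NF < 2 { next }
        {
            # Remember the order in which words first appear.
            if (!($1 in seen)) { order[++n] = $1; seen[$1] = 1 }
            if (index($2, "+Err/Orth") > 0)
                err[$1] = err[$1] $0 "\n"
            else
                clean[$1] = clean[$1] $0 "\n"
        }
        END {
            for (i = 1; i <= n; i++) {
                w = order[i]
                out = (w in clean) ? clean[w] : err[w]
                printf "%s", out
            }
        }
    '
}
```

The filter would sit at the end of a lookup pipeline, i.e. `flookup -b -q <analyzer> | prefer_clean_analyses`, with an analyzer that still emits the +Err/Orth tag.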


aarppe commented on August 15, 2024

Among the various LEXC files selected for the dictionary-only FSTs, I would nevertheless still include numerals.lexc, as that is where this subclass of Indeclining Particles (Ipc) is enumerated, and our dictionary sources do contain number words, e.g. pêyak.


eddieantonio commented on August 15, 2024

> Among the various LEXC files selected for the dictionary-only FSTs, I would nevertheless still include numerals.lexc, as that is where this subclass of Indeclining Particles (Ipc) is enumerated, and our dictionary sources do contain number words, e.g. pêyak.

I have... issues... with pêyak being in the same file as Arabic and Roman numerals.

Can I split the legacy lexica into their own file (something like numerals-other.lexc), include it in the normal FST, and intentionally exclude it from the dict FSTs?
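A rough sketch of what that split could look like on the build side, using one file list per variant. The function name `lexc_sources`, the variant names, and both numerals paths are assumptions for illustration. Only the file names numerals.lexc and numerals-other.lexc come from this thread, and the shared list is abbreviated.

```shell
# List the LEXC sources for one build variant, one path per line.
# "full" is the normal FST; "dict" is the dictionary-only FST.
# The two numerals paths are assumptions, not taken from the repository.
lexc_sources() {
    variant=$1
    # Shared by both variants (abbreviated; the real build concatenates more files).
    printf '%s\n' \
        morphology/root.lexc \
        morphology/stems/particles.lexc \
        morphology/stems/numerals.lexc          # Cree number words such as pêyak
    if [ "$variant" = "full" ]; then
        # Arabic and Roman numerals, kept out of the dictionary FSTs.
        printf '%s\n' morphology/stems/numerals-other.lexc
    fi
}
```

The concatenation step would then become something like `cat $(lexc_sources dict) > crk-dict.lexc` for the dictionary build, where the output name is again only a placeholder.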


eddieantonio commented on August 15, 2024

This is implemented as of 71af97c.

I would still like to split the numerals into Cree numerals and "other" numerals!
