
lower-words and upper-words show only first 100 elements

GoogleCodeExporter commented on July 21, 2024

lower-words and upper-words show only first 100 elements

from foma.

Comments (8)

GoogleCodeExporter commented on July 21, 2024

You need to use flookup for more fine-grained printing. Also, flookup in 
conjunction with a diff tool can be used for debugging: you can store some 
known pairs in a file manually using flookup's output format, e.g.

rege+Posss2+Plur+Gen+Abl  regédekétől
...

and then use this file as a reference file to diff against.  For example the 
UNIX command:

{{{
cat reference.txt | cut -f1 | flookup -i mygrammar.foma | diff -y reference.txt -
}}}

would pass all the words in the left column from reference.txt through flookup, 
and compare the output to those given in the right column of reference.txt, in 
effect giving a listing of all words not generated correctly by the grammar.
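
When the side-by-side diff gets hard to read, the same check can be scripted. 
What follows is only a minimal Python sketch and not part of foma. It assumes 
that flookup prints one tab-separated input/output pair per line, with blank 
lines between inputs and "+?" for an input that gave no result; the names 
parse_pairs and find_mismatches are made up for illustration.

```python
def parse_pairs(text):
    """Parse tab-separated 'analysis<TAB>surface' lines into a dict of sets."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines that separate results
        analysis, _, surface = line.partition("\t")
        pairs.setdefault(analysis, set()).add(surface.strip())
    return pairs


def find_mismatches(reference_text, flookup_text):
    """Return (analysis, expected_forms, produced_forms) for every analysis
    in the reference whose generated forms differ from the expected ones."""
    expected = parse_pairs(reference_text)
    produced = parse_pairs(flookup_text)
    mismatches = []
    for analysis, forms in expected.items():
        got = produced.get(analysis, set())
        if got != forms:
            mismatches.append((analysis, sorted(forms), sorted(got)))
    return mismatches
```

Feeding it the contents of reference.txt and the captured flookup output gives 
a list that is empty when every reference pair is generated as expected.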

Original comment by [email protected] on 2 Jan 2012 at 10:15

GoogleCodeExporter commented on July 21, 2024

Thanks for the idea, I can start testing with this. 

The drawback is that I have to generate lists like 
wd+Gram1+Gram2+Gram3...+Gramn programmatically, because nobody can write out 
the other side (rege, regék, regéim, etc.) by hand for a minimum of 769 cases 
in a realistic time, and it is not a job for a human being anyway. Just try it 
once and you will feel giddy after the first 10 words, whether or not you are 
working in your mother tongue, I can assure you. Relying on another, similar 
tool such as sfst as the generator is not that clever either: the more tools, 
the more possibilities for errors. Foma knows everything, so why does it not 
tell us what it knows?

I cannot see any good reason to limit the word list output. sfst, for example, 
lists everything, endlessly if the list is endless, and that helps quite a bit 
in diagnosis.

The present arbitrary limit is not very nice; I had to search around for a long 
time to understand what was happening here.

At the very least, a count argument could be added as a limit, for example:
lower-words 100000, which should list 100,000 words, or all available words if 
there are fewer than 100,000.

If you make a new version, you could consider this. 

Also, the limit and the command behavior should be documented.

Anyway, thanks for your help so far.

Original comment by [email protected] on 2 Jan 2012 at 7:10

GoogleCodeExporter commented on July 21, 2024

[deleted comment]

GoogleCodeExporter commented on July 21, 2024

I'd like to add one more wish to my wish list. Since it is not easy to match 
word forms to grammatical forms, I always use lists like the following for 
diagnostics and corrections:
...
rege+Possp3+Genpl+Sup   regéjükéin
rege+Possp3+Genpl+Ter   regéjükéiig
rege+Possp3+Genpl+Nom   regéjükéi
rege+Posss1p+Gen+Abl    regéimétől
rege+Posss1p+Gen+Acc    regéimét
rege+Posss1p+Gen+Ade    regéiménél
rege+Posss1p+Gen+All    regéiméhez
...

It would therefore be very good if foma had a third command besides 
lower-words and upper-words: both-words. It would list both words (upper and 
lower) side by side in one list. That would eliminate the need to use any 
external tool when setting up lexc/foma tools for new languages or new word 
classes in an existing language.

Thank you in advance for considering this in a new version.

Original comment by [email protected] on 3 Jan 2012 at 9:06

GoogleCodeExporter commented on July 21, 2024

This shortcoming is especially annoying because, if I use flookup for 
checking, I cannot see whether undesirable word forms are still there.
Hungarian nouns have at least 769 word forms, verbs 450, and adjectives over 
1200.
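
Given a complete list of generated forms, spotting unwanted ones comes down to 
a set comparison against the forms that are expected. The following is only a 
minimal Python sketch under the assumption that both lists hold one word form 
per line; compare_word_lists is a hypothetical helper, not a foma feature.

```python
def compare_word_lists(generated_text, expected_text):
    """Compare two one-word-per-line lists.

    Returns a dict with the forms that were generated but not expected
    ("unexpected") and the forms that were expected but not generated
    ("missing"), each as a sorted list.
    """
    generated = {w.strip() for w in generated_text.splitlines() if w.strip()}
    expected = {w.strip() for w in expected_text.splitlines() if w.strip()}
    return {
        "unexpected": sorted(generated - expected),
        "missing": sorted(expected - generated),
    }
```

The "unexpected" list is the part that a lookup of known-good forms cannot 
show, since such a lookup only ever asks about forms that are wanted.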

Original comment by [email protected] on 27 Mar 2012 at 8:54

GoogleCodeExporter commented on July 21, 2024

We are working on a project to create spell checkers for Quechua, Aymara and 
Guaraní, which are indigenous languages in Bolivia. We would greatly 
appreciate it if an option were added to view all possible combinations with 
the "print upper" and "print lower" commands. In Quechua and Aymara, root words 
can have up to 14 suffixes and the number of possible combinations of suffixes 
is probably more than a thousand. We need to see all the combinations to 
eliminate any errors.

Best regards and thanks for all the work on Foma,
Amos Batto

Original comment by [email protected] on 19 Sep 2012 at 11:25

GoogleCodeExporter commented on July 21, 2024

I decided to change the source code so that "print upper-words" and "print 
lower-words" print an unlimited number of words. 

I changed lines 663 and 979 of iface.c from:
  for (i = limit; i > 0; i--) { 

To:
  while (1) {

After a recompile, Foma printed an unlimited number of the upper and lower 
words. 

However, I discovered by reading the source code in the file interface.l that 
it isn't necessary to change the source code because Foma already has an 
undocumented option to specify a different limit for the "print upper-words" 
and "print lower-words" commands. 

For example, to print up to a thousand upper words, use the command:
foma[1]: print upper-words 1000
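
A limit is still useful, presumably because a network with cycles has 
infinitely many words, so a loop without a bound would never finish on it. The 
Python sketch below only illustrates that idea with a toy generator; it is not 
foma's code, and toy_words and take_words are hypothetical names.

```python
from itertools import islice


def toy_words():
    """Toy stand-in for a cyclic network: yields a, aa, aaa, ... forever."""
    n = 1
    while True:
        yield "a" * n
        n += 1


def take_words(source, limit=100):
    """Collect at most `limit` words, stopping early if the source runs out."""
    return list(islice(source, limit))
```

With a finite source, take_words returns everything available when there are 
fewer words than the limit, which is the behaviour asked for earlier in this 
thread.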

The documentation for Foma needs to be changed to inform the user about this 
option. To do this, change line 138 in iface.c from:
    {"print lower-words","prints words on the lower-side of top FSM",""},
to:
    {"print lower-words <limit>","prints words on the lower-side of top FSM","By default the limit is 100"},

There is currently no documentation for the "print upper-words" command, so 
also add this line to iface.c in the same array:
    {"print upper-words <limit>","prints words on the upper-side of top FSM","By default the limit is 100"},


By the way, Foma also needs documentation for its comment syntax, so also add 
a line like this:
    {"#...","comment","All text following # will be ignored"},

Original comment by [email protected] on 19 Sep 2012 at 4:35

GoogleCodeExporter commented on July 21, 2024

Thanks a lot for your valuable input. print upper-words 10000 works fine for 
me and solved the problem of too few output lines.

Original comment by [email protected] on 28 Sep 2012 at 2:04
