mk270 / whitakers-words

William Whitaker's WORDS, a Latin dictionary

Home Page: http://mk270.github.io/whitakers-words/

License: Other

Makefile 0.32% Ada 99.46% Shell 0.19% sed 0.02%

whitakers-words's Introduction

Project Website

WORDS

This is a cleaned-up version of the port of William Whitaker's WORDS programme, a Latin-English dictionary with inflectional morphology support; the original author passed away in 2010, so any and all help maintaining the software as development and execution environments evolve would be greatly appreciated.

Effectively, this is an exercise in digital preservation.

Contributing

Help is needed maintaining the code for future users; in particular, it does not currently support vowel length, so it may be necessary to gather a group of Latin experts to adjust its lexicon of several thousand words.

If you contribute, please be sure to indicate your assent to redistributing your contributions under the same terms as the existing software; this will minimise copyright hassles in the future.

Usage

$ make
$ bin/words

Documentation

See the included HOWTO.txt file and the documentation on the Project Website.

Build-time Dependencies

GPRBuild
gnat

On a Debian-like system, you can install these roughly as follows:

$ apt-get install gprbuild gnat

GNAT versions before 4.9 are believed to link against a buggy runtime on 64-bit platforms, so should be avoided.

Licensing

WORDS, a Latin dictionary, by Colonel William Whitaker (USAF, Retired)

This is a free program, which means it is proper to copy it and pass it on to your friends. Consider it a developmental item for which there is no charge. However, just for form, it is Copyrighted (c). Permission is hereby freely given for any and all use of program and data. You can sell it as your own, but at least tell me.

This version is distributed without obligation, but the developer would appreciate comments and suggestions.

All parts of the WORDS system, source code and data files, are made freely available to anyone who wishes to use them, for whatever purpose.

whitakers-words's Issues

libraryised WORDS failing on Travis

I am getting a bug from the Ada or C runtime when running on Travis: https://travis-ci.org/mk270/whitakers-words/builds/115703196

*** glibc detected *** bin/words: free(): invalid pointer: 0x00002b81d113dfc0 ***

Ok, I have a better repro now, at https://travis-ci.org/mk270/whitakers-words/builds/117128661 :

+diff -q -- - test/expected.txt
Appending Word: [rem]
Appended Word: [rem]
Appending Word: [acu]
*** glibc detected *** bin/words: free(): invalid pointer: 0x00002b641e9f9fc0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7db26)[0x2b641ec9db26]
/usr/lib/x86_64-linux-gnu/libgnat-4.6.so.1(__gnat_free+0x15)[0x2b641e72d5b5]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__word_container__elements_arraySAXn+0x1c3)[0x2b641dd80405]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__word_container__insert__4Xn+0xefd)[0x2b641dd7a186]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__word_container__append__2Xn+0x88)[0x2b641dd785ed]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(+0x4471f)[0x2b641dd8d71f]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__analyse_line+0x523)[0x2b641dd8e0c2]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__parse_line+0x324)[0x2b641dd8f9c1]
bin/words[0x404bd3]
bin/words[0x40825b]
bin/words[0x404087]
bin/words[0x404646]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x2b641ec4176d]
bin/words[0x403fa9]

The Make_Words() function is being called on the string "rem acu tetigisti", suggesting it is failing at

whitakers-words/src/words_engine/words_engine-parse.adb, line 1052 in 2d3d683:

           Put_Line (Standard_Error, "Appending Word: [" & S & "]");
           Word_Container.Append (Container => Words, New_Item => US);
           Put_Line (Standard_Error, "Appended Word: [" & S & "]");

This stuff started to break after some code was put into a library, and then its initialisation was also moved to that library; it works fine everywhere it has been tested other than the Travis automated testing system.

infinite loop translating bizarre compound

The word "bestiasviginti" gets WORDS into an infinite loop, apparently.

Document what each of the source files does, and how they fit together

Get (in String; out Item; out Last) in many IO packages for null records is incorrect.

Last is always set to a negative value, which can lead to incorrect results in some circumstances.

-- another, less obscure, example resulting in Constraint_Error
-- could be written up quickly
declare
   High : Integer := 0; -- should be called Low, but whatever
   Last : Integer := 0;
   Var  : String (1 .. 100);
   Prep : Preposition_Record;
   Conj : Interjection_Record;
   Noun : Noun_Record;
begin
   Preposition_Record_IO.Get (Var, Prep, High);
   Interjection_Record_IO.Get (Var (High .. Var'Last), Conj, Last);
   High := High + Last; -- Last is -1, so High is now one less than before; thus
   -- Noun_Record is 'Get' from the last char of Preposition_Record in Var!
   Noun_Record_IO.Get (Var (High .. Var'Last), Noun, Last);
end;

Last should be set to the position of the last character in the String that was processed inside Get.
In the case of Inflections_Package.Interjection_Record_IO (among others), it is set to -1 in all cases (it should be 0).
Unfortunately, Interjection_Record_IO.Get can't be fixed at the moment because of a different bug somewhere between makeinfl.adb and Inflections_Package.Ending_Record_IO.

get makefile to respect dependencies

remove global state from parse/print routines

placeholder - this will be filled in with proper details

Basically, I think this means making Xxx_Meaning, Yyy_Meaning, Rrr_Meaning and friends all be part of a record that is passed down the call stack, and not much more.

Feature: add some sequence to print entire conjugation/declension table

It is a shame this feature wasn't built in from the start, since it seems to be rather simple to do. I will work on it (you can assign the task to me); the only open question is how to trigger this mode.

One possibility which I quite like is to prefix the word with a dot (.) to parse it and print the entire table of all forms possible for that particular word. So .militis would print the usual analysis + the entire table for the word miles (singular/plural, all cases).

Since we have all the information in the inflection files, we can just cut&paste various things together and print. Basically, for each POS there will be a form printer which returns a string with the table, or prints it directly (working with strings is rather weird in Ada).
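The dot-prefix idea can be sketched as a small input pre-processing step. This is only an illustration in Python, not code from WORDS (which is written in Ada); the function name `parse_query` is hypothetical, and treating a lone "." as an ordinary token is an assumption.

```python
def parse_query(token: str) -> tuple[str, bool]:
    """Split a raw input token into (word, want_full_table).

    A leading dot requests the full inflection table in addition to
    the usual analysis; a lone "." is passed through unchanged
    (an assumption made for this sketch).
    """
    if len(token) > 1 and token.startswith("."):
        return token[1:], True
    return token, False


for raw in [".militis", "militis"]:
    word, want_table = parse_query(raw)
    print(f"{raw!r}: analyse {word!r}, full table: {want_table}")
```

Keeping the flag separate from the word means the existing analysis path would see the same input as before, and only the output stage needs to know about the table mode.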

Make fails on Cygwin

...
gprbuild -j4 -Pwords meanings
Compile
[Ada] meanings.adb
Bind
[gprbind] meanings.bexch
[Ada] meanings.ali
Link
[link] meanings.adb
echo g | bin/makedict DICTLINE.GEN > /dev/null
C:/cygwin64/home/byron/dev/whitakers-words/bin/makedict.exe: error while loading shared libraries: ?: cannot open shared object file: No such file or directory
make: *** [Makefile:9: DICTFILE.GEN] Error 127

Unexpected exception in CYCLE_OVER_PA processing words with the iv SUFFIX

I ran across over 100 words (with a total of over 1000 occurrences) that give me the message Unexpected exception in CYCLE_OVER_PA processing ....
A quick look on the list of words that emit this warning suggests that this is a group of similarly derived adjectives, and that the problem only occurs with feminine/neuter forms ending in -a:

miraculosa, opprobriosa, poenosa, saporosa, taediosa, venenosa
prophetica

Most of them are formed by the -iv SUFFIX, with 85 of them ending in -tiva:
ablutiva, acquisitiva, adinventiva, aedificativa, aestimativa, afflictiva, appetitiva, assimilativa, augmentativa, benefactiva, calefactiva, cogitativa, cognitiva, cohibitiva, communicativa, commutativa, concretiva, confortativa, consecrativa, consiliativa, consummativa, contumeliativa, deiectiva, designativa, determinativa, distinctiva, divinativa, excitativa, executiva, factiva, figurativa, formativa, germinativa, gubernativa, impeditiva, imperativa, impetrativa, impletiva, inflativa, informativa, inquisitiva, intellectiva, interpretativa, intimativa, iudicativa, iustificativa, liquefactiva, medicativa, memorativa, mitigativa, moderativa, modificativa, negotiativa, nutritiva, operativa, opinativa, ordinativa, participativa, perceptiva, perfectiva, praeceptiva, praefigurativa, praeparativa, privativa, productiva, prohibitiva, provocativa, rarefactiva, receptiva, recordativa, reformativa, regnativa, resolutiva, respectiva, sanativa, sanctificativa, significativa, speculativa, spirativa, subiectiva, susceptiva, transitiva, unitiva, vindicativa, vivificativa
apprehensiva, conversiva, dimensiva, discursiva, discussiva, divisiva, laesiva, ostensiva, remissiva, successiva, visiva

(If this helps: I am running words on a wordlist extracted from Summa Theologiae by Thomas Aquinas.)

create test suite

initially just a bunch of Latin phrases that exercise the various code paths (e.g., that future passive supine craziness)

"meaning" comparison always fails

In word_package.adb:1422, there's a test:

-- there is no way this condition can be true;
-- packon_length - 1 /= packon_length
if (trim(mean)(1..4) = "(w/-" and then  --  Does attached PACKON agree
    trim(mean)(5..4+packon_length) = trim(packons(k).tack))   then

The comparands in the second arm of the test are guaranteed to be of different lengths, so the condition presumably always fails.

Build failing with recent gcc due to -gnatwe

The build fails with various warnings of the form 'use clause for package "<some package>" has no effect', which -gnatwe turns into errors.

Removing -gnatwe from the .gpr files allows it to compile, though obviously it still prints those warnings.

Wrong Pearse code with suffix -e

bin/words sancte perfide improbe

The Pearse code on the "e SUFFIX" line is for some reason 01, not 05 as expected.

I assume one has to look around src/words_engine/words_engine-list_package.adb lines 479--485 and 725--750, but I was not able to solve the issue.
(It's also not clear to me why this is not handled through ADDONS.LAT.)

Make fails

I downloaded the files, followed the make instructions, and got the following error:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : binde.adb:1005
gprlib: invocation of /usr/bin/gnatbind failed
gprbuild: could not build library for project words_engine
Makefile:6: recipe for target 'bin/words' failed
make: *** [bin/words] Error 4

This is on Ubuntu 17.10 (beta); on the previous version of Ubuntu the build succeeded.

unknowns handling disabled

... had to be disabled to get recent commit to typecheck

Factor out the repetitive elements of the code, resulting from cut-n-paste

e.g., in parse.adb

Discussion: let's use milestones to group features for release

Since we are going beyond the "base" WW, we will need to do releases every once in a while.

I suggest we put feature and bug tasks under a milestone so we can easily review what is done and what isn't to produce reasonable changelogs.

Alternatively, tags & search works too, but I find milestones to be a better interface.

API - Call from bash script

Is there a simple way to call whitakers-words from a bash script, with parameters passed, and have only the output returned, without the welcome menu etc.?
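One possible approach, sketched here in Python rather than bash, is to pass the words as command-line arguments, in the form `bin/words sancte perfide improbe` that appears in another report in this list. The helper names `build_command` and `lookup` are hypothetical, and whether the banner is suppressed in this mode has not been verified here.

```python
import subprocess


def build_command(words, binary="bin/words"):
    """Build the argument list for a non-interactive WORDS call."""
    return [binary, *words]


def lookup(words, binary="bin/words"):
    """Run WORDS on the given words and return whatever it prints.

    Assumes the binary accepts words as command-line arguments.
    """
    result = subprocess.run(
        build_command(words, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        print(lookup(["rem", "acu", "tetigisti"]))
    except (OSError, subprocess.CalledProcessError) as exc:
        # Binary missing or exited with an error.
        print(f"could not run WORDS: {exc}")
```

WORDS may need to be started from the directory that holds its data files, so a wrapper script would probably `cd` there first.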

stem results duplicated

For the words that trigger #76, the output is going to be longish anyway, because miraculosa could be feminine {nom,voc,abl} singular or neuter {nom,voc,acc} plural. However, each possibility appears twice in the output.

tackons may not be working

"videsne" is decomposed correctly by the copy of WORDS at Notre Dame, but not by our version

The token "TACKON" should appear in the gloss, but it doesn't seem to.

Expected: output analogous to http://www.archives.nd.edu/cgi-bin/wordz.pl?keyword=videsne

ne                   TACKON                             
-ne = is it not that (enclitic); or ...(introduces a question or alternative);
vid.es               V      2 1 PRES ACTIVE  IND 2 S    
video, videre, vidi, visus  V   [XXXAX]  
see, look at; consider; (PASS) seem, seem good, appear, be seen;

Observed, e.g. via http://latin.ucant.org/cgi-bin/translate.cgi?query=videsne :

videsne                          ========   UNKNOWN

This may be a config error, an error we've introduced, or a bug introduced by Whitaker after Notre Dame took their copy (or even worse, a bug silently fixed by ND)

remove state from List_Package.List_Stems()

this function should not have "in out" parameters; fixing this is at the core of making WORDS more flexible

provide binaries for Windows, MacOS, Android and Linux

Any reason for using Ada 2005 rather than 2012?

The .gpr files specify Ada 2005 rather than 2012, which is the latest version. Is there any reason for this? I am new to Ada so there may be (compiler support, etc.).

re-combine source files that are cut-and-paste duplicates differing by a few lines

The substantive difference between meanings.adb and words.adb is:

-   configuration := only_meanings;
+   configuration := developer_version;

similarly, makedict.adb and wakedict.adb are near-identical.

deprecate build-time dependency on gprbuild

see https://gcc.gnu.org/onlinedocs/gnat_ugn/Automatically-Creating-a-List-of-Directories.html#141
https://gcc.gnu.org/onlinedocs/gnat_ugn/Using-gnatmake-in-a-Makefile.html

alternatively, establish that tdm-gcc can provide gprbuild appropriately

Option to hotlink to Perseus or other defn. source

For graphical builds of Whitaker's words, the root definition would be improved if it could hotlink to Perseus or another more extensive Latin dictionary so that usages in context could be seen. Thanks!

Priority: low

Add support for macrons (vowel length-marks)

This would probably need a lot of work over the dictionary, but if we make macrons/lengths supported in the code, the database/dictionary can simply be slowly updated "on the fly".

sraa buffer can be overrun

Triggered by the words in #76

Establish src/tools as an integrated part of the build process; adjust composition of Latin_Util library

I have created a branch ( https://github.com/darkestkhan/whitakers-words/tree/tools ) containing a gpr file for building the tools in src/tools. The catch is that many warnings were triggered when I tried to compile them all, and chances are high that some additional packages from src/commands need to be moved under the Latin_Util library. (This is relatively simple and good.)

"<=" for parse_records

While fixing indentation I found declarations of a "<=" function for the parse_record type in at least 2 files.

hisco word entry

The infinitive should be hiscere, not hiscare (cf. Juvenal 5.127 and the Lewis and Short entry:
http://www.perseus.tufts.edu/hopper/morph?l=hisco&la=la#lexicon).

data files are CRLF sensitive

[database] Combine multiple lines for one word into a single entry

Right now, the format is rather redundant, with data listed multiple times only for presentation purposes (the | at the beginning of translation tells it to append that to the output). I fail to see why it couldn't be one line split at | character.

Example (notice how all the information sans translation is duplicated):

  20601 fi                 f                  zzz                fact               V      3 3 SEMIDEP      X X X A O happen, come about; result (from) ; take place, be held, occur, arise (event);
  20602 fi                 f                  zzz                fact               V      3 3 SEMIDEP      X X X A O |be made/created/instituted/elected/appointed/given; be prepared/done; develop;
  20603 fi                 f                  zzz                fact               V      3 3 SEMIDEP      X X X A O ||be made/become; (facio PASS); [  20603 fiat => so be it, very well; it is being done];
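The proposed merge can be sketched as follows. This is a Python illustration, not part of the WORDS build. It assumes that every entry carries five single-letter flags (such as `X X X A O`) just before the meaning, that any leading number is only a line counter, and that continuation lines repeat the preceding header and start their meaning with `|`. The name `merge_entries` is hypothetical.

```python
import re

# Header = everything up to and including the five single-letter flags;
# an optional leading line number is skipped (assumed listing artefact).
ENTRY = re.compile(
    r"^\s*(?:\d+\s+)?(?P<head>.*?(?:\s[A-Z]){5})\s(?P<meaning>.*)$"
)


def merge_entries(lines):
    """Fold '|' continuation lines into the entry they continue."""
    merged = []  # list of (header, [meaning parts])
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        if match is None:
            continue  # not a dictionary line; ignored in this sketch
        head = " ".join(match["head"].split())  # collapse column padding
        meaning = match["meaning"].strip()
        if meaning.startswith("|") and merged and merged[-1][0] == head:
            merged[-1][1].append(meaning.lstrip("|"))
        else:
            merged.append((head, [meaning.lstrip("|")]))
    return [(head, "|".join(parts)) for head, parts in merged]


sample = [
    "fi f zzz fact V 3 3 SEMIDEP X X X A O happen, come about;",
    "fi f zzz fact V 3 3 SEMIDEP X X X A O |be made/created;",
    "fi f zzz fact V 3 3 SEMIDEP X X X A O ||be made/become;",
]
for head, meaning in merge_entries(sample):
    print(head, "::", meaning)
```

The merged meaning keeps `|` as the separator, so a printer could still split it back into separate output lines.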

audit all the elaboration code and pragmas

add makefile target for redistributable binaries, with trap for GNAT GPL

Rationale: AdaCore's GNAT GPL is not a permissible tool for making redistributable binaries of Whitaker's Words, due to copyright licensing conflicts between Words and the runtime libraries linked by that compiler; FSF GNAT and AdaCore's GNAT Pro are OK.

Note that FSF GNAT does not currently have a good gprbuild, and that GNAT Pro is OK for building.

The thing needs to check that the version string from gnatmake --version looks like:

GNATMAKE 4.6

and not at all like

GNATMAKE GPL 2015 (20150428-49)
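A minimal sketch of such a check, written in Python for illustration (the real trap would more likely live in the Makefile). The function name `looks_like_fsf_gnat` is hypothetical, and the two version strings above are the only formats assumed; anything that does not match the plain `GNATMAKE <number>` shape is rejected conservatively.

```python
import re

# Accept only the plain "GNATMAKE <dotted number>" shape shown above.
FSF_VERSION = re.compile(r"^GNATMAKE \d+(?:\.\d+)*\b")


def looks_like_fsf_gnat(version_line: str) -> bool:
    """Return True if the first line of `gnatmake --version` looks like
    an FSF build, and False for GNAT GPL or anything unrecognised."""
    line = version_line.strip()
    if "GPL" in line.split():
        return False
    return FSF_VERSION.match(line) is not None


for candidate in ["GNATMAKE 4.6", "GNATMAKE GPL 2015 (20150428-49)"]:
    verdict = "ok" if looks_like_fsf_gnat(candidate) else "refuse"
    print(f"{candidate!r}: {verdict}")
```

A release target could run `gnatmake --version`, pass the first output line to this check, and abort when it returns False.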

English to Latin not working

As far as I can tell, English to Latin mode is not currently working.

Trailing space results in unhandled error.

When the input has a trailing space, e.g. "viginti " or "viginti provincias ", the webpage shows an alert reading "Error: An unhandled error occurred".

Happens in Chrome 62.0.3202.94, Firefox 39.0, and Safari 10.1.2

coalesce gpr files / Makefile into single build system

In particular, modifying the *.ad[sb] files is not triggering rebuild from make

makeinfl is raising Ada.IO_Exceptions.Data_Error a LOT.

This is visible when running makeinfl: all the **** lines are Put by the exception handler.
(If you replace it with:

            exception
               when E : Constraint_Error | IO_Exceptions.Data_Error  =>
                  Put_Line (Ada.Exceptions.Exception_Name (E) & " " &
                            line (1 .. last));

then the exception name is printed instead of the useless ****; you need to with Ada.Exceptions for that.)

Quite unfortunately, the subprogram responsible for that, file_inflection_sections, is a heavy user of state, which makes understanding the flow of data quite hard (not to mention the unsurprising issue of cryptic identifiers).

Drawing out record composition

While working on dictionary_package I noticed that records are deeply nested, meaning that you get records.with_record_fields.with_even_more_record_fields.[...]

What would be the preferred format or way of drawing or writing this composition diagram?

I'm raising this as an issue because it seems highly probable that we could use inheritance polymorphism to greatly cut down the complexity and size of the code (but this would demand a total refactor of the records).

API: disentangle parse results from output generation

The context here is src/words_engine/words_engine-list_package.adb

Style and quality changes that need to be done

I will list all style changes that need to be done for style changes to be completed:

~~all names (especially public ones) shall¹ use Ada naming convention [easy but tedious]~~ ²
~~there shall be spaces between binary operators and on left side of left parenthesis (unless it is preceded by apostrophe) [easy but tedious] ³~~
~~there shall be no lines longer than 80 columns~~
all identifiers shall be as self-documenting as reasonably possible (s, ss, ssa? wtf does this even mean?) [one caveat - beside capitalization - don't change values in enumerations]

Also try to use FIXME, TODO and NOTE for (respectively) comments of bugs/problems, things that have to be done, description of non-obvious behavior, as they are easy to search for (with grep for example)

¹ used shall as 'should' is too permissive
² for those interested: https://en.wikibooks.org/wiki/Ada_Style_Guide is a commonly used quality and style guide for Ada, together with some examples and rationales behind the style rules

Determine if binary data files are still necessary

Words uses binary data files generated from plain text input for performance. This adds complexity that may be unnecessary on modern hardware.

Attempting translation returns invalid pointer error

Running: bin/words returns:

INFLECTION_ARRAY being loaded   --    1785 entries    --  Loaded correctly
GENERAL Dictionary loading      --   62085 stems      --  Loaded correctly
UNIQUES file loading            --      74 entries    --  Loaded correctly
ADDONS loading 18+11 TACKONS 6+129 PREFIXES 179 SUFFIXES   --  Loaded correctly
Copyright (c) 1993-2006 - Free for any use - Version 1.97FC
For updates and latest version check http://www.erols.com/whitaker/words.htm
Comments? William Whitaker, Box 51225  Midland  TX  79710  USA - [email protected]

Input a word or line of Latin and ENTER to get the forms and meanings
    Or Input @ and the name of a file containing words or lines
    Or Input # to change parameters and mode of the program
    Or Input ? to get help wherever available on individual parameters
Two empty lines (just a RETURN/ENTER) from the keyboard exits the program
English-to-Latin available
~E changes to English-to-Latin, ~L changes back     [tilde E]

=>Test
*** Error in `bin/words': free(): invalid pointer: 0x00007ff538ed76a0 ***
Exception in PARSE_LINE processing Test

=>

Break up the very deep nesting of some of the code blocks

e.g., in parse.adb

comma in the input in Words Online causes an error

go to http://latin.ucant.org/
enter "Ego sum pauper, nihil habeo." or any other string containing a comma
there is an error pop-up saying "Error: Non-Latin character in word"

The same input is handled correctly in the offline version (as compiled from current master).
The full stop itself does not cause any problems; the problem is the comma.
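One way a web front end could avoid this is to reduce the input to bare words before handing it to WORDS. The sketch below is an assumption about a possible fix, not a description of how the site or the offline program actually tokenises input; `latin_words` is a hypothetical helper, and it deliberately keeps only unaccented ASCII letters.

```python
import re


def latin_words(text: str) -> list[str]:
    """Return the runs of ASCII letters in the input, dropping
    punctuation and any surrounding whitespace."""
    return re.findall(r"[A-Za-z]+", text)


print(latin_words("Ego sum pauper, nihil habeo."))
```

Because whitespace is discarded as well, leading and trailing spaces in the query cannot reach the lookup.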

web version hangs on lengthy entries

This is a bug with the linked project website; I'm not sure if there's a better place for this.

I've noticed that the online version seems to hang when I try to translate canis. After building Words from the repository and running bin/words, I found that canis, by virtue of having lots of lengthy entries, causes the interactive environment to prompt the user to press ENTER for more. It seems like the online version is doing something similar, causing it to hang. I haven't been able to identify any other Latin words that cause this problem.

For comparison, the CGI version hosted by Notre Dame seems to translate canis just fine.

Add License...

Whitaker's original license said "All parts of the WORDS system, source code and data files, are made freely available to anyone who wishes to use them, for whatever purpose." -- should add this to the README.md and perhaps choose a compatible modern license to go with this (CC0?)

Mostly this issue is just to say how excited I am that you're making this archival version, and that perhaps development can continue! Thanks! Feel free to close immediately.

Tooltips for definitions of terms

A line such as

 i.am                 V      6 1 FUT  ACTIVE  IND 1 S      Late      sometime

should, in graphical environments, offer the option of mousing over "V" and seeing "Verb", over "S" and seeing "singular", over "1" and seeing "1st person", etc., to make usage easier for new users. Thanks!
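A front end could derive such tooltips by position, since the same token (for example "1") means different things in different columns. The sketch below is in Python and is only illustrative. The field order is inferred from the single verb line above, and only the expansions for "V", "S" and "1" come from this request; the others are my own reading of the abbreviations. `annotate_verb_line` is a hypothetical name.

```python
# Column meanings for a verb line, inferred from the sample above.
VERB_FIELDS = [
    "form", "part of speech", "conjugation", "variant",
    "tense", "voice", "mood", "person", "number",
]

# Expansions per column; entries beyond V, S and 1 are assumptions.
EXPANSIONS = {
    "part of speech": {"V": "Verb"},
    "tense": {"FUT": "future", "PRES": "present"},
    "voice": {"ACTIVE": "active"},
    "mood": {"IND": "indicative"},
    "person": {"1": "1st person", "2": "2nd person", "3": "3rd person"},
    "number": {"S": "singular", "P": "plural"},
}


def annotate_verb_line(line):
    """Pair each leading token of a verb line with tooltip text.

    Tokens without a known expansion get the column name instead.
    """
    tokens = line.split()[:len(VERB_FIELDS)]
    annotated = []
    for field, token in zip(VERB_FIELDS, tokens):
        tooltip = EXPANSIONS.get(field, {}).get(token, field)
        annotated.append((token, tooltip))
    return annotated


sample = " i.am                 V      6 1 FUT  ACTIVE  IND 1 S      Late      sometime"
for token, tooltip in annotate_verb_line(sample):
    print(f"{token:8} {tooltip}")
```

Other parts of speech would need their own column lists, which this sketch does not attempt.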