whitakers-words's Introduction

Project Website

WORDS

This is a cleaned-up version of the port of William Whitaker's WORDS programme, a Latin-English dictionary with inflectional morphology support. The original author passed away in 2010, so any and all help maintaining the software as development and execution environments evolve would be greatly appreciated.

Effectively, this is an exercise in digital preservation.

Contributing

Help is needed maintaining the code for future users; in particular, it does not currently support vowel length, so it may be necessary to gather a group of Latin experts to adjust its lexicon of several thousand words.

If you contribute, please be sure to indicate your assent to redistributing your contributions under the same terms as the existing software; this will minimise copyright hassles in the future.

Usage

$ make
$ bin/words

Documentation

See the included HOWTO.txt file, and the documentation on the Project Website.

Build-time Dependencies

  • GPRBuild
  • gnat

On a Debian-like system, you can install these roughly as follows:

$ apt-get install gprbuild gnat

GNAT versions before 4.9 are believed to link against a buggy runtime on 64-bit platforms, so they should be avoided.

Licensing

WORDS, a Latin dictionary, by Colonel William Whitaker (USAF, Retired)

Copyright William A. Whitaker (1936-2010)

This is a free program, which means it is proper to copy it and pass it on to your friends. Consider it a developmental item for which there is no charge. However, just for form, it is Copyrighted (c). Permission is hereby freely given for any and all use of program and data. You can sell it as your own, but at least tell me.

This version is distributed without obligation, but the developer would appreciate comments and suggestions.

All parts of the WORDS system, source code and data files, are made freely available to anyone who wishes to use them, for whatever purpose.

whitakers-words's People

Contributors

ansa211, apt1002, asarhaddon, calumapplepie, darkestkhan, elhaem, fuco1, hugovk, ids1024, jap2-0, mbottini, mk270, spr93


whitakers-words's Issues

libraryised WORDS failing on Travis

I am getting a bug from the Ada or C runtime when running on Travis: https://travis-ci.org/mk270/whitakers-words/builds/115703196

*** glibc detected *** bin/words: free(): invalid pointer: 0x00002b81d113dfc0 ***

Ok, I have a better repro now, at https://travis-ci.org/mk270/whitakers-words/builds/117128661 :

+diff -q -- - test/expected.txt
Appending Word: [rem]
Appended Word: [rem]
Appending Word: [acu]
*** glibc detected *** bin/words: free(): invalid pointer: 0x00002b641e9f9fc0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7db26)[0x2b641ec9db26]
/usr/lib/x86_64-linux-gnu/libgnat-4.6.so.1(__gnat_free+0x15)[0x2b641e72d5b5]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__word_container__elements_arraySAXn+0x1c3)[0x2b641dd80405]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__word_container__insert__4Xn+0xefd)[0x2b641dd7a186]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__word_container__append__2Xn+0x88)[0x2b641dd785ed]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(+0x4471f)[0x2b641dd8d71f]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__analyse_line+0x523)[0x2b641dd8e0c2]
/home/travis/build/mk270/whitakers-words/bin/../lib/libwords_engine.so.0(words_engine__parse__parse_line+0x324)[0x2b641dd8f9c1]
bin/words[0x404bd3]
bin/words[0x40825b]
bin/words[0x404087]
bin/words[0x404646]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x2b641ec4176d]
bin/words[0x403fa9]

The Make_Words() function is being called on the string "rem acu tetigisti", and the trace stops after "Appending Word: [acu]", which suggests that it is failing at

Word_Container.Append (Container => Words, New_Item => US);

in the following instrumented code:

           Put_Line (Standard_Error, "Appending Word: [" & S & "]");
           Word_Container.Append (Container => Words, New_Item => US);
           Put_Line (Standard_Error, "Appended Word: [" & S & "]");

This stuff started to break after some code was put into a library, and then its initialisation was also moved to that library; it works fine everywhere it has been tested other than the Travis automated testing system.

Get (in String; out Item; out Last) in many IO packages for null records is incorrect.

Last is always set to a negative value, which under certain circumstances may lead to incorrect results.

-- One could quickly write up another, less obscure, example resulting in Constraint_Error
declare
   High : Integer := 0; -- should be called Low, but whatever
   Last : Integer := 0;
   Var  : String (1 .. 100);
   Prep : Preposition_Record;
   Conj : Interjection_Record;
   Noun : Noun_Record;
begin
   Preposition_Record_IO.Get (Var, Prep, High);
   Interjection_Record_IO.Get (Var (High .. Var'Last), Conj, Last);
   High := High + Last; -- Last is -1, so High is now one less than before; thus
   -- Noun_Record is 'Get' from the last char of Preposition_Record in Var!
   Noun_Record_IO.Get (Var (High .. Var'Last), Noun, Last);
end;

Last should be set to the position of the last character in the String that was processed inside Get.
In the case of Inflections_Package.Interjection_Record_IO (among others) it is set to -1 in all cases (it should be 0).
Unfortunately, Interjection_Record_IO.Get can't be fixed at the moment because of a different bug somewhere between makeinfl.adb and Inflections_Package.Ending_Record_IO.

remove global state from parse/print routines

placeholder - this will be filled in with proper details

Basically, I think this means making Xxx_Meaning, Yyy_Meaning, Rrr_Meaning and friends all be part of a record that is passed down the call stack, and not much more.
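As a rough illustration of the shape this refactoring could take, here is a minimal sketch in Python rather than Ada. The record fields mirror the names mentioned above; their types, the `ParseState` name, and both functions are hypothetical placeholders, not the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ParseState:
    """Hypothetical record bundling what is currently global state."""
    xxx_meaning: str = ""
    yyy_meaning: str = ""
    rrr_meaning: str = ""
    output: list = field(default_factory=list)


def print_result(word, state):
    # Reads only from the record it was given, never from a global.
    state.output.append(f"{word}: {state.xxx_meaning}")


def parse_word(word, state):
    # Placeholder for the real analysis; it only records a dummy meaning.
    state.xxx_meaning = f"meaning of {word}"
    print_result(word, state)
```

Because each caller owns its own record, two parses can no longer interfere with each other, which is the main point of removing the global state.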

Feature: add some sequence to print entire conjugation/declension table

It is a shame this feature wasn't built in from the start, since it seems to be rather simple to do. I will work on it (you can assign the task to me); the only issue I have now is how to initialize this mode.

One possibility which I quite like is to prefix the word with a dot (.) to parse it and print the entire table of all forms possible for that particular word. So .militis would print the usual analysis + the entire table for the word miles (singular/plural, all cases).

Since we have all the information in the inflection files, we can just cut&paste various things together and print. Basically, for each POS there will be a form printer which returns a string with the table, or prints it directly (working with strings is rather weird in Ada).
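The dot-prefix idea could be detected along these lines. This is only a sketch in Python of the input handling, under the assumption that a lone "." is not a table request; the function name is hypothetical and nothing here reflects how WORDS actually parses its input.

```python
def parse_request(token):
    """Split a query token into (word, wants_full_table).

    A leading dot requests the full table, as proposed above.
    """
    if token.startswith(".") and len(token) > 1:
        return token[1:], True
    return token, False
```

With this, `.militis` yields the word `militis` with the table flag set, while plain `militis` is analysed as usual.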

Make fails on Cygwin

...
gprbuild -j4 -Pwords meanings
Compile
[Ada] meanings.adb
Bind
[gprbind] meanings.bexch
[Ada] meanings.ali
Link
[link] meanings.adb
echo g | bin/makedict DICTLINE.GEN > /dev/null
C:/cygwin64/home/byron/dev/whitakers-words/bin/makedict.exe: error while loading shared libraries: ?: cannot open shared object file: No such file or directory
make: *** [Makefile:9: DICTFILE.GEN] Error 127

Unexpected exception in CYCLE_OVER_PA processing words with the iv SUFFIX

I ran across over 100 words (with a total of over 1000 occurrences) that give me the Unexpected exception in CYCLE_OVER_PA processing ....
A quick look on the list of words that emit this warning suggests that this is a group of similarly derived adjectives, and that the problem only occurs with feminine/neuter forms ending in -a:

miraculosa, opprobriosa, poenosa, saporosa, taediosa, venenosa
prophetica

Most of them are formed by the -iv SUFFIX, with 85 of them ending in -tiva:
ablutiva, acquisitiva, adinventiva, aedificativa, aestimativa, afflictiva, appetitiva, assimilativa, augmentativa, benefactiva, calefactiva, cogitativa, cognitiva, cohibitiva, communicativa, commutativa, concretiva, confortativa, consecrativa, consiliativa, consummativa, contumeliativa, deiectiva, designativa, determinativa, distinctiva, divinativa, excitativa, executiva, factiva, figurativa, formativa, germinativa, gubernativa, impeditiva, imperativa, impetrativa, impletiva, inflativa, informativa, inquisitiva, intellectiva, interpretativa, intimativa, iudicativa, iustificativa, liquefactiva, medicativa, memorativa, mitigativa, moderativa, modificativa, negotiativa, nutritiva, operativa, opinativa, ordinativa, participativa, perceptiva, perfectiva, praeceptiva, praefigurativa, praeparativa, privativa, productiva, prohibitiva, provocativa, rarefactiva, receptiva, recordativa, reformativa, regnativa, resolutiva, respectiva, sanativa, sanctificativa, significativa, speculativa, spirativa, subiectiva, susceptiva, transitiva, unitiva, vindicativa, vivificativa
apprehensiva, conversiva, dimensiva, discursiva, discussiva, divisiva, laesiva, ostensiva, remissiva, successiva, visiva

(If this helps: I am running words on a wordlist extracted from Summa Theologiae by Thomas Aquinas.)

create test suite

initially just a bunch of Latin phrases that exercise the various code paths (e.g., that future passive supine craziness)

"meaning" comparison always fails

In word_package.adb:1422, there's a test:

-- there is no way this condition can be true;
-- packon_length - 1 /= packon_length
if (trim(mean)(1..4) = "(w/-" and then  --  Does attached PACKON agree
    trim(mean)(5..4+packon_length) = trim(packons(k).tack))   then

The comparands in the second arm of the test are guaranteed to be of different lengths, so the condition presumably always fails.

Build failing with recent gcc due to -gnatwe

The build fails with various 'use clause for package "<some package>" has no effect' warnings, which are turned into errors.

Removing -gnatwe from the .gpr files allows it to compile, though obviously it still prints those warnings.

Wrong Pearse code with suffix -e

bin/words sancte perfide improbe

The Pearse code on the "e SUFFIX" line is for some reason 01, not 05 as expected.

I assume one has to look around src/words_engine/words_engine-list_package.adb lines 479--485 and 725--750, but I was not able to solve the issue.
(It's also not clear to me why this is not handled through ADDONS.LAT.)

Make fails

I downloaded the files, followed the make instructions, and got the following error:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : binde.adb:1005
gprlib: invocation of /usr/bin/gnatbind failed
gprbuild: could not build library for project words_engine
Makefile:6: recipe for target 'bin/words' failed
make: *** [bin/words] Error 4

This is on Ubuntu 17.10 (beta); on the previous version of Ubuntu the process was successful.

Discussion: let's use milestones to group features for release

Since we are going beyond the "base" WW, we will need to do releases every once in a while.

I suggest we put feature and bug tasks under a milestone so we can easily review what is done and what isn't to produce reasonable changelogs.

Alternatively, tags & search works too, but I find milestones to be a better interface.

API - Call from bash script

Is there a simple way to call whitakers-words from a bash script, with parameters passed, and have only the output returned, without the welcome menu etc.?
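One possible approach is sketched below, in Python for illustration; the same idea carries over to a shell script. It assumes that words given as command-line arguments (as in the invocation `bin/words sancte perfide improbe` used elsewhere in this tracker) are analysed without the interactive banner. That behaviour is an assumption here, and `run_words` is a hypothetical helper name.

```python
import subprocess


def run_words(words, executable="bin/words"):
    """Run the WORDS binary once and return whatever it writes to stdout.

    Assumption: passing the words as arguments skips the interactive menu.
    """
    result = subprocess.run(
        [executable, *words],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

A caller would then use something like `run_words(["sancte", "perfide"])` and post-process the returned text.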

stem results duplicated

In the words that trigger #76 the output is going to be longish anyway, because miraculosa could be feminine {nom,voc,abl} singular, or neuter {nom,voc,acc} plural. However, each possibility appears twice in the output.

tackons may not be working

"videsne" is decomposed correctly by the copy of WORDS at Notre Dame, but not by our version

The token "TACKON" should appear in the gloss, but it doesn't seem to.

Expected: output analogous to http://www.archives.nd.edu/cgi-bin/wordz.pl?keyword=videsne

ne                   TACKON                             
-ne = is it not that (enclitic); or ...(introduces a question or alternative);
vid.es               V      2 1 PRES ACTIVE  IND 2 S    
video, videre, vidi, visus  V   [XXXAX]  
see, look at; consider; (PASS) seem, seem good, appear, be seen;

Observed, e.g. via http://latin.ucant.org/cgi-bin/translate.cgi?query=videsne :

videsne                          ========   UNKNOWN    

This may be a config error, an error we've introduced, or a bug introduced by Whitaker after Notre Dame took their copy (or even worse, a bug silently fixed by ND).

Option to hotlink to Perseus or other defn. source

For graphical builds of Whitaker's words, the root definition would be improved if it could hotlink to Perseus or another more extensive Latin dictionary so that usages in context could be seen. Thanks!

Priority: low

Add support for macrons (vowel length-marks)

This would probably need a lot of work over the dictionary, but if we make macrons/lengths supported in the code, the database/dictionary can simply be slowly updated "on the fly".
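One way to let the code accept macrons before the dictionary carries them is to fold macronised input to plain letters for lookup. The sketch below, in Python for illustration, shows only that folding step; it is an assumption about how such support might start, not a description of anything in WORDS, and `strip_macrons` is a hypothetical name.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def strip_macrons(text):
    """Return text with vowel-length macrons removed, other characters intact."""
    # NFD splits a precomposed long vowel into base letter + combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    without_macrons = decomposed.replace(COMBINING_MACRON, "")
    # Recompose so any other accented characters keep their usual form.
    return unicodedata.normalize("NFC", without_macrons)
```

Entries that have already gained length marks could then be matched exactly, while all other entries fall back to the folded form.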

"<=" for parse_records

While fixing indentation I found, in at least 2 files, a declaration of a "<=" function for the parse_record type.

[database] Combine multiple lines for one word into a single entry

Right now, the format is rather redundant, with data listed multiple times only for presentation purposes (the | at the beginning of translation tells it to append that to the output). I fail to see why it couldn't be one line split at | character.

Example (notice how all the information sans translation is duplicated):

  20601 fi                 f                  zzz                fact               V      3 3 SEMIDEP      X X X A O happen, come about; result (from) ; take place, be held, occur, arise (event);
  20602 fi                 f                  zzz                fact               V      3 3 SEMIDEP      X X X A O |be made/created/instituted/elected/appointed/given; be prepared/done; develop;
  20603 fi                 f                  zzz                fact               V      3 3 SEMIDEP      X X X A O ||be made/become; (facio PASS); [  20603 fiat => so be it, very well; it is being done];
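A minimal sketch of the proposed merge, in Python for illustration. It assumes each line has already been split into a (header, meaning) pair, where the header is everything before the translation; how to split a real DICTLINE line is deliberately left out, and `merge_entries` is a hypothetical name.

```python
from itertools import groupby


def merge_entries(records):
    """Merge consecutive (header, meaning) records that share a header.

    Leading '|' continuation markers are dropped and the meanings are
    joined with a single '|', giving one line per word.
    """
    merged = []
    for header, group in groupby(records, key=lambda record: record[0]):
        parts = [meaning.lstrip("|").strip() for _, meaning in group]
        merged.append((header, "|".join(parts)))
    return merged
```

Applied to the three lines above, this would give one entry whose meaning field holds the three translations separated by `|`.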

add makefile target for redistributable binaries, with trap for GNAT GPL

Rationale: AdaCore's GNAT GPL is not a permissible tool for making redistributable binaries of Whitaker's Words, due to copyright licensing conflicts between Words and the runtime libraries linked by that compiler; FSF GNAT and AdaCore's GNAT Pro are Ok.

Note that FSF GNAT does not currently have a good gprbuild, and that GNAT Pro is Ok for building.

The thing needs to check that the version string from gnatmake --version looks like:

GNATMAKE 4.6

and not at all like

GNATMAKE GPL 2015 (20150428-49)
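The check itself is small. Below is a minimal sketch in Python for illustration; a Makefile target would more likely do this with shell tools. It relies only on the two sample strings quoted above, and `is_gnat_gpl` is a hypothetical name.

```python
def is_gnat_gpl(version_output):
    """Return True if `gnatmake --version` output looks like GNAT GPL."""
    lines = version_output.strip().splitlines()
    if not lines:
        return False
    tokens = lines[0].split()
    # "GNATMAKE GPL 2015 (...)" has GPL as its second token;
    # "GNATMAKE 4.6" has a version number there instead.
    return len(tokens) >= 2 and tokens[0] == "GNATMAKE" and tokens[1] == "GPL"
```

The redistributable target would refuse to proceed whenever this returns True for the first line printed by gnatmake --version.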

Trailing space results in unhandled error.

When a string with a trailing space is submitted, e.g. "viginti " or "viginti provincias ", the webpage returns an alert with Error: An unhandled error occurred

Happens in Chrome 62.0.3202.94, Firefox 39.0, and Safari 10.1.2

makeinfl is raising Ada.IO_Exceptions.Data_Error a LOT.

This is visible when running makeinfl: all the **** lines are Put by the exception handler. If you replace the handler with:

            exception
               when E : Constraint_Error | IO_Exceptions.Data_Error  =>
                  Put_Line (Ada.Exceptions.Exception_Name (E) & " " &
                            line (1 .. last));

then the exception name is printed instead of the useless **** (you need to with Ada.Exceptions for that).

Quite unfortunately, the subprogram responsible for that, file_inflection_sections, is a heavy user of state, which makes understanding the flow of data quite hard (not to mention the not-so-surprising issue with cryptic identifiers).

Drawing out record composition

While working on dictionary_package I noticed that records are highly composed, meaning that you get records.with_record_fields.with_even_more_record_fields.[...]

What would be the preferred format/way of drawing/writing this composition diagram?

I'm raising this as an issue because I noticed that it is highly probable that we could use inheritance polymorphism to greatly cut down on the complexity and size of the code (but this would demand a total refactor of the records).

Style and quality changes that need to be done

These are the changes that need to be done for the style work to be considered complete:

  • all names (especially public ones) shall¹ use the Ada naming convention [easy but tedious] ²
  • there shall be spaces around binary operators and on the left side of a left parenthesis (unless it is preceded by an apostrophe) [easy but tedious] ³
  • there shall be no lines longer than 80 columns
  • all identifiers shall be as self-documenting as reasonably possible (s, ss, ssa? wtf does this even mean?) [one caveat: apart from capitalization, don't change values in enumerations]

Also try to use FIXME, TODO and NOTE for (respectively) comments on bugs/problems, things that have to be done, and descriptions of non-obvious behavior, as they are easy to search for (with grep, for example).

¹ 'shall' is used because 'should' is too permissive
² for those interested: https://en.wikibooks.org/wiki/Ada_Style_Guide is a commonly used quality and style guide for Ada, together with some examples and rationales behind the style rules
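Two of these rules are easy to check mechanically. The following is a minimal sketch in Python for illustration, not a tool that exists in the repository: it flags lines over 80 columns and reports FIXME/TODO/NOTE markers found in Ada comments. The function name and the shape of the result are hypothetical.

```python
import re

# A marker counts only when it appears after the Ada comment introducer "--".
MARKER = re.compile(r"--.*\b(FIXME|TODO|NOTE)\b")
MAX_COLUMNS = 80


def scan_source(lines):
    """Return (line_number, finding) pairs for a sequence of source lines."""
    findings = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text) > MAX_COLUMNS:
            findings.append((number, "line longer than 80 columns"))
        match = MARKER.search(text)
        if match:
            findings.append((number, match.group(1)))
    return findings
```

Feeding it the lines of each .adb and .ads file gives a quick list of what is left to clean up.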

Attempting translation returns invalid pointer error

Running: bin/words returns:

INFLECTION_ARRAY being loaded   --    1785 entries    --  Loaded correctly
GENERAL Dictionary loading      --   62085 stems      --  Loaded correctly
UNIQUES file loading            --      74 entries    --  Loaded correctly
ADDONS loading 18+11 TACKONS 6+129 PREFIXES 179 SUFFIXES   --  Loaded correctly
Copyright (c) 1993-2006 - Free for any use - Version 1.97FC
For updates and latest version check http://www.erols.com/whitaker/words.htm
Comments? William Whitaker, Box 51225  Midland  TX  79710  USA - [email protected]

Input a word or line of Latin and ENTER to get the forms and meanings
    Or Input @ and the name of a file containing words or lines
    Or Input # to change parameters and mode of the program
    Or Input ? to get help wherever available on individual parameters
Two empty lines (just a RETURN/ENTER) from the keyboard exits the program
English-to-Latin available
~E changes to English-to-Latin, ~L changes back     [tilde E]

=>Test
*** Error in `bin/words': free(): invalid pointer: 0x00007ff538ed76a0 ***
Exception in PARSE_LINE processing Test

=>

comma in the input in Words Online causes an error

Go to http://latin.ucant.org/
Enter "Ego sum pauper, nihil habeo." or any other string containing a comma
An error pop-up appears saying "Error: Non-Latin character in word"

The same input is handled correctly in the offline version (as compiled from current master).
The full stop itself does not cause any problems; the comma does.
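Until the cause is found, a front end could normalise the query before handing it on. The sketch below, in Python for illustration, keeps only runs of unaccented Latin letters, so commas, full stops and stray whitespace are all dropped. It is a hypothetical workaround, not how the website or WORDS tokenises input.

```python
import re


def tokenize_query(text):
    """Return the words in text, discarding punctuation and whitespace."""
    # Assumption: words consist only of unaccented ASCII letters.
    return re.findall(r"[A-Za-z]+", text)
```

The resulting list can be joined with single spaces and submitted as before.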

web version hangs on lengthy entries

This is a bug with the linked project website; I'm not sure if there's a better place for this.

I've noticed that the online version seems to hang when I try to translate canis. After building Words from the repository and running bin/words, I found that canis, by virtue of having lots of lengthy entries, causes the interactive environment to prompt the user to press ENTER for more. It seems like the online version is doing something similar, causing it to hang. I haven't been able to identify any other Latin words that cause this problem.

For comparison, the CGI version hosted by Notre Dame seems to translate canis just fine.

Add License...

Whitaker's original license said "All parts of the WORDS system, source code and data files, are made freely available to anyone who wishes to use them, for whatever purpose." -- should add this to the README.md and perhaps choose a compatible modern license to go with this (CC0?)

Mostly this issue is just to say how excited I am that you're making this archival version and that perhaps development can continue! Thanks! Feel free to close immediately.

Tooltips for definitions of terms

A line such as

 i.am                 V      6 1 FUT  ACTIVE  IND 1 S      Late      sometime

should, in graphical environments, have the option of mousing over "V" and seeing "Verb", "S" and seeing "singular", "1" and seeing "1st person", etc., to make usage easier for new users. Thanks!
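A graphical front end would need a lookup from codes to explanations. Below is a minimal sketch in Python for illustration. The field order is an assumption read off the sample line above, the tables list only a few codes, and every name is hypothetical.

```python
# Hypothetical lookup tables covering only a handful of codes.
PART_OF_SPEECH = {"V": "Verb"}
TENSE = {"PRES": "present tense", "FUT": "future tense"}
VOICE = {"ACTIVE": "active voice"}
MOOD = {"IND": "indicative mood"}
PERSON = {"1": "1st person", "2": "2nd person", "3": "3rd person"}
NUMBER = {"S": "singular", "P": "plural"}


def verb_tooltips(line):
    """Return (code, tooltip) pairs for one verb analysis line.

    Assumed layout: form, part of speech, conjugation, variant, tense,
    voice, mood, person, number, then optional age/frequency notes.
    Raises ValueError if the line has fewer than nine fields.
    """
    fields = line.split()
    _form, pos, _conj, _variant, tense, voice, mood, person, number = fields[:9]
    pairs = [
        (pos, PART_OF_SPEECH),
        (tense, TENSE),
        (voice, VOICE),
        (mood, MOOD),
        (person, PERSON),
        (number, NUMBER),
    ]
    # Unknown codes get an empty tooltip rather than an error.
    return [(code, table.get(code, "")) for code, table in pairs]
```

Looking codes up by position is what lets the "1" in the person column be explained differently from the "1" in the variant column.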
