
Eldamo To Anki

Takes the marvellous wordlist from eldamo.org and converts it into input digestible by Anki.

Usage

The decks for Neo-Quenya and Neo-Sindarin based on the lists in this repository can be found on AnkiWeb (unless they have been deleted for not receiving enough downloads).

Some lists can be found in the output folder of this repository. They are ready to be imported. They do not include any names or phrases. The Neo-Quenya and Neo-Sindarin lists do not include deprecated words.

The lists are:

Thanks to the well-structured input data curated by Paul Strack, it is extremely easy to add more languages to that list. Just open an issue and I'll add it for you.

If you want to curate your own version of a list, you can use the generate.py script. It is called from the command line via:

python3 generate.py <language>

Depending on your Python installation, you may need to type py or python instead of python3.

For the <language> argument, type the name of the language, or its id (usually its first letter).

You can add optional arguments:

  • --neo: Assemble Neo-Eldarin lists, drawing from words invented by Tolkien from the 1930s onwards, as well as fan-invented words.
  • --individual-names: Include names of individuals and places.
  • --collective-names: Include collective names for groups of people.
  • --proper-names: Include proper names.
  • --phrases: Include phrases.
  • --include-origin: Include the linguistic origin of the word in the card.
  • --include-deprecated: Include words that Paul Strack has marked as deprecated in neo lists.
  • --check-for-updates: Forces a re-download of the Eldamo database.
  • --verbose: Print more output.
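To make the mapping between the command line and the options above concrete, here is a minimal sketch of how such an interface could be declared with Python's argparse. It is a hypothetical reconstruction: the real generate.py may parse its arguments differently, and the function name build_parser is illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented generate.py options."""
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument("language", help="language name or id (usually its first letter)")
    # Every optional argument listed above is a simple on/off switch.
    flags = {
        "--neo": "assemble Neo-Eldarin lists",
        "--individual-names": "include names of individuals and places",
        "--collective-names": "include collective names for groups of people",
        "--proper-names": "include proper names",
        "--phrases": "include phrases",
        "--include-origin": "include the linguistic origin of the word in the card",
        "--include-deprecated": "include deprecated words in neo lists",
        "--check-for-updates": "force a re-download of the Eldamo database",
        "--verbose": "print more output",
    }
    for flag, help_text in flags.items():
        parser.add_argument(flag, action="store_true", help=help_text)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `python3 generate.py q --neo --include-origin` would yield `language == "q"` with `neo` and `include_origin` set to True and all other switches False.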

You can check out the generate_all.sh script for example usages.

Neo-Quenya draws from words from (Late) Quenya, Middle Quenya, and fan inventions.

Neo-Sindarin draws from words from Sindarin, Noldorin, and fan inventions.

Design Decisions

In the simplest case, the Tolkienian word is given without any further adjustment, and the English translation is followed by the part of speech:

corto|circle (n)

costaima|debatable (adj)

If a word is listed with a dedicated word stem, that stem is appended in parentheses:

oron (oront-)|mountain (n)
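Each line uses | to separate the front of the card from the back. A minimal sketch of this basic formatting, assuming the word, translation, part of speech and optional stem are already available as separate strings (format_card is a hypothetical helper, not taken from the repository):

```python
from typing import Optional


def format_card(word: str, gloss: str, pos: str, stem: Optional[str] = None) -> str:
    """Build one 'front|back' import line; the stem, if any, goes in parentheses."""
    front = f"{word} ({stem})" if stem else word
    return f"{front}|{gloss} ({pos})"


print(format_card("corto", "circle", "n"))
print(format_card("oron", "mountain", "n", stem="oront-"))
```

The two calls reproduce the corto and oron lines shown above.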

Some words have several translations. The script tries to make the Tolkienian side unique, first by checking whether the part of speech suffices:

cuiva (adj)|awake (adj)

cuiva (n)|animal (n)

If the part of speech does not make the words unique and a category is provided for the word, the category is used instead:

au (Mind and Thought)|if only (adv)

au (Spatial Relations)|away, off, not here (of position) (adv)

Finally, if this does not help either, the translations are merged into one:

hyarna|compact, compressed; southern (adj)
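The three-step fallback above (part of speech, then category, then merging) can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the Entry type and make_cards function are hypothetical, a step is applied only when it makes every sense of a word distinct, and merged parts of speech are joined with a comma, which the description above does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    word: str
    gloss: str
    pos: str
    category: Optional[str] = None


def make_cards(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Return (front, back) pairs whose fronts are unique."""
    groups: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        groups[entry.word].append(entry)

    cards: List[Tuple[str, str]] = []
    for word, group in groups.items():
        if len(group) == 1:
            e = group[0]
            cards.append((word, f"{e.gloss} ({e.pos})"))
            continue
        # 1. Disambiguate by part of speech if every sense has a different one.
        if len({e.pos for e in group}) == len(group):
            cards.extend((f"{word} ({e.pos})", f"{e.gloss} ({e.pos})") for e in group)
            continue
        # 2. Otherwise use the category, if every sense has a distinct one.
        categories = [e.category for e in group]
        if None not in categories and len(set(categories)) == len(group):
            cards.extend((f"{word} ({e.category})", f"{e.gloss} ({e.pos})") for e in group)
            continue
        # 3. Fall back to merging all translations into a single card.
        glosses = "; ".join(e.gloss for e in group)
        pos = ", ".join(dict.fromkeys(e.pos for e in group))  # unique, in order
        cards.append((word, f"{glosses} ({pos})"))
    return cards
```

Fed with the two senses of hyarna (both adjectives, no category), it produces the merged card shown above, while cuiva is split by part of speech and au by category.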

This last step also applies to English words with several Tolkienian translations:

artatúrë; ohérë|government (n)
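Merging on the English side amounts to grouping cards by their back and joining the fronts. A minimal sketch, assuming cards are (front, back) pairs and that "; " is the separator, as in the example above (merge_by_back is a hypothetical name):

```python
from typing import Dict, List, Tuple


def merge_by_back(cards: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join the fronts of all cards that share an identical back."""
    fronts: Dict[str, List[str]] = {}
    for front, back in cards:
        fronts.setdefault(back, []).append(front)
    # dicts keep insertion order, so each merged card appears where its back first occurred
    return [("; ".join(fs), back) for back, fs in fronts.items()]


print(merge_by_back([("artatúrë", "government (n)"), ("ohérë", "government (n)")]))
```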

Some Tolkienian words are listed with variant versions. The script recognises this and treats them as a single word, so the variant inputs behind (a)lá are listed as one:

(a)lá|yes (interj)
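One plausible way to produce the (a)lá notation is to check whether one spelling is the other plus an extra prefix or suffix and to put the extra letters in parentheses. This sketch assumes the two inputs are alá and lá; both that assumption and the merge_variants helper are hypothetical, and the real script may detect variants differently.

```python
from typing import Optional


def merge_variants(a: str, b: str) -> Optional[str]:
    """Combine two spellings that differ only by an extra prefix or suffix."""
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if longer == shorter:
        return longer
    if not shorter:
        return None
    if longer.endswith(shorter):
        # extra letters at the start, e.g. alá + lá -> (a)lá
        return f"({longer[:len(longer) - len(shorter)]}){shorter}"
    if longer.startswith(shorter):
        # extra letters at the end
        return f"{shorter}({longer[len(shorter):]})"
    return None  # not a simple optional-prefix/suffix variant


print(merge_variants("alá", "lá"))
```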

Some English translations are prefixed with the marker * or ?, denoting some uncertainty. These markers are retained unless the word is listed more than once and at least one translation lacks the marker:

canya-|?to command (vb)
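The marker rule can be expressed as a small filter over all listings of one word. A minimal sketch, assuming markers only ever appear as a prefix of a translation (the mid-gloss * in lassevinta below is not covered) and that identical translations left over after stripping collapse into one; resolve_markers is a hypothetical name.

```python
from typing import List

MARKERS = ("*", "?")


def resolve_markers(glosses: List[str]) -> List[str]:
    """Keep leading */? markers unless the word is listed several times
    and at least one listing has no marker."""
    if len(glosses) > 1 and any(not g.startswith(MARKERS) for g in glosses):
        glosses = [g.lstrip("*?") for g in glosses]
    # drop duplicates created by stripping, keeping the first occurrence
    return list(dict.fromkeys(glosses))


print(resolve_markers(["?to command"]))
print(resolve_markers(["?to command", "to command"]))
```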

Several words come with additional information on their Tengwar spelling where it deviates from the default. This information is appended in square brackets:

isilmë [þ]|moonlight (n)

nairë [ñ-]|space (as a physical dimension) (n)

Special Treatment for Quenya

The list also contains some archaic words which still use the old spelling. To reduce duplicated information, the script recognises these and derives the Tengwar annotations. Since this treatment has to be done on a per-language and per-sound basis, it is currently implemented only for my personal use case, (Neo-)Quenya. The relevant linguistic information is taken from the Eldamo Quenya course.

Any þ is replaced with s, so minaþurië becomes:

minasurië [þ]|enquiry (n)

Initial ñ- is replaced with n-, turning ñwalmë into:

nwalmë [ñ-]|torment (n)
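The two consonant rules above can be sketched as a single pass that rewrites the spelling and records which annotation to append. The helper names are hypothetical, input is assumed to be NFC-normalised (so ñ is one code point), and joining several annotations with ", " is an assumption, since no example above shows two at once.

```python
from typing import List, Tuple


def modernise_consonants(word: str) -> Tuple[str, List[str]]:
    """Replace þ with s and initial ñ with n, recording a Tengwar note for each."""
    notes: List[str] = []
    if "þ" in word:
        word = word.replace("þ", "s")
        notes.append("þ")
    if word.startswith("ñ"):  # only the initial ñ is affected
        word = "n" + word[1:]
        notes.append("ñ-")
    return word, notes


def annotate(word: str, notes: List[str]) -> str:
    """Append the notes in square brackets, if there are any."""
    return f"{word} [{', '.join(notes)}]" if notes else word


print(annotate(*modernise_consonants("minaþurië")))
print(annotate(*modernise_consonants("ñwalmë")))
```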

The rules for w are more complicated. Any w following a consonant or the diphthongs ai or oi is retained, any other w is replaced with v.

Because the archaic w-origin of v is not represented in Tengwar, it is also not included in the output:

artanwa|award (n)

maiwë|gull (n)

oiwa|glossy (adj)

lassevinta|leaf fall, autumn, *(lit.) leaf blowing (n)

vilya|air, sky (n)
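A direct transcription of the w rule as stated: keep w after a consonant or after the diphthongs ai and oi, otherwise write v. This is a sketch, not the script's code; the vowel set and the archaic test inputs lassewinta and wilya are assumptions chosen to match the outputs above.

```python
VOWELS = set("aeiouáéíóúäëïöü")


def replace_w(word: str) -> str:
    """Turn archaic w into v unless it follows a consonant or ai/oi."""
    chars = list(word)
    for i, ch in enumerate(word):
        if ch != "w":
            continue
        prev = word[i - 1] if i > 0 else ""
        after_consonant = prev.isalpha() and prev not in VOWELS
        after_diphthong = word[max(0, i - 2):i] in ("ai", "oi")
        if not (after_consonant or after_diphthong):
            chars[i] = "v"
    return "".join(chars)


for w in ["artanwa", "maiwë", "oiwa", "lassewinta", "wilya"]:
    print(replace_w(w))
```

Under these assumptions, the loop prints the five output spellings listed above.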

The Latin transcription of Quenya words changed throughout Tolkien's life. The script makes several replacements to normalise the spelling:

  • kw and standalone q become qu.
  • ks becomes x.
  • k in other positions becomes c.
  • the non-diphthong vowel combinations are spelled ëa, ëo, and öa.
  • a trailing e becomes ë.
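The normalisation list translates almost one-to-one into ordered string replacements. The order matters: kw and ks must be handled before the general k rule. This is a sketch assuming lowercase, NFC-normalised input; the function name and test words are illustrative.

```python
import re


def normalise_spelling(word: str) -> str:
    """Apply the spelling normalisations listed above, in order."""
    word = word.replace("kw", "qu")
    word = re.sub(r"q(?!u)", "qu", word)   # standalone q
    word = word.replace("ks", "x")
    word = word.replace("k", "c")          # any remaining k
    # non-diphthong vowel combinations
    word = word.replace("ea", "ëa").replace("eo", "ëo").replace("oa", "öa")
    word = re.sub(r"e$", "ë", word)        # trailing e
    return word


print(normalise_spelling("kwenya"))
print(normalise_spelling("taile"))
```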

Acknowledgments

Almost all the credit here goes to Paul Strack, maintainer of the Eldamo website and database. They gathered all canonical Tolkienian words in one place, collected thousands of fan-made extensions, and organised it all in a structured XML format. Finding this database made writing this script pure bliss.

Contributors

thecomamba

eldamotoanki's Issues

Can we include the "th" sound in the stem?

The word "míse", as far as I know, stems from "míthe", and the "th" is still retained in its Tengwar spelling. Is this information included in the XML, and if so, can I include it in the cards?

Remove deprecated words

As explained here, words marked with a ⚠️ are deprecated.
Acceptance criteria:

  • ahtar should not translate to avenge
  • accar should not appear in list

Detect variant spellings

Compare (1) taile; (2) tailë, and possibly áéíóúý vs. âêîôûŷ, as well as aeiouy vs. the trema variants.
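One possible approach to this issue is to decompose each word with Unicode NFD and drop all combining marks, so that spellings differing only in acute, circumflex or trema map to the same key. This is a sketch, not an implemented feature; note that it also folds ñ into n, which may merge words that should stay apart.

```python
import unicodedata
from collections import defaultdict
from typing import List


def variant_key(word: str) -> str:
    """Strip all combining diacritics (acute, circumflex, trema, tilde, ...)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_variant_groups(words: List[str]) -> List[List[str]]:
    """Group words whose spellings differ only in diacritics."""
    groups = defaultdict(list)
    for w in words:
        groups[variant_key(w)].append(w)
    return [g for g in groups.values() if len(g) > 1]


print(find_variant_groups(["taile", "tailë", "corto"]))
```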

Once again: Reduce duplications

Searching the Quenya wordlist for "(1)" reveals several English translations that are factually identical.
For example:
anta
astalda
caita-
...

Update script

Anki is a bit clumsy when it comes to updating cards.

  1. Notes -> Export Notes -> Notes in Plain Text -> Include unique identifier
  2. Go through output and split each line at | to find front and back.
  3. Go through export and for each line split at tab to find GUID, front and back.
  4. If front and back are the same as before, do nothing.
  5. If front is same but back is different, replace back.
  6. If back is same but front is different, replace front.
  7. If neither front nor back can be found, the card has probably been deleted. Print that info.
  8. Print all output cards that cannot be found, because they need to be created.
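The matching steps above can be sketched as a comparison of the generated lines with Anki's plain-text export. This assumes the export is tab-separated with the GUID in the first column and front and back in the next two, and that any header lines start with #; these assumptions, and all function names, are hypothetical rather than taken from an existing script.

```python
from typing import Dict, List, Tuple

Card = Tuple[str, str]  # (front, back)


def parse_output(lines: List[str]) -> List[Card]:
    """Step 2: split each generated line at the first '|'."""
    cards = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            front, back = line.split("|", 1)
            cards.append((front, back))
    return cards


def parse_export(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Step 3: split each exported note at tabs into (guid, front, back)."""
    notes = []
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):  # skip assumed header lines
            guid, front, back = line.split("\t")[:3]
            notes.append((guid, front, back))
    return notes


def plan_update(output: List[Card], export: List[Tuple[str, str, str]]) -> Dict[str, list]:
    """Steps 4 to 8: decide what to do with each exported note and generated card."""
    by_front = {front: back for front, back in output}
    by_back = {back: front for front, back in output}
    output_set = set(output)
    matched = set()
    plan: Dict[str, list] = {"unchanged": [], "replace_back": [],
                             "replace_front": [], "deleted": [], "new": []}
    for guid, front, back in export:
        if (front, back) in output_set:           # step 4
            plan["unchanged"].append(guid)
            matched.add((front, back))
        elif front in by_front:                   # step 5
            plan["replace_back"].append((guid, by_front[front]))
            matched.add((front, by_front[front]))
        elif back in by_back:                     # step 6
            plan["replace_front"].append((guid, by_back[back]))
            matched.add((by_back[back], back))
        else:                                     # step 7
            plan["deleted"].append((guid, front, back))
    plan["new"] = [card for card in output if card not in matched]  # step 8
    return plan
```

The returned dictionary can then be printed or used to drive the actual edits; keeping the plan separate from applying it makes it easy to review the changes first.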
