
pluvier's Introduction

The Pluvier Manifesto

🇬🇧 Why is this page in English?

Although this project is all about the French language, Pluvier’s development is done in English, since the majority of concepts and principles on which it’s based come from the development of anglophone theories.

Since the OpenSteno community is (for now) almost exclusively anglophone, discussion about the theory’s development, dictionary generation, and the help experienced people might bring can only happen in English.

Of course, the theory’s learning resources will be written in French.

What is Pluvier?

Pluvier aims to be the first real-time-friendly, conflict-free steno theory for French, using the standard Ireland layout and a programmatically generated dictionary.

If you don't understand everything in this sentence, you can learn all about steno, theories and the OpenSteno community right here. Here's a very quick TL;DR: Steno is the fastest way to write on a computer. A theory is the set of rules which allow you to write in steno.

Why not Grandjean?

Indeed, steno has existed in French for a long time, and is still used today with the Grandjean system. However, the Grandjean theory was not designed with modern real-time applications in mind: it isn't conflict-free, so it cannot be used in real time without some extra software magic to disambiguate homonyms.

Additionally, the Grandjean system uses a different, specific layout that isn't compatible with hobbyist steno boards like the Uni, EcoSteno, etc.

Pluvier, much like Plover does for English, will allow anyone with a hobbyist machine, or even just a compatible keyboard, to steno in French thanks to the Plover software.

How does Pluvier work?

Pluvier's dictionary is programmatically generated from a set of rules, applied to a huge database of French words containing phonetic transcription, frequency data, grammatical information, and much more. On top of the generated dictionary, a set of briefs and manually-defined outlines will be added.

A set of rules: The LaSalle theory and a ton of tweaks

Pluvier is mostly based on an existing theory called La méthode LaSalle, developed in the late '80s and still used today in Québec.

LaSalle isn't conflict-free, but it's a really solid basis to start from. It seems to be based on StenEd, like Plover, which means many concepts from the English Plover theory are present and many others can be adapted.

There isn't much about LaSalle to be found, but we do have a 2003 book detailing its rules pretty extensively. Since the book is copyrighted, it won't be redistributed here, but the full translated set of rules is available. Most of these rules will be reused in Pluvier, some will be modified to better fit Plover, and some will be completely different.

A huge database: Lexique

Lexique is a collaborative database containing a huge amount of useful data for more than 140,000 French words. Among other things, it details the following information for (almost) every entry:

  • Phonetic transcription. This is huge for us, because it allows us to generate an outline based on phonetics without having to infer anything from how the word is written. If you know anything about French, you'll get why this is a huge relief.
  • Frequency information from several sources (books, movies, web pages, Twitter...). This potentially allows us to decide which word gets priority when homonyms need to be disambiguated.
  • Syllabification in different forms. This is a bit more iffy, because syllabification is a difficult problem to handle (though crucial for steno), but there's some great info in the database about how a word might be chopped down and some other data allowing us to decide how we might want to chop it up. This will be detailed later.
  • Grammatical information. With some of Pluvier's rules being based on the grammatical form of a word, it's useful to know that the word is a noun, an adjective... For verbs, Lexique even provides data about tenses and conjugation. That means for the conjugated form of a verb, we can know that it's the first-person singular present tense form of the indicative mood. Yeah, French is fun.
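To make the list above a bit more concrete, here is a minimal Python sketch that loads Lexique-style rows into typed records. It assumes a tab-separated file with the columns `ortho`, `phon`, `cgram`, `freqfilms2`, `syll` and `infover`; as far as we know these match recent Lexique releases, but check them against the version you actually download. The two sample rows (including their frequencies) are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Made-up sample rows; the column names are assumptions to verify against Lexique.
SAMPLE = (
    "ortho\tphon\tcgram\tfreqfilms2\tsyll\tinfover\n"
    "tasse\ttas\tNOM\t10.0\ttas\t\n"
    "tassé\ttase\tVER\t1.0\tta-se\tpar:pas;\n"
)


@dataclass
class Entry:
    word: str             # written form ("ortho")
    phonemes: str         # phonetic transcription ("phon")
    category: str         # grammatical category ("cgram")
    frequency: float      # frequency score ("freqfilms2")
    syllables: list[str]  # syllabified phonetics ("syll"), split on "-"
    verb_info: str        # tense/mood/person info for verbs ("infover")


def load_entries(tsv_text: str) -> list[Entry]:
    """Parse tab-separated Lexique-style text into Entry records."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        Entry(
            word=row["ortho"],
            phonemes=row["phon"],
            category=row["cgram"],
            frequency=float(row["freqfilms2"] or 0),
            syllables=row["syll"].split("-"),
            verb_info=row["infover"],
        )
        for row in reader
    ]


entries = load_entries(SAMPLE)
```

With the real file, `load_entries(Path("Lexique383.tsv").read_text(encoding="utf-8"))` would be the equivalent call (the file name is hypothetical here).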

These two resources are the backbone of Pluvier's dictionary generation, allowing us to write a "script" (bit of an understatement) applying the rules of the theory to the Lexique database and spitting out a JSON dictionary to be used with Plover.
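The overall shape of such a script can be sketched in a few lines. This is not Pluvier's actual generator: the phoneme-to-key table below is a tiny, hypothetical stand-in for the LaSalle-derived rules, and each syllable simply becomes one stroke (so it produces `TA/SE` where LaSalle writes `TAS/E`). What it does show is the output Plover expects: a JSON object mapping outlines (strokes joined by `/`) to translations.

```python
import json

# Hypothetical, deliberately tiny phoneme -> steno-key table. The real theory
# distinguishes initial and final consonants, vowel clusters, and much more.
KEYS = {"t": "T", "s": "S", "a": "A", "e": "E", "o": "O", "u": "U"}


def syllable_to_stroke(syllable: str) -> str:
    """Map each phoneme of a syllable to its (hypothetical) key.

    Unknown phonemes raise KeyError in this toy version.
    """
    return "".join(KEYS[phoneme] for phoneme in syllable)


def build_dictionary(words: dict[str, list[str]]) -> dict[str, str]:
    """Build a Plover-style {outline: translation} mapping.

    `words` maps a written form to its syllabified phonetics,
    e.g. {"tassé": ["ta", "se"]}.
    """
    dictionary = {}
    for word, syllables in words.items():
        outline = "/".join(syllable_to_stroke(s) for s in syllables)
        dictionary.setdefault(outline, word)  # first word wins on a clash
    return dictionary


generated = build_dictionary({"tassé": ["ta", "se"], "sot": ["so"]})
print(json.dumps(generated, ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps accented translations like "tassé" readable in the output file instead of escaping them.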

Design objectives

  • Provide a French theory and dictionary for everyday use, using the standard Ireland steno layout.
  • Conflict-free. We should be able to write every word with a different outline.
  • Programmatically generated. Outlines should be generated according to the theory rules, as consistently as possible, and manually defined entries should be as rare as possible outside of briefs.
  • Predictability. Knowing the theory, the user should be able to write out every word phonetically, syllable by syllable, except for a few mandatory briefs for some of the most common words, just like Plover. Since those are the first words you learn and the ones you'll write most often, they won't be a problem.
  • Provide some form of syllable-dropping syllabification, à la Plover, to simplify outlines when wanted.
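The conflict-free objective can be checked mechanically: collect every candidate outline and see which ones are claimed by more than one word. The hypothetical sketch below does that and, following the frequency idea from the Lexique section, gives each contested outline to the most frequent word while reporting the others, which would then need a distinct outline (an asterisk variant, for instance). The frequencies in the example are made up.

```python
from collections import defaultdict


def resolve_conflicts(
    candidates: list[tuple[str, str, float]],
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Split (outline, word, frequency) candidates into winners and leftovers.

    Returns a Plover-style {outline: word} dictionary in which each contested
    outline goes to its most frequent word, plus a {outline: [other words]}
    report of the words that still need a distinct outline.
    """
    by_outline: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for outline, word, frequency in candidates:
        by_outline[outline].append((frequency, word))

    dictionary: dict[str, str] = {}
    unresolved: dict[str, list[str]] = {}
    for outline, claims in by_outline.items():
        # Highest frequency first; ties broken alphabetically for determinism.
        claims.sort(key=lambda claim: (-claim[0], claim[1]))
        dictionary[outline] = claims[0][1]
        if len(claims) > 1:
            unresolved[outline] = [word for _, word in claims[1:]]
    return dictionary, unresolved


# Made-up frequencies: "sur" and "sûr" sound the same and share a toy outline.
dictionary, unresolved = resolve_conflicts(
    [("SUR", "sur", 500.0), ("SUR", "sûr", 80.0), ("TAS", "tasse", 10.0)]
)
```

An empty `unresolved` report is then a simple, testable definition of "conflict-free" for a generated dictionary.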

Syllabification [WIP]

Here be some blab about how syllables will be chopped up in written-out outlines, taking a lot from Aerick's Lapwing syllabification specs, and talking about how that could be achieved with the CVC info from Lexique
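Pending that write-up, here is a deliberately naive sketch of how a C/V pattern (the kind of consonant/vowel information Lexique provides) could drive syllable splitting: a new syllable starts at a consonant directly followed by a vowel, as long as the current syllable already has a vowel. This is not Lapwing's specification; it ignores clusters such as "tr" that French usually keeps together, treats any label other than `C` or `V` as neither, and assumes one character per phoneme, as in Lexique's transcriptions.

```python
def syllabify(phonemes: str, pattern: str) -> list[str]:
    """Split a phoneme string using its C/V pattern (one label per phoneme)."""
    if len(phonemes) != len(pattern):
        raise ValueError("each phoneme needs exactly one C/V label")
    syllables: list[str] = []
    current = ""
    current_has_vowel = False
    # Pair each phoneme with its label and the following label (" " at the end).
    for phoneme, label, next_label in zip(phonemes, pattern, pattern[1:] + " "):
        # A consonant followed by a vowel opens a new syllable, but only once
        # the syllable being built already contains a vowel.
        if label == "C" and next_label == "V" and current_has_vowel:
            syllables.append(current)
            current, current_has_vowel = "", False
        current += phoneme
        current_has_vowel = current_has_vowel or label == "V"
    if current:
        syllables.append(current)
    return syllables
```

For example, `syllabify("tase", "CVCV")` gives `["ta", "se"]` and `syllabify("patat", "CVCVC")` gives `["pa", "tat"]`.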

pluvier's People

Contributors

stl74, vermoot

pluvier's Issues

Other words not in `tao_la_salle.json`

Hello,
I'll put every word I can't find today in this issue ;)

{
"ET": "été"
}

A question:
`/TAS/E` = "tassé", but how do I get "tasse et" (as in "ta tasse et ton thé", with the same letters in the same order)?

Some words from the book are not in `tao_la_salle.json`

Hello,
`/S*ES` is "zest" instead of "c'est cette" (as explained in "la tao cropped").

EDIT:
1)
`/TR` is "terre" and `/T-R` is "intérêt".
`/TR` should be "intérêt", and `/T-R` should be "terre".

`/SR` is "serre",
and `/SR` should be "sur".
But:
`/SUR` is "sûr", and `/S*UR` is "sur".
I think `/SUR` should be "sur", `/S*UR` should be "sûr", and `/SR` can be left for "serre". What do you think?

I discovered that after a verb or a noun, `/-S` adds an "s" ending. That's great. But in a phrase like `/TU/AU/-S`, that gives "tu a eus". Not great: it should be "tu as eu". (I haven't started on multi-syllable words yet, so I don't really know how that works. I'd like to propose `/AU/-S` for "as eu", but I'll do that once I get there.)

{
"TR": "intérêt",
"T-R": "terre",
"SUR": "sur",
"S*UR": "sûr"
}
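Why an extra `/AU/-S` entry would help: roughly speaking, when several stroke sequences match, Plover prefers the longest one defined in the dictionary. The sketch below is a simplified, hypothetical greedy longest-match lookup (Plover's real translator also handles undo, retroactive commands and much more) that only supports the `{^...}` attach operator. The four entries are made up for the example, not copied from `tao_la_salle.json`.

```python
def translate(strokes: list[str], dictionary: dict[str, str]) -> str:
    """Greedy longest-match translation of a stroke sequence (simplified)."""
    max_len = max((len(outline.split("/")) for outline in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(strokes):
        # Try the longest possible outline starting at stroke i first.
        for length in range(min(max_len, len(strokes) - i), 0, -1):
            outline = "/".join(strokes[i : i + length])
            if outline in dictionary:
                words.append(dictionary[outline])
                i += length
                break
        else:
            words.append(strokes[i])  # untranslated stroke, echoed as-is
            i += 1

    text = ""
    for word in words:
        if word.startswith("{^") and word.endswith("}"):
            text += word[2:-1]  # "{^s}" attaches "s" to the previous word
        else:
            text += (" " if text else "") + word
    return text


toy = {"TU": "tu", "AU": "eu", "-S": "{^s}"}  # hypothetical entries
print(translate(["TU", "AU", "-S"], toy))  # tu eus
print(translate(["TU", "AU", "-S"], {**toy, "AU/-S": "as eu"}))  # tu as eu
```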

EDIT FROM VERMOOT: I took the liberty of editing your comment to add backticks in some places where they were needed for readability. As a general rule, try to write any steno outlines between backticks ;)

Written-out numbers are figures in `tao_la_salle.json`

In #2, @TomT-homas wrote:

And then there’s the problem with numbers: they’re in figures, not in letters, at least for the ones I’ve tried so far…

{
"deux": "TKAO",
"trois": "TROEUZ",
"quatre": "KATS",
"huit": "AUT",
"douze": "TKOUZ",
"quatorze": "KORZ",
"seize": "SAEUZ",
"vingt": "VR-"
}

I think that might be a mistake on Ted's part: he wrote them as figures because pages 81-82 write them as figures (I guess to save page space?). From what I'm seeing elsewhere, they should actually be the written-out versions ("vingt-deux" as opposed to "22").
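If the figures do need to become words, the conversion for 0 to 99 is small enough to sketch. It follows the traditional spelling (hyphens inside compounds, "et" in 21, 31, ... 71, plural "quatre-vingts" only for 80). The `fix_numeric_translations` helper is hypothetical: it only rewrites translations that are plain integers below 100, such as a "22" that should read "vingt-deux".

```python
UNITS = [
    "zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
    "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize",
]
TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante", 6: "soixante"}


def number_to_french(n: int) -> str:
    """Spell out 0 <= n < 100 in French (traditional, pre-1990 hyphenation)."""
    if not 0 <= n < 100:
        raise ValueError("only 0-99 is supported in this sketch")
    if n <= 16:
        return UNITS[n]
    if n < 20:
        return "dix-" + UNITS[n - 10]
    if n < 70:
        tens, unit = divmod(n, 10)
        if unit == 0:
            return TENS[tens]
        if unit == 1:
            return TENS[tens] + " et un"
        return TENS[tens] + "-" + UNITS[unit]
    if n < 80:  # soixante-dix, soixante et onze, soixante-douze...
        if n == 71:
            return "soixante et onze"
        return "soixante-" + number_to_french(n - 60)
    if n == 80:
        return "quatre-vingts"
    return "quatre-vingt-" + number_to_french(n - 80)  # 81-99, no "et"


def fix_numeric_translations(dictionary: dict[str, str]) -> dict[str, str]:
    """Replace translations like "22" with their written-out form."""
    fixed = {}
    for outline, text in dictionary.items():
        if text.isascii() and text.isdigit() and int(text) < 100:
            fixed[outline] = number_to_french(int(text))
        else:
            fixed[outline] = text
    return fixed
```

For example, `fix_numeric_translations({"TKAO": "2", "VR-": "20"})` returns `{"TKAO": "deux", "VR-": "vingt"}`.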

Proposed briefs

The theory will require some briefs to be added in addition to the ones found in tao_la_salle.json. Here is where we'll keep track of brief suggestions, discuss them and possibly approve them.

My thought is that we could have a separate `pluvier_briefs.json` file with entries that get added on top of the generated dictionary, much like the `tao_la_salle.json` dictionary (which should probably be merged into `pluvier_briefs.json` eventually, once the actual briefs are separated from the book's theory examples).
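Layering a briefs file over the generated dictionary is straightforward; the one real design decision is precedence. In the hypothetical sketch below, briefs win over generated outlines, and every overridden outline is reported so that a brief never silently shadows a theory-generated entry. Plover can also layer dictionaries at load time (dictionaries higher in its list take priority); merging up front simply makes the overrides visible for review.

```python
import json
from pathlib import Path


def merge_briefs(
    generated: dict[str, str], briefs: dict[str, str]
) -> tuple[dict[str, str], dict[str, tuple[str, str]]]:
    """Return (merged, overridden), with briefs taking precedence.

    `overridden` maps each outline present in both inputs with different
    translations to (generated translation, brief translation).
    """
    overridden = {
        outline: (generated[outline], brief)
        for outline, brief in briefs.items()
        if outline in generated and generated[outline] != brief
    }
    return {**generated, **briefs}, overridden


def merge_files(generated_path: Path, briefs_path: Path, out_path: Path) -> None:
    """Merge two Plover JSON dictionaries on disk and write the result."""
    generated = json.loads(generated_path.read_text(encoding="utf-8"))
    briefs = json.loads(briefs_path.read_text(encoding="utf-8"))
    merged, overridden = merge_briefs(generated, briefs)
    for outline, (old, new) in sorted(overridden.items()):
        print(f"{outline}: {old!r} -> {new!r}")
    out_path.write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```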

How to install Pluvier?

Hello,
I'm trying to figure out how to install Pluvier on Plover.

Running `plover -s plover_plugins install git+https://github.com/Vermoot/Pluvier` doesn't work, and the project isn't listed in Plover's plugin manager.

If I want to test Pluvier, how can I go about installing it?

Thanks

Usage of GLÀFF with Lexique?

GLÀFF is a project similar to Lexique. The main difference is its size: it contains over 1.4 million entries, ten times larger than Lexique, with very similar information such as its entries’ phonetics (IPA and X-SAMPA, both with partial syllabification), lemma (“aériez” has “aérer” as its lemma), frequency (several metrics are available), and a very precise morphosyntactic description of each entry.

The only drawback I see in using GLÀFF instead of Lexique is that some Lexique entries are missing from GLÀFF, such as multi-word expressions treated as a single entry, like "a priori". This is why I think GLÀFF could be used to extend Lexique's data rather than replace it.
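A minimal sketch of that "extend, don't replace" idea, assuming both lexicons have already been loaded into dicts keyed by written form (the loading step and the field names are hypothetical): Lexique entries are kept as-is, and GLÀFF only contributes words Lexique lacks. Keying by written form is a simplification; since both lexicons list homographs as separate entries, a real merge would likely need to key on form plus grammatical category.

```python
def extend_lexicon(
    lexique: dict[str, dict], glaff: dict[str, dict]
) -> dict[str, dict]:
    """Combine two {written form: entry} lexicons, preferring Lexique.

    Each resulting entry records its origin under a "source" key so that
    GLÀFF-only words can be reviewed separately.
    """
    combined = {
        word: {**entry, "source": "lexique"} for word, entry in lexique.items()
    }
    for word, entry in glaff.items():
        if word not in combined:  # only fill gaps, never overwrite Lexique
            combined[word] = {**entry, "source": "glaff"}
    return combined


combined = extend_lexicon(
    {"a priori": {"cgram": "ADV"}, "tasse": {"cgram": "NOM"}},
    {"tasse": {"cgram": "NOM"}, "aériez": {"lemma": "aérer"}},
)
```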
