unitexgramlab / unitex-doc-usermanual
Unitex/GramLab User's Manual
Home Page: https://unitexgramlab.org
License: Other
Options -b and -z in names of dictionary-graphs are described in Section 3.8.3 under 'Exporting produced entries as a morphological-mode dictionary' but should be exemplified to facilitate their use.
Also, the use of option -z is an exception to the rule mentioned in Section 3.8.1 'The order in which dictionaries with the same priority are applied does not matter.' The order of application mentioned in Section 3.8.3 'when other dictionary graphs are applied later' is not only the order determined by priority rules, but also the order of occurrence of the dictionary names in the command line.
Finally, the manual should specify whether the name of a morphological dictionary graph can invoke options -b and -z.
Maxence Robin made changes in the French manual (chapters 5 and 13) in pull request #17.
Make equivalent changes in the English manual.
Users of the Unitex IDE can now change the font and size of menu characters through the Info > Preferences > General menu. This setting affects the Config file.
This feature is still undocumented.
In Section 7.2.2, the statement "This grammar has to be called Norm.fst2 and must be placed in your working directory, in the subdirectory /Graphs/Normalization of the language" is no longer true. The grammar can now have another name and be placed in another directory.
Users of the Unitex IDE can set dictionary lookup to be tolerant to vowel omission in Arabic.
This feature is still undocumented.
Describe in more detail how to add a new language: which directories, which files, which constraints, and how to submit.
We should mention in the manual the behaviour reported by Denis Maurel and Maxence Robin on September 10, 2018. When a brace-enclosed lexical tag occurs in the text and Locate Pattern tries to match it against a query, the program treats the lexical tag as a single token. The lexical tag can therefore match one token in the query, but not a sequence of several tokens. As a result, if the inflected form in the lexical tag is a multiword form, it won't match an identical multiword form in the query. Users do not anticipate this behaviour, because a sequence of tokens in a query usually matches an identical sequence of tokens in the text. A workaround is to insert the multiword form in the query in the form of a lemma.
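The behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the Locate Pattern implementation: the `tokenize`, `tag_form` and `matches` helpers are hypothetical, spaces are not kept as tokens, and the sample tag `{hot dog,hot dog.N}` is invented for the illustration.

```python
import re

def tokenize(text):
    # Toy tokenizer: a brace-enclosed lexical tag is kept as ONE token;
    # otherwise each word or punctuation mark is a token (spaces are dropped).
    return re.findall(r"\{[^{}]*\}|\w+|[^\w\s]", text)

def tag_form(token):
    # For a lexical tag such as "{form,lemma.CODE}", return the inflected form;
    # for an ordinary token, return the token itself.
    if token.startswith("{") and token.endswith("}"):
        return token[1:-1].split(",", 1)[0]
    return token

def matches(query_tokens, text_tokens):
    # Token-by-token comparison: each query token must equal exactly one text token.
    n = len(query_tokens)
    return any(
        all(tag_form(t) == q for t, q in zip(text_tokens[i:i + n], query_tokens))
        for i in range(len(text_tokens) - n + 1)
    )

plain = tokenize("He ate a hot dog yesterday")
tagged = tokenize("He ate a {hot dog,hot dog.N} yesterday")
query = ["hot", "dog"]  # a two-token query

found_in_plain = matches(query, plain)    # two text tokens match two query tokens
found_in_tagged = matches(query, tagged)  # the tag is a single token, so no match
```

In this model the multiword tag counts as one token, so a two-token query can never align with it, while a single-word tag such as `{dogs,dog.N}` still matches the one-token query `dogs`.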
Document the 'match word boundaries' option in Preferences > Language (so that "nowhere" and "now here" don't match for the automaton-intersection search algorithm).
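The effect of such an option can be sketched with two toy comparison functions. This is only an illustration of the idea, assuming that "matching word boundaries" means the token segmentation must agree. It does not reproduce the automaton-intersection algorithm, and both function names are hypothetical.

```python
def same_ignoring_boundaries(a, b):
    # Compare the character sequences only, with all whitespace removed.
    return "".join(a.split()) == "".join(b.split())

def same_with_boundaries(a, b):
    # Compare the sequences of words, so the segmentation must agree too.
    return a.split() == b.split()

loose = same_ignoring_boundaries("nowhere", "now here")
strict = same_with_boundaries("nowhere", "now here")
```

Here `loose` is true because both strings contain the same letters in the same order, whereas `strict` is false because one string is a single word and the other is two.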
The "No separator normalization" option in the preprocessing dialog box is not documented yet. This option is unchecked by default. When it is checked:
- boxes with whitespace separators in this graph recognize whitespace separators in the input text even if they are several in a row;
- a transition between two boxes in this graph does not recognize a sequence of several spaces.
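For context, the separator normalization that this option switches off can be sketched as follows. The sketch assumes the default rule is the one usually described for text normalization: a run of separators containing at least one newline becomes a single newline, and any other run becomes a single space. The function name and the exact set of separator characters are assumptions made for the illustration.

```python
import re

def normalize_separators(text):
    # Collapse each run of separators (space, tab, carriage return, newline):
    # a run containing a newline becomes one newline, any other run one space.
    def collapse(match):
        return "\n" if "\n" in match.group(0) else " "
    return re.sub(r"[ \t\r\n]+", collapse, text)

normalized = normalize_separators("two  spaces,\t a tab \n\n and a blank line")
```

When this step is skipped, runs of several separators survive in the text, which is why the graph has to cope with them as described in the two points above.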
Document how to copy the list of subgraphs called by a graph (cf. Section 5.2.2 and commit UnitexGramLab/gramlab-ide@172cbb1)
Some users are confused about the tools to be used to handle dictionary-graphs: graph tools to construct the dictionary-graph, but dictionary tools to use it.
This question might be documented more precisely.
The manual says a box invoking a subgraph cannot have an output (Section 6.2.3). Apparently this is no longer true since pull request UnitexGramLab/unitex-core#19 (26 July 2016).
The end of Subsection Input Variables (5.2.5) mentions testing if a variable has been set and refers to Subsection 6.7.5 (Transducer output with variables): it should refer to Subsection 6.9.1 (Testing variables), which documents this topic. The reference to section-variables should be replaced by a reference to Subsection 6.9.1.
There is a spelling error in 12-cassys_FR_utf8.tex: "doit être répéter" should read "doit être répété".
I am declaring this issue as the beginning of a test for editing the user manual. I am new to GitHub and GitHub Desktop.
Align the English and French versions of the documentation of DumpOffsets (chapter 14). Each version lacks some information that is present in the other.
Unitex/GramLab has never accepted a <MIX> lexical mask. It is ignored when it occurs in a local-grammar graph. This lexical mask probably existed in Intex but was not retained by Sébastien during the implementation of Unitex. I am not sure what it used to mean. In the <MIX> topic on the users' forum (13 October 2015), no users argued in favour of a <MIX> lexical mask. For consistency we should replace the figures with graphs containing <MIX> in the doc. Denis Maurel provided a version of the French sentence-splitting graph without <MIX> on 24 May 2018.
Document the tag filtering button in the table display of the text automaton (cf. Section 7.8 and commit UnitexGramLab/gramlab-ide#72).
Document how to name a .txt or .info file documenting a dictionary (cf. Section 15.8.3 and commit UnitexGramLab/gramlab-ide@f0fd037).
It is necessary to update the User Manual, section 6.5, Exploring grammar paths. The option "Do not explore subgraphs recursively" is now "Explore subgraphs independently".
The DELAS format should be documented in the same detail as the DELAF format. The specification should ensure that inflecting a legal DELAS produces a legal DELAF.
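The requirement that inflection maps a legal DELAS entry to legal DELAF entries can be sketched as a check. This is a deliberately simplified model, not the official specification: the two regular expressions ignore escaped characters and comments, the `inflect` helper and its list of (suffix, inflectional code) pairs are hypothetical stand-ins for a real inflection graph, and the sample entry `horse,N4+Anl` is used only for illustration.

```python
import re

# Simplified DELAS line: lemma,code+trait+trait
DELAS_RE = re.compile(
    r"^(?P<lemma>[^,.]+),(?P<code>[^+:.,]+)(?P<traits>(?:\+[^+:.,]+)*)$"
)
# Simplified DELAF line: form,lemma.code+trait+trait:infl:infl
DELAF_RE = re.compile(
    r"^(?P<form>[^,.]+),(?P<lemma>[^,.]*)\.(?P<code>[^+:.,]+)"
    r"(?P<traits>(?:\+[^+:.,]+)*)(?P<infl>(?::[^+:.,]+)*)$"
)

def inflect(delas_line, paradigm):
    # paradigm: list of (suffix, inflectional_code) pairs, a toy substitute
    # for the inflection graph named by the DELAS code.
    m = DELAS_RE.match(delas_line)
    if m is None:
        raise ValueError(f"not a legal DELAS entry in this model: {delas_line!r}")
    lemma = m["lemma"]
    return [
        f"{lemma}{suffix},{lemma}.{m['code']}{m['traits']}:{infl}"
        for suffix, infl in paradigm
    ]

entries = inflect("horse,N4+Anl", [("", "s"), ("s", "p")])
all_legal = all(DELAF_RE.match(entry) for entry in entries)
```

A specification written at this level of detail for both formats would let the property be tested mechanically on real dictionaries.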
Document the Find and Replace button on the text automaton (cf. pull request UnitexGramLab/gramlab-ide#68).
Document how to open a subgraph with a right-click (cf. pull request UnitexGramLab/gramlab-ide#2).
Document the opening curly brace ('{') as a special symbol in graphs in Section 5.2.7 on special symbols.
Document keyboard shortcuts to menu elements (cf. commit UnitexGramLab/gramlab-ide@656c760).
The graph editor now has 'find and replace' functionality developed by phmz (issues #24 and #26 of gramlab-ide).
phmz has also documented this feature, and I promised him I would integrate his document into the user's manual.
Mention that contexts are allowed in morphological dictionary-graphs, provided they don't occur in the morphological mode. The manual only mentions that they are forbidden in morphological mode.
In addition to documenting the Unitex GUI and the commands, the user manual should also describe the use of Unitex/GramLab through scripts.
Document how to save a processed text (cf. commit UnitexGramLab/gramlab-ide@d1f8780).