unitexgramlab / unitex-doc-usermanual

Unitex/GramLab User's Manual

Home Page: https://unitexgramlab.org

License: Other

TeX 81.28% Shell 0.31% Perl 18.41%

unitexgramlab manual

unitex-doc-usermanual's People

cwu03 wuxuan632 nikhilgupta23 ml-0613 4-4-4-4 fyh828 mukarr denismaurel kalkhas maxencerobin eradan94 little-min eric-laporte

unitex-doc-usermanual's Issues

Exemplify options -b and -z in names of dictionary-graphs

Options -b and -z in names of dictionary-graphs are described in Section 3.8.3 under 'Exporting produced entries as a morphological-mode dictionary' but should be exemplified to facilitate their use.
Also, the use of option -z is an exception to the rule stated in Section 3.8.1: 'The order in which dictionaries with the same priority are applied does not matter.' The order of application referred to in Section 3.8.3 ('when other dictionary graphs are applied later') is determined not only by the priority rules, but also by the order in which the dictionary names occur on the command line.
Finally, the manual should specify whether the name of a morphological dictionary graph can invoke options -b and -z.

Make in English manual changes made in French manual

Maxence Robin made changes in the French manual (chapters 5 and 13) in pull request #17.
Make equivalent changes in the English manual.

Undocumented feature: change font and size of menu characters

Users of the Unitex IDE can now change the font and size of menu characters through the Info > Preferences > General menu. This setting affects the Config file.
This feature is still undocumented.

small anachronism in section about normalization

In Section 7.2.2, the statement "This grammar has to be called Norm.fst2 and must be placed in your working directory, in the subdirectory /Graphs/Normalization of the language" is no longer true. The grammar can now have another name and be placed in another directory.

Undocumented feature: dictionary lookup can be set to be tolerant to vowel omission in Arabic

Users of the Unitex IDE can set dictionary lookup to be tolerant to vowel omission in Arabic.
This feature is still undocumented.

Describe in more detail how to add a new language

Describe in more detail how to add a new language: which directories, which files, which constraints, how to submit.

Locate doesn't match polylexical tags on tagged texts

We should mention in the manual the behaviour reported by Denis Maurel and Maxence Robin on September 10, 2018. When a brace-enclosed lexical tag occurs in the text and Locate Pattern tries to match it against a query, the program treats the lexical tag as a single token. The lexical tag can therefore match one token in the query, but not a sequence of several tokens. As a result, if the inflected form in the lexical tag is a multiword form, it will not match an identical multiword form in the query. Users do not expect this behaviour, because a sequence of tokens in a query usually matches an identical sequence of tokens in the text. A workaround is to insert the multiword form in the query as a lemma.
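The token-count mismatch described above can be illustrated with a toy model. This is only a sketch, not Unitex's implementation: the tokenizer, the `{inflected,lemma.codes}` parsing, and the function names `tokenize`, `tag_form` and `locate` are all hypothetical simplifications, and the lemma workaround is not modelled.

```python
import re

# Toy tokenizer (assumption): a brace-enclosed lexical tag is ONE token;
# otherwise a token is a run of letters or a single other character.
TOKEN_RE = re.compile(r"\{[^{}]*\}|[^\W\d_]+|.", re.DOTALL)


def tokenize(s):
    return TOKEN_RE.findall(s)


def tag_form(token):
    """Return the inflected form of a lexical tag, or the token unchanged."""
    if token.startswith("{") and token.endswith("}") and "," in token:
        return token[1:-1].split(",", 1)[0]
    return token


def locate(text, query):
    """Return start indices where each query token matches exactly one text token."""
    text_tokens, query_tokens = tokenize(text), tokenize(query)
    hits = []
    for i in range(len(text_tokens)):
        window = text_tokens[i:i + len(query_tokens)]
        if len(window) == len(query_tokens) and all(
            tag_form(t) == q for t, q in zip(window, query_tokens)
        ):
            hits.append(i)
    return hits
```

In this model, the query `hard disk` is three tokens (`hard`, a space, `disk`). It matches the untagged text `a hard disk failed`, but not `a {hard disk,.N} failed`, where the tag occupies a single token position.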

Document the 'match word boundaries' option

Document the 'match word boundaries' option in Preferences > Language (so that "nowhere" and "now here" don't match for the automaton-intersection search algorithm).

"No separator normalization" option undocumented

The "No separator normalization" option in the preprocessing dialog box is not documented yet. This option is unchecked by default. When it is checked:

- Sequences of several whitespace characters in the input text are not simplified in the output of the preprocessing.
- If the dialog box applies a preprocessing graph in REPLACE mode:
  - boxes with whitespace separators in this graph recognize whitespace separators in the input text even if they are several in a row;
  - a transition between two boxes in this graph does not recognize a sequence of several spaces.
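The first effect listed above (whitespace sequences kept or simplified) can be sketched as follows. This is an illustration only, not the Unitex code: the function name and its parameter are hypothetical, and the collapsing rule (a run containing a newline becomes one newline, any other run becomes one space) is an assumption about how the default separator normalization behaves.

```python
import re

# Assumption: separators are space, tab, carriage return and newline.
SEPARATOR_RUN = re.compile(r"[ \t\r\n]+")


def normalize_separators(text, no_separator_normalization=False):
    """Collapse runs of separators unless the option is checked."""
    if no_separator_normalization:
        # Option checked: whitespace sequences are left untouched.
        return text

    def collapse(match):
        # Assumed rule: keep a line break if the run contains one.
        return "\n" if "\n" in match.group(0) else " "

    return SEPARATOR_RUN.sub(collapse, text)
```

With the option unchecked (the default), `"a  b \n c"` becomes `"a b\nc"`. With the option checked, the text is returned unchanged.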

Document how to copy list of called subgraphs

Document how to copy the list of subgraphs called by a graph (cf. Section 5.2.2 and commit UnitexGramLab/gramlab-ide@172cbb1)

Doc lacks precision on dictionary-graph tools

Some users are confused about the tools to be used to handle dictionary-graphs: graph tools to construct the dictionary-graph, but dictionary tools to use it.
This question might be documented more precisely.

Anachronism about outputs on boxes invoking subgraphs

The manual says that a box invoking a subgraph cannot have an output (Section 6.2.3). Apparently this is no longer true since pull request UnitexGramLab/unitex-core#19 (26 July 2016).

wrong reference

The end of Subsection Input Variables (5.2.5) mentions testing if a variable has been set and refers to Subsection 6.7.5 (Transducer output with variables): it should refer to Subsection 6.9.1 (Testing variables), which documents this topic. The reference to section-variables should be replaced by a reference to Subsection 6.9.1.

spelling error

There is a spelling error in 12-cassys_FR_utf8.tex: doit être répéter.
I am declaring this issue as the beginning of a test for editing the user manual. I am new to GitHub and GitHub Desktop.

Align English and French versions of DumpOffsets doc

Align the English and French versions of the DumpOffsets documentation (chapter 14). Each version is missing some information that the other contains.

Remove <MIX> from doc

Unitex/GramLab has never accepted a <MIX> lexical mask. It is ignored when it occurs in a local-grammar graph. This lexical mask probably existed in Intex but was not retained by Sébastien during the implementation of Unitex. I am not sure what it used to mean. In the <MIX> topic on the users' forum (13 October 2015), no users argued in favour of a <MIX> lexical mask. For consistency we should replace the figures with graphs containing <MIX> in the doc. Denis Maurel provided a version of the French sentence-splitting graph without <MIX> on 24 May 2018.