
unitex-doc-usermanual's People

Contributors

clmartineau, denismaurel, eric-laporte, gvollant, kalkhas, martinec, maxencerobin, mukarr, nathwhy, nikhilgupta23

unitex-doc-usermanual's Issues

Exemplify options -b and -z in names of dictionary-graphs

Options -b and -z in names of dictionary-graphs are described in Section 3.8.3 under 'Exporting produced entries as a morphological-mode dictionary' but should be exemplified to facilitate their use.
Also, the use of option -z is an exception to the rule stated in Section 3.8.1: 'The order in which dictionaries with the same priority are applied does not matter.' The order of application mentioned in Section 3.8.3 ('when other dictionary graphs are applied later') is not only the order determined by priority rules, but also the order in which the dictionary names occur on the command line.
Finally, the manual should specify whether the name of a morphological dictionary graph can invoke options -b and -z.

small anachronism in section about normalization

In Section 7.2.2, the statement "This grammar has to be called Norm.fst2 and must be placed in your working directory, in the subdirectory /Graphs/Normalization of the language" is no longer true. The grammar can now have another name and be placed in another directory.

Locate doesn't match polylexical tags on tagged texts

We should mention in the manual the behaviour reported by Denis Maurel and Maxence Robin on September 10, 2018. When a brace-enclosed lexical tag occurs in the text, and when Locate Pattern tries to match this lexical tag with a query, the program considers the lexical tag as a token, which means the lexical tag can match a token in the query, but not a sequence of several tokens in the query. Therefore, if the inflected form in the lexical tag is multiword, it won't match an identical multiword form in the query. This behaviour is not anticipated by the user, because usually a sequence of tokens in a query matches an identical sequence of tokens in the text. A trick to circumvent this feature is to insert the multiword form in the query in the form of a lemma.
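The behaviour described above can be illustrated with a toy model. This is a minimal sketch, not Unitex's actual implementation: the tokenizer, the function names and the example tag `{hot dog,hot dog.N}` are all assumptions made for illustration. It only shows why a brace-enclosed tag that counts as a single token cannot match a query made of several tokens.

```python
import re

# Toy tokenizer (an assumption, not Unitex code): a brace-enclosed lexical
# tag is kept as ONE token; other text is split into words and punctuation.
TOKEN_RE = re.compile(r"\{[^{}]*\}|\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def surface_form(token):
    """Return the inflected form of a lexical tag, or the token itself."""
    if token.startswith("{") and token.endswith("}"):
        return token[1:-1].split(",", 1)[0]
    return token


def matches_at(text_tokens, query_tokens, start):
    """Match token by token: each query token consumes one text token."""
    window = text_tokens[start:start + len(query_tokens)]
    if len(window) != len(query_tokens):
        return False
    return all(surface_form(t) == q for t, q in zip(window, query_tokens))


def find(text, query):
    """Return the token positions where the query matches the text."""
    text_tokens = tokenize(text)
    query_tokens = query.split()
    return [i for i in range(len(text_tokens))
            if matches_at(text_tokens, query_tokens, i)]


# Untagged text: the two-token query matches two text tokens.
plain = find("I ate a hot dog", "hot dog")
# Tagged text: the multiword tag is one token, so the two-token query fails.
tagged = find("I ate a {hot dog,hot dog.N}", "hot dog")
# A single-word tag still matches a single-token query.
single = find("I ate a {dog,dog.N}", "dog")
```

In this model the multiword query finds the untagged occurrence but not the tagged one, while a single-word tag remains reachable by a single-token query.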

"No separator normalization" option undocumented

The "No separator normalization" option in the preprocessing dialog box is not documented yet. This option is unchecked by default. When it is checked:

  • Sequences of several whitespace characters in the input text are not simplified in the output of the preprocessing.
  • If the dialog box applies a preprocessing graph in REPLACE mode:
      • boxes with whitespace separators in this graph recognize whitespace separators in the input text even if they are several in a row;
      • a transition between two boxes in this graph does not recognize a sequence of several spaces.
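The first effect of the option can be sketched as follows. This is a hedged illustration only: the function name and its parameter are hypothetical, and the rule that a run of separators containing a newline collapses to a newline is an assumption about the default behaviour, not something stated in this issue.

```python
import re


def normalize_separators(text, no_separator_normalization=False):
    """Toy model of separator handling during preprocessing (hypothetical)."""
    if no_separator_normalization:
        # Option checked: whitespace sequences are left untouched.
        return text

    def collapse(match):
        # Assumption: a run containing a newline becomes one newline,
        # any other run of separators becomes one space.
        return "\n" if "\n" in match.group(0) else " "

    return re.sub(r"[ \t\r\n]+", collapse, text)


default_output = normalize_separators("a  b\t\tc")
preserved_output = normalize_separators("a  b\t\tc", no_separator_normalization=True)
```

With the option unchecked, runs of separators are simplified; with it checked, the input text passes through unchanged.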

Doc lacks precision on dictionary-graph tools

Some users are confused about which tools to use with dictionary-graphs: graph tools are used to construct a dictionary-graph, but dictionary tools are used to apply it.
This distinction should be documented more precisely.

wrong reference

The end of Subsection Input Variables (5.2.5) mentions testing if a variable has been set and refers to Subsection 6.7.5 (Transducer output with variables): it should refer to Subsection 6.9.1 (Testing variables), which documents this topic. The reference to section-variables should be replaced by a reference to Subsection 6.9.1.

spelling error

There is a spelling error in 12-cassys_FR_utf8.tex: "doit être répéter" should read "doit être répété".
I am declaring this issue as the beginning of a test for editing the user manual. I am new to GitHub and GitHub Desktop.

Remove <MIX> from doc

Unitex/GramLab has never accepted a <MIX> lexical mask. It is ignored when it occurs in a local-grammar graph. This lexical mask probably existed in Intex but was not retained by Sébastien during the implementation of Unitex. I am not sure what it used to mean. In the <MIX> topic on the users' forum (13 October 2015), no users argued in favour of a <MIX> lexical mask. For consistency we should replace the figures with graphs containing <MIX> in the doc. Denis Maurel provided a version of the French sentence-splitting graph without <MIX> on 24 May 2018.

Document the DELAS format

The DELAS format should be documented in the same detail as the DELAF format. The specification should ensure that inflecting a legal DELAS produces a legal DELAF.
