Comments (4)

dpelle commented on May 19, 2024

Thinking about it further, maybe the column being 7 instead of 6 is not a bug after all. It depends on what is meant by "column". I think LT counts UTF-16 code units, whereas I expected it to count grapheme clusters (see the definitions of those terms at http://www.unicode.org/faq/char_combmark.html). Either way, this needs to be clarified somewhere in the documentation. Right now the meaning of "column" is rather ambiguous, both in the command-line output and in the --api output, when it comes to:

  • combining characters
  • surrogate pairs (characters that do not fit in a single 16-bit code unit)
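To make the ambiguity concrete, here is a minimal Java sketch that counts the same string in three different units. It only uses the standard library; the class and method names (`ColumnUnits`, `codeUnits`, `codePoints`, `graphemes`) are made up for illustration and are not part of LT. `BreakIterator` is used as an approximation of grapheme clusters, and its exact behavior for complex sequences may vary between Java versions.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ColumnUnits {
    // UTF-16 code units: what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // User-perceived characters, as segmented by BreakIterator.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String combining = "e\u0301";      // e followed by COMBINING ACUTE ACCENT
        String surrogate = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair
        for (String s : new String[] {combining, surrogate}) {
            System.out.println(codeUnits(s) + " code units, "
                    + codePoints(s) + " code points, "
                    + graphemes(s) + " grapheme clusters");
        }
    }
}
```

For the combining sequence the three counts are 2, 2 and 1; for the surrogate pair they are 2, 1 and 1. A "column" therefore depends on which of these units is meant.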

But in any case:

  • the fact that the command line underlines at the wrong place is definitely a bug
  • and the fact that words with combining chars are not recognized as correct words is also a bug. Hunspell on the command line recognizes words with combining chars, so this is a bug in LT and not in Hunspell. In fact, the French Hunspell file languagetool-language-modules/fr/target/classes/org/languagetool/resource/fr/hunspell contains a rule for this combining char (ICONV é é, i.e. mapping e + U+0301 to the precomposed é), but it seems to be ignored by the LT rule HUNSPELL_NO_SUGGEST_RULE.

from languagetool.

dpelle commented on May 19, 2024

The following picture illustrates the 2 issues with Unicode characters caused by LanguageTool inside Vim:

  • The first line does not use a combining char and everything looks fine: the incorrect word Foobar is highlighted at the correct location (line 1, column 6).
  • The second line uses a combining char (U+0065 + U+0301) for the e with acute accent. Notice that the first word is not recognized, and the second word Foobar is highlighted at the wrong location (line 2, column 7); it should be at (line 2, column 6), i.e. the same column as on the first line.
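The off-by-one can be reproduced outside Vim with a small Java sketch. The sample line "Créé Foobar" is a hypothetical stand-in for the text in the screenshot, and `ColumnDemo` with its two methods is illustrative only; it is not how LT computes columns. It compares a 1-based column counted in UTF-16 code units with one counted in user-perceived characters via `BreakIterator`.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ColumnDemo {
    // 1-based column of `word`, counted in UTF-16 code units.
    static int utf16Column(String line, String word) {
        int offset = line.indexOf(word);
        if (offset < 0) {
            throw new IllegalArgumentException("word not found in line");
        }
        return offset + 1;
    }

    // 1-based column of `word`, counted in user-perceived characters.
    static int graphemeColumn(String line, String word) {
        int offset = line.indexOf(word);
        if (offset < 0) {
            throw new IllegalArgumentException("word not found in line");
        }
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(line);
        int column = 1;
        // Each boundary at or before the word's offset ends one character.
        for (int b = it.next(); b != BreakIterator.DONE && b <= offset; b = it.next()) {
            column++;
        }
        return column;
    }

    public static void main(String[] args) {
        String precomposed = "Cr\u00E9\u00E9 Foobar";  // both accents precomposed
        String decomposed = "Cre\u0301\u00E9 Foobar";  // first accent as e + U+0301
        System.out.println(utf16Column(precomposed, "Foobar"));   // 6
        System.out.println(utf16Column(decomposed, "Foobar"));    // 7
        System.out.println(graphemeColumn(decomposed, "Foobar")); // 6
    }
}
```

Counting in code units yields column 7 for the decomposed line, matching the misplaced highlight, while counting in user-perceived characters yields column 6 for both lines.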

[Image: incorrect-hightlight-combining-char]

milekpl commented on May 19, 2024

We have input and output conversion already in MorfologikSpeller, so we could apply these conversions for ligatures etc. But I think it makes sense to normalize UTF-8 using standard Java code anyway, so that LT would treat everything the same.
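As a rough illustration of what normalizing with standard Java code could look like, the sketch below uses `java.text.Normalizer` to convert input to NFC, so that e + U+0301 becomes the single precomposed character U+00E9. The class and method names (`NormalizeDemo`, `toNfc`) are hypothetical, and where such a call would belong inside LT is not decided here.

```java
import java.text.Normalizer;

public class NormalizeDemo {
    // Compose combining sequences into precomposed characters where possible (NFC).
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";            // e + COMBINING ACUTE ACCENT
        String composed = toNfc(decomposed);      // single character U+00E9
        System.out.println(decomposed.length());  // 2
        System.out.println(composed.length());    // 1
        System.out.println(composed.equals("\u00E9")); // true
    }
}
```

Note that normalization changes string lengths, so any positions computed on the normalized text would have to be mapped back to the original input before being reported.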

bml1g12 commented on May 19, 2024

Is there any workaround for Unicode text being underlined incorrectly in the command-line output?
