Comments (4)
Thinking about it further, maybe the column being 7 instead of 6 is not a bug after all. It depends on what is meant by "column". I think LT counts UTF-16 code units, whereas I expected it to count grapheme clusters (see the definitions of those terms at http://www.unicode.org/faq/char_combmark.html). However, this needs to be clarified somewhere in the documentation. Right now the meaning of "column" is rather ambiguous, both in the command-line output and in the output of --api, when it comes to:
- combining characters
- surrogate pairs (characters that do not fit in a single 16-bit code unit)
But in any case:
- the fact that the command line underlines at the wrong place is definitely a bug
- and the fact that words with combining chars are not recognized as correct words is also a bug. Hunspell on the command line recognizes words with combining chars, so this is a bug in LT and not in Hunspell. In fact, the French Hunspell file languagetool-language-modules/fr/target/classes/org/languagetool/resource/fr/hunspell contains a rule for such a combining char (ICONV é é, i.e. U+0065 + U+0301 mapped to U+00E9), but it seems to be ignored by the LT rule HUNSPELL_NO_SUGGEST_RULE.
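To make the three possible meanings of "column" concrete, here is a minimal Java sketch that counts UTF-16 code units, code points, and (approximately) grapheme clusters for the same text. The class and method names are made up for illustration, and this is not how LT computes its columns; it only uses standard JDK calls (`String.length()`, `String.codePointCount()`, `java.text.BreakIterator`).

```java
import java.text.BreakIterator;

public class ColumnUnits {
    // UTF-16 code units: this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points: a surrogate pair counts once,
    // but a base letter plus a combining mark still counts twice.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // User-perceived characters, approximated with the JDK's
    // character BreakIterator (base letter + combining mark = 1).
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";      // é as a single code point
        String combining = "e\u0301";       // e + combining acute accent
        String astral = "\uD83D\uDE00";     // U+1F600, needs a surrogate pair

        for (String s : new String[] {precomposed, combining, astral}) {
            System.out.println(codeUnits(s) + " code units, "
                    + codePoints(s) + " code points, "
                    + graphemes(s) + " grapheme clusters");
        }
    }
}
```

With the combining form, the code-unit count is one higher than the grapheme count, which would explain an offset of one column per combining character before the reported position.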
from languagetool.
The following picture illustrates the two issues with Unicode characters caused by LanguageTool inside Vim:
- The first line does not use a combining char and everything looks fine: the incorrect word Foobar is highlighted at the correct location (line 1, column 6).
- The second line uses a combining char (U+0065 + U+0301) for the e with acute accent. Notice that the first word is not recognized, and that the second word Foobar is highlighted at the wrong location (line 2, column 7); it should be (line 2, column 6), i.e. the same column as on the first line.
We have input and output conversion already in MorfologikSpeller, so we could apply these conversions for ligatures etc. But I think it makes sense to normalize the Unicode input using standard Java code anyway, so that LT would treat everything the same.
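As a rough sketch of what "standard Java code" could look like here, the JDK's `java.text.Normalizer` can compose a base letter and a combining mark into the precomposed character. The choice of NFC and the class and method names below are assumptions for illustration, not what LT actually does.

```java
import java.text.Normalizer;

public class NfcExample {
    // Compose base letters and combining marks into precomposed
    // characters where Unicode defines one (NFC is an assumption here).
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String combining = "e\u0301";   // e + combining acute accent
        String normalized = toNfc(combining);

        // The decomposed input has two code units, the composed form one.
        System.out.println(combining.length() + " -> " + normalized.length());
        System.out.println(normalized.equals("\u00e9"));
    }
}
```

One thing to keep in mind with this approach: positions computed on the normalized text would need to be mapped back to the original input if the reported columns are supposed to refer to what the user typed.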
Is there any workaround for Unicode text being underlined incorrectly in the command-line output?