Comments (7)

jasoniangreen commented on May 25, 2024

Hi there and thanks for reaching out.

'विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में'.length reports 41, which I think is due to the multibyte characters required to write Hindi. I don't think this is something the AJV core library should support, but given AJV's extensibility you could write your own keyword to handle this text the way you think it should be handled.

edit: actually, it is not about Unicode surrogate pairs (which AJV counts as a single character). It seems to be related to how multiple characters, particularly accents, are grouped together in Hindi.

For example, look at the result of 'विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में'.split('')

(41) ['व', 'ि', 'क', 'ी', ' ', 'म', 'े', 'ड', ' ', 
'म', 'े', 'ड', 'ि', 'क', 'ल', ' ', 'इ', 'न', 'स', 'ा', 
'इ', 'क', '्', 'ल', 'ो', 'प', 'ी', 'ड', 'ि', 'य', 
'ा', ' ', 'ह', 'ि', 'ं', 'द', 'ी', ' ', 'म', 'े', 'ं']
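
To see where the 41 comes from, the sketch below compares three ways of counting the same string: UTF-16 code units (.length), code points (spreading the string), and grapheme clusters via Intl.Segmenter. Using Intl.Segmenter is an assumption of this example, not something AJV does. It needs a runtime that ships it (recent Node.js and browsers), and the exact cluster count can vary with the Unicode version the runtime implements, so no count is quoted for it here.

```javascript
// Compare three ways of measuring the same Hindi string.
const text = 'विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में';

// UTF-16 code units: what String.prototype.length reports.
const codeUnits = text.length;

// Unicode code points: string iteration joins surrogate pairs.
const codePoints = [...text].length;

// Grapheme clusters (user-perceived characters).
// Assumption: Intl.Segmenter is available in this runtime.
function countGraphemes(str, locale = 'hi') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(str)) count++;
  return count;
}

const graphemes = countGraphemes(text);
console.log({ codeUnits, codePoints, graphemes });
```

Code units and code points agree here because none of these characters needs a surrogate pair. Only the grapheme count groups a consonant with its vowel signs.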

from ajv.

epoberezkin commented on May 25, 2024

There is a unicode option (probably deprecated) that determines how length is computed.

epoberezkin commented on May 25, 2024

https://github.com/ajv-validator/ajv/blob/master/lib/vocabularies/validation/limitLength.ts#L25

epoberezkin commented on May 25, 2024

It's on by default (so it does not use .length), and if it's not working correctly, it needs fixing.

https://github.com/ajv-validator/ajv/blob/master/lib/runtime/ucs2length.ts
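
For context, counting a string with each surrogate pair treated as one character can be sketched roughly as below. This is an illustrative reimplementation of the idea, not a copy of the linked ucs2length.ts, and the function name countCodePoints is made up for this example. It shows why an emoji counts as 1 while वि still counts as 2: a combining vowel sign is its own code point.

```javascript
// Count code points: a high surrogate followed by a low surrogate
// counts once; every other UTF-16 code unit counts once on its own.
function countCodePoints(str) {
  let count = 0;
  let pos = 0;
  while (pos < str.length) {
    count++;
    const unit = str.charCodeAt(pos++);
    const isHighSurrogate = unit >= 0xd800 && unit <= 0xdbff;
    if (isHighSurrogate && pos < str.length) {
      const next = str.charCodeAt(pos);
      const isLowSurrogate = (next & 0xfc00) === 0xdc00;
      if (isLowSurrogate) pos++; // skip the second half of the pair
    }
  }
  return count;
}

console.log('😀'.length);            // 2 code units
console.log(countCodePoints('😀'));  // 1 code point
console.log(countCodePoints('वि'));  // 2 code points: व + ि
```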

jasoniangreen commented on May 25, 2024

Ok, I will have a look

jasoniangreen commented on May 25, 2024

Hi @kelson42, after discussing with EP, we have decided that this is not something we will fix within the core AJV library.

This problem is due to the multi-code-point characters used in the Devanagari script, and no doubt in many other scripts. A single character like वि is actually made up of multiple characters, 'व' and 'ि' (notice the dotted circle that shows how this character attaches to others). These are called grapheme clusters.

From just inspecting the characters, there is no metadata that tells us which of them belong to a grapheme cluster and should therefore be counted as 1. For this reason we cannot put this logic into AJV, as it would require a lot of bespoke code to cover every script with such multi-part characters.

This doesn't stop you from solving the problem yourself with custom keywords. You could even publish the solution for others, but it doesn't belong in the AJV code base.
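
One possible shape for such a keyword is sketched below. The keyword name maxGraphemes and the helper countGraphemes are hypothetical, and the sketch assumes Ajv v8's addKeyword definition object (keyword, type, schemaType, validate) plus a runtime that provides Intl.Segmenter. The definition is kept as a plain object so it can be read and tried without Ajv installed; registration is shown only in a comment.

```javascript
// Hypothetical helper: count user-perceived characters.
// Assumption: Intl.Segmenter is available in this runtime.
function countGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(str)].length;
}

// Hypothetical keyword definition, in the shape Ajv v8's addKeyword accepts.
const maxGraphemes = {
  keyword: 'maxGraphemes',
  type: 'string',
  schemaType: 'number',
  // Ajv calls validate(schemaValue, data) for this kind of keyword.
  validate: (limit, data) => countGraphemes(data) <= limit,
};

// Registration, assuming Ajv v8 is installed:
//   const Ajv = require('ajv');
//   const ajv = new Ajv();
//   ajv.addKeyword(maxGraphemes);
//   const validate = ajv.compile({ type: 'string', maxGraphemes: 30 });

// The validate function can be exercised directly:
console.log(maxGraphemes.validate(1, 'वि'));   // true: one grapheme cluster
console.log(maxGraphemes.validate(1, 'विकी')); // false: two grapheme clusters
```

Keeping this in a separate keyword leaves the built-in maxLength behaviour unchanged for schemas that rely on it.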

I will, however, document this issue, and I thank you again for bringing it to our attention.

edit: to add a link to the spec on grapheme clusters

kelson42 commented on May 25, 2024

@jasoniangreen @epoberezkin Thank you for considering my issue and for your advice. For the record, here is how I fixed the problem.
