Comments (7)

zerocrates commented on July 30, 2024

I only personally noted this problem with escapeHtmlAttr, but it looks like the JS and CSS escapers use similar algorithms and would have the same problem.

from zend-escaper.

zerocrates commented on July 30, 2024

Yes, escapeJs and escapeCss also have the same problem. Using the same example as above:

  • escapeJs returns '\uD83CDF65' instead of the correct '\uD83C\uDF65'
  • escapeCss returns '\D83CDF65 ' instead of the correct '\1F365 '
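
The two correct values above can be derived by hand from the code point U+1F365 (read off the expected escapeCss output). The following is a minimal Python sketch of that arithmetic, not the library's PHP code; the function names `utf16_code_units`, `js_escape` and `css_escape` are hypothetical and exist only for this illustration.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units that represent one Unicode code point."""
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit unit.
        return [code_point]
    # Supplementary plane: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def js_escape(code_point: int) -> str:
    """JavaScript style: one backslash-u escape per UTF-16 code unit."""
    return "".join(f"\\u{unit:04X}" for unit in utf16_code_units(code_point))


def css_escape(code_point: int) -> str:
    """CSS style: backslash, the code point itself in hex, then a space."""
    return f"\\{code_point:X} "


print(js_escape(0x1F365))   # \uD83C\uDF65
print(css_escape(0x1F365))  # \1F365 (followed by one space)
```

The JavaScript form needs one escape per surrogate, while the CSS form takes the code point directly, which is why the two correct outputs look so different.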

marc-mabe commented on July 30, 2024

@zerocrates Doesn't UTF-32 have the same or a similar issue with combining characters (https://en.wikipedia.org/wiki/Combining_character), like every other Unicode encoding? An encoding describes how a Unicode code point is represented, not how a character is represented.

zerocrates commented on July 30, 2024

You're right to say that UTF-32 and UTF-16 don't treat combining characters differently, but that's not the basis of the problem here.

The problem here is with supplementary characters (those above U+FFFF), not combining characters. For supplementary characters, UTF-16 uses a surrogate pair to represent a single codepoint, while UTF-32 does not.
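
To make the difference concrete, here is a small Python sketch (an illustration only, not the escaper's code) that encodes U+1F365, the code point behind the escapeCss example earlier in the thread, in both encodings.

```python
ch = "\U0001F365"  # a supplementary character (above U+FFFF)

utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
utf32 = ch.encode("utf-32-be")

print(utf16.hex().upper())  # D83CDF65: two 16-bit code units, a surrogate pair
print(utf32.hex().upper())  # 0001F365: one 32-bit unit equal to the code point
print(len(utf16) // 2, len(utf32) // 4)  # 2 1
```

The UTF-16 hex digits are the same digits that appear in the incorrect outputs reported above, which is consistent with the surrogate pair being treated as if it were a single code point.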

zerocrates commented on July 30, 2024

Just for confirmation purposes I tried out the escapers with a simple combining-character example and they seem to all be fine. When you have input with a regular ASCII "e" followed by a combining accent, you get that same sequence back out from the escaper, the "e" untouched followed by the escaped combining character.

You could use Normalizer to apply the NFC algorithm and guarantee precomposed output, but I think that would be unexpected and it would also mean requiring the intl extension for the escapers to work, which seems like a bridge too far. Users who need or want normalization can still use Normalizer themselves on the input.

Just correctly escaping codepoints seems like the proper focus for the escapers, and that's what this issue and my pull request aim at.
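
For readers unfamiliar with NFC, the sketch below shows what the normalization mentioned in this comment does to an "e" plus combining accent. It uses Python's `unicodedata` module rather than PHP's Normalizer, and it assumes the accent is U+0301 (COMBINING ACUTE ACCENT); the comment does not say which accent was tested.

```python
import unicodedata

decomposed = "e\u0301"  # ASCII "e" followed by U+0301 (assumed accent)
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']
print([f"U+{ord(c):04X}" for c in composed])    # ['U+00E9']
```

Both forms render the same way, but they are different code point sequences, so an escaper that works per code point will legitimately produce different output for each.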

zerocrates commented on July 30, 2024

I'd appreciate some response on this issue and/or the associated PR.

This is a pretty serious issue for anybody using emoji or many less-common CJK characters. It's also not something a user of the framework can easily work around, because the misbehaving escapers are used in other view helpers (in particular, escapeHtmlAttr is used all over the place).

roelvanduijnhoven commented on July 30, 2024

Just learned that ZF2 out of the box does not work well with emoji! Zend\Form fails to display them properly, indeed because of escapeHtmlAttr.

The underlying htmlAttrMatcher uses ord to check each character's ASCII value. From what I read, ord is in no way able to handle multibyte characters and is thus not able to parse UTF-8.

Thus escaping UTF-8 strings is broken, which seems like a serious issue. I have too little knowledge to contribute a fix, however.
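
As background to the remark about ord: PHP's ord returns the value of the first byte of its argument, so on a multibyte UTF-8 character it cannot yield the code point. The Python sketch below only illustrates that byte-versus-code-point gap for U+1F365; it is not the escaper's code, and whether this is the actual cause of the bug is not established here (the earlier comments attribute it to surrogate-pair handling).

```python
ch = "\U0001F365"  # the code point behind the escapeCss example in this thread

utf8 = ch.encode("utf-8")
first_byte = utf8[0]  # what a first-byte lookup on the UTF-8 string would see

print(utf8.hex().upper())  # F09F8DA5: four bytes for one character
print(hex(first_byte))     # 0xf0
print(hex(ord(ch)))        # 0x1f365: Python's ord works on code points instead
```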
