Comments (7)
I only personally noted this problem with escapeHtmlAttr, but it looks like the JS and CSS escapers use similar algorithms and would have the same problem.
from zend-escaper.
Yes, escapeJs and escapeCss also have the same problem. Using the same example as above:
- escapeJs returns '\uD83CDF65' instead of the correct '\uD83C\uDF65'
- escapeCss returns '\D83CDF65 ' instead of the correct '\1F365 '
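To make the difference concrete, here is a minimal sketch in Python (not the library's PHP code). It assumes the example character is U+1F365, which is what the outputs quoted above imply; the function names are hypothetical, and `buggy_js_escape` only mimics the reported output rather than reproducing the actual escaper logic.

```python
def utf16_units(cp):
    """Return the UTF-16 code units for a single Unicode code point."""
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000
    # High (lead) surrogate carries the top 10 bits, low (trail) the bottom 10.
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]


def js_escape(ch):
    """Expected JS escape: one \\uXXXX sequence per UTF-16 code unit."""
    return "".join("\\u%04X" % unit for unit in utf16_units(ord(ch)))


def css_escape(ch):
    """Expected CSS escape: the code point itself in hex, then a space."""
    return "\\%X " % ord(ch)


def buggy_js_escape(ch):
    """Mimics the reported output: all UTF-16BE bytes dumped after one \\u."""
    return "\\u" + ch.encode("utf-16-be").hex().upper()


char = "\U0001F365"
print(js_escape(char))        # \uD83C\uDF65
print(css_escape(char))       # \1F365 (followed by a space)
print(buggy_js_escape(char))  # \uD83CDF65
```

The JS form needs one escape per surrogate, while the CSS form takes the code point directly, so the two fixes are not the same transformation.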
from zend-escaper.
@zerocrates Doesn't UTF-32 have the same or a similar issue with combining characters (https://en.wikipedia.org/wiki/Combining_character) as every other Unicode encoding, since an encoding describes how a Unicode code point is represented and not how a character is represented?
from zend-escaper.
You're right to say that UTF-32 and UTF-16 don't treat combining characters differently, but that's not the basis of the problem here.
The problem here is with supplementary characters (those above U+FFFF), not combining characters. For supplementary characters, UTF-16 uses a surrogate pair to represent a single codepoint, while UTF-32 does not.
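A quick way to see the distinction, sketched in Python with hypothetical helper names: a combining sequence produces the same code units in both encodings, while a supplementary character (U+1F365 is assumed here as the example) takes two units in UTF-16 and one in UTF-32.

```python
def utf16_code_units(text):
    """Split text into 16-bit UTF-16 code units (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf32_code_units(text):
    """Split text into 32-bit UTF-32 code units (big-endian, no BOM)."""
    data = text.encode("utf-32-be")
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


combining = "e\u0301"           # "e" followed by COMBINING ACUTE ACCENT
supplementary = "\U0001F365"    # a single code point above U+FFFF

# Combining sequence: two code points, two units in either encoding.
print([hex(u) for u in utf16_code_units(combining)])      # ['0x65', '0x301']
print([hex(u) for u in utf32_code_units(combining)])      # ['0x65', '0x301']

# Supplementary character: a surrogate pair in UTF-16, one unit in UTF-32.
print([hex(u) for u in utf16_code_units(supplementary)])  # ['0xd83c', '0xdf65']
print([hex(u) for u in utf32_code_units(supplementary)])  # ['0x1f365']
```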
from zend-escaper.
Just for confirmation purposes I tried out the escapers with a simple combining-character example and they seem to all be fine. When you have input with a regular ASCII "e" followed by a combining accent, you get that same sequence back out from the escaper, the "e" untouched followed by the escaped combining character.
You could use Normalizer to apply the NFC algorithm and guarantee precomposed output, but I think that would be unexpected and it would also mean requiring the intl extension for the escapers to work, which seems like a bridge too far. Users who need or want normalization can still use Normalizer themselves on the input.
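As a rough illustration of what normalizing first would change, here is a Python sketch that uses `unicodedata.normalize` in place of PHP's Normalizer. The `escape_non_ascii` helper is hypothetical and much simpler than the real escapers: it only hex-escapes code points above ASCII.

```python
import unicodedata


def escape_non_ascii(text):
    """Hypothetical escaper: hex-escape every code point above ASCII."""
    return "".join(
        ch if ord(ch) < 0x80 else "&#x%X;" % ord(ch)
        for ch in text
    )


decomposed = "e\u0301"                                # "e" + combining acute
composed = unicodedata.normalize("NFC", decomposed)   # precomposed U+00E9

print(escape_non_ascii(decomposed))  # e&#x301;
print(escape_non_ascii(composed))    # &#xE9;
```

Both outputs render as the same character; the point is only that normalization changes which code points get escaped, which is why it is better left to the caller.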
Just correctly escaping codepoints seems like the proper focus for the escapers, and that's what this issue and my pull request aim at.
from zend-escaper.
I'd appreciate some response on this issue and/or the associated PR.
This is a pretty serious issue for anybody using emoji or many less-common CJK characters. It's also not something a user of the framework can easily work around, because the misbehaving escapers are used in other view helpers (escapeHtmlAttr in particular is used all over the place).
from zend-escaper.
Just learned that ZF2 out of the box does not work well with emoji! Zend\Form fails to show them properly, indeed because of escapeHtmlAttr.
The underlying htmlAttrMatcher uses ord to check each character's ASCII value. From what I read, ord cannot handle multibyte characters and so cannot parse UTF-8.
Thus escaping UTF-8 strings is bugged. This seems like a serious issue, but I have too little knowledge to contribute.
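For context, PHP's ord returns the value of the first byte of a string, so it reports a code point only for single-byte (ASCII) characters. The Python sketch below contrasts that byte-level view with decoding the full UTF-8 sequence. `first_code_point` is a hypothetical helper that assumes well-formed input and does no validation of continuation bytes; it is not how the escaper itself is written.

```python
def first_byte(text):
    """What a byte-oriented ord would see: the first UTF-8 byte only."""
    return text.encode("utf-8")[0]


def first_code_point(data):
    """Decode the first code point from well-formed UTF-8 bytes."""
    lead = data[0]
    if lead < 0x80:                 # 0xxxxxxx: plain ASCII, one byte
        return lead
    if lead >> 5 == 0b110:          # 110xxxxx: two-byte sequence
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:      # 11110xxx: four-byte sequence
        length, cp = 4, lead & 0x07
    else:
        raise ValueError("not a valid UTF-8 lead byte")
    for byte in data[1:length]:     # each continuation byte adds 6 bits
        cp = (cp << 6) | (byte & 0x3F)
    return cp


emoji = "\U0001F365"
print(hex(first_byte(emoji)))                        # 0xf0
print(hex(first_code_point(emoji.encode("utf-8"))))  # 0x1f365
```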
from zend-escaper.