Comments (5)
To me it seems like "D\u00C4\u0085\u00C5\u009Bl\u00C4\u0099\u00C5\u00BCyn\u00C3\u00B3w,Oslo,Stockholm" is the correct JSON encoding of "Dąślężynów,Oslo,Stockholm". I don't understand why Cucumber is sending rubbish back. I'll look into it, possibly later today.
from cucumber-cpp.
Actually, I think this encoded string is already wrong: look at the number of encoded characters.
For example, the letter 'ą' is 'LATIN SMALL LETTER A WITH OGONEK' according to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256 and its code point should be \u0105, but it gets interpreted as two separate characters, \u00C4 and \u0085, which are the two bytes of its UTF-8 encoding.
This makes me think that the problem may appear when reading Unicode text from an iostream or something similar.
Basically, the iswprint() function call does not recognize 'ą' and similar characters as printable.
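To illustrate the difference, here is a minimal sketch. It is not the JSON Spirit code: `hex4`, `escapeBytewise` and `escapeDecoded` are hypothetical helper names, only UTF-8 sequences of up to three bytes are handled, and quotes, backslashes and control characters are not escaped. The first function escapes every byte as if it were a code point, the second decodes UTF-8 before escaping.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: format one value (<= 0xFFFF) as a JSON \uXXXX escape.
static std::string hex4(unsigned value) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\u%04X", value);
    return std::string(buf);
}

// Escapes each non-ASCII byte on its own, i.e. treats bytes as code points.
std::string escapeBytewise(const std::string& utf8) {
    std::string out;
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) out += static_cast<char>(c);
        else out += hex4(c);
    }
    return out;
}

// Decodes 1- to 3-byte UTF-8 sequences first, then escapes the code point.
std::string escapeDecoded(const std::string& utf8) {
    std::string out;
    std::string::size_type i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        unsigned cp;
        std::string::size_type extra;
        if (c < 0x80) { cp = c; extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else throw std::runtime_error("unsupported or invalid UTF-8 lead byte");
        if (i + extra >= utf8.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::string::size_type k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(utf8[i + k]);
            if ((cc & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x80) out += static_cast<char>(cp);
        else out += hex4(cp);
    }
    return out;
}
```

With the UTF-8 bytes of "Dąśl" as input ("D\xC4\x85\xC5\x9Bl"), `escapeBytewise` yields the four-escape form seen in the report, while `escapeDecoded` yields one escape per letter.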
from cucumber-cpp.
You are right: it should have been "D\u0105..."
from cucumber-cpp.
The bug looks like a JSON Spirit issue in dealing with Unicode characters when using 8-bit characters. This test shows the library behavior:
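TEST(JsonSpiritTest, handlesUnicodeOnlyIfWideChars) {
    // Wide strings: non-ASCII characters are written as \uXXXX escapes.
    EXPECT_EQ(L"\"\\u9EC4\\u74DC\"", json_spirit::write_string(wmValue(L"\u9EC4\u74DC"), false));
    // 8-bit strings: the same characters do not produce those escapes.
    EXPECT_NE("\"\\u9EC4\\u74DC\"", json_spirit::write_string(mValue("\u9EC4\u74DC"), false));
}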
Unfortunately the JSON serialization code in CukeBins is ugly, so I'll try to refactor it while fixing the bug.
from cucumber-cpp.
I've done a few tests. The components that might have problems are the wire protocol codec (currently using JSON Spirit) and the regular expression matcher. C++ support for Unicode and regular expressions was standardized only in C++0x, which is still not an option. In C++03 the source code encoding is ASCII only, so even CukeBins regular expressions should be encoded using the \u escape and wide strings. Please note that MSVC is an example of a compiler where wchar_t is 16 bits, so using wide strings alone would not solve the problem. My proposal for the moment is to treat every char (8-bit) sequence as UTF-8, handled by MSVC and GCC, and...
JSON Spirit
- convert every string to wchar_t before decoding or encoding
- if unicode support is disabled, fail on non-ASCII codes
Boost 1.48+ comes with the new Locale library, which handles UTF quite well. I still haven't come to a conclusion on how to deal with the conversion without ICU or Boost Locale. I might introduce a new dependency on ICU for full Unicode support (with any Boost version), or require Boost 1.48+ without ICU for partial support.
Edit: since JSON's \u escapes use UTF-16 code units, like JavaScript strings, and since we don't care about counting characters, surrogate pairs should not be a problem, so I removed the case where wchar_t can't hold UTF-32 code points.
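As a rough illustration of why surrogate pairs are not a problem on the JSON side, a code point above U+FFFF is simply written as two \u escapes. A minimal sketch, assuming a hypothetical helper name `jsonEscapeCodePoint` (it is not part of JSON Spirit, and it escapes every code point, including ASCII):

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: write one Unicode code point as JSON \u escapes,
// using a UTF-16 surrogate pair for code points above U+FFFF.
std::string jsonEscapeCodePoint(unsigned long cp) {
    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
        throw std::invalid_argument("not a Unicode scalar value");
    char buf[16];
    if (cp < 0x10000UL) {
        std::snprintf(buf, sizeof buf, "\\u%04lX", cp);
    } else {
        unsigned long v = cp - 0x10000UL;              // 20 bits remain
        unsigned long high = 0xD800UL + (v >> 10);     // top 10 bits
        unsigned long low = 0xDC00UL + (v & 0x3FFUL);  // bottom 10 bits
        std::snprintf(buf, sizeof buf, "\\u%04lX\\u%04lX", high, low);
    }
    return std::string(buf);
}
```

For example, U+0105 becomes a single escape, while U+1F600 becomes the pair \uD83D\uDE00.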
Regex
- use boost::u32regex if Boost is compiled with ICU support
- use boost::wregex if wchar_t is 16 bits and fail on surrogate pairs
- use boost::regex and fail on non-ASCII codes
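The two "fail on ..." fallbacks in the list above boil down to validating the text before it reaches the matcher. A minimal sketch of those checks, assuming hypothetical helper names `containsNonAscii` and `containsSurrogate`; it does not call Boost and says nothing about how the backend itself would be selected:

```cpp
#include <string>

// True if any byte is outside the 7-bit ASCII range
// (the condition on which the plain boost::regex fallback should fail).
bool containsNonAscii(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) > 0x7F) return true;
    }
    return false;
}

// True if any wide character is a UTF-16 surrogate code unit
// (the condition on which the 16-bit wchar_t fallback should fail).
bool containsSurrogate(const std::wstring& s) {
    for (std::wstring::size_type i = 0; i < s.size(); ++i) {
        unsigned long u = static_cast<unsigned long>(s[i]);
        if (u >= 0xD800UL && u <= 0xDFFFUL) return true;
    }
    return false;
}
```

A caller would run the relevant check on both the step pattern and the step text, and report an error instead of attempting the match when it returns true.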
Here is a brief explanation of Boost Regex Unicode support.
from cucumber-cpp.