Comments (3)
From looking at the table, the strange mapping is not exclusive to German.
Spanish: Ñ/ñ should be ny, not just n. Very obvious. Discarding the tilde is usually understandable in context, but could lead to misunderstandings (or at least eye-rolling).
Esperanto: Ĉ/Ĵ/Ĝ/Ĥ/Ŝ/Ŭ can be converted with one of two systems. Either replace the letter with its bare form followed by x (Ĝ->Gx, ŭ->ux), or follow with h except for the ŭ (Ĝ->Gh, ŭ->u). Both systems have good and bad points. But simply using the bare Roman letter is almost always wrong, and usually changes the meaning in hilarious ways.
The table might need to be modified for specific use cases, which somewhat limits its universality.
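The two Esperanto conventions described above can be sketched as a small pre-processing step. This is only an illustration: the function name `esperanto_to_ascii` and its `system` parameter are made up for this example and are not part of anyascii.

```python
import unicodedata

# Bare Roman letter for each of the six Esperanto diacritic letters (lowercase).
_ESPERANTO_BASE = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u"}


def esperanto_to_ascii(text: str, system: str = "x") -> str:
    """Replace Esperanto letters using the x-system or the h-system.

    Hypothetical helper for illustration only, not an anyascii API.
    """
    if system not in ("x", "h"):
        raise ValueError("system must be 'x' or 'h'")
    # Compose base letter + combining mark into single code points first.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        lower = ch.lower()
        base = _ESPERANTO_BASE.get(lower)
        if base is None:
            out.append(ch)  # not an Esperanto diacritic letter: keep as-is
            continue
        if ch != lower:
            base = base.upper()  # preserve the case of the original letter
        if system == "x":
            suffix = "x"
        else:
            # h-system: follow with h, except ŭ, which just loses its breve
            suffix = "" if lower == "ŭ" else "h"
        out.append(base + suffix)
    return "".join(out)
```

For example, `esperanto_to_ascii("ĉirkaŭ")` gives `cxirkaux`, and `esperanto_to_ascii("ĉirkaŭ", "h")` gives `chirkau`.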
from anyascii.
Diaeresis/umlaut is used by dozens of languages and encoded identically, so a language-neutral approach must be used. Adding the e is too disruptive for all text that is not German. If you know your input is German, you can make this replacement yourself before calling anyascii.
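A minimal sketch of that German-specific replacement, assuming the input is known to be German. The helper name `expand_german_umlauts` is hypothetical and not part of anyascii; it only covers ä/ö/ü and leaves every other character for the later transliteration step.

```python
import unicodedata

# German convention: umlaut vowel -> base vowel followed by "e".
_GERMAN_UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def expand_german_umlauts(text: str) -> str:
    """Expand ä/ö/ü to ae/oe/ue; all other characters are left untouched.

    Hypothetical pre-processing helper, meant to run before transliteration.
    """
    # Compose "u" + combining diaeresis into "ü" so the table can match it.
    return unicodedata.normalize("NFC", text).translate(_GERMAN_UMLAUTS)
```

For example, `expand_german_umlauts("Müller")` gives `Mueller`. The result can then be passed on to anyascii as usual. Note that this simple table turns an all-caps word such as `ÜBER` into `UeBER`, so text in capitals would need extra handling.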
See here
from anyascii.
Thank you for providing this context.
Umlaut and diaeresis are different things, closer to opposites really. I get your point about the encoding; there are ways to encode the two differently in Unicode, but I suppose most inputs won't do that. In such a case, you can't really know what the right thing to do is. When they're encoded differently, I hope that the replacement takes that into account - or is that too much to hope for?
I don't agree with the conclusion in the linked article, though. The missing "e" is very confusing to Germans - more so to Germans who are not used to going back and forth between German and English. I would go so far as to suggest that this reads very clearly like a non-expert opinion (and the author all but admits he is not an expert).
Still, this has brought to light the problem you're facing.
Maybe the right conclusion is that the one-line description "Converts Unicode characters to their best ASCII representation" is wrong. I think there's great value in what the library does - this description, though, assumes a very particular interpretation of the word "best" that I'm certain not everybody shares.
How about changing it to "Converts Unicode characters to a simple, readable ASCII representation without considering context or language"? It seems to avoid creating false expectations much better.
from anyascii.