
anyascii / anyascii


Unicode to ASCII transliteration - C Elixir Go Java JS Julia PHP Python Ruby Rust Shell .NET

Home Page: https://anyascii.com

License: ISC License

Kotlin 38.44% Perl 0.31% Java 11.12% Python 4.52% JavaScript 3.14% Rust 6.17% Go 3.23% Ruby 4.20% Shell 3.44% C# 4.32% PHP 3.36% Julia 3.34% C 8.16% CMake 0.44% Elixir 5.83%
ascii emoji normalization romanization slug transliteration unicode unidecode utf8

anyascii's Issues

Use CommonJS instead of ES Modules?

Thank you for this awesome library.

A question: ES modules are great but still not exactly widely supported. They cause all kinds of issues with node and related tooling, such as Jest and TypeScript. Would it be possible to make the block.js import in the JS version use a require() instead?

Option for replacement char

Instead of removing characters that can't be translated, it'd be nice to have an option to replace them with a character.

For some languages (like Python) this could be added as a new argument with a default value, like replace="". For others (like Go) this would have to be a new function.
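As a rough illustration of the Python variant, the sketch below wraps a transliteration function and adds a replace argument. The names transliterate_with_replacement and demo_transliterate are hypothetical, and the stand-in table is made up for the demo; in practice one would pass the library's own function. It assumes the transliterator maps each character independently and returns an empty string for characters it cannot handle, so it cannot tell "no mapping" apart from "maps to nothing".

```python
from typing import Callable


def transliterate_with_replacement(
    text: str,
    transliterate: Callable[[str], str],
    replace: str = "",
) -> str:
    """Transliterate character by character, substituting `replace` for any
    non-ASCII character whose transliteration comes back empty."""
    parts = []
    for ch in text:
        if ch.isascii():
            # ASCII characters pass through unchanged.
            parts.append(ch)
            continue
        out = transliterate(ch)
        parts.append(out if out else replace)
    return "".join(parts)


def demo_transliterate(s: str) -> str:
    """Stand-in transliterator for the demo only (hypothetical table)."""
    table = {"é": "e", "ß": "ss"}
    return "".join(c if c.isascii() else table.get(c, "") for c in s)


# "\ue000" is a private-use code point the stand-in has no mapping for.
print(transliterate_with_replacement("caf\u00e9 \ue000", demo_transliterate, replace="?"))
```

With the default replace="" the wrapper behaves like the wrapped function, which is why a default argument would keep existing callers working.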

Exception logic?

Would it be possible to make a way to tell anyascii to transliterate all characters except certain specified ones? I'm writing output to a display that needs most Unicode characters transliterated, but can handle certain ones, like the ° symbol.
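Until something like this exists in the library, a caller-side wrapper could approximate it. The sketch below is only an illustration: transliterate_except and demo_transliterate are hypothetical names, the stand-in table is invented for the demo, and the approach assumes the transliterator treats each character independently of its neighbours.

```python
from typing import Callable, Iterable


def transliterate_except(
    text: str,
    transliterate: Callable[[str], str],
    keep: Iterable[str],
) -> str:
    """Transliterate every character except those listed in `keep`."""
    keep_set = set(keep)
    return "".join(ch if ch in keep_set else transliterate(ch) for ch in text)


def demo_transliterate(s: str) -> str:
    """Stand-in transliterator for the demo only (hypothetical table)."""
    table = {"é": "e", "°": "deg"}
    return "".join(c if c.isascii() else table.get(c, "") for c in s)


# The degree sign is preserved; everything else is transliterated.
print(transliterate_except("25°C café", demo_transliterate, keep="°"))
```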

Deprecation warning in Python 3.11


  /Users/xxx/Workspace/Backend/venv/lib/python3.11/site-packages/anyascii/__init__.py:29: DeprecationWarning: read_binary is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.
    b = read_binary('anyascii._data', '%03x' % blocknum)

read_binary will eventually be removed in a later version of Python.

The linked documentation recommends using files() with .open('rb') instead:

https://docs.python.org/3/library/importlib.resources.html#importlib.resources.read_binary
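For the line quoted in the warning, a files()-based equivalent might look like the sketch below. read_block is a hypothetical helper name, not something taken from the package; files(), joinpath() and read_bytes() are part of importlib.resources in Python 3.9 and later, so older interpreters would need a fallback.

```python
from importlib.resources import files


def read_block(package: str, blocknum: int) -> bytes:
    """Read the resource named by the three-digit hex block number.

    Sketch of a replacement for read_binary(package, '%03x' % blocknum).
    """
    return files(package).joinpath("%03x" % blocknum).read_bytes()


# Hypothetical call mirroring the line from the warning:
# b = read_block('anyascii._data', blocknum)
```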

Python package missing py.typed marker

Currently, running mypy against a library that uses anyascii results in the following error:

$ mypy mypackage
mypackage/mymodule.py:6: error: Skipping analyzing "anyascii": module is installed, but missing library stubs or py.typed marker  [import]
mypackage/mymodule.py:6: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports

As noted in PEP 561 (https://peps.python.org/pep-0561/#packaging-type-information) and the mypy documentation linked above, a py.typed file must be present for mypy to use the type annotations that are already present in anyascii.
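A quick way to confirm whether an installed package ships the marker is to look for the file through importlib.resources. The helper name has_py_typed is hypothetical, and the sketch only checks for the file's presence; it does not validate the annotations themselves.

```python
from importlib.resources import files


def has_py_typed(package: str) -> bool:
    """Return True if the installed package contains a py.typed marker file."""
    return files(package).joinpath("py.typed").is_file()


# Hypothetical check against the installed package:
# has_py_typed("anyascii")
```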

Targeting Framework not Supported

Hello,

We are trying to add the NuGet package AnyAscii to our C# .NET Standard 2.1 class library.

We noticed that the package wasn't compatible even though we are able to clone your repo and take a dependency on the any-ascii csproj inside the repository.

.NET - Potentially large memory allocation at each call

The code contains the following:

private static ReadOnlySpan<byte> Bank => new byte[64436]
{
    83, 104, 99, 104, 39, 101, 117, 101, 117, 101,
    // .. 62 Kb more data
};

Probably the expectation here was that the array would be created once. But in fact it will be created every time the property is accessed, because the expression-bodied property is just syntactic sugar for get { return new ... }

To avoid potential memory problems you should replace this with:

private static ReadOnlyMemory<byte> Bank { get; } = new byte[]
{
...

German replacement is weird

As a native German with an Umlaut in my name, I'm really surprised by the choice to, as shown in the README, replace an Umlaut with its base character. The "conventional" replacement (ae, oe, ue) is a convention dating back to typewriters, which did not originally have Umlaut characters, so all Germans will recognize it as having the same meaning. Simply dropping the dots, however, changes pronunciation and potentially meaning.

Take the German word for bear, "Bär", as an example. It may not be the best example, but it is the first that comes to mind. Capitalization makes it somewhat distinguishable from "bar" (free from/of), except at the beginning of a sentence. There, the two words are indistinguishable under your replacement scheme and would be read as the latter, whereas the conventional "Baer" would retain its original sense.

I don't know what reasoning led to adopting this behaviour, but from a German point of view, it seems quite weird.

I also understand that, from a non-German point of view, it is hard to know how to pronounce "Baer", but the spelling "Bar" wouldn't result in an understandable pronunciation either. My own name trips people up because it ends up with three vowels in a row. I do get the issue here, I just don't see how the adopted solution helps.

Hope that makes sense, and happy to discuss.
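For callers who want the conventional German spelling today, one workaround is to rewrite the Umlaute before handing the text to a transliterator. The sketch below is only an illustration of that pre-processing step: GERMAN_UMLAUTS and german_pretranslate are hypothetical names, and applying the mapping to non-German text would of course be wrong for other languages that use the same letters.

```python
# Conventional German digraph replacements for Umlaute (assumed table for
# this sketch; all-caps words would arguably want "AE", "OE", "UE" instead).
GERMAN_UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def german_pretranslate(text: str) -> str:
    """Replace Umlaute with their conventional digraphs, leaving all other
    characters for a later transliteration step."""
    return text.translate(GERMAN_UMLAUTS)


print(german_pretranslate("Bär"))  # Baer
```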
