

billdenney commented on July 22, 2024

@dpprdan, I understand the challenge here.

The make_clean_names() function works hard to give consistent results across platforms, locales, and R versions, as far as possible. Over recent revisions, I have been surprised by how challenging cross-platform and cross-locale standardization is (see #492).

I agree that this is not working the way I would have expected from the documentation. I think the best fix is to revise the documentation to clarify the order in which changes are applied and the resulting limitations of the function.

In the documentation, we should also clarify that while the goal is interpretability, the higher-level goal is to provide consistent and usable names in R with commonly used tools. That implies giving the same answer across locales, and an answer that can be typed on the majority of keyboards (e.g., my American keyboard has no easy way to type an umlaut, nor do Indian, Japanese, and many other keyboards, whereas most keyboards can type basic ASCII).

from janitor.

sfirke commented on July 22, 2024

Thanks for raising this @dpprdan and for investigating it @billdenney. Sounds like things are settled and I expect this conversation may be of use to future users investigating this behavior.


billdenney commented on July 22, 2024

@dpprdan, I've looked at this in more detail today, and your tracing of the issue is correct:

  • ascii = TRUE will remove umlauts before transliteration occurs
  • the default argument for transliterations also removes umlauts
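Why the order matters can be illustrated with a toy pipeline in base R. This is a hypothetical sketch, not janitor's implementation: `fold_to_ascii()` and `transliterate_german()` are made-up helpers with hand-written maps that merely stand in for a lossy ASCII step and a German transliteration step.

```r
# Hypothetical stand-ins for two cleaning steps; NOT janitor's internal code.
# Unicode escapes and hand-written maps keep results locale-independent.
umlauts   <- c("\u00e4", "\u00f6", "\u00fc", "\u00df")  # ä ö ü ß
ascii_to  <- c("a", "o", "u", "ss")     # lossy fold, like a Latin-ASCII step
german_to <- c("ae", "oe", "ue", "ss")  # German convention

# Replace each character in `from` with the matching entry in `to`.
apply_map <- function(x, from, to) {
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

fold_to_ascii        <- function(x) apply_map(x, umlauts, ascii_to)
transliterate_german <- function(x) apply_map(x, umlauts, german_to)

x <- "qualit\u00e4t"  # "qualität"

# ASCII folding first: the umlaut is gone before the German rule sees it.
ascii_first <- transliterate_german(fold_to_ascii(x))
# German transliteration first: the umlaut becomes "ae" as intended.
german_first <- fold_to_ascii(transliterate_german(x))

ascii_first   # "qualitat"
german_first  # "qualitaet"
```

Only the order of the two calls differs between the two results.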

The ascii and transliterations arguments serve different purposes (even if they aren't fully independent), so I would not want to change how these two options behave for users. Changing this would also introduce a degree of backward incompatibility for existing users, in exchange for an admitted improvement in final fidelity but not a categorical one.

I went to add some text to the documentation to clarify what happens, but then saw that it is already there. The documentation page describes "the order of operations..." (search higher up the page that you linked to).

I think that your workaround of janitor::make_clean_names("qualität", ascii = FALSE, transliterations = "german") is the best reasonable option within the current code.


dpprdan commented on July 22, 2024

@billdenney Thanks. I also played around with the code and, yeah, it's complicated.

FWIW, I'd change the default transliterations = "Latin-ASCII" to transliterations = NULL, because if ascii = TRUE, "Latin-ASCII" is already applied implicitly. If ascii = FALSE, applying a transliteration to ASCII is probably not intended anyway, cf.
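The redundancy argument rests on ASCII folding being idempotent: once a string is plain ASCII, a second Latin-ASCII-style pass cannot change it. The following sketch illustrates this with a hypothetical hand-written `fold_to_ascii()` helper that covers only a few German characters; it is not janitor's code and not the actual "Latin-ASCII" transform.

```r
# Hypothetical helper, NOT janitor's code: a hand-written stand-in for a
# "Latin-ASCII"-style fold covering only a few German characters.
umlauts  <- c("\u00e4", "\u00f6", "\u00fc", "\u00df")
ascii_to <- c("a", "o", "u", "ss")

fold_to_ascii <- function(x) {
  for (i in seq_along(umlauts)) {
    x <- gsub(umlauts[i], ascii_to[i], x, fixed = TRUE)
  }
  x
}

x     <- "qualit\u00e4t"  # "qualität"
once  <- fold_to_ascii(x)
twice <- fold_to_ascii(once)

# After one ASCII fold, a second Latin-ASCII-style pass changes nothing ...
identical(once, twice)  # TRUE
# ... while on its own (as with ascii = FALSE) it still drops the umlaut.
once                    # "qualitat"
```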

I suppose one could also argue that if ascii = FALSE umlauts should stay umlauts?
