Comments (4)
@dpprdan, I understand the challenge here.
The `make_clean_names()` function works hard to give consistent results across platforms, locales, and R versions, as much as possible. Over recent revisions, I have been surprised by how challenging cross-platform and cross-locale standardization has been (see #492).
I agree that this is not working the way that I would have expected, given what is in the documentation. I think that the best fix is to revise the documentation to clarify the order in which changes are applied and the resulting limitations of the function.
In the documentation, we should also clarify that while the goal is interpretability, the higher-level goal is to provide consistent and usable names in R with commonly-used tools. That implies that it gives the same answer across locales, and that the answer provided is usable on the majority of keyboards (e.g. my American keyboard doesn't have an easy way to type an umlaut, nor do Indian, Japanese, and many other keyboards, while most keyboards allow for writing basic ASCII).
from janitor.
Thanks for raising this @dpprdan and for investigating it @billdenney. Sounds like things are settled and I expect this conversation may be of use to future users investigating this behavior.
@dpprdan, I've looked at this in more detail today, and your tracing of the issue is correct:
- `ascii = TRUE` will remove umlauts before transliteration occurs
- the default argument for `transliterations` also removes umlauts
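The ordering effect described in the two points above can be reproduced outside of janitor. The sketch below is not janitor's implementation; it only assumes that a generic `Latin-ASCII` pass (here via `stringi::stri_trans_general()`) runs before a language-specific replacement, and it uses a hand-written `ä` to `ae` substitution as a hypothetical stand-in for a German transliteration.

```r
library(stringi)

# "qualität", written with an escape so the example does not depend on file encoding
x <- "qualit\u00e4t"

# Generic pass first: the umlaut is reduced to a plain "a" ...
generic_first <- stri_trans_general(x, "Latin-ASCII")
# ... so a later German-style replacement (hypothetical stand-in) finds nothing to match
generic_first <- gsub("\u00e4", "ae", generic_first, fixed = TRUE)

# German-style replacement first: the umlaut becomes "ae" and survives the generic pass
german_first <- gsub("\u00e4", "ae", x, fixed = TRUE)
german_first <- stri_trans_general(german_first, "Latin-ASCII")

print(generic_first)  # "qualitat"
print(german_first)   # "qualitaet"
```

Whichever step runs first decides the result, which is why a German transliteration applied after the ASCII step has no visible effect.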
The intents of the `ascii` and `transliterations` arguments are different (even if they aren't fully independent), so I would not want to change these two options for users. Also, changing this would cause a degree of backward incompatibility for existing users, with an admitted improvement in final fidelity but not a categorical improvement.
For the documentation, I went in to add some text to clarify what happens, but then I saw that it is already there. If you look in the documentation page, it indicates "the order of operations..." (search higher in the page that you linked to).
I think that your work-around of `janitor::make_clean_names("qualität", ascii = FALSE, transliterations = "german")` is the best that is reasonable within the current code.
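For anyone who wants to compare the two behaviours side by side, here is a minimal sketch using only the arguments discussed in this thread. The exact output may vary with the installed janitor and snakecase versions and with the locale (especially once `ascii = FALSE` is used), so no expected values are shown.

```r
library(janitor)

# "qualität", written with an escape so the example does not depend on file encoding
x <- "qualit\u00e4t"

# Defaults: ascii = TRUE, transliterations = "Latin-ASCII"
default_name <- make_clean_names(x)

# Work-around from this thread: skip the ASCII step so that the German
# transliteration still sees the umlaut
german_name <- make_clean_names(x, ascii = FALSE, transliterations = "german")

print(c(default = default_name, workaround = german_name))
```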
@billdenney Thanks. I also played around with the code and, yeah, it's complicated.
FWIW I'd change the `transliterations = "Latin-ASCII"` to `transliterations = NULL`, because if `ascii = TRUE` then `"Latin-ASCII"` is applied implicitly already. If `ascii = FALSE`, applying a transliteration to ASCII is probably not intended anyway, cf.
I suppose one could also argue that if `ascii = FALSE`, umlauts should stay umlauts?
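The first claim above can be checked with a quick comparison. This is only a sketch: it assumes that `make_clean_names()` accepts `transliterations = NULL` and passes it on unchanged, which may not hold in every janitor version.

```r
library(janitor)

# "qualität", written with an escape so the example does not depend on file encoding
x <- "qualit\u00e4t"

# Current default: transliterations = "Latin-ASCII"
with_default <- make_clean_names(x, ascii = TRUE)

# Proposed default (assumed to be accepted): no explicit transliteration
with_null <- make_clean_names(x, ascii = TRUE, transliterations = NULL)

# If "Latin-ASCII" is already applied implicitly by ascii = TRUE,
# the two results should not differ
print(identical(with_default, with_null))
```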