ruby-i18n / ruby-cldr
Ruby library for exporting data from CLDR (Common Locale Data Repository)
License: MIT License
Hey there, thanks for this great plugin. I just came across a use case that needs subdivisionContainment data. Easy access to it through the gem would be great, e.g.:
module Cldr
  module Export
    module Data
      class SubdivisionsContainment < Base
        def initialize(*)
          super(nil)
          update(subdivision_containment: subdivision_containment)
        end

        def subdivision_containment
          @subdivision_containment ||= doc.xpath("supplementalData/subdivisionContainment/subgroup").each_with_object(
            Hash.new { |h, k| h[k] = { contains: [] } }
          ) do |territory, memo|
            territory_id = territory.attribute("type").value
            children = territory.attribute("contains").value.split(" ")
            memo[territory_id][:contains].concat(children)
            memo[territory_id][:contains].sort!
          end
        end
      end
    end
  end
end
ruby-cldr.gemspec is auto-generated by bundle exec rake gemspec (via Jeweler).
However, developers tend to forget to run it as part of their PRs, which means that we risk releasing a gem without all the required files, for example.
a) add it to the checklist in the PR template
b) add a CI check that fails if bundle exec rake gemspec && rubocop -a ruby-cldr.gemspec produces any differences.
This PR added inflections to units, so entries in the data changed from:
<unit type="duration-day">
  <displayName>Tage</displayName>
  <unitPattern count="one">{0} Tag</unitPattern>
  <unitPattern count="other">{0} Tage</unitPattern>
  <perUnitPattern>{0} pro Tag</perUnitPattern>
</unit>
to
<unit type="duration-day">
  <gender>masculine</gender>
  <displayName>Tage</displayName>
  <unitPattern count="one">{0} Tag</unitPattern>
  <unitPattern count="one" case="accusative">{0} Tag</unitPattern>
  <unitPattern count="one" case="dative">{0} Tag</unitPattern>
  <unitPattern count="one" case="genitive">{0} Tages</unitPattern>
  <unitPattern count="other">{0} Tage</unitPattern>
  <unitPattern count="other" case="accusative">{0} Tage</unitPattern>
  <unitPattern count="other" case="dative">{0} Tagen</unitPattern>
  <unitPattern count="other" case="genitive">{0} Tage</unitPattern>
  <perUnitPattern>{0} pro Tag</perUnitPattern>
</unit>
Notably, there are now multiple values for each count attribute (one for each case). Previously, there was only the "nominative" case.
Using ruby-cldr with v38 gives different behaviour than it did with v37, since this code returns whichever case happens to appear last in the file ("genitive" in v38, "accusative" in v37).
As a first step, we should change the code to return the "nominative" case.
Longer term, we can look into ways to expose the other cases.
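A minimal sketch of the "return the nominative case" first step, assuming (as in the XML above) that the nominative pattern is the one without a case attribute. The UnitPattern struct and nominative_patterns helper are hypothetical illustrations, not ruby-cldr's actual data model.

```ruby
# Hypothetical representation of parsed <unitPattern> elements; the field
# names are illustrative, not ruby-cldr's actual data model.
UnitPattern = Struct.new(:count, :grammatical_case, :text, keyword_init: true)

# Keep one pattern per plural count: the nominative one, i.e. the pattern with
# no case attribute (an explicit case="nominative" is accepted as well).
# Unlike "last one wins", the result no longer depends on element order.
def nominative_patterns(patterns)
  patterns
    .select { |pattern| pattern.grammatical_case.nil? || pattern.grammatical_case == "nominative" }
    .each_with_object({}) { |pattern, memo| memo[pattern.count] ||= pattern.text }
end

patterns = [
  UnitPattern.new(count: "one", grammatical_case: nil, text: "{0} Tag"),
  UnitPattern.new(count: "one", grammatical_case: "genitive", text: "{0} Tages"),
  UnitPattern.new(count: "other", grammatical_case: nil, text: "{0} Tage"),
  UnitPattern.new(count: "other", grammatical_case: "dative", text: "{0} Tagen"),
]

nominative_patterns(patterns) # => {"one" => "{0} Tag", "other" => "{0} Tage"}
```

Exposing the other cases later could reuse the same grouping, keyed by [count, case] instead of count alone.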
While fixing #137, I noticed that there are "alt=variant" versions of currency symbols.
Look into what these are used for, and how we should export them.
Hello @camertron,
We've been working away at improving ruby-cldr and getting it to a point where it can be used with the latest version of CLDR (v41).
While we're not yet ready for a 1.0.0 release, I think it would be good to get a 0.6.0 release done.
What would the process be to do this? Who currently has the keys/permission to publish to RubyGems, and is it possible to have those transferred over to me? (I saw that you were the last to publish ruby-cldr to RubyGems.)
Just kicking off the conversation now.
thor cldr:export --locales af --components Characters
Note that the keys in the data/af/characters.yml file are Ruby Symbols, unlike every other key exported by ruby-cldr, which are strings.
In the export code, deep_stringify_keys is called, but it doesn't affect these since they are nested inside an Array.
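One way to cover keys nested inside Arrays is a stringifier that recurses into Arrays as well as Hashes. This is a sketch: deep_stringify_keys_in is a hypothetical helper name, and the sample data only loosely imitates characters.yml.

```ruby
# Recursively turn Hash keys into Strings, descending into Arrays as well as
# Hashes so that keys nested inside Arrays (as in characters.yml) are converted too.
def deep_stringify_keys_in(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, nested), memo|
      memo[key.to_s] = deep_stringify_keys_in(nested)
    end
  when Array
    value.map { |element| deep_stringify_keys_in(element) }
  else
    value
  end
end

# Hypothetical data with Symbol keys hidden inside an Array.
data = { characters: { exemplars: [{ type: "auxiliary" }] } }
deep_stringify_keys_in(data)
# => {"characters" => {"exemplars" => [{"type" => "auxiliary"}]}}
```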
A similar thing happens with Rbnf files:
thor cldr:export --locales af --components Rbnf
All keys exported by ruby-cldr would be strings.
This would let consumers use Psych.load instead of the Psych.unsafe_load needed to support Symbols.
CLDR has a concept of "Lateral Inheritance" where a value will fall back to another value before falling back to ancestor locales.
For example, ia.units.unitLength.short.length-centimeter defines only a single other key, despite ia having a plural rule that requires a one value.
When resolving the value of ia.units.unitLength.short.length-centimeter.one, it should fall back to ia.units.unitLength.short.length-centimeter.other first.
This can get very complicated if there are multiple levels of lateral inheritance.
Related to #67, ruby-cldr needs to decide how much of this to handle at the thor cldr:export layer vs. exposing to clients so they can make their own decisions.
Perhaps for now, ruby-cldr should resolve values for each of the required pluralization keys for a locale (e.g., copy other for the missing plural keys), while we wait to figure out what to do about the other dimensions (e.g., "gender", "case").
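A minimal sketch of the "copy other for the missing plural keys" idea. It covers only the simple plural-count case of lateral inheritance, not gender or case, and assumes the list of required keys comes from the locale's plural rules; fill_plural_keys is a hypothetical helper.

```ruby
# Fill in every plural category the locale requires, falling back laterally to
# the "other" value when a category is missing. `required_keys` is assumed to
# come from the locale's plural rules (e.g. one and other for ia).
def fill_plural_keys(values, required_keys)
  fallback = values.fetch("other")
  required_keys.each_with_object(values.dup) do |key, filled|
    filled[key] ||= fallback
  end
end

# Hypothetical unit data that defines only "other".
fill_plural_keys({ "other" => "{0} cm" }, %w[one other])
# => {"other" => "{0} cm", "one" => "{0} cm"}
```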
CLDR has an alt attribute that can be used to indicate different things:
alt=proposed\d* and alt=#{variantname}-proposed\d* indicate proposed values that will replace the non-alt version of the value. They will always have a draft attribute, since as soon as they get the accepted status, the non-alt version is replaced.
alt=#{variantname} indicates an alternative value that really ought to be used in specific conditions.
Currently, ruby-cldr filters out values with an alt attribute in some places using the Base#alt? method. However, it's inconsistently applied.
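A sketch of handling both kinds of alt value in one place: skip proposed values and export variants under a separate key, using the "-alt-" key naming that cldr-json uses for such values. AltElement and export_with_alt_keys are hypothetical, and the proposed entry's text is made up for illustration.

```ruby
# Hypothetical parsed element: its type, optional alt value, and text.
AltElement = Struct.new(:type, :alt, :text)

# alt="proposed\d*" and alt="#{variantname}-proposed\d*" mark proposed values.
PROPOSED_ALT = /\A(?:.+-)?proposed\d*\z/

# Export alt variants under cldr-json style keys ("en-US-alt-short"),
# skipping proposed values. Underscores become hyphens, matching the
# hyphenated locale codes used elsewhere in the export.
def export_with_alt_keys(elements)
  elements.each_with_object({}) do |element, memo|
    next if element.alt&.match?(PROPOSED_ALT)

    key = element.type.tr("_", "-")
    key = "#{key}-alt-#{element.alt}" if element.alt
    memo[key] = element.text
  end
end

elements = [
  AltElement.new("en_US", nil, "Engels (VSA)"),
  AltElement.new("en_US", "short", "Engels (VSA)"),
  AltElement.new("en_US", "proposed1", "hypothetical proposal"),
]
export_with_alt_keys(elements)
# => {"en-US" => "Engels (VSA)", "en-US-alt-short" => "Engels (VSA)"}
```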
We might want to export data with alt=#{variantname} style attributes, perhaps using a separate key name. cldr-json does this:
<language type="en_US">Engels (VSA)</language>
<language type="en_US" alt="short">Engels (VSA)</language>
gets exported as:
"en-US": "Engels (VSA)",
"en-US-alt-short": "Engels (VSA)",
~/ruby-cldr/lib/cldr/export/data/calendars/gregorian.rb:79:in `eras': undefined method `path' for nil:NilClass (NoMethodError)
  from ~/ruby-cldr/lib/cldr/export/data/calendars/gregorian.rb:11:in `initialize'
  from ~/ruby-cldr/lib/cldr/export/data/calendars.rb:9:in `new'
  from ~/ruby-cldr/lib/cldr/export/data/calendars.rb:9:in `initialize'
  from ~/ruby-cldr/lib/cldr/export.rb:52:in `new'
  from ~/ruby-cldr/lib/cldr/export.rb:52:in `block in data'
  from ~/ruby-cldr/lib/cldr/export.rb:51:in `each'
  from ~/ruby-cldr/lib/cldr/export.rb:51:in `inject'
  from ~/ruby-cldr/lib/cldr/export.rb:51:in `data'
  from ~/ruby-cldr/lib/cldr/export/yaml.rb:11:in `export'
  from ~/ruby-cldr/lib/cldr/export.rb:35:in `block (2 levels) in export'
  from ~/ruby-cldr/lib/cldr/export.rb:34:in `each'
  from ~/ruby-cldr/lib/cldr/export.rb:34:in `block in export'
  from ~/ruby-cldr/lib/cldr/export.rb:33:in `each'
  from ~/ruby-cldr/lib/cldr/export.rb:33:in `export'
  from ~/ruby-cldr/lib/cldr/thor.rb:27:in `export'
Currently, when generating the YAML files for territories (and possibly others), some keys are not quoted.
Because of YAML's implicit typing rules, we get some weird results:
territories.001 is parsed as territories.1
territories.011 is parsed as territories.9
TL;DR: Non-string territory keys are a mess:
{1=>"World", 2=>"Africa", 3=>"North America", 5=>"South America", :"009"=>"Oceania", 9=>"Western Africa", 11=>"Central America", 12=>"Eastern Africa", 13=>"Northern Africa", 15=>"Middle Africa", :"018"=>"Southern Africa", :"019"=>"Americas", 17=>"Northern America", :"029"=>"Caribbean"...
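The behaviour can be reproduced with Ruby's bundled Psych: unquoted keys with a leading zero match YAML's octal-integer syntax. A sketch of the likely fix, assuming the keys are Strings when dumped, is to let Psych quote them, which it does for strings that would otherwise load as numbers.

```ruby
require "yaml"

# Unquoted keys like 001 and 011 match YAML's octal-integer syntax, so Psych
# loads them as Integers (001 -> 1, 011 -> 9).
unquoted = "001: World\n011: Central America\n"
YAML.load(unquoted) # => {1 => "World", 9 => "Central America"}

# If the keys are Strings when dumped, Psych quotes them because they would
# otherwise load as numbers, so they survive a round trip.
territories = { "001" => "World", "011" => "Central America" }
dumped = YAML.dump(territories)
YAML.load(dumped) # => {"001" => "World", "011" => "Central America"}
```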
As noted in this comment, Currencies#currency artificially adds a translation for the one key, even though there is no corresponding translation in the upstream CLDR.
It adds a one key for languages that don't have a one pluralization rule (e.g., zh), slightly bloating the data files.
It is also missing all of the other pluralization keys that might be needed in the locale:
I18n.with_locale(:af) do
  I18n.t("currencies.LVL", count: 0)
end
I18n::InvalidPluralizationData: translation data {:one=>"Lettiese lats", :name=>"Lettiese lats", :symbol=>"LVL"} can not be used with :count => 0. key 'other' is missing.
My guess without context is that this was done to give a "reasonable" default for the currency in cases where CLDR doesn't have pluralization translation information for the currency.
The logic is present all the way back to at least 2009-12-30:
ruby-cldr/lib/cldr/data/currencies.rb
Line 17 in 6664464
IMO, in some sense this is fabricating information that isn't in the upstream CLDR dataset.
Unless ruby-cldr's mission is to augment CLDR with its own data, I feel that this logic belongs downstream of ruby-cldr, in the consumer's code.
Remove this special casing of the one key.
If that's not an option, then add keys for each pluralization key IFF the language uses that pluralization key.
The Plurals component only outputs a plurals.rb file for a locale if the locale appears exactly in the supplemental/plurals.xml file.
For example, zh-Hant doesn't have a plurals.rb file, which means that it would fall back to its parent locale, root. root happens to have the same plural rules as zh-Hant (i.e., everything uses other), so it isn't an issue.
However, there are locales that fall back to root that don't share the same plural rules as root. For example, sd-Deva falls back to root, but (as I understand it) sd-Deva uses both one and other. So if it were to use root's plurals.rb, it would be incorrect.
IDK yet.
This is what I get:
$ thor cldr:export
/usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/runner.rb:34:in `method_missing': undefined method `start' for nil:NilClass (NoMethodError)
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/task.rb:33:in `send'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/task.rb:33:in `run'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/task.rb:13:in `run'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/invocation.rb:109
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/invocation.rb:116:in `call'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/invocation.rb:116:in `invoke'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor.rb:137:in `start'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/base.rb:378:in `start'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor.rb:124:in `start'
  from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/bin/thor:6
  from /usr/bin/thor:19:in `load'
  from /usr/bin/thor:19
According to Export#locales, the fallback chain for zh-Hant-HK is [:"zh-Hant-HK", :zh].
But it should be [:"zh-Hant-HK", :"zh-Hant", :root].
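A minimal sketch of a fallback chain that honours explicit parent locales before truncating subtags. PARENT_LOCALES is a hypothetical one-entry subset of CLDR's parentLocales data (where zh_Hant's parent is root), already converted to hyphenated codes; the real data has many more entries.

```ruby
# Hypothetical one-entry subset of CLDR's <parentLocales> data, with locale
# codes already hyphenated.
PARENT_LOCALES = { "zh-Hant" => "root" }.freeze

# Build a fallback chain: use the explicit parent when one is declared,
# otherwise truncate the last subtag; every chain ends at root.
def fallback_chain(locale)
  chain = [locale]
  current = locale
  until current == "root"
    current = PARENT_LOCALES.fetch(current) do
      subtags = current.split("-")
      subtags.size > 1 ? subtags[0...-1].join("-") : "root"
    end
    chain << current
  end
  chain.map(&:to_sym)
end

fallback_chain("zh-Hant-HK") # => [:"zh-Hant-HK", :"zh-Hant", :root]
fallback_chain("zh-Hans-CN") # => [:"zh-Hans-CN", :"zh-Hans", :zh, :root]
```

The explicit parent is what stops zh-Hant from falling back to zh, whose data is written in Simplified script.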
The Git ecosystem is moving away from the default branch name of master, towards the name main.
For example, GitHub, GitLab, and Bitbucket all now use main as the default branch name for new repositories.
Change the default branch of ruby-cldr to use main.
Run git grep master and update mentions of the outdated branch name in documentation and URLs.
Developers with local clones will have to perform a one-time update of their local clones by running:
git branch -m master main
git fetch origin
git branch -u origin/main main
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
These commands are also shown to developers who visit the repo in the GitHub interface, so it doesn't require additional advertising work from our end.
Many of the elements in CLDR have alt=variant versions that are supposed to be used "in some circumstances".
Figure out how we should export these.
eraNames
<eraNames>
  <era type="0">BC</era>
  <era type="0" alt="variant">BCE</era>
  <era type="1">AD</era>
  <era type="1" alt="variant">CE</era>
</eraNames>
symbol
<currency type="TRY">
  <displayName>Turkse lira</displayName>
  <displayName count="one">Turkse lira</displayName>
  <displayName count="other">Turkse lira</displayName>
  <symbol draft="contributed">TRY</symbol>
  <symbol alt="narrow" draft="contributed">₺</symbol>
  <symbol alt="variant" draft="contributed">TL</symbol>
</currency>
territory names
<territory type="CD">Kongo-Kinshasa</territory>
<territory type="CD" alt="variant">Kongo (Demokratische Republik)</territory>
language names
<language type="ckb">Zentralkurdisch</language>
<language type="ckb" alt="menu">Kurdisch (Sorani)</language>
<language type="ckb" alt="variant">↑↑↑</language>
dayPeriod
<dayPeriod type="am">AM</dayPeriod>
<dayPeriod type="am" alt="variant">am</dayPeriod>
<field type="dayperiod-short">
  <displayName>AM/PM</displayName>
  <displayName alt="variant">am/pm</displayName>
</field>
script names
<script type="Arab">Arabic</script>
<script type="Arab" alt="variant">Perso-Arabic</script>
thor cldr:export --components=subdivisions is missing some subdivisions.
In version 35, ruby-cldr exports a data/br/subdivisions.yml file. In v36+, it does not.
This is because the subdivisions have been moved from common/subdivisions/br.xml to common/main/br.xml.
Since ruby-cldr doesn't know about this other location, it drops the information.
Look in both locations for subdivision data?
Everything else that ruby-cldr exports uses the hyphenated version of the locale codes instead of CLDR's underscore version (e.g., en-GB instead of en_GB).
parent_locales.yml should not be an exception to this.
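A small sketch of the conversion, assuming the parent-locale data is held as a Hash of child code to parent code before it is written out; hyphenate_parent_locales is a hypothetical helper.

```ruby
# Convert CLDR's underscore locale codes to the hyphenated form used in the
# rest of ruby-cldr's output, for both children (keys) and parents (values).
def hyphenate_parent_locales(parent_locales)
  parent_locales.each_with_object({}) do |(child, parent), memo|
    memo[child.tr("_", "-")] = parent.tr("_", "-")
  end
end

hyphenate_parent_locales({ "en_GB" => "en_001", "zh_Hant" => "root" })
# => {"en-GB" => "en-001", "zh-Hant" => "root"}
```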
The root locale contains <alias> elements that tell the reader that they should restart their search with a different key.
For example, resolving //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] for en_GB in CLDR v34 should look at:
//ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/en_GB.xml
//ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/en_001.xml
//ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/en.xml
and finally //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/root.xml, where it finds the alias:
<field type="hour-narrow">
  <alias source="locale" path="../field[@type='hour-short']"/>
</field>
It then restarts the search using hour-short, which in turn leads to another alias, restarting the search using hour, which eventually resolves to this hour from //ldml/dates/fields/field[@type="hour"]/relative[@type=0] in common/main/en.xml.
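The lookup described above can be sketched on a toy model. LOCALE_DATA, Alias and resolve are hypothetical: real CLDR aliases are XPath-relative, while here each field type is a flat key. The point shown is that an alias restarts the whole search from the first locale in the chain with the new key.

```ruby
# Toy model: each locale maps a field type to either a value or an Alias.
Alias = Struct.new(:target)

LOCALE_DATA = {
  "en-GB" => {},
  "en-001" => {},
  "en" => { "hour" => "this hour" },
  "root" => {
    "hour-narrow" => Alias.new("hour-short"),
    "hour-short" => Alias.new("hour"),
  },
}.freeze

FALLBACK_CHAIN = %w[en-GB en-001 en root].freeze

# Walk the fallback chain; when an alias is found, restart the search from the
# first locale in the chain using the alias target. `seen` guards against cycles.
def resolve(key, seen = [])
  raise ArgumentError, "alias cycle at #{key}" if seen.include?(key)

  FALLBACK_CHAIN.each do |locale|
    value = LOCALE_DATA.fetch(locale)[key]
    next if value.nil?
    return value.is_a?(Alias) ? resolve(value.target, seen + [key]) : value
  end
  nil
end

resolve("hour-narrow") # => "this hour"
```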
thor cldr:export --locales 'en' 'en_GB' --components Fields --merge
I would expect:
data/en-GB.yml to contain an en-GB.fields.hour-narrow.relative.0 key with the value this hour
data/en.yml to contain an en.fields.hour-narrow.relative.0 key with the value this hour
Once exported, there is no information in the exported YAML files about aliases.
This means that a client cannot know to use the en-GB.fields.hour.relative.0 key (which exists) instead for en-GB.
(Aside: this example uses --merge, but it equally applies to the non-merged case. I used the --merge case for simplicity.)
Data exports are incomplete. Any aliased field is missing a corresponding value in the exported files.
The above hour-narrow example is not that impactful (clients lose access to the hour-narrow format 🤷).
In other cases, <alias> elements are used for more impactful values. For example, the Buddhist calendar data are largely aliased to Gregorian calendar data.
While ruby-cldr doesn't currently export Buddhist calendar information, if it were to, its lack of <alias> handling would mean that the Buddhist calendar would be missing most of its data entirely.
Another example: The symbols for many numbering systems are aliased to the Latin number symbols. Without resolving the aliases, the symbols are not present in the exported data (Actually, they are, but that's due to another bug).
Output aliases in the manner supported by ruby-i18n/i18n's Symbol resolving.
If the --merge flag is used, resolve the value and output that instead.
When falling back through locales, we don't want to fall back to a non-default script, and should instead fall back to :root.
Refer to this code in the CLDR upstream:
The sort order of the files is not maintained when locales are merged.
thor cldr:export --locales fr --components currencies
Note that all the currency data is sorted by currency code.
thor cldr:export --locales fr --components currencies --merge
Note that the currencies mentioned by the root locale (added by --merge) are found at the top of the file, followed by those found only in the fr file.
In this case, the root locale adds nothing that isn't present in the fr locale, so the files should be identical.
The same sort order should be maintained. For currency files, this uses the currency code.
For other files, it might use a different key.
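A minimal sketch of a merge that restores the ordering afterwards, assuming sorting by key is the right order (true for currency codes, possibly not for other files). merge_sorted and the sample data are hypothetical.

```ruby
# Deep-merge a child locale's data over its ancestor's, re-sorting every merged
# Hash level by key so the merged file keeps the unmerged file's ordering.
# Hashes taken wholesale from one side keep the order they already had.
def merge_sorted(ancestor, child)
  merged = ancestor.merge(child) do |_key, ancestor_value, child_value|
    if ancestor_value.is_a?(Hash) && child_value.is_a?(Hash)
      merge_sorted(ancestor_value, child_value)
    else
      child_value
    end
  end
  merged.sort_by { |key, _value| key.to_s }.to_h
end

# Hypothetical data: root's entries would otherwise end up first.
root = { "currencies" => { "EUR" => { "symbol" => "€" }, "USD" => { "symbol" => "US$" } } }
fr   = { "currencies" => { "AUD" => { "symbol" => "$AU" }, "EUR" => { "name" => "euro" } } }
merge_sorted(root, fr)["currencies"].keys # => ["AUD", "EUR", "USD"]
```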
Right now the README says "Ruby 1.9 (if you want well-ordered Hashes to be exported)", but we only run CI on Ruby 2.3.
Sorbet is a gradual typing system.
Consider typing ruby-cldr using Sorbet to help catch typing bugs?
jeweler hasn't been maintained since early 2019.
It's also the source of our only security advisory.
Figure out what modern Ruby gems use to do their releases, and replace jeweler.
Make sure that things haven't changed in ways that break our output.
thor cldr:export --components=parentLocales produces a parent_locales.yml for every locale. This is a waste of files, disk space, and memory.
The data in parent_locales.yml comes from supplementalData.xml, which is valid for all locales.
You can verify that the data is identical for all locales with this bit of Ruby:
require 'yaml'

def check_parent_locales
  result = nil
  Dir[File.join("data", "*", "parent_locales.yml")].sort.each do |source_file_name|
    # Each file has a single top-level key (its locale); compare the data beneath it.
    parsed = YAML.load_file(source_file_name).values.first
    result ||= parsed
    unless parsed == result
      raise "`#{source_file_name}` is different from the others"
    end
  end
end

check_parent_locales
Instead of outputting duplicated data for every locale, output a single top-level parent_locales.yml.
cldr-json is a JSON serialization of the official CLDR XML data, created by the Unicode Consortium themselves.
It doesn't contain values for everything in CLDR. For example, it only exports values with a draft level of contributed or above.
Comparing the output of ruby-cldr with the output of cldr-json might help give us confidence in the correctness of ruby-cldr.
--merge currently takes a long time, needlessly.
Instead of iterating over each locale in turn and merging in the ancestor locales each time:
en-CA -> en -> root
en-GB -> en -> root
en-US -> en -> root
All of these share the en -> root chain, which is the same data, so there is no need to recompute it for each child locale.
Instead, you could iterate over the graph of locales breadth-first starting at the root locale, then cache the results for use in the other locales.
(Of course, this might not be worth doing, as the whole concept of --merge is likely to change. I just wanted to capture this potential optimization here.)
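A sketch of the caching idea. It swaps the breadth-first walk for memoized recursion, which has the same effect of merging each ancestor chain only once. PARENTS and merged_data are hypothetical, and the merge is shallow for brevity where the real one would be deep.

```ruby
# Hypothetical parent map for a few English locales.
PARENTS = { "en" => "root", "en-CA" => "en", "en-GB" => "en", "en-US" => "en" }.freeze

# Return a locale's data merged with all of its ancestors. Results are cached,
# so a shared chain such as en -> root is merged only once, however many child
# locales use it. The merge is shallow here for brevity.
def merged_data(locale, own_data, cache = {})
  cache[locale] ||= begin
    parent = PARENTS[locale]
    base = parent ? merged_data(parent, own_data, cache) : {}
    base.merge(own_data.fetch(locale, {}))
  end
end

own_data = { "root" => { "a" => 1 }, "en" => { "b" => 2 }, "en-GB" => { "a" => 3 } }
cache = {}
%w[en-CA en-GB en-US].each { |locale| merged_data(locale, own_data, cache) }
cache["en-GB"] # => {"a" => 3, "b" => 2}
```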
CLDR defines supplemental region validity data that we recently realized would be useful to have.
Export that data as a new component.
Similar to #167, ruby-cldr currently only exports Gregorian calendar data, yet CLDR v41 has data for 16 different calendars.
Note: There are a number of aliases in the root locale that would need to be implemented if we start exporting the other calendar systems.
Typically, I greatly value maintenance of backwards compatibility.
However, ruby-cldr is pre-1.0.0, and I'd like to set a modern baseline before going to 1.0.0 and having to support old versions more seriously.
Ref: #50
I'm trying to run 'thor cldr:download' after "bundle install", but I'm getting this error.
Ruby 2.1
Rails 4.0
Cldr::Export::Data::Numbers#unit uses a default not present in upstream CLDR:
ruby-cldr/lib/cldr/export/data/numbers.rb
Lines 108 to 114 in d180a44
There are two other places in Cldr::Export::Data::Calendars::Gregorian
ruby-cldr fails to export CLDR v38:
ruby-cldr (master)$ thor cldr:download --source=http://unicode.org/Public/cldr/38/core.zip
ruby-cldr (master)$ thor cldr:export
................................................................................
Traceback (most recent call last):
41: from /Users/username/.gem/ruby/2.7.1/bin/thor:23:in `<main>'
40: from /Users/username/.gem/ruby/2.7.1/bin/thor:23:in `load'
39: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/bin/thor:6:in `<top (required)>'
38: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
37: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
36: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
35: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/command.rb:135:in `run'
34: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/command.rb:29:in `run'
33: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/runner.rb:43:in `method_missing'
32: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
31: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
30: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
29: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
28: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/thor.rb:27:in `export'
27: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:70:in `export'
26: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:70:in `each'
25: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:71:in `block in export'
24: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:71:in `each'
23: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:72:in `block (2 levels) in export'
22: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/ruby.rb:5:in `export'
21: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:85:in `data'
20: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:110:in `plural_data'
19: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:96:in `locale_based_data'
18: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:96:in `inject'
17: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:96:in `each'
16: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:97:in `block in locale_based_data'
15: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:97:in `new'
14: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals.rb:28:in `initialize'
13: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals.rb:32:in `to_hash'
12: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:105:in `to_ruby'
11: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:105:in `inject'
10: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:105:in `each'
9: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:106:in `block in to_ruby'
8: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `parse'
7: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `inject'
6: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `each'
5: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `block in parse'
4: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `parse'
3: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `inject'
2: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `each'
1: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `block in parse'
/Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:80:in `parse': can not parse 'e = 0 ' (RuntimeError)
It seems that CLDR v38 may have changed the syntax of plural rules, and it affects the fr locale:
<!-- 3: one,many,other -->
<pluralRules locales="fr">
  <pluralRule count="one">i = 0,1 @integer 0, 1 @decimal 0.0~1.5</pluralRule>
  <pluralRule count="many">e = 0 and i != 0 and i % 1000000 = 0 and v = 0 or e != 0..5 @integer 1000000, 1e6, 2e6, 3e6, 4e6, 5e6, 6e6, … @decimal 1.0000001e6, 1.1e6, 2.0000001e6, 2.1e6, 3.0000001e6, 3.1e6, …</pluralRule>
  <pluralRule count="other"> @integer 2~17, 100, 1000, 10000, 100000, 1e3, 2e3, 3e3, 4e3, 5e3, 6e3, … @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, 1.0001e3, 1.1e3, 2.0001e3, 2.1e3, 3.0001e3, 3.1e3, …</pluralRule>
</pluralRules>
<pluralRules locales="ff fr hy kab">
  <pluralRule count="one">i = 0,1 @integer 0, 1 @decimal 0.0~1.5</pluralRule>
  <pluralRule count="other"> @integer 2~17, 100, 1000, 10000, 100000, 1000000, … @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …</pluralRule>
</pluralRules>
Note the addition of the e = 0 syntax. (I haven't yet looked into the details of what it means.)
Hello @tigrish, @camertron, @Korri,
I'm requesting that I be granted "maintainer" status over ruby-cldr.
I'm hoping to exercise more control over the codebase than I feel I can as an outside contributor.
As part of my full-time work at Shopify, I maintain an internal library that uses ruby-cldr as its source of CLDR data.
As you may have seen, I've submitted many PRs in the last few months, fixing numerous bugs in ruby-cldr that were causing incorrect data to be exported. I'm currently the #3 contributor to ruby-cldr, and at this rate expect to be #1 within a few weeks.
Practically, unless the pattern of contribution changes, I expect that I'll be effectively the sole maintainer.
Of course, I would continue to accept and appreciate any collaboration from you, and will treat your previous contributions with respect.
I'm hoping that you'll allow me this control over the project, or at the very least, start a discussion around how I can gain such control over time.
The rest of this issue gives some more details about who I am, and what I hope to do with more control over the future of ruby-cldr.
ruby-cldr
I want ruby-cldr to be a solid foundational library upon which other i18n libraries in the Ruby/Rails ecosystem can depend to provide easy and accurate access to the latest CLDR data.
Note that I do not wish to expand the scope of ruby-cldr. ruby-cldr will support the creation of higher-level, end-user APIs by offering easy and accurate access to the upstream CLDR data, but it will not provide those end-user APIs itself.
Examples of potential consumers of ruby-cldr's data:
ruby-i18n/i18n
twitter-cldr-rb
shopify-i18n (currently internal)
ruby-cldr-specific changes:
ruby-cldr fully compatible with modern versions of CLDR
export.rb to be much simpler
draft attribute levels
General repo changes:
ruby-cldr
master to main
I'm not new to the world of open-source maintainership. I'm a maintainer/contributor of several well-used libraries in other ecosystems:
ciso8601: a foundational Python library for parsing ISO 8601 timestamps quickly.
I also regularly contribute bug reports and bug fixes to all FOSS software I use.
Beyond that, I've been working as a Software Developer for 11 years now. For the past 3, I've been working on i18n systems at Shopify.
ruby-cldr is MIT.
Maintain or Admin role on ruby-i18n/ruby-cldr
Admin role, though I'll still need someone with Admin role to rename the default branch
CLDR defines supplemental territoryContainment data that we recently realized would be useful to have.
Export that data as a new component.
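Containment data is a tree of groups, so a likely consumer need is "which regions contain this territory?". A sketch using a hypothetical three-group subset of CLDR's <group type="..." contains="..."/> entries; GROUPS, CONTAINER and containers are illustrative names.

```ruby
# Hypothetical subset of CLDR's <territoryContainment> groups, as
# container => members.
GROUPS = {
  "001" => %w[019 002 150 142 009],
  "150" => %w[154 155 151 039],
  "155" => %w[AT BE CH DE FR LI LU MC NL],
}.freeze

# Invert the groups so each territory maps to its immediate container. With the
# full data, overlapping groupings (e.g. EU) would need extra handling, since a
# territory can appear in more than one group.
CONTAINER = {}
GROUPS.each do |group, members|
  members.each { |member| CONTAINER[member] = group }
end

# List a territory's containers, nearest first, up to the world region (001).
def containers(territory)
  result = []
  while (parent = CONTAINER[territory])
    result << parent
    territory = parent
  end
  result
end

containers("FR") # => ["155", "150", "001"]
```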
Question: Is it the case that thor cldr:export
should never export draft data?
Right now it seems to export draft data in some places, which may indicate a bug.
CLDR has a hierarchy of 4 values for the "draft" attribute that represent how far through the approval process the data is.
Currently, some of the data exposed through ruby-cldr is guarded by calls to draft?:
def draft?(node)
  draft = node.attribute('draft')
  draft && draft.value == 'unconfirmed'
end
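A check against the whole hierarchy, rather than only the string 'unconfirmed', could look like the sketch below. The ordering is CLDR's (unconfirmed < provisional < contributed < approved, with a missing attribute meaning approved); the helper names and the idea of passing a minimum level are assumptions for illustration, not existing ruby-cldr code.

```ruby
# CLDR draft values, from least to most confirmed.
# An element with no draft attribute is treated as "approved".
DRAFT_LEVELS = %w[unconfirmed provisional contributed approved].freeze

# Position of a draft value in DRAFT_LEVELS; raises on unknown values.
def draft_rank(value)
  DRAFT_LEVELS.index(value || "approved") or
    raise ArgumentError, "unknown draft value: #{value.inspect}"
end

# Hypothetical helper: true when draft_value is at least as confirmed as minimum.
# With minimum: "provisional" this is the inverse of the draft? check above,
# which only flags "unconfirmed".
def meets_draft_level?(draft_value, minimum:)
  draft_rank(draft_value) >= draft_rank(minimum)
end
```

Making the threshold a parameter would let the exporter decide (or expose as an option) which levels count as "draft", instead of hard-coding one string comparison.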
But there are also places where this check is not done (Example), and draft information gets exported. For example, thor cldr:export exports:
---
se:
  currencies:
    DKK:
      symbol: Dkr
      narrow_symbol: kr
This happens even though the narrow_symbol value is marked as draft="unconfirmed" in CLDR.
Related: I believe that checking for the draft attribute at the leaf is insufficient. According to this, draft attributes can be inherited from parents.
That said, it does mention that this generally should not be the case:
However, normally the draft attributes should be canonicalized, which means they are pushed down to leaf nodes as described in Section 5.6 Canonical Form. If an LDML file does has draft attributes that are not on leaf nodes, the file should be interpreted as if it were the canonicalized version of that file.
So I'm not sure that this is a problem in practice.
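If non-canonical files ever did need handling, the lookup would have to walk up from the leaf to its ancestors. A minimal sketch with REXML, assuming the nearest draft attribute wins; the function name and the sample fragment are hypothetical.

```ruby
require "rexml/document"

# Returns the draft value that applies to +element+: its own draft attribute if
# present, otherwise the nearest ancestor's. nil means no draft attribute
# anywhere on the path (i.e. approved).
def effective_draft(element)
  node = element
  until node.nil? || node.is_a?(REXML::Document)
    draft = node.attributes["draft"]
    return draft if draft
    node = node.parent
  end
  nil
end

# Illustrative, non-canonical fragment: the draft attribute sits on a parent.
SAMPLE = REXML::Document.new(<<~XML)
  <ldml>
    <numbers>
      <currencies draft="provisional">
        <currency type="XXX">
          <symbol>a</symbol>
          <displayName draft="unconfirmed">b</displayName>
        </currency>
      </currencies>
    </numbers>
  </ldml>
XML
```

Here `<symbol>` inherits "provisional" from `<currencies>`, while `<displayName>` keeps its own "unconfirmed". On canonicalized files the walk simply stops at the leaf.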
As of CLDR v36 (2019-10-04), the inheritance marker (↑↑↑) has been added to the data files as an explicit indicator that a value should use the value from the inherited locale.
For example, in common/main/zh.xml:
<unit type="graphics-pixel">
  <displayName>↑↑↑</displayName>
  <unitPattern count="other">↑↑↑</unitPattern>
</unit>
zh has no parents except for root to inherit from.
However, ruby-cldr only adds root as a parent to en: #47
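Resolving the marker means walking the locale's parent chain until a value other than ↑↑↑ turns up. The sketch below assumes a simplified chain (strip the last `_` subtag, then fall back to root for every locale, not just en) and ignores CLDR's explicit parentLocales exceptions; the data hash and helper names are hypothetical.

```ruby
# CLDR's inheritance marker (three U+2191 UPWARD ARROW characters).
INHERITANCE_MARKER = "\u2191\u2191\u2191"

# Simplified parent chain: drop the last "_"-separated subtag until none are
# left, then fall back to root. Real CLDR also defines explicit parentLocales
# exceptions, which this sketch ignores.
def parent_chain(locale)
  chain = [locale]
  chain << chain.last.sub(/_[^_]*\z/, "") while chain.last.include?("_")
  chain << "root" unless chain.last == "root"
  chain
end

# Returns the first value along the chain that is neither missing nor the marker.
def resolve(data, locale, key)
  parent_chain(locale).each do |id|
    value = data.dig(id, key)
    return value unless value.nil? || value == INHERITANCE_MARKER
  end
  nil
end
```

With `data = { "zh" => { "graphics-pixel" => INHERITANCE_MARKER }, "root" => { "graphics-pixel" => "px" } }` (illustrative values), `resolve(data, "zh", "graphics-pixel")` skips the marker in zh and returns root's value.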
Historical notes:
Thank you for the work you have done so far!
We are looking at using this library (specifically for plurals first), and noticed that v25 was released ~2 weeks ago. It adds a whole new dimension to pluralization (ordinals, cardinals, and ranges, oh my!) as well as much finer grained cases for MANY languages (largely around decimals/fractions).
I was toying with the idea of trying to update this library to start using v25 rules, and I was hoping I wasn't the only one working on it, so we could share the load.
It seems that ruby-cldr figures out the list of locales by iterating over the filenames:
ruby-cldr/lib/cldr/export/data.rb
Line 47 in e97bdfb
This is fine.
However, users of ruby-cldr cannot do this, since the exported directory also contains non-locale directories (e.g. transforms).
Create a mechanism that allows users of ruby-cldr to reliably get the list of locales. Perhaps this could be as simple as generating a locales.yml file from Cldr::Export::Data#locales.
This would allow ruby-cldr to do whatever it likes with the file structure, and give users confidence that they aren't accidentally including non-locales in their list of regions.
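A minimal sketch of the locales.yml idea, assuming the exporter already holds the locale list and writes it next to the exported data. Both helper names and the file location are hypothetical.

```ruby
require "yaml"

# Hypothetical exporter step: write the locale list the exporter already knows
# (e.g. from Cldr::Export::Data#locales, as suggested above) to locales.yml.
def write_locales_file(locales, dir)
  path = File.join(dir, "locales.yml")
  File.write(path, YAML.dump(locales.map(&:to_s).sort))
  path
end

# Hypothetical consumer step: read the list back instead of scanning directories,
# so non-locale directories such as transforms are never mistaken for locales.
def read_locales_file(dir)
  YAML.safe_load(File.read(File.join(dir, "locales.yml")))
end
```

Storing plain strings keeps the file loadable with `YAML.safe_load`, which rejects symbols by default.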
Is there an officially supported way to get the list of locales?
Version 24 of the CLDR is now out, and it's changed the way in which the plural rules and units elements (at least) are described. The release notes are here; I'd submit a pull request but I'm just not familiar enough with the file syntax to know what I'm doing.
ar has information on percent formatting for two number systems: arab and latn.
Cldr::Export::Data::Numbers#number_system arbitrarily takes the first one as the value of number_system:
ruby-cldr/lib/cldr/export/data/numbers.rb
Lines 103 to 106 in c85848a
This results in exports that:
a) are missing the latn data, due to being shadowed by the arab data
b) mislabel the latn data as arab
thor cldr:export --components=Numbers --locales=ar
decimal:
  number_system: arab
  patterns:
    default: "#,##0.###`" # From `arab`, shadowing the similar data from `latn`
    long: # Everything else is from `latn`
      '1000':
        few: 0 آلاف # From `latn`
        many: 0 ألف # From `latn`
        one: 0 ألف # From `latn`
        other: 0 ألف # From `latn`
        two: 0 ألف # From `latn`
        zero: 0 ألف # From `latn`
      '10000':
        few: 00 ألف # From `latn`
        many: 00 ألف # From `latn`
        one: 00 ألف # From `latn`
        other: 00 ألف # From `latn`
        two: 00 ألف # From `latn`
        zero: 00 ألف # From `latn`
Add an additional layer of nesting to the YAML output that contains the number system:
decimal:
  arab:
    patterns:
      default: "#,##0.###`" # From `arab`
  latn:
    patterns:
      default: "#,##0.###`" # From `latn`
      long:
        '1000':
          few: 0 آلاف # From `latn`
          many: 0 ألف # From `latn`
          one: 0 ألف # From `latn`
          other: 0 ألف # From `latn`
          two: 0 ألف # From `latn`
          zero: 0 ألف # From `latn`
This would more closely match the upstream CLDR data, and avoid these problems.
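One way to produce that nesting is to key the result by each `<decimalFormats>` element's numberSystem attribute instead of picking the first element. A minimal REXML sketch, assuming only the default pattern is needed; the method name and output shape are illustrative, not the existing Numbers component.

```ruby
require "rexml/document"

# Sketch: one entry per numberSystem, instead of taking the first
# <decimalFormats> element found. Only the default pattern (the
# <decimalFormatLength> without a type attribute) is extracted here.
def decimal_patterns_by_number_system(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//numbers/decimalFormats").each_with_object({}) do |formats, result|
    system = formats.attributes["numberSystem"]
    length = formats.get_elements("decimalFormatLength").find { |el| el.attributes["type"].nil? }
    pattern = length && REXML::XPath.first(length, "decimalFormat/pattern")
    result[system] = { patterns: { default: pattern&.text } }
  end
end
```

Because the key comes from the data rather than from a hard-coded list, both arab and latn (and any other numbering system a locale defines) end up side by side.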
There are a number of places in ruby-cldr where files are created for locales even though there is no relevant information for that (locale, component) pair.
Sometimes this manifests as empty YAML mappings, and sometimes the entire file is meaningless.
These meaningless keys mean more memory usage for Rails users, and unnecessarily bloat the I18n.load_path.
Examples:
data/af-ZA/calendars.yml
data/af-ZA/delimiters.yml
data/af-ZA/rbnf.yml
data/af-ZA/units.yml
Upstream CLDR has no relevant information for these components in this locale.
Of the 8332 YAML files output by ruby-cldr, at least 2327 (28%) contain no relevant information.
Stop outputting keys and/or files unless there is actually relevant data from the upstream CLDR.
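A minimal sketch of that behaviour, assuming the exporter builds a nested Hash per (locale, component) before writing it: prune nil values and collections that end up empty, and skip the write when nothing survives. The helper names are hypothetical, and empty strings are deliberately kept, since whether they carry meaning is a separate question.

```ruby
require "yaml"

# True for nil and for empty Hashes/Arrays (empty strings are not treated as empty).
def empty_data?(value)
  value.nil? || ((value.is_a?(Hash) || value.is_a?(Array)) && value.empty?)
end

# Recursively drops nil values and collections that end up empty.
def prune(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), out|
      pruned = prune(child)
      out[key] = pruned unless empty_data?(pruned)
    end
  when Array
    value.map { |child| prune(child) }.reject { |child| empty_data?(child) }
  else
    value
  end
end

# Hypothetical write step: returns false and writes nothing when no data is left.
def write_component(path, data)
  pruned = prune(data)
  return false if empty_data?(pruned)
  File.write(path, YAML.dump(pruned))
  true
end
```

Pruning before the emptiness check matters: a mapping like `{ "units" => { "x" => nil } }` is not empty itself, but nothing in it is worth writing.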
Cldr::Export::Data::Numbers#symbols ignores the distinguishing attribute numberSystem:
In the bn locale, there are data for two number systems. However, Cldr::Export::Data::Numbers#symbols merges them all together:
(byebug) select('numbers/symbols/*').size
23
23 = 11 from the beng numberSystem, and 12 from the latn numberSystem (CLDR v34)
The result is a set of symbols that combines bits of both numberSystems.
bn:
  numbers:
    symbols:
      alias: ''
      decimal: "."
      group: ","
      list: ";"
      percent_sign: "%"
      plus_sign: "+"
      minus_sign: "-"
      exponential: E
      superscripting_exponent: "×"
      per_mille: "‰"
      infinity: "∞"
      nan: NaN
      time_separator: ":"
Distinguishing attributes are used to distinguish multiple elements at the same level.
I expected different numberSystem elements to be kept separate during the export, since that's what the spec calls for. Something like:
bn:
  numbers:
    symbols:
      beng:
        ....
      latn:
        ....
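The mixing comes from selecting `numbers/symbols/*`, which returns the children of every `<symbols>` element at once. Adding a `[@numberSystem='…']` predicate per system keeps them apart. A minimal REXML sketch; the method name is hypothetical and element names are kept as-is (no snake_case conversion).

```ruby
require "rexml/document"

# Sketch: select symbols per numberSystem instead of 'numbers/symbols/*',
# which mixes the children of every <symbols> element together.
def symbols_by_number_system(xml)
  doc = REXML::Document.new(xml)
  systems = REXML::XPath.match(doc, "//numbers/symbols")
                        .map { |el| el.attributes["numberSystem"] }
                        .compact
                        .uniq
  systems.each_with_object({}) do |system, result|
    elements = REXML::XPath.match(doc, "//numbers/symbols[@numberSystem='#{system}']/*")
    result[system] = elements.to_h { |el| [el.name, el.text] }
  end
end
```

This yields the `beng:` / `latn:` split shown above, one Hash of symbols per numbering system.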
bundle exec thor cldr:download
bundle exec thor cldr:export
Note that the --merge option was not set in the cldr:export call.
Look at the contents of data/af-NA/plurals.rb:
{ :'af_NA' => { :i18n => { :plural => { :keys => nil, :rule => } } } }
Note that this is not valid Ruby syntax, since there is no value for the :rule key.
$> ruby data/af-NA/plurals.rb
data/af-NA/plurals.rb:1: syntax error, unexpected '}'
... => { :keys => nil, :rule => } } } }
All Ruby files (including plurals.rb files) created by ruby-cldr should be valid Ruby files (i.e. syntactically correct).
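A sketch of both halves of a fix: emit every value through `#inspect` so a missing rule becomes `nil` instead of an empty slot, and check generated files with `Ripper.sexp`, which returns nil when the source has a syntax error. The generator below is a hypothetical stand-in, not ruby-cldr's real plurals writer, and it assumes the rule is passed in as Ruby lambda source.

```ruby
require "ripper"

# True when +source+ parses as Ruby; Ripper.sexp returns nil on a syntax error.
def valid_ruby?(source)
  !Ripper.sexp(source).nil?
end

# Hypothetical generator for plurals.rb. Keys go through #inspect and a missing
# rule is written as `nil`, so the output is always syntactically valid.
# rule_source, when given, is expected to be Ruby lambda source.
def plurals_source(locale, keys, rule_source)
  rule = rule_source || "nil"
  "{ #{locale.to_sym.inspect} => { :i18n => { :plural => { :keys => #{keys.inspect}, :rule => #{rule} } } } }"
end
```

Running `valid_ruby?` over every generated .rb file (for example in a test) would catch the af-NA case above before release.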
There seems to be a lot of overlap between these components. They use the same supplemental/plurals.xml file, but they do so in different ways, and so produce different results.
plurals.rb is only output for some locales, but plural_rules.yml gets output for many more (but not all) locales.
My gut feeling is that plurals.rb and plural_rules.yml should be output in the same scenarios, and represent the same information.
Figure out how to merge the implementations of these components?
Is there a reason transforms gets exported to its own directory, as opposed to being exported into each per-locale directory?
This causes issues like #59.
unicode-org/cldr@a56c139 moved where //validity/variable information was located (from common/supplemental/supplementalMetadata.xml to common/supplemental/attributeValueValidity.xml).
Since lib/cldr/export/data/variables.rb hard-codes the path to look at, thor cldr:export --components=variables stopped outputting anything at that point.
This is similar to #95, but affects supplemental data instead.
Rework Export::Data to follow the spec and combine the data files into one before doing lookups.
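A lightweight variant of that idea, sketched with REXML: rather than physically merging the files into one document, run the same XPath over every supplemental file and concatenate the matches, so callers never name the file an element lives in. The function name and the inline sample documents are hypothetical; a real exporter would read every file under common/supplemental/.

```ruby
require "rexml/document"

# Sketch: run one XPath over every supplemental document and concatenate the
# matches. Each file is queried separately instead of being merged into one
# document, which is enough for element lookups like //validity/variable.
def find_in_supplemental(xml_sources, xpath)
  xml_sources.flat_map do |xml|
    REXML::XPath.match(REXML::Document.new(xml), xpath)
  end
end
```

Something like `find_in_supplemental(Dir.glob("common/supplemental/*.xml").map { |f| File.read(f) }, "//validity/variable")` would then keep working when upstream moves elements between files, as happened in a56c139.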
CLDR v38+ has the c/e operands in plural rules, which are used for formatting "compact decimal" numbers (e.g., 1.20050c3).
Ref: http://unicode.org/reports/tr35/tr35-numbers.html#Plural_Operand_Meanings
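To make the operands concrete, here is a sketch that computes them from a literal like the one above, following my reading of the TR35 definitions: the exponent shifts the decimal point before n, i, v, w, f and t are taken, and c is the exponent itself. It assumes non-negative input and accepts `e` as a synonym for `c` (TR35 describes `e` as a deprecated synonym). The method name is hypothetical.

```ruby
# Sketch: compute TR35 plural operands for a non-negative decimal literal in
# compact notation such as "1.20050c3". "e" is accepted as a synonym for "c".
# Digits are handled as strings to avoid floating-point rounding.
def plural_operands(literal)
  mantissa, exponent = literal.split(/[ce]/)
  c = exponent.to_i
  int_part, frac_part = mantissa.split(".")
  frac_part ||= ""
  point = int_part.length + c
  # Shift the decimal point c places to the right, padding with zeros if needed.
  digits = (int_part + frac_part).ljust(point, "0")
  int_digits = digits[0, point]
  frac_digits = digits[point..]
  trimmed = frac_digits.sub(/0+\z/, "")
  {
    n: Rational(frac_digits.empty? ? int_digits : "#{int_digits}.#{frac_digits}"),
    i: int_digits.to_i,
    v: frac_digits.length,
    w: trimmed.length,
    f: frac_digits.to_i,
    t: trimmed.to_i,
    c: c
  }
end
```

For "1.20050c3" this gives n = 1200.5, i = 1200, v = 2, w = 1, f = 50, t = 5 and c = 3: the value is 1200.50, and only the two fraction digits left after the shift count as visible.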
ruby-cldr currently only exports data from the latn numbering system, and the structure of the output data files has no place to support exporting the other numbering systems.
bn:
  numbers:
    symbols: # Implicitly `latn`
      alias: ''
      decimal: "."
      group: ","
      list: ";"
      percent_sign: "%"
      ....
Perhaps we'd want to restructure things so that the numbering system is part of the key path:
bn:
  numbers:
    symbols:
      beng:
        ....
      latn:
        ....
or
bn:
  numbers:
    beng:
      symbols:
        ....
    latn:
      symbols:
        ....
Note: There are a number of aliases in the root locale that would need to be implemented once we start exporting the other numbering systems.