ruby-cldr's People

Contributors

c960657, camertron, davispuh, dependabot[bot], devanandersen, dpad46, froyomuffin, kl-7, korri, movermeyer, nearbuyjason, rafaelxy, tigrish, yaroslav

ruby-cldr's Issues

Add SubdivisionContainment

Hey there, thanks for this great plugin. I just came across a use case that needs subdivisionContainment. Easy access through the gem would be great.

e.g.:

module Cldr
  module Export
    module Data
      class SubdivisionsContainment < Base
        def initialize(*)
          super(nil)
          update(subdivision_containment: subdivision_containment)
        end

        def subdivision_containment
          @subdivision_containment ||= doc.xpath("supplementalData/subdivisionContainment/subgroup").each_with_object(
            Hash.new { |h, k| h[k] = { contains: [] } }
          ) do |territory, memo|
            territory_id = territory.attribute("type").value
            children = territory.attribute("contains").value.split(" ")

            memo[territory_id][:contains].concat(children)
            memo[territory_id][:contains].sort!
          end
        end
      end
    end
  end
end 

Add CI check to ensure that gemspec is up to date

Problem

ruby-cldr.gemspec is auto-generated by bundle exec rake gemspec (via Jeweler).

However, developers will forget to run it as part of their PRs, which means that we risk releasing a gem without all the required files, for example.

Potential solutions

a) add it to the checklist in the PR template
b) add a CI check that fails if bundle exec rake gemspec && rubocop -a ruby-cldr.gemspec produces any differences.

[CLDR v38+] Support / handle unit inflections

This PR added inflections to units, so entries in the data changed from:

<unit type="duration-day">
  <displayName>Tage</displayName>
  <unitPattern count="one">{0} Tag</unitPattern>
  <unitPattern count="other">{0} Tage</unitPattern>
  <perUnitPattern>{0} pro Tag</perUnitPattern>
</unit>

to

<unit type="duration-day">
  <gender>masculine</gender>
  <displayName>Tage</displayName>
  <unitPattern count="one">{0} Tag</unitPattern>
  <unitPattern count="one" case="accusative">{0} Tag</unitPattern>
  <unitPattern count="one" case="dative">{0} Tag</unitPattern>
  <unitPattern count="one" case="genitive">{0} Tages</unitPattern>
  <unitPattern count="other">{0} Tage</unitPattern>
  <unitPattern count="other" case="accusative">{0} Tage</unitPattern>
  <unitPattern count="other" case="dative">{0} Tagen</unitPattern>
  <unitPattern count="other" case="genitive">{0} Tage</unitPattern>
  <perUnitPattern>{0} pro Tag</perUnitPattern>
</unit>

Notably, there are now multiple values for each of the count attributes (one for each case). Previously, there was only the "nominative" case.

Using ruby-cldr with v38 gives a different behaviour than it did with v37, since this code returns whatever case happens to appear last in the file (in this case "genitive" in v38, "accusative" in v37).


As a first step, we should change the code to return the "nominative" case.
Longer term, we can look into ways to expose the other cases.
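
A minimal sketch of that first step, assuming the <unitPattern> elements have already been read into plain hashes. The hash shape and the helper names `nominative_patterns` and `last_wins_patterns` are hypothetical, not ruby-cldr's API. An element without a case attribute is treated as the nominative form, as in the XML above:

```ruby
# Each hash stands in for one <unitPattern> element from the v38 sample above.
# The hash shape is an assumption made for illustration only.
UNIT_PATTERNS = [
  { count: "one",   case: nil,          text: "{0} Tag" },
  { count: "one",   case: "accusative", text: "{0} Tag" },
  { count: "one",   case: "dative",     text: "{0} Tag" },
  { count: "one",   case: "genitive",   text: "{0} Tages" },
  { count: "other", case: nil,          text: "{0} Tage" },
  { count: "other", case: "accusative", text: "{0} Tage" },
  { count: "other", case: "dative",     text: "{0} Tagen" },
  { count: "other", case: "genitive",   text: "{0} Tage" },
].freeze

# Keep only the patterns without a `case` attribute (the nominative forms),
# so that later cases can no longer overwrite them.
def nominative_patterns(patterns)
  patterns.each_with_object({}) do |pattern, result|
    next unless pattern[:case].nil?

    result[pattern[:count]] = pattern[:text]
  end
end

# The "last one wins" behaviour described in the issue, for comparison.
def last_wins_patterns(patterns)
  patterns.each_with_object({}) do |pattern, result|
    result[pattern[:count]] = pattern[:text]
  end
end
```

With the sample data, `last_wins_patterns` picks up the genitive "{0} Tages" for one, while `nominative_patterns` keeps "{0} Tag".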

Releasing `ruby-cldr`

Hello @camertron,

We've been working away at improving ruby-cldr and getting it to a point where it can be used with the latest version of CLDR (v41).

While we're not yet ready for a 1.0.0 release, I think it would be good to get a 0.6.0 release done.

What would the process be to do this? Who currently has the keys/permission to publish to RubyGems, and is it possible to have those transferred over to me? (I saw that you were the last to publish ruby-cldr to RubyGems)

Just kicking off the conversation now.

All exported keys should be strings, not `Symbol`s

Steps to reproduce

thor cldr:export --locales af --components Characters

Note that the keys in the data/af/characters.yml file are Ruby Symbols, unlike every other key exported by ruby-cldr, which are strings.

In the export code, deep_stringify_keys is called, but it doesn't affect these since they are hidden in an Array.

A similar thing happens with Rbnf files:

thor cldr:export --locales af --components Rbnf

Expectation

All keys exported by ruby-cldr would be strings.

Benefits

  • Consistency
  • Using strings means that users can parse with Psych.load instead of the Psych.unsafe_load needed to support Symbols
  • No confusion with Aliases, which are implemented using Symbols
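
One possible shape for a fix, sketched under assumptions: `stringify_keys_deep` is a hypothetical helper (not the existing deep_stringify_keys), and the sample data is made up. The only point is that the recursion also descends into Arrays, and that Symbol values are left untouched so aliases keep working:

```ruby
require "yaml"

# Recursively convert Hash keys to strings, descending into Arrays as well.
# Symbol *values* are left alone, since aliases are represented as Symbols.
def stringify_keys_deep(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), result|
      result[key.to_s] = stringify_keys_deep(child)
    end
  when Array
    value.map { |child| stringify_keys_deep(child) }
  else
    value
  end
end

# Made-up sample with Symbol keys hidden inside an Array.
data = { exemplars: [{ type: "auxiliary", characters: "[a b c]" }] }
stringified = stringify_keys_deep(data)

# With only string keys, the dump can be read back with YAML.safe_load.
round_tripped = YAML.safe_load(YAML.dump(stringified))
```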

Lateral inheritance fallback

CLDR has a concept of "Lateral Inheritance" where a value will fall back to another value before falling back to ancestor locales.

Example

ia.units.unitLength.short.length-centimeter defines only a single other key, despite ia having a plural rule that requires a one value.

When resolving the value of ia.units.unitLength.short.length-centimeter.one, it should fall back to ia.units.unitLength.short.length-centimeter.other first.

This can get very complicated if there are multiple levels of lateral inheritance.

Potential solution?

Related to #67, ruby-cldr needs to decide how much of this to handle at the thor cldr:export layer vs. exposing to clients so they can make their own decisions.

Perhaps for now, ruby-cldr should resolve values for each of the required pluralization keys for a locale (e.g., copy other for the missing plural keys), while we wait to figure out what to do about the other dimensions (e.g., "gender", "case")
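
A rough sketch of that interim approach, assuming the locale's required plural keys are already known. `fill_plural_keys` and the sample value are hypothetical; this only covers the plural dimension and ignores "gender" and "case":

```ruby
# Copy the `other` value into every plural key the locale requires but the
# data does not define. Keys that are already present are left untouched.
def fill_plural_keys(translations, required_keys)
  fallback = translations["other"]
  required_keys.each_with_object(translations.dup) do |key, result|
    next if result.key?(key) || fallback.nil?

    result[key] = fallback
  end
end

# Made-up stand-in for ia.units.unitLength.short.length-centimeter.
centimeter = { "other" => "{0} cm" }
resolved = fill_plural_keys(centimeter, ["one", "other"])
```

Here `resolved` has both one and other, each with the value that was defined for other.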

Export `alt` attribute consistently

CLDR has an alt attribute that can be used to indicate different things:

  • alt=proposed\d* and alt=#{variantname}-proposed\d* indicate proposed values that will replace the non-alt version of the value. They will always have a draft attribute, since as soon as they get the accepted status, the non-alt version is replaced.
  • alt=#{variantname} indicates an alternative value that really ought to be used in specific conditions.

Currently, ruby-cldr filters out values with an alt attribute in some places using the Base#alt? method. However, it's inconsistently applied.

We might want to export data with alt=#{variantname} style attributes, perhaps using a separate key name. cldr-json does this:

Upstream CLDR XML:

<language type="en_US">Engels (VSA)</language>
<language type="en_US" alt="short">Engels (VSA)</language>

gets exported as:

"en-US": "Engels (VSA)",
"en-US-alt-short": "Engels (VSA)",

thor cldr:export fails on some language

~/ruby-cldr/lib/cldr/export/data/calendars/gregorian.rb:79:in `eras': undefined method `path' for nil:NilClass (NoMethodError)
  from ~/ruby-cldr/lib/cldr/export/data/calendars/gregorian.rb:11:in `initialize'
  from ~/ruby-cldr/lib/cldr/export/data/calendars.rb:9:in `new'
  from ~/ruby-cldr/lib/cldr/export/data/calendars.rb:9:in `initialize'
  from ~/ruby-cldr/lib/cldr/export.rb:52:in `new'
  from ~/ruby-cldr/lib/cldr/export.rb:52:in `block in data'
  from ~/ruby-cldr/lib/cldr/export.rb:51:in `each'
  from ~/ruby-cldr/lib/cldr/export.rb:51:in `inject'
  from ~/ruby-cldr/lib/cldr/export.rb:51:in `data'
  from ~/ruby-cldr/lib/cldr/export/yaml.rb:11:in `export'
  from ~/ruby-cldr/lib/cldr/export.rb:35:in `block (2 levels) in export'
  from ~/ruby-cldr/lib/cldr/export.rb:34:in `each'
  from ~/ruby-cldr/lib/cldr/export.rb:34:in `block in export'
  from ~/ruby-cldr/lib/cldr/export.rb:33:in `each'
  from ~/ruby-cldr/lib/cldr/export.rb:33:in `export'
  from ~/ruby-cldr/lib/cldr/thor.rb:27:in `export'

Some keys need to be escaped

Currently when generating the YAML files for territories (and possibly others), some keys are not escaped.

Because of the beautiful YAML logic, we get some weird results:

  • "World" should be available as territories.001 but is parsed as territories.1
  • "Western Africa" should be under territories.011 but is parsed as territories.9 (octal) 🙌

TLDR: Non-string territory keys are a mess

{1=>"World", 2=>"Africa", 3=>"North America", 5=>"South America", :"009"=>"Oceania", 9=>"Western Africa", 11=>"Central America", 12=>"Eastern Africa", 13=>"Northern Africa", 15=>"Middle Africa", :"018"=>"Southern Africa", :"019"=>"Americas", 17=>"Northern America", :"029"=>"Caribbean"...
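
The parsing behaviour can be reproduced with Ruby's bundled YAML library alone. This is a small illustration, not ruby-cldr's export code: unquoted keys with leading zeros are read as numbers, while string keys written by YAML.dump come back as strings.

```ruby
require "yaml"

# Unquoted keys with leading zeros are read as (octal) integers.
unquoted = YAML.safe_load("001: World\n011: Western Africa\n")

# String keys are quoted by YAML.dump, so they survive a round trip.
territories = { "001" => "World", "011" => "Western Africa" }
round_tripped = YAML.safe_load(YAML.dump(territories))
```

`unquoted` ends up with the integer keys 1 and 9, while `round_tripped` keeps "001" and "011".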

Remove special `one` key handling in `Currencies#currency`

As noted in this comment, Currencies#currency artificially adds a translation for the one key, even though there is no corresponding translation in the upstream CLDR.

Tactical issues

  • It adds a one key for languages that don't have a one pluralization rule (e.g., zh); slightly bloating the data files

  • This is missing all of the other pluralization keys that might be needed in the locale.

    I18n.with_locale(:af) do
      I18n.t("currencies.LVL", count: 0)
    end
    I18n::InvalidPluralizationData: translation data {:one=>"Lettiese lats", :name=>"Lettiese lats", :symbol=>"LVL"} can not be used with :count => 0. key 'other' is missing.
    

Philosophical issue

My guess without context is that this was done to give a "reasonable" default for the currency in cases where CLDR doesn't have pluralization translation information for the currency.

The logic is present all the way back to at least 2009-12-30:

count = node.attribute('count') ? node.attribute('count').value.to_sym : :one

IMO, in some sense this is fabricating information that isn't in the upstream CLDR dataset.
Unless ruby-cldr's mission is to augment CLDR with its own, I feel that this logic belongs downstream of ruby-cldr in the consumer's code.

Solutions?

Remove this special casing of the one key.
If that's not an option, then add keys for each pluralization key IFF the language uses that pluralization key.

`Plurals` doesn't work for some locales

Problem

The Plurals component only outputs a plurals.rb file for a locale if the locale appears exactly in the supplemental/plurals.xml file.

For example, zh-Hant doesn't have a plurals.rb file, which means that it would fall back to its parent locale, root. root happens to have the same plural rules as zh-Hant (i.e., everything uses other) so it isn't an issue.

However, there are locales that fall back to root that don't share the same plural rules as root. For example, sd-Deva falls back to root, but (as I understand it) sd-Deva uses both one and other. So if it were to use root's plurals.rb, it would be incorrect.

Potential Solution?

IDK yet.

broken with latest thor

This is what I get:

$ thor cldr:export
/usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/runner.rb:34:in `method_missing': undefined method `start' for nil:NilClass (NoMethodError)
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/task.rb:33:in `send'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/task.rb:33:in `run'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/task.rb:13:in `run'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/invocation.rb:109
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/invocation.rb:116:in `call'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/invocation.rb:116:in `invoke'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor.rb:137:in `start'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor/base.rb:378:in `start'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/lib/thor.rb:124:in `start'
from /usr/lib/ruby/gems/1.8/gems/thor-0.13.6/bin/thor:6
from /usr/bin/thor:19:in `load'
from /usr/bin/thor:19

Change default branch name to `main`

The Git ecosystem is moving away from the default branch name of master, towards the name main.

e.g., GitHub, GitLab, and Bitbucket now use main as the default branch name for new repositories.

Change the default branch of ruby-cldr to use main.

How is this done?

  1. Rename the default branch in the GitHub UI (Requires Administrator rights on the repo)
  2. Use git grep master and update mentions of the outdated branch name in documentation and URLs.

Developers with local clones will have to perform a one-time update of the local clones by running:

git branch -m master main
git fetch origin
git branch -u origin/main main
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main

These commands are also shown to developers who visit the repo in the GitHub interface, so it doesn't require additional advertising work from our end.

Figure out how to export `alt=variant` versions of elements

Many of the elements in CLDR have alt=variant versions that are supposed to be used "in some circumstances".

Figure out how we should export these.


Examples

eraNames

common/main/zu.xml

<eraNames>
  <era type="0">BC</era>
  <era type="0" alt="variant">BCE</era>
  <era type="1">AD</era>
  <era type="1" alt="variant">CE</era>
</eraNames>

Currency symbol

common/main/af.xml

<currency type="TRY">
  <displayName>Turkse lira</displayName>
  <displayName count="one">Turkse lira</displayName>
  <displayName count="other">Turkse lira</displayName>
  <symbol draft="contributed">TRY</symbol>
  <symbol alt="narrow" draft="contributed">₺</symbol>
  <symbol alt="variant" draft="contributed">TL</symbol>
</currency>

territory names

common/main/de.xml

<territory type="CD">Kongo-Kinshasa</territory>
<territory type="CD" alt="variant">Kongo (Demokratische Republik)</territory>

language names

common/main/de.xml

<language type="ckb">Zentralkurdisch</language>
<language type="ckb" alt="menu">Kurdisch (Sorani)</language>
<language type="ckb" alt="variant">↑↑↑</language>

dayPeriod

common/main/en.xml

<dayPeriod type="am">AM</dayPeriod>
<dayPeriod type="am" alt="variant">am</dayPeriod>

common/main/en.xml

<field type="dayperiod-short">
  <displayName>AM/PM</displayName>
  <displayName alt="variant">am/pm</displayName>
</field>

script names

common/main/en.xml

<script type="Arab">Arabic</script>
<script type="Arab" alt="variant">Perso-Arabic</script>

Export is missing some subdivisions

Problem?

thor cldr:export --components=subdivisions is missing some subdivisions

In version 35, ruby-cldr exports a data/br/subdivisions.yml file. In v36+, it does not.
This is because the subdivisions have been moved from common/subdivisions/br.xml to common/main/br.xml.

Since ruby-cldr doesn't know about this other location, it drops the information.

Potential Solution?

Look in both locations for subdivision data?
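
A sketch of what "both locations" could look like, assuming a local checkout of the CLDR data. `subdivision_sources` and the `cldr_root` argument are hypothetical names, and the two paths are the ones mentioned above:

```ruby
# Return every file that may hold subdivision names for a locale, checking the
# pre-v36 location first and then the common/main location mentioned above.
def subdivision_sources(cldr_root, locale)
  candidates = [
    File.join(cldr_root, "common", "subdivisions", "#{locale}.xml"),
    File.join(cldr_root, "common", "main", "#{locale}.xml"),
  ]
  candidates.select { |path| File.exist?(path) }
end
```

The exporter would then read subdivision elements from each returned file, rather than giving up when the first location is missing.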

Handle `<alias>` nodes

The root locale contains <alias> elements that tell the reader that they should restart their search with a different key.

Example resolution

For example, resolving //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] for en_GB in CLDR v34 (for example) should look at:

  • //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/en_GB.xml
  • //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/en_001.xml
  • //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/en.xml

Before finally reaching //ldml/dates/fields/field[@type="hour-narrow"]/relative[@type=0] in common/main/root.xml, where it finds the alias:

<field type="hour-narrow">
  <alias source="locale" path="../field[@type='hour-short']"/>
</field>

And restarts the search using hour-short, which in turn leads to another alias, restarting the search using hour, which eventually resolves to:

this hour from //ldml/dates/fields/field[@type="hour"]/relative[@type=0] in common/main/en.xml.

Steps to reproduce

thor cldr:export --locales 'en' 'en_GB' --components Fields --merge
  • I expected data/en-GB.yml to contain an en-GB.fields.hour-narrow.relative.0 key with the value of this hour
  • I expected data/en.yml to contain an en.fields.hour-narrow.relative.0 key with the value of this hour

Once exported, there is no information in the exported YAML files about aliases.
This means that a client cannot know to use the en-GB.fields.hour.relative.0 key instead (for en-GB), which exists.

(Aside: this example uses --merge, but it equally applies to the non-merged case. I used the --merge case for simplicity)

Impact of this issue

Data exports are incomplete. Any aliased field is missing a corresponding value in the exported files.
The above hour-narrow example is not that impactful (clients lose access to the hour-narrow format 🤷).

In other cases, <alias> elements are used for more impactful values. For example, the Buddhist calendar data are largely aliased to Gregorian calendar data.
While ruby-cldr doesn't currently export Buddhist calendar information, if it were to, its lack of <alias> handling would mean that the Buddhist calendar would be missing most of its data entirely.

Another example: The symbols for many numbering systems are aliased to the Latin number symbols. Without resolving the aliases, the symbols are not present in the exported data (Actually, they are, but that's due to another bug).

Potential Solution

Output aliases in the manner supported by ruby-i18n/i18n's Symbol resolving.

If --merge flag is used, resolve the value and output that instead.
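
To make the --merge half of this concrete, here is a heavily simplified sketch. It assumes aliases have already been exported as Symbol values that name a sibling key; real CLDR aliases are relative XPaths, and i18n Symbol links would need a full key path. `resolve_field` and the `FIELDS` data are hypothetical:

```ruby
# Made-up, flattened stand-in for exported field data. A Symbol value means
# "restart the lookup at this other key", mirroring the alias chain
# hour-narrow -> hour-short -> hour described above.
FIELDS = {
  "hour"        => { "relative" => { "0" => "this hour" } },
  "hour-short"  => :"hour",
  "hour-narrow" => :"hour-short",
}.freeze

# Follow Symbol aliases until a real value is found, guarding against cycles.
def resolve_field(fields, name, seen = [])
  raise ArgumentError, "alias cycle at #{name}" if seen.include?(name)

  value = fields.fetch(name)
  return value unless value.is_a?(Symbol)

  resolve_field(fields, value.to_s, seen + [name])
end
```

Resolving "hour-narrow" against this data follows both aliases and returns the hash stored under "hour".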

Sorted files should retain sort order after merge

Problem

The sort order of the files is not maintained when locales are merged.

Steps to reproduce

  1. thor cldr:export --locales fr --components currencies

Note that all the currency data is sorted by currency code.

  2. thor cldr:export --locales fr --components currencies --merge

Note that the currencies mentioned by the root locale (added by --merge) are found at the top of the file, followed by those found only in the fr file.

Expectation

In this case, the root locale adds nothing that isn't present in the fr locale, so the files should be identical.

The same sort order should be maintained. For currency files, this uses the currency code.
For other files, it might use another.

Benefits

  • As a human, it's easier to find what you're looking for
  • For those using line-by-line diff tools, the diffs are cleaner
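
One way to get there, sketched with made-up data: re-sort the keys after merging. `deep_sort` is a hypothetical helper, and sorting by key is only right for files such as currencies, where the key is the sort criterion:

```ruby
# Recursively rebuild a Hash with its keys in sorted order.
def deep_sort(value)
  return value unless value.is_a?(Hash)

  value.sort_by { |key, _child| key.to_s }
       .map { |key, child| [key, deep_sort(child)] }
       .to_h
end

# Made-up sample values. Hash#merge keeps the receiver's keys first, which
# reproduces the "root locale entries at the top" effect described above.
root_currencies = { "USD" => { "name" => "USD" }, "EUR" => { "name" => "EUR" } }
fr_currencies   = { "EUR" => { "name" => "euro" }, "AED" => { "name" => "dirham" } }

merged = root_currencies.merge(fr_currencies)
sorted = deep_sort(merged)
```

`merged.keys` is USD, EUR, AED, while `sorted.keys` is AED, EUR, USD.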

Only output a single copy of `parent_locales.yml`

Problem

thor cldr:export --components=parentLocales

Produces a parent_locales.yml for every locale. This is a waste of files / disk space / memory.

The data in parent_locales.yml comes from supplementalData.xml, which is valid for all locales.

You can verify that the data is identical for all locales with this bit of Ruby:

require 'yaml'

def check_parent_locales
  result = nil
  Dir[File.join("data", "*", "parent_locales.yml")].sort.each do |source_file_name|
    parsed = YAML.load_file(source_file_name).values.first
    result ||= parsed
    unless parsed == result
      raise "`#{source_file_name}` is different from the others"
    end
  end
end

check_parent_locales

Potential solution

Instead of outputting duplicated data for every locale, output a top-level parent_locales.yml

Consider comparing the output with `cldr-json`

cldr-json is a JSON serialization of the official CLDR XML data, created by the Unicode Consortium themselves.

It doesn't contain values for everything in CLDR. For example, it only exports values with a draft level of contributed or above.

Comparing the output of ruby-cldr with the output of cldr-json might prove useful for giving confidence in the correctness of ruby-cldr.

`--merge` should memoize the results for performance

--merge currently takes a long time, needlessly.

Instead of iterating over each locale in turn and merging in the ancestor locales each time:

en-CA -> en -> root
en-GB -> en -> root
en-US -> en -> root

All of these use the en -> root, which is the same data, so there is no need to recompute those for each child locale.

Instead, you could iterate over the graph of locales breadth-first starting at the root locale, then cache the results for use in the other locales.

(Of course, this might not be worth doing as the whole concept of --merge is likely to change. I just wanted to capture this potential optimization here)
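
For the record, a sketch of the caching idea. It uses memoized recursion rather than an explicit breadth-first walk, which has the same effect of merging each ancestor chain once. `merged_data`, the parent map, and the data are all hypothetical, and the merge is shallow for brevity:

```ruby
# Made-up parent chain and per-locale data, following the example above.
PARENTS = { "en-CA" => "en", "en-GB" => "en", "en-US" => "en", "en" => "root" }.freeze
LOCALE_DATA = {
  "root"  => { "a" => "root a", "b" => "root b" },
  "en"    => { "b" => "en b" },
  "en-GB" => { "c" => "en-GB c" },
}.freeze

# Merge a locale with all of its ancestors, caching every intermediate result
# so that the shared `en -> root` chain is only computed once.
def merged_data(locale, data, parents, cache = {})
  cache[locale] ||= begin
    own = data.fetch(locale, {})
    parent = parents[locale]
    parent ? merged_data(parent, data, parents, cache).merge(own) : own
  end
end

cache = {}
en_ca = merged_data("en-CA", LOCALE_DATA, PARENTS, cache)
en_gb = merged_data("en-GB", LOCALE_DATA, PARENTS, cache)
```

After the first call the cache already holds the merged en and root data, so the second call only has to merge en-GB's own values.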

Export data from non-Gregorian calendars

Similar to #167, ruby-cldr currently only exports Gregorian calendar data, yet CLDR v41 has data for 16 different calendars.

Note: There are a number of aliases in the root locale that would need to be implemented if we start exporting the other calendar systems

[CLDR v38+] Don't fail when parsing `c`/`e` plural rule operands

ruby-cldr fails to export CLDR v38:

ruby-cldr (master)$ thor cldr:download --source=http://unicode.org/Public/cldr/38/core.zip
ruby-cldr (master)$ thor cldr:export
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Traceback (most recent call last):
	41: from /Users/username/.gem/ruby/2.7.1/bin/thor:23:in `<main>'
	40: from /Users/username/.gem/ruby/2.7.1/bin/thor:23:in `load'
	39: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/bin/thor:6:in `<top (required)>'
	38: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
	37: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
	36: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
	35: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/command.rb:135:in `run'
	34: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/command.rb:29:in `run'
	33: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/runner.rb:43:in `method_missing'
	32: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
	31: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
	30: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
	29: from /Users/username/.gem/ruby/2.7.1/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
	28: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/thor.rb:27:in `export'
	27: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:70:in `export'
	26: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:70:in `each'
	25: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:71:in `block in export'
	24: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:71:in `each'
	23: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:72:in `block (2 levels) in export'
	22: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/ruby.rb:5:in `export'
	21: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:85:in `data'
	20: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:110:in `plural_data'
	19: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:96:in `locale_based_data'
	18: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:96:in `inject'
	17: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:96:in `each'
	16: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:97:in `block in locale_based_data'
	15: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export.rb:97:in `new'
	14: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals.rb:28:in `initialize'
	13: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals.rb:32:in `to_hash'
	12: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:105:in `to_ruby'
	11: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:105:in `inject'
	10: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:105:in `each'
	 9: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:106:in `block in to_ruby'
	 8: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `parse'
	 7: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `inject'
	 6: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `each'
	 5: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:69:in `block in parse'
	 4: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `parse'
	 3: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `inject'
	 2: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `each'
	 1: from /Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:71:in `block in parse'
/Users/username/src/ruby-i18n/ruby-cldr/lib/cldr/export/data/plurals/rules.rb:80:in `parse': can not parse 'e = 0 ' (RuntimeError)

It seems that CLDR v38 may have changed the syntax of plural rules, and it affects the fr locale:

  • In v38:
        <!-- 3: one,many,other -->

        <pluralRules locales="fr">
            <pluralRule count="one">i = 0,1 @integer 0, 1 @decimal 0.0~1.5</pluralRule>
            <pluralRule count="many">e = 0 and i != 0 and i % 1000000 = 0 and v = 0 or e != 0..5 @integer 1000000, 1e6, 2e6, 3e6, 4e6, 5e6, 6e6, … @decimal 1.0000001e6, 1.1e6, 2.0000001e6, 2.1e6, 3.0000001e6, 3.1e6, …</pluralRule>
            <pluralRule count="other"> @integer 2~17, 100, 1000, 10000, 100000, 1e3, 2e3, 3e3, 4e3, 5e3, 6e3, … @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, 1.0001e3, 1.1e3, 2.0001e3, 2.1e3, 3.0001e3, 3.1e3, …</pluralRule>
        </pluralRules>
  • In v37:
        <pluralRules locales="ff fr hy kab">
            <pluralRule count="one">i = 0,1 @integer 0, 1 @decimal 0.0~1.5</pluralRule>
            <pluralRule count="other"> @integer 2~17, 100, 1000, 10000, 100000, 1000000, … @decimal 2.0~3.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …</pluralRule>
        </pluralRules>

Note the addition of the e = 0 syntax. (I haven't yet looked into the details of what it means).
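
As a starting point for "don't fail", a parser could simply accept the extra operand letters. The sketch below is not ruby-cldr's parser: the `RELATION` regex and `parse_relation` are hypothetical, and it only handles a single relation such as the ones in the fr rule above. (As far as I can tell, UTS #35 lists `c` as the compact decimal exponent, with `e` as a synonym for it.)

```ruby
# Operand letters a relation may start with: the older n, i, v, w, f, t plus
# the newer c and e.
OPERANDS = "nivwftce"

# operand, optional "% modulus", "=" or "!=", then a value/range list.
RELATION = /\A([#{OPERANDS}])(?:\s*%\s*(\d+))?\s*(!=|=)\s*([\d.,]+)\z/

# Parse one relation, e.g. "e = 0" or "i % 1000000 = 0", into its parts.
def parse_relation(source)
  match = RELATION.match(source.strip)
  raise ArgumentError, "can not parse '#{source}'" unless match

  {
    operand: match[1],
    modulus: match[2]&.to_i,
    operator: match[3],
    values: match[4],
  }
end
```

Accepting the operand is only half of the work: the generated Ruby plural rule would also need a value to compare it against, which this sketch does not attempt.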

Request for maintainer status

Hello @tigrish, @camertron, @Korri,

I'm requesting that I be granted "maintainer" status over ruby-cldr.

I'm hoping to exercise more control over the codebase than I feel I can as an outside contributor.

As part of my full-time work at Shopify, I maintain an internal library that uses ruby-cldr as its source of CLDR data.
As you may have seen, I've committed many PRs in the last few months, fixing numerous bugs in ruby-cldr that were causing incorrect data to be exported. I'm currently the #3 contributor to ruby-cldr, and at this rate expect to be #1 within a few weeks.

Practically, unless the pattern of contribution changes, I expect that I'll be effectively the sole maintainer.
Of course, I would continue to accept and appreciate any collaboration from you, and will treat your previous contributions with respect.

I'm hoping that you'll allow me this control over the project, or at the very least, start a discussion around how I can gain such control over time.

The rest of this issue gives some more details about who I am, and what I hope to do with more control over the future of ruby-cldr.


My vision for ruby-cldr

I want ruby-cldr to be a solid foundational library upon which other i18n libraries in the Ruby/Rails ecosystem can depend to provide easy and accurate access to the latest CLDR data.

Note that I do not wish to expand the scope of ruby-cldr. ruby-cldr will support the creation of higher-level, end-user APIs by offering easy and accurate access to the upstream CLDR data, but it will not provide those end-user APIs itself.

Examples of potential consumers of ruby-cldr's data:

Some examples of changes I want to make

ruby-cldr-specific changes:

  • Make (and keep) ruby-cldr fully compatible with modern versions of CLDR
  • Refactor export.rb to be much simpler
  • Consistently handle CLDR's draft attribute levels

General repo changes:

  • Document the scope and invariants of the project, as well as when someone should and shouldn't use ruby-cldr
  • Add GitHub PR and Issue templates
  • Introduce RuboCop to CI, and fix all linting issues
  • Change the default branch from master to main
  • Drop support for end-of-life Ruby versions (?)
  • Change CI to GitHub Actions (?)

My open source CV

I'm not new to the world of open-source maintainership. I'm a maintainer/contributor of several well-used libraries in other ecosystems:

  • ciso8601, a foundational Python library for parsing ISO 8601 timestamps quickly.
    • It's in the top-1000 most downloaded Python libraries 📈
  • Foam, a note-taking extension for VSCode.

I also regularly contribute bug reports and bug fixes to all FOSS software I use.

Beyond that, I've been working as a Software Developer for 11 years now. For the past 3, I've been working on i18n systems at Shopify.

My open source values

  • Users should have extreme confidence when upgrading:
    • Maintain backwards compatibility.
      • Breaking changes should be avoided whenever possible.
      • If necessary, they should only happen alongside major version bumps, and should have trivial (preferably automatable) migration paths
    • Keep libraries small in scope, and provide strong invariants that users can always rely on going into the future
    • Follow [the spirit of] SemVer and Keep a Changelog
  • Document and support the path to contribution
  • Prefer permissive FOSS licenses, but any FOSS license is 👌 (ruby-cldr is MIT 👍)
    • I include this to express my commitment to "open source", which doesn't include licenses with restrictions on who can use the code

What I'm asking for

I'm hoping that you'll allow me this control over the project, or at the very least, start a discussion around how I can gain such control over time.

Expose "draft status" as a CLI option; clean up

Question: Is it the case that thor cldr:export should never export draft data?
Right now it seems to export draft data in some places, which may indicate a bug.


CLDR has a hierarchy of 4 values for the "draft" attribute that represent how far through the approval process the data is.

Currently, some of the data exposed through ruby-cldr is guarded by checks to draft?:

def draft?(node)
  draft = node.attribute('draft')
  draft && draft.value == 'unconfirmed'
end

But there are also places where this check is not being done (Example) and draft information is getting exported. For example, thor cldr:export exports:

---
se:
  currencies:
    DKK:
      symbol: Dkr
      narrow_symbol: kr

This happens even though the narrow_symbol value is marked as draft="unconfirmed" in CLDR.


Related: I believe that checking for the draft attribute at the leaf is insufficient. According to this, draft attributes can be inherited from parents.

That said, it does mention that this generally should not be the case:

However, normally the draft attributes should be canonicalized, which means they are pushed down to leaf nodes as described in Section 5.6 Canonical Form. If an LDML file does has draft attributes that are not on leaf nodes, the file should be interpreted as if it were the canonicalized version of that file.

So I'm not sure that this is a problem in practice.
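To make the inheritance question concrete, here is a minimal sketch of how an "effective" draft level could be computed by walking up from a leaf to its ancestors. It is not ruby-cldr's code: `Node`, `effective_draft`, and `below_draft_level?` are hypothetical names, the `Node` struct stands in for an XML element, and treating a missing attribute as "approved" is an assumption based on my reading of the LDML spec.

```ruby
# Hypothetical stand-in for an XML element: a name, an optional draft
# attribute value, and a link to the parent element.
Node = Struct.new(:name, :draft, :parent)

# The four CLDR draft levels, ordered from least to most approved.
DRAFT_LEVELS = %w[unconfirmed provisional contributed approved].freeze

# Returns the draft level that applies to `node`: its own attribute if
# present, otherwise the nearest ancestor's, otherwise "approved"
# (assumed default when no element in the chain carries the attribute).
def effective_draft(node)
  current = node
  while current
    return current.draft if current.draft
    current = current.parent
  end
  "approved"
end

# True when the node's effective level is below the requested minimum.
def below_draft_level?(node, minimum)
  DRAFT_LEVELS.index(effective_draft(node)) < DRAFT_LEVELS.index(minimum)
end

# A leaf without its own draft attribute inherits from its parent.
currency = Node.new("currency", "unconfirmed", nil)
symbol   = Node.new("symbol", nil, currency)
puts effective_draft(symbol)
puts below_draft_level?(symbol, "contributed")
```

A minimum level like the one passed to `below_draft_level?` is the kind of value a CLI option could supply, instead of the hard-coded comparison against 'unconfirmed'.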

Add `root` as a fallback for all locales

As of CLDR v36 (2019-10-04), the inheritance marker (↑↑↑) has been added to the data files as an explicit indicator that a value should use the value from the inherited locale.

For example, in common/main/zh.xml:

<unit type="graphics-pixel">
  <displayName>↑↑↑</displayName>
  <unitPattern count="other">↑↑↑</unitPattern>
</unit>

zh has no parents except for root to inherit from.

However, ruby-cldr only adds root as a parent to en: #47


Historical notes:

  • root was removed as a fallback in #42
    • This was due to it falling back even when --merge was not passed.
  • It was then added back, but only for en (Why only en? IDK...): #47
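As a rough illustration of why root needs to be at the end of every fallback chain, here is a sketch of resolving the inheritance marker. It is not how ruby-cldr implements fallbacks: `fallback_chain` and `resolve` are hypothetical helpers, the chain is built by simple truncation (real CLDR also has explicit parent locale overrides, which this ignores), and the "px" value in root is made up for the example.

```ruby
INHERITANCE_MARKER = "↑↑↑"

# Hypothetical in-memory data, keyed by locale and then by a flattened key.
# The "px" value is invented for illustration only.
DATA = {
  "root" => { "units.graphics-pixel.displayName" => "px" },
  "zh"   => { "units.graphics-pixel.displayName" => INHERITANCE_MARKER },
}.freeze

# Simplified chain: truncate subtags from the right, then end at root.
def fallback_chain(locale)
  parts = locale.split("-")
  chain = parts.length.downto(1).map { |n| parts.first(n).join("-") }
  chain << "root" unless chain.last == "root"
  chain
end

# Returns the first value in the chain that is neither missing nor the
# inheritance marker, or nil when nothing in the chain provides one.
def resolve(data, locale, key)
  fallback_chain(locale).each do |candidate|
    value = data.dig(candidate, key)
    return value unless value.nil? || value == INHERITANCE_MARKER
  end
  nil
end

puts fallback_chain("zh").inspect
puts resolve(DATA, "zh", "units.graphics-pixel.displayName")
```

Without root in the chain, the lookup for zh would end on the marker itself and there would be nothing left to inherit from.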

Support for CLDR v25?

Thank you for the work you have done so far!

We are looking at using this library (specifically for plurals first), and noticed that v25 was released ~2 weeks ago. It adds a whole new dimension to pluralization (ordinals, cardinals, and ranges, oh my!) as well as much finer grained cases for MANY languages (largely around decimals/fractions).

I was toying with the idea of trying to update this library to start using v25 rules, and I was hoping I wasn't the only one working on it, so we could share the load.

Feature request: A way to get the list of locales

It seems that ruby-cldr figures out the list of locales by iterating over the filenames:

Dir["#{dir}/main/*.xml"].map { |path| path =~ /([\w_-]+)\.xml/ && $1 }

This is fine.

However, users of ruby-cldr cannot do this, since the exported directory also contains non-locale directories (e.g. transforms).

Create a mechanism that allows users of ruby-cldr to reliably get the list of locales. Perhaps this could be as simple as generating a locales.yml file from Cldr::Export::Data#locales.

This would allow ruby-cldr to do whatever it would like with the file structure, and give confidence to users that they aren't accidentally including non-locales in their list of locales.


Is there an officially supported way to get the list of locales?
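For discussion, here is a minimal sketch of what generating such a manifest could look like. It assumes the same filename-based discovery as the snippet above; `locales_in`, `write_locales_manifest`, and the `locales` key inside locales.yml are hypothetical names, not an existing ruby-cldr API.

```ruby
require "yaml"
require "fileutils"

# Same idea as the filename-based discovery above: one locale per
# main/*.xml file, sorted so that the output is stable.
def locales_in(source_dir)
  Dir["#{source_dir}/main/*.xml"].map { |path| File.basename(path, ".xml") }.sort
end

# Writes a hypothetical locales.yml manifest into the export target and
# returns its path.
def write_locales_manifest(source_dir, target_dir)
  FileUtils.mkdir_p(target_dir)
  path = File.join(target_dir, "locales.yml")
  File.write(path, { "locales" => locales_in(source_dir) }.to_yaml)
  path
end
```

A consumer could then read the manifest with `YAML.load_file` instead of listing directories, so non-locale directories such as transforms never end up in the list.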

Don't mix results from different number systems

ar has information on percent formatting for two number systems: arab and latn.

Cldr::Export::Data::Numbers#number_system takes the first one arbitrarily as the value of number_system:

def number_system(type)
  node = select("numbers/#{type}Formats").first
  node.attribute('numberSystem').value rescue "latn"
end

This results in exports that:

a) are missing the latn data, due to being shadowed by the arab data
b) mis-label the latn data as arab

thor cldr:export --components=Numbers --locales=ar
decimal:
  number_system: arab
  patterns:
    default: "#,##0.###" # From `arab`, shadowing the similar data from `latn`
    long: # Everything else is from `latn`
      '1000':
        few: 0 آلاف # From `latn`
        many: 0 ألف # From `latn`
        one: 0 ألف # From `latn`
        other: 0 ألف # From `latn`
        two: 0 ألف # From `latn`
        zero: 0 ألف # From `latn`
      '10000':
        few: 00 ألف # From `latn`
        many: 00 ألف # From `latn`
        one: 00 ألف # From `latn`
        other: 00 ألف # From `latn`
        two: 00 ألف # From `latn`
        zero: 00 ألف # From `latn`

Potential solution

Add an additional layer of nesting to the YAML output that contains the number system:

decimal:
  arab:
    patterns:
      default: "#,##0.###" # From `arab`
  latn:
    patterns:
      default: "#,##0.###" # From `latn`
      long:
        '1000':
          few: 0 آلاف # From `latn`
          many: 0 ألف # From `latn`
          one: 0 ألف # From `latn`
          other: 0 ألف # From `latn`
          two: 0 ألف # From `latn`
          zero: 0 ألف # From `latn`

This would more closely match the upstream CLDR data, and avoid these problems.
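A small sketch of the grouping step, to show the shape of the proposed output. It assumes the formats have already been read into flat records; `Format` and `patterns_by_number_system` are hypothetical names, and nothing here reflects the current Cldr::Export::Data::Numbers implementation.

```ruby
# Hypothetical flat record for one pattern read from CLDR: the
# numberSystem it belongs to, the pattern type, and the pattern itself.
Format = Struct.new(:number_system, :type, :pattern)

# Nests patterns under their number system, so that data from different
# number systems can no longer shadow or be mislabelled as one another.
def patterns_by_number_system(formats)
  formats.each_with_object({}) do |format, result|
    entry = (result[format.number_system] ||= { "patterns" => {} })
    entry["patterns"][format.type] = format.pattern
  end
end

formats = [
  Format.new("arab", "default", '#,##0.###'),
  Format.new("latn", "default", '#,##0.###'),
]
puts patterns_by_number_system(formats).inspect
```

Because the number system becomes a key rather than a single `number_system` value, no "first one wins" choice has to be made.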

Don't export files for locales if there is no relevant data in CLDR

There are a number of places in ruby-cldr where files are created for locales, despite there being no relevant information for that (locale, component) pair.

Sometimes this manifests as empty YAML mappings, and sometimes the entire file is meaningless.

These meaningless keys mean more memory usage for Rails users, and bloat the I18n.load_path unnecessarily.

Examples:

  • data/af-ZA/calendars.yml
  • data/af-ZA/delimiters.yml
  • data/af-ZA/rbnf.yml
  • data/af-ZA/units.yml

Upstream CLDR has no relevant information for these components in this locale.

Of the 8332 YAML files output by ruby-cldr, at least 2327 (28%) contain no relevant information.

Potential solution

Stop outputting keys and/or files unless there is actually relevant data from the upstream CLDR.
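One possible way to decide whether a (locale, component) pair is worth writing is a recursive emptiness check on the data before export. This is only a sketch: `empty_data?` is a hypothetical helper, and it deliberately treats only nil and empty containers as empty, since an empty string might be a legitimate value.

```ruby
# True when `data` carries no leaf values: nil, or a Hash/Array whose
# members are all empty by the same definition. Strings, numbers, and
# other scalars (including empty strings) count as data.
def empty_data?(data)
  case data
  when nil   then true
  when Hash  then data.values.all? { |value| empty_data?(value) }
  when Array then data.all? { |value| empty_data?(value) }
  else false
  end
end

puts empty_data?({ "af-ZA" => { "calendars" => {} } })
puts empty_data?({ "af-ZA" => { "units" => { "pixel" => "px" } } })
```

An exporter could skip writing the file whenever this check returns true for the component's data.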

Distinguishing attribute `numberSystem` is ignored.

Cldr::Export::Data::Numbers#symbols ignores the distinguishing attribute numberSystem:

Example

In the bn locale, there are data for two number systems. However, Cldr::Export::Data::Numbers#symbols merges them all together:

(byebug) select('numbers/symbols/*').size
23

23 = 11 from the beng numberSystem, and 12 from the latn numberSystem (CLDR v34)

The result is a set of symbols that combine bits of both numberSystems.

bn:
  numbers:
    symbols:
      alias: ''
      decimal: "."
      group: ","
      list: ";"
      percent_sign: "%"
      plus_sign: "+"
      minus_sign: "-"
      exponential: E
      superscripting_exponent: "×"
      per_mille: "‰"
      infinity: "∞"
      nan: NaN
      time_separator: ":"

Distinguishing attributes are used to distinguish multiple elements at the same level.

I expected different numberSystem elements to be kept separate during the export, since that's what the spec calls for. Something like:

bn:
  numbers:
    symbols:
      beng:
          ....
      latn:
          ....

`thor cldr:export` produces invalid `plurals.rb` file when `--merge` is not set

Steps to Reproduce

bundle exec thor cldr:download
bundle exec thor cldr:export

Note that the --merge option was not set in the cldr:export call.

Actual output

Look at the contents of data/af-NA/plurals.rb:

{ :'af_NA' => { :i18n => { :plural => { :keys => nil, :rule =>  } } } }

Note that this is not valid ruby syntax, since there is no value for the :rule key.

$> ruby data/af-NA/plurals.rb
data/af-NA/plurals.rb:1: syntax error, unexpected '}'
... => { :keys => nil, :rule =>  } } } }

Expected output

All Ruby files (including plurals.rb files) created by ruby-cldr should be valid Ruby files (i.e. syntactically correct)
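A check along these lines could back that expectation in a test suite. It is a sketch under assumptions: `valid_ruby?` is a hypothetical helper, and it relies on `RubyVM::InstructionSequence.compile`, which is specific to MRI (CRuby). The source is compiled only, never executed.

```ruby
# Returns true when `source` parses as Ruby. The code is compiled but
# not run. MRI-specific: relies on RubyVM::InstructionSequence.
def valid_ruby?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

broken = "{ :'af_NA' => { :i18n => { :plural => { :keys => nil, :rule =>  } } } }"
puts valid_ruby?(broken)
```

Running the same check over every generated `.rb` file (for example via `File.read` on each path) would catch this class of bug without depending on which locales happen to have plural rules.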

Why are `Plurals` and `PluralRules` separate; they currently output different data

There seems to be a lot of overlap between these components. They use the same supplemental/plurals.xml file, but they do so in different ways, so produce different results.

plurals.rb is only output for some locales, but plural_rules.yml gets output for many more (but not all) locales.

My gut feel is that plurals.rb and plural_rules.yml should be output in the same scenarios, and represent the same information.

Figure out how to merge the implementations of these components?

Variables stopped being output

Problem?

unicode-org/cldr@a56c139 moved where //validity/variable information was located (from common/supplemental/supplementalMetadata.xml to common/supplemental/attributeValueValidity.xml).

Since lib/cldr/export/data/variables.rb hard-codes the path to look at, thor cldr:export --components=variables stopped outputting anything at that point.

This is similar to #95, but affects supplemental data instead.

Potential solution?

Rework Export::Data to follow the spec and combine the data files into one before doing lookups.

Export data from different numbering systems

ruby-cldr currently only exports data from the latn numbering system, and the structure of the output data files doesn't have a place to support the exporting of the other numbering systems.

bn:
  numbers:
    symbols: # Implicitly `latn`
      alias: ''
      decimal: "."
      group: ","
      list: ";"
      percent_sign: "%"
      ....

Perhaps we'd want to restructure things such that the numbering system is part of the keypath:

bn:
  numbers:
    symbols:
      beng:
          ....
      latn:
          ....

or

bn:
  numbers:
    beng:
      symbols:
          ....
    latn:
      symbols:
          ....

Note: There are a number of aliases in the root locale that would need to be implemented once we start exporting the other numbering systems
