Lateral inheritance fallback #117

Open
movermeyer opened this issue Apr 11, 2022 · 4 comments

Comments

@movermeyer
Collaborator

CLDR has a concept of "Lateral Inheritance", where a value falls back to another value in the same locale before falling back to ancestor locales.

Example

ia.units.unitLength.short.length-centimeter defines only a single other key, despite ia having a plural rule that requires a one value.

When resolving the value of ia.units.unitLength.short.length-centimeter.one, it should fall back to ia.units.unitLength.short.length-centimeter.other first.

This can get very complicated if there are multiple levels of lateral inheritance.

Potential solution?

Related to #67, ruby-cldr needs to decide how much of this to handle at the thor cldr:export layer vs. exposing to clients so they can make their own decisions.

Perhaps for now, ruby-cldr should resolve values for each of the required pluralization keys for a locale (e.g., copy other for the missing plural keys), while we wait to figure out what to do about the other dimensions (e.g., "gender", "case").
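
As a rough illustration of the "copy other for the missing plural keys" idea, here is a minimal Ruby sketch. The method name `fill_plural_keys`, its arguments, and the sample pattern string are assumptions made for this example; none of them are part of ruby-cldr's API or taken from the real ia data.

```ruby
# Hypothetical helper, not part of ruby-cldr.
# For every plural key the locale requires, use the locale's own value if it
# exists, otherwise copy the value stored under "other".
def fill_plural_keys(patterns, required_keys)
  # Raises KeyError if "other" is absent, since there is nothing to copy from.
  fallback = patterns.fetch("other")

  required_keys.each_with_object({}) do |key, resolved|
    resolved[key] = patterns.fetch(key, fallback)
  end
end

# Illustrative input only: the pattern string is a placeholder.
centimeter = { "other" => "{0} cm" }
resolved = fill_plural_keys(centimeter, %w[one other])
puts resolved["one"]   # same string as resolved["other"]
```

This only covers the plural dimension within a single locale. It does not attempt gender or case, and it does not model how lateral inheritance interacts with ancestor locales.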

@camertron
Collaborator

CLDR specifies a very specific algorithm for resolving inheritance and aliases, documented here. I don't think it makes sense to allow clients to make their own decisions, since the data set is explicitly structured to follow these rules. In every case, it should be possible to generate a full, expanded data set for every locale.

@movermeyer
Collaborator Author

movermeyer commented Apr 13, 2022

> CLDR specifies a very specific algorithm for resolving inheritance and aliases, documented here.

@camertron For sure. The spec defines the various algorithms needed in order to resolve the correct values for any given path.

> In every case, it should be possible to generate a full, expanded data set for every locale.

Yes. We can create fully "flattened"/resolved locale files by doing all the resolution as part of the export. While ruby-cldr doesn't do a good job at this today, it definitely should.

However, as a client you probably never want to work with fully flattened locale files, as they are much larger than they could be due to all the duplication between the files (loading them uses too much RAM). When you have a smarter client, you can avoid this duplication (at the expense of more lookups).

> I don't think it makes sense to allow clients to make their own decisions

What I meant is that clients either need to be smart enough to know how to do the series of fallback lookups dictated by the CLDR resolution algorithms (i.e., which path to check next when a path isn't found), or else we need to flatten the data for them.

e.g., ruby-i18n/i18n's I18n::Backend::Fallbacks knows how to handle Locale Inheritance and Aliases, but not Lateral Inheritance. So either that client needs to be made smarter, or we need to flatten the Lateral Inheritance in a way that it can understand.


In this case, even before we can generate fully flattened files, or make a smarter client, there's a prerequisite that we define what the YAML serialization of this data should actually look like.

Example: how should we serialize this to YAML?

<unit type="duration-day">
  <gender>masculine</gender>
  <displayName>Tage</displayName>
  <unitPattern count="one">{0} Tag</unitPattern>
  <unitPattern count="one" case="accusative">{0} Tag</unitPattern>
  <unitPattern count="one" case="dative">{0} Tag</unitPattern>
  <unitPattern count="one" case="genitive">{0} Tages</unitPattern>
  <unitPattern count="other">{0} Tage</unitPattern>
  <unitPattern count="other" case="accusative">{0} Tage</unitPattern>
  <unitPattern count="other" case="dative">{0} Tagen</unitPattern>
  <unitPattern count="other" case="genitive">{0} Tage</unitPattern>
  <perUnitPattern>{0} pro Tag</perUnitPattern>
</unit>

Perhaps something like this (some keys omitted for clarity):

duration-day:
  one:
    _: "{0} Tag"
    accusative: "{0} Tag"
    dative: "{0} Tag"
    genitive: "{0} Tages"
  other:
    _: "{0} Tage"
    accusative: "{0} Tage"
    dative: "{0} Tagen"
    genitive: "{0} Tage"

Obviously, this isn't backwards compatible with existing clients, since they would be looking for a single string value at one/other, not a YAML mapping (and it introduces the concept of a _ key).

I also have clients that assume that plural cases will always be leaf nodes, which would have to change their logic to understand this.
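
To make the compatibility concern concrete, here is a hedged Ruby sketch of a client-side lookup that accepts both shapes: the existing layout where one/other are plain strings, and the nested layout proposed above. The method name `unit_pattern`, the `default_key:` keyword, and the choice to fall back to `_` when a case is missing are all assumptions for illustration, not behaviour defined by ruby-cldr or by the CLDR spec.

```ruby
# Hypothetical client helper, not part of ruby-cldr or ruby-i18n.
# `unit` is the hash found under a unit such as "duration-day".
def unit_pattern(unit, count, grammatical_case = nil, default_key: "_")
  # Assumption: a missing plural key falls back to "other" in the same unit.
  node = unit.fetch(count) { unit.fetch("other") }

  # Existing layout: the plural key is a leaf node holding the pattern.
  return node if node.is_a?(String)

  # Proposed layout: the plural key holds a mapping keyed by grammatical case,
  # with the caseless pattern stored under the `_` key.
  wanted = grammatical_case || default_key
  node.fetch(wanted) { node.fetch(default_key) }
end

nested = {
  "one"   => { "_" => "{0} Tag",  "accusative" => "{0} Tag",  "dative" => "{0} Tag",   "genitive" => "{0} Tages" },
  "other" => { "_" => "{0} Tage", "accusative" => "{0} Tage", "dative" => "{0} Tagen", "genitive" => "{0} Tage" },
}
legacy = { "one" => "{0} Tag", "other" => "{0} Tage" }

puts unit_pattern(nested, "other", "dative")
puts unit_pattern(legacy, "other", "dative")
```

A client written this way would keep working against the current files while also understanding the nested form, at the cost of a type check on every lookup.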

I'm just getting my head around these ideas.

@movermeyer
Collaborator Author

movermeyer commented Apr 13, 2022

FWIW, cldr-json exports the above example as:

"duration-day": {
  "gender": "masculine",
  "displayName": "Tage",
  "unitPattern-count-one": "{0} Tag",
  "accusative-count-one": "{0} Tag",
  "dative-count-one": "{0} Tag",
  "genitive-count-one": "{0} Tages",
  "unitPattern-count-other": "{0} Tage",
  "accusative-count-other": "{0} Tage",
  "dative-count-other": "{0} Tagen",
  "genitive-count-other": "{0} Tage",
  "perUnitPattern": "{0} pro Tag"
},

Using this code.
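
For comparison, the flat cldr-json keys shown above can be regrouped into the nested count/case shape discussed earlier. The following Ruby sketch does that for the key shapes that appear in this example only (`unitPattern-count-X` and `CASE-count-X`); the method name `nest_unit_patterns` and the use of `_` for the caseless pattern are assumptions, and other key shapes that cldr-json may emit are not handled.

```ruby
# Hypothetical conversion, not part of cldr-json or ruby-cldr.
# Turns {"dative-count-other" => "..."} style keys into
# {"other" => {"dative" => "..."}}, storing caseless patterns under "_".
def nest_unit_patterns(flat)
  flat.each_with_object({}) do |(key, value), nested|
    match = key.match(/\A(?<prefix>.+)-count-(?<count>[a-z]+)\z/)
    next unless match # skips gender, displayName, perUnitPattern, etc.

    case_key = match[:prefix] == "unitPattern" ? "_" : match[:prefix]
    (nested[match[:count]] ||= {})[case_key] = value
  end
end

duration_day = {
  "gender" => "masculine",
  "displayName" => "Tage",
  "unitPattern-count-one" => "{0} Tag",
  "accusative-count-one" => "{0} Tag",
  "dative-count-one" => "{0} Tag",
  "genitive-count-one" => "{0} Tages",
  "unitPattern-count-other" => "{0} Tage",
  "accusative-count-other" => "{0} Tage",
  "dative-count-other" => "{0} Tagen",
  "genitive-count-other" => "{0} Tage",
  "perUnitPattern" => "{0} pro Tag",
}

nested = nest_unit_patterns(duration_day)
puts nested["other"]["dative"]
```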

@camertron
Collaborator

> However, as a client you probably never want to work with fully flattened locale files, as they are much larger than they could be due to all the duplication between the files (loading them uses too much RAM).

I would want to see some benchmarks around this before agreeing. While fully flattened locale files will lead to higher memory consumption, my hunch is we're talking about a few megabytes. TwitterCLDR fully flattens locale data in almost all cases, and while the library doesn't support every language in CLDR or deal with all the data, it supports a bunch of ICU/CLDR features and only weighs in at ~20 MB. I would think most clients would only load a fraction of that. Compared to the overhead of a web framework like Rails, that's not very significant.

> What I meant is that clients either need to be smart enough to know how to do the series of fallback lookups dictated by the CLDR resolution algorithms (i.e., which path to check next when a path isn't found), or else we need to flatten the data for them.

OK, I see what you mean. Since ruby-cldr is a data generation gem, it makes sense to me to fully flatten. I suppose we could create a ruby-cldr-runtime gem or something that would also provide a data access layer on top of the exported YAML files. I think ruby-cldr would also have to be modified to produce YAML files for each of the ancestor locales (in addition to the locales requested) so that lateral inheritance would be possible.
