
CLDR-18155 Order languageData's scripts by number of users #4237

Draft: wants to merge 1 commit into main


conradarcturus (Contributor)

CLDR-18155

This sorts the scripts in the SupplementalData languageData tags so that the first script listed is the one with the highest population. This helps resolve some of the ambiguities in interpreting the data.
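A minimal sketch of the ordering rule described above, assuming per-script user counts are already available as a `Map<String, Long>`. `ScriptOrder`, `byPopulation` and `toScriptsAttribute` are hypothetical names for illustration, not the code in ConvertLanguageData.java. Breaking ties alphabetically is also an assumption; the PR does not say how ties are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ScriptOrder {
    /**
     * Orders script codes so that the script with the most users comes first.
     * Ties (e.g. scripts recorded with a population of 0) fall back to
     * alphabetical order; this tie-break is an assumption, not from the PR.
     */
    static List<String> byPopulation(Map<String, Long> usersByScript) {
        List<String> scripts = new ArrayList<>(usersByScript.keySet());
        scripts.sort((a, b) -> {
            // Compare b to a so that larger populations sort first.
            int byUsers = Long.compare(usersByScript.get(b), usersByScript.get(a));
            return byUsers != 0 ? byUsers : a.compareTo(b);
        });
        return scripts;
    }

    /** Renders the ordered scripts as a space-separated attribute value, e.g. "Latn Cyrl". */
    static String toScriptsAttribute(Map<String, Long> usersByScript) {
        return String.join(" ", byPopulation(usersByScript));
    }
}
```

The old attribute values in the diff below are alphabetical (for example "Arab Cyrl Latn"). With an alphabetical tie-break, scripts that all carry a population of 0 keep that order, so only entries with real population differences move.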

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

conradarcturus force-pushed the CLDR-18155-Order-languageData-scripts-by-usage branch from 757c108 to cbce758 on December 10, 2024 07:58
jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • tools/cldr-code/src/main/java/org/unicode/cldr/tool/ConvertLanguageData.java is different


~ Your Friendly Jira-GitHub PR Checker Bot

@@ -1346,7 +1346,7 @@ XXX Code for transations where no currency is involved
<language type="awa" scripts="Deva"/>
<language type="awa" territories="IN" alt="secondary"/>
<language type="ay" scripts="Latn" territories="BO"/>
-<language type="az" scripts="Arab Cyrl Latn" territories="AZ"/>
+<language type="az" scripts="Arab Latn Cyrl" territories="AZ"/>
Contributor Author

The likely subtags for az resolve to az_Latn_AZ, but note that there are more Azerbaijani speakers in Iran (az_Arab_IR), which is why Arab now sorts first. They just haven't moved online to the same extent as speakers in Azerbaijan.

@@ -1918,7 +1918,7 @@ XXX Code for transations where no currency is involved
<language type="mai" scripts="Tirh" territories="IN NP" alt="secondary"/>
<language type="mak" scripts="Latn"/>
<language type="mak" scripts="Bugi" territories="ID" alt="secondary"/>
-<language type="man" scripts="Latn Nkoo"/>
+<language type="man" scripts="Nkoo Latn"/>
Contributor Author

conradarcturus, Dec 10, 2024

This is driven by the Ghana entry in the language data, which reports a large Manding population using Nkoo in Ghana. Really, man is a macrolanguage code, so we shouldn't give much weight to which script this entry lists first. https://en.wikipedia.org/wiki/Manding_languages
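If the per-script totals are sums over territories, a single large territory estimate can decide the order on its own, which is what happens with the Ghana Nkoo figure here. A minimal sketch of that aggregation, plus a helper that reports which territory dominates a script so such cases are easy to spot during review. The `TerritoryEstimate` record and both method names are hypothetical; they are not CLDR's data model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ScriptTotals {
    /** Hypothetical record: one territory's estimate of a language's users in one script. */
    record TerritoryEstimate(String territory, String script, long users) {}

    /** Sums the per-territory estimates into one user count per script. */
    static Map<String, Long> totalUsersByScript(List<TerritoryEstimate> estimates) {
        Map<String, Long> totals = new HashMap<>();
        for (TerritoryEstimate e : estimates) {
            totals.merge(e.script(), e.users(), Long::sum);
        }
        return totals;
    }

    /** Returns the territory contributing the most users for the given script, or null if none. */
    static String largestContributor(List<TerritoryEstimate> estimates, String script) {
        TerritoryEstimate best = null;
        for (TerritoryEstimate e : estimates) {
            if (e.script().equals(script) && (best == null || e.users() > best.users())) {
                best = e;
            }
        }
        return best == null ? null : best.territory();
    }
}
```

Feeding the totals into a descending sort gives the script order; checking `largestContributor` for the winning script is one cheap way to flag entries, like this man one, where one territory's figure drives the result.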

Member

Can you check that again? Nkoo figures can be skewed by advocacy.

@@ -2435,7 +2435,7 @@ XXX Code for transations where no currency is involved
<language type="yrk" scripts="Cyrl"/>
<language type="yrl" scripts="Latn"/>
<language type="yua" scripts="Latn"/>
-<language type="yue" scripts="Hans Hant" territories="MO"/>
+<language type="yue" scripts="Hant Hans" territories="MO"/>
Contributor Author

The likely subtags for yue resolve to yue_Hant_HK, consistent with Hant now sorting first.

conradarcturus (Contributor Author) left a comment

I added inline comments on the languages with notable changes. All the other changes match the likely subtags and public records.
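The consistency check described above (first script versus the script in the likely subtags) can be mechanized so that only the exceptions need a manual comment. A minimal sketch, assuming two hypothetical inputs: a map from language code to its ordered scripts attribute, and a map from language code to its full likely tag in underscore form (e.g. az to az_Latn_AZ). `LikelyScriptCheck.mismatches` is an illustrative name, not an existing CLDR tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LikelyScriptCheck {
    /**
     * Lists languages whose first (most-used) script differs from the script
     * subtag of their likely tag. Assumes likely tags look like lang_Script_Region.
     */
    static List<String> mismatches(Map<String, String> scriptsByLanguage,
                                   Map<String, String> likelyTags) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : scriptsByLanguage.entrySet()) {
            String likely = likelyTags.get(entry.getKey());
            if (likely == null) {
                continue; // no likely tag to compare against
            }
            String[] parts = likely.split("_");
            if (parts.length < 2) {
                continue; // tag has no script subtag
            }
            String firstScript = entry.getValue().strip().split("\\s+")[0];
            if (!firstScript.equals(parts[1])) {
                result.add(entry.getKey() + ": first script " + firstScript + ", likely " + parts[1]);
            }
        }
        result.sort(null); // natural order, so the report is deterministic
        return result;
    }
}
```

With the values from this PR, az ("Arab Latn Cyrl" against az_Latn_AZ) is reported and yue ("Hant Hans" against yue_Hant_HK) is not, matching which entries got inline comments for that reason.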

public BasicLanguageData setScriptsWithoutPopulation(String scriptTokens) {
    List<String> scripts = new ArrayList<>();
    if (scriptTokens != null) {
        scripts = Arrays.asList(WHITESPACE_PATTERN.split(scriptTokens));
Member
FYI, Splitter has a splitToList method.
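The point behind the splitToList suggestion is that `Arrays.asList(pattern.split(...))` has two quirks: the list is fixed-size, and, assuming WHITESPACE_PATTERN matches runs of whitespace, a leading space yields an empty first token. With Guava the usual form is `Splitter.on(CharMatcher.whitespace()).omitEmptyStrings().splitToList(scriptTokens)`. The sketch below gets the same result using only the JDK so it runs without Guava; `ScriptTokens.splitScripts` is a hypothetical helper, and `WHITESPACE` stands in for the snippet's WHITESPACE_PATTERN.

```java
import java.util.List;
import java.util.regex.Pattern;

final class ScriptTokens {
    // Hypothetical stand-in for the WHITESPACE_PATTERN constant in the snippet.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Splits a scripts attribute such as "Arab Latn Cyrl" into an unmodifiable list. */
    static List<String> splitScripts(String scriptTokens) {
        if (scriptTokens == null) {
            return List.of();
        }
        // Strip first so leading whitespace does not produce an empty first token.
        String trimmed = scriptTokens.strip();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return List.of(WHITESPACE.split(trimmed));
    }
}
```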

    for (String script : scripts) {
        scriptsByPopulation.put(script, 0);
    }
    return setScripts(scriptsByPopulation);
Member
Better to build once, make immutable.
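One way to follow the "build once, make immutable" suggestion without losing the script order that this PR now relies on. A sketch assuming the scripts have already been split into a list and that the caller wants a `Map<String, Integer>` as in the snippet; `ScriptsWithoutPopulation.build` is a hypothetical name, not CLDR's API. Note that `Map.copyOf` and `Map.of` have unspecified iteration order, so they would silently drop the ordering; Guava's `ImmutableMap.Builder` does preserve insertion order if Guava is preferred.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ScriptsWithoutPopulation {
    /**
     * Builds the script-to-population map in a single pass and returns an
     * unmodifiable view. LinkedHashMap keeps the attribute's script order.
     */
    static Map<String, Integer> build(List<String> scripts) {
        Map<String, Integer> byPopulation = new LinkedHashMap<>();
        for (String script : scripts) {
            // Population is unknown here, so it is recorded as 0, as in the snippet.
            byPopulation.put(script, 0);
        }
        return Collections.unmodifiableMap(byPopulation);
    }
}
```

Because only the unmodifiable view escapes, callers cannot reorder or extend the map after construction.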
