Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-18056 skintone with facing right #4213

Merged

Conversation

macchiati
Copy link
Member

@macchiati macchiati commented Nov 26, 2024

CLDR-18056

Adds skintones to the right-facing emoji. These are emoji that add joiner+right_facing_arrow to point to the right. It also fixes 2 other problems: the "facing right" was not added to annotations, and in the name it didn't have a colon before it like other modifiers.

Example 1, with skin tones

<annotation cp="🚶🏻‍➡">drentel | gang | lang treë gee | loop | looppas | looptempo | lopende man | lopende persoon | slenter | stap | swerf | tempo | voetganger | wandel</annotation>
<annotation cp="🚶🏻‍➡" type="tts">voetganger kyk na regs</annotation>

becomes

<annotation cp="🚶🏻‍➡">drentel | gang | kyk | lang treë gee | ligte velkleur | loop | looppas | looptempo | lopende man | lopende persoon | na | regs | slenter | stap | swerf | tempo | voetganger | wandel</annotation>
<annotation cp="🚶🏻‍➡" type="tts">voetganger**: ligte velkleur,** kyk na regs</annotation>

Example 2, without skin tones

<annotation cp="🚶‍➡">drentel | gang | lang treë gee | loop | looppas | looptempo | lopende man | lopende persoon | slenter | stap | swerf | tempo | voetganger | wandel</annotation>
<annotation cp="🚶‍➡" type="tts">voetganger kyk na regs</annotation>

becomes

<annotation cp="🚶‍➡">drentel | gang | kyk | lang treë gee | loop | looppas | looptempo | lopende man | lopende persoon | na | regs | slenter | stap | swerf | tempo | voetganger | wandel</annotation>
<annotation cp="🚶‍➡" type="tts">voetganger**:** kyk na regs</annotation>

Reviewer:

  • There are many changes in the data files, so just spot check a couple of cases in a couple of languages you know.
  • For code changes:

tools/cldr-code/src/main/java/org/unicode/cldr/util/Annotations.java

  • Made the synthesize method public for testing
  • The code around if (code.contains(EmojiConstants.JOINER_STRING)) { got simpler, because we now just remove the arrow from code, and add it to rem. That allows it to fall through properly to private Annotations getBasePlusRemainder(, which handles the code.
    • Detecting if (mod == BLACK_RIGHTWARDS_ARROW.codePointAt(0)) { and just setting a flag
    • At the end, if that flag is set, then it adds the words from the rightwardsArrowPattern.

tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestAnnotations.java

  • Restored TestUniqueness. That tests for exactly this kind of failure, and shouldn't have been removed unless a replacement was ready.

  • Added testCompleteness, that all emoji have English annotations.

  • Added testRightFacing, which has specific test cases for this problem.

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@macchiati macchiati marked this pull request as ready for review November 26, 2024 19:31
<annotation cp="🚶🏾‍♂‍➡" type="tts">homme qui marche vers la droite</annotation>
<annotation cp="🚶🏿‍♂‍➡">homme | homme qui marche | marche | piéton</annotation>
<annotation cp="🚶🏿‍♂‍➡" type="tts">homme qui marche vers la droite</annotation>
<annotation cp="🚶‍➡">balade | droite | en route | flâner | homme | la | marche | marcher | nonchalant | personne | personne qui marche | piéton | promenade | se balader | vers</annotation>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a little odd to have an article like "la" be a keyword by itself; perhaps we can refine this later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I was thinking about this, and we could really use a list of per-language 'stop words', like 'la' and 'en' above, that are ignored in matching and can thus be ignored in derivations and vetting. If you think that's a good idea I can file a ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good idea (I was speculating about something like that), please do file a ticket. Such a list would probably also be useful elsewhere in CLDR/ICU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed a ticket and gave it to you for now.

Copy link
Contributor

@pedberg-icu pedberg-icu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotations.java is much cleaner!

@macchiati macchiati merged commit 4a8e7c5 into unicode-org:main Dec 2, 2024
12 checks passed
@macchiati macchiati deleted the CLDR-18056-Skintone-with-facing-right branch December 2, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants