Ucdxml and TR42 #859

Open: wants to merge 14 commits into main



@jowilco commented Jun 6, 2024

PR to make it easy to see what changes have been made to support UCDXML.

@jowilco changed the title from "Ucdxml preview" to "Ucdxml and TR42" on Oct 16, 2024
@jowilco marked this pull request as ready for review on October 16, 2024 at 21:13
@jowilco (Author) commented Oct 16, 2024

The comment from June 6 is no longer valid; we're now ready for review.

@jowilco (Author) commented Oct 16, 2024

@macchiati @eggrobin @markusicu - Please can you review?

@@ -310,6 +313,15 @@ Unihan_Variants ; kSpoofingVariant
Unihan_Variants ; kTraditionalVariant
Unihan_Variants ; kZVariant
Member

Should there be a line for kZhuang here? (In other words, are you getting any data for kZhuang?)

Author

The current version of UCDXML does not support kZhuang, only kZhuangNumeric.
As with Unikemet, we should add support either for the revised 16.0 UCDXML files or for 17.

Comment on lines 154 to 157
cjkRSTUnicode ; kRSTUnicode
cjkReading ; kReading
cjkSrc_NushuDuben ; kSrc_NushuDuben
cjkTGT_MergedSrc ; kTGT_MergedSrc
Member

Please revert this change: Tangut and Nüshu are not CJK, so they should not have a cjk alias.

The name kReading is unfortunate (since this is really Nüshu-specific), but it is what it is.
I guess you should add the comment I should have added, saying that these are the fields from the Tangut and Nüshu source files.

Author

Fixed.
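To keep the reverted aliases from creeping back in, a check along these lines could run over the alias lines. This is a minimal sketch, not code from this PR: the class name NonCjkAliasCheck, the method misprefixed, and the assumption that the Tangut and Nüshu source fields are exactly the four properties in the lines under review are all hypothetical. It assumes "short ; long" lines with optional "#" comments, and uses cjkAccountingNumeric only as an example of a genuine CJK field that should pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NonCjkAliasCheck {
    // Assumption: the Tangut and Nüshu source fields are exactly the four
    // properties in the lines under review; the real list may grow.
    static final Set<String> NON_CJK_FIELDS =
            Set.of("kRSTUnicode", "kReading", "kSrc_NushuDuben", "kTGT_MergedSrc");

    // Returns every "short ; long" alias line that gives one of the
    // non-CJK fields a short alias starting with "cjk".
    static List<String> misprefixed(List<String> lines) {
        List<String> offending = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');                 // strip trailing comments
            String body = hash >= 0 ? line.substring(0, hash) : line;
            String[] fields = body.split(";");
            if (fields.length < 2) continue;              // blank, comment-only, or malformed
            String shortAlias = fields[0].trim();
            String longName = fields[1].trim();
            if (NON_CJK_FIELDS.contains(longName) && shortAlias.startsWith("cjk")) {
                offending.add(line);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "cjkAccountingNumeric ; kAccountingNumeric", // genuine CJK field: allowed
                "cjkRSTUnicode ; kRSTUnicode",
                "cjkReading ; kReading",
                "cjkSrc_NushuDuben ; kSrc_NushuDuben",
                "cjkTGT_MergedSrc ; kTGT_MergedSrc");
        for (String bad : misprefixed(lines)) {
            System.out.println("non-CJK field with cjk alias: " + bad);
        }
    }
}
```

Run against the four lines from the original diff, it reports all four and lets the CJK line through.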

default:
throw new RuntimeException("Missing Catalog case");
}
case Enumerated:
Member

This (and the associated pile of UnicodeMaps) seems like it is going to be a bit annoying to maintain as we add properties.
Is there a reason why you are not doing something like

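// Look up the property by name in the indexed property set.
final UnicodeProperty property = indexUnicodeProperties.getProperty(prop);
// Get all aliases of this code point's value for the property.
final List<String> valueAliases = property.getValueAliases(property.getValue(codepoint));
// Use the only alias if there is one; otherwise use the second alias.
return valueAliases.size() == 1 ? valueAliases.get(0) : valueAliases.get(1);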

for most of them (special-casing Decomposition_Type etc. as needed)?
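To make the suggestion concrete outside the unicodetools codebase, here is a self-contained sketch of the same generic lookup. ToyProperty is a hypothetical stand-in, not the real UnicodeProperty class; it only mirrors the two calls used in the snippet. The toy alias lists follow the field order of PropertyValueAliases.txt (short name, long name, then any extra aliases); whether UnicodeProperty.getValueAliases returns aliases in that order is an assumption to verify.

```java
import java.util.List;
import java.util.Map;

public class AliasLookupSketch {
    // Hypothetical stand-in for a property with value aliases; NOT the
    // unicodetools UnicodeProperty class, just enough to show the idea.
    record ToyProperty(Map<Integer, String> values, Map<String, List<String>> aliases) {
        String getValue(int codepoint) { return values.get(codepoint); }
        List<String> getValueAliases(String value) { return aliases.get(value); }
    }

    // One generic code path instead of one hand-maintained map per property:
    // use the only alias if there is one, otherwise the second alias.
    static String aliasFor(ToyProperty property, int codepoint) {
        List<String> valueAliases = property.getValueAliases(property.getValue(codepoint));
        return valueAliases.size() == 1 ? valueAliases.get(0) : valueAliases.get(1);
    }

    public static void main(String[] args) {
        // Toy data modeled on General_Category entries (illustrative only).
        ToyProperty gc = new ToyProperty(
                Map.of(0x41, "Uppercase_Letter", 0x30, "Decimal_Number"),
                Map.of("Uppercase_Letter", List.of("Lu", "Uppercase_Letter"),
                       "Decimal_Number", List.of("Nd", "Decimal_Number", "digit")));
        System.out.println(aliasFor(gc, 0x41)); // Uppercase_Letter
        System.out.println(aliasFor(gc, 0x30)); // Decimal_Number
    }
}
```

Adding a property then costs no new map; only properties whose output does not follow this rule (Decomposition_Type, for example) would need a special case.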

@macchiati (Member) commented Nov 27, 2024 via email


3 participants