Improve normalizer performance by adjusting the trie value format #5813
base: main
ICU4C issue: https://unicode-org.atlassian.net/browse/ICU-22968
ICU4C PR: unicode-org/icu#3269
Landable for the purpose of 2.0, but I think this could have a couple more pointers in the docs and be more encapsulated.
/// Getting a zero from this trie means that you need
/// to make another lookup from `DecompositionDataV1::trie`.
pub struct DecompositionDataV2<'data> {
    /// Trie for decomposition.
    #[cfg_attr(feature = "serde", serde(borrow))]
    pub trie: CodePointTrie<'data, u32>,
issue: I feel like the packed code logic is all scattered. Can we use a structured NormalizationTrieValue(pub u32)
type that has convenience methods for getting all the fields?
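For illustration, here is a minimal sketch of what such a wrapper might look like. The bit layout (masks, shifts, field names) is entirely hypothetical and is not the layout this PR uses; only the "zero means look elsewhere" rule comes from the doc comment in the diff above.

```rust
/// Hypothetical newtype over a packed trie value.
/// NOTE: the field layout below is invented for illustration only;
/// it is NOT the actual format introduced by this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NormalizationTrieValue(pub u32);

impl NormalizationTrieValue {
    // Assumed layout (hypothetical):
    //   bits 0..=7   an 8-bit class field
    //   bit  8       a single boolean flag
    //   bits 16..=31 a 16-bit payload
    const CLASS_MASK: u32 = 0xFF;
    const FLAG_BIT: u32 = 1 << 8;
    const PAYLOAD_SHIFT: u32 = 16;

    /// A zero value means the caller has to consult another trie,
    /// as the doc comment on the struct in the diff describes.
    pub const fn needs_fallback_lookup(self) -> bool {
        self.0 == 0
    }

    /// Extracts the (hypothetical) 8-bit class field.
    pub const fn class(self) -> u8 {
        (self.0 & Self::CLASS_MASK) as u8
    }

    /// Tests the (hypothetical) flag bit.
    pub const fn has_flag(self) -> bool {
        self.0 & Self::FLAG_BIT != 0
    }

    /// Extracts the (hypothetical) 16-bit payload from the high half.
    pub const fn payload(self) -> u16 {
        (self.0 >> Self::PAYLOAD_SHIFT) as u16
    }
}
```

With a wrapper along these lines, call sites would write `value.needs_fallback_lookup()` or `value.payload()` instead of repeating masks and shifts inline, and any future format change would be confined to one `impl` block.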
I agree that what you suggest would be better for encapsulation. However, given that prior to this PR there was no such encapsulation and I'm already way over my time budget for this, I would very much prefer landing this ASAP (before 2.0 and before this bitrots) without such a refactoring and leaving the refactoring as a follow-up.
I tested this with the normalization test suite. I also tested that UTS 46 still works: https://github.com/hsivonen/rust-url/tree/unicode16 https://github.com/hsivonen/idna_adapter/tree/icu4x-trunk
CI showing
With the fast trie type, I see this kind of performance improvement: