Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync with HarfBuzz 2.7.4 #75

Merged
merged 26 commits into from
Sep 5, 2023
Merged

Conversation

bluebear94
Copy link
Collaborator

@bluebear94 bluebear94 commented Aug 17, 2023

See #37.

  • Update scripts
    • gen-tag-table.py
    • gen-universal-table.py
    • Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?
  • Update remaining code
    • Revise universal_machine::find_syllables to avoid the Vec allocation?
  • Update tests

@bluebear94
Copy link
Collaborator Author

bluebear94 commented Aug 17, 2023

HarfBuzz 2.7.4’s gen-use-table.py doesn’t run on the current version of the Unicode data files. (gen-tag-table.py also needed a change to run properly.) Would it be viable to update these to the latest version right away?

Never mind; I was silly and didn’t notice that I needed to update the Unicode version in the URLs.

@bluebear94
Copy link
Collaborator Author

Note to self: is this change to hb-ot-layout-base-table.hh relevant?

@RazrFalcon
Copy link
Collaborator

@bluebear94 Good progress. Here is my failed attempt. Maybe it would help a bit: Patch.patch

I haven't been able to grok machine_index_t iterator logic. Afaik it's mutating and bidirectional, which I'm not sure how to implement in Rust.

Either way, feel free to ask me anything. Just remember that I wrote this code 3 years ago and I do not remember much myself.

@RazrFalcon
Copy link
Collaborator

Note to self: is this change to hb-ot-layout-base-table.hh relevant?

Any changes to font tables parsing are irrelevant. The BASE table is unused, at least of 2.7

@RazrFalcon
Copy link
Collaborator

RazrFalcon commented Aug 17, 2023

There is also a pseudo-bug. This code code should use use_category, not indic_category. It doesn't matter right now. But harfbuzz switched to use_category "byte" from 2 to 3.

@RazrFalcon
Copy link
Collaborator

The way I was doing the back-porting is by checking each commit, not the whole diff at once. harfbuzz commits are usually pretty small and self-contains, so it's easier that way.

If a commit has changes related to build system, fonts parsing, subsetting, C++ stuff - it can be safely ignored.

If a commit contains changes to tests, like this one, then you have to checkout this commit. Build harfbuzz from sources:

meson builddir --reconfigure
ninja -Cbuilddir

And then run scripts/gen-shaping-tests.py /path/to/harfbuzz-src

@bluebear94
Copy link
Collaborator Author

HarfBuzz 2.7.4’s buffer serialization was changed to surround the contents in square brackets. Should this be reflected in rustybuzz? Currently, it is, but this will require a minor change to gen-shaping-tests.py.

@RazrFalcon
Copy link
Collaborator

Should this be reflected in rustybuzz?

Sure.

@bluebear94
Copy link
Collaborator Author

Trying to fix the remaining test failure. For the font tests/fonts/in-house/8228d035fcd65d62ec9728fb34f42c63be93a5d3.ttf, HarfBuzz is getting −639 for the x-bearing of glyph #4’s extents, but rustybuzz is getting −638 instead.

@RazrFalcon
Copy link
Collaborator

I've checked that font with ttf-explorer and it does use -639.
Is this a new test case or some recent change broke it?

@bluebear94
Copy link
Collaborator Author

bluebear94 commented Aug 18, 2023

It’s an existing test case that was manually changed after generation. This test was supposed to have an expected output of x=0+1030|acutecomb=0@-19,-27+0|X=2+1295|acutecomb=2@-151,320+0 (as generated by the gen-shaping-tests.py script), but the one currently in the master branch has x=0+1030|acutecomb=0@-20,-27+0|X=2+1295|acutecomb=2@-152,321+0 instead.

@RazrFalcon
Copy link
Collaborator

Ugh... good find. I would assume this change was caused by this commit. In which case it's not a bug a we should prefer rustybuzz output. Some floats rounding differences are expected.

Lets leave the manual correction in place and I will think what could be done here.

@bluebear94
Copy link
Collaborator Author

Now morx_34_001 and morx_36_001 are failing (producing shorter outputs than expected). These test resource exhaustion cases, and the failures might be related to the work limit increases in HarfBuzz.

@bluebear94 bluebear94 marked this pull request as ready for review August 19, 2023 22:32
@bluebear94 bluebear94 changed the title Draft: Sync with HarfBuzz 2.7.4 Sync with HarfBuzz 2.7.4 Aug 19, 2023
This applies the upstream 2.7.4 unviersal machine changes to the
Ragel files. Unfortunately, this means that we have to allocate
a vector in find_syllables, as we can’t do unholy things with
operator overloading unlike in C++.
@bluebear94
Copy link
Collaborator Author

Merged the changes from #77 into this branch. Unfortunately, due to limitations in Ragel and in Rust, I had to switch back to allocating a vector in universal_machine::find_syllables for now.

@RazrFalcon
Copy link
Collaborator

@bluebear94 Sure, no problem. Thanks for looking into it. Hopefully it was relatively easy for you to figure out how to build master ragel. I will update the readme soon.

@RazrFalcon
Copy link
Collaborator

RazrFalcon commented Aug 25, 2023

PS: I wasn't notified that this pull request is no longer a draft. I would suggest pinging me directly.
Do you consider it to be done, feature-wise?

@bluebear94
Copy link
Collaborator Author

Yes.

I’ve almost figured out how to get rid of the vec allocation again, but unfortunately, it requires manually applying a few edits to the generated file. The problem is that Ragel outputs assignment statements such as p = 0;, but Rust doesn’t support overloading assignment unlike C++.

This involves a `MachineCursor` type, which is equivalent to
HarfBuzz’s `machine_index`. Unfortunately, it is not possible to
overload the assignment operator in Rust, forcing us to manually
edit the generated source.
@RazrFalcon
Copy link
Collaborator

Hm... I'm not familiar with Ragel myself, but it would be great to be able to avoid manual edits. Once again, there is no rush.

@RazrFalcon
Copy link
Collaborator

By the way, why there are so many changes to tests/shaping_text_rendering_tests.rs?

@bluebear94
Copy link
Collaborator Author

These are due to changes in the morx_34_001 and morx_36_001 tests due to increased resource limits in HarfBuzz 2.7.4 (mentioned in this comment). None of the other tests in that file have changed.

@RazrFalcon
Copy link
Collaborator

Hm... github shows:

23,060 additions, 5 deletions not shown because the diff is too large. Please use a local Git client to view these changes.

I can't even see what was changed.

@RazrFalcon
Copy link
Collaborator

Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?

Are you talking about Python scripts? I think that should stay untouched. They are ported from harfbuzz as well. Just reformatted and outputting Rust instead of C++. rustybuzz should have as little changes as possible.

@behdad
Copy link
Member

behdad commented Aug 25, 2023

Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?

Are you talking about Python scripts? I think that should stay untouched. They are ported from harfbuzz as well. Just reformatted and outputting Rust instead of C++. rustybuzz should have as little changes as possible.

Newer HarfBuzz versions use PackTab for a bunch of stuff. It's a small one-file Python library. Should be easy to add Rust output to it:

harfbuzz/packtab#5

Copy link
Collaborator

@notgull notgull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look at the first couple commits. I'll work through the rest over the next week or so.

Comment on lines +701 to +703
ot.remove_language_ot('MONT')
ot.add_language('mnw', 'MONT')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear in the original set of Harfbuzz changes, what is this for?

Copy link
Collaborator Author

@bluebear94 bluebear94 Aug 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version of Microsoft’s page of language tags has a note for Thailand Mon that chokes the parser. I had to add this for the script to finish successfully; otherwise, I get this error:

Traceback (most recent call last):
  File "/home/felirovas/moddev/rustybuzz/scripts/gen-tag-table.py", line 1036, in <module>
    print('                // %s' % bcp_47.get_name(lt))
                                    ^^^^^^^^^^^^^^^^^^^
  File "/home/felirovas/moddev/rustybuzz/scripts/gen-tag-table.py", line 633, in get_name
    name = self._get_name_piece(lt.language)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felirovas/moddev/rustybuzz/scripts/gen-tag-table.py", line 622, in _get_name_piece
    return self.names[subtag].split('\n')[0] + self.scopes.get(subtag, '')
           ~~~~~~~~~~^^^^^^^^
KeyError: 'mnwseehttps://www.unicode.org/l2/l2020/20163'

scripts/gen-tag-table.py Show resolved Hide resolved
Copy link
Collaborator

@notgull notgull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that it took me so long. From my remaining read-through it looks reasonably accurate to Harfbuzz 2.7.4.

In the future, for ease of review I think we should have a better system for porting over commits:

  • Port over commits from Harfbuzz one by one.
  • Link the commit being ported in the rustybuzz commit.

@RazrFalcon
Copy link
Collaborator

@bluebear94 Any updates on why tests/shaping_text_rendering_tests.rs diff is so big?

@notgull Porting commits one by one is fine. Adding the exact commit hash is probably unnecessary. We simply don't need this information.

I plan to try to refactor the code base a bit to make it more harfbuzz-like. File structure, code structure, etc. Ideally, rustybuzz should look like harfbuzz transpiled to Rust. The less changes we have, the either is should be to port.

@bluebear94
Copy link
Collaborator Author

Any updates on why tests/shaping_text_rendering_tests.rs diff is so big?

See my previous comment on this. It might be possible to make the offending tests shorter using str::repeat, but that would require nontrivial changes to gen-shaping-tests.py. Alternatively, we can just exclude the morx_34_001 and morx_36_001 tests from being generated.

@RazrFalcon
Copy link
Collaborator

@bluebear94 Oh, I think I got it now. Those tests produce an enormous output? And harfbuzz tests simply use * to indicate all glyphs? Is this right?

In this case I would add those tests to the ignore list in scripts/gen-shaping-tests.py for now and later I would try to figure out what to do with them.

@bluebear94
Copy link
Collaborator Author

It’s not that all glyphs from these fonts are being tested, but rather than these fonts contain glyphs that expand indefinitely.

Anyway, I pushed a commit to ignore these tests.

@RazrFalcon
Copy link
Collaborator

I see now. Those tests were already absurd before. I guess I would have to handle them separately.

Anyway, thanks for your work! I honestly wasn't expecting anyone to be brave enough to do it.
I plan to do some small changes in the next couple of days and make a new release and then, if you're still interested, you can continue working on it. I just wanted to warn you not to fork it until I restructure the code base a bit.

@RazrFalcon RazrFalcon merged commit d04fa54 into harfbuzz:master Sep 5, 2023
1 check passed
@RazrFalcon
Copy link
Collaborator

@bluebear94 If you're still interested in updating rustybuzz, I can give you write access to this repo, to simplify your workflow. This would allow more gradual changes.
Just make sure each commit passes the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants