Question about support for rendering Khmer text correctly in various environments #1

Open
mbert opened this issue Jul 30, 2024 · 9 comments

Comments

@mbert

mbert commented Jul 30, 2024

Of course this is not an "issue", merely a bunch of questions hoping that someone here may be able to help.

While learning Khmer I have also got into Khmer script. I have found that while most word processors can render Khmer texts well, this does not seem to be the case for other environments:

On the Linux command line, with Unicode Khmer fonts installed of course, rendering consonants with subscript forms ("feet") and vowels fails: e.g. something like រ្ធី looks like រ with a "+" underneath and the vowel missing altogether. On my Mac this works quite well, even on the command line.

The same effect seems to occur when trying to render PDF files using some PDF rendering library from program code.

Just wondering: do terminal emulators or PDF renderers need some particular functionality for rendering Khmer text correctly?
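For reference, a minimal Python sketch of what such a cluster looks like at the code point level. It assumes the example above is the sequence U+179A U+17D2 U+1792 U+17B8, and `describe` is a hypothetical helper, not part of any library. The second code point, KHMER SIGN COENG, tells the shaping engine to draw the following consonant in its subscript form. A renderer that does not shape the text tends to draw the coeng's own fallback glyph, which in many fonts looks like a small plus sign.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# Assumed to be the cluster quoted above: RO + COENG + THO + vowel sign II.
cluster = "\u179a\u17d2\u1792\u17b8"

for code, name, category in describe(cluster):
    print(code, name, category)
```

The four stored code points are meant to end up as one visual unit, which is the part that needs a shaping engine.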

@seanghay
Owner

  1. A terminal emulator requires a monospace font to work. Currently, there is no monospace font for the Khmer language. (I got this answer from a professional Khmer typeface designer, Mr. Sovichet Tep.)
  2. To render Khmer text correctly, a rendering engine needs a text shaping engine that supports shaping Khmer glyphs, such as HarfBuzz (https://github.com/harfbuzz/harfbuzz), and a text layout engine that understands how to break Khmer text correctly. ICU supports breaking Khmer text via a fixed dictionary (https://unicode-org.github.io/icu/userguide/boundaryanalysis/#dictionary-based-breakiterator).
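To illustrate why breaking needs a dictionary at all: Khmer is written without spaces between words, so a layout engine has to find word boundaries itself. The toy Python sketch below uses greedy longest-match against a tiny word set. This is a deliberately simplified stand-in, not ICU's actual algorithm, and `segment` and the two-word dictionary are hypothetical names chosen for the example.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation.

    Characters that start no dictionary word become single-character pieces.
    """
    max_len = max(map(len, dictionary), default=1)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                pieces.append(candidate)
                i += size
                break
    return pieces


# Two words written with escapes so the code points are unambiguous.
thanks = "\u17a2\u179a\u1782\u17bb\u178e"
much = "\u1785\u17d2\u179a\u17be\u1793"

print(segment(thanks + much, {thanks, much}))
```

A real engine would additionally refuse to break inside a consonant cluster and would handle words missing from the dictionary more gracefully than this fallback does.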

@seanghay
Owner

seanghay commented Jul 31, 2024

Also check out https://sile-typesetter.org/examples/ and https://github.com/HOST-Oman/libraqm and https://pango.gnome.org/

@mbert
Author

mbert commented Jul 31, 2024

Thank you very much for taking the time to reply! Also thank you very much for the links. As a long-time LaTeX user I will definitely take a closer look at sile!

Regarding the terminal emulator: is this a requirement specific to terminal emulators on Linux (or other Unices using Xorg or Wayland underneath)?

On macOS my observation is that Khmer text is rendered correctly, but the font seems to be substituted whenever Khmer script occurs: the default font I use is a monospace font, but what I get for Khmer words does not look like one (in a monospace font each glyph's width and its distance to the others would be identical, which is clearly not the case in words like ថ្វីបើ).
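A rough Python sketch of why a fixed character grid and Khmer clusters disagree. It assumes a simplified wcwidth-style rule in which nonspacing marks take no cell and every other code point takes one; real terminals differ in the details. `naive_cell_width` is a hypothetical helper, and the escaped string is assumed to match the word quoted above.

```python
import unicodedata


def naive_cell_width(text):
    """Count cells under a simplified rule: category Mn is 0 cells, anything else is 1."""
    return sum(0 if unicodedata.category(ch) == "Mn" else 1 for ch in text)


# Assumed spelling: THA + COENG + VO + vowel sign II, then BA + vowel sign OE.
word = "\u1790\u17d2\u179c\u17b8\u1794\u17be"

print("code points:", len(word))
print("cells under the naive rule:", naive_cell_width(word))
```

Under this rule the consonant that follows the coeng is given a cell of its own, even though a shaped rendering draws it beneath the preceding consonant. The grid accounting and the shaped output therefore do not line up.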

When looking at Linux and other classical Unix systems I would expect things to be mostly similar to command line tools on macOS, even though they use different graphical rendering engines. But obviously there is a difference.

If I may, I also have a follow-up on the question regarding PDF rendering: when I want to create PDF documents programmatically (say, from Java or C# code) there are libraries for doing this, e.g. iText. With iText it has turned out that one needs the pdfCalligraph extension (which is not free to use) to support Khmer text. I just cannot imagine that there are no open libraries capable of this (and thus that developers would all require commercial products). Are there any free equivalents supporting Khmer?

Apologies for bothering you with this, you don't need to reply, but being located in Europe I have struggled a lot to find a person with a technical background and experience in these things. The reply you have given already helped a lot, អរគុណ​ច្រើន!

@seanghay
Owner

seanghay commented Aug 1, 2024

  1. I personally don't have a Linux machine. I've been working on macOS all the time. The terminal emulator I am using now is Kitty (switched from Alacritty) and it does render Khmer text better than most terminals.

  2. PDF rendering or image rendering is still a problem for the Khmer language. Luckily, we have sile now. For me, I use a headless browser (puppeteer) to render PDFs with line breaking programmatically. If I don't need line breaking or paragraphs, I use node-canvas, which uses cairo underneath. I just asked C# dev here, they used PDFSharp and for PHP they used FPDF.
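The headless-browser route can be sketched in Python without puppeteer, by writing an HTML file and asking a Chromium-based browser to print it. This is only a sketch under assumptions: the executable name `chromium`, the font name "Noto Sans Khmer" being installed, and the helper names are all placeholders. The `--headless` and `--print-to-pdf` switches are Chromium command-line flags.

```python
import html
from pathlib import Path


def write_khmer_html(text, path, font_family="Noto Sans Khmer"):
    """Write a minimal UTF-8 HTML page containing the given text and return its path."""
    document = (
        "<!DOCTYPE html>\n"
        '<html lang="km">\n'
        '<head><meta charset="utf-8">\n'
        f'<style>body {{ font-family: "{font_family}", sans-serif; }}</style></head>\n'
        f"<body><p>{html.escape(text)}</p></body>\n"
        "</html>\n"
    )
    path = Path(path)
    path.write_text(document, encoding="utf-8")
    return path


def print_to_pdf_command(html_path, pdf_path, browser="chromium"):
    """Build the command line; the browser executable name is an assumption."""
    return [
        browser,
        "--headless",
        f"--print-to-pdf={pdf_path}",
        Path(html_path).resolve().as_uri(),
    ]


# To actually render, pass the command to subprocess.run(command, check=True).
```

The browser does the shaping and line breaking, so the calling code only has to supply text and CSS.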

There's no complete solution for now; however, the necessary libraries to achieve it are already here.

Projects worth checking out


Most engines are able to shape Khmer glyphs except OpenType.js, so I think the remaining issues are related to line breaks, which can be solved by using ICU BreakIterator.

I've always been interested in text rendering, and I've attempted multiple times to build a working Khmer text layout engine but ended up abandoning the project.

@mbert
Author

mbert commented Aug 1, 2024

Thank you so much for your help!

@lukasf

lukasf commented Aug 16, 2024

> I just asked C# dev here, they used PDFSharp and for PHP they used FPDF.

I've just stumbled over this post. I tried to generate a Khmer PDF with PDFSharp and it does not render correctly. Can you confirm that your colleagues use PDFSharp for rendering Khmer texts?

This is one example output I got using PDFSharp:

[Image: Khmer_Wrong]

This is how it should look: the exact same string and the same font, but rendered with a different (commercial) library:

[Image: Khmer_Correct]

It does not matter which Khmer font I try. All fonts work fine with Khmer in other applications (e.g. Word). Errors always seem to occur where multiple signs should be combined into a composite sign. Then the second one gets a "+" below and the first one stays where it is, instead of really combining. This sounds very much like the issue the OP has in the console.

@seanghay
Owner

seanghay commented Aug 19, 2024

@lukasf I've confirmed with them. They created the PDF file by creating an image bitmap first and wrapped the image in the PDF file using PDFSharp.

I'm not familiar with PDFSharp so I don't know what text shaping engine they are using. I can tell it's not HarfBuzz.

@mbert
Author

mbert commented Aug 19, 2024

That's interesting. So if developers go down the route of rendering images and embedding them in PDF, this seems to indicate that there is no free solution to this problem available (as we've seen there are commercial solutions, but not every organisation is able or willing to go along with them)?

@seanghay
Owner

I'm not sure about the C# ecosystem. I think that because big companies use C# to generate reports and invoices, these library maintainers can benefit from it, and that's totally fine.

PDFSharp would be able to shape Khmer text correctly if it integrated HarfBuzz, as SkiaSharp does.
