RFR: 8361381: GlyphLayout behavior differs on JDK 11+ compared to JDK 8 [v3]

Mon Aug 25 19:09:42 UTC 2025

On Tue, 19 Aug 2025 14:41:54 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> ### TL;DR
>> 
>> This is a fix for what I think is a regression since the introduction of HarfBuzz in JDK 9. The problem is that the algorithm which converts the glyph vector produced by the layout engine into a corresponding character vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that "*each glyph maps to a single character*". But this is no longer true with HarfBuzz and, as this example demonstrates, can lead to improper clustering of characters, which can result in bad line breaking decisions.
>> 
>> I ran the corresponding JTreg and JCK tests on Linux, but because this area is heavily dependent on the OS and concrete fonts, I'd like to kindly ask you to run your internal test suites in this area if possible.  
>> 
>> In the following you can find a longer (maybe a bit too long :) description of this problem which I merely wrote for my own memory.
>> 
>> ### Full description
>> 
>> A customer reported a regression in JDK 9+ which leads to bad/wrong line breaks for text in the Khmer language. Khmer is a [complex script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to the Unicode standard 3.0 in 1999 (in the [Unicode block U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I personally don't understand Khmer at all :)
>> 
>> Fortunately, the customer could provide a [simple reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java) which I could further condense to the following example: "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means "*Requested but still denied*"). If we use OpenJDK's [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html) to layout that paragraph (notice that Khmer has no spaces between words) to fit within a specific "wrapping width", the output may look as follows with JDK 8 (the exact output depends on the font and the wrapping width):
>> 
>> Segment: បានស្នើសុំ 0 10
>> Segment: នៅតែត្រូវ 10 9
>> Segment: បានបដិសេ 19 8
>> Segment: ធ 27 1
>> 
>> I ran with both, the logical [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG) font or directly with `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on my system DIALOG will automatically fall back to the KhmerOS font for characters from the Khmer Unicode code block). I also tried with the [Noto Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) f...
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added JTreg test to verify monotonically growing glyph character indices

> _Mailing list message from [Philip Race](mailto:philip.race at oracle.com) on [client-libs-dev](mailto:client-libs-dev at mail.openjdk.org):_
> 
> That's not relevant. This issue is about what you will contribute.
> 
> You should read and understand the OCA. If necessary you should consult your organization's legal department for help.
> 
> -phil.
> 
> On 8/25/25 9:33 AM, Volker Simonis wrote:
> 
> > > > I can use [`hb-subset`](https://man.archlinux.org/man/extra/harfbuzz-utils/hb-subset.1.en) to create a subset of the [KhmerOS](https://www.cambodia.org/fonts/) open source font (licensed under LGPL 2.1 or later) which will be just enough for the test and check that in along with the test. The subsetted font file will be 28kb. Would that be acceptable?
> > > > No. That won't be allowed. You aren't using your own IP.
> > > > Sorry, but I don't understand the problem? We have a bunch of other third-party libraries which are included in OpenJDK along with their corresponding license files. Just do a `find . -name legal -type d -exec echo {} \; -exec ls -la {} \;` in the top level directory to find them all.
> 

There's obviously a way to push third-party code with an appropriate license to OpenJDK. I understand the OCA and I'm not insisting on pushing such changes myself; I just offered to create the corresponding PR so that you or somebody else can push it (just as you've pushed the DejaVu fonts).

But I don't want to get into a licensing discussion here, nor do I want to solve the general problem of testing complex script layout in OpenJDK.

I think it is evident that this PR fixes a regression that has been in OpenJDK since JDK 9. This regression can probably affect all complex scripts which do character reordering and ligatures. I think one of the reasons why it became apparent in Khmer is that Khmer script does not use spaces between words. This means that in OpenJDK we use the default RuleBasedBreakIterator for finding word boundaries because we have no dictionary support for Khmer.

This means that we can break at any cluster boundary (and in Khmer **only** at cluster boundaries because there's no white space between words), and cluster boundaries have been broken since JDK 9 because of the missing invisible glyphs. [ExtendedTextSourceLabel::getLineBreakIndex()](https://github.com/openjdk/jdk/blob/040cc7aee03e82e70bcbfcd2dde5cd4b35faeabd/src/java.desktop/share/classes/sun/font/ExtendedTextSourceLabel.java#L483) simply considers all the characters with zero advance to belong to a cluster and it won't break in the middle of a cluster. But because of the regression introduced by the HarfBuzz integration, we can get arbitrarily long clusters which won't be broken.
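
To illustrate the effect, here is a minimal, self-contained sketch of an advance-based break search. It is not the JDK code: the class `ClusterBreakSketch`, the method `lineBreakIndex` and the sample advance arrays are all made up for illustration, and it assumes one advance value per character in logical order. It only shows why characters with zero advance stick to the preceding character, and how a long run of zero-advance characters pushes the break far to the right.

```java
// Hypothetical sketch, not the actual ExtendedTextSourceLabel implementation.
final class ClusterBreakSketch {

    // Returns the index of the first character that no longer fits into
    // 'width' when laying out characters starting at 'start'.
    // A character with zero advance never consumes width, so the returned
    // index can never point at a zero-advance character that follows a
    // character which still fit: such characters stay in the same cluster.
    static int lineBreakIndex(float[] advances, int start, float width) {
        int i = start;
        while (i < advances.length) {
            width -= advances[i];
            if (width < 0) {
                break; // character i does not fit any more
            }
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Assumed sample data: three short clusters (base + zero-advance mark).
        float[] healthy = {8f, 0f, 7f, 0f, 9f, 0f};
        // Assumed sample data: cluster information lost, most advances are zero.
        float[] broken = {8f, 0f, 0f, 0f, 0f, 16f};

        System.out.println(lineBreakIndex(healthy, 0, 16f)); // prints 4
        System.out.println(lineBreakIndex(broken, 0, 16f));  // prints 5
    }
}
```

With the `healthy` array the break lands between the second and third cluster. With the `broken` array the first five characters behave like a single unbreakable cluster, which is the kind of over-long cluster described above.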

This change is pretty simple. I don't think it does any harm and at least it contains a regression test which verifies the monotonic nature of cluster indices (which hasn't been tested until now). Please let us first push this simple fix before we try to achieve more ambitious goals.
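
For readers who want to picture such a check, the following is a rough sketch of a monotonicity test over glyph character indices. It is not the JTreg test from this PR; the class `MonotonicIndexCheck` and the method `firstDecrease` are hypothetical names, and the sketch assumes left-to-right text, where indices are expected to be non-decreasing (glyphs of the same cluster may share an index). In a real test the array could come from `GlyphVector.getGlyphCharIndices(0, gv.getNumGlyphs(), null)`.

```java
// Hypothetical sketch, not the regression test added in this PR.
final class MonotonicIndexCheck {

    // Returns the position of the first glyph whose character index is
    // smaller than that of its predecessor, or -1 if the indices never
    // decrease. Equal neighbours are allowed because several glyphs may
    // map to the same character.
    static int firstDecrease(int[] charIndices) {
        for (int i = 1; i < charIndices.length; i++) {
            if (charIndices[i] < charIndices[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] ordered = {0, 0, 1, 3, 3, 4};   // assumed sample data
        int[] reordered = {0, 2, 1, 3};       // assumed sample data

        System.out.println(firstDecrease(ordered));   // prints -1
        System.out.println(firstDecrease(reordered)); // prints 2
    }
}
```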

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26825#issuecomment-3221417422