RFR: 8361381: GlyphLayout behavior differs on JDK 11+ compared to JDK 8 [v3]

Thu Aug 21 21:15:55 UTC 2025

On Tue, 19 Aug 2025 14:41:54 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> ### TL;DR
>> 
>> This is a fix for what I think is a regression since the introduction of HarfBuzz in JDK 9. The problem is that the algorithm which converts the glyph vector produced by the layout engine into a corresponding character vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that "*each glyph maps to a single character*". This is no longer true with HarfBuzz and, as this example demonstrates, can lead to improper clustering of characters, which in turn can result in bad line breaking decisions.
>> 
>> I ran the corresponding JTreg and JCK tests on Linux, but because this area depends heavily on the OS and the specific fonts installed, I'd like to kindly ask you to run your internal test suites in this area if possible.
>> 
>> Below you can find a longer (maybe a bit too long :) description of this problem, which I wrote mainly for my own reference.
>> 
>> ### Full description
>> 
>> A customer reported a regression in JDK 9+ which leads to bad/wrong line breaks for text in the Khmer language. Khmer is a [complex script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to the Unicode Standard with version 3.0 in 1999 (in the [Unicode block U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I personally don't understand Khmer at all :)
>> 
>> Fortunately, the customer could provide a [simple reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java) which I could further condense to the following example: "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google Translate, this means "*Requested but still denied*"). If we use OpenJDK's [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html) to lay out that paragraph (notice that Khmer has no spaces between words) so that it fits within a specific "wrapping width", the output may look as follows with JDK 8 (the exact output depends on the font and the wrapping width):
>> 
>> Segment: បានស្នើសុំ 0 10
>> Segment: នៅតែត្រូវ 10 9
>> Segment: បានបដិសេ 19 8
>> Segment: ធ 27 1
>> 
>> I ran with both the logical [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG) font and directly with `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on my system DIALOG will automatically fall back to the KhmerOS font for characters from the Khmer Unicode block). I also tried with the [Noto Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) f...
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added JTreg test to verify monotonically growing glyph character indices

Looks fine to me. I did not find any way to trigger this in cases where it might be needed other than the broken case described in the bug.
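
For anyone who wants to poke at this locally, here is a minimal sketch of the kind of check involved. It is not the reproducer attached to the bug and not the JTreg test from the PR: the class name `KhmerLayoutSketch`, the helper `isMonotonic`, the font size and the wrapping width of 50 are all assumptions made for illustration. It prints `LineBreakMeasurer` segments in the same "Segment: text start length" shape as the output quoted above, then reports whether the glyph character indices returned by `Font.layoutGlyphVector` never decrease. Results will depend on the JDK version and the fonts installed.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.text.AttributedString;

public class KhmerLayoutSketch {
    // Sample text taken from the PR description quoted above.
    public static final String TEXT = "បានស្នើសុំនៅតែត្រូវបានបដិសេធ";

    /** Returns true if the values never decrease from left to right. */
    public static boolean isMonotonic(int[] indices) {
        for (int i = 1; i < indices.length; i++) {
            if (indices[i] < indices[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the logical DIALOG font falls back to a Khmer-capable font.
        Font font = new Font(Font.DIALOG, Font.PLAIN, 12);
        FontRenderContext frc = new FontRenderContext(null, false, false);

        // Break the paragraph into lines and print one "Segment" line per layout.
        AttributedString as = new AttributedString(TEXT);
        as.addAttribute(TextAttribute.FONT, font);
        LineBreakMeasurer lbm = new LineBreakMeasurer(as.getIterator(), frc);
        float wrappingWidth = 50f; // hypothetical width, pick any value
        int start = 0;
        while (lbm.getPosition() < TEXT.length()) {
            lbm.nextLayout(wrappingWidth);
            int end = lbm.getPosition();
            System.out.println("Segment: " + TEXT.substring(start, end)
                    + " " + start + " " + (end - start));
            start = end;
        }

        // Lay out the whole string and inspect the glyph-to-character mapping.
        char[] chars = TEXT.toCharArray();
        GlyphVector gv = font.layoutGlyphVector(
                frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        int[] charIndices = gv.getGlyphCharIndices(0, gv.getNumGlyphs(), null);
        System.out.println("glyph char indices monotonic: " + isMonotonic(charIndices));
    }
}
```

Running it on JDK 8 and on a later release with the same font makes it easy to compare the segment boundaries side by side.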

@gredler, @prrace please take a look.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26825#issuecomment-3212084183
PR Comment: https://git.openjdk.org/jdk/pull/26825#issuecomment-3212087722