RFR: 8361381: GlyphLayout behavior differs on JDK 11+ compared to JDK 8

Tue Aug 19 13:23:37 UTC 2025

On Mon, 18 Aug 2025 16:05:24 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> ### TL;DR
> 
> This is a fix for what I think is a regression since the introduction of HarfBuzz in JDK 9. The problem is that the algorithm which converts the glyph vector produced by the layout engine into a corresponding character vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that "*each glyph maps to a single character*". But this is not true any more with HarfBuzz and as this example demonstrates, can lead to improper clustering of characters which can result to bad line breaking decisions.
> 
> I ran the corresponding JTreg and JCK test on Linux but because this area is heavily dependent on the OS and concrete fonts I'd like to kindly ask you to run your internal test suites in this area if possible.  
> 
> In the following you can find a longer (maybe a bit too long :) description of this problem which I merely wrote for my own memory.
> 
> ### Full description
> 
> A customer reported a regression in JDK 9+ which leads to bad/wrong line breaks for text in the Khmer language. Khmer is a [complex script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to the Unicode standard 3.0 in 1999 (in the [Unicode block U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))) and I personally don't understand Khmer at all :)
> 
> Fortunately, the customer could provide a [simple reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java) which I could further condense to the following example: "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google translate, this means "*Requested but still denied*"). If we use OpenJDK's [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html) to layout that paragraph (notice that Khmer has no spaces between words) to fit within a specific "wrapping width", the output may look as follows with JDK 8 (the exact output depends on the font and the wrapping width):
> 
> Segment: បានស្នើសុំ 0 10
> Segment: នៅតែត្រូវ 10 9
> Segment: បានបដិសេ 19 8
> Segment: ធ 27 1
> 
> I ran with both, the logical [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG) font or directly with `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on my system DIALOG will automatically fall back to the KhmerOS font for characters from the Khmer Unicode code block). I also tried with the [Noto Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) fonts but the results were similar, so I'...

As @mrserb correctly mentioned, there's now no need to count `clusterExtraGlyphs` any more, so I've removed it completely from the code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26825#issuecomment-3200747024