RFR: 8361381: GlyphLayout behavior differs on JDK 11+ compared to JDK 8 [v2]

Tue Aug 19 13:20:57 UTC 2025

> ### TL;DR
> 
> This is a fix for what I think is a regression since the introduction of HarfBuzz in JDK 9. The problem is that the algorithm which converts the glyph vector produced by the layout engine into a corresponding character vector (in `ExtendedTextSourceLabel::createCharinfo()`) still assumes that "*each glyph maps to a single character*". This is no longer true with HarfBuzz and, as this example demonstrates, it can lead to improper clustering of characters, which in turn can result in bad line-breaking decisions.
> 
> I ran the corresponding JTreg and JCK tests on Linux, but because this area depends heavily on the OS and the concrete fonts, I'd like to kindly ask you to run your internal test suites in this area if possible.
> 
> Below you can find a longer (maybe a bit too long :) description of this problem, which I wrote mainly for my own reference.
> 
> ### Full description
> 
> A customer reported a regression in JDK 9+ which leads to bad/wrong line breaks for text in the Khmer language. Khmer is a [complex script](https://en.wikipedia.org/wiki/Khmer_script) which was only added to the Unicode standard in version 3.0 in 1999 (in the [Unicode block U+1780..U+17FF](https://en.wikipedia.org/wiki/Khmer_(Unicode_block))), and I personally don't understand Khmer at all :)
> 
> Fortunately, the customer could provide a [simple reproducer](https://bugs.openjdk.org/secure/attachment/115218/KhmerTest.java), which I could further condense to the following example: "បានស្នើសុំនៅតែត្រូវបានបដិសេធ" (according to Google Translate, this means "*Requested but still denied*"). If we use OpenJDK's [`LineBreakMeasurer`](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/font/LineBreakMeasurer.html) to lay out that paragraph (notice that Khmer has no spaces between words) so that it fits within a specific "wrapping width", the output may look as follows with JDK 8 (the exact output depends on the font and the wrapping width):
> 
> Segment: បានស្នើសុំ 0 10
> Segment: នៅតែត្រូវ 10 9
> Segment: បានបដិសេ 19 8
> Segment: ធ 27 1
> 
> I ran with both the logical [DIALOG](https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/java/awt/Font.html#DIALOG) font and directly with `/usr/share/fonts/truetype/ttf-khmeros-core/KhmerOS.ttf` on Ubuntu 22.04 (on my system DIALOG will automatically fall back to the KhmerOS font for characters from the Khmer Unicode code block). I also tried with the [Noto Khmer](https://fonts.google.com/noto/specimen/Noto+Serif+Khmer) fonts but the results were similar, so I'...
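
The TL;DR's point that a glyph no longer maps to exactly one character can be illustrated with a small, self-contained sketch. This is not the code in `ExtendedTextSourceLabel::createCharinfo()` and not the fix from this PR; the class `ClusterSketch`, its method names, and the sample glyph data are hypothetical. It only assumes that each glyph reports the index of the first character of its cluster (the kind of value `GlyphVector.getGlyphCharIndex` returns), and that several glyphs may share one cluster while a cluster may span several characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: groups characters into clusters from a glyph-to-character map. */
public class ClusterSketch {

    /**
     * glyphCharIndices[g] is the index of the first character of the cluster
     * that glyph g belongs to. Returns, for every character, the start index
     * of the cluster that contains it.
     */
    public static int[] clusterStartPerChar(int[] glyphCharIndices, int charCount) {
        boolean[] startsCluster = new boolean[charCount];
        for (int charIndex : glyphCharIndices) {
            // Several glyphs may point at the same character; marking twice is harmless.
            startsCluster[charIndex] = true;
        }
        int[] clusterStart = new int[charCount];
        int current = 0; // characters before the first marked one fall into cluster 0
        for (int c = 0; c < charCount; c++) {
            if (startsCluster[c]) {
                current = c;
            }
            clusterStart[c] = current;
        }
        return clusterStart;
    }

    /** Character indices at which a line break would not split a cluster. */
    public static List<Integer> breakCandidates(int[] clusterStart) {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c < clusterStart.length; c++) {
            if (clusterStart[c] == c) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical shaping result for 5 characters and 4 glyphs:
        // glyphs 0 and 1 both belong to the cluster starting at char 0,
        // glyph 2 to the cluster at char 3, glyph 3 to the cluster at char 4.
        int[] glyphCharIndices = {0, 0, 3, 4};
        int[] clusterStart = clusterStartPerChar(glyphCharIndices, 5);
        System.out.println(Arrays.toString(clusterStart));  // [0, 0, 0, 3, 4]
        System.out.println(breakCandidates(clusterStart));  // [0, 3, 4]
    }
}
```

In this made-up example, characters 1 and 2 have no glyph of their own and are folded into the cluster that starts at character 0, so a line breaker working on this data would only consider positions 0, 3 and 4. An algorithm that instead walks glyphs and characters in lockstep, one glyph per character, would misplace those boundaries as soon as the glyph and character counts per cluster differ.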

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  No need to count 'clusterExtraGlyphs' any more

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26825/files
  - new: https://git.openjdk.org/jdk/pull/26825/files/a52916b2..ba3e50b2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26825&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26825&range=00-01

  Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26825.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26825/head:pull/26825

PR: https://git.openjdk.org/jdk/pull/26825