RFR: 8270265: LineBreakMeasurer calculates incorrect line breaks with zero-width characters
Daniel Gredler
dgredler at openjdk.org
Fri Feb 14 20:46:10 UTC 2025
On Fri, 14 Feb 2025 00:04:29 GMT, Phil Race <prr at openjdk.org> wrote:
>> When a string contains zero-width characters, `LineBreakMeasurer` calculates line breaks incorrectly.
>>
>> The root cause appears to be that `LineBreakMeasurer` eventually calls into `StandardGlyphVector.getGlyphInfo()`, which derives the glyph advances from the glyph IDs. However, HarfBuzz's default treatment of zero-width characters is to provide the glyph ID of the space character (`U+0020`) combined with an artificial zero advance (not the font's space glyph advance). Unaware of HarfBuzz's sleight of hand, `StandardGlyphVector.getGlyphInfo()` retrieves the actual advances of the space glyph (since that was the glyph ID returned) and provides these back up the call chain to `LineBreakMeasurer` et al.
>>
>> I think the correct fix is to use `hb_buffer_set_invisible_glyph` to register `0xFFFF` as the invisible glyph ID with HarfBuzz (matching `CharToGlyphMapper.INVISIBLE_GLYPH_ID`).
>>
>> I haven't seen any unwanted side effects, but there is a risk, since this is changing the global HarfBuzz configuration.
>>
>> For more information on HarfBuzz's behavior in this area, see: https://harfbuzz.github.io/setting-buffer-properties.html
>
> Early days but the test fails on macOS
> Exception in thread "main" java.lang.RuntimeException: nextOffset 1 for char 00ad using font Dialog: 2 != 1
> at FormatCharAdvanceTest.assertEqual(FormatCharAdvanceTest.java:289)
> at FormatCharAdvanceTest.testChar(FormatCharAdvanceTest.java:282)
> at FormatCharAdvanceTest.testChars(FormatCharAdvanceTest.java:165)
> at FormatCharAdvanceTest.main(FormatCharAdvanceTest.java:154)
@prrace Two findings here:
First, it looks like macOS needs an extra pixel of wiggle room in the max string width that we measure; I've given it two pixels, just to be extra sure that the test is stable.
Second, with the macOS Dialog font and the characters U+200F or U+2067, HarfBuzz removes the zero-width characters instead of replacing them with the invisible glyph. I think this has something to do with the font tables in that specific macOS font. In this scenario, `ExtendedTextSourceLabel.getLineBreakIndex(int, float)` appears to have been reporting an early line break to the caller, rather than assuming that the shaper had omitted or combined glyphs.
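
For anyone who wants to poke at this locally, here is a minimal sketch of the kind of check involved. It is not the code from `FormatCharAdvanceTest`; the class name, the helper names, and the `SLACK_PX` constant are made up for illustration. It measures the full advance of a string, adds the two pixels of slack mentioned above, and asks `LineBreakMeasurer` where the first line would end. If the zero-width character really contributes no advance, I would expect the reported offset to equal the string length, but the result depends on the platform, the font, and the JDK build.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.text.AttributedString;

public class ZeroWidthBreakSketch {

    // Hypothetical slack, mirroring the two pixels of wiggle room described above.
    static final float SLACK_PX = 2f;

    // True for Unicode format (Cf) characters such as U+00AD, U+200F and U+2067.
    static boolean isFormatChar(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    // Wraps the text at its own measured advance plus the slack and
    // returns the offset at which LineBreakMeasurer would end the first line.
    static int firstBreak(String text, Font font) {
        FontRenderContext frc = new FontRenderContext(null, true, true);
        float width = new TextLayout(text, font, frc).getAdvance() + SLACK_PX;

        AttributedString attributed = new AttributedString(text);
        attributed.addAttribute(TextAttribute.FONT, font);

        LineBreakMeasurer measurer =
                new LineBreakMeasurer(attributed.getIterator(), frc);
        return measurer.nextOffset(width);
    }

    public static void main(String[] args) {
        Font font = new Font(Font.DIALOG, Font.PLAIN, 20);
        // A soft hyphen (U+00AD) between two visible characters.
        String text = "a\u00ADb";
        System.out.println("first break at " + firstBreak(text, font)
                + ", text length " + text.length());
    }
}
```

Swapping `\u00AD` for `\u200F` or `\u2067`, or changing the font, is a quick way to compare behavior across platforms.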
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23603#issuecomment-2660227145