RFR: 8353230: Emoji rendering regression after JDK-8208377

Wed May 28 11:21:53 UTC 2025

On Thu, 1 May 2025 21:55:00 GMT, Nikita Gubarkov <ngubarkov at openjdk.org> wrote:

>> It looks like this regression actually fits into a longer series of fixes / regressions in this area:
>> 
>> - [JDK-4517298](https://bugs.openjdk.org/browse/JDK-4517298) fixed metrics for zero-width characters, but broke some ligatures / glyph substitutions
>> - [JDK-7017058](https://bugs.openjdk.org/browse/JDK-7017058) fixed the ligatures / glyph substitutions, but broke some zero-width metrics
>> - [JDK-8208377](https://bugs.openjdk.org/browse/JDK-8208377) fixed some metrics and rendering for zero-width characters, but broke some ligatures / glyph substitutions
>> - Now, with this PR, we aim to fix the ligatures without re-breaking zero-width metrics and display
>> 
>> We have two different types of use cases pulling `CharToGlyphMapper` in two different directions: the users who need raw, untransformed glyph info, and the users who need normalized / transformed glyph info.
>> 
>> It looks to me like, in the current code base, the only `CharToGlyphMapper` user which requires raw font data is HarfBuzz (explicitly confirmed with the HarfBuzz team here: https://github.com/harfbuzz/harfbuzz/discussions/5234).
>> 
>> The regression mechanism at play here is that the HarfBuzz font callbacks are currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying the correct font GSUB substitutions (which involve ZWJ).
>> 
>> In order to fix this without (yet again) breaking metrics and display behavior elsewhere, I've added two methods to `CharToGlyphMapper` which provide access to raw glyph info, to be used by the HarfBuzz font callbacks: `charToGlyphRaw(int)` and `charToVariationGlyphRaw(int)`.
>> 
>> Note two intricacies related to `CompositeGlyphMapper`:
>> 1. We need to be careful to only cache raw (untransformed) values, to avoid conflicts between requests for a raw version of a glyph and a transformed version of the same glyph. Another option would have been two separate caches, but I don't think that's necessary.
>> 2. Consumers who are using `CompositeGlyphMapper.SLOTMASK` to check glyph slots (e.g. `FontRunIterator` and `CTextPipe`) will "see" invisible glyphs as having come from slot 0. This isn't new, and I think it's OK, but something to be aware of.
>> 
>> The glyph cache handling in `CCharToGlyphMapper` (for macOS) also requires care to avoid mixing value types.
>> 
>> Please also note that I'm not sure if the tweak to `sunFont.c` is being tested, since FFM is being used by default for Harf...
>
> I was talking about the explosion because there is a scenario in my mind, which I didn't make clear for everybody else. There is a change which I didn't have time to contribute, but would like to: it's related to composite fonts and variation selectors. We may need 2 variants for retrieving a glyph with a variation selector - one strictly matching a variation selector and another with a fallback to the base glyph, multiplied by raw/transformed versions, which adds 2 more methods. Not like it's a big problem, but given that they all end up calling a single method anyway... You get the point.
> 
>> there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).
> 
> Are those scenarios specific to a patricular mapper/font type? I was thinking that those transformations are generic.
> 
>> Any ideas for what this refactoring might look like?
> 
> I was thinking about moving this default-ignorable or any potential generic transformation into base `CharToGlyphMapper` or even `Font2D`. For example, make default implementation of `CharToGlyphMapper.charToGlyph` check ignorable characters and then call `charToGlyphRaw` - then other implementations would only need to override `charToGlyphRaw`.

@YaaZ I had a look at your suggestion to push up the `raw` checks into the `CharToGlyphMapper` superclass and the complicating factor are the three `charsToGlyphs` methods which take arrays. These are handled differently in each subclass, and even if we were able to refactor all subclass implementations to ensure that they all call into the public single-char superclass methods (where we would add the `raw` checks), it would be relatively brittle (not obvious to future maintainers that the call to the superclass method cannot be removed). The current approach does require each subclass to handle the `raw` boolean, but it also allows each subclass to internally do that in a unified, simplified way. Please feel free to prototype something on your end -- I'm open to helping with future improvements in this area, if feasible.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24412#issuecomment-2915937125