JDK-8270265: LineBreakMeasurer calculates incorrect line breaks with zero-width characters

Daniel Gredler djgredler at gmail.com
Mon Aug 9 04:23:00 UTC 2021


Hi all,

I've taken the OpenJDK plunge and have started to investigate JDK-8270265 (
https://bugs.openjdk.java.net/browse/JDK-8270265). However, I'm very new to
the codebase, so I'm looking for some advice and direction. What I've found
so far:

Strings containing zero-width non-joiner (ZWNJ, U+200C) characters draw
correctly to a Graphics2D -- that is, the ZWNJ chars do not draw at all,
even if the font being used contains a glyph for the ZWNJ character
(Tahoma, for example, contains glyph 744 for this character, with
advanceWidth=0 in the hmtx table). Presumably this is handled by HarfBuzz
via Java_sun_font_SunLayoutEngine_shape (in HBShaper.c).

However, when the same strings are broken into lines with
LineBreakMeasurer, the ZWNJ chars are actually presumed to have non-zero
advances. As a result, less text is allocated to each line than is actually
possible to display, since the LineBreakMeasurer mistakenly thinks that the
ZWNJ characters need space to be rendered. The root cause seems to be that
the StandardGlyphVector created internally for the LineBreakMeasurer is
initialized in such a way that glyph IDs are coming from HarfBuzz, but
HarfBuzz is providing the glyph ID for the space character (U+0020, glyph
ID 3 in Tahoma) instead of the glyph ID for the ZWNJ character (glyph ID
744 in Tahoma). This means that later when we look up the glyph metrics (to
retrieve the glyph advance), we are actually getting the space (U+0020)
glyph metrics (hence the non-zero advance).
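
To make the effect on line fitting concrete, here is a minimal sketch that uses the two advance arrays from the console output at the end of this message. The greedy `glyphsThatFit` helper and the wrap width of 80 are hypothetical and chosen only for illustration. This is not the algorithm LineBreakMeasurer actually uses, just the arithmetic behind "less text is allocated to each line".

```java
public class GreedyFitSketch {

    // Hypothetical helper: counts how many leading glyphs fit within
    // wrapWidth when their advances are summed from left to right.
    public static int glyphsThatFit(float[] advances, float wrapWidth) {
        float used = 0f;
        int count = 0;
        for (float advance : advances) {
            if (used + advance > wrapWidth) {
                break; // the next glyph would overflow the line
            }
            used += advance;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Advances as reported via createGlyphVector (ZWNJ advance is 0).
        float[] zeroWidthZwnj = {26.245117f, 0.0f, 27.636719f, 0.0f, 23.07129f};
        // Advances as reported via layoutGlyphVector (ZWNJ gets the space advance).
        float[] spaceWidthZwnj = {26.245117f, 15.625f, 27.636719f, 15.625f, 23.07129f};
        float wrapWidth = 80f; // arbitrary width, assumed for this example

        System.out.println(glyphsThatFit(zeroWidthZwnj, wrapWidth));  // prints 5
        System.out.println(glyphsThatFit(spaceWidthZwnj, wrapWidth)); // prints 3
    }
}
```

With the zero advances all five glyphs fit in the assumed width. With the space advances only three do, even though the text would render identically in both cases.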

I'm not very familiar with HarfBuzz, but it sounds like this U+0020
substitution is something that is done for "invisible glyphs" (
https://harfbuzz.github.io/setting-buffer-properties.html,
https://harfbuzz.github.io/harfbuzz-hb-buffer.html#hb-buffer-set-invisible-glyph).
These "invisible glyphs" are identified by
_hb_glyph_info_is_default_ignorable (
https://github.com/harfbuzz/harfbuzz/blob/3d48bfc18731e3c2187a5b0666a7e94dcab0150b/src/hb-ot-layout.hh#L320)
and seem to be the "Default_Ignorable_Code_Point" code points (
https://unicode.org/reports/tr44/#Default_Ignorable_Code_Point). When this
substitution is performed, not only is the glyph replaced, but the advances
for that glyph instance are also zeroed out (
https://github.com/harfbuzz/harfbuzz/blob/368e9578873798e2d17ed78a0474dec7d4e9d6c0/src/hb-ot-shape.cc#L829
).
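
As a rough illustration of which characters are affected, the sketch below checks code points against a hand-copied subset of the Default_Ignorable_Code_Point ranges. The class name, the `isDefaultIgnorable` helper and the choice of ranges are assumptions made for this example: the list is deliberately incomplete, and HarfBuzz uses its own modified version of the property, so this should not be read as a mirror of what HarfBuzz does.

```java
public class DefaultIgnorableSketch {

    // Partial, hand-copied subset of Default_Ignorable_Code_Point ranges
    // (inclusive). Not the full Unicode property and not HarfBuzz's list.
    private static final int[][] RANGES = {
        {0x00AD, 0x00AD},   // SOFT HYPHEN
        {0x034F, 0x034F},   // COMBINING GRAPHEME JOINER
        {0x061C, 0x061C},   // ARABIC LETTER MARK
        {0x200B, 0x200F},   // ZWSP, ZWNJ, ZWJ, LRM, RLM
        {0x202A, 0x202E},   // bidi embedding and override controls
        {0x2060, 0x206F},   // WORD JOINER and related format characters
        {0xFE00, 0xFE0F},   // variation selectors
        {0xFEFF, 0xFEFF},   // ZERO WIDTH NO-BREAK SPACE
        {0xE0000, 0xE0FFF}, // tags and variation selectors supplement
    };

    // Hypothetical helper: true if the code point falls in one of the ranges above.
    public static boolean isDefaultIgnorable(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "a\u200Cb\u200Cc";
        // Count the code points that would be treated as invisible.
        long ignorable = s.codePoints()
                .filter(DefaultIgnorableSketch::isDefaultIgnorable)
                .count();
        System.out.println(ignorable); // prints 2
    }
}
```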

Long story short, the glyph IDs returned by HarfBuzz are not always to be
trusted, especially if we want to later use them as a basis for looking up
glyph metrics. The code (and console output) below illustrates the issue by
creating two (Standard)GlyphVectors in two slightly different ways. The
first GV does not get its glyph IDs from HarfBuzz, so it is completely
correct. The second GV does get its glyph IDs from HarfBuzz, so while the
glyph positions match the first GV, the glyph metrics are incorrect.

Some options:
 1. Continue to use the glyph IDs provided by HarfBuzz, but massage them
afterwards:
     a. Look for space glyphs, check if they were actually space chars or
not, or
     b. Look for space glyphs, check if they contributed zero advance, or
     c. Look for Default_Ignorable chars (note HarfBuzz code contains a
comment "we have a modified Default_Ignorable"...), or
     d. Use hb_buffer_set_invisible_glyph to explicitly communicate
replaced glyphs back to the Java code
 2. Stop using HarfBuzz-provided glyph IDs completely, and use the
CharToGlyphMapper used by the (correct) Font.createGlyphVector(...) code
path
 3. Configure HarfBuzz to provide the untransformed glyph IDs (not sure if
it's possible, while still preventing the glyphs from displaying)
 4. Use the SGV.positions array (which is always correct) to calculate
advances... this might fix the LineBreakMeasurer use case, but SGV would
remain broken
 5. Use HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES; this might be an option,
though it would still result in SGVs with slightly different glyph ID
arrays, depending on how the SGV is created
 6. Something else?
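
For option 4, a minimal sketch of deriving advances from positions rather than from glyph metrics might look like the following. The `advancesFromPositions` helper is hypothetical. It assumes the array holds one (x, y) pair per glyph plus one trailing pair for the end position, which is the layout GlyphVector.getGlyphPositions(0, getNumGlyphs() + 1, null) returns. The trailing x value of 76.953125 in the example is also an assumption, taken as the last printed position plus the last printed advance.

```java
import java.util.Arrays;

public class PositionAdvanceSketch {

    // Hypothetical helper: derives horizontal advances from consecutive x
    // positions. Expects (x, y) pairs for every glyph plus one trailing
    // pair for the position after the last glyph.
    public static float[] advancesFromPositions(float[] positions) {
        if (positions.length < 2 || positions.length % 2 != 0) {
            throw new IllegalArgumentException("expected (x, y) pairs incl. end position");
        }
        int glyphs = positions.length / 2 - 1;
        float[] advances = new float[glyphs];
        for (int i = 0; i < glyphs; i++) {
            // Advance of glyph i is the distance to the next glyph's origin.
            advances[i] = positions[2 * (i + 1)] - positions[2 * i];
        }
        return advances;
    }

    public static void main(String[] args) {
        // Positions from the console output, with an assumed end position appended.
        float[] positions = {
            0.0f, 0.0f, 26.245117f, 0.0f, 26.245117f, 0.0f,
            53.881836f, 0.0f, 53.881836f, 0.0f, 76.953125f, 0.0f
        };
        // The two ZWNJ entries come out as exactly zero regardless of glyph ID.
        System.out.println(Arrays.toString(advancesFromPositions(positions)));
    }
}
```

The same position deltas could also serve option 1b, since a space glyph whose delta is zero was presumably a replaced invisible glyph. That inference only holds for simple horizontal layouts.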

Please let me know what you think. Does the analysis above have any gaps?
What should a fix look like? Happy to answer any questions, research any
gaps, or take a stab at a solution that seems promising to the group.
Option 5 seems most promising to me, assuming the removal does not prevent
behavior triggered by the removed character (i.e. ZWNJ still needs to
prevent ligatures even if it is removed), and assuming we are OK with
createGlyphVector() and layoutGlyphVector() returning slightly different
GVs (but at least internally consistent, and externally consistent from a
visual perspective).

Take care,

Daniel

---

import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.io.File;
import java.util.Arrays;

public class ZwnjGlyphVectorDemo {

    public static void main(String... args) throws Exception {

        // "abc" with a ZWNJ (U+200C) between each pair of letters
        String s = "a\u200Cb\u200Cc";
        FontRenderContext frc = new FontRenderContext(new AffineTransform(), true, true);
        Font tahoma = Font.createFont(Font.TRUETYPE_FONT,
                new File("C:/Windows/Fonts/tahoma.ttf")).deriveFont(50f);

        // Glyph IDs come from the font's char-to-glyph mapping, not from HarfBuzz
        GlyphVector gv1 = tahoma.createGlyphVector(frc, s);
        log(">>> font.createGlyphVector (GOOD)", gv1);

        // layoutGlyphVector() calls the same methods used internally by
        // LineBreakMeasurer -> TextMeasurer -> ExtendedTextSourceLabel
        GlyphVector gv2 = tahoma.layoutGlyphVector(frc, s.toCharArray(), 0, s.length(),
                Font.LAYOUT_LEFT_TO_RIGHT);
        log(">>> font.layoutGlyphVector (BAD)", gv2);
    }

    // Prints the positions, glyph IDs and per-glyph advances of a GlyphVector
    private static void log(String name, GlyphVector gv) {
        System.out.println(name);
        int glyphs = gv.getNumGlyphs();
        float[] positions = gv.getGlyphPositions(0, glyphs, null);
        System.out.println("positions: " + Arrays.toString(positions));
        int[] gids = gv.getGlyphCodes(0, glyphs, null);
        System.out.println("glyph IDs: " + Arrays.toString(gids));
        // Advances are looked up from the glyph metrics, keyed by glyph ID
        float[] advances = new float[glyphs];
        for (int i = 0; i < glyphs; i++) {
            advances[i] = gv.getGlyphMetrics(i).getAdvanceX();
        }
        System.out.println("advances: " + Arrays.toString(advances));
        System.out.println();
    }
}

>>> font.createGlyphVector (GOOD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 744, 69, 744, 70]
advances: [26.245117, 0.0, 27.636719, 0.0, 23.07129]

>>> font.layoutGlyphVector (BAD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 3, 69, 3, 70]
advances: [26.245117, 15.625, 27.636719, 15.625, 23.07129]


