From djgredler at gmail.com  Mon Aug  9 04:23:00 2021
From: djgredler at gmail.com (Daniel Gredler)
Date: Mon, 9 Aug 2021 00:23:00 -0400
Subject: JDK-8270265: LineBreakMeasurer calculates incorrect line breaks with
 zero-width characters
Message-ID: <CAPA+ug5OG-gX1Wh8g3J6CP8UjzpOXCC-yzmt4G1jb1z-B1YQvg@mail.gmail.com>

Hi all,

I've taken the OpenJDK plunge and have started to investigate JDK-8270265 (
https://bugs.openjdk.java.net/browse/JDK-8270265). However, I'm very new to
the codebase, so I'm looking for some advice and direction. What I've found
so far:

Strings containing zero-width non-joiner (ZWNJ, U+200C) characters draw
correctly to a Graphics2D -- that is, the ZWNJ chars do not draw at all,
even if the font being used contains a glyph for the ZWNJ character
(Tahoma, for example, contains glyph 744 for this character, with
advanceWidth=0 in the hmtx table). Presumably this is handled by HarfBuzz
via Java_sun_font_SunLayoutEngine_shape (in HBShaper.c).

However, when the same strings are broken into lines with
LineBreakMeasurer, the ZWNJ chars are actually presumed to have non-zero
advances. As a result, less text is allocated to each line than is actually
possible to display, since the LineBreakMeasurer mistakenly thinks that the
ZWNJ characters need space to be rendered. The root cause seems to be that
the StandardGlyphVector created internally for the LineBreakMeasurer is
initialized in such a way that glyph IDs are coming from HarfBuzz, but
HarfBuzz is providing the glyph ID for the space character (U+0020, glyph
ID 3 in Tahoma) instead of the glyph ID for the ZWNJ character (glyph ID
744 in Tahoma). This means that later when we look up the glyph metrics (to
retrieve the glyph advance), we are actually getting the space (U+0020)
glyph metrics (hence the non-zero advance).
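
As a rough illustration of how this substitution could be detected after the
fact, here is a minimal sketch. The class and method names are made up for
this example and are not part of the JDK; it works on plain arrays, which for
a real GlyphVector would come from getGlyphCodes(...) and
getGlyphCharIndices(...). It assumes the font's space glyph ID is known (3 for
Tahoma in my tests).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK: flags glyphs that carry the
// font's space glyph ID although the source character is not U+0020.
final class SubstitutedGlyphFinder {

    // glyphIds[i] is the glyph ID of glyph i; charIndices[i] is the index in
    // text of the character that glyph i was produced from.
    static List<Integer> find(int[] glyphIds, int[] charIndices,
                              String text, int spaceGlyphId) {
        List<Integer> suspicious = new ArrayList<>();
        for (int i = 0; i < glyphIds.length; i++) {
            boolean isSpaceGlyph = glyphIds[i] == spaceGlyphId;
            boolean isSpaceChar = text.charAt(charIndices[i]) == ' ';
            if (isSpaceGlyph && !isSpaceChar) {
                suspicious.add(i); // space glyph standing in for another char
            }
        }
        return suspicious;
    }
}
```

One caveat with this kind of check: a font may legitimately map characters
other than U+0020 to its space glyph, so a real fix would probably need to be
more careful than this.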

I'm not very familiar with HarfBuzz, but it sounds like this U+0020
substitution is something that is done for "invisible glyphs" (
https://harfbuzz.github.io/setting-buffer-properties.html,
https://harfbuzz.github.io/harfbuzz-hb-buffer.html#hb-buffer-set-invisible-glyph).
These "invisible glyphs" are identified by
_hb_glyph_info_is_default_ignorable (
https://github.com/harfbuzz/harfbuzz/blob/3d48bfc18731e3c2187a5b0666a7e94dcab0150b/src/hb-ot-layout.hh#L320)
and seem to be the "Default_Ignorable_Code_Point" code points (
https://unicode.org/reports/tr44/#Default_Ignorable_Code_Point). When this
substitution is performed, not only is the glyph replaced, but the advances
for that glyph instance are also zeroed out (
https://github.com/harfbuzz/harfbuzz/blob/368e9578873798e2d17ed78a0474dec7d4e9d6c0/src/hb-ot-shape.cc#L829
).

Long story short, the glyph IDs returned by HarfBuzz are not always to be
trusted, especially if we want to later use them as a basis for looking up
glyph metrics. The code (and console output) below illustrates the issue by
creating two (Standard)GlyphVectors in two slightly different ways. The
first GV does not get the glyph IDs from HarfBuzz, so is completely
correct. The second GV does get the glyph IDs from HarfBuzz, so while the
glyph positions match the first GV, the glyph metrics are incorrect.

Some options:
 1. Continue to use the glyph IDs provided by HarfBuzz, but massage them
afterwards:
     a. Look for space glyphs, check if they were actually space chars or
not, or
     b. Look for space glyphs, check if they contributed zero advance, or
     c. Look for Default_Ignorable chars (note HarfBuzz code contains a
comment "we have a modified Default_Ignorable"...), or
     d. Use hb_buffer_set_invisible_glyph to explicitly communicate
replaced glyphs back to the Java code
 2. Stop using HarfBuzz-provided glyph IDs completely, and use the
CharToGlyphMapper used by the (correct) Font.createGlyphVector(...) code
path
 3. Configure HarfBuzz to provide the untransformed glyph IDs (not sure if
it's possible, while still preventing the glyphs from displaying)
 4. Use the SGV.positions array (which is always correct) to calculate
advances... this might fix the LineBreakMeasurer use case, but SGV would
remain broken
 5. Using the HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES might be an option,
though it would still result in SGVs with slightly different glyph ID
arrays, depending on how the SGV is created
 6. Something else?
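
To make option 4 a bit more concrete, here is a minimal sketch of deriving
advances from the positions array instead of from the glyph metrics. The
class and method names are hypothetical, and it assumes simple horizontal,
left-to-right text; other layouts would need more care.

```java
// Hypothetical helper, not part of the JDK.
final class PositionAdvances {

    // positions holds (x, y) pairs, in the layout returned by
    // GlyphVector.getGlyphPositions. With n pairs this yields n - 1
    // horizontal advances; requesting numGlyphs + 1 entries (the extra one
    // being the position after the last glyph) gives one advance per glyph.
    static float[] fromPositions(float[] positions) {
        int pairs = positions.length / 2;
        float[] advances = new float[Math.max(0, pairs - 1)];
        for (int i = 0; i < advances.length; i++) {
            // advance of glyph i = x of the next position minus x of glyph i
            advances[i] = positions[2 * (i + 1)] - positions[2 * i];
        }
        return advances;
    }
}
```

Fed with the positions array from the console output at the end of this
message, this gives a zero advance for both ZWNJ glyphs, regardless of which
glyph IDs the vector holds.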

Please let me know what you think. Does the analysis above have any gaps?
What should a fix look like? Happy to answer any questions, research any
gaps, or take a stab at a solution that seems promising to the group.
Option 5 seems most promising to me, assuming the removal does not prevent
behavior triggered by the removed character (i.e. ZWNJ still needs to
prevent ligatures even if it is removed), and assuming we are OK with
createGlyphVector() and layoutGlyphVector() returning slightly different
GVs (but at least internally consistent, and externally consistent from a
visual perspective).

Take care,

Daniel

---

import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.io.File;
import java.util.Arrays;

// The class name is arbitrary; any name works.
public class ZwnjGlyphVectorTest {

    public static void main(String... args) throws Exception {

        // "a", ZWNJ, "b", ZWNJ, "c"
        String s = "a\u200Cb\u200Cc";
        char[] chars = s.toCharArray();
        FontRenderContext frc =
            new FontRenderContext(new AffineTransform(), true, true);
        Font tahoma = Font.createFont(Font.TRUETYPE_FONT,
            new File("C:/Windows/Fonts/tahoma.ttf")).deriveFont(50f);

        GlyphVector gv1 = tahoma.createGlyphVector(frc, s);
        log(">>> font.createGlyphVector (GOOD)", gv1);

        // layoutGlyphVector() calls the same methods used internally by
        // LineBreakMeasurer -> TextMeasurer -> ExtendedTextSourceLabel
        GlyphVector gv2 = tahoma.layoutGlyphVector(frc, chars, 0,
            chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        log(">>> font.layoutGlyphVector (BAD)", gv2);
    }

    // Prints the positions, glyph IDs and per-glyph advances of a GlyphVector.
    private static void log(String name, GlyphVector gv) {
        System.out.println(name);
        int glyphs = gv.getNumGlyphs();
        float[] positions = gv.getGlyphPositions(0, glyphs, null);
        System.out.println("positions: " + Arrays.toString(positions));
        int[] gids = gv.getGlyphCodes(0, glyphs, null);
        System.out.println("glyph IDs: " + Arrays.toString(gids));
        float[] advances = new float[glyphs];
        for (int i = 0; i < glyphs; i++) {
            // Advance as reported by the glyph metrics, not by the positions
            advances[i] = gv.getGlyphMetrics(i).getAdvanceX();
        }
        System.out.println("advances: " + Arrays.toString(advances));
        System.out.println();
    }
}

>>> font.createGlyphVector (GOOD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 744, 69, 744, 70]
advances: [26.245117, 0.0, 27.636719, 0.0, 23.07129]

>>> font.layoutGlyphVector (BAD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 3, 69, 3, 70]
advances: [26.245117, 15.625, 27.636719, 15.625, 23.07129]


From dmitry.batrak at jetbrains.com  Tue Aug 17 11:18:30 2021
From: dmitry.batrak at jetbrains.com (Dmitry Batrak)
Date: Tue, 17 Aug 2021 14:18:30 +0300
Subject: Merging event dispatch and toolkit threads
Message-ID: <CAET5FPvrKimt+ynGHuYCY3xNm=Yg4Tvke7Nz3sJoRhizLNL2WQ@mail.gmail.com>

Hello,

Now that applets are deprecated for removal, what do you think about moving
towards the unification of the event dispatch and toolkit threads in AWT? In
my understanding, the separation between them was meaningful only for the
applet use case, and now it has become a pure nuisance. E.g. the
accessibility support implementation on macOS requires 'invokeAndWait' calls
between the two threads in both directions, and achieving this without
deadlocks is quite tricky in the current implementation. Also, windowing APIs
on macOS and Windows are mostly synchronous, and using them in a synchronous
way (without passing execution between threads) could potentially simplify
the focus subsystem implementation. Benefits for X11/Wayland systems would be
less significant, due to the fundamentally asynchronous native APIs, but even
in those cases, having both the sending of outgoing requests and the
processing of incoming events on the same thread should make AWT code more
reliable, as we'll have one less place for potential races.

Overall, having only one thread to deal with native windowing APIs should
make supporting and extending AWT simpler, and closer to native application
development. I think some synchronization constructs would still need to be
present, to guard against the invocation of AWT methods from background
threads. But, at least for 'well-behaving' apps, reasoning about execution
order would be much easier.

Of course, for the existing toolkit implementations (macOS/Windows/X11) the
transition will take some time and effort, but for the Wayland
implementation, to be developed within the Wakefield project, the idea could
be implemented from the start. AFAIU, with the Wayland client API, we'll need
to dispatch the incoming queue of events explicitly anyway, and this
dispatching could just as well be done on the EDT.

So, what do you think about the idea in general, and about starting to
realize it in the Wayland toolkit implementation?

Best regards,
Dmitry Batrak