From: djgredler at gmail.com (Daniel Gredler)
Date: Mon, 9 Aug 2021 00:23:00 -0400
Subject: JDK-8270265: LineBreakMeasurer calculates incorrect line breaks with zero-width characters

Hi all,

I've taken the OpenJDK plunge and have started to investigate JDK-8270265 (https://bugs.openjdk.java.net/browse/JDK-8270265). However, I'm very new to the codebase, so I'm looking for some advice and direction.

What I've found so far:

Strings containing zero-width non-joiner (ZWNJ, U+200C) characters draw correctly to a Graphics2D -- that is, the ZWNJ chars are not drawn at all, even if the font being used contains a glyph for the ZWNJ character (Tahoma, for example, contains glyph 744 for this character, with advanceWidth=0 in the hmtx table). Presumably this is handled by HarfBuzz via Java_sun_font_SunLayoutEngine_shape (in HBShaper.c).

However, when the same strings are broken into lines with LineBreakMeasurer, the ZWNJ chars are presumed to have non-zero advances. As a result, less text is allocated to each line than could actually be displayed, because the LineBreakMeasurer mistakenly thinks that the ZWNJ characters need space to be rendered.

The root cause seems to be that the StandardGlyphVector created internally for the LineBreakMeasurer is initialized in such a way that its glyph IDs come from HarfBuzz, but HarfBuzz provides the glyph ID for the space character (U+0020, glyph ID 3 in Tahoma) instead of the glyph ID for the ZWNJ character (glyph ID 744 in Tahoma). This means that when we later look up the glyph metrics (to retrieve the glyph advance), we actually get the metrics of the space (U+0020) glyph, hence the non-zero advance.
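To make the mismatch concrete, here is a small, self-contained simulation that does not touch the real font code. The glyph IDs and 50pt advances are copied from the Tahoma console output in this message; the class name, the two lookup tables, and both helper methods are hypothetical stand-ins for the hmtx/cmap lookups, not JDK APIs. It also sketches one possible repair along the lines of option 1a listed later in this message: a space glyph whose source character is not U+0020 is re-mapped through the cmap.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical simulation (not JDK code) of looking up advances by glyph ID.
final class ZwnjAdvanceSketch {
    // Advances at 50pt, copied from the Tahoma output in this message.
    static final Map<Integer, Float> ADVANCE_BY_GLYPH = Map.of(
            68, 26.245117f,   // 'a'
            69, 27.636719f,   // 'b'
            70, 23.07129f,    // 'c'
            744, 0.0f,        // U+200C ZWNJ
            3, 15.625f);      // U+0020 space

    // Tiny stand-in for the font's cmap, covering only the characters used here.
    static final Map<Character, Integer> CMAP = Map.of(
            'a', 68, 'b', 69, 'c', 70, '\u200C', 744, ' ', 3);

    static final int SPACE_GLYPH = 3;

    // Advances derived from glyph IDs alone, as a glyph metrics lookup does.
    static float[] advancesFromGlyphIds(int[] glyphIds) {
        float[] advances = new float[glyphIds.length];
        for (int i = 0; i < glyphIds.length; i++) {
            advances[i] = ADVANCE_BY_GLYPH.get(glyphIds[i]);
        }
        return advances;
    }

    // Option 1a-style repair: a space glyph whose source char is not U+0020
    // is re-mapped through the cmap. charIndices maps each glyph to its char.
    static int[] repairSpaceSubstitutions(int[] glyphIds, int[] charIndices, String text) {
        int[] repaired = glyphIds.clone();
        for (int i = 0; i < repaired.length; i++) {
            char c = text.charAt(charIndices[i]);
            if (repaired[i] == SPACE_GLYPH && c != ' ') {
                // Characters missing from this tiny table keep the space glyph.
                repaired[i] = CMAP.getOrDefault(c, SPACE_GLYPH);
            }
        }
        return repaired;
    }

    public static void main(String[] args) {
        String text = "a\u200Cb\u200Cc";
        int[] shaped = {68, 3, 69, 3, 70};   // glyph IDs as reported via HarfBuzz
        int[] charIndices = {0, 1, 2, 3, 4}; // one glyph per char in this example
        int[] repaired = repairSpaceSubstitutions(shaped, charIndices, text);
        System.out.println("shaped:   " + Arrays.toString(shaped)
                + " -> " + Arrays.toString(advancesFromGlyphIds(shaped)));
        System.out.println("repaired: " + Arrays.toString(repaired)
                + " -> " + Arrays.toString(advancesFromGlyphIds(repaired)));
    }
}
```

The sketch assumes a one-to-one glyph/char mapping, as in this example; real shaping output can merge or reorder glyphs, which is why the char-index array is passed in explicitly rather than assumed.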
I'm not very familiar with HarfBuzz, but it sounds like this U+0020 substitution is something that is done for "invisible glyphs" (https://harfbuzz.github.io/setting-buffer-properties.html, https://harfbuzz.github.io/harfbuzz-hb-buffer.html#hb-buffer-set-invisible-glyph). These "invisible glyphs" are identified by _hb_glyph_info_is_default_ignorable (https://github.com/harfbuzz/harfbuzz/blob/3d48bfc18731e3c2187a5b0666a7e94dcab0150b/src/hb-ot-layout.hh#L320) and seem to be the "Default_Ignorable_Code_Point" code points (https://unicode.org/reports/tr44/#Default_Ignorable_Code_Point). When this substitution is performed, not only is the glyph replaced, but the advances for that glyph instance are also zeroed out (https://github.com/harfbuzz/harfbuzz/blob/368e9578873798e2d17ed78a0474dec7d4e9d6c0/src/hb-ot-shape.cc#L829).

Long story short, the glyph IDs returned by HarfBuzz are not always to be trusted, especially if we later want to use them as a basis for looking up glyph metrics.

The code (and console output) below illustrates the issue by creating two (Standard)GlyphVectors in two slightly different ways. The first GV does not get its glyph IDs from HarfBuzz, so it is completely correct. The second GV does get its glyph IDs from HarfBuzz, so while its glyph positions match those of the first GV, its glyph metrics are incorrect.

Some options:

1. Continue to use the glyph IDs provided by HarfBuzz, but massage them afterwards:
   a. Look for space glyphs, and check whether they were actually space chars, or
   b. Look for space glyphs, and check whether they contributed zero advance, or
   c. Look for Default_Ignorable chars (note that the HarfBuzz code contains a comment "we have a modified Default_Ignorable"...), or
   d. Use hb_buffer_set_invisible_glyph to explicitly communicate replaced glyphs back to the Java code
2. Stop using HarfBuzz-provided glyph IDs completely, and use the CharToGlyphMapper used by the (correct) Font.createGlyphVector(...) code path
3. Configure HarfBuzz to provide the untransformed glyph IDs (not sure whether this is possible while still preventing the glyphs from being displayed)
4. Use the SGV.positions array (which is always correct) to calculate advances... this might fix the LineBreakMeasurer use case, but SGV would remain broken
5. Use the HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES buffer flag; this might be an option, though it would still result in SGVs with slightly different glyph ID arrays, depending on how the SGV is created
6. Something else?

Please let me know what you think. Does the analysis above have any gaps? What should a fix look like? Happy to answer any questions, research any gaps, or take a stab at a solution that seems promising to the group.

Option 5 seems most promising to me, assuming that the removal does not prevent behavior triggered by the removed character (i.e. ZWNJ still needs to prevent ligatures even if it is removed), and assuming we are OK with createGlyphVector() and layoutGlyphVector() returning slightly different GVs (but at least internally consistent, and visually consistent externally).

Take care,
Daniel

---

import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.io.File;
import java.util.Arrays;

// Arbitrary wrapper class name so the example compiles on its own.
public class ZwnjGlyphVectorDemo {

    public static void main(String... args) throws Exception {
        String s = "a\u200Cb\u200Cc";
        FontRenderContext frc = new FontRenderContext(new AffineTransform(), true, true);
        Font tahoma = Font.createFont(Font.TRUETYPE_FONT, new File("C:/Windows/Fonts/tahoma.ttf")).deriveFont(50f);

        // Glyph IDs come from the char-to-glyph mapping, not from HarfBuzz.
        GlyphVector gv1 = tahoma.createGlyphVector(frc, s);
        log(">>> font.createGlyphVector (GOOD)", gv1);

        // layoutGlyphVector() calls the same methods used internally by
        // LineBreakMeasurer -> TextMeasurer -> ExtendedTextSourceLabel
        GlyphVector gv2 = tahoma.layoutGlyphVector(frc, s.toCharArray(), 0, s.length(), Font.LAYOUT_LEFT_TO_RIGHT);
        log(">>> font.layoutGlyphVector (BAD)", gv2);
    }

    private static void log(String name, GlyphVector gv) {
        System.out.println(name);
        int glyphs = gv.getNumGlyphs();
        float[] positions = gv.getGlyphPositions(0, glyphs, null);
        System.out.println("positions: " + Arrays.toString(positions));
        int[] gids = gv.getGlyphCodes(0, glyphs, null);
        System.out.println("glyph IDs: " + Arrays.toString(gids));
        float[] advances = new float[glyphs];
        for (int i = 0; i < glyphs; i++) {
            // Metrics are looked up by glyph ID, so a wrong ID gives a wrong advance.
            advances[i] = gv.getGlyphMetrics(i).getAdvanceX();
        }
        System.out.println("advances: " + Arrays.toString(advances));
        System.out.println();
    }
}

Console output:

>>> font.createGlyphVector (GOOD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 744, 69, 744, 70]
advances: [26.245117, 0.0, 27.636719, 0.0, 23.07129]

>>> font.layoutGlyphVector (BAD)
positions: [0.0, 0.0, 26.245117, 0.0, 26.245117, 0.0, 53.881836, 0.0, 53.881836, 0.0]
glyph IDs: [68, 3, 69, 3, 70]
advances: [26.245117, 15.625, 27.636719, 15.625, 23.07129]

From: dmitry.batrak at jetbrains.com (Dmitry Batrak)
Date: Tue, 17 Aug 2021 14:18:30 +0300
Subject: Merging event dispatch and toolkit threads

Hello,

Now that applets are deprecated for removal, what do you think about moving towards unifying the event dispatch and toolkit threads in AWT? In my understanding, the
separation between them was meaningful only for the applet use case, and now it has become a pure nuisance. For example, the accessibility support implementation on macOS requires 'invokeAndWait' calls between the two threads in both directions, and achieving this without deadlocks is quite tricky in the current implementation. Also, the windowing APIs on macOS and Windows are mostly synchronous, and using them synchronously (without passing execution between threads) could potentially simplify the focus subsystem implementation. The benefits for X11/Wayland systems would be less significant, due to their fundamentally asynchronous native APIs, but even in those cases, sending outgoing requests and processing incoming events on the same thread should make the AWT code more reliable, as there would be one less place for potential races.

Overall, having only one thread deal with the native windowing APIs should make supporting and extending AWT simpler, and closer to native application development. I think some synchronization constructs would still need to be present, to guard against the invocation of AWT methods from background threads. But, at least for 'well-behaved' apps, reasoning about execution order would be much easier.

Of course, for the existing toolkit implementations (macOS/Windows/X11) the transition will take some time and effort, but for the Wayland implementation, to be developed within the Wakefield project, the idea could be implemented from the start. AFAIU, with the Wayland client API, we'll need to dispatch the incoming queue of events explicitly anyway, and this dispatching could just as well be done on the EDT.

So, what do you think about the idea in general, and about starting to realize it in the Wayland toolkit implementation?

Best regards,
Dmitry Batrak
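The deadlock argument can be modelled in a few lines of plain Java. The sketch below is hypothetical and is not AWT code: SingleUiThread and its methods are invented names that only loosely echo EventQueue.invokeLater/invokeAndWait. It assumes that native events and application tasks are all dispatched on one executor thread, and shows that a nested synchronous call made from that thread simply runs inline, whereas cross-thread 'invokeAndWait' calls in both directions are where deadlocks can arise.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, not AWT code: a single thread both dispatches
// (simulated) native events and runs application tasks.
final class SingleUiThread {
    private final AtomicReference<Thread> uiThread = new AtomicReference<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "ui-thread");
        uiThread.set(t); // remember which thread is the UI thread
        return t;
    });

    boolean isUiThread() {
        return Thread.currentThread() == uiThread.get();
    }

    // Queue a task, e.g. the handler for an incoming native event.
    void invokeLater(Runnable task) {
        executor.execute(task);
    }

    // Run a task on the UI thread and wait for it. On the UI thread itself the
    // task runs inline, so there is no second thread to wait on and no deadlock.
    void invokeAndWait(Runnable task) throws InterruptedException, ExecutionException {
        if (isUiThread()) {
            task.run();
            return;
        }
        Future<?> result = executor.submit(task);
        result.get();
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SingleUiThread ui = new SingleUiThread();
        StringBuilder log = new StringBuilder();
        // A simulated native event whose handler makes a nested synchronous
        // call, as an accessibility request might.
        ui.invokeAndWait(() -> {
            log.append("native-event;");
            try {
                ui.invokeAndWait(() -> log.append("nested-call;"));
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        });
        ui.shutdown();
        System.out.println(log); // native-event;nested-call;
    }
}
```

Without the isUiThread() check, the nested call would block forever on this single-thread executor, because the inner task would be queued behind the outer one that is waiting for it.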