8054203: add regression tests for JDK vs ICU layout

Fri Aug 15 20:33:07 UTC 2014

On 08/15/2014 11:41 AM, Doug Felt wrote:
> Thanks for getting the ball rolling, Steven!
Welcome!

> I'm not sure what the format is for code reviews, so I'll just send
> this email with general comments and maybe someone can tell me the
> right way to do it.  These comments are a bit more high level and most
> don't focus on this code in particular, anyway.
>
> 1) it would be nice if we used some more structured web-based tool to
> review the code. But I don't know how hard it is to set one up.
There's always rietveld :)
I was just trying to follow the openjdk playbook.

> 2) The tool captures the glyph vectors from TextLayout, but expects
> only one. If the goal is to test TextLayout, then it probably needs to
> handle any text and any TextLayout output.  If the goal is to test
> HarfBuzz, then constructing a GlyphVector (via layoutGlyphVector) is
> more direct.
OK, good point.  At this point, I am trying to test what Java sees, so
that we can verify that the JDK behavior won't change. For now I'm
assuming what's underneath TextLayout will be what is tested.

> 3) I suggest we decide up front to go with HarfBuzz's native
> glyph/ordering output.  ICU uses filler glyphs and tries to maintain a
> close relationship between each glyph and the original character(s) it
> corresponds to. In practice, this close relationship is not needed or
> used, and HarfBuzz does not provide it.  Instead, people are most
> interested in 'grapheme clusters' which are groups of glyphs that 1)
> might be positioned along a path as a group, and 2) might have
> tracking space added/subtracted between them (some folks do this
> manually, though it would be better to do it through styles).
> HarfBuzz provides this information more directly.
>
> This has consequences for regression/conformance tests that expect to
> match the glyph output and glyph to char index output.  Basically,
> they can't do it.  Even with the iculehb modifications that introduce
> filler glyphs to convert HarfBuzz's output to an approximation of
> ICU's, the numbering and positioning of the filler glyphs differs from
> ICU's.  So the tests still fail.
>
> Rather than try to change HarfBuzz to adopt ICU's output, I think we
> should prefer HarfBuzz's output and break exact compatibility w.r.t.
> filler glyphs and glyph-to-char mapping.
OK, that seems reasonable. However, I was trying (as a first pass) to
use the ICU compatibility library.  If, as you said, the conformance test
wouldn't be meaningful (besides validating this email), this particular
test might not be very helpful long term.

Short term, though, it seems like it could be helpful.
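
One way such a test could stay useful even when the filler glyphs differ
is to compare only the real glyphs and the cluster boundaries they map
to. Below is a minimal Java sketch of that idea. It assumes the filler
glyph ID is 0xFFFF and that both engines report the real glyphs in the
same order; neither is confirmed in this thread, and ClusterCompare and
its methods are hypothetical names, not part of the webrev.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical helper: compares layout output by cluster, ignoring filler glyphs. */
final class ClusterCompare {
    /** Assumed filler glyph ID; adjust to whatever the engine under test emits. */
    static final int FILLER = 0xFFFF;

    /** Returns the glyph IDs with filler entries removed. */
    static int[] stripFillers(int[] glyphs) {
        return Arrays.stream(glyphs).filter(g -> g != FILLER).toArray();
    }

    /** Returns the sorted, distinct char indices that non-filler glyphs map to. */
    static int[] clusterStarts(int[] glyphs, int[] glyphToChar) {
        return IntStream.range(0, glyphs.length)
                .filter(i -> glyphs[i] != FILLER)
                .map(i -> glyphToChar[i])
                .distinct()
                .sorted()
                .toArray();
    }

    /** True if both outputs agree on the real glyphs and on the cluster boundaries. */
    static boolean sameClusters(int[] glyphsA, int[] charsA, int[] glyphsB, int[] charsB) {
        return Arrays.equals(stripFillers(glyphsA), stripFillers(glyphsB))
                && Arrays.equals(clusterStarts(glyphsA, charsA), clusterStarts(glyphsB, charsB));
    }
}
```

With this, an ICU-style result that carries a filler for the second half
of a ligature and a HarfBuzz-style result that simply omits it would
compare as equal, as long as the remaining glyphs and their char indices
agree.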

> 4) HarfBuzz uses FreeType to get kerning values, while ICU uses
> kerning values directly from the kerning table in the font.  FreeType
> applies heuristics to adjust the kerning values for smaller point
> sizes (like, under 25 pt), and rounds the scaled kerning values to
> design units (I think, might be an option). This means ICU and
> HarfBuzz kern differently, and this changes the advances. This makes
> it difficult to use images as a regression tool.
> I think it will be difficult to get full fidelity to the glyph
> positions. I expect, since most clients (on Linux) use FreeType
> kerning values directly, that we might be better off just going with
> FreeType's kerning values. But we probably want to see what other
> platforms do.
Yes, and as you of all people should know :), one of the places where
ICU and ICUJDK diverge is in the kerning table management.

Perhaps it is a good case for turning kerning *off* for some types of
tests, and using it with a lot of fuzzing when it is on?
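
By "fuzzing" I mean something like a per-glyph tolerance when comparing
positions. A minimal Java sketch, assuming the test already has the
expected and actual glyph positions as float arrays; FuzzyPositions is a
hypothetical name and the tolerance value is something we would have to
pick empirically:

```java
/** Hypothetical helper: compares glyph positions with a tolerance instead of exactly. */
final class FuzzyPositions {
    /** True if the arrays have equal length and every pair differs by at most tolerance. */
    static boolean withinTolerance(float[] expected, float[] actual, float tolerance) {
        if (expected.length != actual.length) {
            return false; // a different glyph count is never a "fuzzy" match
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```

Tests with kerning off could call this with a tight tolerance, and tests
with kerning on could use a looser one.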

> 5) HarfBuzz does its computations in integer device units, with
> rounding to 16.16 or 24.8 or 26.6 values (though iculehb does some in
> floating point). ICU makes more use of native float units.  I've not
> been able to track down what exactly happens, but it does seem that
> advances might differ between ICU and HB even if kerning is not
> applied. The main place I've seen suggestions of this is with scaling
> based on common fractions (e.g. 1/10, etc.); native float units can
> represent common fractions much better than fixed point power-of-two
> units can, and small differences can accumulate over the course of a
> line of text. Occasionally this trips over a pixel and glyph images
> change.
>
> So I guess I think we need to first figure out what degree of
> compatibility is achievable, and what we want, and then design our
> regression/metrics tests around that.

OK. 
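
To make the accumulation effect concrete, here is a small Java sketch of
the arithmetic only. It is not HarfBuzz or ICU code: it assumes each
advance is rounded to 26.6 fixed point (units of 1/64 pixel) before
summing, and compares that with a plain float sum. FixedPointDrift and
its methods are hypothetical names.

```java
/** Hypothetical illustration of drift between fixed-point and float advance sums. */
final class FixedPointDrift {
    /** Rounds a pixel value to 26.6 fixed point (units of 1/64 pixel). */
    static int toFixed26_6(double pixels) {
        return (int) Math.round(pixels * 64.0);
    }

    /** Sums n copies of an advance after rounding each one to 26.6. */
    static double fixedSum(double advance, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += toFixed26_6(advance);
        }
        return total / 64.0; // convert back to pixels
    }

    /** Sums n copies of an advance in single-precision float. */
    static float floatSum(float advance, int n) {
        float total = 0f;
        for (int i = 0; i < n; i++) {
            total += advance;
        }
        return total;
    }
}
```

For example, an advance of 7.3 pixels rounds to 467/64, so 100 such
glyphs sum to 729.6875 pixels in fixed point versus roughly 730 in
float, which is enough of a gap to move a glyph across a pixel boundary.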

Maybe I should rephrase this particular ticket - it is for very basic
compatibility: first to verify whether embedded-ICU and external-ICU are
compatible, and then to compare embedded-ICU with
external-ICU-really-HarfBuzz.

Thanks for the feedback!   I think we should start capturing what you've
mentioned somewhere as ICU vs HarfBuzz behavior.  It's probably of
general interest beyond the JDK usage as well.

>
>
>
> On Thu, Aug 14, 2014 at 6:36 PM, Steven R. Loomis
> <steven.loomis at oracle.com> wrote:
>
>     I have posted some code for review here:
>
>         http://cr.openjdk.java.net/~srl/8054203/webrev.00/
>
>     (testing out the process)
>     Steven
>
>     On 08/01/2014 10:38 PM, Steven R. Loomis wrote:
>     > https://bugs.openjdk.java.net/browse/JDK-8054203
>     >
>     > (Phil, others - I didn't see a HarfBuzz component, so I hope this is
>     > right- I created a label "harfbuzz")
>     >   subcomponent 2d,  label harfbuzz
>     >
>     > Anyways, I have code for this already that I used when doing other
>     > fixes to the layout code.
>     > I'd like to take this one.
>     >
>     > Note some interesting things:
>     >
>     > * the generator for the data lives in ICU right now.  TBD: document
>     > how to create it
>     >
>     > * It's font dependent.  We should create it not just against
>     > "interesting" fonts that devs won't have access to, but also against
>     > fonts that either ship with JDK and/or are easily available (Google
>     > Noto comes to mind).
>     >
>     >
>     >
>
>
>
>
> -- 
>
> Doug Felt | Software Engineer | dougfelt at google.com | 1-650-253-2089
>
>