[OpenJDK 2D-Dev] Kerning and Ligatures using Layout Engine

Phil Race Phil.Race at Sun.COM
Tue Jun 24 23:50:54 UTC 2008


I can take a stab at this but it probably needs the ICU developer to confirm
what's happening there.

Keith Stribley wrote:
> I am interested in getting clig,liga,mark,mkmk,kern OpenType tables to

Those are not tables. Those are features in the OpenType GSUB and GPOS tables.

> be processed by the OpenJDK layout engine for the Myanmar code block.
> Currently Unicode 5.1 Myanmar fonts cannot be used with Java AWT/Swing.
> 
> I noticed that the layout engine code in OpenJDK is essentially an old
> version of the ICU layout engine and ICU is capable of rendering Myanmar
> Unicode 5.1 compliant fonts such as Myanmar3 and Padauk correctly.

FYI: it was "current" at the point in JDK 6 development when it was integrated.
JDK 7 will get an updated version in due course.

> 
> The first step was to make sun.font.FontManager.isComplexCharCode()
> return true for the Myanmar range. However, I then needed to modify the
> sun.font.GlyphLayout.EngineRecord. This has an eflags field which is
> passed to ICU.
> I'm not quite sure why 0x4 is used as the value when there are marks, I
> believe it corresponds to "no canonical processing", though I don't know
> why that is needed. 

I think you have this backwards. 0x4 means "do canonical processing",
and it's set only when needed, for performance: i.e. if it's not set then
we can skip a lot of work. I don't recall (at all) how much that was but I
suspect it was significant.
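
To illustrate the idea, here is a minimal sketch (not the actual
GlyphLayout code) of turning on the canonical processing flag only when
the text contains a mark. The class and method names are made up for
illustration; the 0x4 value and the three general categories are the
ones mentioned in this thread.

```java
// Hypothetical sketch: only request canonical processing when a mark is present.
final class MarkScanSketch {
    // Value as discussed in this thread; the name is an assumption.
    static final int CANONICAL_PROCESSING = 0x4;

    /** Returns CANONICAL_PROCESSING if text[start..limit) contains a mark, else 0. */
    static int flagsFor(char[] text, int start, int limit) {
        int i = start;
        while (i < limit) {
            // Handles surrogate pairs; never reads past limit.
            int ch = Character.codePointAt(text, i, limit);
            i += Character.charCount(ch);
            int gc = Character.getType(ch);
            if (gc == Character.NON_SPACING_MARK
                    || gc == Character.ENCLOSING_MARK
                    || gc == Character.COMBINING_SPACING_MARK) {
                return CANONICAL_PROCESSING; // one mark is enough, stop scanning
            }
        }
        return 0; // no marks: layout can skip the extra work
    }
}
```

Plain Latin text without combining marks yields 0 here, so the cheaper
path is taken for the common case.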

> More seriously, this does not trigger ICU kerning or
> ligatures.
> this.eflags needs to be set to 0x3 for this. 1=kerning, 2=ligatures (see
> http://www.icu-project.org/apiref/icu4c/classLayoutEngine.html#cee4ea27f3211be215ea9b9bd3a91c32)
> 

No, I believe that comes from _typo_flags.

> My question is therefore, why aren't kerning and ligatures turned on, at
> least for complex scripts. I've noticed that with Latin text that if you
> set TextAttribute.KERNING and TextAttribute.LIGATURES ligatures work for
> non-complex text e.g. ffi with DoulosSIL, but if you have a mark in the
> text, ligatures stop working, though the mark attaches correctly. I
> would therefore have thought that there is little to be lost from using
> eflags = 0x3 in all the cases where eflags is set. I guess there might
> be a slight speed drop, but is it still significant these days? Is there
> a specific reason why kerning and ligatures haven't been enabled in ICU
> when used in the JDK? Does it have some unexpected side effect?

I think the basic reason is compatibility of text advance.
Text that is rendered through drawString() and text that is rendered
via TextLayout should have the same advance.
So optional ligatures and kerning need to be requested by those
who know they want them.

You might then ask why we don't at least do this for complex
scripts, where text has to go through layout and mandatory ligatures
are performed. I would have to dig to be sure what actually happens
in ICU, but one scenario is mixed-script text, e.g. some Latin followed
by some complex script. If the optional ligatures were performed by
layout and you are in, say, a text editor and delete the complex
text leaving only the Latin text, it would look odd if the optional
ligatures no longer formed and kerning stopped being applied.

However, if you are pointing out that kerning and ligatures do not
get applied even when TextAttribute.KERNING and TextAttribute.LIGATURES
are specified, then that would seem like a bug. But my reading of
the code is that the request for kerning and ligatures is
not held in "eflags" but in "_typo_flags", and the value
passed down to layout is "_typo_flags | eflags".
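
As a rough sketch of that combination (names here are hypothetical, and
the bit values are only the ones quoted in this thread: 1 = kerning,
2 = ligatures, 4 = canonical processing):

```java
// Hypothetical sketch of how the two flag sources would be merged.
final class LayoutFlagsSketch {
    static final int KERNING = 0x1;              // requested via text attributes
    static final int LIGATURES = 0x2;            // requested via text attributes
    static final int CANONICAL_PROCESSING = 0x4; // set when marks are present

    /** The value handed to layout is the bitwise OR of both sources. */
    static int layoutFlags(int typoFlags, int eflags) {
        return typoFlags | eflags;
    }
}
```

Under this reading, setting eflags to 0x3 would have the same effect as
the caller having requested kerning and ligatures, and it would drop the
0x4 bit unless that is OR'ed in as well.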

As far as I can see your patch is equivalent to always
adding the TextAttribute.KERNING and TextAttribute.LIGATURES
as attributes on these two fonts (no JDK source code changes
needed). Is that what you see?
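
For reference, requesting both attributes from application code would
look something like the sketch below. The TextAttribute constants and
Font.deriveFont(Map) are the public API; the class and method names are
just for illustration, and whether this actually produces the expected
Myanmar rendering is exactly the open question above.

```java
import java.awt.Font;
import java.awt.font.TextAttribute;
import java.util.HashMap;
import java.util.Map;

// Illustrative helper: derive a font that asks for kerning and optional ligatures.
final class KerningLigatureFontSketch {
    /** Builds the attribute map that requests kerning and ligatures. */
    static Map<TextAttribute, Object> kerningAndLigatures() {
        Map<TextAttribute, Object> attrs = new HashMap<TextAttribute, Object>();
        attrs.put(TextAttribute.KERNING, TextAttribute.KERNING_ON);
        attrs.put(TextAttribute.LIGATURES, TextAttribute.LIGATURES_ON);
        return attrs;
    }

    /** Returns a copy of the given font with both attributes applied. */
    static Font withKerningAndLigatures(Font base) {
        return base.deriveFont(kerningAndLigatures());
    }
}
```

Assuming Padauk is installed, something like
withKerningAndLigatures(new Font("Padauk", Font.PLAIN, 24)) could then
be used with drawString or TextLayout in an unpatched JDK to compare
against the patched build.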

> 
> Currently EngineRecord only sets eflags for NON_SPACING_MARK,
> ENCLOSING_MARK, COMBINING_SPACING_MARK. 

That is, I believe, for performance.

> At the moment, this isn't
> sufficient for Burmese since the character properties in the jdk haven't
> been updated to Unicode 5.1, hence I enabled it for the whole code block
> in my test build.
> 
> For reference, Myanmar fonts are available at:
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Padauk
> http://myanmarnlpteam.blogspot.com/2007/08/download-links.html
> http://www.mymyanmar.net/2g/
> 
> (Another Myanmar font, Parabaik uses OpenType rlig, which ICU doesn't
> process for this code block without further code changes).
> 
> There is a possible patch below, which displays Unicode 5.1 Myanmar
> correctly with Padauk, MyMyanmar Unicode and Myanmar3 fonts when used
> with the methods TextLayout.draw, drawString and drawChars in
> Font2DTest. Some attached marks get lost with Padauk using
> TextLayout.getOutline+draw.
> 
> I would appreciate feedback on whether to submit this as a patch purely
> for the Myanmar script or whether eflags should be changed more generally.

Before we can accept any patch you will need to sign and submit
the Sun Contributor Agreement. See http://openjdk.java.net/contribute/

> 
> Regards,
> Keith Stribley
> 
> --- ./jdk/src/share/classes/sun/font/GlyphLayout.java.orig    2008-05-29
> 15:01:33.000000000 +0100
> +++ ./jdk/src/share/classes/sun/font/GlyphLayout.java    2008-05-29
> 23:13:26.000000000 +0100
> @@ -644,11 +644,15 @@
>                      ch = toCodePoint((char)ch,_textRecord.text[++i]);
> // inc
>                  }
>                  int gc = getType(ch);
> +                if (script == 28) { // Myanmar - see LEScripts.h
> +                    this.eflags = 0x3;// 1=kerning, 2=ligatures
> +                    break;
> +                }
>                  if (gc == NON_SPACING_MARK ||
>                      gc == ENCLOSING_MARK ||
>                      gc == COMBINING_SPACING_MARK) { // could do range
> test also
>  
> -                    this.eflags = 0x4;
> +                    this.eflags = 0x4; // 4 = no canonical processing,
> but would 0x3 be better?

I think you have this backwards. 0x4 means DO canonical processing.

>                      break;
>                  }
>              }
> --- ./jdk/src/share/classes/sun/font/FontManager.java.orig    2008-05-28
> 12:46:03.000000000 +0100
> +++ ./jdk/src/share/classes/sun/font/FontManager.java    2008-05-29
> 21:33:31.000000000 +0100
> @@ -3594,6 +3594,12 @@
>              // 0E00 - 0E7F if Thai, assume shaping for vowel, tone marks
>              return true;
>          }
> +        else if (code < 0x1000) {
> +            return false;
> +        }
> +        else if (code < 0x10A0) { // 1000-109F Myanmar
> +            return true;
> +        }
>          else if (code < 0x1780) {
>              return false;
>          }
> 


-phil.


