RFR: 8318364: Add an FFM-based implementation of harfbuzz OpenType layout [v4]

Phil Race prr at openjdk.org
Mon Nov 6 23:32:39 UTC 2023


On Mon, 6 Nov 2023 18:52:05 GMT, Sergey Bylokhov <serb at openjdk.org> wrote:

> Since we plan to import it into jdk22, do you have some performance data to share? any positive or negative effects of this migration?

There are three phases: (1) startup, (2) warmup and (3) warmed-up performance.

JNI has minimal startup / warmup cost, getting to warmed up performance right away.
So if your app starts up and makes just one call to layout, JNI wins easily.
But if it keeps going, then FFM comes out ahead, even counting that startup /warmup cost.

There's a cost the first time some code in the JDK initialises the core FFM.
If that code happens to be this layout code, it'll see that overhead.
That was somewhere around 75ms on my Mac.
On top of that there's the cost of creating the specific method handles and var handles.
I have 11 of these, and the total there is about 35-40ms.

So we have somewhere around a fixed 125ms startup cost for the FFM case - as measured on my Mac,
but only 35-40ms of that is attributable to the specific needs of layout.
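
As an illustration of that one-time handle cost, here is a minimal sketch of creating handles once in a static holder and timing the creation. It is not the code from the PR: it uses plain java.lang.invoke lookups instead of FFM downcall handles so that it runs on older JDKs, and the class, field and method names (LazyHandles, glyphCount, initNanos) are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

final class LazyHandles {
    // Hypothetical field; stands in for native struct data read through a VarHandle.
    int glyphCount;

    // Holder idiom: the JVM runs this static initializer once, on first use,
    // so the handle-creation cost is paid a single time.
    private static final class Holder {
        static final MethodHandle LENGTH;
        static final VarHandle GLYPH_COUNT;
        static final long INIT_NANOS;
        static {
            long start = System.nanoTime();
            try {
                MethodHandles.Lookup lookup = MethodHandles.lookup();
                LENGTH = lookup.findVirtual(String.class, "length",
                        MethodType.methodType(int.class));
                GLYPH_COUNT = lookup.findVarHandle(LazyHandles.class, "glyphCount", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
            INIT_NANOS = System.nanoTime() - start;
        }
    }

    // Calls String.length() through the cached method handle.
    static int length(String s) {
        try {
            return (int) Holder.LENGTH.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Reads the field through the cached var handle.
    static int glyphCount(LazyHandles h) {
        return (int) Holder.GLYPH_COUNT.get(h);
    }

    // Time spent creating the handles, in nanoseconds.
    static long initNanos() {
        return Holder.INIT_NANOS;
    }
}
```

The first call to any of the static methods triggers the holder's initializer; later calls reuse the cached handles, which is why this cost shows up only once in a run.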

And there is some potential for that code to get faster some day.
Also, if techniques such as AppCDS or, some day, Leyden condensers are used, then
there is also potential to eliminate much of the warmup cost.

The FFM path then needs to be warmed up.

Once warmed up, FFM is always as fast as or faster than JNI. 20% faster is typical as
measured by a small test that just calls layout in a loop. It was tried with varying lengths of string.
For just a single char, FFM was only a little faster, but gets better for longer strings.
Once we start to use layout, we use it a lot, so you reach many thousands of calls very quickly.
Just resizing your UI window causes that. It doesn't take long for FFM to become an overall win.
That includes amortizing the cost of the startup / warmup time.
As well as a microbenchmark, I looked at what it does in an app consisting of a Swing JTextArea displaying
a decent amount of Hindi using an OpenType Indic font on Mac.
That takes just over 16,000 (!) calls to layout to get to fully displayed.
Then if you just resize back and forth, in just a few seconds FFM catches up and overtakes JNI.
I'll show numbers below. These measure all the FFM+layout costs but nothing else in the app.
It bears out what I said about startup.
"layoutCnt" is the number of calls to the method to do layout on a single run of text.
The numbers look like a lot of calls to layout and you might think that took hours
but this really is just about 20-30 secs of manual resizing to get to one million calls.
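
For readers who want to collect similar numbers, here is a minimal sketch of a cumulative timer that produces lines in the same "layoutCnt=... total=...ms" shape. It is not the instrumentation actually used for the figures in this mail. The class name LayoutTimer is hypothetical, and the reporting rule (first five calls, then every thousandth) is only inferred from the output shown here.

```java
final class LayoutTimer {
    private long layoutCnt;   // number of layout calls seen so far
    private long totalNanos;  // accumulated time spent inside those calls

    // Times one call and returns a report line when one is due, otherwise null.
    String time(Runnable layoutCall) {
        long start = System.nanoTime();
        layoutCall.run();
        return record(System.nanoTime() - start);
    }

    // Adds one already-measured call; kept separate so the reporting rule can be tested.
    String record(long elapsedNanos) {
        layoutCnt++;
        totalNanos += elapsedNanos;
        boolean due = layoutCnt <= 5 || layoutCnt % 1000 == 0;
        return due
                ? "layoutCnt=" + layoutCnt + " total=" + (totalNanos / 1_000_000) + "ms"
                : null;
    }
}
```

Wrapping only the layout call keeps the rest of the app out of the total, which matches the intent described above.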

JNI
===

layoutCnt=1 total=3ms   <<< JNI very fast to start up
layoutCnt=2 total=3ms
layoutCnt=3 total=3ms
layoutCnt=4 total=4ms
layoutCnt=5 total=4ms
layoutCnt=1000 total=31ms
layoutCnt=2000 total=40ms << 9-10ms per thousand calls (40-31)
layoutCnt=3000 total=51ms
layoutCnt=4000 total=61ms
layoutCnt=5000 total=69ms
layoutCnt=6000 total=77ms
layoutCnt=7000 total=90ms
layoutCnt=8000 total=100ms
layoutCnt=9000 total=113ms
layoutCnt=10000 total=122ms
layoutCnt=11000 total=134ms
layoutCnt=12000 total=150ms
layoutCnt=13000 total=157ms
layoutCnt=14000 total=169ms
layoutCnt=15000 total=181ms
layoutCnt=16000 total=193ms   <<< app fully displayed
...
layoutCnt=250000 total=2450ms <<< rough point at which they are equal
...
layoutCnt=1000000 total=9115ms <<< after 1 million calls JNI is clearly behind
layoutCnt=1001000 total=9124ms << STILL 9-10ms per thousand calls (9124-9115)


FFM
===
layoutCnt=1 total=186ms  << // FFM slow to start up, includes 75ms core FFM, 35-40 varhandles + no JIT yet
layoutCnt=2 total=188ms
layoutCnt=3 total=189ms
layoutCnt=4 total=195ms
layoutCnt=5 total=195ms
layoutCnt=1000 total=269ms
layoutCnt=2000 total=284ms  << 15 ms per thousand calls  (284-269)
layoutCnt=3000 total=301ms
layoutCnt=4000 total=317ms
layoutCnt=5000 total=333ms
layoutCnt=6000 total=348ms
layoutCnt=7000 total=365ms
layoutCnt=8000 total=376ms
layoutCnt=9000 total=388ms
layoutCnt=10000 total=397ms
layoutCnt=11000 total=407ms
layoutCnt=12000 total=419ms
layoutCnt=13000 total=425ms
layoutCnt=14000 total=435ms
layoutCnt=15000 total=444ms
layoutCnt=16000 total=453ms  <<< app fully displayed
...
layoutCnt=250000 total=2426ms <<< rough point at which they are equal
...
layoutCnt=1000000 total=8489ms <<< after 1 million calls FFM is clearly ahead
layoutCnt=1001000 total=8496ms << now about 7 ms per thousand calls (8496-8489)
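
The per-thousand-call rates and the catch-up point can be rechecked from the rows above with a small calculation. This is a rough sketch that assumes the cost is linear between two samples, which is only an approximation while the JIT is still warming up. The class and method names are hypothetical.

```java
final class BreakEven {
    // Average cost in ms per thousand calls between two (layoutCnt, totalMs) samples.
    static double msPerThousand(long cnt0, long ms0, long cnt1, long ms1) {
        return (ms1 - ms0) * 1000.0 / (cnt1 - cnt0);
    }

    // Number of further calls needed for the faster path to make up a gap of gapMs,
    // given each path's average cost in ms per thousand calls.
    static double callsToCatchUp(double gapMs, double slowMsPerK, double fastMsPerK) {
        return gapMs / (slowMsPerK - fastMsPerK) * 1000.0;
    }

    public static void main(String[] args) {
        // Samples taken from the tables above: 16,000 and 250,000 calls.
        double jni = msPerThousand(16_000, 193, 250_000, 2450);
        double ffm = msPerThousand(16_000, 453, 250_000, 2426);
        double gapAtDisplay = 453 - 193; // FFM is this many ms behind at 16,000 calls
        System.out.printf("JNI %.2f ms/k, FFM %.2f ms/k, catch-up after ~%.0f more calls%n",
                jni, ffm, callsToCatchUp(gapAtDisplay, jni, ffm));
    }
}
```

Using the averages over that interval, the estimate lands somewhat before the 250,000-call row, consistent with that row being only a rough crossover point.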

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15476#issuecomment-1797025476

