Performance impact impressions for card marking in Shenandoah

Wed Aug 19 20:13:51 UTC 2020

Here are some numbers on the slowdown you might see if you add card marking to Shenandoah (see previous emails for the patch). Summary: "minor" impact, as expected from anecdotal experience with card marking in Parallel and CMS. This is by no means a comprehensive study, just a report of what I have so far, so as not to keep you waiting any longer for an indication of where I see this going.

I ran SPECjvm2008 on a c5.2xlarge AWS instance (8 vCPUs, 16 GiB memory), using OpenJDK 11.0.7 with Shenandoah:

.../java -Xms2g -Xmx2g -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-TieredCompilation -jar SPECjvm2008.jar -coe -ict -ikv -wt 15s -it 20s -bt 2 <compress|crypto|mpegaudio|scimark.large|scimark.small|serial|sunflow|xml>

I could not get the SPECjvm2008 benchmark "compiler" to run on any JVM, and "startup" seemed irrelevant in this context. "derby" seems to be very sensitive to -Xmx and thus produces misleading scores for what we care about here, which is relative barrier performance. So I am leaving "derby" out for now, too.

My code still has one bug tickled by "derby" and another one in array marking by C1 code. The latter prevents me from invoking C1 here, so I ran all this with C2 only (hence -XX:-TieredCompilation in the command line). I figure this is where we would see the most impact anyway.

I compared my patched JVM ("CardShen") to the same JVM with Parallel, CMS, and G1, and to an unpatched vanilla Shenandoah JVM ("Shen") without card marking. These are the scores in SPECjvm2008 ops/sec, averaged over 3 runs. They are not very precise because the runs are short, with variations between runs of around 0.5-1%, but they are stable enough to get a qualitative idea.

               Parallel   CMS    G1  Shen  CardShen
compress            127   127   127   108       108
crypto              261   255   260   249       248
mpegaudio            90    91    92    78        76
scimark.large        89    89    89    90        90
scimark.small       182   182   173   171       173
serial              129   127   120   116       113
sunflow             118   117   126   113       113
xml                 358   349   315   292       280
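
As a cross-check on the Shen and CardShen columns above, the relative difference per benchmark and its plain arithmetic mean can be recomputed with a few lines of Python. This is only a sketch: the numbers are the rounded table values, and the helper name relative_change_percent is made up for illustration.

```python
# Scores copied from the Shen and CardShen columns of the table above
# (SPECjvm2008 ops/sec, already averaged over 3 runs and rounded).
scores = {
    "compress":      (108, 108),
    "crypto":        (249, 248),
    "mpegaudio":     (78, 76),
    "scimark.large": (90, 90),
    "scimark.small": (171, 173),
    "serial":        (116, 113),
    "sunflow":       (113, 113),
    "xml":           (292, 280),
}


def relative_change_percent(shen, card_shen):
    """Change of CardShen relative to Shen, in percent (negative = slower)."""
    return (card_shen / shen - 1.0) * 100.0


changes = {name: relative_change_percent(s, c) for name, (s, c) in scores.items()}

# Plain arithmetic mean over the eight benchmarks (an illustrative choice;
# a geometric mean of the score ratios would be another option).
mean_change = sum(changes.values()) / len(changes)

for name, pct in changes.items():
    print(f"{name:14s} {pct:+6.2f}%")
print(f"{'mean':14s} {mean_change:+6.2f}%")
```

With the rounded table values the mean comes out at roughly -1%, which is the same order as the 0.5-1% run-to-run variation, so the per-benchmark differences should be read qualitatively.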

Adding unconditional card marking gives Shenandoah scores that are on average about 1% lower than those of vanilla Shenandoah. I also briefly tried Shenandoah with conditional card marking and saw similar results. The more noticeable differences come from which collector one chooses to begin with. I would expect a throughput penalty for using a concurrent collector, and this seems to shine through here sometimes.
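
For readers less familiar with the two variants mentioned above, here is a toy model of unconditional versus conditional card marking. It is not the code from the patch (real barriers are emitted by the JIT compilers as machine code); the class name, the 512-byte card size, and the clean/dirty byte values are all assumptions chosen for illustration.

```python
CARD_SHIFT = 9   # assumption: one card covers 2**9 = 512 bytes of heap
CLEAN = 0xFF     # illustrative marker values, not taken from the patch
DIRTY = 0x00


class CardTable:
    """Toy card table: one byte per card, indexed by heap offset >> CARD_SHIFT."""

    def __init__(self, heap_bytes):
        card_count = (heap_bytes + (1 << CARD_SHIFT) - 1) >> CARD_SHIFT
        self.cards = bytearray([CLEAN]) * card_count
        self.stores = 0  # counts writes into the card table

    def mark_unconditional(self, offset):
        # Always store, even if the card is already dirty.
        self.cards[offset >> CARD_SHIFT] = DIRTY
        self.stores += 1

    def mark_conditional(self, offset):
        # Load and test first; store only if the card is not dirty yet.
        index = offset >> CARD_SHIFT
        if self.cards[index] != DIRTY:
            self.cards[index] = DIRTY
            self.stores += 1


# Two reference stores into the same 512-byte card.
unconditional = CardTable(4096)
unconditional.mark_unconditional(0)
unconditional.mark_unconditional(8)

conditional = CardTable(4096)
conditional.mark_conditional(0)
conditional.mark_conditional(8)

print(unconditional.stores, conditional.stores)
```

The trade-off in this model is an extra load and branch per reference store in the conditional variant versus a repeated store to an already dirty card in the unconditional one.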

Next, I shall repeat this with JDK tip. 
Yes, I want to / will fix the remaining bugs, eventually. :-)

Bernd