Performance impact impressions for card marking in Shenandoah
Mathiske, Bernd
mathiske at amazon.com
Wed Aug 19 20:13:51 UTC 2020
Here are some numbers on the slowdown you might see if you add card marking to Shenandoah (see previous emails for the patch). Summary: "minor" impact, as expected from anecdotal experience with card marking in Parallel and CMS. The following is by no means a comprehensive study, though, just a report of what I have so far, so that you do not have to wait any longer to see where this is going.
I ran SPECjvm2008 on a c5.2xlarge AWS instance (8 virtual CPUs, 16.0 GiB memory), using OpenJDK 11.0.7 with Shenandoah.
.../java -Xms2g -Xmx2g -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-TieredCompilation -jar SPECjvm2008.jar -coe -ict -ikv -wt 15s -it 20s -bt 2 <compress|crypto|mpegaudio|scimark.large|scimark.small|serial|sunflow|xml>
I could not get the SPECjvm2008 benchmark "compiler" to run on any JVM, and "startup" seemed irrelevant in this context. "derby" seems to be very sensitive to -Xmx and thus produces red-herring scores for our focal point, relative barrier performance. So I am leaving "derby" out for now, too.
In my code, there is still a bug tickled by "derby" and another one in array marking by C1 code. The latter prevents me from using C1 here, so I ran all of this with C2 only. I figure this is where we would see the most impact anyway.
I compared my patched JVM ("CardShen") to the same JVM with Parallel, CMS, and G1, and to an unpatched vanilla Shenandoah JVM ("Shen") without card marking. These are the scores in SPECjvm ops/sec, averaged over 3 complete runs. They are not very precise because the runs are short, with run-to-run variation of around 0.5-1%, but they are stable enough to give a qualitative idea.
Benchmark      Parallel  CMS   G1  Shen  CardShen
compress            127  127  127   108       108
crypto              261  255  260   249       248
mpegaudio            90   91   92    78        76
scimark.large        89   89   89    90        90
scimark.small       182  182  173   171       173
serial              129  127  120   116       113
sunflow             118  117  126   113       113
xml                 358  349  315   292       280
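
For readers who want to check the Shen vs. CardShen comparison themselves, here is a minimal sketch that computes the per-benchmark relative change from the two rightmost columns above. The scores are copied from the table; the function and variable names are only illustrative and not part of any benchmark tooling.

```python
# Shen and CardShen scores transcribed from the table above
# (SPECjvm2008 scores, each averaged over 3 runs).
scores = {
    "compress": (108, 108),
    "crypto": (249, 248),
    "mpegaudio": (78, 76),
    "scimark.large": (90, 90),
    "scimark.small": (171, 173),
    "serial": (116, 113),
    "sunflow": (113, 113),
    "xml": (292, 280),
}


def relative_change(shen, card_shen):
    """Return the CardShen score change relative to Shen, in percent."""
    return (card_shen - shen) / shen * 100.0


# Per-benchmark change, then the plain arithmetic mean over all benchmarks.
changes = {name: relative_change(s, c) for name, (s, c) in scores.items()}
mean_change = sum(changes.values()) / len(changes)

for name, pct in changes.items():
    print(f"{name:14s} {pct:+6.2f}%")
print(f"{'mean':14s} {mean_change:+6.2f}%")
```

The arithmetic mean of these per-benchmark changes comes out close to -1%. Individual differences smaller than the 0.5-1% run-to-run variation should not be read as real effects.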
Adding unconditional card marking gives the JVM with Shenandoah scores that are on average about 1% lower than with vanilla Shenandoah. I have briefly tried Shenandoah with conditional card marking and saw similar results. The more noticeable differences come from which collector one chooses to begin with. I would expect a throughput penalty for using a concurrent collector, and this seems to shine through here sometimes.
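
To illustrate the difference between unconditional and conditional card marking mentioned above, here is a small model of a card table. This is not the code from the patch and not what the JIT compilers emit. It is only a sketch, and the 512-byte card size, the CLEAN/DIRTY values, and all names are assumptions chosen for illustration.

```python
CARD_SHIFT = 9        # assumption: one card covers 2**9 = 512 bytes of heap
CLEAN, DIRTY = 1, 0   # illustrative values only


class CardTable:
    """Toy card table: one byte per card, all cards start out clean."""

    def __init__(self, heap_bytes):
        n_cards = (heap_bytes + (1 << CARD_SHIFT) - 1) >> CARD_SHIFT
        self.cards = bytearray([CLEAN]) * n_cards
        self.card_stores = 0  # number of writes to the card table

    def mark_unconditional(self, field_addr):
        # Always store to the card, even if it is already dirty.
        self.cards[field_addr >> CARD_SHIFT] = DIRTY
        self.card_stores += 1

    def mark_conditional(self, field_addr):
        # Load the card first and store only if it is not dirty yet.
        idx = field_addr >> CARD_SHIFT
        if self.cards[idx] != DIRTY:
            self.cards[idx] = DIRTY
            self.card_stores += 1

    def dirty_cards(self):
        return [i for i, c in enumerate(self.cards) if c == DIRTY]


# Three reference stores, two of which hit the same card.
addresses = [0, 8, 600]

uncond = CardTable(4096)
for a in addresses:
    uncond.mark_unconditional(a)

cond = CardTable(4096)
for a in addresses:
    cond.mark_conditional(a)

print(uncond.dirty_cards(), uncond.card_stores)
print(cond.dirty_cards(), cond.card_stores)
```

Both variants leave the same cards dirty. The conditional one trades an extra load and branch per reference store for fewer writes to the card table, which is why the two can end up with similar scores.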
Next, I shall repeat this with JDK tip.
Yes, I want to / will fix the remaining bugs, eventually. :-)
Bernd
More information about the shenandoah-dev
mailing list