Performance analysis of ZGC load barrier for oop arraycopy
Bhavana Kilambi
bhavana.kilambi at foss.arm.com
Tue Apr 19 12:54:57 UTC 2022
Hello,
I would like to share some analysis work that I've done on the load
barrier on arraycopy in ZGC.
This PR (https://github.com/openjdk/jdk/pull/6594) introduced stress
tests for arraycopy, of which ObjectArrayCopy failed with a timeout on
a Windows-x64 machine running ZGC. This prompted us to analyse and test
the performance of ZGC and the other GCs for arraycopy of objects.
I used a simple JMH testcase that calls System.arraycopy() to copy an
entire array of 1024 object references to another array, and ran it with
the six garbage collectors available in OpenJDK 17 and 18 (Epsilon, G1,
Z, Shenandoah, Serial, Parallel) on Neoverse N1 and Skylake systems. All
the GCs except ZGC performed more or less similarly, but the runtime
with ZGC was ~8x that of G1GC (taken as representative of the rest of
the GCs) on the N1 system and ~10x on the Skylake system (with
OpenJDK17).
The actual hot loop is this -
inline void ZBarrier::load_barrier_on_oop_array(volatile oop* p, size_t length) {
  for (volatile const oop* const end = p + length; p < end; p++) {
    load_barrier_on_oop_field(p);
  }
}
I tried to optimize this loop by unrolling it and hoisting the load of
the bad_mask out of the loop. These changes significantly improved the
runtimes of the JMH testcase and of a couple of real-world workloads.
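To illustrate the idea outside HotSpot, here is a minimal standalone
sketch of the two transformations (hoisting the mask load and unrolling
by four). It is not the actual patch: oop_bits, g_bad_mask,
slow_path_heal and check_one are hypothetical stand-ins, the "healing"
just clears the bad bits, and the sketch assumes the mask cannot change
while the loop runs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a colored pointer; not a HotSpot type.
using oop_bits = std::uintptr_t;

// Hypothetical stand-in for the global bad mask. It is volatile here so
// that the baseline loop really reloads it for every element.
static volatile oop_bits g_bad_mask = 0x4;

// Hypothetical slow path: clear the bad bits and store the result back.
static inline void slow_path_heal(volatile oop_bits* p, oop_bits value,
                                  oop_bits bad_mask) {
  *p = value & ~bad_mask;
}

// Baseline shape: one mask load and one check per element.
inline void barrier_on_array_baseline(volatile oop_bits* p, std::size_t length) {
  for (volatile oop_bits* const end = p + length; p < end; p++) {
    const oop_bits value = *p;
    const oop_bits bad_mask = g_bad_mask;  // reloaded on every iteration
    if (value & bad_mask) {
      slow_path_heal(p, value, bad_mask);
    }
  }
}

// Check a single element against an already-loaded mask.
static inline void check_one(volatile oop_bits* p, oop_bits bad_mask) {
  const oop_bits value = *p;
  if (value & bad_mask) {
    slow_path_heal(p, value, bad_mask);
  }
}

// Optimized shape: the mask is loaded once (assumes it is loop-invariant)
// and the body is unrolled four times, with a scalar loop for the tail.
inline void barrier_on_array_unrolled(volatile oop_bits* p, std::size_t length) {
  const oop_bits bad_mask = g_bad_mask;  // hoisted out of the loop
  volatile oop_bits* const end = p + length;
  for (; end - p >= 4; p += 4) {
    check_one(p + 0, bad_mask);
    check_one(p + 1, bad_mask);
    check_one(p + 2, bad_mask);
    check_one(p + 3, bad_mask);
  }
  for (; p < end; p++) {
    check_one(p, bad_mask);
  }
}
```

Both functions should leave the array in the same state; the second one
only changes how often the mask is loaded and how many elements are
handled per loop iteration.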
A detailed report, with a comparative analysis of the GCs, profiles and
the code changes, is in the attached document.
I would like to know your thoughts on this.
Thank you,
Bhavana