Performance analysis of ZGC load barrier for oop arraycopy

Bhavana Kilambi bhavana.kilambi at
Tue Apr 19 12:54:57 UTC 2022


I would like to share some analysis work that I've done on the ZGC load 
barrier for object arraycopy.

This PR introduced stress tests for arraycopy, and one of them, 
ObjectArrayCopy, ended up in a timeout failure on a Windows-x64 machine 
with ZGC. This prompted us to do some performance analysis and testing 
to understand how ZGC and the other GCs behave for arraycopy of objects.

I used a simple JMH testcase that calls System.arraycopy() to copy an 
entire array of 1024 object references into another array, and ran it 
with the six garbage collectors available in OpenJDK 17 and 18 (Epsilon, 
G1, Z, Shenandoah, Serial, Parallel) on Neoverse N1 and Skylake systems. 
All the GCs except ZGC performed more or less the same, but the runtime 
with ZGC was ~8x that of G1GC (taken as representative of the rest of 
the GCs) on the N1 system and ~10x on the Skylake system.

The hot loop is this one:

inline void ZBarrier::load_barrier_on_oop_array(volatile oop* p, size_t length) {
  for (volatile const oop* const end = p + length; p < end; p++) {
    load_barrier_on_oop_field(p);
  }
}

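To make the per-element cost of that loop concrete, here is a simplified, self-contained C++ model of what each iteration does. It is not HotSpot code: references are modeled as plain address words held in std::atomic, and g_bad_mask, slow_path, and load_barrier_on_field are hypothetical stand-ins for ZGC's global bad mask and its barrier fast and slow paths. What it illustrates is that every element pays for a load of the field, a load of the global mask, a test, and possibly a self-healing slow-path call.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a "reference" is just an address word. Real ZGC colored
// pointers, metadata bits, and relocation are far more involved.
using ref_t = std::uintptr_t;

// Hypothetical stand-in for ZGC's global bad mask. In this model the low bit
// marks a reference that must be healed before use.
std::atomic<ref_t> g_bad_mask{0x1};
std::size_t g_slow_path_calls = 0;  // counts slow-path hits, for illustration

// Hypothetical slow path: "heal" the reference by clearing the bad bits and
// write it back into the field, so later loads of this field take the fast path.
ref_t slow_path(std::atomic<ref_t>* field, ref_t value) {
  ++g_slow_path_calls;
  const ref_t mask = g_bad_mask.load(std::memory_order_relaxed);
  const ref_t good = value & ~mask;
  field->store(good, std::memory_order_relaxed);
  assert((good & mask) == 0);
  return good;
}

// Per-field barrier: load the field, re-read the global mask, test, maybe heal.
ref_t load_barrier_on_field(std::atomic<ref_t>* field) {
  const ref_t value = field->load(std::memory_order_relaxed);
  if ((value & g_bad_mask.load(std::memory_order_relaxed)) == 0) {
    return value;  // fast path: reference is already good (or null)
  }
  return slow_path(field, value);
}

// Baseline array barrier with the same shape as the loop above:
// one full barrier, including a global mask load, per element.
void load_barrier_on_array(std::atomic<ref_t>* p, std::size_t length) {
  for (std::atomic<ref_t>* const end = p + length; p < end; p++) {
    load_barrier_on_field(p);
  }
}
```

Calling load_barrier_on_array twice on the same array shows the self-healing effect: the first pass takes the slow path once per bad reference, and the second pass stays entirely on the fast path.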
I tried to optimize this loop by unrolling it and hoisting the load of 
the bad_mask out of the loop. These changes significantly improved the 
runtime of the JMH testcase and of a couple of real-world workloads.
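A minimal sketch of those two changes, written as a simplified model rather than the actual patch: the global mask is read once before the loop instead of once per element, and the loop body is unrolled by a factor of 4 with a scalar remainder loop. The names (g_bad_mask, slow_path, barrier_one, load_barrier_on_array_unrolled), the address-word model of references, and the unroll factor are hypothetical choices for illustration. Hoisting the mask is only correct if it cannot change while the loop runs; the sketch assumes that rather than proving it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: references are address words; a set bit in
// g_bad_mask marks a reference that needs healing.
using ref_t = std::uintptr_t;
std::atomic<ref_t> g_bad_mask{0x1};
std::size_t g_slow_path_calls = 0;

// Hypothetical self-healing slow path (not ZGC's real remap/relocate logic).
ref_t slow_path(std::atomic<ref_t>* field, ref_t value, ref_t bad_mask) {
  ++g_slow_path_calls;
  const ref_t good = value & ~bad_mask;
  field->store(good, std::memory_order_relaxed);
  assert((good & bad_mask) == 0);
  return good;
}

// One element, using a mask passed in by the caller instead of the global.
inline void barrier_one(std::atomic<ref_t>* field, ref_t bad_mask) {
  const ref_t value = field->load(std::memory_order_relaxed);
  if ((value & bad_mask) != 0) {
    slow_path(field, value, bad_mask);
  }
}

// Optimized array barrier: mask hoisted out of the loop (ASSUMES it cannot
// change during the loop) and the body unrolled by 4.
void load_barrier_on_array_unrolled(std::atomic<ref_t>* p, std::size_t length) {
  const ref_t bad_mask = g_bad_mask.load(std::memory_order_relaxed);  // loaded once
  std::size_t i = 0;
  for (; i + 4 <= length; i += 4) {  // main loop: 4 elements per iteration
    barrier_one(p + i, bad_mask);
    barrier_one(p + i + 1, bad_mask);
    barrier_one(p + i + 2, bad_mask);
    barrier_one(p + i + 3, bad_mask);
  }
  for (; i < length; i++) {  // remainder: 0 to 3 elements
    barrier_one(p + i, bad_mask);
  }
}
```

The remainder loop handles lengths that are not a multiple of 4, such as 10. Unrolling mainly cuts loop-control overhead per element and can let the CPU overlap several independent field loads, while hoisting removes one memory load from every iteration.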

The attached document contains a detailed analysis report, including a 
comparative analysis across the GCs, profiles, and the code changes.

I would like to hear your thoughts on this.

Thank you,


More information about the zgc-dev mailing list