Perf: WB without RB on fastpath
Aleksey Shipilev
shade at redhat.com
Sat Jan 13 09:51:00 UTC 2018
The single flag change opens up an interesting opportunity for us: we can check whether the GC state
is zero, and if it is, no barriers are required whatsoever. So, instead of doing:
    testb $0x4, 0x3d8(TLS)     # evacuation in progress?
    jnz EVAC-IN-PROGRESS
    mov -0x8(%r), %r           # RB: load the forwarding pointer
  DONE:
    ...
  (later)
  EVAC-IN-PROGRESS:
    <test against cset>
    <jump to slowpath>
...we can do:
    cmpb $0x0, 0x3d8(TLS)      # any GC state bit set?
    jne NON-STABLE-HEAP
  DONE:
    ...
  (later)
  NON-STABLE-HEAP:
    testb $0x4, 0x3d8(TLS)     # evacuation in progress?
    jz DONE
    <test against cset>
    <jump to slowpath>
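To make the control-flow difference concrete, here is a small Python model of the two barrier shapes. It is a sketch under stated assumptions, not Shenandoah code: only the 0x4 "evacuation in progress" bit comes from the snippets above, while `Obj.fwd` (standing in for the forwarding word at -0x8), the `Counters` record, and `evac_slowpath` (standing in for the cset test plus slowpath jump) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

EVAC_IN_PROGRESS = 0x4  # the bit tested by "testb $0x4, 0x3d8(TLS)"


class Obj:
    """Hypothetical object: 'fwd' models the forwarding word at -0x8(%r).
    An object that has not been evacuated forwards to itself."""

    def __init__(self):
        self.fwd = self


@dataclass
class Counters:
    state_checks: int = 0  # tests/compares of the GC state byte
    rb_loads: int = 0      # read-barrier loads (mov -0x8(%r), %r)
    slowpaths: int = 0     # entries into the evacuation slowpath


def evac_slowpath(obj, c):
    # Hypothetical stand-in for "<test against cset>, <jump to slowpath>";
    # a real slowpath would check the collection set and maybe evacuate.
    c.slowpaths += 1
    return obj.fwd


def wb_old(gc_state, obj, c):
    c.state_checks += 1                 # testb $0x4 ; jnz EVAC-IN-PROGRESS
    if gc_state & EVAC_IN_PROGRESS:
        return evac_slowpath(obj, c)
    c.rb_loads += 1                     # RB always executed on the fastpath
    return obj.fwd


def wb_new(gc_state, obj, c):
    c.state_checks += 1                 # cmpb $0x0 ; jne NON-STABLE-HEAP
    if gc_state != 0:
        c.state_checks += 1             # testb $0x4 ; jz DONE
        if gc_state & EVAC_IN_PROGRESS:
            return evac_slowpath(obj, c)
    return obj                          # no barrier work on this path


if __name__ == "__main__":
    a = Obj()
    for name, wb in (("old", wb_old), ("new", wb_new)):
        for state in (0, EVAC_IN_PROGRESS):
            c = Counters()
            wb(state, a, c)
            print(name, hex(state), c)
```

With a zero GC state, both variants perform a single state check and return the same object, but only the old one pays for the RB load. With evacuation in progress, the new variant needs one extra test before it reaches the slowpath.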
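So the fastpath check is the same; we just test against a different value, and the RB drops out of
the fastpath entirely. The slowpath gets a bit slower, since it needs an extra test for the
evacuation bit. The performance improvement can be estimated by running passive with -XX:+ShWB and
toggling -XX:(+|-)ShWriteBarrierRB. Overnight runs translate to: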
Compiler.compiler: +1.0%
Compiler.sunflow: +1.2%
Compress: +2.6%
CryptoSignVerify: +0.3%
MpegAudio: +1.9%
ScimarkLU.large: +4.8%
ScimarkLU.small: +9.5%
XmlTransform: +1.6%
XmlValidation: +2.5%
...and no regressions!
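Assuming each figure above is a relative score improvement (higher is better), the nine results can be summarized with a geometric mean of the ratios. The benchmark names appear to be SPECjvm2008 workloads, whose composite score is likewise a geometric mean. This sketch only recomputes a summary from the listed numbers; it is not part of the original measurement, and `geomean_improvement` is a hypothetical helper.

```python
import math

# Per-benchmark improvements from the overnight runs above, in percent.
RESULTS = {
    "Compiler.compiler": 1.0,
    "Compiler.sunflow": 1.2,
    "Compress": 2.6,
    "CryptoSignVerify": 0.3,
    "MpegAudio": 1.9,
    "ScimarkLU.large": 4.8,
    "ScimarkLU.small": 9.5,
    "XmlTransform": 1.6,
    "XmlValidation": 2.5,
}


def geomean_improvement(pcts):
    """Geometric mean of the (1 + p/100) ratios, returned as a percentage."""
    pcts = list(pcts)
    log_sum = sum(math.log1p(p / 100.0) for p in pcts)
    return (math.exp(log_sum / len(pcts)) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"geomean: +{geomean_improvement(RESULTS.values()):.1f}%")
```

With these nine numbers the geometric mean comes out at roughly +2.8%, slightly below the arithmetic mean, as expected given the ScimarkLU.small outlier.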
Roman mentions separately that Traversal GC does not require the RB on the fastpath at all, which
seems to be a special case of this more general optimization.
Thanks,
-Aleksey