WB midpath: CSet check and RB reversal
Aleksey Shipilev
shade at redhat.com
Thu Jul 26 08:49:22 UTC 2018
On 07/26/2018 10:23 AM, Roman Kennke wrote:
> On 26.07.2018 at 07:48, Aleksey Shipilev wrote:
>> I am looking into the barriers profile, trying to understand where the overhead for the activated
>> barriers is coming from. In very CSet-intensive microbenchmarks, it seems that taking the WB
>> midpath consumes most of the time. And if we look into the profile, then RB is the hottest thing
>> there. I remember from my update-refs experiments that changing the test from "in_cset +
>> check_fwdptr" to "check_fwdptr + in_cset" degraded concurrent update-refs performance by around 3x.
>>
>> In the WB midpath code we do exactly that slow pattern:
>>
>>   if (gcstate_bit_set(HAS_FORWARDED)) {
>>     o = rb(o) // <--- this guy is hot
>>     if (gcstate_bit_set(EVAC|TRAVERSAL)) {
>>       if (in_cset(o)) {
>>         o = call shenandoah_wb
>>       }
>>     }
>>   }
>>
>> ...maybe we should instead do:
>>
>>   if (gcstate_bit_set(HAS_FORWARDED)) {
>>     if (in_cset(o)) { // <--- avoid touching the fwdptr if object cannot be forwarded
>>       o = rb(o)
>>       if (gcstate_bit_set(EVAC|TRAVERSAL)) {
>>         if (in_cset(o)) { // <--- avoid going to slowpath if object is evac'ed already
>>           o = call shenandoah_wb
>>         }
>>       }
>>     }
>>   }
>>
>
> It should be correct. Cannot say about performance, you need to measure it.
It requires some C2 work to rewire WB expansion. I have a dirty experiment that walks an immutable
TreeMap with "compact" heuristics and an artificially prolonged evac phase, which keeps the barriers
active:
Benchmark            (size)  Mode  Cnt  Score    Error  Units

# Passive
TreeMapRead.test  100000000  avgt    5  0.355 ±  0.032   s/op

# Compact
TreeMapRead.test  100000000  avgt    5  0.489 ±  0.076   s/op ; 1.37x overhead

# Compact, -ShWriteBarrierRB
TreeMapRead.test  100000000  avgt    5  0.525 ±  0.045   s/op ; 1.47x overhead

# Compact, LVB
TreeMapRead.test  100000000  avgt    5  0.973 ±  0.049   s/op ; 2.74x overhead

# Compact, LVB, -ShWriteBarrierRB
TreeMapRead.test  100000000  avgt    5  0.774 ±  0.020   s/op ; 2.31x overhead (-25% overhead)
-XX:-ShenandoahWriteBarrierRB removes the hot RB on that path, but makes everything go via the WB
slowpath, because forwarded objects then get detected only in the shenandoah_wb stub. Still, the
performance improvement for the LVB version is substantial even on top of that added overhead.
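
To make the ordering argument from the quoted pseudocode concrete, here is a toy C++ model of the
two midpath shapes. It is only a sketch under assumptions: Obj, GCState, Counters, wb_slowpath and
compare_orders are made-up names, the cset lookup is collapsed to a flag on the object, and the
slowpath does not evacuate anything. It just counts how often each shape touches the fwdptr.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: every name here is hypothetical, none of this is HotSpot code.
struct Obj {
    Obj* fwd;      // forwarding pointer; points to the object itself when not forwarded
    bool in_cset;  // stands in for the per-region collection set lookup
};

struct GCState {
    bool has_forwarded;
    bool evac_or_traversal;
};

struct Counters {
    std::size_t fwdptr_reads = 0;    // how often the fwdptr was touched
    std::size_t slowpath_calls = 0;  // how often the stub was entered
};

// Read barrier: resolve through the forwarding pointer.
static Obj* rb(Obj* o, Counters& c) {
    ++c.fwdptr_reads;
    return o->fwd;
}

// Stand-in for the shenandoah_wb stub. The real stub would evacuate the object;
// this one only counts the call.
static Obj* wb_slowpath(Obj* o, Counters& c) {
    ++c.slowpath_calls;
    return o;
}

// Current shape: RB first, then the cset check.
static Obj* wb_current(Obj* o, const GCState& s, Counters& c) {
    if (s.has_forwarded) {
        o = rb(o, c);
        if (s.evac_or_traversal && o->in_cset) {
            o = wb_slowpath(o, c);
        }
    }
    return o;
}

// Proposed shape: cset check first, so objects outside the cset never touch the fwdptr.
static Obj* wb_proposed(Obj* o, const GCState& s, Counters& c) {
    if (s.has_forwarded && o->in_cset) {
        o = rb(o, c);
        if (s.evac_or_traversal && o->in_cset) {
            o = wb_slowpath(o, c);
        }
    }
    return o;
}

struct Result {
    Counters current;
    Counters proposed;
};

// Runs both shapes over three cases: an object outside the cset, a cset object
// that is not evacuated yet, and a cset object that already has a copy.
static Result compare_orders() {
    GCState evac{true, true};
    Obj outside{nullptr, false};
    outside.fwd = &outside;
    Obj copy{nullptr, false};
    copy.fwd = &copy;
    Obj pending{nullptr, true};
    pending.fwd = &pending;
    Obj moved{&copy, true};

    Obj* cases[] = {&outside, &pending, &moved};
    Result r;
    for (Obj* o : cases) {
        Obj* a = wb_current(o, evac, r.current);
        Obj* b = wb_proposed(o, evac, r.proposed);
        assert(a == b);  // both shapes must resolve to the same reference
    }
    return r;
}
```

In this model both shapes return the same reference and enter the slowpath equally often; the only
difference is that the proposed shape skips the fwdptr read for the object outside the cset. It says
nothing about real timings, those still need to be measured.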
-Aleksey