WB midpath: CSet check and RB reversal

Aleksey Shipilev shade at redhat.com
Thu Jul 26 08:49:22 UTC 2018


On 07/26/2018 10:23 AM, Roman Kennke wrote:
> On 26.07.2018 at 07:48, Aleksey Shipilev wrote:
>> I am looking into the barriers profile, trying to understand where the overhead of the activated
>> barriers is coming from. In very CSet-intensive microbenchmarks, it seems that taking the WB
>> midpath consumes most of the time, and in the profile the RB is the hottest thing there. I
>> remember from my update-refs experiments that changing the test from "in_cset + check_fwdptr"
>> to "check_fwdptr + in_cset" degraded concurrent update-refs performance by about 3x.
>>
>> The WB midpath code uses exactly that slow pattern:
>>
>>  if (gcstate_bit_set(HAS_FORWARDED)) {
>>    o = rb(o)                  // <--- this guy is hot
>>    if (gcstate_bit_set(EVAC|TRAVERSAL)) {
>>      if (in_cset(o)) {
>>        o = call shenandoah_wb
>>      }
>>    }
>>  }
>>
>> ...maybe we should instead do:
>>
>>  if (gcstate_bit_set(HAS_FORWARDED)) {
>>    if (in_cset(o)) {          // <--- avoid touching the fwdptr if object cannot be forwarded
>>      o = rb(o)
>>      if (gcstate_bit_set(EVAC|TRAVERSAL)) {
>>        if (in_cset(o)) {      // <--- avoid going to slowpath if object is evac'ed already
>>          o = call shenandoah_wb
>>        }
>>      }
>>    }
>>  }
>>
> 
> It should be correct. I can't say anything about performance; you need to measure it.
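To make the cost difference between the two orderings in the pseudocode concrete, here is a minimal single-threaded Python model. It assumes a Brooks-style forwarding pointer whose read (`rb`) touches the object, and a per-region CSet table whose check (`in_cset`) does not. `ToyHeap`, `wb_stub`, `midpath_current`, `midpath_proposed` and the region layout are hypothetical illustration names, not Shenandoah code. The model counts forwarding-pointer loads and stub calls, not cycles, and ignores concurrency entirely.

```python
from dataclasses import dataclass, field

# Hypothetical gc-state bits, named after the flags in the pseudocode.
HAS_FORWARDED = 1
EVACUATION = 2
TRAVERSAL = 4


@dataclass
class ToyHeap:
    """Single-threaded toy model (assumption, not Shenandoah code)."""
    region_size: int
    cset_regions: set          # indices of regions in the collection set
    to_top: int                # bump pointer for "evacuated" copies
    gc_state: int = HAS_FORWARDED | EVACUATION
    fwd: dict = field(default_factory=dict)  # obj -> forwardee, stands in for the Brooks pointer
    fwdptr_loads: int = 0      # how often object memory is touched
    stub_calls: int = 0        # how often the slow-path stub is entered

    def in_cset(self, o):
        # Cheap check against per-region metadata: does not touch the object.
        return (o // self.region_size) in self.cset_regions

    def rb(self, o):
        # Read barrier: load the forwarding pointer, i.e. touch the object.
        self.fwdptr_loads += 1
        return self.fwd.get(o, o)

    def wb_stub(self, o):
        # Stand-in for the shenandoah_wb stub: re-check forwarding, else "copy".
        self.stub_calls += 1
        f = self.rb(o)
        if f != o:
            return f
        f, self.to_top = self.to_top, self.to_top + 16
        self.fwd[o] = f
        return f


def midpath_current(h, o):
    if h.gc_state & HAS_FORWARDED:
        o = h.rb(o)                              # touches every object
        if h.gc_state & (EVACUATION | TRAVERSAL):
            if h.in_cset(o):
                o = h.wb_stub(o)
    return o


def midpath_proposed(h, o):
    if h.gc_state & HAS_FORWARDED:
        if h.in_cset(o):                         # only CSet objects can be forwarded
            o = h.rb(o)
            if h.gc_state & (EVACUATION | TRAVERSAL):
                if h.in_cset(o):                 # skip the stub if already evacuated
                    o = h.wb_stub(o)
    return o


def run(midpath, objs):
    # Region 1 is the CSet; object 1024 is already forwarded to region 3.
    h = ToyHeap(region_size=1024, cset_regions={1}, to_top=2048)
    h.fwd[1024] = 3072
    results = [midpath(h, o) for o in objs]
    return results, h.fwdptr_loads, h.stub_calls


objs = [0, 16, 32, 1024, 1040]   # three non-CSet objects, two CSet objects
print(run(midpath_current, objs))   # ([0, 16, 32, 3072, 2048], 6, 1)
print(run(midpath_proposed, objs))  # ([0, 16, 32, 3072, 2048], 3, 1)
```

Both orderings return the same references; the proposed one reads the forwarding pointer only for the two CSet objects (3 loads instead of 6 in this toy run), which is exactly the saving the reordering aims for.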

It requires some C2 work to rewire the WB expansion. I have a dirty experiment that walks an immutable
TreeMap under the "compact" heuristics and artificially prolongs the evac phase, which keeps the barriers
active:

Benchmark            (size)  Mode  Cnt  Score   Error  Units

# Passive
TreeMapRead.test  100000000  avgt    5  0.355 ± 0.032   s/op

# Compact
TreeMapRead.test  100000000  avgt    5  0.489 ± 0.076   s/op  ; 1.37x overhead

# Compact, -ShWriteBarrierRB
TreeMapRead.test  100000000  avgt    5  0.525 ± 0.045   s/op  ; 1.47x overhead

# Compact, LVB
TreeMapRead.test  100000000  avgt    5  0.973 ± 0.049   s/op  ; 2.74x overhead

# Compact, LVB, -ShWriteBarrierRB
TreeMapRead.test  100000000  avgt    5  0.774 ± 0.020   s/op  ; 2.31x overhead (-25% overhead)


-XX:-ShenandoahWriteBarrierRB removes the hot RB on that path, but makes everything go through the WB
slowpath, because forwarded objects still get detected only in the shenandoah_wb stub. Even with that
added overhead, the performance improvement for the LVB version is substantial.
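A similar minimal sketch of the midpath with the RB removed, under the same kind of toy assumptions: hypothetical names, single-threaded, gc-state checks elided, and the assumption that the `in_cset` check on the unresolved reference is kept. Nothing reads the forwarding pointer before the stub, so an object that is already forwarded (1024 here) still pays for a stub call just to be resolved.

```python
REGION_SIZE = 1024
CSET_REGIONS = {1}   # toy layout (assumption): region 1 is the collection set


def in_cset(o):
    # Per-region check: does not touch the object.
    return (o // REGION_SIZE) in CSET_REGIONS


class ToyStub:
    """Hypothetical stand-in for the shenandoah_wb stub."""

    def __init__(self, fwd, to_top):
        self.fwd, self.to_top = fwd, to_top
        self.calls = 0

    def __call__(self, o):
        self.calls += 1              # every call also reads o's forwarding pointer
        f = self.fwd.get(o, o)
        if f == o:                   # not forwarded yet: "evacuate" and forward
            f, self.to_top = self.to_top, self.to_top + 16
            self.fwd[o] = f
        return f                     # already forwarded: just resolve


def midpath_no_rb(o, wb_stub):
    # Assume HAS_FORWARDED and evacuation are active; no rb(o) before the check,
    # so the from-space copy of a forwarded object still tests as in the CSet.
    if in_cset(o):
        o = wb_stub(o)
    return o


stub = ToyStub(fwd={1024: 3072}, to_top=2048)
results = [midpath_no_rb(o, stub) for o in [0, 16, 32, 1024, 1040]]
print(results, stub.calls)   # [0, 16, 32, 3072, 2048] 2
```

In this run the three non-CSet objects never touch object memory, while both CSet objects, including the already-forwarded one, go through the stub: fewer object reads on the fast path, paid for with more slowpath calls.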

-Aleksey



