WB midpath: CSet check and RB reversal

Thu Jul 26 05:48:04 UTC 2018

I am looking into the barriers profile, trying to understand where the overhead of the activated
barriers comes from. In very CSet-intensive microbenchmarks, taking the WB midpath seems to consume
most of the time, and within the midpath the RB is the hottest part of the profile. I remember from
my update-refs experiments that changing the test from "in_cset + check_fwdptr" to "check_fwdptr +
in_cset" degraded concurrent update-refs performance by around 3x.
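A toy model of why the order of the two checks could matter. This is a minimal sketch under stated assumptions, not Shenandoah code: toy_obj, heap, cset_map, OBJS_PER_REGION and fwdptr_loads are hypothetical names. It assumes a Brooks-style forwarding pointer that points to the object itself when the object is not forwarded, and a dense per-region cset table (indexed here by array position; the real check presumably shifts the address by the region size). The cset-first order only dereferences objects that can actually be forwarded, while the fwdptr-first order touches every object, which is one plausible source of extra cache misses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: NUM_REGIONS regions of OBJS_PER_REGION objects each. */
#define OBJS_PER_REGION 4
#define NUM_REGIONS     8
#define HEAP_OBJS       (OBJS_PER_REGION * NUM_REGIONS)

typedef struct toy_obj {
    struct toy_obj *fwd;   /* Brooks-style fwdptr: points to self when not forwarded */
} toy_obj;

static toy_obj heap[HEAP_OBJS];
static uint8_t cset_map[NUM_REGIONS];  /* 1 if the region is in the collection set */
static size_t fwdptr_loads;            /* how many times we dereferenced an object */

/* Dense side-table lookup; never touches the object itself. */
static int in_cset(const toy_obj *o) {
    return cset_map[(size_t)(o - heap) / OBJS_PER_REGION];
}

/* Touches the object header, counted so the two orders can be compared. */
static toy_obj *load_fwdptr(toy_obj *o) {
    fwdptr_loads++;
    return o->fwd;
}

/* "in_cset + check_fwdptr": only cset objects can be forwarded, so check that first. */
static toy_obj *update_ref_cset_first(toy_obj *o) {
    if (o != NULL && in_cset(o)) {
        return load_fwdptr(o);
    }
    return o;
}

/* "check_fwdptr + in_cset": dereferences every object, then checks the cset. */
static toy_obj *update_ref_fwdptr_first(toy_obj *o) {
    if (o == NULL) {
        return o;
    }
    toy_obj *f = load_fwdptr(o);
    if (f != o && in_cset(o)) {
        return f;
    }
    return o;
}
```

With one region of four objects in the cset and a 32-object heap, a pass over all objects costs 4 fwdptr loads in the cset-first order and 32 in the fwdptr-first order, while both resolve forwarded objects to the same copy.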

In the WB midpath code we do exactly that slow pattern:

 if (gcstate_bit_set(HAS_FORWARDED)) {
   o = rb(o)                  // <--- this guy is hot
   if (gcstate_bit_set(EVAC|TRAVERSAL)) {
     if (in_cset(o)) {
       o = call shenandoah_wb
     }
   }
 }

...maybe we should instead do:

 if (gcstate_bit_set(HAS_FORWARDED)) {
   if (in_cset(o)) {          // <--- avoid touching the fwdptr if object cannot be forwarded
     o = rb(o)
      if (gcstate_bit_set(EVAC|TRAVERSAL)) {
        if (in_cset(o)) {      // <--- avoid going to slowpath if object is evac'ed already
         o = call shenandoah_wb
       }
     }
   }
 }
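Both midpath variants can be modeled single-threaded to check that the reordering keeps the same results while touching fewer fwdptrs. This is a hedged sketch, not the actual barrier: the gcstate bits reuse the names from the pseudocode but their values are made up, and toy_obj, heap, cset_map, wb_slowpath and the counters are hypothetical. wb_slowpath stands in for shenandoah_wb by "evacuating" into the last region (assumed to be outside the cset) and installing the forwarding pointer; races and CAS installation are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values are assumptions; only the names come from the pseudocode. */
enum { HAS_FORWARDED = 1u << 0, EVAC = 1u << 1, TRAVERSAL = 1u << 2 };

#define OBJS_PER_REGION 4
#define NUM_REGIONS     8
#define HEAP_OBJS       (OBJS_PER_REGION * NUM_REGIONS)

typedef struct toy_obj { struct toy_obj *fwd; } toy_obj;  /* fwd == self if not forwarded */

static toy_obj heap[HEAP_OBJS];
static uint8_t cset_map[NUM_REGIONS];
static unsigned gc_state;
static size_t fwdptr_loads, slowpath_calls;

static int gcstate_bit_set(unsigned bits) { return (gc_state & bits) != 0; }
static int in_cset(const toy_obj *o) { return cset_map[(size_t)(o - heap) / OBJS_PER_REGION]; }
static toy_obj *rb(toy_obj *o) { fwdptr_loads++; return o->fwd; }

/* Hypothetical stand-in for shenandoah_wb: copy o into the last region and forward it.
   Only valid in this toy when region 0 is the sole cset region. */
static toy_obj *wb_slowpath(toy_obj *o) {
    slowpath_calls++;
    toy_obj *copy = &heap[HEAP_OBJS - 1 - (size_t)(o - heap)];
    copy->fwd = copy;
    o->fwd = copy;
    return copy;
}

/* Current order: RB first, so every object's fwdptr is touched. */
static toy_obj *wb_midpath_current(toy_obj *o) {
    if (gcstate_bit_set(HAS_FORWARDED)) {
        o = rb(o);
        if (gcstate_bit_set(EVAC | TRAVERSAL)) {
            if (in_cset(o)) o = wb_slowpath(o);
        }
    }
    return o;
}

/* Proposed order: only objects in the cset can be forwarded, so test that first;
   the second in_cset skips the slowpath for objects that are already evacuated. */
static toy_obj *wb_midpath_proposed(toy_obj *o) {
    if (gcstate_bit_set(HAS_FORWARDED)) {
        if (in_cset(o)) {
            o = rb(o);
            if (gcstate_bit_set(EVAC | TRAVERSAL)) {
                if (in_cset(o)) o = wb_slowpath(o);
            }
        }
    }
    return o;
}
```

In this toy, one write-barrier pass over a 32-object heap with one cset region takes 32 fwdptr loads with the current order and 4 with the proposed one, and both take the slowpath exactly 4 times. A second barrier on an already-evacuated object resolves to the copy without a slowpath call in either variant.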

-Aleksey