RFR: Eliminate write-barrier assembly stub (part 2)

Sat Mar 10 07:44:22 UTC 2018

After I eliminated the assembly stub (part 1), what is left of the stub no
longer does anything useful for C2, except shuffle some registers and make
two calls instead of one (compiled -> stub -> runtime). I strongly suspect
that the register allocator can do the shuffling better, and avoiding the
extra hop will also be a net win. At the very least, letting C2 call the
entry in ShenandoahBarrierSet directly removes a bunch of unnecessary code
diffs against upstream:

http://cr.openjdk.java.net/~rkennke/eliminate_wb_stub-c2/webrev.01/
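To make the "two calls instead of one" point concrete, here is a toy C++ model of a single barrier site. It is not HotSpot code: write_barrier_stub, write_barrier_runtime and calls_per_barrier are hypothetical stand-ins for the old stub and the ShenandoahBarrierSet entry, and the model only counts calls on each path.

```cpp
#include <cassert>

// Hypothetical toy model (not HotSpot code): counts the calls a compiled
// write-barrier site makes with and without the intermediate stub.
static int g_calls = 0;

// Hypothetical stand-in for the runtime entry in ShenandoahBarrierSet.
static void* write_barrier_runtime(void* obj) {
  ++g_calls;
  assert(obj != nullptr);
  return obj;  // a real barrier would return the to-space copy
}

// Hypothetical stand-in for the old stub: it only moves the argument
// into place ("register shuffling") and forwards to the runtime entry.
static void* write_barrier_stub(void* obj) {
  ++g_calls;
  void* arg = obj;
  return write_barrier_runtime(arg);
}

using BarrierFn = void* (*)(void*);

// Simulates one barrier site in compiled code; returns the number of calls made.
static int calls_per_barrier(BarrierFn target, void* obj) {
  g_calls = 0;
  target(obj);
  return g_calls;
}
```

With some object obj, calls_per_barrier(write_barrier_stub, &obj) returns 2, while calls_per_barrier(write_barrier_runtime, &obj) returns 1. The model says nothing about register allocation; it only illustrates the extra hop.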

One thing I am not sure about is this little piece in
ShenandoahSupport::needs_barrier_impl():

  if (n->is_CallJava() || n->Opcode() == Op_CallLeafNoFP) {
    return true;
  }

Was this supposed to match the WB stub call (CallLeafNoFP), and if so,
should it now match the runtime call (CallLeaf plus entry address)? But
I don't really see the point: if it's a CallJava, it makes no difference
whether it's also a CallLeafNoFP. Maybe the intention was to *not* match
WB stub calls, and it should have been:

  if (n->is_CallJava() && n->Opcode() != Op_CallLeafNoFP) {
    return true;
  }

?
That would kind of make sense: we want barriers for return values from
Java calls, but not really for those from WB runtime calls. Hmmmm. This
is why I'd propose changing it to:

  if (n->is_CallJava() &&
      !(n->Opcode() == Op_CallLeaf &&
        n->as_Call()->entry_point() == CAST_FROM_FN_PTR(address, ShenandoahBarrierSet::write_barrier_JRT))) {
    return true;
  }

I don't think this would make much difference, though: these checks can
only match after WB expansion, by which point such optimizations (WB
after WB) should already have happened.
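A toy model of the three predicates discussed above may help compare them. It is a sketch under stated assumptions, not HotSpot code: CallNode, kWriteBarrierJRT and the three check functions are hypothetical, and is_CallJava() is modeled as "opcode is CallStaticJava or CallDynamicJava", mirroring the Java call node types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of C2 call nodes; the opcode names mirror
// HotSpot's Op_* constants, but nothing here is HotSpot code.
enum Opcode { Op_CallStaticJava, Op_CallDynamicJava, Op_CallLeaf, Op_CallLeafNoFP };

using address = std::uintptr_t;

// Hypothetical address standing in for
// CAST_FROM_FN_PTR(address, ShenandoahBarrierSet::write_barrier_JRT).
constexpr address kWriteBarrierJRT = 0x1000;

struct CallNode {
  Opcode opcode;
  address entry_point;
  // Assumption: only the two Java call opcodes count as CallJava.
  bool is_CallJava() const {
    return opcode == Op_CallStaticJava || opcode == Op_CallDynamicJava;
  }
};

// Variant 1: the current check in needs_barrier_impl().
bool current_check(const CallNode& n) {
  return n.is_CallJava() || n.opcode == Op_CallLeafNoFP;
}

// Variant 2: "do not match WB stub calls".
bool exclude_stub_check(const CallNode& n) {
  return n.is_CallJava() && n.opcode != Op_CallLeafNoFP;
}

// Variant 3: "do not match direct calls to the WB runtime entry".
bool exclude_runtime_check(const CallNode& n) {
  return n.is_CallJava() &&
         !(n.opcode == Op_CallLeaf && n.entry_point == kWriteBarrierJRT);
}
```

In this model a Java call node never carries a leaf-call opcode, so the second and third variants return exactly what is_CallJava() alone returns; only the current variant additionally matches leaf calls (here, the CallLeafNoFP stub call).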

Passes hotspot_gc_shenandoah and specjvm, both in fastdebug and release
builds.

Ok to go?