RFR: Streamline write-barrier-prologue (C2 part)

Roman Kennke rkennke at redhat.com
Fri Aug 10 21:20:07 UTC 2018


We currently generate a rather involved prologue for write-barriers. It
looks roughly like this:

if (!is_heap_stable()) {
  if (obj != NULL) {          // Possibly/likely known statically
    if (evac_in_progress()) { // Loads and checks gc-state again
      obj = read_barrier(obj)
      if (in_cset(obj)) {
        obj = write_barrier(obj)
      }
    } else {
      obj = read_barrier(obj)
    }
  }
}

There are a number of problems with this:
- it usually loads and checks gc-state twice back-to-back (the null-check
between them is usually known statically)
- the read-barrier is emitted on both the evac and the non-evac paths
- most importantly, the checks are not ordered by increasing cost

From cheapest to most expensive:
- null-check: cheapest, since nothing is loaded. It is also usually
elided, and if not, it might fold with other null-checks on the same object
- gc-state checks (heap-stable/evac-in-progress): load a single
thread-local byte
- in-cset check: a lookup in a 2K-byte array
- read-barrier: loads only 8 bytes, but from a random address

We have seen the stalls caused by those read-barriers to be fairly expensive.
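To illustrate why the in-cset check ranks below the read-barrier, here is a minimal sketch of a collection-set lookup as a byte map with one byte per heap region. The region size, the region count, and the `CSetMap` name are assumptions for illustration only (2048 regions give the 2K-byte array mentioned above); this is not the Shenandoah implementation. The lookup is a subtract, a shift, and one byte load from a small table that tends to stay in cache, while a read-barrier has to touch memory belonging to the object itself, at an address that differs for every object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed geometry (illustrative only): 2048 regions of 1 MB each, so the
// map below occupies 2K bytes.
constexpr unsigned kRegionShift = 20;
constexpr std::size_t kNumRegions = 2048;

// Hypothetical collection-set map: one byte per region, 1 = in cset.
struct CSetMap {
  std::uintptr_t heap_base;
  std::uint8_t map[kNumRegions] = {};

  explicit CSetMap(std::uintptr_t base) : heap_base(base) {}

  void add_region(std::size_t idx) {
    assert(idx < kNumRegions);
    map[idx] = 1;
  }

  // In-cset check: subtract, shift, and one byte load from a small table.
  // It never dereferences the object itself.
  bool in_cset(std::uintptr_t addr) const {
    std::size_t idx = (addr - heap_base) >> kRegionShift;
    assert(addr >= heap_base && idx < kNumRegions);
    return map[idx] != 0;
  }
};
```

Because every object in a region shares that region's byte, a burst of barriers on unrelated objects keeps hitting the same small table, which is the property that makes it cheaper than the per-object read-barrier load.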

I propose simplifying and streamlining the prologue into this shape:

if (obj != NULL) {        // Unless known statically
  if (!is_heap_stable()) {
    if (in_cset(obj)) {
      obj' = read_barrier(obj)
      if (obj == obj') {  // Not yet forwarded: implies evac-in-progress
        obj' = write_barrier(obj')
      }
      obj = obj'
    }
  }
}
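A minimal C++ model of both shapes can check that they select the same object and show which loads each one performs. It assumes Brooks-style forwarding (an object's `fwd` field points to itself until it is evacuated), that only collection-set objects are ever forwarded, and that every collection-set object is already forwarded once evacuation has finished. `Heap`, `Obj`, `Scenario`, and the counters are hypothetical stand-ins, not HotSpot code; the "gc-state loads" counter simply counts calls to the two state queries.

```cpp
#include <cassert>

// Brooks-style object: fwd == this until evacuated, then the to-space copy.
struct Obj { bool in_cset; Obj* fwd; };

// Hypothetical heap state plus counters for the loads each query performs.
struct Heap {
  bool stable, evac;
  Obj copy{false, nullptr};  // storage for the single to-space copy
  int gc_state_loads = 0, cset_loads = 0, rb_loads = 0;
  Heap(bool s, bool e) : stable(s), evac(e) {}

  bool is_heap_stable() { gc_state_loads++; return stable; }
  bool evac_in_progress() { gc_state_loads++; return evac; }
  bool in_cset(Obj* o) { cset_loads++; return o->in_cset; }
  Obj* read_barrier(Obj* o) { rb_loads++; return o->fwd; }
  Obj* write_barrier(Obj* o) {  // evacuates o unless already forwarded
    if (o->fwd != o) return o->fwd;
    assert(evac && "write barrier outside evacuation");
    copy.in_cset = false;
    copy.fwd = &copy;
    o->fwd = &copy;
    return &copy;
  }
};

Obj* old_prologue(Heap& h, Obj* obj) {
  if (!h.is_heap_stable()) {
    if (obj != nullptr) {
      if (h.evac_in_progress()) {
        obj = h.read_barrier(obj);
        if (h.in_cset(obj)) obj = h.write_barrier(obj);
      } else {
        obj = h.read_barrier(obj);
      }
    }
  }
  return obj;
}

Obj* new_prologue(Heap& h, Obj* obj) {
  if (obj != nullptr) {
    if (!h.is_heap_stable()) {
      if (h.in_cset(obj)) {
        Obj* fwd = h.read_barrier(obj);
        if (fwd == obj) fwd = h.write_barrier(fwd);  // not yet evacuated
        obj = fwd;
      }
    }
  }
  return obj;
}

enum Outcome { kNull, kSame, kCopy };
struct Scenario { bool stable, evac, is_null, in_cset, forwarded; };
struct Result { Outcome outcome; int gc_state_loads, cset_loads, rb_loads; };

// Rejects states the GC never produces under the assumptions above.
bool valid(const Scenario& s) {
  if (s.stable && (s.evac || s.in_cset)) return false;
  if (s.forwarded && !s.in_cset) return false;
  if (s.in_cset && !s.evac && !s.forwarded) return false;
  return true;
}

Result run(bool use_new, const Scenario& s) {
  Heap h(s.stable, s.evac);
  Obj obj{s.in_cset, nullptr};
  obj.fwd = &obj;
  if (s.forwarded) {  // already evacuated earlier
    h.copy.fwd = &h.copy;
    obj.fwd = &h.copy;
  }
  Obj* in = s.is_null ? nullptr : &obj;
  Obj* out = use_new ? new_prologue(h, in) : old_prologue(h, in);
  Outcome o = out == nullptr ? kNull : (out == &obj ? kSame : kCopy);
  return {o, h.gc_state_loads, h.cset_loads, h.rb_loads};
}

// Returns the number of valid scenarios, or -1 if the two shapes disagree.
int check_all_scenarios() {
  int checked = 0;
  for (int bits = 0; bits < 32; bits++) {
    Scenario s{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0,
               (bits & 8) != 0, (bits & 16) != 0};
    if (!valid(s)) continue;
    if (run(false, s).outcome != run(true, s).outcome) return -1;
    checked++;
  }
  return checked;
}
```

Under these assumptions `check_all_scenarios()` returns 12. For a non-null object outside the collection set during evacuation, `run(false, ...)` records two gc-state loads and one read-barrier, while `run(true, ...)` records one gc-state load and no read-barrier; both perform one in-cset lookup.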

This new shape shows measurable improvements in WB-heavy benchmarks,
particularly when used with LVB and with traversal (which also uses more WBs).

In addition to streamlining the generated code, this also streamlines
generation of the ideal graph:
- only one region and its related phis are used for all the branching
- uses an enum instead of magic numbers
- each test-generation routine now generates a single test, updates the
main ctrl, and returns the failing control
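The graph-construction pattern in this list can be sketched with a toy IR. Everything here (`Node`, `Graph`, `emit_test`, `build_prologue`, the `PATH_*` slot names) is hypothetical and only mirrors the idea, not C2's actual `RegionNode`/`PhiNode` API: each test helper emits one `If`, advances the main control to the passing branch, and returns the failing branch, which the caller wires into a single Region (and matching Phi) at a named enum slot.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR node (hypothetical; not HotSpot's Node class).
struct Node {
  std::string label;
  std::vector<Node*> in;
};

// Owns all nodes of the toy graph.
struct Graph {
  std::vector<std::unique_ptr<Node>> nodes;
  Node* make(const std::string& label, std::vector<Node*> in) {
    nodes.push_back(std::make_unique<Node>(Node{label, std::move(in)}));
    return nodes.back().get();
  }
};

// Named merge slots instead of magic numbers. As in C2, slot 0 of a Region
// is the Region itself, and slot 0 of a Phi is its Region.
enum PathSlot { PATH_NULL = 1, PATH_HEAP_STABLE, PATH_NOT_CSET, PATH_FWDED,
                PATH_EVAC, PATH_LIMIT };

// Emits exactly one test. 'ctrl' is advanced to the branch where the test
// passed (the barrier continues there); the failing branch is returned.
Node* emit_test(Graph& g, Node*& ctrl, const std::string& cond, Node* input) {
  assert(ctrl != nullptr);
  std::vector<Node*> ins{ctrl};
  if (input != nullptr) ins.push_back(input);
  Node* iff = g.make("If(" + cond + ")", ins);
  ctrl = g.make("IfTrue", {iff});
  return g.make("IfFalse", {iff});
}

struct Merge { Node* region; Node* phi; };

// Builds the streamlined prologue with one Region and one value Phi.
Merge build_prologue(Graph& g, Node* ctrl, Node* val) {
  Node* region = g.make("Region", {});
  Node* phi = g.make("Phi", {});
  region->in.assign(PATH_LIMIT, nullptr);
  phi->in.assign(PATH_LIMIT, nullptr);
  region->in[0] = region;
  phi->in[0] = region;

  region->in[PATH_NULL] = emit_test(g, ctrl, "val != null", val);
  phi->in[PATH_NULL] = val;
  region->in[PATH_HEAP_STABLE] = emit_test(g, ctrl, "!heap_stable", nullptr);
  phi->in[PATH_HEAP_STABLE] = val;
  region->in[PATH_NOT_CSET] = emit_test(g, ctrl, "in_cset(val)", val);
  phi->in[PATH_NOT_CSET] = val;

  Node* rb = g.make("ReadBarrier", {ctrl, val});
  region->in[PATH_FWDED] = emit_test(g, ctrl, "rb == val", rb);
  phi->in[PATH_FWDED] = rb;  // already forwarded: use the copy

  // In this toy IR one node stands for both the call's control and result.
  Node* wb = g.make("WriteBarrierCall", {ctrl, rb});
  region->in[PATH_EVAC] = wb;
  phi->in[PATH_EVAC] = wb;
  return {region, phi};
}
```

Because each helper hands back only its failing branch, the caller decides where that branch lands in the single merge, and adding or reordering a test means touching one enum entry and one call rather than renumbering nested regions.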

I need Roland to take a good look at this. I had to remove some code,
and I am not sure whether that is correct, or whether it might disturb
some important optimizations.

Testing: tier3_gc_shenandoah, various benchmarks (SPECjbb, SPECjvm)

http://cr.openjdk.java.net/~rkennke/streamline-wb-prologue/webrev.00/

Roman

