RFR: Load reference barriers

Thu Feb 14 16:45:58 UTC 2019

I would like to propose we switch to what we came to call 'load
reference barrier' as the new barrier scheme for Shenandoah GC.

The main difference is that instead of ensuring the correct invariant
whenever we access anything in the heap (e.g. read-barrier before reads,
write-barrier before writes, plus a bunch of other stuff), we ensure the
invariant on objects when they get loaded, by employing what is
currently our write-barrier.
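
To make the idea concrete, here is a minimal single-threaded toy model in
Python. It is only a sketch, not the HotSpot code: the names Obj, in_cset,
fwd, load_reference_barrier and load_ref_field are made up for illustration,
and a real implementation would install the forwarding pointer atomically.

```python
class Obj:
    """Toy heap object. `fwd` is the forwarding pointer (self if not copied)."""
    def __init__(self, in_cset=False, **fields):
        self.in_cset = in_cset        # hypothetical flag: object is being evacuated
        self.fwd = self
        self.fields = dict(fields)


def load_reference_barrier(ref):
    """Return the to-space copy of `ref`, evacuating it first if needed."""
    if ref is None or not ref.in_cset:
        return ref                    # fast path: nothing to do
    if ref.fwd is ref:                # not copied yet: evacuate now
        ref.fwd = Obj(in_cset=False, **ref.fields)
    return ref.fwd


def load_ref_field(holder, name):
    """Load a reference field; the barrier runs here and nowhere else."""
    return load_reference_barrier(holder.fields[name])


old = Obj(in_cset=True, x=1)
holder = Obj(ref=old)
a = load_ref_field(holder, "ref")
b = load_ref_field(holder, "ref")
a.fields["x"] = 2                     # plain store, no barrier
print(a is b, b.fields["x"])          # plain compare and plain read
```

Once every loaded reference has passed through the barrier, the later
store, read and comparison in the usage lines operate on the same copy
without any further barriers.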

The reasons why I'm proposing it are:
- simpler barrier interface
- easier to get good performance out of it
  ==> good for upcoming Graal (sup)port
- reduced maintenance burden (I intend to backport it all the way)

This has a number of advantages:
- Strong invariant means it's a lot easier to reason about the state of
GC and objects
- Much simpler barrier interface. In fact, a lot of stuff that we added
to barrier interfaces after JDK11 will now become unused: no need for
barriers on primitives, no need for object equality barriers, etc. Also,
some C2 stuff that we added for Shenandoah can now be removed again.
- Optimization is much easier: we currently put barriers 'down low'
close to their uses (which might be inside a hot loop), and then work
hard to optimize barriers upwards, e.g. out of loops. By using
load-ref-barriers, we would place them at the outermost site already.
Look how much code is removed from shenandoahSupport.cpp!
- No more need for object equals barriers.
- No more need for 'resolve' barriers.
- All barriers are now conditional, which opens up opportunity for
further optimization later on.
- we can re-enable the fast JNI getfield stuff
- we no longer need the nmethod initializer that initializes embedded
oops to to-space
- We no longer have the problem of using two registers for 'the same'
value (pre- and post-barrier). We can eliminate the corresponding
optimization pass and remaining shared code changes in block.hpp and lcm.cpp
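
The placement point in the list above can be illustrated with a small
Python sketch that only counts barrier executions. It assumes a
hypothetical stand-in barrier (CountingBarrier) and shows the unoptimized
shape of both schemes; it does not model what C2 manages to hoist today.

```python
class CountingBarrier:
    """Stand-in barrier that only counts how often it runs."""
    def __init__(self):
        self.calls = 0

    def __call__(self, ref):
        self.calls += 1
        return ref                    # a real barrier would return the to-space copy


def sum_barrier_at_use(holder, barrier):
    arr = holder["arr"]               # plain load, no barrier
    total = 0
    for i in range(len(arr)):
        total += barrier(arr)[i]      # barrier next to the use, inside the loop
    return total


def sum_barrier_at_load(holder, barrier):
    arr = barrier(holder["arr"])      # barrier right after the load, outside the loop
    total = 0
    for i in range(len(arr)):
        total += arr[i]               # plain access, no barrier
    return total
```

With a four-element array, the first variant runs the barrier four times
and the second runs it once, with no hoisting pass involved.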

The 'only' optimizations that we do in C2 are:
- Look upwards and see if barrier input indicates we don't actually need
the barrier. This is the case for constants, nulls, method parameters,
etc. (anything that is not like a load). Even though we insert barriers
after loads, you'd be surprised to see how many loads actually disappear.
- Look downwards to check uses of the barrier. If it doesn't feed into
anything that requires a barrier, we can remove it.
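
The two checks above can be sketched on a tiny toy IR in Python. This is
not the C2 code: Node, NEEDS_BARRIER, can_elide and elide_barriers are
hypothetical names, and which uses count as 'requiring a barrier' is an
assumption of the sketch (here a null check is treated as not needing one).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                  # compare nodes by identity
class Node:
    kind: str                         # "load", "const", "null", "param", "barrier", or a use kind
    inputs: list = field(default_factory=list)


# Assumption for this sketch: only these use kinds need a to-space reference.
NEEDS_BARRIER = {"field_access", "store_ref", "call"}


def can_elide(barrier, nodes):
    # Look upwards: anything that is not a load needs no barrier.
    if barrier.inputs[0].kind != "load":
        return True
    # Look downwards: keep the barrier only if some use requires it.
    uses = [n for n in nodes if any(i is barrier for i in n.inputs)]
    return not any(u.kind in NEEDS_BARRIER for u in uses)


def elide_barriers(nodes):
    removable = [n for n in nodes if n.kind == "barrier" and can_elide(n, nodes)]
    for b in removable:
        for n in nodes:               # rewire users to the barrier's input
            n.inputs = [b.inputs[0] if i is b else i for i in n.inputs]
    return [n for n in nodes if not any(n is b for b in removable)]
```

A barrier on a method parameter or one that only feeds a null check is
dropped, while a barrier on a load that feeds a field access stays.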

Performance doesn't seem to be negatively impacted at all. Some
benchmarks benefit from it. I see confusing results from the
CryptoAes benchmark, which we are currently investigating; however, they
don't seem related to barrier impact or GC activity. Probably
run-to-run variance.

Testing: hotspot_gc_shenandoah, SPECjvm2008, SPECjbb2015, all of them
many times
Webrev:
http://cr.openjdk.java.net/~rkennke/load-ref-barriers/webrev.00/

Thanks to Roland for helping out many times on C2!!