Perf: SATB and WB coalescing

Wed Jan 10 12:22:43 UTC 2018

> That confirms what I have suspected for a while. And I also sorta hope that 
> the traversal GC will solve it, because it only ever polls a single 
> flag. We might even want to wrap RBs into evac-flag-checks initially, so 
> that the optimizer can coalesce them too, and remove lone 
> evac-checks-around-RBs after optimization.
> 
> Another related issue may be that both the GC barriers and a bunch of 
> other stuff pollute the raw memory slice. Which means that an 
> interleaving allocation (among other stuff) in between barriers may 
> prevent coalescing and optimization. I wonder if it makes sense to put 
> all GC barriers on a separate memory slice instead? We basically need a 
> memory slice that says 'stuff on this slice only ever changes at 
> safepoints'.

Allocations are probably a bad example, because allocations *can* 
trigger safepoints (on the slow path). Not sure whether we could generate 
barrier-free paths for code that allocates but never takes the alloc slow path?

A better example is indeed SATB barriers: they currently consume and 
produce the raw memory slice, which means that they disturb optimizations 
of other barriers. I.e. they force a re-load and re-check of the 
-in-progress-flags (and thus prevent coalescing them). As you noted, SATB 
barriers are particularly bad because they tend to interleave with RBs 
and WBs.
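
To make the slice argument concrete, here is a toy model in Python (not 
C2 code, just a sketch). All names in it (Op, GC_STATE, coalesce_loads, 
barrier_sequence) are made up for illustration, and the "optimizer" is a 
trivial redundant-load elimination. It only assumes what is said above: 
a flag load can be reused until something produces a new state of the 
slice it reads, and an SATB enqueue consumes and produces raw memory.

```python
from dataclasses import dataclass

RAW = "raw"            # the shared raw memory slice
GC_STATE = "gc-state"  # hypothetical slice that only changes at safepoints

@dataclass(frozen=True)
class Op:
    name: str
    reads: frozenset   # slices whose state this op observes
    writes: frozenset  # slices this op produces a new state for

def flag_load(slice_name):
    return Op("load-gc-flag", frozenset({slice_name}), frozenset())

def satb_enqueue():
    # Modeled as described above: consumes and produces raw memory.
    return Op("satb-enqueue", frozenset({RAW}), frozenset({RAW}))

def safepoint():
    # A safepoint may change everything, including the GC state.
    both = frozenset({RAW, GC_STATE})
    return Op("safepoint", both, both)

def coalesce_loads(ops):
    """Drop a load when an identical earlier load is still valid, i.e. no
    op in between produced a new state of a slice that the load reads."""
    available = set()  # loads whose value is still known
    out = []
    for op in ops:
        if not op.writes and op in available:
            continue  # redundant load, reuse the earlier one
        out.append(op)
        if op.writes:
            # Invalidate every remembered load that reads a written slice.
            available = {l for l in available if not (l.reads & op.writes)}
        else:
            available.add(op)
    return out

def count_flag_loads(ops):
    return sum(1 for op in ops if op.name == "load-gc-flag")

def barrier_sequence(flag_slice):
    # flag check, SATB enqueue, flag check, SATB enqueue, flag check
    return [flag_load(flag_slice), satb_enqueue(),
            flag_load(flag_slice), satb_enqueue(),
            flag_load(flag_slice)]

on_raw = coalesce_loads(barrier_sequence(RAW))
on_own_slice = coalesce_loads(barrier_sequence(GC_STATE))
print(count_flag_loads(on_raw), count_flag_loads(on_own_slice))  # prints: 3 1
```

With the flag loads on the raw slice, every SATB enqueue kills the 
previous load, so all three checks survive. With the flag on its own 
slice only a safepoint kills it, and the three checks collapse into one.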

There are other things that produce raw memory, but cannot cause a 
safepoint that would disturb us similarly (e.g. monitorexit).

Ideally, when the new GC interface arrives, we'll get to generate the 
whole blob for 'store-oop-to-heap', in which case we can generate one 
gc-phase-check to begin with, and put all relevant barriers inside that 
check (...and still be subject to further coalescing, path-splitting 
and loop hoisting in later optimization phases).
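
Roughly what I have in mind for the 'store-oop-to-heap' blob, again as a 
toy Python sketch and not as actual barrier code. ToyGC, read_phase and 
the two store_oop_* functions are hypothetical names, heap objects are 
plain dicts, and for simplicity the sketch assumes that marking and 
evacuation are mutually exclusive phases. The only point is the number 
of flag polls per store.

```python
from dataclasses import dataclass, field

IDLE, MARKING, EVACUATING = "idle", "marking", "evacuating"

@dataclass
class ToyGC:
    phase: str = IDLE
    polls: int = 0                                 # how often a phase flag was read
    satb_queue: list = field(default_factory=list)
    forwarded: dict = field(default_factory=dict)  # id(old copy) -> new copy

    def read_phase(self):
        self.polls += 1
        return self.phase

    def satb_barrier(self, previous):
        # Remember the value that is about to be overwritten.
        if previous is not None:
            self.satb_queue.append(previous)

    def write_barrier(self, obj):
        # Resolve to the to-space copy if there is one.
        return self.forwarded.get(id(obj), obj)

def store_oop_separate_checks(gc, obj, name, value):
    # Each barrier polls its own flag: two polls per store.
    if gc.read_phase() == EVACUATING:
        obj = gc.write_barrier(obj)
    if gc.read_phase() == MARKING:
        gc.satb_barrier(obj.get(name))
    obj[name] = value

def store_oop_single_check(gc, obj, name, value):
    # One gc-phase-check up front, all barriers inside it.
    phase = gc.read_phase()
    if phase != IDLE:
        if phase == EVACUATING:
            obj = gc.write_barrier(obj)
        elif phase == MARKING:
            gc.satb_barrier(obj.get(name))
    obj[name] = value

gc_a, gc_b = ToyGC(), ToyGC()
o = {"f": "old"}
store_oop_separate_checks(gc_a, o, "f", "x")
store_oop_single_check(gc_b, o, "f", "y")
print(gc_a.polls, gc_b.polls)  # prints: 2 1
```

In the idle case the single-check version is one poll and a plain 
store, and everything behind the check stays out of the fast path.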

Roman