Perf: SATB and WB coalescing

Wed Jan 10 20:42:26 UTC 2018

On 10.01.2018 at 21:29, Aleksey Shipilev wrote:
> On 01/10/2018 12:43 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 12:35 PM, Roman Kennke wrote:
>>> Ah!
>>> I made something like this a while ago and it hasn't gone in back then:
>>> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/
>>
>> I still think the phases themselves are inconvenient to encode, because they don't say everything
>> about the heap. For example, you would want to disambiguate the idle phase that has forwarded
>> objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe
>> just introducing separate "idle" and "idle-need-fixup" phases would be enough?
> 
> Ah, that is probably solved by treating need_update_refs specially.
> 
>> Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly.
> 
> Okay, so the dirty patch for the idea:
>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
> 
> perfasm for the offending test:
>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
> 
> Both SATB and WB are checking off the same TLS flag.
> 
> Now, two ideas:
> 
>   *) The way the patch is structured now, successful testb $0x0, 0x3d8(%r15) means no barriers are
> required until the next safepoint poll (e.g. no marking, no evac, no update-refs, no partial, and
> *no need to update refs*) -- which means the heap is as stable as it gets;
> 
>   *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
> -- which might be the lesser evil;
> 
> -Aleksey
> 

I have been discussing this with Roland since before Xmas. There seem to 
be ways to do that, but all of them are rather complex.

This could lead to split-ifs and versioned loops that generate code 
paths completely without barriers. E.g. code shaped like this:

while (..) { // Assuming no SP inside loop
   if (evac-in-progress) {
     barrier();
   }
   store();
}

could be turned into:
if (evac-in-progress) {
   while (..) {
     barrier();
     store();
   }
} else {
   while (..) {
     store();
   }
}
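
To make the transformation above concrete, here is a minimal Java sketch 
of the two loop shapes. It is only an illustration at the source level, 
not what C2 emits: evacInProgress stands in for the TLS flag, barrier() 
for a write barrier, and the class and method names are made up for this 
example. It assumes, as above, that the flag cannot change inside the 
loop because there is no safepoint in it.

```java
// Sketch only: evacInProgress stands in for the TLS gc-state flag and
// barrier() for a write barrier. All names here are hypothetical.
class LoopUnswitchSketch {
    static int barriersTaken = 0;

    static void barrier() {
        barriersTaken++;
    }

    // Original shape: the flag is tested on every iteration.
    static int checkedLoop(int[] dst, boolean evacInProgress) {
        barriersTaken = 0;
        for (int i = 0; i < dst.length; i++) {
            if (evacInProgress) {
                barrier();
            }
            dst[i] = i; // store()
        }
        return barriersTaken;
    }

    // Versioned shape: one test up front, then either a loop with
    // barriers or a completely barrier-free loop.
    static int versionedLoop(int[] dst, boolean evacInProgress) {
        barriersTaken = 0;
        if (evacInProgress) {
            for (int i = 0; i < dst.length; i++) {
                barrier();
                dst[i] = i;
            }
        } else {
            for (int i = 0; i < dst.length; i++) {
                dst[i] = i;
            }
        }
        return barriersTaken;
    }
}
```

Both versions perform the same stores and the same number of barriers as 
long as the flag is loop-invariant; the versioned one just pays for the 
check once instead of once per iteration.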

Currently we also suffer from another problem: all evac and SATB checks 
consume the raw memory slice, and things like SATB barriers produce new 
state on the raw memory slice (for no really good reason, except that we 
store to some non-Java memory). So we constantly pollute raw memory, and 
the compiler cannot trust the evac flags across multiple barriers or 
other code that produces raw memory!

Roland proposed to implement compiler optimization passes that 
specifically optimize gc-phase-checks with respect to safepoints.

I was thinking in a different direction: we could introduce a new 
special memory slice, e.g. Compile::SafepointIdx, with the meaning 
'stuff on this slice only ever changes at safepoints'. I.e. any node 
that is a safepoint or could trigger a safepoint (e.g. calls, allocs, 
etc.) would produce a new state on that slice, and GC-phase checks would 
consume it. This way, I think we could automatically get what we want by 
exploiting C2's memory aliasing model. According to Roland, this is not 
trivial either, though: currently SafepointNode (and its subclasses) 
does not produce any memory state. This might need a lot of work to get 
right.
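
To illustrate why a dedicated slice would help, here is a toy model in 
Java. It is not C2 code and does not use any C2 API: the Slice and Op 
enums and flagLoadsNeeded() are made up for this sketch. It assumes that 
a flag load can be reused as long as the memory state of the slice it 
reads from has not changed, that a raw store only bumps the raw slice, 
and that a safepoint invalidates both slices.

```java
import java.util.List;

// Toy model only, not C2 code. All names here are hypothetical.
class SliceModelSketch {
    enum Slice { RAW, SAFEPOINT }
    enum Op { FLAG_CHECK, RAW_STORE, SAFEPOINT }

    // Counts how many flag loads must really be emitted when the flag
    // is read from flagSlice. A load is reused while the state of that
    // slice is the same as at the previous load.
    static int flagLoadsNeeded(List<Op> ops, Slice flagSlice) {
        int rawState = 0;
        int safepointState = 0;
        int lastLoadedState = -1; // slice state seen by the last load
        int loads = 0;
        for (Op op : ops) {
            switch (op) {
                case RAW_STORE: {
                    rawState++; // e.g. a SATB barrier storing to raw memory
                    break;
                }
                case SAFEPOINT: {
                    rawState++; // modelling assumption: invalidates everything
                    safepointState++;
                    break;
                }
                case FLAG_CHECK: {
                    int state = (flagSlice == Slice.RAW) ? rawState : safepointState;
                    if (state != lastLoadedState) {
                        loads++;
                        lastLoadedState = state;
                    }
                    break;
                }
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(
            Op.FLAG_CHECK, Op.RAW_STORE,
            Op.FLAG_CHECK, Op.RAW_STORE,
            Op.FLAG_CHECK, Op.SAFEPOINT,
            Op.FLAG_CHECK);
        System.out.println("flag on raw slice:       " + flagLoadsNeeded(ops, Slice.RAW));
        System.out.println("flag on safepoint slice: " + flagLoadsNeeded(ops, Slice.SAFEPOINT));
    }
}
```

In this model the flag on the raw slice has to be reloaded after every 
raw store, while the flag on the safepoint-only slice is loaded once per 
safepoint interval.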

Roman