Perf: SATB and WB coalescing
Roman Kennke
rkennke at redhat.com
Wed Jan 10 20:42:26 UTC 2018
On 10.01.2018 at 21:29, Aleksey Shipilev wrote:
> On 01/10/2018 12:43 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 12:35 PM, Roman Kennke wrote:
>>> Ah!
>>> I made something like this a while ago and it hasn't gone in back then:
>>> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/
>>
>> I still think the phases themselves are inconvenient to encode, because they don't say everything
>> about the heap. For example, you would want to disambiguate the idle phase that has forwarded
>> objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe
>> just introducing separate "idle" and "idle-need-fixup" phases would be enough?
>
> Ah, that is probably solved by treating need_update_refs specially.
>
>> Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly.
>
> Okay, so the dirty patch for the idea:
> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>
> perfasm for the offending test:
> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>
> Both SATB and WB are checking off the same TLS flag.
>
> Now, two ideas:
>
> *) The way the patch is structured now, successful testb $0x0, 0x3d8(%r15) means no barriers are
> required until the next safepoint poll (e.g. no marking, no evac, no update-refs, no partial, and
> *no need to update refs*) -- which means the heap is as stable as it gets;
>
> *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
> -- which might be the lesser evil;
>
> -Aleksey
>
I have been discussing this with Roland since before Xmas. There seem
to be ways to do that, but all of them are rather complex.
This could lead to split-ifs and versioned loops that generate code
paths completely without barriers. E.g., code shaped like this:
  while (..) { // Assuming no SP inside loop
    if (evac-in-progress) {
      barrier();
    }
    store();
  }
Could be:
  if (evac-in-progress) {
    while (..) {
      barrier();
      store();
    }
  } else {
    while (..) {
      store();
    }
  }
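
To make the transformation above concrete, here is a minimal Java sketch
of the same loop versioning done by hand. It is only an illustration: the
names (LoopVersioningSketch, evacInProgress, barrier, barrierCalls) are
hypothetical, the static field merely stands in for the thread-local GC
state flag, and the real transformation would be performed by C2 on its
IR, not in Java source. As in the pseudocode, it assumes there is no
safepoint inside the loop, so the flag cannot change between iterations.

```java
// Hypothetical illustration only; not Shenandoah or C2 code.
final class LoopVersioningSketch {
    // Stands in for the thread-local GC state flag (assumption).
    static boolean evacInProgress;
    // Counts barrier executions so the two loop shapes can be compared.
    static int barrierCalls;

    static void barrier() {
        barrierCalls++;
    }

    // Original shape: the flag is re-checked on every iteration.
    static void checkedLoop(int[] dst, int value) {
        for (int i = 0; i < dst.length; i++) {
            if (evacInProgress) {
                barrier();
            }
            dst[i] = value; // the store
        }
    }

    // Versioned shape: the flag is read once and one of two loop
    // copies is selected. Only valid while the flag cannot change,
    // i.e. when no safepoint can occur inside the loop.
    static void versionedLoop(int[] dst, int value) {
        if (evacInProgress) {
            for (int i = 0; i < dst.length; i++) {
                barrier();
                dst[i] = value;
            }
        } else {
            // Happy path: no barrier and no flag check at all.
            for (int i = 0; i < dst.length; i++) {
                dst[i] = value;
            }
        }
    }
}
```

Both methods perform the same stores and the same number of barrier
calls for a given flag value; the versioned form just moves the check
out of the loop body.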
Currently we also suffer from other problems: since all evac and SATB
checks consume the raw memory slice, and things like SATB barriers
produce on the raw memory slice (for no really good reason, except that
we store to some non-Java memory), we constantly pollute raw memory, so
the compiler does not trust the evac flags across multiple barriers or
other code that produces raw memory!
Roland proposed to implement compiler optimization passes that
specifically optimize gc-phase-checks with respect to safepoints.
I was thinking in a different direction: we could introduce a new
special memory slice, e.g. Compile::SafepointIdx, with the meaning
'stuff on this slice only ever changes at safepoints'. I.e. any node
that is a safepoint or could trigger a safepoint (e.g. calls, allocs,
etc.) would produce a new state on that slice. GC-phase-checks would
consume it. This way, I think we could automatically get what we want by
exploiting C2's memory aliasing model. According to Roland, this is not
trivial either, though: currently SafepointNode (and its subclasses)
don't produce any memory state. This might need lots of work to get right.
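
The effect of a dedicated slice can be sketched with a toy model in
Java. This is not C2 code and does not use any HotSpot API: the class
SliceSketch, its enums, and the rule "a flag load can be reused until
something produces new state on its slice" are all simplifying
assumptions made for illustration. The model also assumes that a
safepoint poll invalidates the flag under either scheme, since that is
where the GC state may change.

```java
import java.util.List;

// Toy model (assumption): counts how many flag loads survive when a
// load may be reused until its memory slice gets a new state.
final class SliceSketch {
    enum Slice { RAW, SAFEPOINT }
    enum Kind { FLAG_CHECK, SATB_BARRIER, SAFEPOINT_POLL }

    static int flagLoadsNeeded(List<Kind> code, Slice flagSlice) {
        boolean cached = false; // is a previously loaded flag value still trusted?
        int loads = 0;
        for (Kind k : code) {
            switch (k) {
                case FLAG_CHECK:
                    if (!cached) {
                        loads++;      // must reload the flag from memory
                        cached = true;
                    }
                    break;
                case SATB_BARRIER:
                    // The SATB barrier produces raw memory state, so it
                    // only clobbers the flag if the flag lives on RAW.
                    if (flagSlice == Slice.RAW) {
                        cached = false;
                    }
                    break;
                case SAFEPOINT_POLL:
                    // GC state may change here, so the flag is never
                    // trusted across a safepoint in this model.
                    cached = false;
                    break;
            }
        }
        return loads;
    }
}
```

For a sequence of check, SATB barrier, check, SATB barrier, check,
safepoint poll, check, the model reloads the flag at every check when
the flag sits on the raw slice, and only after the safepoint poll when
it sits on the hypothetical safepoint-only slice.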
Roman