RFR: Common TLS access to GC state, where possible

Roman Kennke rkennke at redhat.com
Mon Jan 15 14:07:51 UTC 2018


On 15.01.2018 at 14:38, Aleksey Shipilev wrote:
> On 01/15/2018 01:27 PM, Roman Kennke wrote:
>> On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
>> I tried Roland's initial patch with traversal GC (against the then evac-in-progress flag), and
>> have seen occurrences of back-to-back evac load-and-check sequences that have not been commoned.
>> Roland is looking at it. I suggest we at least hold it back until this is resolved or confirmed
>> to be a separate issue.
> 
> This is a separate issue, having nothing to do with barrier moves. This is about commoning the TLS
> access, so that this:
> 
>   testb $0x2, 0x3d8(TLS)
>   jne SLOW
>   ...
>   testb $0x2, 0x3d8(TLS)
>   jne SLOW
>   ...
> 
> becomes:
> 
>   mov 0x3d8(TLS), %r11
>   and $0x2, %r11
>   test %r11, %r11
>   jne SLOW
>   ...
>   test %r11, %r11
>   jne SLOW
>   ...
> 
> ...saving the TLS access on back-to-back barriers, which are dormant anyhow.

Yes, this is what I was talking about, and I have still seen exactly 
those patterns after Roland's patch (at least for some cases).

>> Also, I am not sure if the patch already does it: what about also moving up the actual tests? And
>> thus creating longer paths with/without barriers? I suspect it would be slightly trickier now
>> because of the different masks that it needs to check? It might not be very useful with default
>> heuristics because we tend to interleave different barriers (SATB vs. evac), but may be
>> tremendously useful for traversal GC, where we only have one phase and can thus group all the
>> barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain
>> barrier-free in another?
> 
> Let's have some perspective, and not put all our eggs in one basket, okay? This patch helps the
> cases where (multiple) barriers cannot be optimized. It does not move the barriers around --
> instead, it makes their fastpaths faster by not accessing the TLS every time.
> 
> The whole machinery actually helps both SATB and WB checks, because after the recent GC state change both SATB
> and WB are checking against the same flag. It also aids future work, because it brings forward the
> matchers for generic GC state loads, not only evac-in-progress loads. If you want to have the
> barrier-free paths, you have to care about the generic GC state, not just evac-in-progress.
> 
> Please note the optimization is disabled by default, but we want the C2 scaffolding anyway.

Ok. This is not so separate though. What I was suggesting in this last 
comment was to also common the actual checks, so your above example 
could become (assuming same flags):

    mov 0x3d8(TLS), %r11
    and $0x2, %r11
    test %r11, %r11
    jne SLOW
    ...

I am not (yet) suggesting to move any barriers around. All I care about 
for now is commoning the loads, and when that works, also commoning the 
tests. This alone should lead to nice groups of barriers under one 
flag-load-test, and a fast path without barriers. Or not?
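
To make the three shapes concrete, here is a minimal sketch in C. It is a
model of the assembly sequences above, not HotSpot or C2 code. All names
(gc_state, load_gc_state, test_bit, slow_path, the counters) are hypothetical;
only the 0x2 mask comes from the example. It also assumes the flag cannot
change between the two barriers, which is what makes reusing the load and the
test legal in the first place.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; none of these names
 * come from HotSpot. The 0x2 mask is the one used in the example. */
enum { EVAC_BIT = 0x2 };

static uint8_t gc_state;    /* models the byte at 0x3d8(TLS) */
static int tls_loads;       /* number of simulated TLS accesses */
static int flag_tests;      /* number of flag tests executed */
static int slow_calls;      /* number of slow-path entries */

static void reset(uint8_t state) {
    gc_state = state;
    tls_loads = flag_tests = slow_calls = 0;
}

static uint8_t load_gc_state(void) { tls_loads++; return gc_state; }
static int test_bit(uint8_t s) { flag_tests++; return (s & EVAC_BIT) != 0; }
static void slow_path(void) { slow_calls++; }

/* Nothing commoned: every barrier reloads from TLS and retests. */
static void barriers_uncommoned(void) {
    if (test_bit(load_gc_state())) slow_path();
    /* ... */
    if (test_bit(load_gc_state())) slow_path();
}

/* Load commoned: one TLS access, but still one test per barrier. */
static void barriers_load_commoned(void) {
    uint8_t s = load_gc_state();
    if (test_bit(s)) slow_path();
    /* ... */
    if (test_bit(s)) slow_path();
}

/* Load and test commoned: one TLS access and one test guard both
 * barriers, leaving a barrier-free fast path when the flag is clear. */
static void barriers_test_commoned(void) {
    if (test_bit(load_gc_state())) {
        slow_path();
        /* ... */
        slow_path();
    }
}
```

With the flag clear, the first variant does two loads and two tests, the
second one load and two tests, and the third one load and one test. With the
flag set, all three enter the slow path twice.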

Roman

