RFR: 8201318: Introduce GCThreadLocalData to abstract GC-specific data belonging to a thread

Wed Apr 11 15:56:03 UTC 2018

On 04/11/2018 02:59 PM, Aleksey Shipilev wrote:
> On 04/11/2018 02:37 PM, Per Liden wrote:
>>>> Also, I've shrunk GCThreadLocalData to 112 bytes, which is what G1 currently needs. ZGC currently
>>>> uses 156 bytes (IIRC), but that's a change that should stay in the ZGC repo for now.
>>>
>>> Oh. Haven't expected we need that much. This makes me think if we should implement some sort of
>>> STATIC_ASSERT-based overflow checks? Not necessarily in this RFE.
>>
>> Actually, it's already there, in Thread::gc_data().
> 
> Right on!
> 
>>> *) Also mention the choice of <128 bytes? Like this:
>>>
>>>    // The size of GCTLD is kept under 128 bytes to let compilers encode
>>>    // accesses to GCTLD members with short offsets from the Thread.
>>
>> Hmm, I don't quite think this captures the crux of this. To optimize for code compactness, you just
>> need to make sure that the data referenced by generated barriers is kept within a 127 byte offset.
>> For example, in ZGC we only ever reference the first word of the GCTLD in the generated barriers
>> (i.e. well below 127). The rest of the data is used when we hit the slow path, and then we're
>> executing in the VM and don't care about the offset. So, from this point of view, the GCTLD could
>> have any size and it wouldn't matter for code compactness. The same is true for G1 and Shenandoah, i.e.
>> their generated barriers would be unaffected by an increase in GCTLD size.
> 
> That's right. My point was to make the stronger and simpler statement about all fields in GCTLD: if
> you put the field in current GCTLD, the access to it from the generated code would likely be
> optimal. I guess "slow parts" of ZGC's GCTLD is a good counter-example. Nevertheless, I find it
> better to capture the salient points about performance in the implementation notes near the code:
> 
>     // The current size of GCTLD is kept under 128 bytes to let compilers encode
>     // accesses to all GCTLD members with short offsets from the Thread. This is not
>     // a hard requirement: we can have fields past 128+ bytes, but low-offset fields would
>     // be more efficient to access. Therefore, consider putting more frequently used
>     // fields first in GCTLD, in case GCTLD size is extended in future and/or moved
>     // within the Thread.

I have the feeling we mean the same thing, but maybe we're coming at 
this from slightly different points of view. To me, talking about the 
size of GCTLD risks misleading the reader. We're not really keeping the 
size under 128 bytes today because of instruction encoding. We keep 
it at 112 because that's enough to cover our current space needs. 
However, I think the last part of your proposal captures the main point, 
i.e. placing frequently accessed fields first is generally a good idea 
to optimize instruction encoding. So, to keep things simple, how about this:

...
// Use Thread::gc_data<T>() to access the data, where T is the
// GC-specific type describing the structure of the data. GCs
// should consider placing frequently accessed fields first in
// T, so that field offsets relative to Thread are small, which
// often allows for a more compact instruction encoding.
...

I'm intentionally not stating a specific offset limit here, since that 
is architecture dependent.
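
To make the two points concrete (the compile-time size check in 
gc_data<T>(), and hot fields placed first), here is a rough standalone 
sketch. It is not the actual patch: SketchThread, SketchGCData and 
barrier_word_offset() are hypothetical names made up for illustration, 
and the 112-byte storage simply mirrors the figure mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the opaque per-thread GC storage. The 112 bytes
// mirror the size discussed in this thread; the layout is an assumption.
struct GCThreadLocalData {
  alignas(8) unsigned char bytes[112];
};

// Hypothetical, heavily reduced thread class; this is not HotSpot's Thread.
class SketchThread {
  void*             _other_state;  // placeholder for unrelated thread fields
  GCThreadLocalData _gc_data;      // opaque storage handed to the active GC

public:
  // Typed view of the storage. The static_assert rejects, at compile time,
  // any GC-specific type T that does not fit into the reserved space.
  template <typename T>
  T* gc_data() {
    static_assert(sizeof(T) <= sizeof(GCThreadLocalData),
                  "GC-specific data does not fit in GCThreadLocalData");
    return reinterpret_cast<T*>(&_gc_data);
  }

  // Offset of the storage within the thread object.
  static size_t gc_data_offset() { return offsetof(SketchThread, _gc_data); }
};

// Hypothetical GC-specific data: the word read by generated barriers comes
// first, the state only touched on the slow path comes after it.
struct SketchGCData {
  uintptr_t     barrier_word;         // hot: referenced by generated barriers
  unsigned char slow_path_state[96];  // cold: only used inside the VM
};

// Offset a generated barrier would use to reach the hot word, relative to
// the thread object.
inline size_t barrier_word_offset() {
  return SketchThread::gc_data_offset() + offsetof(SketchGCData, barrier_word);
}
```

With this layout the hot word sits at a small offset from the thread, 
regardless of how much cold state follows it. Whether a small offset 
actually gives a shorter encoding depends on the architecture.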

Reasonable?

cheers,
Per

> 
> Thanks,
> -Aleksey
>