[master] RFR: Implement self-forwarding of objects that preserves header bits

Wed Jul 7 20:10:05 UTC 2021

On Wed, 7 Jul 2021 17:29:50 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> In a few places in GCs we self-forward objects to indicate promotion failures. This is problematic with Lilliput because it irreversably overrides header bits.
>> 
>> I propose to address this problem by using the recently-free'd biased-locking bit to indicate self-forwarding, without actually writing the ptr-to-self in the header.
>> 
>> A few notes:
>>  - The code in g1FullGCCompactionPoint.cpp keeps degenerating into ugly mess. This certainly warrants some rewriting.
>>  - We have some naked header-decodings, which get tidied-up. This could also be brought upstream.
>>  - cas_forward_to() kinda duplicates forward_to_atomic(), and has been replaced by the new forward_to_self_atomic(). It could be unduplicated upstream, too.
>> 
>> An alternative *may be* to preserve the header of self-forwarded objects in a side-table (like PreservedMarksStack) instead. This may be possible but hairy: we could not access the compressed-klass* in the upper bits until the header gets restored. (This also affects calls to size(), etc).
>> 
>> The ex-biased-locking-bit may still be used in regular operation. It only acts as self-forwarding-indicator when the lower two bits are also set. It requires the usual marks-preservation if we do this.
>> 
>> We might want to have a discussion which project would need header bits, and how to realistically allocate them. #4522 mentions Valhalla as possible taker of the BL header bit. We may be able to free one or two bits when we compress the klass* even more. For example, we currently use 25 bits for i-hash, and 32 bits for nklass*. We usually want 32bits for i-hash instead. This would leave 25bits for nklass, which can address 268MB of Klass*-space (usual compression scheme), or 32million classes (table-lookup), or something in-between if we use fixed-size Klass (seems unrealistic though). Taking away another bit mean halving the addressable space.
>> 
>> (It would be kinda-nice to have the BL-bit for Shenandoah, too, and for a similar purpose: indicate evacuation failure. But we do have a working solution, however that is ugly and affects performance a little.)
>
> src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 171:
> 
>> 169:   // We only re-prepare objects forwarded within the current region, so
>> 170:   // skip objects that are already forwarded to another region.
>> 171:   if (obj->is_forwarded()) {
> 
> Yeah, this seems like a nice micro-optimization for upstream. Do you want to take care of it?

Naa, this doesn't really do much, unless we extract the header once, and then check is_forwarded() on it, and avoid decoding if it's not forwarded - but even then this seems to be too micro ;-)
Here it is not an optimization but a necessary change, because we cannot blindly decode the forwardee anymore.

> src/hotspot/share/oops/markWord.hpp line 135:
> 
>> 133:   static const uintptr_t marked_value             = 3;
>> 134: 
>> 135:   static const uintptr_t self_forwarded_value     = 1 << self_forwarded_shift;
> 
> This actually looks like `self_forwarded_in_place`? Because `self_forwarded_value` is just `1`.

It's actually self_forwarded_mask_in_place, I removed the _value and simplified the set_self_forwarded() method.

> src/hotspot/share/oops/oop.inline.hpp line 314:
> 
>> 312:       return forwardee(old_mark);
>> 313:     } else {
>> 314:       compare = old_mark;
> 
> Is it even possible to get to this branch? I.e. when CAS fails, aren't we guaranteed to see `old_mark.is_marked()`?

I don't think so. I thought a little too far here, Shenandoah-style territory. I removed this branch, and the whole loop.

-------------

PR: https://git.openjdk.java.net/lilliput/pull/10