RFR: Eliminate extra forwarding pointer per object

Wed May 8 10:07:14 UTC 2019

>>> *) Maybe we should fork introducing new tmp to CAS? That would make the patch easier to backport, I
>>> think.
>>
>> Not sure. The change would make the register allocators free one more register but then not use it.
>> I left it out of this iteration for now.
> 
> Yeah, the extra cleanness in the final patch might not worth the effort of doing the CAS change
> beforehand. Maybe we should break some overly long lines in AArch64 CAS code (compare sh_aarch64.ad
> and sh_x86_64.ad), while we are at it.
> 
> 
>>> I actually think that we can keep ShenandoahBrooksPointer and put all that util code there. It would
>>> still be doing its job: abstracting away the forwarding data handling.
>>
>> I reinstated the ShenandoahForwarding utility class. However, it is hard to abstract the protocol
>> because the two parts strictly belong together. However, this seems all the more reason to abstract
>> it properly. The only thing that happens in between the two blocks is actually allocating and
>> copying the object. So we could call into, say, ShenandoahForwarding::evacuate() which would do the
>> prefix and suffix, and call back to say copy = ShHeap::allocate_and_copy(obj); or such for the
>> payload. What do you think?
> 
> I don't understand off-hand why the protocol is different. For example, why do you need to drag
> old_mark in SH:evacuate_object? You can still have try_update_forwardee to check that "old" mark is
> not forwarded yet, and use that?

But that would complicate the protocol, wouldn't it?
Currently we check if object has been forwarded before we actually do 
the copy. If we see it is already forwarded, we bail out right away. If 
we didn't 'remember' the old_mark at this point, we wouldn't know what 
to compare with when attempting the CAS. Yes, we probably could re-load 
it afterwards but that would require that we check again if another 
thread beat us just like we did at the start. It seems more efficient 
(fewer mark-loads) and cleaner to me to enclose the actual copy with the 
protocol like I did.

> Pseudocode for try_update_forwardee:
> 
> // returns "true" on success
> bool try_update_forwardee(p, copy_val):
>    markOop old = p->mark_raw();
>    if (old->is_marked()):
>      return false;
>    markOop new_mark = markOopDesc::encode_pointer_as_mark(copy_val);
>    markOop prev_mark = p->cas_set_mark_raw(new_mark, old_mark);
>    return prev_mark == old_mark;
> 
> ...then:
> 
> if (try_update_forwardee(p, copy_val)):
>    shenandoah_assert_correct(NULL, copy_val);
>    return copy_val;
> } else {
>    return ShenandoahForwarding::get_forwardee(p);
> }

Yes, that would work, but doesn't look very efficient. The CAS already 
gives us the offending new mark-word, we don't actually need to re-load 
it. And also, we don't need to re-load it between head and tail of the 
protocol either. Let me try to come up with a cleaner expression of the 
protocol.

>>> *) In ShenandoahMarkCompact, do these need to be public?
>>>
>>>     76 public:
>>>     77   void preserve_mark(oop obj);
>>>     78   void restore_marks();
>>>     79   void adjust_marks();
>>
>> I forgot to mention ShenandoahMarkCompact. This completely overhauls how ShMarkCompact stores
>> forwarding information. It's changed to basically work like other GCs and stores the forwarding in
>> the mark-word. However, the mark word must sometimes be preserved. Therefore we store it to the
>> side. The 3 methods above handle that. They are public because they are called from closures.
>> Alternatively, we could make the closures friends of ShMarkCompact. ?
> 
> Yes, befriending closures would look cleaner, I think.

Ok, will do.

>> Updated patch:
>> http://cr.openjdk.java.net/~rkennke/eliminate-fwdptr/webrev.01/
> 
> Other nits:
> 
>   *) The dance like that needs some rationale/pseudo-code?
> 
>   326   Label done;
>   327   __ movptr(tmp, Address(dst, oopDesc::mark_offset_in_bytes()));
>   328   __ notptr(tmp);
>   329   __ testb(tmp, markOopDesc::marked_value);
>   330   __ jccb(Assembler::notZero, done);
>   331   __ orptr(tmp, markOopDesc::marked_value);
>   332   __ notptr(tmp);
>   333   __ mov(dst, tmp);
>   334   __ bind(done);

Ah yeah. It is surprisingly awkward to write efficient asm for the 
decoding. I'll explain it in comments.

>   *) In here, you can call get_forwardee_raw_unchecked to have only one shenandoah_assert_correct assert:
> 
>    45 inline oop ShenandoahForwarding::get_forwardee(oop obj) {
>    46   shenandoah_assert_correct(NULL, obj);
>    47   return oop(get_forwardee_raw(obj));
>    48 }

Ok, will do.

Thanks,
Roman