RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

David Holmes david.holmes at oracle.com
Fri Oct 7 12:08:25 UTC 2016


Thomas,

 > This change "only" impacts ppc64 at this time.

This is a dangerous stance. The changes are to shared code. It only 
happens that only PPC atomic implementations support anything other than 
conservative barriers today. If someone adds additional forms on other 
platforms their GC code suddenly has new behaviour!

Changes in shared code must be algorithmically correct on all platforms. 
Not just "it will work fine today".

Given all then work being done to add missing barriers, removing them 
must come with a detailed analysis establishing the safety of doing so. 
And I am not seeing that here.

David

On 7/10/2016 8:38 PM, Thomas Schatzl wrote:
> Hi,
>
> On Thu, 2016-10-06 at 18:16 -0400, Kim Barrett wrote:
>>>
>>> On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com>
>>> wrote:
>>>
>>> On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
>>>>
>>>> Dear David,
>>>>
>>>> Thank you for your comments.
>>>>
>>>> I just used to think that it may be better that
>>>> copy_to_survivor_space
>>>> doesn't return forwardee if CAS was failed in order to prevent
>>>> from
>>>> reading fields in forwardee. But as you pointed, this extends fix
>>>> for
>>>> this topic.
>>>>
>>>> I removed two NULL assignments from the previous wevrev.
>>>> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
>>> Which simply takes us back to where we were. It may not be safe for
>>> the caller of those methods to access the fields of the returned
>>> "forwardee".
>>>
>>> Sorry but I'm not seeing anything here that justifies removing the
>>> barriers from the cas in this code. GC lurkers feel free to jump in
>>> here - this is your code afterall! ;-)
>>>
>>> David
>>> -----
>> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
>> to me to be extremely fragile and hard to reason about.  The places
>> where that copied object might escape to and be examined seem to be
>> myriad.  And not only do we need to worry about them today, but also
>> for future maintenance.  Even if it can modified and shown to be
>> correct today, it would be very easy to intoduce a bug later, as
>> should be obvious from the various issues pointed out so far during
>> this review.
>>
>> The key issue here is that we copy obj into new_obj, and then make
>> new_obj accessible to other threads via the CAS.  Those other threads
>> might attempt to access data in new_obj.  This suggests the CAS ought
>> to have at least a release fence to ensure the copy is complete
>> before the CAS is performed.  No amount of fencing on the read side
>> (such a in the work stealing) can remove that need.
>
> Depending on what "other threads" means.
>
> The thread that pops the reference should be okay (as it does a fence),
> because the thread pushing the entry on the mark stack also releases
> all stores.
>
> Threads not participating in this protocol are problematic, and this is
> indeed worrying me as well a bit.
> I have not seen any so far, but there is always a risk of overlooking
> some place.
>
>> And that might be all that is needed.  On the post-CAS side, we load
>> the forwardee and then load values from it.  I thik we can use
>> implicit consume with dependent loads (except on Alpha) plus the
>> suggested release fence to get the desired effect.  (If not, use an
>> acquire form of forwardee()?)
>>
>> I'm not certain that just a release fence is sufficient (I'm less
>> familiar with ParallelGC than I'd like for looking at something like
>> this), but I'm pretty sure I wouldn't want to go any weaker than
>> that.
>
> This change "only" impacts ppc64 at this time.
>
> Thanks,
>   Thomas
>


More information about the hotspot-compiler-dev mailing list