RFR(S): 8170409: CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries
dean.long at oracle.com
dean.long at oracle.com
Fri Dec 2 21:36:23 UTC 2016
Looks good to me.
dl
On 12/2/16 2:10 AM, Volker Simonis wrote:
> On Wed, Nov 30, 2016 at 10:43 PM, <dean.long at oracle.com> wrote:
>> I'm fine with leaving it in. How about adding a comment saying it's there
>> for clarity?
>>
> I've added the corresponding comment:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409.v3/
>
> Ready to push?
>
> @Thomas: can you please sponsor now?
>
> Thank you and best regards,
> Volker
>
>> dl
>>
>>
>>
>> On 11/30/16 8:41 AM, Volker Simonis wrote:
>>> On Tue, Nov 29, 2016 at 11:37 PM, <dean.long at oracle.com> wrote:
>>>> This looks good to me too, but I'm just wondering if the 2nd load_acquire
>>>> inside the lock is really needed.
>>>>
>>> Yes, we think you're right and the second load_acquire inside the lock
>>> could be replaced by a normal load because entering the lock does an
>>> implicit acquire and leaving the lock does an implicit release.
>>>
>>> Do you want me to remove it or can we leave it in for clarity (I don't
>>> think it hurts a lot) ?
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>
>>>> dl
>>>>
>>>>
>>>>
>>>> On 11/29/16 10:26 AM, Volker Simonis wrote:
>>>>> On Tue, Nov 29, 2016 at 4:55 PM, Mikael Gerdin
>>>>> <mikael.gerdin at oracle.com>
>>>>> wrote:
>>>>>> Hi Volker,
>>>>>>
>>>>>> On 2016-11-29 14:59, Volker Simonis wrote:
>>>>>>> On Tue, Nov 29, 2016 at 11:58 AM, Mikael Gerdin
>>>>>>> <mikael.gerdin at oracle.com> wrote:
>>>>>>>> Hi Volker,
>>>>>>>>
>>>>>>>> On 2016-11-28 19:33, Volker Simonis wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> can I please have a review and sponsor for the following fix
>>>>>>>>> submitted
>>>>>>>>> by gunter.haug at sap.com:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409/
>>>>>>>>
>>>>>>>>
>>>>>>>> I haven't looked too closely on the change but one thing I'm curious
>>>>>>>> about
>>>>>>>> just to further my understanding of the issue:
>>>>>>>> Would an alternate solution be to always acquire the
>>>>>>>> ParGCRareEvent_lock
>>>>>>>> before reading _last_LNC_resizing_collection[i] since in that case
>>>>>>>> the
>>>>>>>> mutex
>>>>>>>> semantics would ensure proper ordering of the memory accesses?
>>>>>>>>
>>>>>> Ok, thanks for confirming my thoughts around this.
>>>>>>
>>>>>> I think the change seems ok but what do you think about making
>>>>>> _last_LNC_resizing_collection volatile to further ensure that there's
>>>>>> not
>>>>>> funny compiler optimizations going on?
>>>>>>
>>>>> In general we agree that it would be fine to also make
>>>>> _last_LNC_resizing_collection volatile and here's an updated webrev
>>>>> which does exactly that:
>>>>>
>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409.v2/
>>>>>
>>>>> Our initial reasoning was that the aquiring the ParGCRareEvent_lock
>>>>> implies a C++ "sequence point" which should prevent the compiler from
>>>>> caching the value read from _last_LNC_resizing_collection. But with
>>>>> the "volatile" modifier the code is defintely clearer.
>>>>>
>>>>> @Thomas: can you please use the new webrev for pushing?
>>>>>
>>>>> Thanks,
>>>>> Volker
>>>>>
>>>>>> /Mikael
>>>>>>
>>>>>>
>>>>>>> Hi Mikael,
>>>>>>>
>>>>>>> we've discussed that as well and we think you are right in that always
>>>>>>> acquiring the ParGCRareEvent_lock before reading
>>>>>>> _last_LNC_resizing_collection[i] would solve the problem as well. But
>>>>>>> as ParGCRareEvent_lock is potentially a heavy-weight OS lock that
>>>>>>> would not be a practical solution.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Volker
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> /Mikael
>>>>>>>>
>>>>>>>>
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8170409
>>>>>>>>>
>>>>>>>>> We've observed a crash (see bug report for a stack trace) in
>>>>>>>>> CardTableModRefBSForCTRS::process_chunk_boundaries() from time to
>>>>>>>>> time
>>>>>>>>> since several years now, but only on non TSO platforms:
>>>>>>>>>
>>>>>>>>> - It only happens in opt builds.
>>>>>>>>> - Analysis of the assembly code revealed the actual crash site to be
>>>>>>>>> an array store to a pointer (_lowest_non_clean) which is an argument
>>>>>>>>> to process_chunk_boundaries()
>>>>>>>>> - The pointer is actually calculated in
>>>>>>>>> CardTableModRefBS::get_LNC_array_for_space() and passed as argument
>>>>>>>>> to
>>>>>>>>> CardTableModRefBSForCTRS::process_chunk_boundaries()
>>>>>>>>> - CardTableModRefBS::get_LNC_array_for_space() doesn't enforce TSO
>>>>>>>>> on
>>>>>>>>> _last_LNC_resizing_collection[i] so the pointer to an uninitialized
>>>>>>>>> structure (i.e._lowest_non_clean) could become visible to other
>>>>>>>>> threads before the value of _last_LNC_resizing_collection[i].
>>>>>>>>>
>>>>>>>>> Solution:
>>>>>>>>>
>>>>>>>>> Use OrderAccess::load_acquire and OrderAccess::release_store for
>>>>>>>>> accessing _last_LNC_resizing_collection[i] in
>>>>>>>>> CardTableModRefBSForCTRS::get_LNC_array_for_space()
>>>>>>>>>
>>>>>>>>> Thanks you and best regards,
>>>>>>>>> Volker
>>>>>>>>>
More information about the hotspot-gc-dev
mailing list